From oferg at mellanox.co.il Mon May 1 01:22:35 2006
From: oferg at mellanox.co.il (Ofer Gigi)
Date: Mon, 1 May 2006 11:22:35 +0300
Subject: [openib-general] [PATCH] osm_lid_mgr.c : exit only if exit_on_fatal in case of corrupted guid2lid file
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FE009A@mtlexch01.mtl.com>

Hi Hal,

By default, opensm runs with exit_on_fatal set.
However, opensm can sometimes recover from fatal errors.
One of these errors is a corrupted guid2lid file.
Therefore, if you want opensm to recover from this problem, you can use
the -y option (don't exit on fatal) and opensm won't exit in case of a
corrupted guid2lid file; it will just put an error in the log.

Thanks

Ofer G.

Signed-off-by: Ofer Gigi

Index: osm_lid_mgr.c
===================================================================
--- osm_lid_mgr.c (revision 6640)
+++ osm_lid_mgr.c (working copy)
@@ -304,11 +304,19 @@ osm_lid_mgr_init(
   {
     if (osm_db_restore(p_mgr->p_g2l))
     {
+      if (p_subn->opt.exit_on_fatal)
+      {
+        osm_log( p_mgr->p_log, OSM_LOG_SYS,
+                 "Fatal: Error restoring Guid-to-Lid persistent database\n" );
+        status = IB_ERROR;
+        goto Exit;
+      }
+      else
+      {
       osm_log( p_mgr->p_log, OSM_LOG_ERROR,
                "osm_lid_mgr_init: ERR 0317: "
                "Error restoring Guid-to-Lid persistent database\n");
-      status = IB_ERROR;
-      goto Exit;
+      }
     }

     /* we need to make sure we did not get duplicates with

From oferg at mellanox.co.il Mon May 1 01:28:11 2006
From: oferg at mellanox.co.il (Ofer Gigi)
Date: Mon, 1 May 2006 11:28:11 +0300
Subject: [openib-general] RE: [PATCH] osm_lid_mgr.c : exit only if exit_on_fatal in case of corrupted guid2lid file
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FE00A0@mtlexch01.mtl.com>

Hi Hal,
Please apply to trunk and branch.

Thanks!
Ofer

-----Original Message-----
From: Ofer Gigi
Sent: Monday, May 01, 2006 11:23 AM
To: 'openib-general at openib.org'
Cc: 'halr at voltaire.com'
Subject: [PATCH] osm_lid_mgr.c : exit only if exit_on_fatal in case of corrupted guid2lid file

Hi Hal,

The default of opensm is to exit_on_fatal.
However, opensm can overcome sometimes fatal errors.
One of this errors is a corrupted guid2lid file.
Therefore, if you want opensm to overcome this problem you can use
-y option (don't exit on fatal) and opensm won't exit in case of a
corrupted guid2lid file - it will just put an error in the log.

Thanks

Ofer G.

Signed-off-by: Ofer Gigi

Index: osm_lid_mgr.c
===================================================================
--- osm_lid_mgr.c (revision 6640)
+++ osm_lid_mgr.c (working copy)
@@ -304,11 +304,19 @@ osm_lid_mgr_init(
   {
     if (osm_db_restore(p_mgr->p_g2l))
     {
+      if (p_subn->opt.exit_on_fatal)
+      {
+        osm_log( p_mgr->p_log, OSM_LOG_SYS,
+                 "Fatal: Error restoring Guid-to-Lid persistent database\n" );
+        status = IB_ERROR;
+        goto Exit;
+      }
+      else
+      {
       osm_log( p_mgr->p_log, OSM_LOG_ERROR,
                "osm_lid_mgr_init: ERR 0317: "
                "Error restoring Guid-to-Lid persistent database\n");
-      status = IB_ERROR;
-      goto Exit;
+      }
     }

     /* we need to make sure we did not get duplicates with

From oferg at mellanox.co.il Mon May 1 01:33:10 2006
From: oferg at mellanox.co.il (Ofer Gigi)
Date: Mon, 1 May 2006 11:33:10 +0300
Subject: [openib-general] [PATCH] osm_switch.c : bug fix
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FE00A9@mtlexch01.mtl.com>

Hi Hal,

Bug fix:
In the function osm_switch_get_fwd_tbl_block in osm_switch.c we were
missing one block when the maximum lid was an exact multiple of
lids_per_block (==64).
Using <= instead of < fixes the problem.

Please apply to trunk and branch.

Thanks

Ofer G.
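
The boundary case in the osm_switch.c fix can be checked with a small standalone sketch. This is not OpenSM code: `LIDS_PER_BLOCK`, `block_in_range_old` and `block_in_range_new` are hypothetical names chosen for illustration, and only the comparison itself mirrors the patch. With 64 LIDs per block, block 1 starts at LID 64, so when the maximum LID is exactly 64 the strict `<` test rejects the block that holds it.

```c
#include <stdint.h>

/* Assumption for this sketch: 64 LIDs per forwarding-table block,
 * as stated in the mail above. */
#define LIDS_PER_BLOCK 64

/* Comparison as it was before the patch: strict '<'. */
static int block_in_range_old(uint32_t block_id, uint16_t max_lid_ho)
{
    uint16_t base_lid_ho = (uint16_t)(block_id * LIDS_PER_BLOCK);

    return base_lid_ho < max_lid_ho;
}

/* Comparison as it is after the patch: '<='. */
static int block_in_range_new(uint32_t block_id, uint16_t max_lid_ho)
{
    uint16_t base_lid_ho = (uint16_t)(block_id * LIDS_PER_BLOCK);

    return base_lid_ho <= max_lid_ho;
}
```

For `max_lid_ho == 64`, the old test accepts block 0 only, while the new test also accepts block 1, whose first entry is LID 64. Block 2 (base LID 128) is rejected by both.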
Signed-off-by: Ofer Gigi

Index: osm_switch.c
===================================================================
--- osm_switch.c (revision 6640)
+++ osm_switch.c (working copy)
@@ -191,7 +191,7 @@ osm_switch_get_fwd_tbl_block(
   lids_per_block = osm_fwd_tbl_get_lids_per_block( &p_sw->fwd_tbl );
   base_lid_ho = (uint16_t)(block_id * lids_per_block);

-  if( base_lid_ho < max_lid_ho )
+  if( base_lid_ho <= max_lid_ho )
   {
     cl_memclr( p_block, IB_SMP_DATA_SIZE );
     /*

From halr at voltaire.com Mon May 1 03:35:04 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 May 2006 06:35:04 -0400
Subject: [openib-general] [PATCH] osm_port_info_rcv.c : clear clientreregister bit
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FE0027@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FE0027@mtlexch01.mtl.com>
Message-ID: <1146479682.2124.148557.camel@hal.voltaire.com>

On Mon, 2006-05-01 at 02:59, Ofer Gigi wrote:
> Hi Hal,
> Did you apply this one below?

Huh ? It was committed to both trunk and 1.0 branch on Thursday AM not
very long after you originally sent it. An email response on this was
also sent direct to you as well as the list.

-- Hal

> I forgot to CC you, and as far as I can
> see it is not applied.
>
> Thanks!
> Ofer
>
> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Ofer Gigi
> Sent: Thursday, April 27, 2006 1:46 PM
> To: OPENIB
> Subject: [openib-general] [PATCH] osm_port_info_rcv.c : clear
> clientreregister bit
>
> Hi Hal,
> Bug Fix:
> On receive of client reregister - clear the reregister bit - so
> reregistering won't be sent again and again
>
> Please apply to trunk and branch.
>
> Thanks
>
> Ofer G.
>
> Signed-off-by: Ofer Gigi
>
> Index: osm_port_info_rcv.c
> ===================================================================
> --- osm_port_info_rcv.c (revision 6640)
> +++ osm_port_info_rcv.c (working copy)
> @@ -666,6 +666,17 @@ osm_pi_rcv_process(
>    p_smp = osm_madw_get_smp_ptr( p_madw );
>    p_context = osm_madw_get_pi_context_ptr( p_madw );
>    p_pi = (ib_port_info_t*)ib_smp_get_payload_ptr( p_smp );
> +
> +  /* On receive of client reregister - clear the reregister bit - so
> +     reregistering won't be sent again and again*/
> +  if (ib_port_info_get_client_rereg(p_pi))
> +  {
> +    osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> +             "osm_pi_rcv_process: "
> +             "client reregister received on response\n");
> +    ib_port_info_set_client_rereg(p_pi,0);
> +  }
> +
>    port_num = (uint8_t)cl_ntoh32( p_smp->attr_mod );
>
>    port_guid = p_context->port_guid;
>
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general

From halr at voltaire.com Mon May 1 03:51:17 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 May 2006 06:51:17 -0400
Subject: [openib-general] RE: [PATCH] osm_lid_mgr.c : exit only if exit_on_fatal in case of corrupted guid2lid file
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FE00A0@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FE00A0@mtlexch01.mtl.com>
Message-ID: <1146480665.2124.148862.camel@hal.voltaire.com>

Hi Ofer,

On Mon, 2006-05-01 at 04:28, Ofer Gigi wrote:
> Hi Hal,
> Please apply to trunk and branch.

The patch below was line wrapped. I fixed it by hand. Please fix your
procedure in the future.

Applied to both trunk and 1.0 branch.

-- Hal

> Thanks!
> Ofer
>
> -----Original Message-----
> From: Ofer Gigi
> Sent: Monday, May 01, 2006 11:23 AM
> To: 'openib-general at openib.org'
> Cc: 'halr at voltaire.com'
> Subject: [PATCH] osm_lid_mgr.c : exit only if exit_on_fatal in case of
> corrupted guid2lid file
>
> Hi Hal,
>
> The default of opensm is to exit_on_fatal.
> However, opensm can overcome sometimes fatal errors.
> One of this errors is a corrupted guid2lid file.
> Therefore, if you want opensm to overcome this problem you can use
> -y option (don't exit
> on fatal) and opensm won't exit in case of a corrupted guid2lid file -
> it will just put an error in the log.
>
> Thanks
>
> Ofer G.
>
> Signed-off-by: Ofer Gigi
>
> Index: osm_lid_mgr.c
> ===================================================================
> --- osm_lid_mgr.c (revision 6640)
> +++ osm_lid_mgr.c (working copy)
> @@ -304,11 +304,19 @@ osm_lid_mgr_init(
>    {
>      if (osm_db_restore(p_mgr->p_g2l))
>      {
> +      if (p_subn->opt.exit_on_fatal)
> +      {
> +        osm_log( p_mgr->p_log, OSM_LOG_SYS,
> +                 "Fatal: Error restoring Guid-to-Lid persistent
> database\n" );
> +        status = IB_ERROR;
> +        goto Exit;
> +      }
> +      else
> +      {
>        osm_log( p_mgr->p_log, OSM_LOG_ERROR,
>                 "osm_lid_mgr_init: ERR 0317: "
>                 "Error restoring Guid-to-Lid persistent database\n");
> -      status = IB_ERROR;
> -      goto Exit;
> +      }
>      }
>
>      /* we need to make sure we did not get duplicates with
>

From tziporet at mellanox.co.il Mon May 1 04:06:37 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 1 May 2006 14:06:37 +0300
Subject: [openib-general] OFED 1.0 release plan
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6E94@mtlexch01.mtl.com>

Hi All,

This is the release plan

1. RC3 - target: All components are in
   Schedule: done on Apr-10.

2. RC4 - target: features freeze - meaning all features and modules
   targeted for 1.0 release should be in.
   Schedule: May-4 (delayed from May-1 since some modules are not ready yet)
   Main changes from RC3 to RC4:
   1. Bug fixes according to problems reported.
   2. SDP - new code that Michael Tsirkin developed.
   3. SRP - with new features: FMR, tunable parameters, SRP daemon
   4. Open MPI - new package based on 1.1a3
   5. RDS - new version from main trunk
   6. Kernel code based on git
   7. Standard network configuration

3. RC5 - target: bug fixes
   Schedule: May-16
   Main changes from RC4 to RC5:
   1. Bug fixes according to problems reported
   2. Updated documentation

4. Final 1.0 release - after QA of all companies
   Schedule: May-29
   Main changes from RC5 to the final release:
   1. Showstopper bug fixes only
   2. Final documentation

Please send me any comments you have.

Tziporet Koren
Software Director
Mellanox Technologies
mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380

From halr at voltaire.com Mon May 1 04:19:01 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 May 2006 07:19:01 -0400
Subject: [openib-general] Re: [PATCH] osm_switch.c : bug fix
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FE00A9@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FE00A9@mtlexch01.mtl.com>
Message-ID: <1146482315.2124.149378.camel@hal.voltaire.com>

Hi Ofer,

On Mon, 2006-05-01 at 04:33, Ofer Gigi wrote:
> Hi Hal,
>
> Bug fix:
> In the function osm_switch_get_fwd_tbl_block in osm_switch.c we were
> missing one block in case the maximum lid was the multiplication of
> lids_per_block (==64).
> Adding <= instead of < to fix the problem.
>
> Please apply to trunk and branch.

Good catch. Thanks. Applied to trunk and 1.0 branch.

-- Hal

> Thanks
>
> Ofer G.
>
> Signed-off-by: Ofer Gigi

From ishai at mellanox.co.il Mon May 1 04:21:45 2006
From: ishai at mellanox.co.il (Ishai Rabinovitz)
Date: Mon, 1 May 2006 14:21:45 +0300
Subject: [openib-general] [PATCH 00/12] SRP: Changing ibsrpdm
Message-ID: <20060501112145.GA17552@mellanox.co.il>

Hi,

I'm going to send 12 patches. 6 patches for the kernel, and 6 for the
userspace ibsrpdm.
The kernel patches avoid adding the same target twice, allow the
removal of a target, and add a query about the connected targets.

The userspace patches change ibsrpdm into a real daemon that runs all
the time and updates the kernel with the visible targets.

Some of the kernel patches should be applied after Vu's patches for FMR.
The changes work without Vu's patches, but both touch code in the same
functions, so there may be some simple conflicts when they are applied
without Vu's patches.

--
Ishai Rabinovitz

From ishai at mellanox.co.il Mon May 1 04:24:01 2006
From: ishai at mellanox.co.il (Ishai Rabinovitz)
Date: Mon, 1 May 2006 14:24:01 +0300
Subject: [openib-general] [PATCH 01/12] SRP: changing ibsrpdm
Message-ID: <20060501112401.GB17552@mellanox.co.il>

Remove a redundant if

Signed-off-by: Ishai Rabinovitz

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-17 10:06:19.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-17 10:11:24.000000000 +0300
@@ -1798,8 +1798,7 @@ static void srp_remove_one(struct ib_dev
 		list_for_each_entry_safe(target, tmp_target,
 					 &host->target_list, list) {
 			spin_lock_irqsave(target->scsi_host->host_lock, flags);
-			if (target->state != SRP_TARGET_REMOVED)
-				target->state = SRP_TARGET_REMOVED;
+			target->state = SRP_TARGET_REMOVED;
 			spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 		}
 		mutex_unlock(&host->target_mutex);

--
Ishai Rabinovitz

From ishai at mellanox.co.il Mon May 1 04:25:46 2006
From: ishai at mellanox.co.il (Ishai Rabinovitz)
Date: Mon, 1 May 2006 14:25:46 +0300
Subject: [openib-general] [PATCH 02/12] SRP: changing ibsrpdm
Message-ID: <20060501112546.GC17552@mellanox.co.il>

Move the destruction of the host and the removal from a list to a function.
Signed-off-by: Ishai Rabinovitz Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-23 14:08:03.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-24 10:47:00.000000000 +0300 @@ -344,6 +344,16 @@ static void srp_disconnect_target(struct wait_for_completion(&target->done); } +static void destruct_scsi_host_and_target(struct srp_target_port *target, int disconnect_target) +{ + scsi_remove_host(target->scsi_host); + if (disconnect_target) + srp_disconnect_target(target); + ib_destroy_cm_id(target->cm_id); + srp_free_target_ib(target); + scsi_host_put(target->scsi_host); +} + static void srp_remove_work(void *target_ptr) { struct srp_target_port *target = target_ptr; @@ -357,10 +374,7 @@ static void srp_remove_work(void *target list_del(&target->list); mutex_unlock(&target->srp_host->target_mutex); - scsi_remove_host(target->scsi_host); - ib_destroy_cm_id(target->cm_id); - srp_free_target_ib(target); - scsi_host_put(target->scsi_host); + destruct_scsi_host_and_target(target, 0); /* And another put to really free the target port... 
*/ scsi_host_put(target->scsi_host); } @@ -1734,13 +1746,8 @@ static void srp_remove_one(struct ib_dev flush_scheduled_work(); list_for_each_entry_safe(target, tmp_target, - &host->target_list, list) { - scsi_remove_host(target->scsi_host); - srp_disconnect_target(target); - ib_destroy_cm_id(target->cm_id); - srp_free_target_ib(target); - scsi_host_put(target->scsi_host); - } + &host->target_list, list) + destruct_scsi_host_and_target(target, 1); ib_dereg_mr(host->mr); ib_dealloc_pd(host->pd); -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:27:10 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:27:10 +0300 Subject: [openib-general] [PATCH 03/12] SRP: Changing ibsrpdm Message-ID: <20060501112710.GD17552@mellanox.co.il> It is nicer to perform the init_work just before the call to schedule_work. Signed-off-by: Ishai Rabinovitz Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-17 10:57:59.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-18 02:26:29.000000000 +0300 @@ -828,8 +829,10 @@ static void srp_completion(struct ib_cq wc.wr_id & SRP_OP_RECV ? 
"receive" : "send", wc.status); spin_lock_irqsave(target->scsi_host->host_lock, flags); - if (target->state == SRP_TARGET_LIVE) + if (target->state == SRP_TARGET_LIVE) { + INIT_WORK(&target->work, srp_reconnect_work, target); schedule_work(&target->work); + } spin_unlock_irqrestore(target->scsi_host->host_lock, flags); break; } @@ -1601,8 +1684,6 @@ static ssize_t srp_create_target(struct target->scsi_host = target_host; target->srp_host = host; - INIT_WORK(&target->work, srp_reconnect_work, target); - for (i = 0; i < SRP_SQ_SIZE - 1; ++i) target->req_ring[i].next = i + 1; target->req_ring[SRP_SQ_SIZE - 1].next = -1; -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:27:39 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:27:39 +0300 Subject: [openib-general] [PATCH 04/12] SRP: Changing ibsrpdm Message-ID: <20060501112739.GE17552@mellanox.co.il> Do not add the same target twice. Signed-off-by: Ishai Rabinovitz Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 15:17:34.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 15:19:37.000000000 +0300 @@ -1478,7 +1478,8 @@ static int srp_parse_options(const char printk(KERN_WARNING PFX "bad max sect parameter '%s'\n", p); goto out; } - target->scsi_host->max_sectors = token; + if (target->scsi_host != NULL) + target->scsi_host->max_sectors = token; break; default: @@ -1503,20 +1504,89 @@ out: return ret; } +/* srp_find_target - If the target exists return it in target, + otherwise target is set to NULL. 
+ host->target_mutex should be hold */ +static int srp_find_target(const char *buf, struct srp_host *host, + struct srp_target_port **target) +{ + struct srp_target_port *target_to_find, *curr_target; + int ret, i; + + target_to_find = kzalloc(sizeof *target_to_find, GFP_KERNEL); + ret = srp_parse_options(buf, target_to_find); + if (ret) + goto free; + + list_for_each_entry(curr_target, &host->target_list, list) + if (target_to_find->ioc_guid == curr_target->ioc_guid && + target_to_find->id_ext == curr_target->id_ext && + target_to_find->path.pkey == curr_target->path.pkey && + target_to_find->service_id == curr_target->service_id) { + for (i = 0; i < 16; ++i) + if (target_to_find->path.dgid.raw[i] != curr_target->path.dgid.raw[i]) + break; + if (i == 16) { + *target = curr_target; + goto free; + } + } + + *target = NULL; + +free: + kfree(target_to_find); + return 0; +} + static ssize_t srp_create_target(struct class_device *class_dev, const char *buf, size_t count) { struct srp_host *host = container_of(class_dev, struct srp_host, class_dev); struct Scsi_Host *target_host; - struct srp_target_port *target; + struct srp_target_port *target, *existing_target = NULL; int ret; int i; + /* first check if the target already exists */ + + mutex_lock(&host->target_mutex); + ret = srp_find_target(buf, host, &existing_target); + if (ret) + goto unlock_mutex; + + if (existing_target) { + /* target already exists */ + spin_lock_irq(existing_target->scsi_host->host_lock); + switch (existing_target->state) { + case SRP_TARGET_LIVE: + printk(KERN_WARNING PFX "target %s already exists\n", + buf); + ret = -EEXIST; + break; + case SRP_TARGET_CONNECTING: + /* It is in the middle of reconnecting */ + ret = -EALREADY; + break; + case SRP_TARGET_DEAD: + /* It will be removed soon - create a new one */ + case SRP_TARGET_REMOVED: + /* target is dead, create a new one */ + break; + } + spin_unlock_irq(existing_target->scsi_host->host_lock); + if (ret) + goto unlock_mutex; + } + + /* 
really create the target */
 	target_host = scsi_host_alloc(&srp_template,
 				      sizeof (struct srp_target_port));
-	if (!target_host)
-		return -ENOMEM;
+	if (!target_host) {
+		ret = -ENOMEM;
+		goto unlock_mutex;
+	}

 	target_host->max_lun = SRP_MAX_LUN;

@@ -1533,7 +1603,7 @@ static ssize_t srp_create_target(struct

 	ret = srp_parse_options(buf, target);
 	if (ret)
-		goto err;
+		goto err_put_scsi_host;

 	ib_get_cached_gid(host->dev, host->port, 0, &target->path.sgid);

@@ -1554,7 +1624,7 @@ static ssize_t srp_create_target(struct

 	ret = srp_create_target_ib(target);
 	if (ret)
-		goto err;
+		goto err_put_scsi_host;

 	target->cm_id = ib_create_cm_id(host->dev, srp_cm_handler, target);
 	if (IS_ERR(target->cm_id)) {
@@ -1572,7 +1642,8 @@ static ssize_t srp_create_target(struct
 	if (ret)
 		goto err_disconnect;

-	return count;
+	ret = count;
+	goto unlock_mutex;

 err_disconnect:
 	srp_disconnect_target(target);
@@ -1583,9 +1654,12 @@ err_cm_id:
 err_free:
 	srp_free_target_ib(target);

-err:
+err_put_scsi_host:
 	scsi_host_put(target_host);

+unlock_mutex:
+	mutex_unlock(&host->target_mutex);
+
 	return ret;
 }

--
Ishai Rabinovitz

From ogerlitz at voltaire.com Mon May 1 04:31:33 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 1 May 2006 14:31:33 +0300 (IDT)
Subject: [openib-general] re cma upcalls serialization / disconnected event question
Message-ID:

Hi Sean,

Can a ULP assume that CMA callbacks for the same CMA ID are serialized?

Also, and related to this, is it correct that a DISCONNECTED event is
**always** preceded by one of {ESTABLISHED, REJECTED, CONNECT_ERROR}?
I am talking about the active side and assuming there is no notification
on CONNECT_RESPONSE.

thanks,

Or.

/*
 * Upon receiving a device removal event, users must destroy the associated
 * RDMA identifier and release all resources allocated with the device.
*/ enum rdma_cm_event_type { RDMA_CM_EVENT_ADDR_RESOLVED, RDMA_CM_EVENT_ADDR_ERROR, RDMA_CM_EVENT_ROUTE_RESOLVED, RDMA_CM_EVENT_ROUTE_ERROR, RDMA_CM_EVENT_CONNECT_REQUEST, RDMA_CM_EVENT_CONNECT_RESPONSE, RDMA_CM_EVENT_CONNECT_ERROR, RDMA_CM_EVENT_UNREACHABLE, RDMA_CM_EVENT_REJECTED, RDMA_CM_EVENT_ESTABLISHED, RDMA_CM_EVENT_DISCONNECTED, RDMA_CM_EVENT_DEVICE_REMOVAL, }; From ishai at mellanox.co.il Mon May 1 04:28:12 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:28:12 +0300 Subject: [openib-general] [PATCH 05/12] SRP: Changing ibsrpdm Message-ID: <20060501112812.GF17552@mellanox.co.il> Support a remove of a target from user level. Signed-off-by: Ishai Rabinovitz Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-01 12:30:01.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-01 12:36:22.000000000 +0300 @@ -960,10 +960,12 @@ static int srp_queuecommand(struct scsi_ long req_index; int len; - if (target->state == SRP_TARGET_CONNECTING) + if (target->state == SRP_TARGET_CONNECTING || + target->state == SRP_TARGET_RECONNECTING) goto err; if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) { scmnd->result = DID_BAD_TARGET << 16; done(scmnd); @@ -1254,6 +1256,7 @@ static int srp_send_tsk_mgmt(struct scsi spin_lock_irq(target->scsi_host->host_lock); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) { scmnd->result = DID_BAD_TARGET << 16; goto out; @@ -1359,6 +1362,7 @@ static ssize_t show_ioc_guid(struct clas struct srp_target_port *target = host_to_target(class_to_shost(cdev)); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) return -ENODEV; @@ -1371,6 +1375,7 @@ 
static ssize_t show_service_id(struct cl struct srp_target_port *target = host_to_target(class_to_shost(cdev)); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) return -ENODEV; @@ -1383,6 +1388,7 @@ static ssize_t show_pkey(struct class_de struct srp_target_port *target = host_to_target(class_to_shost(cdev)); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) return -ENODEV; @@ -1394,6 +1400,8 @@ static ssize_t show_dgid(struct class_de struct srp_target_port *target = host_to_target(class_to_shost(cdev)); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) return -ENODEV; @@ -1447,11 +1455,11 @@ static int srp_add_target(struct srp_hos if (scsi_add_host(target->scsi_host, host->dev->dev->dma_device)) return -ENODEV; - mutex_lock(&host->target_mutex); list_add_tail(&target->list, &host->target_list); - mutex_unlock(&host->target_mutex); + spin_lock_irq(target->scsi_host->host_lock); target->state = SRP_TARGET_LIVE; + spin_unlock_irq(target->scsi_host->host_lock); /* XXX: are we supposed to have a definition of SCAN_WILD_CARD ?? 
*/ scsi_scan_target(&target->scsi_host->shost_gendev, @@ -1642,7 +1650,6 @@ static ssize_t srp_create_target(struct { struct srp_host *host = container_of(class_dev, struct srp_host, class_dev); - struct Scsi_Host *target_host; struct srp_target_port *target, *existing_target = NULL; int ret; int i; @@ -1663,6 +1670,7 @@ static ssize_t srp_create_target(struct buf); ret = -EEXIST; break; + case SRP_TARGET_RECONNECTING: case SRP_TARGET_CONNECTING: /* It is in the middle of reconnecting */ ret = -EALREADY; @@ -1671,6 +1679,10 @@ static ssize_t srp_create_target(struct /* It will be removed soon - create a new one */ case SRP_TARGET_REMOVED: /* target is dead, create a new one */ + existing_target = NULL; + break; + case SRP_TARGET_DISCONNECTED: + existing_target->state = SRP_TARGET_RECONNECTING; break; } spin_unlock_irq(existing_target->scsi_host->host_lock); @@ -1678,26 +1690,30 @@ static ssize_t srp_create_target(struct goto unlock_mutex; } - /* really create the target */ - target_host = scsi_host_alloc(&srp_template, - sizeof (struct srp_target_port)); - if (!target_host) { - ret = -ENOMEM; - goto unlock_mutex; - } - - target_host->max_lun = SRP_MAX_LUN; - - target = host_to_target(target_host); - memset(target, 0, sizeof *target); + if (!existing_target) { + struct Scsi_Host *target_host; - target->scsi_host = target_host; - target->srp_host = host; - - for (i = 0; i < SRP_SQ_SIZE - 1; ++i) - target->req_ring[i].next = i + 1; - target->req_ring[SRP_SQ_SIZE - 1].next = -1; - INIT_LIST_HEAD(&target->req_queue); + target_host = scsi_host_alloc(&srp_template, + sizeof (struct srp_target_port)); + if (!target_host) { + ret = -ENOMEM; + goto unlock_mutex; + } + + target_host->max_lun = SRP_MAX_LUN; + + target = host_to_target(target_host); + memset(target, 0, sizeof *target); + + target->scsi_host = target_host; + target->srp_host = host; + + for (i = 0; i < SRP_SQ_SIZE - 1; ++i) + target->req_ring[i].next = i + 1; + target->req_ring[SRP_SQ_SIZE - 1].next = -1; + 
INIT_LIST_HEAD(&target->req_queue); + } else + target = existing_target; ret = srp_parse_options(buf, target); if (ret) @@ -1736,9 +1752,15 @@ static ssize_t srp_create_target(struct goto err_cm_id; } - ret = srp_add_target(host, target); - if (ret) - goto err_disconnect; + if (!existing_target) { + ret = srp_add_target(host, target); + if (ret) + goto err_disconnect; + } else { + spin_lock_irq(target->scsi_host->host_lock); + target->state = SRP_TARGET_LIVE; + spin_unlock_irq(target->scsi_host->host_lock); + } ret = count; goto unlock_mutex; @@ -1753,7 +1775,9 @@ err_free: srp_free_target_ib(target); err_put_scsi_host: - scsi_host_put(target_host); + if (existing_target) + list_del(&target->list); + scsi_host_put(target->scsi_host); unlock_mutex: mutex_unlock(&host->target_mutex); @@ -1763,6 +1787,62 @@ unlock_mutex: static CLASS_DEVICE_ATTR(add_target, S_IWUSR, NULL, srp_create_target); +static ssize_t srp_remove_target(struct class_device *class_dev, + const char *buf, size_t count) +{ + struct srp_host *host = + container_of(class_dev, struct srp_host, class_dev); + struct srp_target_port *existing_target; + int ret; + + /* first check if the target exists */ + + mutex_lock(&host->target_mutex); + ret = srp_find_target(buf, host, &existing_target); + if (ret) + goto unlock_mutex; + + if (!existing_target) { + printk(KERN_WARNING PFX "target %s does not exist\n", buf); + ret = -ENOENT; + goto unlock_mutex; + } + + spin_lock_irq(existing_target->scsi_host->host_lock); + + switch (existing_target->state) { + case SRP_TARGET_REMOVED: + case SRP_TARGET_DEAD: + case SRP_TARGET_DISCONNECTED: + /* target not exists */ + printk(KERN_WARNING PFX "target %s does not exist\n", buf); + ret = -ENOENT; + break; + + case SRP_TARGET_RECONNECTING: + case SRP_TARGET_CONNECTING: + ret = -EAGAIN; /* So the caller will try again later - + after the connection ends one way or another */ + break; + + case SRP_TARGET_LIVE: + existing_target->state = SRP_TARGET_DISCONNECTED; + 
spin_unlock_irq(existing_target->scsi_host->host_lock); + mutex_unlock(&host->target_mutex); + srp_disconnect_target(existing_target); + ib_destroy_cm_id(existing_target->cm_id); + srp_free_target_ib(existing_target); + return count; + } + + spin_unlock_irq(existing_target->scsi_host->host_lock); +unlock_mutex: + mutex_unlock(&host->target_mutex); + return ret; +} + +static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); + static ssize_t show_ibdev(struct class_device *class_dev, char *buf) { struct srp_host *host = @@ -1809,6 +1889,8 @@ static struct srp_host *srp_add_port(str goto free_host; if (class_device_create_file(&host->class_dev, &class_device_attr_add_target)) goto err_class; + if (class_device_create_file(&host->class_dev, &class_device_attr_remove_target)) + goto err_class; if (class_device_create_file(&host->class_dev, &class_device_attr_ibdev)) goto err_class; if (class_device_create_file(&host->class_dev, &class_device_attr_port)) Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.h =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.h 2006-05-01 12:30:01.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.h 2006-05-01 12:31:06.000000000 +0300 @@ -78,8 +78,10 @@ enum { enum srp_target_state { SRP_TARGET_LIVE, SRP_TARGET_CONNECTING, + SRP_TARGET_RECONNECTING, + SRP_TARGET_DISCONNECTED, SRP_TARGET_DEAD, - SRP_TARGET_REMOVED + SRP_TARGET_REMOVED, }; struct srp_device { -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:28:48 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:28:48 +0300 Subject: [openib-general] [PATCH 06/12] SRP: Changing ibsrpdm Message-ID: <20060501112848.GG17552@mellanox.co.il> Support a display of list of target from user level. 
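
Listing targets through a sysfs attribute means formatting one line per target into a single fixed-size buffer (one page for a sysfs show routine), so the loop has to stop before an entry could run past the end. A minimal user-space sketch of that pattern follows. It is an illustration under stated assumptions, not the patch itself: `BUF_SIZE` stands in for PAGE_SIZE, `ENTRY_MAX`, `struct target_id` and `list_targets_sketch` are hypothetical names, and the sketch checks for room before writing an entry, whereas the patch checks after each write.

```c
#include <stdint.h>
#include <stdio.h>

#define BUF_SIZE  256 /* stands in for PAGE_SIZE in this sketch */
#define ENTRY_MAX 64  /* assumed upper bound on one formatted entry */

/* Hypothetical, reduced stand-in for the fields printed per target. */
struct target_id {
    uint64_t id_ext;
    uint64_t ioc_guid;
};

/*
 * Append one line per target to buf, which must hold BUF_SIZE bytes.
 * Stops before an entry could overflow the buffer.
 * Returns the number of bytes written, not counting the final NUL.
 */
static size_t list_targets_sketch(char *buf, const struct target_id *t,
                                  size_t n)
{
    size_t printed = 0;
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < n; ++i) {
        int ret;

        /* Leave room for a full entry plus the terminating NUL. */
        if (printed + ENTRY_MAX > BUF_SIZE - 1)
            break;

        ret = snprintf(buf + printed, BUF_SIZE - printed,
                       "id_ext=%016llx,ioc_guid=%016llx\n",
                       (unsigned long long) t[i].id_ext,
                       (unsigned long long) t[i].ioc_guid);
        if (ret <= 0)
            break;

        printed += (size_t) ret;
    }

    return printed;
}
```

Each entry in this sketch is 50 bytes long, so a 256-byte buffer takes four entries and the remaining targets are silently dropped. A reader of such an attribute cannot tell a complete list from a truncated one, which is worth keeping in mind on hosts with many targets.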
Signed-off-by: Ishai Rabinovitz Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-21 01:13:04.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-21 03:56:05.000000000 +0300 @@ -1730,6 +1730,63 @@ end: static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); +#define TARGET_INFO_BUF_SIZE 126 + +static ssize_t list_targets(struct class_device *class_dev, char *buf) +{ + struct srp_host *host = + container_of(class_dev, struct srp_host, class_dev); + struct srp_target_port *target; + int printed=0, ret; + + mutex_lock(&host->target_mutex); + list_for_each_entry(target, &host->target_list, list) + if (target->state == SRP_TARGET_LIVE) { + ret = sprintf(buf+printed, + "id_ext=%016llx,ioc_guid=%016llx," + "dgid=%04x%04x%04x%04x%04x%04x%04x%04x," + "pkey=%04x,service_id=%016llx\n", + (unsigned long long) + be64_to_cpu(target->id_ext), + (unsigned long long) + be64_to_cpu(target->ioc_guid), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[0]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[2]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[4]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[6]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[8]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[10]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[12]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[14]), + be16_to_cpu(target->path.pkey), + (unsigned long long) + be64_to_cpu(target->service_id)); + if (ret <= 0) + goto end; + + printed += ret; + + if (printed + TARGET_INFO_BUF_SIZE > PAGE_SIZE - 1) + break; + } + + ret = printed; + +end: + mutex_unlock(&host->target_mutex); + return ret; +} + +static CLASS_DEVICE_ATTR(list_targets, S_IRUGO, list_targets, NULL); + static ssize_t show_ibdev(struct class_device *class_dev, 
char *buf) { struct srp_host *host = @@ -1789,6 +1846,8 @@ static struct srp_host *srp_add_port(str goto err_class; if (class_device_create_file(&host->class_dev, &class_device_attr_remove_target)) goto err_class; + if (class_device_create_file(&host->class_dev, &class_device_attr_list_targets)) + goto err_class; if (class_device_create_file(&host->class_dev, &class_device_attr_ibdev)) goto err_class; if (class_device_create_file(&host->class_dev, &class_device_attr_port)) -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:29:27 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:29:27 +0300 Subject: [openib-general] [PATCH 07/12] SRP: Changing ibsrpdm Message-ID: <20060501112927.GH17552@mellanox.co.il> Remove trailing spaces and arranging tabs. Signed-off-by: Ishai Rabinovitz Index: last_stable/src/userspace/srptools/src/srp-dm.c =================================================================== --- last_stable.orig/src/userspace/srptools/src/srp-dm.c 2006-04-21 03:54:05.000000000 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 04:20:43.000000000 +0300 @@ -102,7 +102,6 @@ static int read_file(const char *dir, co return len; } - static int setup_port_sysfs_path(void) { char *env; char class_dev_path[256]; @@ -135,7 +134,7 @@ static int setup_port_sysfs_path(void) { fprintf(stderr, "Couldn't read ibdev attribute\n"); return -1; } - + if (read_file(class_dev_path, "port", ibport, sizeof ibport) < 0) { fprintf(stderr, "Couldn't read port attribute\n"); return -1; @@ -385,7 +384,7 @@ static int do_port(int fd, uint32_t agen pr_human(" change ID: %04x\n", ntohs(iou_info.change_id)); pr_human(" max controllers: 0x%02x\n", iou_info.max_controllers); - if (verbose > 0) + if (verbose > 0) for (i = 0; i < iou_info.max_controllers; ++i) { pr_human(" controller[%3d]: ", i + 1); switch ((iou_info.controller_list[i / 2] >> -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:29:53 2006 From: ishai at 
mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:29:53 +0300 Subject: [openib-general] [PATCH 08/12] SRP: Changing ibsrpdm Message-ID: <20060501112953.GI17552@mellanox.co.il> Use constants for bits in masks. Improves readability. Signed-off-by: Ishai Rabinovitz Index: last_stable/src/userspace/srptools/src/srp-dm.c =================================================================== --- last_stable.orig/src/userspace/srptools/src/srp-dm.c 2006-04-21 01:18:55.000000000 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 03:41:39.000000000 +0300 @@ -70,6 +70,14 @@ static inline uint64_t ntohll(uint64_t x static inline uint64_t htonll(uint64_t x) { return x; } #endif +#define IS_DM_MASK (1 << 19) + +#define SIZE_OF_QUERY_RESPONSE (1 << 18) + +#define N_COMP_MASK_NODE_TYPE htonll(1 << 4); + +#define N_COMP_MASK_LID htonll(1); + static char *sysfs_path = "/sys"; static void usage(const char *argv0) @@ -474,7 +482,7 @@ static int get_port_info(int fd, uint32_ out_sa_mad->mgmt_class = SRP_MGMT_CLASS_SA; out_sa_mad->class_version = 2; - out_sa_mad->comp_mask = htonll(1); /* LID */ + out_sa_mad->comp_mask = N_COMP_MASK_LID; port_info = (void *) out_sa_mad->data; port_info->endport_lid = htons(dlid); @@ -495,7 +503,7 @@ again: port_info = (void *) in_sa_mad->data; *subnet_prefix = ntohll(port_info->subnet_prefix); - *isdm = !!(ntohl(port_info->capability_mask) & (1 << 19)); + *isdm = !!(ntohl(port_info->capability_mask) & IS_DM_MASK); return 0; } @@ -519,7 +527,7 @@ static int get_port_list(int fd, uint32_ sm_lid = strtol(val, NULL, 0); - in_mad = alloca(1 << 18); + in_mad = alloca(SIZE_OF_QUERY_RESPONSE); in_sa_mad = (void *) in_mad->data; out_sa_mad = (void *) out_mad.data; @@ -529,7 +537,7 @@ static int get_port_list(int fd, uint32_ out_sa_mad->mgmt_class = SRP_MGMT_CLASS_SA; out_sa_mad->method = SRP_SA_METHOD_GET_TABLE; out_sa_mad->class_version = 2; - out_sa_mad->comp_mask = htonll(1ul << 4); /* node type */ + out_sa_mad->comp_mask = 
N_COMP_MASK_NODE_TYPE; out_sa_mad->rmpp_version = 1; out_sa_mad->rmpp_type = 1; node = (void *) out_sa_mad->data; @@ -541,7 +549,7 @@ again: return -1; } - len = read(fd, in_mad, 1 << 18); + len = read(fd, in_mad, SIZE_OF_QUERY_RESPONSE); if (len < 0) { fprintf(stderr, "%s/%d: ", __func__, __LINE__); perror("read"); -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:30:19 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:30:19 +0300 Subject: [openib-general] [PATCH 09/12] SRP: Changing ibsrpdm Message-ID: <20060501113019.GJ17552@mellanox.co.il> alloca man page on my system says: The alloca() function is machine and compiler dependent. On many systems its implementation is buggy. Its use is discouraged. Lets not use it. Signed-off-by: Ishai Rabinovitz Index: last_stable/src/userspace/srptools/src/srp-dm.c =================================================================== --- last_stable.orig/src/userspace/srptools/src/srp-dm.c 2006-04-16 13:09:07.000000000 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-16 13:11:17.000000000 +0300 @@ -510,7 +510,8 @@ again: int get_port_list(int fd, uint32_t agent[2]) { - struct ib_user_mad out_mad, *in_mad; + uint8_t in_mad_space[SIZE_OF_QUERY_RESPONSE]; + struct ib_user_mad out_mad, *in_mad=(void *) in_mad_space; struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; struct srp_sa_node_rec *node; ssize_t len; @@ -521,8 +522,6 @@ int get_port_list(int fd, uint32_t agent uint64_t subnet_prefix; int isdm; - in_mad = alloca(SIZE_OF_QUERY_RESPONSE); - in_sa_mad = (void *) in_mad->data; out_sa_mad = (void *) out_mad.data; -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:30:45 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:30:45 +0300 Subject: [openib-general] [PATCH 10/12] SRP: Changing ibsrpdm Message-ID: <20060501113045.GK17552@mellanox.co.il> Add a function send_and_get that handles the communication and retries. Reduce redundancy. 
Increment TID on retry. Bound the number of retries. Signed-off-by: Ishai Rabinovitz Index: last_stable/src/userspace/srptools/src/srp-dm.c =================================================================== --- last_stable.orig/src/userspace/srptools/src/srp-dm.c 2006-04-21 01:35:10.000000000 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 01:41:28.000000000 +0300 @@ -85,6 +85,36 @@ static void usage(const char *argv0) fprintf(stderr, "Usage: %s [-vc] [-d ]\n", argv0); } +#define NUM_OF_RETRIES 3 +int send_and_get(int fd, struct ib_user_mad *out_mad, + struct ib_user_mad *in_mad, long in_mad_size) +{ + int i, len, in_mad_real_size; + struct srp_dm_mad *out_dm_mad; + + in_mad_real_size = (in_mad_size ? in_mad_size : sizeof(struct ib_user_mad)); + for (i = 0; i < NUM_OF_RETRIES; ++i) + { + len = write(fd, out_mad, sizeof(struct ib_user_mad)); + if (len != sizeof(struct ib_user_mad)) { + fprintf(stderr, "write: %s\n", strerror(errno)); + return -1; + } + + len = read(fd, in_mad, in_mad_real_size); + if ((in_mad_size == 0 && len == in_mad_real_size) || + (in_mad_size != 0 && len > 0)) + return len; + else if (in_mad->hdr.status != ETIMEDOUT) { + fprintf(stderr, "%s/%d: read: %s\n", __func__, __LINE__, strerror(errno)); + return -1; + } + out_dm_mad = (void *) out_mad->data; + ((uint32_t *) &out_dm_mad->tid)[1] = tid++; + } + return -1; +} + static int read_file(const char *dir, const char *file, char *buf, size_t size) { char *path; @@ -234,19 +264,8 @@ static int set_class_port_info(int fd, u ((uint16_t *) cpi->trap_gid)[i] = htons(strtol(val + i * 5, NULL, 16)); } -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) return -1; - } - - if (read(fd, &in_mad, sizeof in_mad) != sizeof in_mad) { - if (in_mad.hdr.status == ETIMEDOUT) - goto again; - fprintf(stderr, "%s/%d: ", __func__, __LINE__); - perror("read"); - return -1; - } in_dm_mad = (void *) in_mad.data; 
if (in_dm_mad->status) { @@ -266,19 +285,8 @@ static int get_iou_info(int fd, uint32_t init_srp_dm_mad(&out_mad, agent[1], dlid, SRP_DM_ATTR_IO_UNIT_INFO, 0); -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); - return -1; - } - - if (read(fd, &in_mad, sizeof in_mad) != sizeof in_mad) { - if (in_mad.hdr.status == ETIMEDOUT) - goto again; - fprintf(stderr, "%s/%d: ", __func__, __LINE__); - perror("read"); + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) return -1; - } in_dm_mad = (void *) in_mad.data; if (in_dm_mad->status) { @@ -300,19 +308,8 @@ static int get_ioc_prof(int fd, uint32_t init_srp_dm_mad(&out_mad, agent[1], dlid, SRP_DM_ATTR_IO_CONTROLLER_PROFILE, ioc); -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) return -1; - } - - if (read(fd, &in_mad, sizeof in_mad) != sizeof in_mad) { - if (in_mad.hdr.status == ETIMEDOUT) - goto again; - fprintf(stderr, "%s/%d: ", __func__, __LINE__); - perror("read"); - return -1; - } if (in_mad.hdr.status != 0) { fprintf(stderr, "IO Controller Profile query timed out\n"); @@ -340,19 +337,8 @@ static int get_svc_entries(int fd, uint3 init_srp_dm_mad(&out_mad, agent[1], dlid, SRP_DM_ATTR_SERVICE_ENTRIES, (ioc << 16) | (end << 8) | start); -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) return -1; - } - - if (read(fd, &in_mad, sizeof in_mad) != sizeof in_mad) { - if (in_mad.hdr.status == ETIMEDOUT) - goto again; - fprintf(stderr, "%s/%d: ", __func__, __LINE__); - perror("read"); - return -1; - } if (in_mad.hdr.status != 0) { fprintf(stderr, "Service Entries query timed out\n"); @@ -486,20 +472,8 @@ static int get_port_info(int fd, uint32_ port_info = (void *) out_sa_mad->data; port_info->endport_lid = htons(dlid); -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); - 
return -1; - } - - if (read(fd, &in_mad, sizeof in_mad) != sizeof in_mad) { - if (in_mad.hdr.status == ETIMEDOUT) - goto again; - - fprintf(stderr, "%s/%d: ", __func__, __LINE__); - perror("read"); + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) return -1; - } port_info = (void *) in_sa_mad->data; *subnet_prefix = ntohll(port_info->subnet_prefix); @@ -542,21 +516,8 @@ static int get_port_list(int fd, uint32_ node = (void *) out_sa_mad->data; node->type = 1; /* CA */ -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); + if ((len = send_and_get(fd, &out_mad, in_mad, SIZE_OF_QUERY_RESPONSE)) < 0) return -1; - } - - len = read(fd, in_mad, SIZE_OF_QUERY_RESPONSE); - if (len < 0) { - fprintf(stderr, "%s/%d: ", __func__, __LINE__); - perror("read"); - return -1; - } - - if (in_mad->hdr.status == ETIMEDOUT) - goto again; size = ntohs(in_sa_mad->attr_offset) * 8; -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:31:16 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:31:16 +0300 Subject: [openib-general] [PATCH 11/12] SRP: Changing ibsrpdm Message-ID: <20060501113116.GL17552@mellanox.co.il> Add -l option to ibsrpdm. This option activates a daemon that queries for the targets in a loop and tells ib_srp about new target that appears and old target that disappears. 
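Each pass of the loop described above comes down to two set differences: targets that were discovered but are not yet known to ib_srp get added, and targets that ib_srp knows but that were not rediscovered get removed. A minimal sketch of that bookkeeping, assuming targets are compared as plain strings; `contains` and `difference` are hypothetical helper names, not functions from srp-dm.c. The patch itself keeps the kernel's targets in a small array-backed set and drops entries as they are rediscovered, whereas this version simply computes both differences directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if 'name' occurs in list[0..n), else 0. */
static int contains(const char *const *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; ++i)
		if (strcmp(list[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Copy into out[] every entry of a[] that is missing from b[].
 * out must have room for na pointers. Returns the number copied.
 */
static size_t difference(const char *const *a, size_t na,
			 const char *const *b, size_t nb,
			 const char **out)
{
	size_t count = 0;

	assert(out != NULL);
	for (size_t i = 0; i < na; ++i)
		if (!contains(b, nb, a[i]))
			out[count++] = a[i];
	return count;
}
```

With `found` holding the targets seen in the current query and `kernel` holding what list_targets reported, `difference(found, ..., kernel, ...)` yields the strings to write to add_target and `difference(kernel, ..., found, ...)` yields the ones to write to remove_target.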
Signed-off-by: Ishai Rabinovitz Index: last_stable/src/userspace/srptools/src/srp-dm.c =================================================================== --- last_stable.orig/src/userspace/srptools/src/srp-dm.c 2006-04-21 04:47:54.000000000 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 05:16:11.000000000 +0300 @@ -35,6 +35,7 @@ #include #include #include +#include #include "ib_user_mad.h" #include "srp-dm.h" @@ -49,17 +50,39 @@ static uint32_t tid = 1; static int cmd = 0; static int verbose = 0; +static int loop = 0; +static int add_target_fd; +static int remove_target_fd; +static char *list_targets_path; + +#define pr_log(arg...) \ + do { \ + if (verbose) { \ + if (loop) \ + syslog(LOG_WARNING, arg); \ + else if (!cmd) \ + printf(arg); \ + } \ + } while (0) #define pr_human(arg...) \ do { \ - if (!cmd) \ + if (!cmd && !loop) \ printf(arg); \ } while (0) #define pr_cmd(arg...) \ do { \ - if (cmd) \ - printf(arg); \ + if (cmd && !loop) \ + printf(arg); \ + } while (0) + +#define pr_err(arg...) \ + do { \ + if (loop) \ + syslog(LOG_WARNING, arg); \ + else \ + fprintf(stderr, arg); \ } while (0) #if __BYTE_ORDER == __LITTLE_ENDIAN @@ -78,11 +101,106 @@ static inline uint64_t htonll(uint64_t x #define N_COMP_MASK_LID htonll(1); +#define INITIAL_SIZE_OF_TARGET_TABLE 10 + +#define SLEEP_TIME 60 + +static int size_of_target_table = INITIAL_SIZE_OF_TARGET_TABLE; + +/* Implementaion of target set in an array. 
+* Assumption: there will be small number of targets +* TODO: If this assumption does not hold, +* change the implemantaion to a hash or a tree +*/ + +typedef struct { + char **array; + unsigned int next_index; + unsigned int size; +} targets_set; + +static int create_set(targets_set *set) +{ + set->next_index = 0; + set->size = size_of_target_table; + set->array = calloc(set->size, sizeof(char *)); + if (set->array == NULL) { + perror("calloc:"); + return -1; + } + + return 0; +} + +static int add_to_set(targets_set *set, char *target_info) +{ + if (set->next_index == set->size) { + if (set->size == size_of_target_table) + size_of_target_table *= 2; + set->size = size_of_target_table; + set->array = realloc(set->array, set->size * sizeof(char *)); + if (set->array == NULL) { + pr_err("realloc: %s\n", strerror(errno)); + return -1; + } + } + set->array[set->next_index] = strdup(target_info); + if (set->array[set->next_index] == NULL) { + pr_err("strdup: %s\n", strerror(errno)); + return -1; + } + ++set->next_index; + + return 0; +} + +static int remove_from_set(targets_set *set, char *target_info) +{ + int i; + + for (i = 0; i < set->next_index; ++i) + if (!strcmp(set->array[i], target_info)) { + free(set->array[i]); + set->array[i] = set->array[set->next_index]; + --set->next_index; + return 0; + } + + return -1; +} + +static void empty_set(targets_set *set) +{ + int i; + + for (i = 0; i < set->next_index; ++i) + free(set->array[i]); + set->next_index = 0; +} + +static void destroy_set(targets_set *set) +{ + int i; + + empty_set(set); + free(set->array); +} + +/* for_each_entry_in_set(char *target, targets_set *set, int i) */ +#define for_each_entry_in_set(target, set, i) \ + for (i = 0, target = set->array[i]; \ + i < set->next_index; \ + ++i, target = set->array[i]) + +/* End of the impemantaion of the set */ + +targets_set *targets_in_kernel_set; + static char *sysfs_path = "/sys"; static void usage(const char *argv0) { - fprintf(stderr, "Usage: %s [-vc] [-d 
]\n", argv0); + fprintf(stderr, "Usage: %s [-vcl] [-d ]\n", argv0); } #define NUM_OF_RETRIES 3 @@ -97,7 +215,7 @@ int send_and_get(int fd, struct ib_user_ { len = write(fd, out_mad, sizeof(struct ib_user_mad)); if (len != sizeof(struct ib_user_mad)) { - fprintf(stderr, "write: %s\n", strerror(errno)); + pr_err("write: %s\n", strerror(errno)); return -1; } @@ -106,7 +224,7 @@ int send_and_get(int fd, struct ib_user_ (in_mad_size != 0 && len > 0)) return len; else if (in_mad->hdr.status != ETIMEDOUT) { - fprintf(stderr, "%s/%d: read: %s\n", __func__, __LINE__, strerror(errno)); + pr_err("%s/%d: read: %s\n", __func__, __LINE__, strerror(errno)); return -1; } out_dm_mad = (void *) out_mad->data; @@ -181,6 +299,37 @@ static int setup_port_sysfs_path(void) { asprintf(&port_sysfs_path, "%s/class/infiniband/%s/ports/%s", sysfs_path, ibdev, ibport); + if (loop) { + char *add_target_path, *remove_target_path; + + asprintf(&add_target_path, + "%s/class/infiniband_srp/srp-%s-%s/add_target", + sysfs_path, ibdev, ibport); + + add_target_fd = open(add_target_path, O_WRONLY); + if (add_target_fd < 0) { + pr_err("Couldn't open %s\n", add_target_path); + return -1; + } + + free(add_target_path); + asprintf(&remove_target_path, + "%s/class/infiniband_srp/srp-%s-%s/remove_target", + sysfs_path, ibdev, ibport); + + remove_target_fd = open(remove_target_path, O_WRONLY); + if (remove_target_fd < 0) { + pr_err("Couldn't open %s\n", remove_target_path); + return -1; + } + + free(remove_target_path); + } + + asprintf(&list_targets_path, + "%s/class/infiniband_srp/srp-%s-%s/list_targets", + sysfs_path, ibdev, ibport); + return 0; } @@ -248,14 +397,14 @@ static int set_class_port_info(int fd, u cpi = (void *) out_dm_mad->data; if (read_file(port_sysfs_path, "lid", val, sizeof val) < 0) { - fprintf(stderr, "Couldn't read LID\n"); + pr_err("Couldn't read LID\n"); return -1; } cpi->trap_lid = htons(strtol(val, NULL, 0)); if (read_file(port_sysfs_path, "gids/0", val, sizeof val) < 0) { - 
fprintf(stderr, "Couldn't read GID[0]\n"); + pr_err("Couldn't read GID[0]\n"); return -1; } @@ -268,7 +417,7 @@ static int set_class_port_info(int fd, u in_dm_mad = (void *) in_mad.data; if (in_dm_mad->status) { - fprintf(stderr, "Class Port Info query returned status 0x%04x\n", + pr_err("Class Port Info query returned status 0x%04x\n", ntohs(in_dm_mad->status)); return -1; } @@ -289,7 +438,7 @@ static int get_iou_info(int fd, uint32_t in_dm_mad = (void *) in_mad.data; if (in_dm_mad->status) { - fprintf(stderr, "IO Unit Info query returned status 0x%04x\n", + pr_err("IO Unit Info query returned status 0x%04x\n", ntohs(in_dm_mad->status)); return -1; } @@ -311,13 +460,13 @@ static int get_ioc_prof(int fd, uint32_t return -1; if (in_mad.hdr.status != 0) { - fprintf(stderr, "IO Controller Profile query timed out\n"); + pr_err("IO Controller Profile query timed out\n"); return -1; } in_dm_mad = (void *) in_mad.data; if (in_dm_mad->status) { - fprintf(stderr, "IO Controller Profile query returned status 0x%04x\n", + pr_err("IO Controller Profile query returned status 0x%04x\n", ntohs(in_dm_mad->status)); return -1; } @@ -340,13 +489,13 @@ static int get_svc_entries(int fd, uint3 return -1; if (in_mad.hdr.status != 0) { - fprintf(stderr, "Service Entries query timed out\n"); + pr_err("Service Entries query timed out\n"); return -1; } in_dm_mad = (void *) in_mad.data; if (in_dm_mad->status) { - fprintf(stderr, "Service Entries query returned status 0x%04x\n", + pr_err("Service Entries query returned status 0x%04x\n", ntohs(in_dm_mad->status)); return -1; } @@ -356,17 +505,57 @@ static int get_svc_entries(int fd, uint3 return 0; } +int add_target(char *new_target, int len) +{ + int ret; + + ret = remove_from_set(targets_in_kernel_set, new_target); + if (ret == 0) + return 0; + + /* It is a new target */ + if (verbose) + pr_log("Writing new target %s\n", new_target); + if (write(add_target_fd, new_target, len) != len) { + pr_err("write: %s\n", strerror(errno)); + return -1; 
+ } + + return 0; +} + +void free_old_targets() +{ + char *target; + int i; + + for_each_entry_in_set(target, targets_in_kernel_set, i) { + int len = strlen(target); + if (verbose) + pr_log("Removing target %s\n", target); + if (write(remove_target_fd, target, len) != len) + if (errno != EEXIST) { + /* could not remove the target + and not because it is not exist*/ + pr_err("write: %s\n", strerror(errno)); + continue; + } + } + empty_set(targets_in_kernel_set); +} + static int do_port(int fd, uint32_t agent[2], uint16_t dlid, uint64_t subnet_prefix, uint64_t guid) { struct srp_dm_iou_info iou_info; struct srp_dm_ioc_prof ioc_prof; struct srp_dm_svc_entries svc_entries; - int i, j, k; + int i, j, k, len; + char *target_info; if (!memcmp(&guid, topspin_oui, 3) && set_class_port_info(fd, agent, dlid)) - fprintf(stderr, "Warning: set of ClassPortInfo failed\n"); + pr_log("Warning: set of ClassPortInfo failed\n"); if (get_iou_info(fd, agent, dlid, &iou_info)) return 1; @@ -431,22 +620,31 @@ static int do_port(int fd, uint32_t agen (unsigned long long) ntohll(svc_entries.service[k].id), svc_entries.service[k].name); - pr_cmd("id_ext=%s," + len = asprintf(&target_info, "id_ext=%s," "ioc_guid=%016llx," "dgid=%016llx%016llx," "pkey=ffff," - "service_id=%016llx\n", + "service_id=%016llx", id_ext, (unsigned long long) ntohll(ioc_prof.guid), (unsigned long long) subnet_prefix, (unsigned long long) guid, (unsigned long long) ntohll(svc_entries.service[k].id)); + if (len < 0) { + pr_err("Cannot create target_info\n"); + return -1; + } + + pr_cmd("%s\n", target_info); + + if (loop) + add_target(target_info, len); } } } } - pr_human("\n"); + pr_log("\n"); return 0; } @@ -495,7 +693,7 @@ static int get_port_list(int fd, uint32_ int isdm; if (read_file(port_sysfs_path, "sm_lid", val, sizeof val) < 0) { - fprintf(stderr, "Couldn't read SM LID\n"); + pr_err("Couldn't read SM LID\n"); return -1; } @@ -537,16 +735,45 @@ static int get_port_list(int fd, uint32_ return 0; } +#define 
TARGET_INFO_SIZE 126 +static int get_existing_targets() +{ + char buf[TARGET_INFO_SIZE]; + int list_targets_fd; + int ret; + + list_targets_fd = open(list_targets_path, O_RDONLY); + if (list_targets_fd < 0) { + pr_err("Couldn't open %s\n", list_targets_path); + return -1; + } + + ret = read(list_targets_fd, buf, TARGET_INFO_SIZE); + while (ret > 0) { + buf[ret - 1] = 0; + ret = add_to_set(targets_in_kernel_set, buf); + if (ret) + return ret; + pr_log("found %s in the kernel\n", buf); + ret = read(list_targets_fd, buf, TARGET_INFO_SIZE); + } + + close(list_targets_fd); + return 0; +} + int main(int argc, char *argv[]) { int fd; uint32_t agent[0]; char *cmd_name = strdup(argv[0]); + pid_t pid, sid; + int ret; while (1) { int c; - c = getopt(argc, argv, "cvd:"); + c = getopt(argc, argv, "cvld:"); if (c == -1) break; @@ -560,25 +787,88 @@ int main(int argc, char *argv[]) case 'v': ++verbose; break; + case 'l': + ++loop; + break; default: usage(cmd_name); return 1; } } - fd = open(umad_dev, O_RDWR); + fd = open(umad_dev, O_RDWR); if (fd < 0) { perror("open"); - return 1; + exit(EXIT_FAILURE); } if (setup_port_sysfs_path()) - return 1; + exit(EXIT_FAILURE); if (create_agent(fd, agent)) - return 1; + exit(EXIT_FAILURE); + + /* Daemon-specific initialization goes here */ + targets_in_kernel_set = (targets_set *) malloc(sizeof(targets_set)); + create_set(targets_in_kernel_set); + + if (loop) { + /* Fork off the parent process */ + pid = fork(); + if (pid < 0) { + exit(EXIT_FAILURE); + } + /* If we got a good PID, then + we can exit the parent process. 
*/ + if (pid > 0) { + exit(EXIT_SUCCESS); + } + + /* Change the file mode mask */ + umask(0); + + /* Open any logs here */ + openlog("ibsrpdm: ", LOG_NDELAY, LOG_DAEMON | LOG_SYSLOG); + + /* Create a new SID for the child process */ + sid = setsid(); + if (sid < 0) { + /* Log the failure */ + exit(EXIT_FAILURE); + } + + /* Change the current working directory */ + if ((chdir("/")) < 0) { + /* Log the failure */ + exit(EXIT_FAILURE); + } + + /* Close out the standard file descriptors */ + close(STDIN_FILENO); + close(STDOUT_FILENO); + close(STDERR_FILENO); + } + + /* The Big Loop */ + while (1) { + if (loop) + (void) get_existing_targets(); + + ret = get_port_list(fd, agent); + if (loop == 0) + return ret; + + free_old_targets(); + + sleep(SLEEP_TIME); /* wait SLEEP_TIME seconds */ + } - get_port_list(fd, agent); + destroy_set(targets_in_kernel_set); + free(port_sysfs_path); + free(list_targets_path); + close(add_target_fd); + close(remove_target_fd); + closelog(); return 0; } -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:31:41 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:31:41 +0300 Subject: [openib-general] [PATCH 12/12] SRP: Changing ibsrpdm Message-ID: <20060501113141.GM17552@mellanox.co.il> The query can be improved if working against OpenSM that supports the option to ask about a certain bit in the capability mask. 
Signed-off-by: Ishai Rabinovitz Index: last_stable/src/userspace/srptools/src/srp-dm.c =================================================================== --- last_stable.orig/src/userspace/srptools/src/srp-dm.c 2006-04-21 06:26:25.000000000 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 06:30:36.000000000 +0300 @@ -97,10 +97,16 @@ static inline uint64_t htonll(uint64_t x #define SIZE_OF_QUERY_RESPONSE (1 << 18) +#define SM_SUPPORTS_QUERY_OF_PART_OF_CAP_MASK_BIT_MASK (1 << 13) + +#define TEST_ONLY_SET_BIT_BIT_MASK (1 << 31) + #define N_COMP_MASK_NODE_TYPE htonll(1 << 4); #define N_COMP_MASK_LID htonll(1); +#define N_COMP_MASK_CAPABILITY_MASK htonll(1 << 7); + #define INITIAL_SIZE_OF_TARGET_TABLE 10 #define SLEEP_TIME 60 @@ -180,8 +186,6 @@ static void empty_set(targets_set *set) static void destroy_set(targets_set *set) { - int i; - empty_set(set); free(set->array); } @@ -679,6 +683,63 @@ static int get_port_info(int fd, uint32_ return 0; } +int get_class_port_info(int fd, uint32_t agent[2], uint16_t dlid, + int *is_mask_match_supported) +{ + struct ib_user_mad out_mad, in_mad; + struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; + struct srp_dm_mad *in_dm_mad; + struct srp_dm_class_port_info *class_port_info; + + in_sa_mad = (void *) in_mad.data; + in_dm_mad = (void *) in_mad.data; + out_sa_mad = (void *) out_mad.data; + + init_srp_dm_mad(&out_mad, agent[1], sm_lid, SRP_DM_ATTR_CLASS_PORT_INFO, 0); + + out_sa_mad->mgmt_class = SRP_MGMT_CLASS_SA; + out_sa_mad->class_version = 2; + + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) + return -1; + + /* TODO: to handle forwarding */ + class_port_info = (void *) in_sa_mad->data; + *is_mask_match_supported = + !!(ntohs(class_port_info->cap_mask) & + SM_SUPPORTS_QUERY_OF_PART_OF_CAP_MASK_BIT_MASK); + + return 0; +} + +int get_node_info(int fd, uint32_t agent[2], uint16_t dlid, uint64_t *n_guid) +{ + struct ib_user_mad out_mad, in_mad; + struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; + struct 
srp_dm_mad *in_dm_mad; + struct srp_sa_node_rec *node_info; + + in_sa_mad = (void *) in_mad.data; + in_dm_mad = (void *) in_mad.data; + out_sa_mad = (void *) out_mad.data; + + init_srp_dm_mad(&out_mad, agent[1], sm_lid, SRP_SA_ATTR_NODE, 0); + + out_sa_mad->mgmt_class = SRP_MGMT_CLASS_SA; + out_sa_mad->class_version = 2; + out_sa_mad->comp_mask = htonll((uint64_t)1); /* LID */ + node_info = (void *) out_sa_mad->data; + node_info->lid = htons(dlid); + + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) + return -1; + + node_info = (void *) in_sa_mad->data; + *n_guid = node_info->port_guid; + + return 0; +} + static int get_port_list(int fd, uint32_t agent[2]) { uint8_t in_mad_space[SIZE_OF_QUERY_RESPONSE]; @@ -686,19 +747,11 @@ static int get_port_list(int fd, uint32_ struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; struct srp_sa_node_rec *node; ssize_t len; - char val[64]; int size; int i; uint64_t subnet_prefix; int isdm; - if (read_file(port_sysfs_path, "sm_lid", val, sizeof val) < 0) { - pr_err("Couldn't read SM LID\n"); - return -1; - } - - sm_lid = strtol(val, NULL, 0); - in_sa_mad = (void *) in_mad->data; out_sa_mad = (void *) out_mad.data; @@ -762,6 +815,57 @@ static int get_existing_targets() return 0; } +int get_port_list_new(int fd, uint32_t agent[2]) +{ + uint8_t in_mad_space[SIZE_OF_QUERY_RESPONSE]; + struct ib_user_mad out_mad, *in_mad=(void *) in_mad_space; + struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; + struct srp_sa_port_info_rec *port_info; + ssize_t len; + int size; + int i; + uint64_t subnet_prefix; + uint16_t lid; + uint64_t guid; + + in_sa_mad = (void *) in_mad->data; + out_sa_mad = (void *) out_mad.data; + + init_srp_dm_mad(&out_mad, agent[1], sm_lid, SRP_SA_ATTR_PORT_INFO, + TEST_ONLY_SET_BIT_BIT_MASK); + + out_sa_mad->mgmt_class = SRP_MGMT_CLASS_SA; + out_sa_mad->method = SRP_SA_METHOD_GET_TABLE; + out_sa_mad->class_version = 2; + out_sa_mad->comp_mask = N_COMP_MASK_CAPABILITY_MASK; + port_info = (void *) out_sa_mad->data; + 
port_info->capability_mask = htonl(IS_DM_MASK); + + if ((len = send_and_get(fd, &out_mad, in_mad, SIZE_OF_QUERY_RESPONSE)) < 0) + return -1; + + size = ntohs(in_sa_mad->attr_offset) * 8; + + for (i = 0; (i + 1) * size <= len - 56 - 36; ++i) { + port_info = (void *) in_sa_mad->data + i * size; + + if (!(ntohl(port_info->capability_mask) & IS_DM_MASK)) { + pr_err("Error in query %s%d\n", __func__, __LINE__); + return -1; + } + + lid = ntohs(port_info->endport_lid); + + if (get_node_info(fd, agent, lid, &guid)) + continue; + + subnet_prefix = ntohll(port_info->subnet_prefix); + do_port(fd, agent, lid, subnet_prefix, ntohll(guid)); + } + + return 0; +} + int main(int argc, char *argv[]) { int fd; @@ -769,6 +873,7 @@ int main(int argc, char *argv[]) char *cmd_name = strdup(argv[0]); pid_t pid, sid; int ret; + char val[64]; while (1) { int c; @@ -808,6 +913,13 @@ int main(int argc, char *argv[]) if (create_agent(fd, agent)) exit(EXIT_FAILURE); + if (read_file(port_sysfs_path, "sm_lid", val, sizeof val) < 0) { + fprintf(stderr, "Couldn't read SM LID\n"); + exit(EXIT_FAILURE); + } + + sm_lid = strtol(val, NULL, 0); + /* Daemon-specific initialization goes here */ targets_in_kernel_set = (targets_set *) malloc(sizeof(targets_set)); create_set(targets_in_kernel_set); @@ -851,16 +963,39 @@ int main(int argc, char *argv[]) /* The Big Loop */ while (1) { + int is_mask_match_supported; + if (loop) (void) get_existing_targets(); - ret = get_port_list(fd, agent); + if (get_class_port_info(fd, agent, sm_lid, &is_mask_match_supported)) + exit(EXIT_FAILURE); + + if (is_mask_match_supported) + { + pr_log("SM supports query for is dm\n"); + ret = get_port_list_new(fd, agent); + } + else + { + pr_log("SM does not supoprt query for is dm\n"); + ret = get_port_list(fd, agent); + } + if (loop == 0) return ret; free_old_targets(); sleep(SLEEP_TIME); /* wait SLEEP_TIME seconds */ + + while (read_file(port_sysfs_path, "sm_lid", val, sizeof val) < 0) { + pr_err("Couldn't read SM LID\n"); + 
sleep(SLEEP_TIME); /* wait another SLEEP_TIME seconds */ + } + + sm_lid = strtol(val, NULL, 0); + } destroy_set(targets_in_kernel_set); -- Ishai Rabinovitz From ishai at mellanox.co.il Mon May 1 04:35:48 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 14:35:48 +0300 Subject: [openib-general] [PATCH] SRP: Avoid a potential deadlock Message-ID: <20060501113548.GN17552@mellanox.co.il> Hi, I think there is a potential deadlock when disconnecting from the CM. Roland, can you look at this patch and check if it is needed. Thanks Ishai ---------------------------------------------------------------------- Avoid a potential dead-lock. In srp_disconnect_target there is a call to ib_send_cm_dreq and a wait for completion If when getting DREP there is no comp no one will end this wait Signed-off-by: Ishai Rabinovitz Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-17 10:03:08.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-17 10:06:19.000000000 +0300 @@ -1194,6 +1194,7 @@ static int srp_cm_handler(struct ib_cm_i break; case IB_CM_DREP_RECEIVED: + comp = 1; break; case IB_CM_TIMEWAIT_EXIT: -- Ishai Rabinovitz From ogerlitz at voltaire.com Mon May 1 05:51:10 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 01 May 2006 15:51:10 +0300 Subject: [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction In-Reply-To: References: Message-ID: <4456043E.6040207@voltaire.com> Sean Hefty wrote: >> +static int iser_free_device_ib_res(struct iser_device *device) > Can you eliminate the return code? >> +struct iser_device *iser_device_find_by_ib_device(struct rdma_cm_id *cma_id) >> + if (device == NULL) >> + goto end; > goto out; // see below > out: both fixes are committed in r761 Or. 
------------------------------------------------------------------------
r6761 | ogerlitz | 2006-04-30 15:35:15 +0300 (Sun, 30 Apr 2006) | 5 lines

made iser_free_device_ib_res() void, changed the goto label of
iser_device_find_by_ib_device() to be named "out" instead of "end".

Signed-off-by: Or Gerlitz

From ogerlitz at voltaire.com Mon May 1 06:02:13 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 01 May 2006 16:02:13 +0300
Subject: [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbsinteraction
In-Reply-To: References:
Message-ID: <445606D5.20907@voltaire.com>

Sean Hefty wrote:
>> +static void iser_disconnected_handler(struct rdma_cm_id *cma_id)
>> +{
>> +	struct iser_conn *ib_conn;
>> +
>> +	ib_conn = (struct iser_conn *)cma_id->context;
>> +	ib_conn->disc_evt_flag = 1;
>> +
>> +	/* If this event is unsolicited this means that the conn is being */
>> +	/* terminated asynchronously from the iSCSI layer's perspective. */
>> +	if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) {
>> +		atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>> +		wake_up_interruptible(&ib_conn->wait);
>> +	} else {
>> +		if (atomic_read(&ib_conn->state) == ISER_CONN_UP) {
>> +			atomic_set(&ib_conn->state, ISER_CONN_TERMINATING);
>> +			iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn,
>> +					   ISCSI_ERR_CONN_FAILED);
>> +		}
>> +		/* Complete the termination process if no posts are pending */
>> +		if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) &&
>> +		    (atomic_read(&ib_conn->post_send_buf_count) == 0)) {
>> +			atomic_set(&ib_conn->state, ISER_CONN_DOWN);
>> +			wake_up_interruptible(&ib_conn->wait);
>> +		}
>> +	}

> Are there races here between reading ib_conn->state and setting it? Could it
> have changed in between the atomic_read() and atomic_set()?

It seems that a race is indeed possible here. I am now rethinking the
implementation of the ib connection state transitions; thanks for pointing
this out.

>> + src = (struct sockaddr *)src_addr; >> + dst = (struct sockaddr *)dst_addr; >> + err = rdma_resolve_addr(ib_conn->cma_id, src, dst, 1000); >> + if (err) { >> + iser_err("rdma_resolve_addr failed: %d\n", err); >> + goto addr_failure; >> + } >> + >> + if (!non_blocking) { >> + wait_event_interruptible(ib_conn->wait, >> + atomic_read(&ib_conn->state) != ISER_CONN_PENDING); >> + >> + if (atomic_read(&ib_conn->state) != ISER_CONN_UP) { >> + err = -EIO; >> + goto connect_failure; >> + } >> + } >> + >> + mutex_lock(&ig.connlist_mutex); >> + list_add(&ib_conn->conn_list, &ig.connlist); >> + mutex_unlock(&ig.connlist_mutex); > Not sure if there's a race here or not, but rdma_resolve_addr() will result in a > callback from a separate thread. That callback could occur before the ib_conn > is added to the ig.connlist. Do you assume that ib_conn is in the connlist in > any of the callbacks? No, i don't assume this in the callbacks. ib_conn is inserted to the list in iser_connect and being lookup-ed in ep_poll, conn_bind and ep_disconnect where each subset of the latter three functions are serialized are iser_connect since they are called by the same user space process (iscsid, via iscsi netlink u/k IPC mechanism). However, in a review i have made to fully answer your question i have found a possible double call to iser_conn_release where the fix below handles it. ------------------------------------------------------------------------ r6802 | ogerlitz | 2006-05-01 12:27:12 +0300 (Mon, 01 May 2006) | 5 lines move the ib conn deletion from the global connlist to iser_conn_release, fix ep_disconnect to call conn_terminate or conn_release but not both. 
Signed-off-by: Or Gerlitz Index: iser_verbs.c =================================================================== --- iser_verbs.c (revision 6761) +++ iser_verbs.c (revision 6802) @@ -301,10 +301,6 @@ void iser_conn_terminate(struct iser_con wait_event_interruptible(ib_conn->wait, (atomic_read(&ib_conn->state) == ISER_CONN_DOWN)); - mutex_lock(&ig.connlist_mutex); - list_del(&ib_conn->conn_list); - mutex_unlock(&ig.connlist_mutex); - iser_conn_release(ib_conn); } @@ -463,6 +459,7 @@ int iser_conn_init(struct iser_conn **ib atomic_set(&ib_conn->post_send_buf_count, 0); INIT_WORK(&ib_conn->comperror_work, iser_comp_error_worker, ib_conn); + INIT_LIST_HEAD(&ib_conn->conn_list); *ibconn = ib_conn; return 0; @@ -541,6 +538,10 @@ void iser_conn_release(struct iser_conn BUG_ON(atomic_read(&ib_conn->state) != ISER_CONN_DOWN); + mutex_lock(&ig.connlist_mutex); + list_del(&ib_conn->conn_list); + mutex_unlock(&ig.connlist_mutex); + iser_free_ib_conn_res(ib_conn); ib_conn->device = NULL; /* on EVENT_ADDR_ERROR there's no device yet for this conn */ Index: iscsi_iser.c =================================================================== --- iscsi_iser.c (revision 6761) +++ iscsi_iser.c (revision 6802) @@ -680,8 +680,8 @@ iscsi_iser_ep_disconnect(__u64 ep_handle if (atomic_read(&ib_conn->state) == ISER_CONN_UP) iser_conn_terminate(ib_conn); - - iser_conn_release(ib_conn); + else + iser_conn_release(ib_conn); } static struct scsi_host_template iscsi_iser_sht = { From muli at il.ibm.com Mon May 1 06:33:42 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Mon, 1 May 2006 16:33:42 +0300 Subject: [openib-general] [PATCHE 02/12] SRP: changing ibsrpdm In-Reply-To: <20060501112546.GC17552@mellanox.co.il> References: <20060501112546.GC17552@mellanox.co.il> Message-ID: <20060501133342.GJ3599@rhun.haifa.ibm.com> On Mon, May 01, 2006 at 02:25:46PM +0300, Ishai Rabinovitz wrote: > > Move the destruction of the host and the removal from a list to a function. 
> > Signed-off-by: Ishai Rabinovitz > > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > =================================================================== > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-23 14:08:03.000000000 +0300 > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-24 10:47:00.000000000 +0300 > @@ -344,6 +344,16 @@ static void srp_disconnect_target(struct > wait_for_completion(&target->done); > } > > +static void destruct_scsi_host_and_target(struct srp_target_port *target, int disconnect_target) > +{ > + scsi_remove_host(target->scsi_host); > + if (disconnect_target) > + srp_disconnect_target(target); > + ib_destroy_cm_id(target->cm_id); > + srp_free_target_ib(target); > + scsi_host_put(target->scsi_host); > +} > + > static void srp_remove_work(void *target_ptr) > { > struct srp_target_port *target = target_ptr; > @@ -357,10 +374,7 @@ static void srp_remove_work(void *target > list_del(&target->list); > mutex_unlock(&target->srp_host->target_mutex); > > - scsi_remove_host(target->scsi_host); > - ib_destroy_cm_id(target->cm_id); > - srp_free_target_ib(target); > - scsi_host_put(target->scsi_host); > + destruct_scsi_host_and_target(target, 0); Is not disconnecting from the target here actually the right thing to do? considering we're then destroying the target's queue pairs and freeing it? 
Cheers, Muli From ogerlitz at voltaire.com Mon May 1 06:40:48 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 01 May 2006 16:40:48 +0300 Subject: [openib-general] Re: possible bug in kmem_cache related code In-Reply-To: <1146293055.11279.2.camel@localhost> References: <84144f020604270419s10696877he2ec27ae6d52e486@mail.gmail.com> <15ddcffd0604281224i4308b08fs93f9ebaf7e9a16b3@mail.gmail.com> <1146293055.11279.2.camel@localhost> Message-ID: <44560FE0.2000004@voltaire.com> Pekka Enberg wrote: > On Fri, 2006-04-28 at 21:24 +0200, Or Gerlitz wrote: >> Yes, i can reproduce this at will, no local modifications, my system >> is amd dual x86_64, i have attached my .config to the first email of >> this thread, and also mentioned that some CONFIG_DEBUG_ options are >> set, including one related to slab debugging. >> > Yeah, arch/um/. Unfortunately I don't have a SMP box, so I probably > can't reproduce this. You could try git bisect to isolate the offending > changeset. mmm, I might be able to do git bisection later this week or next week. However, for the mean time can more people of the openib and open iscsi communities set 2.6.17-rcX to see that the issue reproduces with my synthetic module and with ib/iscsi code (you know this kernel will be out in few weeks from now...) Or. From muli at il.ibm.com Mon May 1 06:43:23 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Mon, 1 May 2006 16:43:23 +0300 Subject: [openib-general] [PATCH 04/12] SRP: Changing ibsrpdm In-Reply-To: <20060501112739.GE17552@mellanox.co.il> References: <20060501112739.GE17552@mellanox.co.il> Message-ID: <20060501134323.GK3599@rhun.haifa.ibm.com> On Mon, May 01, 2006 at 02:27:39PM +0300, Ishai Rabinovitz wrote: > > Do not add the same target twice. 
> > Signed-off-by: Ishai Rabinovitz > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > =================================================================== > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 15:17:34.000000000 +0300 > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 15:19:37.000000000 +0300 > @@ -1478,7 +1478,8 @@ static int srp_parse_options(const char > printk(KERN_WARNING PFX "bad max sect parameter '%s'\n", p); > goto out; > } > - target->scsi_host->max_sectors = token; > + if (target->scsi_host != NULL) > + target->scsi_host->max_sectors = token; > break; This chunk does not look related to the rest. Is a NULL target->scsi_host legal here? if not, the check should be removed as we'd rather take an oops here than hide the problem behind the NULL pointer check. > +/* srp_find_target - If the target exists return it in target, > + otherwise target is set to NULL. > + host->target_mutex should be hold */ Please use the usual kernel /* * stuff */ style for multi line comments. > +static int srp_find_target(const char *buf, struct srp_host *host, > + struct srp_target_port **target) > +{ > + struct srp_target_port *target_to_find, *curr_target; > + int ret, i; > + > + target_to_find = kzalloc(sizeof *target_to_find, GFP_KERNEL); > + ret = srp_parse_options(buf, target_to_find); > + if (ret) > + goto free; > + > + list_for_each_entry(curr_target, &host->target_list, list) > + if (target_to_find->ioc_guid == curr_target->ioc_guid && > + target_to_find->id_ext == curr_target->id_ext && > + target_to_find->path.pkey == curr_target->path.pkey && > + target_to_find->service_id == curr_target->service_id) { > + for (i = 0; i < 16; ++i) > + if (target_to_find->path.dgid.raw[i] != curr_target->path.dgid.raw[i]) > + break; The conditional and check here probably deserves an inline helper called same_target() or some such. 
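A helper along the lines suggested above might look like the following. This is only a sketch in user-space C: struct target_key is a hypothetical stand-in for the fields of struct srp_target_port that the quoted patch compares, and the byte-by-byte dgid loop is replaced by memcmp(), which is an assumption about what the author would accept rather than something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fields compared in the quoted patch. */
struct target_key {
	uint64_t ioc_guid;
	uint64_t id_ext;
	uint64_t service_id;
	uint16_t pkey;
	uint8_t  dgid[16];
};

/* Two targets are "the same" when every identifying field matches. */
static inline bool same_target(const struct target_key *a,
			       const struct target_key *b)
{
	return a->ioc_guid == b->ioc_guid &&
	       a->id_ext == b->id_ext &&
	       a->pkey == b->pkey &&
	       a->service_id == b->service_id &&
	       memcmp(a->dgid, b->dgid, sizeof(a->dgid)) == 0;
}
```

With such a helper the list walk reduces to a single if (same_target(...)) test per entry, and the i == 16 check disappears.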
> + if (i == 16) { > + *target = curr_target; > + goto free; > + } > + } > + > + *target = NULL; > + > +free: > + kfree(target_to_find); > + return 0; We always return 0 - either this should return void, or you meant to return ret here instead of 0? > +} > + > static ssize_t srp_create_target(struct class_device *class_dev, > const char *buf, size_t count) > { > struct srp_host *host = > container_of(class_dev, struct srp_host, class_dev); > struct Scsi_Host *target_host; > - struct srp_target_port *target; > + struct srp_target_port *target, *existing_target = NULL; > int ret; > int i; > > + /* first check if the target already exists */ > + > + mutex_lock(&host->target_mutex); > + ret = srp_find_target(buf, host, &existing_target); > + if (ret) > + goto unlock_mutex; > + > + if (existing_target) { > + /* target already exists */ > + spin_lock_irq(existing_target->scsi_host->host_lock); why _irq and not _irqsave? Are you sure this code can't ever be called with interrupts off via some other path? Cheers, Muli From muli at il.ibm.com Mon May 1 06:50:32 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Mon, 1 May 2006 16:50:32 +0300 Subject: [openib-general] [PATCH 06/12] SRP: Changing ibsrpdm In-Reply-To: <20060501112848.GG17552@mellanox.co.il> References: <20060501112848.GG17552@mellanox.co.il> Message-ID: <20060501135032.GL3599@rhun.haifa.ibm.com> On Mon, May 01, 2006 at 02:28:48PM +0300, Ishai Rabinovitz wrote: > > Support a display of list of target from user level. 
> > Signed-off-by: Ishai Rabinovitz > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > =================================================================== > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-21 01:13:04.000000000 +0300 > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-21 03:56:05.000000000 +0300 > @@ -1730,6 +1730,63 @@ end: > > static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); > > +#define TARGET_INFO_BUF_SIZE 126 > + > +static ssize_t list_targets(struct class_device *class_dev, char *buf) > +{ > + struct srp_host *host = > + container_of(class_dev, struct srp_host, class_dev); > + struct srp_target_port *target; > + int printed=0, ret; > + > + mutex_lock(&host->target_mutex); > + list_for_each_entry(target, &host->target_list, list) Can this race with list addition / removal? I saw that you removed the lock in an earlier patch? > + if (target->state == SRP_TARGET_LIVE) { You'd have an easier time with the indentation if you'd do if (target->state != SRP_TARGET_LIVE) continue; here > + ret = sprintf(buf+printed, > + "id_ext=%016llx,ioc_guid=%016llx," > + "dgid=%04x%04x%04x%04x%04x%04x%04x%04x," > + "pkey=%04x,service_id=%016llx\n", > + (unsigned long long) > + be64_to_cpu(target->id_ext), > + (unsigned long long) > + be64_to_cpu(target->ioc_guid), > + (int) be16_to_cpu(*(__be16 *) > + &target->path.dgid.raw[0]), > + (int) be16_to_cpu(*(__be16 *) > + &target->path.dgid.raw[2]), > + (int) be16_to_cpu(*(__be16 *) > + &target->path.dgid.raw[4]), > + (int) be16_to_cpu(*(__be16 *) > + &target->path.dgid.raw[6]), > + (int) be16_to_cpu(*(__be16 *) > + &target->path.dgid.raw[8]), > + (int) be16_to_cpu(*(__be16 *) > + &target->path.dgid.raw[10]), > + (int) be16_to_cpu(*(__be16 *) > + &target->path.dgid.raw[12]), > + (int) be16_to_cpu(*(__be16 *) > + &target->path.dgid.raw[14]), This is pretty horrible - could you use show_dgid() here? 
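To illustrate the point about the eight repeated be16_to_cpu() arguments, the dgid formatting can be written as a loop. The sketch below is user-space C; format_dgid() is a hypothetical name and is not the show_dgid() referred to above, whose exact signature is not shown in this thread. Each pair of raw bytes is read as a big-endian 16-bit value, which gives the same digits as the "%04x" layout in the quoted patch regardless of host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the 16 raw GID bytes as 32 hex digits (eight 16-bit groups,
 * no separators). 'buf' must hold at least 33 bytes. Returns the
 * number of characters written, or -1 if the buffer is too small.
 */
static int format_dgid(char *buf, size_t len, const uint8_t raw[16])
{
	int printed = 0;

	if (len < 33)
		return -1;

	for (int i = 0; i < 16; i += 2)
		printed += snprintf(buf + printed, len - printed, "%04x",
				    (unsigned)((raw[i] << 8) | raw[i + 1]));

	return printed;
}
```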
Cheers, Muli From mst at mellanox.co.il Mon May 1 07:04:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 1 May 2006 17:04:58 +0300 Subject: [openib-general] Re: [PATCH 04/12] SRP: Changing ibsrpdm In-Reply-To: <20060501134323.GK3599@rhun.haifa.ibm.com> References: <20060501112739.GE17552@mellanox.co.il> <20060501134323.GK3599@rhun.haifa.ibm.com> Message-ID: <20060501140458.GH3032@mellanox.co.il> Quoting r. Muli Ben-Yehuda : > > static ssize_t srp_create_target(struct class_device *class_dev, > > const char *buf, size_t count) > > { > > struct srp_host *host = > > container_of(class_dev, struct srp_host, class_dev); > > struct Scsi_Host *target_host; > > - struct srp_target_port *target; > > + struct srp_target_port *target, *existing_target = NULL; > > int ret; > > int i; > > > > + /* first check if the target already exists */ > > + > > + mutex_lock(&host->target_mutex); > > + ret = srp_find_target(buf, host, &existing_target); > > + if (ret) > > + goto unlock_mutex; > > + > > + if (existing_target) { > > + /* target already exists */ > > + spin_lock_irq(existing_target->scsi_host->host_lock); > > why _irq and not _irqsave? Are you sure this code can't ever be called > with interrupts off via some other path? Given the mutex_lock above, this better be true. -- MST From muli at il.ibm.com Mon May 1 07:10:09 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Mon, 1 May 2006 17:10:09 +0300 Subject: [openib-general] Re: [PATCH 04/12] SRP: Changing ibsrpdm In-Reply-To: <20060501140458.GH3032@mellanox.co.il> References: <20060501112739.GE17552@mellanox.co.il> <20060501134323.GK3599@rhun.haifa.ibm.com> <20060501140458.GH3032@mellanox.co.il> Message-ID: <20060501141009.GM3599@rhun.haifa.ibm.com> On Mon, May 01, 2006 at 05:04:58PM +0300, Michael S. Tsirkin wrote: > > why _irq and not _irqsave? Are you sure this code can't ever be called > > with interrupts off via some other path? > > Given the mutex_lock above, this better be true. 
Good point, but _irq instinctively makes me worried. Cheers, Muli From ogerlitz at voltaire.com Mon May 1 07:12:40 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 1 May 2006 17:12:40 +0300 Subject: [openib-general] re RDS missing features Message-ID: Can you elaborate on each of the features, specifically the following points are of interest to us: +1 so you running Oracle Loopback traffic over RDS sockets? if yes, what the issue here? the openib CMA supports listen/connect on loopback addresses (eg 127.0.0.1 or IPoIB local address) +2 by failover, are you referring to APM? that is failover between IB pathes to/from the same HCA over which the original connection/QP was established or you are talking on failover between HCAs +3 is the no support for /proc like for RDS an issue to run crload or demo Oracle (that is specific tuning and usage of non defaults is needed for any/optimal operation) Or. [openfabrics-ewg] Before we can start testing - we needto ensure that RDS is fully ported. Pandit, Ranjit rpandit at silverstorm.com Following features are yet to be implemented in OpenFabric Rds: 1. Failover 2. Loopback connections 3. support for /proc fs like Rds config, stats and info. Ranjit From halr at voltaire.com Mon May 1 07:44:04 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 01 May 2006 10:44:04 -0400 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: <44529AB3.7080905@ichips.intel.com> References: <44529AB3.7080905@ichips.intel.com> Message-ID: <1146494601.2124.152923.camel@hal.voltaire.com> On Fri, 2006-04-28 at 18:44, Sean Hefty wrote: > Sean Hefty wrote: > > I'd like to propose that the MAD layer detect duplicate requests. After a > > request MAD has been handed to a client, its context would be maintained until > > the user calls ib_free_recv_mad(), allowing duplicate requests to be discarded. > > I should add that this also provides context that the MAD layer can use when > performing DS RMPP. 
On the initiator side, DS RMPP would be detected by an RMPP > request that expected a response. (This assumes that the response is also > RMPP.) Aren't there 3 cases possible here: (1) non RMPP request/RMPP response (e.g. SA GetTable for one), (2) RMPP request/RMPP response (e.g. SA GetMulti), and (3) RMPP request/non RMPP response (I don't think this currently exists but may be mistaken). Are all handled on the initiator/requester side ? Are the changes only for case (2) ? > On the responder side, DS RMPP is detected when an RMPP response is sent > in response to an RMPP request. The responder side sounds more straightforward. -- Hal > - Sean > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Mon May 1 07:51:20 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 01 May 2006 10:51:20 -0400 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: References: Message-ID: <1146495061.2124.153047.camel@hal.voltaire.com> On Fri, 2006-04-28 at 18:20, Sean Hefty wrote: > Today, a request MAD received by the MAD layer is handed to a client. The > client processes the MAD, and generates a response. If the client is slow to > process the MAD, the request may have been resent. The duplicate request is > also handed to the client. The result is that clients perform duplicate > processing of the MAD or must detect the duplicates themselves. > > I'd like to propose that the MAD layer detect duplicate requests. After a > request MAD has been handed to a client, its context would be maintained until > the user calls ib_free_recv_mad(), allowing duplicate requests to be discarded. There's still a window here depending on when free MAD is called versus when the response gets back to the original requester. 
> One drawback to this approach are that the MAD layer may discard a MAD as a > duplicate that wasn't, I suppose this depends on how the duplicate discard works. Are you envisioning a specific scenario here ? > but I'm not sure if this would happen in practice. A > second drawback is that the receive MAD would need to be kept around until the > send completed (as opposed to the send started). Is this to handle the case where free MAD is called prior to the send completing ? Is this on the response side only ? -- Hal > Finally, a way would need to be found for when to call ib_free_recv_mad() for > userspace clients. > > - Sean > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Mon May 1 08:05:44 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 01 May 2006 11:05:44 -0400 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: References: Message-ID: <1146495936.2124.153278.camel@hal.voltaire.com> On Sat, 2006-04-29 at 02:23, Sean Hefty wrote: > >You can't add this kind of thing piecemeal to a protocol and have it > >work. If the sender doesn't see a response (perhaps the response was > >lost, or was slow coming), and sends another MAD, this 2nd MAD will > >have a different sequence number. How does the recipient know it's the > > If a MAD is sent with a different sequence number (transaction ID), then it's a > different transaction or request. > > There is a real issue that is seen when a duplicate request (same TID, SGID, > mgmt class) is received at the client, resulting in a duplicate response. You had mentioned in the previous email on this that this was the case of a slow responder. Is the responder slow but playing by the IB timeouts in effect or is it violating those timeouts ? 
> The MAD layer cannot allow the duplicate response to be sent because of RMPP issues. Is this different for non RMPP MADs v. RMPP MADs ? Is the RMPP issue what you mention below (RMPP receiving a duplicate response) ? If so, is this an implementation or architecture issue or both ? > The most efficient solution is to detect the duplicate request, and avoid all of > the processing overhead of generating a response that must be discarded. > > No change to the MAD protocol is being proposed. Ib_free_recv_mad() already > exists, and must be called by each client. The only change being proposed is > that until ib_free_recv_mad() is called, another message with the same TID, > SGID, and mgmt class is treated as a duplicate. I believe that this is > consistent with C13-18.1.1. C13-18.1.1 defines a new operation. Isn't the case you are describing is responding to an existing operation ? > >same request? If the response was lost the first time, eating the 2nd > >MAD without sending a response will result in another timeout and a > >3rd MAD... so maybe the recipient remembers the response and sends it > > The proposal is to only discard duplicate requests while a response to the first > request is being generated. Just because a client sends a request 3 times > before we can send a response doesn't mean that we need to send 3 responses. > Such an implementation is suboptimal, and the responses that are of most concern > use RMPP anyway. > > >Really, it's up to the MAD client to deal with duplicates in its own > >way. > > A client is still restricted from sending a duplicate response while a previous > response is in progress. RMPP cannot handle this case. Why not ? Wouldn't the second response not match anything in the client on the request side ? 
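As an illustration of the proposal being discussed (treat a request with the same TID, SGID, and management class as a duplicate until the first receive is freed), here is a minimal user-space sketch. It is not the MAD layer implementation: the structures, the fixed table size, and the names request_accept() and request_done() are all hypothetical, and request_done() merely stands in for the point where a client would call ib_free_recv_mad().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_INFLIGHT 8	/* arbitrary size for the sketch */

/* Hypothetical key: the three fields the thread names for duplicates. */
struct req_key {
	uint64_t tid;		/* transaction ID */
	uint8_t  sgid[16];	/* source GID */
	uint8_t  mgmt_class;
};

struct inflight_table {
	struct req_key key[MAX_INFLIGHT];
	bool used[MAX_INFLIGHT];
};

static bool key_equal(const struct req_key *a, const struct req_key *b)
{
	return a->tid == b->tid && a->mgmt_class == b->mgmt_class &&
	       memcmp(a->sgid, b->sgid, sizeof(a->sgid)) == 0;
}

/*
 * Called when a request arrives. Returns true if the request should be
 * handed to the client, false if it duplicates one still being processed
 * or if the table is full (the sketch simply drops it in that case).
 */
static bool request_accept(struct inflight_table *t, const struct req_key *k)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_INFLIGHT; i++) {
		if (t->used[i] && key_equal(&t->key[i], k))
			return false;
		if (!t->used[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return false;
	t->key[free_slot] = *k;
	t->used[free_slot] = true;
	return true;
}

/* Called at the point where the client would free the received MAD. */
static void request_done(struct inflight_table *t, const struct req_key *k)
{
	for (int i = 0; i < MAX_INFLIGHT; i++)
		if (t->used[i] && key_equal(&t->key[i], k))
			t->used[i] = false;
}
```

Note that the sketch has the window raised earlier in the thread: once request_done() has run, a late retransmission of the same request is accepted as new.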
-- Hal > - Sean From tziporet at mellanox.co.il Mon May 1 08:23:28 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 01 May 2006 18:23:28 +0300 Subject: [openib-general] Re: [openfabrics-ewg] OFED release plan - update on RC4 status In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6E6D@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6E6D@mtlexch01.mtl.com> Message-ID: <445627F0.40403@mellanox.co.il> Tziporet Koren wrote: > > Hi All, > > This is the release plan: > > 1. RC3 -- target: All components are in > Schedule: done on Apr-10. > > 2. RC4 -- target: features freeze -- meaning all features and > modules targeted for 1.0 release should be in. > Schedule: May-4 (delayed from May-1 since some modules are not ready yet) > Main changes from RC3 to RC4: > > 1. Bug fixes according to problems reported. > > 2. SDP - new code that Michael Tsirkin developed. > > 3. SRP - with new features: FMR, tunable parameters, SRP daemon > > 4. Open MPI -- new package based on 1.1a3 > > 5. RDS -- new version from main trunk > > 6. Kernel code based on git > > 7. Standard network configuration > It seems that SDP will not make it this week, thus RC4 will not include the new SDP code. Note that the new code is already checked in to the main trunk, so anyone who wishes to look at it can start the review. Since there are many other changes in this RC that need testing we will not delay RC4. We may add another RC at the end of next week adding SDP only. Tziporet
From tziporet at mellanox.co.il Mon May 1 08:37:42 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 01 May 2006 18:37:42 +0300 Subject: [openib-general] Do we want the change of not using libsysfs in OFED? Message-ID: <44562B46.6010304@mellanox.co.il> Hi, After RC3 time frame Roland changed the trunk to avoid usage of libsysfs. Currently this change was not applied to the branch thus it is not in OFED RC4. Do we want this change to be in OFED too? Note that in any case it will not be in RC4 but it can be done for RC5 with SDP. Tziporet From mst at mellanox.co.il Mon May 1 08:35:55 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 1 May 2006 18:35:55 +0300 Subject: [openib-general] sdp code in trunk Message-ID: <20060501153555.GK3032@mellanox.co.il> Hello! I have replaced the SDP code on trunk with new, much smaller code base, based on CMA. Note that only bcopy mode is supported. The old sdp code has been moved to https://openib.org/svn/gen2/branches/sdp_historic Please note that smaller LOC count does not mean less bugs yet - in fact, while the CMA code (mostly sdp_cma.c) is ready and works well for me, the data transfer part is in active development, and I'm aware of several race condition/data corruption issues which prevent it from being generally useful just yet, and which I am in the process of addressing.
-- MST From rdreier at cisco.com Mon May 1 09:48:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 09:48:32 -0700 Subject: [openib-general] [PATCH][UVERBS][RFC] Exporting device node_type to user mode In-Reply-To: <1145895100.18808.13.camel@trinity.ogc.int> (Tom Tucker's message of "Mon, 24 Apr 2006 11:11:40 -0500") References: <1145567117.27405.38.camel@trinity.ogc.int> <1145581028.8968.9.camel@bigtime.es335.com> <1145895100.18808.13.camel@trinity.ogc.int> Message-ID: Tom> Roland: Thinking about this a little more and having read Tom> some less than flattering commentary on various mailing list Tom> about sysfs, ABI, differences between distros, etc... Is Tom> using sysfs for device attributes the right approach here, or Tom> should be bite the bullet and update the kernel-abi? Well, we already have node_type in sysfs. I don't think it's worth adding a redundant way to get the info, and we're already using sysfs for stuff like node_guid anyway. - R. From rdreier at cisco.com Mon May 1 09:53:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 09:53:00 -0700 Subject: [openib-general] [PATCH 2/2] Wean libehca off of libsysfs In-Reply-To: (Heiko J. Schick's message of "Tue, 25 Apr 2006 11:43:33 +0200") References: Message-ID: Heiko> Hello Roland, does OpenIB 1.0 RC2 (RC3, ...) still uses Heiko> libsysfs or is it only change for subversion head (trunk)? All versions of libibverbs still use libsysfs. However this patch for libehca reduces the dependency on libsysfs and will make the transition to a no-libsysfs world smoother. - R. From robert.j.woodruff at intel.com Mon May 1 09:52:58 2006 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Mon, 1 May 2006 09:52:58 -0700 Subject: [openib-general] SDP fails to compile on SVN6829 Message-ID: <000001c66d3f$b3cc3670$7aa9070a@amr.corp.intel.com> When I try to build SDP from SVN6829, I get the following error. 
In file included from drivers/infiniband/ulp/sdp/sdp_main.c:44: drivers/infiniband/ulp/sdp/sdp.h:6:27: net/inet_sock.h: No such file or directory Looks like there is a new include/net directory but no header file inet_sock.h in it. woody From rdreier at cisco.com Mon May 1 09:55:16 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 09:55:16 -0700 Subject: [openib-general] [RFC] [PATCH 1/3] RDMA CM: add rdma_get/set_optioncalls to get/set path records In-Reply-To: <20060426173716.GA10098@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 26 Apr 2006 20:37:16 +0300") References: <20060426075916.GB8155@mellanox.co.il> <444F9F49.7060608@ichips.intel.com> <20060426164455.GP31324@mellanox.co.il> <444FA9B5.8090507@ichips.intel.com> <20060426173716.GA10098@mellanox.co.il> Message-ID: > > I don't think that we want to start adding a new set of APIs for every > > option that may eventually need to be supported. > Why not? Agreed... as an interface to userspace, get/set opt makes sense, but inside the kernel you just end up with a dispatch function that demultiplexes things to the real work. So I think the real work functions should be the kernel API. - R. From rdreier at cisco.com Mon May 1 09:56:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 09:56:32 -0700 Subject: [openib-general] Re: [PATCH] SRP: fix crash in srp_process_rsp In-Reply-To: <20060426144817.GA21822@mellanox.co.il> (Ishai Rabinovitz's message of "Wed, 26 Apr 2006 17:50:01 +0300") References: <20060426144817.GA21822@mellanox.co.il> Message-ID: Ishai> srp_process_rsp crashes on NULL pointer dereference. Ishai> The following fixes the crash. Is this a correct fix? We should never get a RSP for a request without a a command associated. So this is just covering up a driver bug. How do you hit this crash? - R. 
From rdreier at cisco.com Mon May 1 10:03:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 10:03:24 -0700 Subject: [openib-general] Re: [PATCH 00/16] ehca: IBM eHCA InfiniBand Device Driver In-Reply-To: (Heiko Joerg Schick's message of "Thu, 27 Apr 2006 21:50:28 +0200") References: <4450B378.9000705@de.ibm.com> <20060427125726.GK32127@wohnheim.fh-wedel.de> Message-ID: Heiko> I don't like the idea to put the whole driver in one patch Heiko> file. I would propose to put the patch "ehca: integration Heiko> in Linux kernel" last instead of first, as Arnd Heiko> mentioned. With that change we leave the kernel in a Heiko> working state when applying the patches. Yes, that makes sense. And I can fold the patches into a single git changeset when we finally merge it, since I don't see any advantage to having the driver split into pieces. (No one is going to git biset a half-applied driver or anything like that) - R. From rdreier at cisco.com Mon May 1 10:06:11 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 10:06:11 -0700 Subject: [openib-general] ib_srp In-Reply-To: (Di Domenico's message of "Thu, 27 Apr 2006 13:10:25 -0400") References: Message-ID: Di> Hi, Is there a way to remove a target from the SRP Di> configuration without unloading the driver module (which seems Di> to have partially removed the disk, but appears to be Di> hanging)? Unloading the module should work. A trace from sysrq-t that shows where rmmod/modprobe -r is hanging would be useful. Right now there isn't a way to disconnect from a particular target port without unloading the module though. - R. From rdreier at cisco.com Mon May 1 10:09:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 10:09:23 -0700 Subject: [openib-general] Do we want the change of not using libsysfs in OFED? 
In-Reply-To: <44562B46.6010304@mellanox.co.il> (Tziporet Koren's message of "Mon, 01 May 2006 18:37:42 +0300") References: <44562B46.6010304@mellanox.co.il> Message-ID: Tziporet> Hi, After RC3 time frame Roland changed the trunk to Tziporet> avoid usage of libsysfs. Currently this change was not Tziporet> applied to the branch thus it is not in OFED RC4. Tziporet> Do we want this change to be in OFED too? Actually the trunk still uses libsysfs, it just uses it less. There should be no functional change, but of course there's always the chance of regression. So there's no strong reason to merge the changes from the trunk. - R. From mst at mellanox.co.il Mon May 1 10:14:05 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 1 May 2006 20:14:05 +0300 Subject: [openib-general] Re: SDP fails to compile on SVN6829 In-Reply-To: <000001c66d3f$b3cc3670$7aa9070a@amr.corp.intel.com> References: <000001c66d3f$b3cc3670$7aa9070a@amr.corp.intel.com> Message-ID: <20060501171405.GA4580@mellanox.co.il> Quoting r. Bob Woodruff : > Subject: SDP fails to compile on SVN6829 > > > When I try to build SDP from SVN6829, I get the following error. > In file included from drivers/infiniband/ulp/sdp/sdp_main.c:44: > drivers/infiniband/ulp/sdp/sdp.h:6:27: net/inet_sock.h: No such file or > directory > > Looks like there is a new include/net directory but no header file > inet_sock.h in it. > > woody > Which kernel are you building on? Looks like you might want the backport patches from https://openib.org/svn/gen2/branches/backport -- MST From robert.j.woodruff at intel.com Mon May 1 10:18:29 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Mon, 1 May 2006 10:18:29 -0700 Subject: [openib-general] RE: SDP fails to compile on SVN6829 Message-ID: <1AC79F16F5C5284499BB9591B33D6F0007928E8C@orsmsx408> Michael wrote, >Which kernel are you building on? 2.6.9-34EL. 
>Looks like you might want the backport patches from https://openib.org/svn/gen2/branches/backport I applied the sdp patch from 2.6.9_U3/sdp_6754_to_2_6_11.patch but still get the error, > drivers/infiniband/ulp/sdp/sdp.h:6:27: net/inet_sock.h: No such file or > directory do I need an additional patch or is the backport patch broken ? woody From robert.j.woodruff at intel.com Mon May 1 10:20:24 2006 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Mon, 1 May 2006 10:20:24 -0700 Subject: [openib-general] sdp code in trunk In-Reply-To: <20060501153555.GK3032@mellanox.co.il> Message-ID: <000201c66d43$889dfed0$7aa9070a@amr.corp.intel.com> Michael wrote, >Hello! >I have replaced the SDP code on trunk with new, much smaller code base, >based on CMA. Note that only bcopy mode is supported. >The old sdp code has been moved to https://openib.org/svn/gen2/branches/sdp_historic >Please note that smaller LOC count does not mean less bugs yet - in fact, while >the CMA code (mostly sdp_cma.c) is ready and works well for me, the data >transfer part is in active development, and I'm aware of several race >condition/data corruption issues which prevent it from being generally useful >just yet, and which I am in the process of addressing. >-- >MST Should we be replacing stable code in the trunk with code that is known to be unstable ? Seems like we should wait till it is useful before moving it into the trunk. Anyone else have an opinion on this one ?
woody From halr at voltaire.com Mon May 1 10:16:43 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 01 May 2006 13:16:43 -0400 Subject: [openib-general] [RFC] AT and user AT In-Reply-To: <1146067835.2124.28965.camel@hal.voltaire.com> References: <1146060008.2124.27486.camel@hal.voltaire.com> <1146067835.2124.28965.camel@hal.voltaire.com> Message-ID: <1146503788.2124.155201.camel@hal.voltaire.com> On Wed, 2006-04-26 at 12:15, Hal Rosenstock wrote: > On Wed, 2006-04-26 at 11:00, James Lentini wrote: > > On Wed, 26 Apr 2006, Hal Rosenstock wrote: > > > > > As AT and user AT have been obsoleted (and superseded by CMA which > > > is now in the process of going upstream), any objections to removing > > > AT and user AT from the trunk ? If I don't hear back by COB Friday, > > > I will presume this is OK. > > > > I'm in agreement with moving this off of the trunk. > > > > Will the code still be available for reference? > > Sure. I can move it somewhere before deleting it from the trunk. Both AT and user AT are now saved as https://openib.org/svn/gen2/branches/ibat -- Hal > -- Hal From rdreier at cisco.com Mon May 1 10:22:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 10:22:45 -0700 Subject: [openib-general] Re: [PATCH 8 of 13] ipath - fix a number of RC protocol bugs In-Reply-To: <20060425005654.4c08481f.akpm@osdl.org> (Andrew Morton's message of "Tue, 25 Apr 2006 00:56:54 -0700") References: <20060425005654.4c08481f.akpm@osdl.org> Message-ID: Andrew> Please don't play around with list_head internals like Andrew> this - some speedfreak might legitimately choose to remove Andrew> the list_head poisoning debug code, or make it Andrew> Kconfigurable. Bryan, can you fix this up and resend this patch? Are the other patches independent of this? Should I apply all the others, or do I need to wait for the fixed version of this one? - R. From mst at mellanox.co.il Mon May 1 10:24:03 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 1 May 2006 20:24:03 +0300 Subject: [openib-general] Re: SDP fails to compile on SVN6829 In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F0007928E8C@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F0007928E8C@orsmsx408> Message-ID: <20060501172403.GB4580@mellanox.co.il> Quoting r. Woodruff, Robert J : > Subject: RE: SDP fails to compile on SVN6829 > > Michael wrote, > > >Which kernel are you building on? > > 2.6.9-34EL. > > >Looks like you might want the backport patches from > https://openib.org/svn/gen2/branches/backport > > I applied the sdp patch from 2.6.9_U3/sdp_6754_to_2_6_11.patch > but still get the error, > > > drivers/infiniband/ulp/sdp/sdp.h:6:27: net/inet_sock.h: No such file > or > > directory > > do I need an additional patch or is the backport patch broken ? I think you need these: A /gen2/branches/backport/2.6.9/linux_skbuff_6754_to_2_6_11.patch (from /gen2/branches/backport/2.6.11/linux_skbuff_6754_to_2_6_11.patch:6765) A /gen2/branches/backport/2.6.9/net_inet_sock_6754_to_2_6_15.patch (from /gen2/branches/backport/2.6.11/net_inet_sock_6754_to_2_6_15.patch:6765) A /gen2/branches/backport/2.6.9/net_sock_1_6754_to_2_6_13.patch (from /gen2/branches/backport/2.6.11/net_sock_1_6754_to_2_6_13.patch:6765) A /gen2/branches/backport/2.6.9/net_sock_2_6754_to_2_6_11.patch (from /gen2/branches/backport/2.6.11/net_sock_2_6754_to_2_6_11.patch:6767) A /gen2/branches/backport/2.6.9/net_tcp_states_6754_to_2_6_13.patch (from /gen2/branches/backport/2.6.11/net_tcp_states_6754_to_2_6_13.patch:6765) A /gen2/branches/backport/2.6.9/sdp_6754_to_2_6_11.patch (from /gen2/branches/backport/2.6.11/sdp_6754_to_2_6_11.patch:6765) A /gen2/branches/backport/2.6.9_U3/linux_skbuff_6754_to_2_6_11.patch (from /gen2/branches/backport/2.6.11/linux_skbuff_6754_to_2_6_11.patch:6765) A /gen2/branches/backport/2.6.9_U3/net_inet_sock_6754_to_2_6_15.patch (from /gen2/branches/backport/2.6.11/net_inet_sock_6754_to_2_6_15.patch:6765) A 
/gen2/branches/backport/2.6.9_U3/net_sock_1_6754_to_2_6_13.patch (from /gen2/branches/backport/2.6.11/net_sock_1_6754_to_2_6_13.patch:6765) A /gen2/branches/backport/2.6.9_U3/net_sock_2_6754_to_2_6_11.patch (from /gen2/branches/backport/2.6.11/net_sock_2_6754_to_2_6_11.patch:6767) A /gen2/branches/backport/2.6.9_U3/net_tcp_states_6754_to_2_6_13.patch (from /gen2/branches/backport/2.6.11/net_tcp_states_6754_to_2_6_13.patch:6765) A /gen2/branches/backport/2.6.9_U3/sdp_6754_to_2_6_11.patch (from /gen2/branches/backport/2.6.11/sdp_6754_to_2_6_11.patch:6765) -- MST From halr at voltaire.com Mon May 1 10:22:33 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 01 May 2006 13:22:33 -0400 Subject: [openib-general] [RFC] AT and user AT In-Reply-To: <1146060008.2124.27486.camel@hal.voltaire.com> References: <1146060008.2124.27486.camel@hal.voltaire.com> Message-ID: <1146504151.2124.155274.camel@hal.voltaire.com> On Wed, 2006-04-26 at 10:00, Hal Rosenstock wrote: > As AT and user AT have been obsoleted (and superceeded by CMA which is > now in the process of going upstream), any objections to removing AT and > user AT from the trunk ? If I don't hear back by COB Friday, I will > presume this is OK. This is now done. AT and user AT are gone from the trunk. -- Hal From bos at pathscale.com Mon May 1 10:34:34 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 01 May 2006 10:34:34 -0700 Subject: [openib-general] Re: [PATCH 8 of 13] ipath - fix a number of RC protocol bugs In-Reply-To: References: <20060425005654.4c08481f.akpm@osdl.org> Message-ID: <1146504874.2906.7.camel@chalcedony.pathscale.com> On Mon, 2006-05-01 at 10:22 -0700, Roland Dreier wrote: > Andrew> Please don't play around with list_head internals like > Andrew> this - some speedfreak might legitimately choose to remove > Andrew> the list_head poisoning debug code, or make it > Andrew> Kconfigurable. > > Bryan, can you fix this up and resend this patch? Yep. 
We already have a fix; I just need to put it in my queue. > Are the other patches independent of this? Should I apply all the > others, or do I need to wait for the fixed version of this one? They're all independent of this, so please fire away. Thanks, References: Message-ID: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> On 5/1/06, Or Gerlitz wrote: > Can you elaborate on each of the features, specifically the following > points are of interest to us: > > +1 so you running Oracle Loopback traffic over RDS sockets? if yes, what > the issue here? > the openib CMA supports listen/connect on loopback addresses (eg > 127.0.0.1 or IPoIB local address) Yes. There is no issue. It's just next in line for me to implement. > > +2 by failover, are you referring to APM? that is failover between IB > pathes to/from the same HCA > over which the original connection/QP was established or you are talking > on failover between HCAs Failover within and across HCAs. APM does not work for failover across HCAs. > > +3 is the no support for /proc like for RDS an issue to run crload or > demo Oracle (that is specific tuning > and usage of non defaults is needed for any/optimal operation) No, this does not affect core functionality. You should be able to run Oracle or crload without this feature. That was a list of things that still need to be implemented for GA and not just demo > > Or. > > [openfabrics-ewg] Before we can start testing - we needto ensure that > RDS is fully ported. > > Pandit, Ranjit rpandit at silverstorm.com > > Following features are yet to be implemented in OpenFabric Rds: > > 1. Failover > 2. Loopback connections > 3. support for /proc fs like Rds config, stats and info. 
> > > > Ranjit From rdreier at cisco.com Mon May 1 10:46:20 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 10:46:20 -0700 Subject: [openib-general] re RDS missing features In-Reply-To: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> (Ranjit Pandit's message of "Mon, 1 May 2006 10:42:51 -0700") References: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> Message-ID: > > +3 is the no support for /proc like for RDS an issue to run crload or > > demo Oracle (that is specific tuning > > and usage of non defaults is needed for any/optimal operation) > No, this does not affect core functionality. You should be able to run > Oracle or crload without this feature. Don't put RDS tunables in /proc. They don't have anything to do with processes. Probably the best place for them is in sysfs, following the "one value per file" rule. If you can't follow that rule then create your own filesystem. - R. From sean.hefty at intel.com Mon May 1 10:48:40 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 1 May 2006 10:48:40 -0700 Subject: [openib-general] re cma upcalls serialization / disconnected eventquestion In-Reply-To: Message-ID: >Can a ULP assume that cma callbacks for to the same CMA ID >are serialized? Yes. (This is required to avoid reporting events out of order to the user.) >Also and related to this, is it correct that ***always** before >DISCONNECTED event there will be one of {ESTABLISHED, REJECTED, >CONNECT_ERROR}? You should always see ESTABLISHED before DISCONNECTED. If not, then there's a bug in the CMA.
- Sean From rdreier at cisco.com Mon May 1 10:53:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 10:53:36 -0700 Subject: [openib-general] Re: [PATCH 00/12] SRP: Changing ibsrpdm In-Reply-To: <20060501112145.GA17552@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 14:21:45 +0300") References: <20060501112145.GA17552@mellanox.co.il> Message-ID: Ishai> Hi, I'm going to send 12 patches. 6 patches for the kernel, Ishai> and 6 for the userspace ibsrpdm. The kernel patches avoid Ishai> adding the same target twice, allow the removal of a Ishai> target, and add a query about the connected targets. In the future can you use a different descriptive title for each patch? Also (although I haven't reviewed the actual code yet) this mostly makes sense, but I'm not sure we want to disallow connecting to the same target twice. Userspace may want to implement a policy of one connection per target, but having multiple connections to the same target for multipathing/failover seems like something the kernel should allow. What was your reason for forbidding this? - R. From krause at cup.hp.com Mon May 1 10:58:13 2006 From: krause at cup.hp.com (Michael Krause) Date: Mon, 01 May 2006 10:58:13 -0700 Subject: [openib-general] re RDS missing features In-Reply-To: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> References: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> Message-ID: <6.2.0.14.2.20060501105552.021ff4b8@esmail.cup.hp.com> Given this is an extension to Sockets, should it not also be reviewed by the Sockets owners? What about the API itself? Any plans to make this portable to other OS / endnodes or have a spec and associated wire protocol that is reviewed perhaps in the IETF so it is applicable to more than just Oracle? It seems this really should be standardized within the IETF to gain broad adoption and insure it will be interoperable across all implementations not just OpenFabric's.
At 10:42 AM 5/1/2006, Ranjit Pandit wrote: >On 5/1/06, Or Gerlitz wrote: >>Can you elaborate on each of the features, specifically the following >>points are of interest to us: >> >>+1 so you running Oracle Loopback traffic over RDS sockets? if yes, what >>the issue here? >>the openib CMA supports listen/connect on loopback addresses (eg >>127.0.0.1 or IPoIB local address) > >Yes. >There is no issue. It's just next in line for me to implement. > >> >>+2 by failover, are you referring to APM? that is failover between IB >>pathes to/from the same HCA >>over which the original connection/QP was established or you are talking >>on failover between HCAs > >Failover within and across HCAs. APM does not work for failover across HCAs. For OpenFabric, one would need to have this work across RNIC as well. APM is not part of iWARP so can't be relied upon. >>+3 is the no support for /proc like for RDS an issue to run crload or >>demo Oracle (that is specific tuning >> and usage of non defaults is needed for any/optimal operation) > >No, this does not affect core functionality. You should be able to run >Oracle or crload without this feature. > >That was a list of things that still need to be implemented for GA and >not just demo > >> >>Or. >> >>[openfabrics-ewg] Before we can start testing - we needto ensure that >>RDS is fully ported. >> >>Pandit, Ranjit rpandit at silverstorm.com >> >>Following features are yet to be implemented in OpenFabric Rds: >> >> 1. Failover >>2. Loopback connections >>3. support for /proc fs like Rds config, stats and info. 
>> >> >> >>Ranjit From albertt at broadcom.com Mon May 1 10:59:02 2006 From: albertt at broadcom.com (Albert To) Date: Mon, 1 May 2006 10:59:02 -0700 Subject: [openib-general] Problem running mpdboot command in MVAPICH2 v0.9.3-RC0 Message-ID: Hi Wei, Thanks for a prompt reply. Yes, I did originally export the LD_LIBRARY_PATH in .bashrc as followed: export LD_LIBRARY_PATH=/usr/local/lib I've also tried your suggestion: export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH In either case, issue still exists. Given the same setup, I did NOT see this issue in v0.9.2 (obtained from https://openib.org/svn/gen2/trunk/src/userspace/mpi/mvapich2-gen2). Thanks, Albert ----Original Message----- From: wei huang [mailto:huanwei at cse.ohio-state.edu] Sent: Saturday, April 29, 2006 2:09 PM To: Albert To Cc: openib-general at openib.org Subject: Re: [openib-general] Problem running mpdboot command in MVAPICH2 v0.9.3-RC0 Hi Albert, Not sure if you export /usr/local/lib to LD_LIBRARY_PATH manually or it is in your bashrc. Could you please try to put export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH in your .bashrc (assume using bash) and try again? Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept.
of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Fri, 28 Apr 2006, Albert To wrote: > Hi, > > I downloaded and compiled the MVAPICH2 v0.9.3-RC0 using > make.mvapich2.gen2 script. The script finished without any errors. > However, I received "mpdboot: error while loading shared libraries: > libibverbs.so.1: cannot open shared object file: No such file or > directory" error while executing mpdboot -n 2 -f mpd.hosts. I checked > library file libibverbs.so.1 and found it in /usr/local/lib folder. > LD_LIBRARY_PATH is already set to /usr/local/bin, but that didn't help. > > Is there another environment variable that I need to set to make > mpdboot works? Thanks in advance for your help. > > -Albert > From sean.hefty at intel.com Mon May 1 11:04:34 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 1 May 2006 11:04:34 -0700 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: <1146495936.2124.153278.camel@hal.voltaire.com> Message-ID: >> There is a real issue that is seen when a duplicate request (same TID, SGID, >> mgmt class) is received at the client, resulting in a duplicate response. > >You had mentioned in the previous email on this that this was the case >of a slow responder. Is the responder slow but playing by the IB >timeouts in effect or is it violating those timeouts ? I don't believe that the responder is violating any timeouts. >> The MAD layer cannot allow the duplicate response to be sent because of RMPP >issues. > >Is this different for non RMPP MADs v. RMPP MADs ? Is the RMPP issue >what you mention below (RMPP receiving a duplicate response) ? If so, is >this an implementation or architecture issue or both ? The issue is a result of the RMPP architecture, but I wouldn't say that RMPP has an issue. It's simply a matter that you can't reassemble multiple MADs from the same source that use the same transaction ID. For non-RMPP MADs, the only issue is one of efficiency. 
A duplicate response would just be dropped on the requester side if the first response is received. >> A client is still restricted from sending a duplicate response while a >previous >> response is in progress. RMPP cannot handle this case. > >Why not ? Wouldn't the second response not match anything in the client >on the request side ? This is true if the first response completes before the second response is sent. The problem is when both responses are active at the same time. - Sean From robert.j.woodruff at intel.com Mon May 1 11:07:03 2006 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Mon, 1 May 2006 11:07:03 -0700 Subject: [openib-general] RE: SDP fails to compile on SVN6829 In-Reply-To: <20060501172403.GB4580@mellanox.co.il> Message-ID: <000301c66d4a$0cd1a070$7aa9070a@amr.corp.intel.com> Michael wrote, > do I need an additional patch or is the backport patch broken ? Personally, I don't think we should be moving code into the trunk until it is ready, and obviously this new SDP is not ready. Anyone else have an opinion on how/when things get moved to the trunk ? Shouldn't it be kept on a branch till it is ready ? >I think you need these: > A /gen2/branches/backport/2.6.9/linux_skbuff_6754_to_2_6_11.patch (from >/gen2/branches/backport/2.6.11/linux_skbuff_6754_to_2_6_11.patch:6765) For example, this patch adds a function static inline void skb_header_release(struct sk_buff *skb) that does not do anything yet. Maybe I better wait till you have this new SDP completed before moving to it, until then I will use the older SDP. 
woody =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ last_stable/include/linux/skbuff.h 2006-04-30 09:16:05.000000000 +0300 @@ -0,0 +1,19 @@ +#ifndef LINUX_SKBUFF_H_BACKPORT +#define LINUX_SKBUFF_H_BACKPORT + +#include_next + +/** + * skb_header_release - release reference to header + * @skb: buffer to operate on + * + * Drop a reference to the header part of the buffer. This is done + * by acquiring a payload reference. You must not read from the header + * part of skb->data after this. + */ +static inline void skb_header_release(struct sk_buff *skb) +{ +} + + +#endif MST From sean.hefty at intel.com Mon May 1 11:13:58 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 1 May 2006 11:13:58 -0700 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: <1146495061.2124.153047.camel@hal.voltaire.com> Message-ID: >There's still a window here depending on when free MAD is called versus >when the response gets back to the original requester. There are no issues in this case. We just need to avoid having two responses being sent at the same time. >> but I'm not sure if this would happen in practice. A >> second drawback is that the receive MAD would need to be kept around until >the >> send completed (as opposed to the send started). > >Is this to handle the case where free MAD is called prior to the send >completing ? Is this on the response side only ? The basic idea is that when a MAD with the response bit set is sent, a check is made against a list of received MADs that have been reported to the user. If a received MAD is found, it is removed from the list, and the response is sent. If no request is found (e.g. the MAD had already been freed), then the send fails.
- Sean From halr at voltaire.com Mon May 1 11:13:21 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 01 May 2006 14:13:21 -0400 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: References: Message-ID: <1146506892.24063.531.camel@hal.voltaire.com> On Mon, 2006-05-01 at 14:04, Sean Hefty wrote: > >> There is a real issue that is seen when a duplicate request (same TID, SGID, > >> mgmt class) is received at the client, resulting in a duplicate response. > > > >You had mentioned in the previous email on this that this was the case > >of a slow responder. Is the responder slow but playing by the IB > >timeouts in effect or is it violating those timeouts ? > > I don't believe that the responder is violating any timeouts. Why is the requester resending ? > >> The MAD layer cannot allow the duplicate response to be sent because of RMPP > >issues. > > > >Is this different for non RMPP MADs v. RMPP MADs ? Is the RMPP issue > >what you mention below (RMPP receiving a duplicate response) ? If so, is > >this an implementation or architecture issue or both ? > > The issue is a result of the RMPP architecture, but I wouldn't say that RMPP has > an issue. It's simply a matter that you can't reassemble multiple MADs from the > same source that use the same transaction ID. > > For non-RMPP MADs, the only issue is one of efficiency. A duplicate response > would just be dropped on the requester side if the first response is received. > > >> A client is still restricted from sending a duplicate response while a > >previous > >> response is in progress. RMPP cannot handle this case. > > > >Why not ? Wouldn't the second response not match anything in the client > >on the request side ? > > This is true if the first response completes before the second response is sent. > The problem is when both responses are active at the same time. 
> > - Sean From sean.hefty at intel.com Mon May 1 11:18:39 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 1 May 2006 11:18:39 -0700 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: <1146494601.2124.152923.camel@hal.voltaire.com> Message-ID: >Aren't there 3 cases possible here: (1) non RMPP request/RMPP response >(e.g. SA GetTable for one), (2) RMPP request/RMPP response (e.g. SA >GetMulti), and (3) RMPP request/non RMPP response (I don't think this >currently exists but may be mistaken). Are all handled on the >initiator/requester side ? Are the changes only for case (2) ? For case 1, we don't have DS RMPP. For case 2, we have DS RMPP. And I don't believe that case 3 exists either, but would end up being treated as DS RMPP by the implementation. If case 3 doesn't exist, then I think we can come up with a generic way to identify DS RMPP that doesn't require checking class or methods. If case 3 does exist, then I think we'll need class / method checking to identify DS RMPP. - Sean From halr at voltaire.com Mon May 1 11:17:32 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 01 May 2006 14:17:32 -0400 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: References: Message-ID: <1146507327.24063.625.camel@hal.voltaire.com> On Mon, 2006-05-01 at 14:13, Sean Hefty wrote: > >There's still a window here depending on when free MAD is called versus > >when the response gets back to the original requester. > > There are no issues in this case. We just need to avoid having two responses > being sent at the same time. > > >> but I'm not sure if this would happen in practice. A > >> second drawback is that the receive MAD would need to be kept around until > >the > >> send completed (as opposed to the send started). > > > >Is this to handle the case where free MAD is called prior to the send > >completing ? Is this on the response side only ? 
> > The basic idea is that when a MAD with the response bit set is sent, a check is > made against a list received MADs that have been reported to the user. If a > received MAD is found, it is removed from the list, and the response is sent. > If no request is found (e.g. the MAD had already been freed), then the send > fails. It needs to fail in a way so that it is not retried, right ? -- Hal > - Sean From sean.hefty at intel.com Mon May 1 11:23:04 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 1 May 2006 11:23:04 -0700 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: <1146506892.24063.531.camel@hal.voltaire.com> Message-ID: >Why is the requester resending ? He's simply timed out waiting for a response. For instance, if this is an SA query, maybe the SA is swamped with requests. I don't think that there are any timeout restrictions for this. - Sean From sean.hefty at intel.com Mon May 1 11:25:46 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 1 May 2006 11:25:46 -0700 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: <1146507327.24063.625.camel@hal.voltaire.com> Message-ID: >It needs to fail in a way so that it is not retried, right ? The ib_post_send_mad() call will fail. Since the first response removed the request from the list to check, subsequent retries will also fail. Basically, this prevents a user from sending a response MAD unless it had previously received a request. - Sean From rdreier at cisco.com Mon May 1 11:27:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 11:27:24 -0700 Subject: [openib-general] Re: [PATCH] SRP: Avoid a potential deadlock In-Reply-To: <20060501113548.GN17552@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 14:35:48 +0300") References: <20060501113548.GN17552@mellanox.co.il> Message-ID: Ishai> Avoid a potential dead-lock. 
In srp_disconnect_target Ishai> there is a call to ib_send_cm_dreq and a wait for Ishai> completion If when getting DREP there is no comp no one Ishai> will end this wait I thought that after the DREP is received, the CM will go through timewait and we will eventually get a TIMEWAIT_EXIT event (with a completion). Am I wrong? Have you actually seen this deadlock happen in practice? - R. From rdreier at cisco.com Mon May 1 11:32:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 11:32:08 -0700 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: (Or Gerlitz's message of "Thu, 27 Apr 2006 15:30:03 +0300 (IDT)") References: Message-ID: Is this ready for queuing in my for-2.6.18 tree? What is the status of all the non-IB dependencies? If it is ready for merging, please send me a clean patch series with the comments from this thread addressed. And also remind me of which SCSI git trees this depends on... Thanks, Roland From rdreier at cisco.com Mon May 1 11:47:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 11:47:32 -0700 Subject: [openib-general] Re: [PATCH 2 of 13] ipath - set up 32-bit DMA mask if 64-bit setup fails In-Reply-To: <1906950392f7ef8c7d07.1145913778@eng-12.pathscale.com> (Bryan O'Sullivan's message of "Mon, 24 Apr 2006 14:22:58 -0700") References: <1906950392f7ef8c7d07.1145913778@eng-12.pathscale.com> Message-ID: Bryan> Some systems do not set up 64-bit maps on systems with 2GB Bryan> or less of memory installed, so we have to fall back to Bryan> trying a 32-bit setup. Which systems does this happen on? I'm just curious, because mthca has err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); if (err) { dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA mask.\n"); and I've never had a single report of that warning triggering. - R. 
From rdreier at cisco.com Mon May 1 11:50:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 11:50:36 -0700 Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine In-Reply-To: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> (Bryan O'Sullivan's message of "Mon, 24 Apr 2006 14:23:01 -0700") References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> Message-ID: Bryan> Move away from an obsolete, unportable routine for Bryan> translating physical addresses. This change: > - isge->vaddr = bus_to_virt(sge->addr); > + isge->vaddr = phys_to_virt(sge->addr); is really wrong. bus_to_virt() is really what you want, because in this case the address is a bus address that came from dma_map_xxx(). You're still going to be hosed on systems with IOMMUs for example. - R. From arjan at infradead.org Mon May 1 11:54:06 2006 From: arjan at infradead.org (Arjan van de Ven) Date: Mon, 01 May 2006 20:54:06 +0200 Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine In-Reply-To: References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> Message-ID: <1146509646.20760.63.camel@laptopd505.fenrus.org> On Mon, 2006-05-01 at 11:50 -0700, Roland Dreier wrote: > Bryan> Move away from an obsolete, unportable routine for > Bryan> translating physical addresses. > > This change: > > > - isge->vaddr = bus_to_virt(sge->addr); > > + isge->vaddr = phys_to_virt(sge->addr); > > is really wrong. bus_to_virt() is really what you want, because in > this case the address is a bus address that came from dma_map_xxx(). > You're still going to be hosed on systems with IOMMUs for example. do you really NEED the vaddr? (most of the time linux drivers don't need it, while other OSes do) If you really need it you should grab it at dma_map time ... 
(and realize that it's not kernel addressable per se ;) From ishai at mellanox.co.il Mon May 1 11:51:00 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 21:51:00 +0300 Subject: [openib-general] [PATCHE 02/12] SRP: changing ibsrpdm In-Reply-To: <20060501133342.GJ3599@rhun.haifa.ibm.com> References: <20060501133342.GJ3599@rhun.haifa.ibm.com> Message-ID: <20060501185100.GA22488@mellanox.co.il> On Mon, May 01, 2006 at 04:33:42PM +0300, Muli Ben-Yehuda wrote: > On Mon, May 01, 2006 at 02:25:46PM +0300, Ishai Rabinovitz wrote: > > > > Move the destruction of the host and the removal from a list to a function. > > > > Signed-off-by: Ishai Rabinovitz > > > > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > > =================================================================== > > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-23 14:08:03.000000000 +0300 > > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-24 10:47:00.000000000 +0300 > > @@ -344,6 +344,16 @@ static void srp_disconnect_target(struct > > wait_for_completion(&target->done); > > } > > > > +static void destruct_scsi_host_and_target(struct srp_target_port *target, int disconnect_target) > > +{ > > + scsi_remove_host(target->scsi_host); > > + if (disconnect_target) > > + srp_disconnect_target(target); > > + ib_destroy_cm_id(target->cm_id); > > + srp_free_target_ib(target); > > + scsi_host_put(target->scsi_host); > > +} > > + > > static void srp_remove_work(void *target_ptr) > > { > > struct srp_target_port *target = target_ptr; > > @@ -357,10 +374,7 @@ static void srp_remove_work(void *target > > list_del(&target->list); > > mutex_unlock(&target->srp_host->target_mutex); > > > > - scsi_remove_host(target->scsi_host); > > - ib_destroy_cm_id(target->cm_id); > > - srp_free_target_ib(target); > > - scsi_host_put(target->scsi_host); > > + destruct_scsi_host_and_target(target, 0); > > Is not disconnecting from the target here actually the right thing to > 
do? considering we're then destroying the target's queue pairs and > freeing it? > > Cheers, > Muli Hi Muli, srp_remove_target is being called only when we were unable to reconnect in srp_reconnect_target so the target is already disconnected. Ishai -- Ishai Rabinovitz From rdreier at cisco.com Mon May 1 12:00:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 12:00:00 -0700 Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine In-Reply-To: <1146509646.20760.63.camel@laptopd505.fenrus.org> (Arjan van de Ven's message of "Mon, 01 May 2006 20:54:06 +0200") References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> <1146509646.20760.63.camel@laptopd505.fenrus.org> Message-ID: Arjan> do you really NEED the vaddr? (most of the time linux Arjan> drivers don't need it, while other OSes do) If you really Arjan> need it you should grab it at dma_map time ... (and Arjan> realize that it's not kernel addressable per se ;) Yes, they need some kind of vaddr. It's kind of a layering problem. The IB stack assumes that IB devices have a DMA engine that deals with bus addresses. But the ipath driver has to simulate this by using a memcpy on the CPU to move data to the PCI device. I really don't know what the right solution is. Maybe having some way to override the dma mapping operations so that the ipath driver can keep the info it needs? - R. From bos at pathscale.com Mon May 1 12:03:16 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 01 May 2006 12:03:16 -0700 Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine In-Reply-To: References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> Message-ID: <1146510196.4544.1.camel@chalcedony.pathscale.com> On Mon, 2006-05-01 at 11:50 -0700, Roland Dreier wrote: > Bryan> Move away from an obsolete, unportable routine for > Bryan> translating physical addresses. 
> > This change: > > > - isge->vaddr = bus_to_virt(sge->addr); > > + isge->vaddr = phys_to_virt(sge->addr); > > is really wrong. bus_to_virt() is really what you want, because in > this case the address is a bus address that came from dma_map_xxx(). Well, bus_to_virt is not portable, so we definitely can't use it. I'll have to do some thinking about this. References: <20060501134323.GK3599@rhun.haifa.ibm.com> Message-ID: <20060501190444.GB22488@mellanox.co.il> On Mon, May 01, 2006 at 04:43:23PM +0300, Muli Ben-Yehuda wrote: > On Mon, May 01, 2006 at 02:27:39PM +0300, Ishai Rabinovitz wrote: > > > > Do not add the same target twice. > > > > Signed-off-by: Ishai Rabinovitz > > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > > =================================================================== > > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 15:17:34.000000000 +0300 > > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 15:19:37.000000000 +0300 > > @@ -1478,7 +1478,8 @@ static int srp_parse_options(const char > > printk(KERN_WARNING PFX "bad max sect parameter '%s'\n", p); > > goto out; > > } > > - target->scsi_host->max_sectors = token; > > + if (target->scsi_host != NULL) > > + target->scsi_host->max_sectors = token; > > break; > > This chunk does not look related to the rest. Is a NULL > target->scsi_host legal here? if not, the check should be removed as > we'd rather take an oops here than hide the problem behind the NULL > pointer check. > > > +/* srp_find_target - If the target exists return it in target, > > + otherwise target is set to NULL. > > + host->target_mutex should be hold */ > > Please use the usual kernel > /* > * stuff > */ > style for multi line comments. OK, Thanks. 
> > > +static int srp_find_target(const char *buf, struct srp_host *host, > > + struct srp_target_port **target) > > +{ > > + struct srp_target_port *target_to_find, *curr_target; > > + int ret, i; > > + > > + target_to_find = kzalloc(sizeof *target_to_find, GFP_KERNEL); > > + ret = srp_parse_options(buf, target_to_find); > > + if (ret) > > + goto free; > > + > > + list_for_each_entry(curr_target, &host->target_list, list) > > + if (target_to_find->ioc_guid == curr_target->ioc_guid && > > + target_to_find->id_ext == curr_target->id_ext && > > + target_to_find->path.pkey == curr_target->path.pkey && > > + target_to_find->service_id == curr_target->service_id) { > > + for (i = 0; i < 16; ++i) > > + if (target_to_find->path.dgid.raw[i] != curr_target->path.dgid.raw[i]) > > + break; > > The conditional and check here probably deserves an inline helper > called same_target() or some such. > > > + if (i == 16) { > > + *target = curr_target; > > + goto free; > > + } > > + } > > + > > + *target = NULL; > > + > > +free: > > + kfree(target_to_find); > > + return 0; > > We always return 0 - either this should return void, or you meant to > return ret here instead of 0? You are right as usual, We should return ret. > > > +} > > + > > static ssize_t srp_create_target(struct class_device *class_dev, > > const char *buf, size_t count) > > { > > struct srp_host *host = > > container_of(class_dev, struct srp_host, class_dev); > > struct Scsi_Host *target_host; > > - struct srp_target_port *target; > > + struct srp_target_port *target, *existing_target = NULL; > > int ret; > > int i; > > > > + /* first check if the target already exists */ > > + > > + mutex_lock(&host->target_mutex); > > + ret = srp_find_target(buf, host, &existing_target); > > + if (ret) > > + goto unlock_mutex; > > + > > + if (existing_target) { > > + /* target already exists */ > > + spin_lock_irq(existing_target->scsi_host->host_lock); > > why _irq and not _irqsave? 
Are you sure this code can't ever be called > with interrupts off via some other path? This function is being called from userspace (writing to /sys/class/infiniband_srp/.../add_target) so no need for irqsave. Do you think we should always use irqsave just to be on the safe side (Maybe in the future someone else will call us)? > > Cheers, > Muli --------------- Resending the fixed patch ---------------------- ----------------------------------------------------------------- Do not add the same target twice. Signed-off-by: Ishai Rabinovitz Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 15:17:34.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 15:19:37.000000000 +0300 @@ -1478,7 +1478,8 @@ static int srp_parse_options(const char printk(KERN_WARNING PFX "bad max sect parameter '%s'\n", p); goto out; } - target->scsi_host->max_sectors = token; + if (target->scsi_host != NULL) + target->scsi_host->max_sectors = token; break; default: @@ -1503,20 +1504,92 @@ out: return ret; } +/* + * srp_find_target - If the target exists return it in target, + * otherwise target is set to NULL. 
+ * host->target_mutex should be hold + */ +static int srp_find_target(const char *buf, struct srp_host *host, + struct srp_target_port **target) +{ + struct srp_target_port *target_to_find, *curr_target; + int ret, i; + + target_to_find = kzalloc(sizeof *target_to_find, GFP_KERNEL); + ret = srp_parse_options(buf, target_to_find); + if (ret) + goto free; + + list_for_each_entry(curr_target, &host->target_list, list) + if (target_to_find->ioc_guid == curr_target->ioc_guid && + target_to_find->id_ext == curr_target->id_ext && + target_to_find->path.pkey == curr_target->path.pkey && + target_to_find->service_id == curr_target->service_id) { + for (i = 0; i < 16; ++i) + if (target_to_find->path.dgid.raw[i] != + curr_target->path.dgid.raw[i]) + break; + if (i == 16) { + *target = curr_target; + goto free; + } + } + + *target = NULL; + +free: + kfree(target_to_find); + return ret; +} + static ssize_t srp_create_target(struct class_device *class_dev, const char *buf, size_t count) { struct srp_host *host = container_of(class_dev, struct srp_host, class_dev); struct Scsi_Host *target_host; - struct srp_target_port *target; + struct srp_target_port *target, *existing_target = NULL; int ret; int i; + /* first check if the target already exists */ + + mutex_lock(&host->target_mutex); + ret = srp_find_target(buf, host, &existing_target); + if (ret) + goto unlock_mutex; + + if (existing_target) { + /* target already exists */ + spin_lock_irq(existing_target->scsi_host->host_lock); + switch (existing_target->state) { + case SRP_TARGET_LIVE: + printk(KERN_WARNING PFX "target %s already exists\n", + buf); + ret = -EEXIST; + break; + case SRP_TARGET_CONNECTING: + /* It is in the middle of reconnecting */ + ret = -EALREADY; + break; + case SRP_TARGET_DEAD: + /* It will be removed soon - create a new one */ + case SRP_TARGET_REMOVED: + /* target is dead, create a new one */ + break; + } + spin_unlock_irq(existing_target->scsi_host->host_lock); + if (ret) + goto unlock_mutex; + } + + 
/* really create the target */ target_host = scsi_host_alloc(&srp_template, sizeof (struct srp_target_port)); - if (!target_host) - return -ENOMEM; + if (!target_host) { + ret = -ENOMEM; + goto unlock_mutex; + } target_host->max_lun = SRP_MAX_LUN; @@ -1533,7 +1603,7 @@ static ssize_t srp_create_target(struct ret = srp_parse_options(buf, target); if (ret) - goto err; + goto err_put_scsi_host; ib_get_cached_gid(host->dev, host->port, 0, &target->path.sgid); @@ -1554,7 +1624,7 @@ static ssize_t srp_create_target(struct ret = srp_create_target_ib(target); if (ret) - goto err; + goto err_put_scsi_host; target->cm_id = ib_create_cm_id(host->dev, srp_cm_handler, target); if (IS_ERR(target->cm_id)) { @@ -1572,7 +1642,8 @@ static ssize_t srp_create_target(struct if (ret) goto err_disconnect; - return count; + ret = count; + goto unlock_mutex; err_disconnect: srp_disconnect_target(target); @@ -1583,9 +1654,12 @@ err_cm_id: err_free: srp_free_target_ib(target); -err: +err_put_scsi_host: scsi_host_put(target_host); +unlock_mutex: + mutex_unlock(&host->target_mutex); + return ret; } -- Ishai Rabinovitz From rdreier at cisco.com Mon May 1 12:12:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 12:12:05 -0700 Subject: [openib-general] [PATCH 04/12] SRP: Changing ibsrpdm In-Reply-To: <20060501190444.GB22488@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 22:04:44 +0300") References: <20060501134323.GK3599@rhun.haifa.ibm.com> <20060501190444.GB22488@mellanox.co.il> Message-ID: Ishai> Do you think we should always use irqsave just to be on the Ishai> safe side (Maybe in the future someone else will call us)? Not in a function that does mutex_lock() also... - R. 
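Muli's earlier suggestion in this thread, to pull the field comparison in srp_find_target() into a same_target() helper, could look roughly like the sketch below. It is a userspace mock-up, not the driver code: struct target_key and find_target() are hypothetical stand-ins for the fields of struct srp_target_port that the patch compares and for the list_for_each_entry() walk, and the field widths are assumptions. The byte-by-byte loop over path.dgid.raw is replaced by memcmp(), which should be equivalent for a fixed 16-byte GID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the identifying fields of a target.
 * The real struct srp_target_port lives in ib_srp.h and differs.
 */
struct target_key {
	uint64_t ioc_guid;
	uint64_t id_ext;
	uint64_t service_id;
	uint16_t pkey;
	uint8_t  dgid[16];	/* raw destination GID */
};

/* Two targets are the same if every identifying field matches. */
static bool same_target(const struct target_key *a,
			const struct target_key *b)
{
	return a->ioc_guid == b->ioc_guid &&
	       a->id_ext == b->id_ext &&
	       a->pkey == b->pkey &&
	       a->service_id == b->service_id &&
	       memcmp(a->dgid, b->dgid, sizeof a->dgid) == 0;
}

/*
 * Array-based stand-in for the list walk: return the first entry
 * matching 'wanted', or NULL if there is none.
 */
static const struct target_key *find_target(const struct target_key *list,
					     size_t n,
					     const struct target_key *wanted)
{
	size_t i;

	for (i = 0; i < n; ++i)
		if (same_target(&list[i], wanted))
			return &list[i];
	return NULL;
}
```

With a helper of this shape the lookup no longer needs the "i == 16" check after the inner loop, and the caller only has to distinguish "found" from "not found". In the driver itself the walk would still have to run under host->target_mutex, as the patch comment says.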
From rdreier at cisco.com Mon May 1 12:17:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 12:17:23 -0700 Subject: [openib-general] [GIT PULL] InfiniBand driver fixes for 2.6.17 Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This is mostly fixes for the ipath driver, with one mthca driver fix thrown in. The exact changes and patch are: Bryan O'Sullivan: IB/ipath: fix race with exposing reset file IB/ipath: set up 32-bit DMA mask if 64-bit setup fails IB/ipath: iterate over correct number of ports during reset IB/ipath: change handling of PIO buffers IB/ipath: fix verbs registration IB/ipath: prevent hardware from being accessed during reset IB/ipath: simplify RC send posting IB/ipath: simplify IB timer usage IB/ipath: improve sparse annotation IB/ipath: fix label name in interrupt handler IB/ipath: tidy up white space in a few files Roland Dreier: IB/mthca: Fix offset in query_gid method drivers/infiniband/hw/ipath/ipath_debug.h | 15 +++++----- drivers/infiniband/hw/ipath/ipath_diag.c | 3 +- drivers/infiniband/hw/ipath/ipath_driver.c | 18 +++++++++--- drivers/infiniband/hw/ipath/ipath_init_chip.c | 36 ++++++++++++++--------- drivers/infiniband/hw/ipath/ipath_intr.c | 21 +++++++++++-- drivers/infiniband/hw/ipath/ipath_kernel.h | 10 +++--- drivers/infiniband/hw/ipath/ipath_layer.c | 6 +++- drivers/infiniband/hw/ipath/ipath_pe800.c | 4 +++ drivers/infiniband/hw/ipath/ipath_registers.h | 31 ++++++++++++-------- drivers/infiniband/hw/ipath/ipath_ruc.c | 15 +++------- drivers/infiniband/hw/ipath/ipath_sysfs.c | 14 ++++++++- drivers/infiniband/hw/ipath/ipath_ud.c | 6 +++- drivers/infiniband/hw/ipath/ipath_verbs.c | 39 ++++++------------------- drivers/infiniband/hw/ipath/ipath_verbs.h | 3 +- drivers/infiniband/hw/ipath/ips_common.h | 2 + 
drivers/infiniband/hw/mthca/mthca_provider.c | 2 + 16 files changed, 131 insertions(+), 94 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_debug.h b/drivers/infiniband/hw/ipath/ipath_debug.h index 593e289..4676238 100644 --- a/drivers/infiniband/hw/ipath/ipath_debug.h +++ b/drivers/infiniband/hw/ipath/ipath_debug.h @@ -60,11 +60,11 @@ #define __IPATH_KERNEL_SEND 0x2000 /* use kernel mode send */ #define __IPATH_EPKTDBG 0x4000 /* print ethernet packet data */ #define __IPATH_SMADBG 0x8000 /* sma packet debug */ -#define __IPATH_IPATHDBG 0x10000 /* Ethernet (IPATH) general debug on */ -#define __IPATH_IPATHWARN 0x20000 /* Ethernet (IPATH) warnings on */ -#define __IPATH_IPATHERR 0x40000 /* Ethernet (IPATH) errors on */ -#define __IPATH_IPATHPD 0x80000 /* Ethernet (IPATH) packet dump on */ -#define __IPATH_IPATHTABLE 0x100000 /* Ethernet (IPATH) table dump on */ +#define __IPATH_IPATHDBG 0x10000 /* Ethernet (IPATH) gen debug */ +#define __IPATH_IPATHWARN 0x20000 /* Ethernet (IPATH) warnings */ +#define __IPATH_IPATHERR 0x40000 /* Ethernet (IPATH) errors */ +#define __IPATH_IPATHPD 0x80000 /* Ethernet (IPATH) packet dump */ +#define __IPATH_IPATHTABLE 0x100000 /* Ethernet (IPATH) table dump */ #else /* _IPATH_DEBUGGING */ @@ -79,11 +79,12 @@ #define __IPATH_TRSAMPLE 0x0 /* generate trace buffer sample entries */ #define __IPATH_VERBDBG 0x0 /* very verbose debug */ #define __IPATH_PKTDBG 0x0 /* print packet data */ -#define __IPATH_PROCDBG 0x0 /* print process startup (init)/exit messages */ +#define __IPATH_PROCDBG 0x0 /* process startup (init)/exit messages */ /* print mmap/nopage stuff, not using VDBG any more */ #define __IPATH_MMDBG 0x0 #define __IPATH_EPKTDBG 0x0 /* print ethernet packet data */ -#define __IPATH_SMADBG 0x0 /* print process startup (init)/exit messages */#define __IPATH_IPATHDBG 0x0 /* Ethernet (IPATH) table dump on */ +#define __IPATH_SMADBG 0x0 /* process startup (init)/exit messages */ +#define __IPATH_IPATHDBG 0x0 /* Ethernet 
(IPATH) table dump on */ #define __IPATH_IPATHWARN 0x0 /* Ethernet (IPATH) warnings on */ #define __IPATH_IPATHERR 0x0 /* Ethernet (IPATH) errors on */ #define __IPATH_IPATHPD 0x0 /* Ethernet (IPATH) packet dump on */ diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c index 7d3fb69..28ddceb 100644 --- a/drivers/infiniband/hw/ipath/ipath_diag.c +++ b/drivers/infiniband/hw/ipath/ipath_diag.c @@ -277,13 +277,14 @@ static int ipath_diag_open(struct inode bail: spin_unlock_irqrestore(&ipath_devs_lock, flags); - mutex_unlock(&ipath_mutex); /* Only expose a way to reset the device if we make it into diag mode. */ if (ret == 0) ipath_expose_reset(&dd->pcidev->dev); + mutex_unlock(&ipath_mutex); + return ret; } diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index e7617c3..398add4 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -418,9 +418,19 @@ static int __devinit ipath_init_one(stru ret = pci_set_dma_mask(pdev, DMA_64BIT_MASK); if (ret) { - dev_info(&pdev->dev, "pci_set_dma_mask unit %u " - "fails: %d\n", dd->ipath_unit, ret); - goto bail_regions; + /* + * if the 64 bit setup fails, try 32 bit. Some systems + * do not setup 64 bit maps on systems with 2GB or less + * memory installed. 
+ */ + ret = pci_set_dma_mask(pdev, DMA_32BIT_MASK); + if (ret) { + dev_info(&pdev->dev, "pci_set_dma_mask unit %u " + "fails: %d\n", dd->ipath_unit, ret); + goto bail_regions; + } + else + ipath_dbg("No 64bit DMA mask, used 32 bit mask\n"); } pci_set_master(pdev); @@ -1949,7 +1959,7 @@ int ipath_reset_device(int unit) } if (dd->ipath_pd) - for (i = 1; i < dd->ipath_portcnt; i++) { + for (i = 1; i < dd->ipath_cfgports; i++) { if (dd->ipath_pd[i] && dd->ipath_pd[i]->port_cnt) { ipath_dbg("unit %u port %d is in use " "(PID %u cmd %s), can't reset\n", diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 2823ff9..16f640e 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -53,13 +53,19 @@ MODULE_PARM_DESC(cfgports, "Set max numb /* * Number of buffers reserved for driver (layered drivers and SMA - * send). Reserved at end of buffer list. + * send). Reserved at end of buffer list. Initialized based on + * number of PIO buffers if not set via module interface. + * The problem with this is that it's global, but we'll use different + * numbers for different chip types. So the default value is not + * very useful. I've redefined it for the 1.3 release so that it's + * zero unless set by the user to something else, in which case we + * try to respect it. */ -static ushort ipath_kpiobufs = 32; +static ushort ipath_kpiobufs; static int ipath_set_kpiobufs(const char *val, struct kernel_param *kp); -module_param_call(kpiobufs, ipath_set_kpiobufs, param_get_uint, +module_param_call(kpiobufs, ipath_set_kpiobufs, param_get_ushort, &ipath_kpiobufs, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(kpiobufs, "Set number of PIO buffers for driver"); @@ -531,8 +537,11 @@ static int init_housekeeping(struct ipat * Don't clear ipath_flags as 8bit mode was set before * entering this func. However, we do set the linkstate to * unknown, so we can watch for a transition. 
+ * PRESENT is set because we want register reads to work, + * and the kernel infrastructure saw it in config space; + * We clear it if we have failures. */ - dd->ipath_flags |= IPATH_LINKUNK; + dd->ipath_flags |= IPATH_LINKUNK | IPATH_PRESENT; dd->ipath_flags &= ~(IPATH_LINKACTIVE | IPATH_LINKARMED | IPATH_LINKDOWN | IPATH_LINKINIT); @@ -560,6 +569,7 @@ static int init_housekeeping(struct ipat || (dd->ipath_uregbase & 0xffffffff) == 0xffffffff) { ipath_dev_err(dd, "Register read failures from chip, " "giving up initialization\n"); + dd->ipath_flags &= ~IPATH_PRESENT; ret = -ENODEV; goto done; } @@ -682,16 +692,14 @@ int ipath_init_chip(struct ipath_devdata */ dd->ipath_pioavregs = ALIGN(val, sizeof(u64) * BITS_PER_BYTE / 2) / (sizeof(u64) * BITS_PER_BYTE / 2); - if (!ipath_kpiobufs) /* have to have at least 1, for SMA */ - kpiobufs = ipath_kpiobufs = 1; - else if ((dd->ipath_piobcnt2k + dd->ipath_piobcnt4k) < - (dd->ipath_cfgports * IPATH_MIN_USER_PORT_BUFCNT)) { - dev_info(&dd->pcidev->dev, "Too few PIO buffers (%u) " - "for %u ports to have %u each!\n", - dd->ipath_piobcnt2k + dd->ipath_piobcnt4k, - dd->ipath_cfgports, IPATH_MIN_USER_PORT_BUFCNT); - kpiobufs = 1; /* reserve just the minimum for SMA/ether */ - } else + if (ipath_kpiobufs == 0) { + /* not set by user, or set explictly to default */ + if ((dd->ipath_piobcnt2k + dd->ipath_piobcnt4k) > 128) + kpiobufs = 32; + else + kpiobufs = 16; + } + else kpiobufs = ipath_kpiobufs; if (kpiobufs > diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 0bcb428..3e72a1f 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -665,14 +665,14 @@ static void handle_layer_pioavail(struct ret = __ipath_layer_intr(dd, IPATH_LAYER_INT_SEND_CONTINUE); if (ret > 0) - goto clear; + goto set; ret = __ipath_verbs_piobufavail(dd); if (ret > 0) - goto clear; + goto set; return; -clear: +set: set_bit(IPATH_S_PIOINTBUFAVAIL, 
&dd->ipath_sendctrl); ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); @@ -719,11 +719,24 @@ static void handle_rcv(struct ipath_devd irqreturn_t ipath_intr(int irq, void *data, struct pt_regs *regs) { struct ipath_devdata *dd = data; - u32 istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus); + u32 istat; ipath_err_t estat = 0; static unsigned unexpected = 0; irqreturn_t ret; + if(!(dd->ipath_flags & IPATH_PRESENT)) { + /* this is mostly so we don't try to touch the chip while + * it is being reset */ + /* + * This return value is perhaps odd, but we do not want the + * interrupt core code to remove our interrupt handler + * because we don't appear to be handling an interrupt + * during a chip reset. + */ + return IRQ_HANDLED; + } + + istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus); if (unlikely(!istat)) { ipath_stats.sps_nullintr++; ret = IRQ_NONE; /* not our interrupt, or already handled */ diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 0ce5f19..e6507f8 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -731,7 +731,7 @@ u64 ipath_read_kreg64_port(const struct static inline u32 ipath_read_ureg32(const struct ipath_devdata *dd, ipath_ureg regno, int port) { - if (!dd->ipath_kregbase) + if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT)) return 0; return readl(regno + (u64 __iomem *) @@ -762,7 +762,7 @@ static inline void ipath_write_ureg(cons static inline u32 ipath_read_kreg32(const struct ipath_devdata *dd, ipath_kreg regno) { - if (!dd->ipath_kregbase) + if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT)) return -1; return readl((u32 __iomem *) & dd->ipath_kregbase[regno]); } @@ -770,7 +770,7 @@ static inline u32 ipath_read_kreg32(cons static inline u64 ipath_read_kreg64(const struct ipath_devdata *dd, ipath_kreg regno) { - if (!dd->ipath_kregbase) + if (!dd->ipath_kregbase || 
!(dd->ipath_flags & IPATH_PRESENT)) return -1; return readq(&dd->ipath_kregbase[regno]); @@ -786,7 +786,7 @@ static inline void ipath_write_kreg(cons static inline u64 ipath_read_creg(const struct ipath_devdata *dd, ipath_sreg regno) { - if (!dd->ipath_kregbase) + if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT)) return 0; return readq(regno + (u64 __iomem *) @@ -797,7 +797,7 @@ static inline u64 ipath_read_creg(const static inline u32 ipath_read_creg32(const struct ipath_devdata *dd, ipath_sreg regno) { - if (!dd->ipath_kregbase) + if (!dd->ipath_kregbase || !(dd->ipath_flags & IPATH_PRESENT)) return 0; return readl(regno + (u64 __iomem *) (dd->ipath_cregbase + diff --git a/drivers/infiniband/hw/ipath/ipath_layer.c b/drivers/infiniband/hw/ipath/ipath_layer.c index 69ed110..9cb5258 100644 --- a/drivers/infiniband/hw/ipath/ipath_layer.c +++ b/drivers/infiniband/hw/ipath/ipath_layer.c @@ -46,13 +46,15 @@ /* Acquire before ipath_devs_lock. */ static DEFINE_MUTEX(ipath_layer_mutex); +static int ipath_verbs_registered; + u16 ipath_layer_rcv_opcode; + static int (*layer_intr)(void *, u32); static int (*layer_rcv)(void *, void *, struct sk_buff *); static int (*layer_rcv_lid)(void *, void *); static int (*verbs_piobufavail)(void *); static void (*verbs_rcv)(void *, void *, void *, u32); -static int ipath_verbs_registered; static void *(*layer_add_one)(int, struct ipath_devdata *); static void (*layer_remove_one)(void *); @@ -586,6 +588,8 @@ void ipath_verbs_unregister(void) verbs_rcv = NULL; verbs_timer_cb = NULL; + ipath_verbs_registered = 0; + mutex_unlock(&ipath_layer_mutex); } diff --git a/drivers/infiniband/hw/ipath/ipath_pe800.c b/drivers/infiniband/hw/ipath/ipath_pe800.c index e1dc4f7..6318067 100644 --- a/drivers/infiniband/hw/ipath/ipath_pe800.c +++ b/drivers/infiniband/hw/ipath/ipath_pe800.c @@ -972,6 +972,8 @@ static int ipath_setup_pe_reset(struct i /* Use ERROR so it shows up in logs, etc. 
*/ ipath_dev_err(dd, "Resetting PE-800 unit %u\n", dd->ipath_unit); + /* keep chip from being accessed in a few places */ + dd->ipath_flags &= ~(IPATH_INITTED|IPATH_PRESENT); val = dd->ipath_control | INFINIPATH_C_RESET; ipath_write_kreg(dd, dd->ipath_kregs->kr_control, val); mb(); @@ -997,6 +999,8 @@ static int ipath_setup_pe_reset(struct i if ((r = pci_enable_device(dd->pcidev))) ipath_dev_err(dd, "pci_enable_device failed after " "reset: %d\n", r); + /* whether it worked or not, mark as present, again */ + dd->ipath_flags |= IPATH_PRESENT; val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_revision); if (val == dd->ipath_revision) { ipath_cdbg(VERBOSE, "Got matching revision " diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h index 1e59750..402126e 100644 --- a/drivers/infiniband/hw/ipath/ipath_registers.h +++ b/drivers/infiniband/hw/ipath/ipath_registers.h @@ -34,8 +34,9 @@ #define _IPATH_REGISTERS_H /* - * This file should only be included by kernel source, and by the diags. - * It defines the registers, and their contents, for the InfiniPath HT-400 chip + * This file should only be included by kernel source, and by the diags. It + * defines the registers, and their contents, for the InfiniPath HT-400 + * chip. 
*/ /* @@ -156,8 +157,10 @@ #define INFINIPATH_IBCC_FLOWCTRLWATERMARK_SHIFT 8 #define INFINIPATH_IBCC_LINKINITCMD_MASK 0x3ULL #define INFINIPATH_IBCC_LINKINITCMD_DISABLE 1 -#define INFINIPATH_IBCC_LINKINITCMD_POLL 2 /* cycle through TS1/TS2 till OK */ -#define INFINIPATH_IBCC_LINKINITCMD_SLEEP 3 /* wait for TS1, then go on */ +/* cycle through TS1/TS2 till OK */ +#define INFINIPATH_IBCC_LINKINITCMD_POLL 2 +/* wait for TS1, then go on */ +#define INFINIPATH_IBCC_LINKINITCMD_SLEEP 3 #define INFINIPATH_IBCC_LINKINITCMD_SHIFT 16 #define INFINIPATH_IBCC_LINKCMD_MASK 0x3ULL #define INFINIPATH_IBCC_LINKCMD_INIT 1 /* move to 0x11 */ @@ -182,7 +185,8 @@ #define INFINIPATH_IBCS_LINKSTATE_SHIFT 4 #define INFINIPATH_IBCS_TXREADY 0x40000000 #define INFINIPATH_IBCS_TXCREDITOK 0x80000000 -/* link training states (shift by INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) */ +/* link training states (shift by + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) */ #define INFINIPATH_IBCS_LT_STATE_DISABLED 0x00 #define INFINIPATH_IBCS_LT_STATE_LINKUP 0x01 #define INFINIPATH_IBCS_LT_STATE_POLLACTIVE 0x02 @@ -267,10 +271,12 @@ /* kr_serdesconfig0 bits */ #define INFINIPATH_SERDC0_RESET_MASK 0xfULL /* overal reset bits */ #define INFINIPATH_SERDC0_RESET_PLL 0x10000000ULL /* pll reset */ -#define INFINIPATH_SERDC0_TXIDLE 0xF000ULL /* tx idle enables (per lane) */ -#define INFINIPATH_SERDC0_RXDETECT_EN 0xF0000ULL /* rx detect enables (per lane) */ -#define INFINIPATH_SERDC0_L1PWR_DN 0xF0ULL /* L1 Power down; use with RXDETECT, - Otherwise not used on IB side */ +/* tx idle enables (per lane) */ +#define INFINIPATH_SERDC0_TXIDLE 0xF000ULL +/* rx detect enables (per lane) */ +#define INFINIPATH_SERDC0_RXDETECT_EN 0xF0000ULL +/* L1 Power down; use with RXDETECT, Otherwise not used on IB side */ +#define INFINIPATH_SERDC0_L1PWR_DN 0xF0ULL /* kr_xgxsconfig bits */ #define INFINIPATH_XGXS_RESET 0x7ULL @@ -390,12 +396,13 @@ struct ipath_kregs { ipath_kreg kr_txintmemsize; ipath_kreg kr_xgxsconfig; ipath_kreg 
kr_ibpllcfg; - /* use these two (and the following N ports) only with ipath_k*_kreg64_port(); - * not *kreg64() */ + /* use these two (and the following N ports) only with + * ipath_k*_kreg64_port(); not *kreg64() */ ipath_kreg kr_rcvhdraddr; ipath_kreg kr_rcvhdrtailaddr; - /* remaining registers are not present on all types of infinipath chips */ + /* remaining registers are not present on all types of infinipath + chips */ ipath_kreg kr_rcvpktledcnt; ipath_kreg kr_pcierbuftestreg0; ipath_kreg kr_pcierbuftestreg1; diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index f232e77..eb81424 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -531,19 +531,12 @@ int ipath_post_rc_send(struct ipath_qp * } wqe->wr.num_sge = j; qp->s_head = next; - /* - * Wake up the send tasklet if the QP is not waiting - * for an RNR timeout. - */ - next = qp->s_rnr_timeout; spin_unlock_irqrestore(&qp->s_lock, flags); - if (next == 0) { - if (qp->ibqp.qp_type == IB_QPT_UC) - ipath_do_uc_send((unsigned long) qp); - else - ipath_do_rc_send((unsigned long) qp); - } + if (qp->ibqp.qp_type == IB_QPT_UC) + ipath_do_uc_send((unsigned long) qp); + else + ipath_do_rc_send((unsigned long) qp); ret = 0; diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c index 32acd80..f323791 100644 --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c +++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c @@ -711,10 +711,22 @@ static struct attribute_group dev_attr_g * enters diag mode. A device reset is quite likely to crash the * machine entirely, so we don't want to normally make it * available. + * + * Called with ipath_mutex held. 
*/ int ipath_expose_reset(struct device *dev) { - return device_create_file(dev, &dev_attr_reset); + static int exposed; + int ret; + + if (!exposed) { + ret = device_create_file(dev, &dev_attr_reset); + exposed = 1; + } + else + ret = 0; + + return ret; } int ipath_driver_create_group(struct device_driver *drv) diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index 01cfb30..e606daf 100644 --- a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -46,8 +46,10 @@ * This is called from ipath_post_ud_send() to forward a WQE addressed * to the same HCA. */ -static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_sge_state *ss, - u32 length, struct ib_send_wr *wr, struct ib_wc *wc) +static void ipath_ud_loopback(struct ipath_qp *sqp, + struct ipath_sge_state *ss, + u32 length, struct ib_send_wr *wr, + struct ib_wc *wc) { struct ipath_ibdev *dev = to_idev(sqp->ibqp.device); struct ipath_qp *qp; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 8d2558a..cb9e387 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -449,7 +449,6 @@ static void ipath_ib_timer(void *arg) { struct ipath_ibdev *dev = (struct ipath_ibdev *) arg; struct ipath_qp *resend = NULL; - struct ipath_qp *rnr = NULL; struct list_head *last; struct ipath_qp *qp; unsigned long flags; @@ -465,32 +464,18 @@ static void ipath_ib_timer(void *arg) last = &dev->pending[dev->pending_index]; while (!list_empty(last)) { qp = list_entry(last->next, struct ipath_qp, timerwait); - if (last->next == LIST_POISON1 || - last->next != &qp->timerwait || - qp->timerwait.prev != last) { - INIT_LIST_HEAD(last); - } else { - list_del(&qp->timerwait); - qp->timerwait.prev = (struct list_head *) resend; - resend = qp; - atomic_inc(&qp->refcount); - } + list_del(&qp->timerwait); + qp->timer_next = resend; + resend = qp; + 
atomic_inc(&qp->refcount); } last = &dev->rnrwait; if (!list_empty(last)) { qp = list_entry(last->next, struct ipath_qp, timerwait); if (--qp->s_rnr_timeout == 0) { do { - if (last->next == LIST_POISON1 || - last->next != &qp->timerwait || - qp->timerwait.prev != last) { - INIT_LIST_HEAD(last); - break; - } list_del(&qp->timerwait); - qp->timerwait.prev = - (struct list_head *) rnr; - rnr = qp; + tasklet_hi_schedule(&qp->s_task); if (list_empty(last)) break; qp = list_entry(last->next, struct ipath_qp, @@ -530,8 +515,7 @@ static void ipath_ib_timer(void *arg) spin_unlock_irqrestore(&dev->pending_lock, flags); /* XXX What if timer fires again while this is running? */ - for (qp = resend; qp != NULL; - qp = (struct ipath_qp *) qp->timerwait.prev) { + for (qp = resend; qp != NULL; qp = qp->timer_next) { struct ib_wc wc; spin_lock_irqsave(&qp->s_lock, flags); @@ -545,9 +529,6 @@ static void ipath_ib_timer(void *arg) if (atomic_dec_and_test(&qp->refcount)) wake_up(&qp->wait); } - for (qp = rnr; qp != NULL; - qp = (struct ipath_qp *) qp->timerwait.prev) - tasklet_hi_schedule(&qp->s_task); } /** @@ -556,9 +537,9 @@ static void ipath_ib_timer(void *arg) * * This is called from ipath_intr() at interrupt level when a PIO buffer is * available after ipath_verbs_send() returned an error that no buffers were - * available. Return 0 if we consumed all the PIO buffers and we still have + * available. Return 1 if we consumed all the PIO buffers and we still have * QPs waiting for buffers (for now, just do a tasklet_hi_schedule and - * return one). + * return zero). 
*/ static int ipath_ib_piobufavail(void *arg) { @@ -579,7 +560,7 @@ static int ipath_ib_piobufavail(void *ar spin_unlock_irqrestore(&dev->pending_lock, flags); bail: - return 1; + return 0; } static int ipath_query_device(struct ib_device *ibdev, @@ -1159,7 +1140,7 @@ static ssize_t show_stats(struct class_d len = sprintf(buf, "RC resends %d\n" - "RC QACKs %d\n" + "RC no QACK %d\n" "RC ACKs %d\n" "RC SEQ NAKs %d\n" "RC RDMA seq %d\n" diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index fcafbc7..4f8d593 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -282,7 +282,8 @@ struct ipath_srq { */ struct ipath_qp { struct ib_qp ibqp; - struct ipath_qp *next; /* link list for QPN hash table */ + struct ipath_qp *next; /* link list for QPN hash table */ + struct ipath_qp *timer_next; /* link list for ipath_ib_timer() */ struct list_head piowait; /* link for wait PIO buf */ struct list_head timerwait; /* link for waiting for timeouts */ struct ib_ah_attr remote_ah_attr; diff --git a/drivers/infiniband/hw/ipath/ips_common.h b/drivers/infiniband/hw/ipath/ips_common.h index 410a764..ab7cbbb 100644 --- a/drivers/infiniband/hw/ipath/ips_common.h +++ b/drivers/infiniband/hw/ipath/ips_common.h @@ -95,7 +95,7 @@ struct ether_header { __u8 seq_num; __le32 len; /* MUST be of word size due to PIO write requirements */ - __u32 csum; + __le32 csum; __le16 csum_offset; __le16 flags; __u16 first_2_bytes; diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index 565a24b..a2eae8a 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -306,7 +306,7 @@ static int mthca_query_gid(struct ib_dev goto out; } - memcpy(gid->raw + 8, out_mad->data + (index % 8) * 16, 8); + memcpy(gid->raw + 8, out_mad->data + (index % 8) * 8, 8); out: kfree(in_mad); From ishai at mellanox.co.il Mon May 1 
12:13:52 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 22:13:52 +0300 Subject: [openib-general] [PATCH 06/12] SRP: Changing ibsrpdm In-Reply-To: <20060501135032.GL3599@rhun.haifa.ibm.com> References: <20060501135032.GL3599@rhun.haifa.ibm.com> Message-ID: <20060501191352.GC22488@mellanox.co.il> On Mon, May 01, 2006 at 04:50:32PM +0300, Muli Ben-Yehuda wrote: > On Mon, May 01, 2006 at 02:28:48PM +0300, Ishai Rabinovitz wrote: > > > > Support a display of list of target from user level. > > > > Signed-off-by: Ishai Rabinovitz > > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > > =================================================================== > > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-21 01:13:04.000000000 +0300 > > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-21 03:56:05.000000000 +0300 > > @@ -1730,6 +1730,63 @@ end: > > > > static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); > > > > +#define TARGET_INFO_BUF_SIZE 126 > > + > > +static ssize_t list_targets(struct class_device *class_dev, char *buf) > > +{ > > + struct srp_host *host = > > + container_of(class_dev, struct srp_host, class_dev); > > + struct srp_target_port *target; > > + int printed=0, ret; > > + > > + mutex_lock(&host->target_mutex); > > + list_for_each_entry(target, &host->target_list, list) > > Can this race with list addition / removal? I saw that you removed the > lock in an earlier patch? No. In an earlier patch I did not remove the lock; I enlarged its scope to include the entire call to srp_find_target.
> > > + if (target->state == SRP_TARGET_LIVE) { > > You'd have an easier time with the indentation if you'd do > > if (target->state != SRP_TARGET_LIVE) > continue; > > here > > > + ret = sprintf(buf+printed, > > + "id_ext=%016llx,ioc_guid=%016llx," > > + "dgid=%04x%04x%04x%04x%04x%04x%04x%04x," > > + "pkey=%04x,service_id=%016llx\n", > > + (unsigned long long) > > + be64_to_cpu(target->id_ext), > > + (unsigned long long) > > + be64_to_cpu(target->ioc_guid), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[0]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[2]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[4]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[6]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[8]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[10]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[12]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[14]), > > This is pretty horrible - could you use show_dgid() here? It will add a redundant copy of the buffer. > > Cheers, > Muli -- Ishai Rabinovitz From rdreier at cisco.com Mon May 1 12:17:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 12:17:59 -0700 Subject: [openib-general] Re: [PATCH 13 of 13] ipath - tidy up white space in a few files In-Reply-To: <895650567032e5b48153.1145913789@eng-12.pathscale.com> (Bryan O'Sullivan's message of "Mon, 24 Apr 2006 14:23:09 -0700") References: <895650567032e5b48153.1145913789@eng-12.pathscale.com> Message-ID: Applied all except 5/13 and 8/13...
From arjan at infradead.org Mon May 1 12:20:00 2006 From: arjan at infradead.org (Arjan van de Ven) Date: Mon, 01 May 2006 21:20:00 +0200 Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine In-Reply-To: References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> <1146509646.20760.63.camel@laptopd505.fenrus.org> Message-ID: <1146511201.20760.65.camel@laptopd505.fenrus.org> On Mon, 2006-05-01 at 12:00 -0700, Roland Dreier wrote: > Arjan> do you really NEED the vaddr? (most of the time linux > Arjan> drivers don't need it, while other OSes do) If you really > Arjan> need it you should grab it at dma_map time ... (and > Arjan> realize that it's not kernel addressable per se ;) > > Yes, they need some kind of vaddr. > > It's kind of a layering problem. The IB stack assumes that IB devices > have a DMA engine that deals with bus addresses. But the ipath driver > has to simulate this by using a memcpy on the CPU to move data to the > PCI device. > > I really don't know what the right solution is. Maybe having some way > to override the dma mapping operations so that the ipath driver can > keep the info it needs? sounds like you need to redesign your layering ;) In linux it's common to have the lowest level driver do the mapping (even when the mid layer will provide the most commonly used helper to do it for the common case)... From rdreier at cisco.com Mon May 1 12:22:57 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 12:22:57 -0700 Subject: [openib-general] Re: [PATCH 06/12] SRP: Changing ibsrpdm In-Reply-To: <20060501112848.GG17552@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 14:28:48 +0300") References: <20060501112848.GG17552@mellanox.co.il> Message-ID: This patch is not acceptable. It's totally violating the sysfs "one-value-per-file" rule. What's wrong with the existing info in sysfs? - R. 
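Separately from the sysfs layout question raised above, the dgid formatting in the list_targets() patch quoted in this thread repeats the same cast eight times. A minimal user-space sketch of doing the same formatting in a loop follows. The helper name format_gid() and its signature are assumptions made for illustration; it is not code from ib_srp.c and it is not the show_dgid() helper mentioned in the review.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from ib_srp.c): format a 16-byte GID as
 * 32 lowercase hex digits, one big-endian 16-bit word per iteration.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_gid(const uint8_t raw[16], char *buf, size_t len)
{
	size_t i;

	if (len < 33)		/* 32 hex digits plus the terminating NUL */
		return -1;

	for (i = 0; i < 16; i += 2) {
		/* Build the word from bytes so host endianness does not matter. */
		unsigned int word = ((unsigned int)raw[i] << 8) | raw[i + 1];

		/* Word number i/2 starts at character offset (i/2)*4 == i*2. */
		snprintf(buf + i * 2, len - i * 2, "%04x", word);
	}
	return 32;
}
```

Assembling each word from two bytes avoids the pointer cast to __be16 and the alignment assumptions that come with it, at the cost of not reusing the kernel's be16_to_cpu() helper.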
From rdreier at cisco.com Mon May 1 12:28:41 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 12:28:41 -0700 Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine In-Reply-To: <1146511201.20760.65.camel@laptopd505.fenrus.org> (Arjan van de Ven's message of "Mon, 01 May 2006 21:20:00 +0200") References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> <1146509646.20760.63.camel@laptopd505.fenrus.org> <1146511201.20760.65.camel@laptopd505.fenrus.org> Message-ID: Arjan> sounds like you need to redesign your layering ;) In linux Arjan> it's common to have the lowest level driver do the mapping Arjan> (even when the mid layer will provide the most commonly Arjan> used helper to do it for the common case)... It's not that simple of course... InfiniBand allows RDMA -- _remote_ DMA. So that address might be something that a protocol sent to the remote host and which is now showing up for a DMA operation initiated by the remote side. And we can't very well send a struct page * + offset to the remote side... From ishai at mellanox.co.il Mon May 1 12:28:43 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 22:28:43 +0300 Subject: [openib-general] Re: [PATCH 00/12] SRP: Changing ibsrpdm In-Reply-To: References: <20060501112145.GA17552@mellanox.co.il> Message-ID: <20060501192843.GA22993@mellanox.co.il> On Mon, May 01, 2006 at 10:53:36AM -0700, Roland Dreier wrote: > Ishai> Hi, I'm going to send 12 patches. 6 patches for the kernel, > Ishai> and 6 for the userspace ibsrpdm. The kernel patches avoid > Ishai> adding the same target twice, allow the removal of a > Ishai> target, and add a query about the connected targets. > > In the future can you use a different descriptive title for each patch? OK > > Also (although I haven't reviewed the actual code yet) this mostly > makes sense, but I'm not sure we want to disallow connecting to the > same target twice. 
Userspace may want to implement a policy of one > conncetion per target, but having multiple connections to the same > target for multipathing/failover seems like something the kernel > should allow. What was your reason for forbidding this? > > - R. As I understand it, the path in multipathing is going to be part of the attributes of the connection to the target. So there will be no problem to add the same target twice, if it has a different path leading to it. -- Ishai Rabinovitz From mst at mellanox.co.il Mon May 1 12:35:45 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 1 May 2006 22:35:45 +0300 Subject: [openib-general] Re: Re: [PATCH v2] mad: use GID/LID on requester sidewhen matching responses to requests In-Reply-To: References: <20060429193913.GA9584@mellanox.co.il> Message-ID: <20060501193545.GA6642@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: Re: [PATCH v2] mad: use GID/LID on requester sidewhen matching responses to requests > > >> >Check GID/LID for requester side when searching for request which matches > >> >received response. This, in order to guarantee uniqueness if use same TID > >> >when requesting via multiple source LIDs (when LMC is not zero). To perform > >> >check, add LMC to cache. > >> > > >> >Further, do not perform LID check for direct-routed packets, since > >> >permissive > >> >LID makes a proper check impossible. > >> > >> Thanks - I'll look at this within the next couple of days. > > > >Could this patch be merged please? Sean? > > There was a request to submit the LMC cache piece as a separate patch. I can > merge in the MAD changes after the LMC cache has been accepted. Roland, could this get merged please? 
-- MST From segher at kernel.crashing.org Mon May 1 12:56:09 2006 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Mon, 1 May 2006 21:56:09 +0200 Subject: [openib-general] Re: [PATCH 2 of 13] ipath - set up 32-bit DMA mask if 64-bit setup fails In-Reply-To: References: <1906950392f7ef8c7d07.1145913778@eng-12.pathscale.com> Message-ID: <4B05D10C-407E-46A5-848F-0897D1E6D1CD@kernel.crashing.org> > Bryan> Some systems do not set up 64-bit maps on systems with 2GB > Bryan> or less of memory installed, so we have to fall back to > Bryan> trying a 32-bit setup. > > Which systems does this happen on? PowerPC with U3 or U4 northbridge, i.e. Maple or PowerMac G5 systems. If the IOMMU (DART) is disabled, we have a 32-bit only DMA mask. The DART will be disabled by default if there is 2GB or less of memory (as it isn't needed then). > I'm just curious, because mthca has > > err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); > if (err) { > dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA mask. > \n"); > > and I've never had a single report of that warning triggering. That's only because I never used those cards on systems with fewer than 4GB of memory :-) Segher From ishai at mellanox.co.il Mon May 1 12:58:50 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 1 May 2006 22:58:50 +0300 Subject: [openib-general] Re: [PATCH] SRP: Avoid a potential deadlock In-Reply-To: References: <20060501113548.GN17552@mellanox.co.il> Message-ID: <20060501195850.GB23117@mellanox.co.il> On Mon, May 01, 2006 at 11:27:24AM -0700, Roland Dreier wrote: > Ishai> Avoid a potential dead-lock. In srp_disconnect_target > Ishai> there is a call to ib_send_cm_dreq and a wait for > Ishai> completion If when getting DREP there is no comp no one > Ishai> will end this wait > > I thought that after the DREP is received, the CM will go through > timewait and we will eventually get a TIMEWAIT_EXIT event (with a > completion). Am I wrong? 
Have you actually seen this deadlock happen > in practice? > > - R. I had a deadlock and I suspected at this. I'm not sure that it was the reason for the deadlock. Vu, What do you think? -- Ishai Rabinovitz From mdidomenico at silverstorm.com Mon May 1 13:21:52 2006 From: mdidomenico at silverstorm.com (Di Domenico, Michael) Date: Mon, 1 May 2006 16:21:52 -0400 Subject: [openib-general] ib_srp Message-ID: Roland, I've never done that before, if it's simple to do, I would be happy to re-run the test and produce the output. Thanks > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Monday, May 01, 2006 1:06 PM > To: Di Domenico, Michael > Cc: openib-general at openib.org > Subject: Re: [openib-general] ib_srp > > Di> Hi, Is there a way to remove a target from the SRP > Di> configuration without unloading the driver module (which seems > Di> to have partially removed the disk, but appears to be > Di> hanging)? > > Unloading the module should work. A trace from sysrq-t that shows > where rmmod/modprobe -r is hanging would be useful. > > Right now there isn't a way to disconnect from a particular target > port without unloading the module though. > > - R. From rdreier at cisco.com Mon May 1 14:38:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 14:38:15 -0700 Subject: [openib-general] Re: [PATCH 00/12] SRP: Changing ibsrpdm In-Reply-To: <20060501192843.GA22993@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 22:28:43 +0300") References: <20060501112145.GA17552@mellanox.co.il> <20060501192843.GA22993@mellanox.co.il> Message-ID: Ishai> As I understand it, the path in multipathing is going to be Ishai> part of the attributes of the connection to the target. So Ishai> there will be no problem to add the same target twice, if Ishai> it has a different path leading to it. Yes, that's the usual way to do it. But I don't see why we want to forbid having the same path but multiple local QPs. 
For example that would allow some intelligent upper layer to implement "storage QoS" by avoiding head-of-line blocking. - R. From halr at voltaire.com Mon May 1 14:37:42 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 01 May 2006 17:37:42 -0400 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: References: Message-ID: <1146508240.24063.825.camel@hal.voltaire.com> On Mon, 2006-05-01 at 14:18, Sean Hefty wrote: > >Aren't there 3 cases possible here: (1) non RMPP request/RMPP response > >(e.g. SA GetTable for one), (2) RMPP request/RMPP response (e.g. SA > >GetMulti), and (3) RMPP request/non RMPP response (I don't think this > >currently exists but may be mistaken). Are all handled on the > >initiator/requester side ? Are the changes only for case (2) ? > > For case 1, we don't have DS RMPP. I think you mean (we have) non DS RMPP. > For case 2, we have DS RMPP. > > And I don't believe that case 3 exists either, but would end up being treated as > DS RMPP by the implementation. Why ? Just wondering... > If case 3 doesn't exist, then I think we can > come up with a generic way to identify DS RMPP that doesn't require checking > class or methods. How ? > If case 3 does exist, then I think we'll need class / method > checking to identify DS RMPP. By doesn't exist, do you mean not possible in the architecture or no current use cases like this ? -- Hal > - Sean From rdreier at cisco.com Mon May 1 14:41:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 14:41:39 -0700 Subject: [openib-general] Re: [PATCH 2 of 13] ipath - set up 32-bit DMA mask if 64-bit setup fails In-Reply-To: <4B05D10C-407E-46A5-848F-0897D1E6D1CD@kernel.crashing.org> (Segher Boessenkool's message of "Mon, 1 May 2006 21:56:09 +0200") References: <1906950392f7ef8c7d07.1145913778@eng-12.pathscale.com> <4B05D10C-407E-46A5-848F-0897D1E6D1CD@kernel.crashing.org> Message-ID: Segher> PowerPC with U3 or U4 northbridge, i.e. 
Maple or PowerMac Segher> G5 systems. If the IOMMU (DART) is disabled, we have a Segher> 32-bit only DMA mask. The DART will be disabled by Segher> default if there is 2GB or less of memory (as it isn't Segher> needed then). OK, thanks. I was not aware of that situation. However, I suspect that PathScale has a different situation in mind, considering that their driver isn't even buildable for that platform ;) - R. From rdreier at cisco.com Mon May 1 15:33:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 15:33:24 -0700 Subject: [openib-general] ib_srp In-Reply-To: (Di Domenico's message of "Mon, 1 May 2006 16:21:52 -0400") References: Message-ID: Di> Roland, I've never done that before, if it's simple to do, I Di> would be happy to re-run the test and produce the output. Yes, please do. All you need to do is echo "t" > /proc/sysrq-trigger and send the output. The PID of the process that's hanging is useful too. - R.
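For anyone who would rather trigger the task dump from a test program than from the shell, the echo above can be sketched in C roughly as follows. The helper name write_sysrq_cmd() and the macro SYSRQ_TRIGGER_PATH are made up for this example; the path is a parameter so the function can be tried against an ordinary file first.

```c
#include <stdio.h>

/* Location of the SysRq trigger file used in the message above. */
#define SYSRQ_TRIGGER_PATH "/proc/sysrq-trigger"

/*
 * Hypothetical helper: write one SysRq command character to `path`.
 * Passing SYSRQ_TRIGGER_PATH and 't' requests a dump of all tasks
 * (needs root; the dump goes to the kernel log, not to stdout).
 * Returns 0 on success, -1 on failure.
 */
static int write_sysrq_cmd(const char *path, char cmd)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputc(cmd, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffered character; report a failed flush. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A call such as write_sysrq_cmd(SYSRQ_TRIGGER_PATH, 't') run as root should be equivalent to the shell command; the trace can then be collected from the kernel log.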
From segher at kernel.crashing.org Mon May 1 16:13:12 2006 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Tue, 2 May 2006 01:13:12 +0200 Subject: [openib-general] Re: [PATCH 2 of 13] ipath - set up 32-bit DMA mask if 64-bit setup fails In-Reply-To: References: <1906950392f7ef8c7d07.1145913778@eng-12.pathscale.com> <4B05D10C-407E-46A5-848F-0897D1E6D1CD@kernel.crashing.org> Message-ID: <114102B4-FBCB-4A5A-B986-80D4A730DD91@kernel.crashing.org> > Segher> PowerPC with U3 or U4 northbridge, i.e. Maple or PowerMac > Segher> G5 systems. If the IOMMU (DART) is disabled, we have a > Segher> 32-bit only DMA mask. The DART will be disabled by > Segher> default if there is 2GB or less of memory (as it isn't > Segher> needed then). > > OK, thanks. I was not aware of that situation. > > However, I suspect that PathScale has a different situation in mind, > considering that their driver isn't even buildable for that > platform ;) Well (a previous version of) that patch came from me, draw your own conclusions :-) And it builds just fine -- what is the problem you're thinking of? Segher From rdreier at cisco.com Mon May 1 16:27:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 16:27:36 -0700 Subject: [openib-general] Re: [PATCH 2 of 13] ipath - set up 32-bit DMA mask if 64-bit setup fails In-Reply-To: <114102B4-FBCB-4A5A-B986-80D4A730DD91@kernel.crashing.org> (Segher Boessenkool's message of "Tue, 2 May 2006 01:13:12 +0200") References: <1906950392f7ef8c7d07.1145913778@eng-12.pathscale.com> <4B05D10C-407E-46A5-848F-0897D1E6D1CD@kernel.crashing.org> <114102B4-FBCB-4A5A-B986-80D4A730DD91@kernel.crashing.org> Message-ID: Segher> And it builds just fine -- what is the problem you're Segher> thinking of? Well, the ipath driver depends on PCI_MSI, and PCI_MSI depends on (X86_LOCAL_APIC && X86_IO_APIC) || IA64 So how do you enable the driver? And what powerpc platform can you use the device on? - R. 
From segher at kernel.crashing.org Mon May 1 17:13:59 2006 From: segher at kernel.crashing.org (Segher Boessenkool) Date: Tue, 2 May 2006 02:13:59 +0200 Subject: [openib-general] Re: [PATCH 2 of 13] ipath - set up 32-bit DMA mask if 64-bit setup fails In-Reply-To: References: <1906950392f7ef8c7d07.1145913778@eng-12.pathscale.com> <4B05D10C-407E-46A5-848F-0897D1E6D1CD@kernel.crashing.org> <114102B4-FBCB-4A5A-B986-80D4A730DD91@kernel.crashing.org> Message-ID: > Segher> And it builds just fine -- what is the problem you're > Segher> thinking of? > > Well, the ipath driver depends on PCI_MSI, and PCI_MSI depends on > (X86_LOCAL_APIC && X86_IO_APIC) || IA64 Oh, that. Right. It's about time I get my whole MSI patch set into shape for submission here, yes. > So how do you enable the driver? In a very hackish way right now :-( > And what powerpc platform can you use the device on? The latest PowerMac's have hardware support for MSI, to name just one platform. Segher From rdreier at cisco.com Mon May 1 17:18:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 17:18:34 -0700 Subject: [openib-general] Re: [PATCH 2 of 13] ipath - set up 32-bit DMA mask if 64-bit setup fails In-Reply-To: (Segher Boessenkool's message of "Tue, 2 May 2006 02:13:59 +0200") References: <1906950392f7ef8c7d07.1145913778@eng-12.pathscale.com> <4B05D10C-407E-46A5-848F-0897D1E6D1CD@kernel.crashing.org> <114102B4-FBCB-4A5A-B986-80D4A730DD91@kernel.crashing.org> Message-ID: Segher> Oh, that. Right. It's about time I get my whole MSI Segher> patch set into shape for submission here, yes. OK, that explains everything ;) So the ipath driver with a PCIe device works on a PowerMac G5? Cool. - R. 
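The fallback this thread's subject line describes (try a 64-bit DMA mask, and settle for 32 bits if the platform refuses) can be sketched outside the kernel like this. Everything here is illustrative: set_mask_fn stands in for a call like the pci_set_dma_mask() quoted earlier in the thread, and the accepts_* functions are invented platforms, not real ones.

```c
#include <stdint.h>

#define MASK_64BIT UINT64_C(0xffffffffffffffff)
#define MASK_32BIT UINT64_C(0x00000000ffffffff)

/* Stand-in for a platform hook; returning 0 means the mask was accepted. */
typedef int (*set_mask_fn)(uint64_t mask);

/*
 * Try the widest mask first and fall back to 32 bits.
 * Returns the mask that was accepted, or 0 if neither was.
 */
static uint64_t choose_dma_mask(set_mask_fn set_mask)
{
	if (set_mask(MASK_64BIT) == 0)
		return MASK_64BIT;
	if (set_mask(MASK_32BIT) == 0)
		return MASK_32BIT;
	return 0;
}

/* Invented example platforms, for illustration only. */
static int accepts_any(uint64_t mask) { (void)mask; return 0; }
static int accepts_32bit_only(uint64_t mask) { return mask <= MASK_32BIT ? 0 : -1; }
static int accepts_none(uint64_t mask) { (void)mask; return -1; }
```

With these stubs, accepts_32bit_only models the case Segher describes (DART disabled, 32-bit only mask), and accepts_none is the case where a driver would have to give up.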
From pauljbaxter at hotmail.com Mon May 1 18:27:48 2006 From: pauljbaxter at hotmail.com (Paul Baxter) Date: Tue, 2 May 2006 02:27:48 +0100 Subject: [openib-general] OpenIB Linux and Solaris Message-ID: Can anybody comment on recent experience regarding inter-operability of OpenIB/Linux on Intel 64 bit x86 and Sparx/Solaris using the Sun stack? Ideally inter-operability would start at IPoIB and the SM but perhaps extend to inter-operable SDP? Has anyone tried porting some userspace low-level OpenIB verbs comms (UC, RDMA write specifically). I am not experienced enough to understand if there are any gotchas with unimplemented features in either the Solaris or OpenIB implementations.
I've not had much luck investigating the equivalent of userspace verbs support for Solaris. I notice that some of the commercial stacks offer cross platform support and say 'OpenIB support, Linux, Windows, Solaris and Mac OS X'. I suspect this might be subtle wording as I don't think OpenIB is ported toSolaris, ULP support is using some Solaris 10 drivers instead? Advice/ URL pointers appreciated (Google wasn't my friend!) Paul Baxter From koop at cse.ohio-state.edu Mon May 1 18:55:35 2006 From: koop at cse.ohio-state.edu (Matthew Koop) Date: Mon, 1 May 2006 21:55:35 -0400 (EDT) Subject: [openib-general] Problem running mpdboot command in MVAPICH2 v0.9.3-RC0 In-Reply-To: Message-ID: Albert, Has anything changed with your system since compiling
MVAPICH2? I'm a bit confused why it would have worked with 0.9.2 and not 0.9.3-RC0. There wasn't any change between 0.9.2 and 0.9.3-RC0 that should create this type of issue. Can you try re-compiling and re-running? If you could also send along the configure.log generated it may help us look into this issue. Also, can you just send the output from ls -l /usr/local/lib, just to make sure there isn't any problems there? Thanks, Matthew Koop - Network-Based Computing Laboratory Ohio State University > Hi Wei, > > Thanks for a prompt reply. > > Yes, I did originally export the LD_LIBRARY_PATH in .bashrc as followed: > export LD_LIBRARY_PATH=/usr/local/lib > > I've also tried your suggestion: > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH > > In either case, issue still exists. Given the same setup, I did NOT see > this issue in v0.9.2 (obtained from > https://openib.org/svn/gen2/trunk/src/userspace/mpi/mvapich2-gen2). > > Thanks, > Albert > > ----Original Message----- > From: wei huang [mailto:huanwei at cse.ohio-state.edu] > Sent: Saturday, April 29, 2006 2:09 PM > To: Albert To > Cc: openib-general at openib.org > Subject: Re: [openib-general] Problem running mpdboot command in > MVAPICH2 v0.9.3-RC0 > > Hi Albert, > > Not sure if you export /usr/local/lib to LD_LIBRARY_PATH manually or it > is in your bashrc. > > Could you please try to put > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH > > in your .bashrc (assume using bash) and try again? > > Thanks. > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. of Computer Science and Engineering Ohio State University OH 43210 > Tel: (614)292-8501 > > > On Fri, 28 Apr 2006, Albert To wrote: > > > Hi, > > > > I downloaded and compiled the MVAPICH2 v0.9.3-RC0 using > > make.mvapich2.gen2 script. The script finished without any errors. 
> > However, I received "mpdboot: error while loading shared libraries: > > libibverbs.so.1: cannot open shared object file: No such file or > > directory" error while executing mpdboot -n 2 -f mpd.hosts. I checked > > > library file libibverbs.so.1 and found it in /usr/local/lib folder. > > LD_LIBRARY_PATH is already set to /usr/local/bin, but that didn't > help. > > > > Is there another environment variable that I need to set to make > > mpdboot works? Thanks in advance for your help. > > > > -Albert > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From albertt at broadcom.com Mon May 1 20:11:29 2006 From: albertt at broadcom.com (Albert To) Date: Mon, 1 May 2006 20:11:29 -0700 Subject: [openib-general] Problem running mpdboot command in MVAPICH2 v0.9.3-RC0 Message-ID: Hi Matthew, Thanks for helping out on this issue. As requested, the info is attached. Note: The info was obtained from compilations of v0.9.2 and v0.9.3 back-to-back. Thus, system and setup are exactly the same. Afterwards, mpdboot works on v0.9.2 but NOT v0.9.3 Thanks, Albert -----Original Message----- From: Matthew Koop [mailto:koop at cse.ohio-state.edu] Sent: Monday, May 01, 2006 6:56 PM To: Albert To Cc: wei huang; openib-general at openib.org Subject: RE: [openib-general] Problem running mpdboot command in MVAPICH2 v0.9.3-RC0 Albert, Has anything changed with your system since compiling MVAPICH2? I'm a bit confused why it would have worked with 0.9.2 and not 0.9.3-RC0. There wasn't any change between 0.9.2 and 0.9.3-RC0 that should create this type of issue. Can you try re-compiling and re-running? If you could also send along the configure.log generated it may help us look into this issue. 
Also, can you just send the output from ls -l /usr/local/lib, just to make sure there isn't any problems there? Thanks, Matthew Koop - Network-Based Computing Laboratory Ohio State University > Hi Wei, > > Thanks for a prompt reply. > > Yes, I did originally export the LD_LIBRARY_PATH in .bashrc as followed: > export LD_LIBRARY_PATH=/usr/local/lib > > I've also tried your suggestion: > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH > > In either case, issue still exists. Given the same setup, I did NOT > see this issue in v0.9.2 (obtained from > https://openib.org/svn/gen2/trunk/src/userspace/mpi/mvapich2-gen2). > > Thanks, > Albert > > ----Original Message----- > From: wei huang [mailto:huanwei at cse.ohio-state.edu] > Sent: Saturday, April 29, 2006 2:09 PM > To: Albert To > Cc: openib-general at openib.org > Subject: Re: [openib-general] Problem running mpdboot command in > MVAPICH2 v0.9.3-RC0 > > Hi Albert, > > Not sure if you export /usr/local/lib to LD_LIBRARY_PATH manually or > it is in your bashrc. > > Could you please try to put > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH > > in your .bashrc (assume using bash) and try again? > > Thanks. > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. of Computer Science and Engineering Ohio State University OH > 43210 > Tel: (614)292-8501 > > > On Fri, 28 Apr 2006, Albert To wrote: > > > Hi, > > > > I downloaded and compiled the MVAPICH2 v0.9.3-RC0 using > > make.mvapich2.gen2 script. The script finished without any errors. > > However, I received "mpdboot: error while loading shared libraries: > > libibverbs.so.1: cannot open shared object file: No such file or > > directory" error while executing mpdboot -n 2 -f mpd.hosts. I > > checked > > > library file libibverbs.so.1 and found it in /usr/local/lib folder. > > LD_LIBRARY_PATH is already set to /usr/local/bin, but that didn't > help. > > > > Is there another environment variable that I need to set to make > > mpdboot works? 
Thanks in advance for your help. > > > > -Albert > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- A non-text attachment was scrubbed... Name: mvapich2-ts.tgz Type: application/x-compressed Size: 89082 bytes Desc: mvapich2-ts.tgz URL: From devesh28 at gmail.com Mon May 1 21:45:07 2006 From: devesh28 at gmail.com (Devesh Sharma) Date: Tue, 2 May 2006 10:15:07 +0530 Subject: [openib-general] RE: SDP fails to compile on SVN6829 In-Reply-To: <000301c66d4a$0cd1a070$7aa9070a@amr.corp.intel.com>
References: <20060501172403.GB4580@mellanox.co.il> <000301c66d4a$0cd1a070$7aa9070a@amr.corp.intel.com> Message-ID: <309a667c0605012145v1363c170q6bc7321d7e93d799@mail.gmail.com> Hi, are there any limitations in the old SDP implementation? Why is a reimplementation required? On 5/1/06, Bob Woodruff wrote: > > Michael wrote, > > do I need an additional patch or is the backport patch broken ? > > Personally, I don't think we should be moving code into the trunk > until it is ready, and obviously this new SDP is not ready. This is correct: code should not be added to the trunk until it is complete. Anyone else have an opinion on how/when things get moved to the trunk ? > Shouldn't it be kept on a branch till it is ready ? > > > >I think you need these: > > > A /gen2/branches/backport/2.6.9/linux_skbuff_6754_to_2_6_11.patch > (from > >/gen2/branches/backport/2.6.11/linux_skbuff_6754_to_2_6_11.patch:6765) > > For example, this patch adds a function > static inline void skb_header_release(struct sk_buff *skb) > that does not do anything yet. > > Maybe I better wait till you have this new > SDP completed before moving to it, until then I will use the older > SDP. > > woody > > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ last_stable/include/linux/skbuff.h 2006-04-30 09:16:05.000000000+0300 > @@ -0,0 +1,19 @@ > +#ifndef LINUX_SKBUFF_H_BACKPORT > +#define LINUX_SKBUFF_H_BACKPORT > + > +#include_next > + > +/** > + * skb_header_release - release reference to header > + * @skb: buffer to operate on > + * > + * Drop a reference to the header part of the buffer. This is done > + * by acquiring a payload reference. You must not read from the > header > + * part of skb->data after this. 
> + */ > +static inline void skb_header_release(struct sk_buff *skb) > +{ > +} > + > + > +#endif > > MST > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Mon May 1 22:13:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 01 May 2006 22:13:05 -0700 Subject: [openib-general] Re: [PATCH][UVERBS][RFC] node type in ibv_context In-Reply-To: <1145911267.18808.36.camel@trinity.ogc.int> (Tom Tucker's message of "Mon, 24 Apr 2006 15:41:07 -0500") References: <1145900760.18808.19.camel@trinity.ogc.int> <444D144E.3020506@ichips.intel.com> <1145911267.18808.36.camel@trinity.ogc.int> Message-ID: Tom> Here's a patch that puts a node_type in the ibv_context. Two problems: - It breaks the ABI (which is frozen for the libibverbs 1.0 series) - Even when we're ready to break ABI, I think node_type should be in struct ibv_device since it's not per-context at all. - R. 
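A minimal sketch of the two points in Roland's reply, assuming hypothetical demo_* structures (these are not the real libibverbs definitions, and the field names are made up for illustration). It shows that inserting a field into a public struct moves existing members and changes the struct size, which is what breaks binaries built against the old layout, and that a property stored once on the device can be read from any context through its device pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the libibverbs structs. */
enum demo_node_type {
    DEMO_NODE_CA = 1,
    DEMO_NODE_SWITCH = 2,
    DEMO_NODE_ROUTER = 3
};

/* Layout that already-compiled applications were built against. */
struct demo_context_v1 {
    void *device;
    int   cmd_fd;
    int   async_fd;
};

/* The same struct after a new field is inserted ahead of existing members. */
struct demo_context_v2 {
    void *device;
    int   node_type;   /* new field */
    int   cmd_fd;
    int   async_fd;
};

/* Nonzero if an old binary would read cmd_fd from the wrong offset. */
int demo_offset_changed(void)
{
    return offsetof(struct demo_context_v1, cmd_fd) !=
           offsetof(struct demo_context_v2, cmd_fd);
}

/* Nonzero if the struct size differs, which affects any caller that
 * allocates or embeds the struct itself. */
int demo_size_changed(void)
{
    return sizeof(struct demo_context_v1) != sizeof(struct demo_context_v2);
}

/* Alternative placement: the node type lives on the device, once. */
struct demo_device {
    enum demo_node_type node_type;
};

struct demo_context {
    struct demo_device *device;
    int cmd_fd;
    int async_fd;
};

/* Every context opened on the same device reports the same node type. */
enum demo_node_type demo_context_node_type(const struct demo_context *ctx)
{
    assert(ctx != NULL && ctx->device != NULL);
    return ctx->device->node_type;
}
```

With this layout, two contexts that point at the same demo_device always agree on the node type, which is the sense in which the value is not per-context.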
From ogerlitz at voltaire.com Tue May 2 00:49:11 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 02 May 2006 10:49:11 +0300 Subject: [openib-general] re cma upcalls serialization / disconnected eventquestion In-Reply-To: References: Message-ID: <44570EF7.5060109@voltaire.com> Sean Hefty wrote: >> Can a ULP assume that cma callbacks for to the same CMA ID >> are serialized? > > Yes. (This is required to avoid reporting events out of order to the user.) > >> Also and related to this, is it correct that ***always** before >> DISCONNECTED event there will be one of {ESTABLISHED, REJECTED, >> CONNECT_ERROR}? > > You should always see ESTABLISHED before DISCONNECTED. If not, then there's a > bug in the CMA. I see, so just to make sure: following rdma_connect I will always see one of {ESTABLISHED, REJECTED, CONNECT_ERROR}? OK, thanks for the clarifications. Or. From ogerlitz at voltaire.com Tue May 2 00:56:26 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 02 May 2006 10:56:26 +0300 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: References: Message-ID: <445710AA.9080709@voltaire.com> Roland Dreier wrote: > Is this ready for queuing in my for-2.6.18 tree? What is the status > of all the non-IB dependencies? > If it is ready for merging, please send me a clean patch series with > the comments from this thread addressed. And also remind me of which > SCSI git trees this depends on... I am working on reviewing / applying fixes to the comments, and will send you a clean patch set when done. The only non-IB dependency is in the iSCSI updates for 2.6.18. The git tree from which those updates are pushed upstream is scsi-misc-2.6. James has accepted 5/6 of the updates into it (see below), but there is still one which is not there yet. I will let you know. Or. 
Mike Christie [SCSI] iscsi: convert iscsi tcp to libiscsi Mike Christie [SCSI] iscsi: add libiscsi Mike Christie [SCSI] iscsi: fix up iscsi eh Mike Christie [SCSI] iscsi: add sysfs attrs for uspace sync up Mike Christie [SCSI] iscsi: rm kernel iscsi handles usage for session From ogerlitz at voltaire.com Tue May 2 01:23:33 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 02 May 2006 11:23:33 +0300 Subject: [openib-general] re RDS missing features In-Reply-To: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> References: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> Message-ID: <44571705.9000208@voltaire.com> Ranjit Pandit wrote: > On 5/1/06, Or Gerlitz wrote: >> Can you elaborate on each of the features, specifically the following >> points are of interest to us: >> >> +1 so you running Oracle Loopback traffic over RDS sockets? if yes, what >> the issue here? >> the openib CMA supports listen/connect on loopback addresses (eg >> 127.0.0.1 or IPoIB local address) > > Yes. > There is no issue. It's just next in line for me to implement. So what remains to be implemented? If the app attempts to send data to 127.0.0.1 or a local IPoIB address, then you open a connection to this address over the CMA, and on the passive "side" you just do rdma_listen without binding to any device. In other words, no change on the active side and a simplification of the passive side to support this. Am I missing something here? >> +2 by failover, are you referring to APM? that is failover between IB >> pathes to/from the same HCA >> over which the original connection/QP was established or you are talking >> on failover between HCAs > > Failover within and across HCAs. APM does not work for failover across > HCAs. I see. Can you remind me: where is the reference gen1 RDS code located? Does it support failover? 
Also, for failover within the HCA, are you talking about APM, or do you basically apply the same failover scheme between two ports no matter whether they are on the same HCA or on different HCAs? Are you aware of anything missing in the openib infrastructure for the failover design of RDS? If you specify the design/requirements, I am sure people on this list can quickly say if something is missing... Or. >> [openfabrics-ewg] Before we can start testing - we needto ensure that >> RDS is fully ported. >> >> Pandit, Ranjit rpandit at silverstorm.com >> >> Following features are yet to be implemented in OpenFabric Rds: >> >> 1. Failover >> 2. Loopback connections >> 3. support for /proc fs like Rds config, stats and info. >> >> >> >> Ranjit From eitan at mellanox.co.il Tue May 2 01:32:55 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 2 May 2006 11:32:55 +0300 Subject: [openib-general] RE: [PATCH 1/4] opensm: don't try to enforce partitions on router port Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAD4@mtlexch01.mtl.com> Looks fine to me. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Sunday, April 23, 2006 5:26 PM > To: Hal Rosenstock > Cc: openib-general at openib.org; Eitan Zahavi; Ofer Gigi; Yael Kalka > Subject: [PATCH 1/4] opensm: don't try to enforce partitions on router port > > > When router port is connected directly to CA don't try handle it as > switch external ports (update pkey table and enforce partitions). > Router ports are handled by partition manager as end ports. 
> > Signed-off-by: Sasha Khapyorsky > --- > > osm/opensm/osm_pkey_mgr.c | 43 ++++++++++++++++++++----------------------- > 1 files changed, 20 insertions(+), 23 deletions(-) > > diff --git a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > index 938632e..bdb3ae4 100644 > --- a/osm/opensm/osm_pkey_mgr.c > +++ b/osm/opensm/osm_pkey_mgr.c > @@ -307,7 +307,8 @@ __osm_pkey_mgr_process_physical_port( > static void > osm_pkey_mgr_update_peer_port( > const osm_pkey_mgr_t * const p_mgr, > - const osm_port_t * const p_port ) > + const osm_port_t * const p_port, > + boolean_t enforce) > { > osm_physp_t *p, *peer; > osm_node_t *p_node; > @@ -326,18 +327,25 @@ osm_pkey_mgr_update_peer_port( > if ( !peer || !osm_physp_is_valid( peer ) ) > return; > p_node = osm_physp_get_node_ptr( peer ); > - if ( osm_node_get_type( p_node ) == IB_NODE_TYPE_CA ) > + if ( osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH ) > return; > - else if ( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ) { > - if (!(p_sw = osm_get_switch_by_guid( p_mgr->p_subn, > - osm_node_get_node_guid( p_node ))) || > - !(p_si = osm_switch_get_si_ptr( p_sw )) || > - !p_si->enforce_cap) > - return; > + > + p_sw = osm_get_switch_by_guid( p_mgr->p_subn, osm_node_get_node_guid( > p_node )); > + if (!p_sw || !(p_si = osm_switch_get_si_ptr( p_sw )) || > + !p_si->enforce_cap) > + return; > + > + if (osm_pkey_mgr_enforce_partition( p_mgr, peer, enforce ) != IB_SUCCESS) { > + osm_log( p_mgr->p_log, OSM_LOG_ERROR, > + "osm_pkey_mgr_update_peer_port: " > + "osm_pkey_mgr_enforce_partition() failed to update " > + "node 0x%016" PRIx64 " port %u\n", > + cl_ntoh64( osm_node_get_node_guid( p_node ) ), > + osm_physp_get_port_num( peer ) ); > } > > - if (p_mgr->p_subn->opt.no_partition_enforcement == TRUE) > - goto _enforce_port; > + if (enforce == FALSE) > + return; > > p_pkey_tbl = osm_physp_get_pkey_tbl( p ); > p_peer_pkey_tbl = osm_physp_get_pkey_tbl( peer ); > @@ -377,18 +385,6 @@ osm_pkey_mgr_update_peer_port( > 
cl_ntoh64( osm_node_get_node_guid( p_node ) ), > osm_physp_get_port_num( peer ) ); > } > - > - _enforce_port: > - if (osm_pkey_mgr_enforce_partition( p_mgr, peer, > - p_mgr->p_subn->opt.no_partition_enforcement == FALSE ) != > - IB_SUCCESS) { > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, > - "osm_pkey_mgr_update_peer_port: " > - "osm_pkey_mgr_enforce_partition() failed to update " > - "node 0x%016" PRIx64 " port %u\n", > - cl_ntoh64( osm_node_get_node_guid( p_node ) ), > - osm_physp_get_port_num( peer ) ); > - } > } > > /********************************************************************** > @@ -484,7 +480,8 @@ osm_pkey_mgr_process( > if ( osm_node_get_type( osm_port_get_parent_node( p_port ) ) != > IB_NODE_TYPE_SWITCH ) > { > - osm_pkey_mgr_update_peer_port( p_mgr, p_port ); > + osm_pkey_mgr_update_peer_port( p_mgr, p_port, > + !p_mgr->p_subn->opt.no_partition_enforcement); > } > } > From ishai at mellanox.co.il Tue May 2 01:30:13 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Tue, 2 May 2006 11:30:13 +0300 Subject: [openib-general] Re: [openfabrics-ewg] Current OFED kernel snapshot - problems in back porting SRP to RH4 In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30208CEF7@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30208CEF7@mtlexch01.mtl.com> Message-ID: <20060502083013.GA781@mellanox.co.il> On Tue, May 02, 2006 at 10:34:29AM +0300, Aviram Gutman wrote: > > Yes, we have uploaded a pre version of RC4. You can find it in: > > URL: https://openib.org/svn/gen2/branches/1.0/ofed/releases > > OFED-1.0-rc4-pre4.tgz > OFED-1.0-rc4-pre4.tgz.md5sum > > Please pay attention that we still face issues. The following is the diff between RC3 and the pre: > > 1. Bug fixes according to problems reported. > > 2. SRP - with new features: FMR, tunable parameters, SRP daemon - We have an issue with the back port of SRP to RH4 U2 and U3. Ishai will issue a mail with explanation. > > 3. 
Open MPI - new package based on 1.1a3 - Please be noted that RPM building process failed. Vlad will > > 4. RDS - new version from main trunk > > 5. Kernel code based on git > > 6. Standard network configuration > > > Known issues: > 1. ipath installation fails on 2.6.9 - 2.6.11* kernels > 2. OSU MPI compilation fails on SLES10, PPC64 > 3. SRP is not supported on 2.6.9 - 2.6.13* kernels - Ishai will follow up with details > 4. Open MPI RPM build process fails - Jeff, will you be able to send us fixes by Wed? > > > Regards, > Aviram > > -----Original Message----- > From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Doug Ledford > Sent: Monday, May 01, 2006 11:53 PM > To: openfabrics-ewg at openib.org > Subject: [openfabrics-ewg] Current OFED kernel snapshot > > There's rumored to be a significant number of changes between rc3 and rc4 on the kernel module front. I would like to get started on integrating those changes sooner rather than later. So, where would I go to get a snapshot of the latest OFED kernel > tree. So far I've only found kernel trees under the tags directory and obviously the rc4 tag hasn't been populated yet. > > -- > Doug Ledford > Red Hat, Inc. > 1801 Varsity Dr. > Raleigh, NC 27606 > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg Hi We have a problem when trying to back port SRP to RH4 U2 and U3 (Actually to any kernel earlier than 2.6.13). The problem is when the SCSI driver is calling to eh_abort_handler, or to eh_device_reset_handler. In the current kernel (starting from 2.6.13) this call is made without host_lock spin-lock locked. In the SRP code that performs the abort and the reset (srp_send_tsk_mgmt) we send a message to the target and we wait for a response from the target. 
In early versions of the kernel, the SCSI driver takes the host_lock with spin_lock_irqsave before calling the abort or reset handlers. This creates a problem: the SRP driver cannot sleep while waiting for the target to answer, because the handler is called with a spinlock held. Any ideas? -- Ishai Rabinovitz From eitan at mellanox.co.il Tue May 2 01:41:24 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 2 May 2006 11:41:24 +0300 Subject: [openib-general] RE: [PATCH 2/4] opensm: remove unused osm_pkey_mgr_t object Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAD5@mtlexch01.mtl.com> Hi Sasha, I really do not like this patch. I think that although it does not break the code TODAY, it will be reversed later. OpenSM uses the concept of a "manager" for each of the algorithms used. One could claim that all these managers are redundant and could be replaced by an extension to the osm object. This is true, but it would result in an unclear boundary between the managers. Although there is no right or wrong on this kind of issue, I think that the winning argument is that today OpenSM is written according to the above simple rule. Let's not break it. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Sunday, April 23, 2006 5:26 PM > To: Hal Rosenstock > Cc: openib-general at openib.org; Eitan Zahavi; Ofer Gigi; Yael Kalka > Subject: [PATCH 2/4] opensm: remove unused osm_pkey_mgr_t object > > > The structure osm_pkey_mgr_t is not used for pkey management - > clean it up. 
> > Signed-off-by: Sasha Khapyorsky > --- > > osm/include/opensm/osm_pkey_mgr.h | 183 ++---------------------------------- > osm/include/opensm/osm_sm.h | 2 > osm/include/opensm/osm_state_mgr.h | 11 -- > osm/opensm/osm_pkey_mgr.c | 182 ++++++++++++++---------------------- > osm/opensm/osm_sm.c | 12 -- > osm/opensm/osm_state_mgr.c | 8 +- > 6 files changed, 82 insertions(+), 316 deletions(-) > > diff --git a/osm/include/opensm/osm_pkey_mgr.h > b/osm/include/opensm/osm_pkey_mgr.h > index fef3667..cb0075d 100644 > --- a/osm/include/opensm/osm_pkey_mgr.h > +++ b/osm/include/opensm/osm_pkey_mgr.h > @@ -1,4 +1,5 @@ > /* > + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. > * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. > * > * This software is available to you under a choice of one of two > @@ -35,9 +36,8 @@ > > /* > * Abstract: > - * Declaration of osm_pkey_mgr_t. > - * This object represents the P_Key Manager object. > - * This object is part of the OpenSM family of objects. > + * Prototype for osm_pkey_mgr_process() function > + * This is part of the OpenSM family of objects. > * > * Environment: > * Linux User Mode > @@ -49,10 +49,8 @@ > #ifndef _OSM_PKEY_MGR_H_ > #define _OSM_PKEY_MGR_H_ > > -#include > -#include > -#include > -#include > +#include > +#include > > #ifdef __cplusplus > # define BEGIN_C_DECLS extern "C" { > @@ -64,166 +62,6 @@ #endif /* __cplusplus */ > > BEGIN_C_DECLS > > -/****h* OpenSM/P_Key Manager > -* NAME > -* P_Key Manager > -* > -* DESCRIPTION > -* The P_Key Manager object manage the p_key tables of all > -* objects in the subnet > -* > -* AUTHOR > -* Ofer Gigi, Mellanox > -* > -*********/ > -/****s* OpenSM: P_Key Manager/osm_pkey_mgr_t > -* NAME > -* osm_pkey_mgr_t > -* > -* DESCRIPTION > -* p_Key Manager structure. 
> -* > -* > -* SYNOPSIS > -*/ > - > -typedef struct _osm_pkey_mgr > -{ > - osm_subn_t *p_subn; > - osm_log_t *p_log; > - osm_req_t *p_req; > - cl_plock_t *p_lock; > - > -} osm_pkey_mgr_t; > - > -/* > -* FIELDS > -* p_subn > -* Pointer to the Subnet object for this subnet. > -* > -* p_log > -* Pointer to the log object. > -* > -* p_req > -* Pointer to the Request object. > -* > -* p_lock > -* Pointer to the serializing lock. > -* > -* SEE ALSO > -* P_Key Manager object > -*********/ > - > -/****** OpenSM: P_Key Manager/osm_pkey_mgr_construct > -* NAME > -* osm_pkey_mgr_construct > -* > -* DESCRIPTION > -* This function constructs a P_Key Manager object. > -* > -* SYNOPSIS > -*/ > -void > -osm_pkey_mgr_construct( > - IN osm_pkey_mgr_t* const p_mgr ); > -/* > -* PARAMETERS > -* p_mgr > -* [in] Pointer to a P_Key Manager object to construct. > -* > -* RETURN VALUE > -* This function does not return a value. > -* > -* NOTES > -* Allows calling osm_pkey_mgr_init, osm_pkey_mgr_destroy > -* > -* Calling osm_pkey_mgr_construct is a prerequisite to calling any other > -* method except osm_pkey_mgr_init. > -* > -* SEE ALSO > -* P_Key Manager object, osm_pkey_mgr_init, > -* osm_pkey_mgr_destroy > -*********/ > - > -/****f* OpenSM: P_Key Manager/osm_pkey_mgr_destroy > -* NAME > -* osm_pkey_mgr_destroy > -* > -* DESCRIPTION > -* The osm_pkey_mgr_destroy function destroys the object, releasing > -* all resources. > -* > -* SYNOPSIS > -*/ > -void > -osm_pkey_mgr_destroy( > - IN osm_pkey_mgr_t* const p_mgr ); > -/* > -* PARAMETERS > -* p_mgr > -* [in] Pointer to the object to destroy. > -* > -* RETURN VALUE > -* This function does not return a value. > -* > -* NOTES > -* Performs any necessary cleanup of the specified > -* P_Key Manager object. > -* Further operations should not be attempted on the destroyed object. > -* This function should only be called after a call to > -* osm_pkey_mgr_construct or osm_pkey_mgr_init. 
> -* > -* SEE ALSO > -* P_Key Manager object, osm_pkey_mgr_construct, > -* osm_pkey_mgr_init > -*********/ > - > -/****f* OpenSM: P_Key Manager/osm_pkey_mgr_init > -* NAME > -* osm_pkey_mgr_init > -* > -* DESCRIPTION > -* The osm_pkey_mgr_init function initializes a > -* P_Key Manager object for use. > -* > -* SYNOPSIS > -*/ > -ib_api_status_t > -osm_pkey_mgr_init( > - IN osm_pkey_mgr_t* const p_mgr, > - IN osm_subn_t* const p_subn, > - IN osm_log_t* const p_log, > - IN osm_req_t* const p_req, > - IN cl_plock_t* const p_lock ); > -/* > -* PARAMETERS > -* p_mgr > -* [in] Pointer to an osm_pkey_mgr_t object to initialize. > -* > -* p_subn > -* [in] Pointer to the Subnet object for this subnet. > -* > -* p_log > -* [in] Pointer to the log object. > -* > -* p_req > -* [in] Pointer to an osm_req_t object. > -* > -* p_lock > -* [in] Pointer to the OpenSM serializing lock. > -* > -* RETURN VALUES > -* IB_SUCCESS if the P_Key Manager object was initialized > -* successfully. > -* > -* NOTES > -* Allows calling other P_Key Manager methods. > -* > -* SEE ALSO > -* P_Key Manager object, osm_pkey_mgr_construct, > -* osm_pkey_mgr_destroy > -*********/ > - > /****f* OpenSM: P_Key Manager/osm_pkey_mgr_process > * NAME > * osm_pkey_mgr_process > @@ -235,23 +73,18 @@ osm_pkey_mgr_init( > */ > osm_signal_t > osm_pkey_mgr_process( > - IN const osm_pkey_mgr_t* const p_mgr ); > + IN osm_opensm_t *p_osm ); > /* > * PARAMETERS > -* p_mgr > -* [in] Pointer to an osm_pkey_mgr_t object. > +* p_osm > +* [in] Pointer to an osm_opensm_t object. 
> * > * RETURN VALUES > * None > * > * NOTES > -* Current Operations: > -* - Inserts IB_DEFAULT_PKEY to all node objects that don't have > -* IB_DEFAULT_PARTIAL_PKEY or IB_DEFAULT_PKEY as part > -* of their p_key table > * > * SEE ALSO > -* P_Key Manager > *********/ > > END_C_DECLS > diff --git a/osm/include/opensm/osm_sm.h b/osm/include/opensm/osm_sm.h > index d9fbd8a..d6086d4 100644 > --- a/osm/include/opensm/osm_sm.h > +++ b/osm/include/opensm/osm_sm.h > @@ -74,7 +74,6 @@ #include > #include > #include > #include > -#include > #include > #include > #include > @@ -162,7 +161,6 @@ typedef struct _osm_sm > osm_link_mgr_t link_mgr; > osm_state_mgr_t state_mgr; > osm_drop_mgr_t drop_mgr; > - osm_pkey_mgr_t pkey_mgr; > osm_lft_rcv_t lft_rcv; > osm_lft_rcv_ctrl_t lft_rcv_ctrl; > osm_mft_rcv_t mft_rcv; > diff --git a/osm/include/opensm/osm_state_mgr.h > b/osm/include/opensm/osm_state_mgr.h > index 92aa910..a9385d1 100644 > --- a/osm/include/opensm/osm_state_mgr.h > +++ b/osm/include/opensm/osm_state_mgr.h > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. > + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. > * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. > * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. > * > @@ -60,7 +60,6 @@ #include > #include > #include > #include > -#include > #include > #include > > @@ -113,7 +112,6 @@ typedef struct _osm_state_mgr > osm_mcast_mgr_t *p_mcast_mgr; > osm_link_mgr_t *p_link_mgr; > osm_drop_mgr_t *p_drop_mgr; > - osm_pkey_mgr_t *p_pkey_mgr; > osm_req_t *p_req; > osm_stats_t *p_stats; > struct _osm_sm_state_mgr *p_sm_state_mgr; > @@ -151,9 +149,6 @@ typedef struct _osm_state_mgr > * p_drop_mgr > * Pointer to the Drop Manager object. > * > -* p_pkey_mgr > -* Pointer to the P_Key Manager object. > -* > * p_req > * Pointer to the Requester object sending SMPs. 
> * > @@ -379,7 +374,6 @@ osm_state_mgr_init( > IN osm_mcast_mgr_t* const p_mcast_mgr, > IN osm_link_mgr_t* const p_link_mgr, > IN osm_drop_mgr_t* const p_drop_mgr, > - IN osm_pkey_mgr_t* const p_pkey_mgr, > IN osm_req_t* const p_req, > IN osm_stats_t* const p_stats, > IN struct _osm_sm_state_mgr* const p_sm_state_mgr, > @@ -411,9 +405,6 @@ osm_state_mgr_init( > * p_drop_mgr > * [in] Pointer to the Drop Manager object. > * > -* p_pkey_mgr > -* [in] Pointer to the P_Key Manager object. > -* > * p_req > * [in] Pointer to the Request Controller object. > * > diff --git a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > index bdb3ae4..7b3da26 100644 > --- a/osm/opensm/osm_pkey_mgr.c > +++ b/osm/opensm/osm_pkey_mgr.c > @@ -37,9 +37,8 @@ > > /* > * Abstract: > - * Implementation of osm_pkey_mgr_t. > - * This object represents the P_Key Manager object. > - * This object is part of the opensm family of objects. > + * Implementation of the P_Key Manager (Partititon Manager). > + * This is part of the OpenSM. 
> * > * Environment: > * Linux User Mode > @@ -58,62 +57,15 @@ #include > #include > #include > #include > - > -/********************************************************************** > - **********************************************************************/ > -void > -osm_pkey_mgr_construct( > - IN osm_pkey_mgr_t * const p_mgr ) > -{ > - CL_ASSERT( p_mgr ); > - cl_memclr( p_mgr, sizeof( *p_mgr ) ); > -} > - > -/********************************************************************** > - **********************************************************************/ > -void > -osm_pkey_mgr_destroy( > - IN osm_pkey_mgr_t * const p_mgr ) > -{ > - CL_ASSERT( p_mgr ); > - > - OSM_LOG_ENTER( p_mgr->p_log, osm_pkey_mgr_destroy ); > - > - OSM_LOG_EXIT( p_mgr->p_log ); > -} > - > -/********************************************************************** > - **********************************************************************/ > -ib_api_status_t > -osm_pkey_mgr_init( > - IN osm_pkey_mgr_t * const p_mgr, > - IN osm_subn_t * const p_subn, > - IN osm_log_t * const p_log, > - IN osm_req_t * const p_req, > - IN cl_plock_t * const p_lock ) > -{ > - ib_api_status_t status = IB_SUCCESS; > - > - OSM_LOG_ENTER( p_log, osm_pkey_mgr_init ); > - > - osm_pkey_mgr_construct( p_mgr ); > - > - p_mgr->p_log = p_log; > - p_mgr->p_subn = p_subn; > - p_mgr->p_lock = p_lock; > - p_mgr->p_req = p_req; > - > - OSM_LOG_EXIT( p_mgr->p_log ); > - return ( status ); > -} > +#include > > /********************************************************************** > **********************************************************************/ > static ib_api_status_t > -osm_pkey_mgr_update_pkey_entry( > - IN const osm_pkey_mgr_t * const p_mgr, > - IN const osm_physp_t * p_physp, > - IN const ib_pkey_table_t * block, > +pkey_mgr_update_pkey_entry( > + IN const osm_req_t *p_req, > + IN const osm_physp_t *p_physp, > + IN const ib_pkey_table_t *block, > IN const uint16_t block_index ) > { > osm_madw_context_t context; > @@ 
-126,7 +78,7 @@ osm_pkey_mgr_update_pkey_entry( > attr_mod = block_index; > if ( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ) > attr_mod |= osm_physp_get_port_num( p_physp ) << 16; > - return osm_req_set( p_mgr->p_req, osm_physp_get_dr_path_ptr( p_physp ), > + return osm_req_set( p_req, osm_physp_get_dr_path_ptr( p_physp ), > ( uint8_t * ) block, sizeof( *block ), > IB_MAD_ATTR_P_KEY_TABLE, > cl_hton32( attr_mod ), CL_DISP_MSGID_NONE, &context ); > @@ -135,9 +87,9 @@ osm_pkey_mgr_update_pkey_entry( > /********************************************************************** > **********************************************************************/ > static ib_api_status_t > -osm_pkey_mgr_enforce_partition( > - IN const osm_pkey_mgr_t * const p_mgr, > - IN const osm_physp_t * p_physp, > +pkey_mgr_enforce_partition( > + IN const osm_req_t *p_req, > + IN const osm_physp_t *p_physp, > IN const boolean_t enforce) > { > osm_madw_context_t context; > @@ -168,7 +120,7 @@ osm_pkey_mgr_enforce_partition( > context.pi_context.ignore_errors = FALSE; > context.pi_context.light_sweep = FALSE; > > - return osm_req_set( p_mgr->p_req, osm_physp_get_dr_path_ptr( p_physp ), > + return osm_req_set( p_req, osm_physp_get_dr_path_ptr( p_physp ), > payload, sizeof(payload), > IB_MAD_ATTR_PORT_INFO, > cl_hton32(osm_physp_get_port_num( p_physp )), > @@ -184,10 +136,11 @@ osm_pkey_mgr_enforce_partition( > */ > > static boolean_t > -__osm_pkey_mgr_process_physical_port( > - IN const osm_pkey_mgr_t * const p_mgr, > +pkey_mgr_process_physical_port( > + IN osm_log_t *p_log, > + IN const osm_req_t *p_req, > IN const ib_net16_t pkey, > - IN osm_physp_t * p_physp ) > + IN osm_physp_t *p_physp ) > { > boolean_t return_val = FALSE; /* TRUE if pkey was inserted or updated */ > ib_api_status_t status; > @@ -200,7 +153,7 @@ __osm_pkey_mgr_process_physical_port( > uint32_t i; > boolean_t block_found = FALSE; > > - OSM_LOG_ENTER( p_mgr->p_log, __osm_pkey_mgr_process_physical_port ); > + OSM_LOG_ENTER( 
p_log, pkey_mgr_process_physical_port ); > > p_pkey_tbl = osm_physp_get_pkey_tbl( p_physp ); > num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl ); > @@ -209,10 +162,10 @@ __osm_pkey_mgr_process_physical_port( > > if ( p_orig_pkey && *p_orig_pkey == pkey ) > { > - if ( osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) > + if ( osm_log_is_active( p_log, OSM_LOG_VERBOSE ) ) > { > - osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, > - "__osm_pkey_mgr_process_physical_port: " > + osm_log( p_log, OSM_LOG_VERBOSE, > + "pkey_mgr_process_physical_port: " > "No need to insert pkey 0x%04x for node 0x%016" PRIx64 > " port %u\n", > cl_ntoh16( pkey ), > @@ -258,8 +211,8 @@ __osm_pkey_mgr_process_physical_port( > > if ( block_found == FALSE ) > { > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, > - "__osm_pkey_mgr_process_physical_port: ERR 0501: " > + osm_log( p_log, OSM_LOG_ERROR, > + "pkey_mgr_process_physical_port: ERR 0501: " > "No empty pkey entry was found to insert 0x%04x for node " > "0x%016" PRIx64 " port %u\n", > cl_ntoh16( pkey ), > @@ -269,13 +222,13 @@ __osm_pkey_mgr_process_physical_port( > } > > status = > - osm_pkey_mgr_update_pkey_entry( p_mgr, p_physp, block, block_index ); > + pkey_mgr_update_pkey_entry( p_req, p_physp, block, block_index ); > > if ( status != IB_SUCCESS ) > { > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, > - "__osm_pkey_mgr_process_physical_port: " > - "osm_pkey_mgr_update_pkey_entry() failed to update " > + osm_log( p_log, OSM_LOG_ERROR, > + "pkey_mgr_process_physical_port: " > + "pkey_mgr_update_pkey_entry() failed to update " > "pkey table block %d for node 0x%016" PRIx64 " port %u\n", > block_index, > cl_ntoh64( osm_node_get_node_guid( p_node ) ), > @@ -285,10 +238,10 @@ __osm_pkey_mgr_process_physical_port( > > return_val = TRUE; /* pkey was inserted/updated */ > > - if ( osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) > + if ( osm_log_is_active( p_log, OSM_LOG_VERBOSE ) ) > { > - osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, > - 
"__osm_pkey_mgr_process_physical_port: " > + osm_log( p_log, OSM_LOG_VERBOSE, > + "pkey_mgr_process_physical_port: " > "pkey 0x%04x was inserted for node 0x%016" PRIx64 > " port %u\n", > cl_ntoh16( pkey ), > @@ -297,7 +250,7 @@ __osm_pkey_mgr_process_physical_port( > } > > _done: > - OSM_LOG_EXIT( p_mgr->p_log ); > + OSM_LOG_EXIT( p_log ); > return ( return_val ); > } > > @@ -305,9 +258,11 @@ __osm_pkey_mgr_process_physical_port( > /********************************************************************** > **********************************************************************/ > static void > -osm_pkey_mgr_update_peer_port( > - const osm_pkey_mgr_t * const p_mgr, > - const osm_port_t * const p_port, > +pkey_mgr_update_peer_port( > + osm_log_t *p_log, > + const osm_req_t *p_req, > + const osm_subn_t *p_subn, > + const osm_port_t *p_port, > boolean_t enforce) > { > osm_physp_t *p, *peer; > @@ -330,15 +285,15 @@ osm_pkey_mgr_update_peer_port( > if ( osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH ) > return; > > - p_sw = osm_get_switch_by_guid( p_mgr->p_subn, osm_node_get_node_guid( > p_node )); > + p_sw = osm_get_switch_by_guid( p_subn, osm_node_get_node_guid( p_node )); > if (!p_sw || !(p_si = osm_switch_get_si_ptr( p_sw )) || > !p_si->enforce_cap) > return; > > - if (osm_pkey_mgr_enforce_partition( p_mgr, peer, enforce ) != IB_SUCCESS) { > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, > - "osm_pkey_mgr_update_peer_port: " > - "osm_pkey_mgr_enforce_partition() failed to update " > + if (pkey_mgr_enforce_partition( p_req, peer, enforce ) != IB_SUCCESS) { > + osm_log( p_log, OSM_LOG_ERROR, > + "pkey_mgr_update_peer_port: " > + "pkey_mgr_enforce_partition() failed to update " > "node 0x%016" PRIx64 " port %u\n", > cl_ntoh64( osm_node_get_node_guid( p_node ) ), > osm_physp_get_port_num( peer ) ); > @@ -361,12 +316,12 @@ osm_pkey_mgr_update_peer_port( > { > cl_memcpy( peer_block, block, sizeof( *block ) ); > status = > - osm_pkey_mgr_update_pkey_entry( p_mgr, peer, 
peer_block, > + pkey_mgr_update_pkey_entry( p_req, peer, peer_block, > block_index ); > if ( status != IB_SUCCESS ) > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, > - "osm_pkey_mgr_update_peer_port: " > - "osm_pkey_mgr_update_pkey_entry() failed to update " > + osm_log( p_log, OSM_LOG_ERROR, > + "pkey_mgr_update_peer_port: " > + "pkey_mgr_update_pkey_entry() failed to update " > "pkey table block %d for node 0x%016" PRIx64 > " port %u\n", > block_index, > @@ -376,10 +331,10 @@ osm_pkey_mgr_update_peer_port( > } > > if ( num_of_blocks && status == IB_SUCCESS && > - osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) > + osm_log_is_active( p_log, OSM_LOG_VERBOSE ) ) > { > - osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, > - "osm_pkey_mgr_update_peer_port: " > + osm_log( p_log, OSM_LOG_VERBOSE, > + "pkey_mgr_update_peer_port: " > "pkey table was updated for node 0x%016" PRIx64 > " port %u\n", > cl_ntoh64( osm_node_get_node_guid( p_node ) ), > @@ -390,9 +345,10 @@ osm_pkey_mgr_update_peer_port( > /********************************************************************** > **********************************************************************/ > static boolean_t > -osm_pkey_mgr_process_partition_table( > - const osm_pkey_mgr_t * const p_mgr, > - const osm_prtn_t * const p_prtn, > +pkey_mgr_process_partition_table( > + osm_log_t *p_log, > + const osm_req_t *p_req, > + const osm_prtn_t *p_prtn, > const boolean_t full ) > { > const cl_map_t *p_tbl = full ? 
> @@ -412,12 +368,12 @@ osm_pkey_mgr_process_partition_table( > i_next = cl_map_next( i ); > p_physp = cl_map_obj( i ); > if ( p_physp && osm_physp_is_valid( p_physp ) && > - __osm_pkey_mgr_process_physical_port( p_mgr, pkey, p_physp ) ) > + pkey_mgr_process_physical_port( p_log, p_req, pkey, p_physp ) ) > { > result = TRUE; > - if ( osm_log_is_active( p_mgr->p_log, OSM_LOG_VERBOSE ) ) > - osm_log( p_mgr->p_log, OSM_LOG_VERBOSE, > - "osm_pkey_mgr_process_partition_table: " > + if ( osm_log_is_active( p_log, OSM_LOG_VERBOSE ) ) > + osm_log( p_log, OSM_LOG_VERBOSE, > + "pkey_mgr_process_partition_table: " > "Adding 0x%04x to pkey table of node " > "0x%016" PRIx64 " port %u\n", > cl_ntoh16( pkey ), > @@ -434,7 +390,7 @@ osm_pkey_mgr_process_partition_table( > **********************************************************************/ > osm_signal_t > osm_pkey_mgr_process( > - IN const osm_pkey_mgr_t * const p_mgr ) > + IN osm_opensm_t *p_osm ) > { > cl_qmap_t *p_tbl; > cl_map_item_t *p_next; > @@ -442,20 +398,20 @@ osm_pkey_mgr_process( > osm_port_t *p_port; > osm_signal_t signal = OSM_SIGNAL_DONE; > > - CL_ASSERT( p_mgr ); > + CL_ASSERT( p_osm ); > > - OSM_LOG_ENTER( p_mgr->p_log, osm_pkey_mgr_process ); > + OSM_LOG_ENTER( &p_osm->log, osm_pkey_mgr_process ); > > - CL_PLOCK_EXCL_ACQUIRE( p_mgr->p_lock ); > + CL_PLOCK_EXCL_ACQUIRE( &p_osm->lock ); > > - if ( osm_prtn_make_partitions( p_mgr->p_log, p_mgr->p_subn ) != IB_SUCCESS ) > + if ( osm_prtn_make_partitions( &p_osm->log, &p_osm->subn ) != IB_SUCCESS ) > { > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, "osm_pkey_mgr_process: " > + osm_log( &p_osm->log, OSM_LOG_ERROR, "osm_pkey_mgr_process: " > "osm_prtn_make_partitions() failed\n" ); > goto _err; > } > > - p_tbl = &p_mgr->p_subn->prtn_pkey_tbl; > + p_tbl = &p_osm->subn.prtn_pkey_tbl; > > p_next = cl_qmap_head( p_tbl ); > while ( p_next != cl_qmap_end( p_tbl ) ) > @@ -463,13 +419,13 @@ osm_pkey_mgr_process( > p_prtn = ( osm_prtn_t * ) p_next; > p_next = cl_qmap_next( p_next 
); > > - if ( osm_pkey_mgr_process_partition_table( p_mgr, p_prtn, FALSE ) ) > + if ( pkey_mgr_process_partition_table( &p_osm->log, &p_osm->sm.req, p_prtn, > FALSE ) ) > signal = OSM_SIGNAL_DONE_PENDING; > - if ( osm_pkey_mgr_process_partition_table( p_mgr, p_prtn, TRUE ) ) > + if ( pkey_mgr_process_partition_table( &p_osm->log, &p_osm->sm.req, p_prtn, > TRUE ) ) > signal = OSM_SIGNAL_DONE_PENDING; > } > > - p_tbl = &p_mgr->p_subn->port_guid_tbl; > + p_tbl = &p_osm->subn.port_guid_tbl; > > p_next = cl_qmap_head( p_tbl ); > while ( p_next != cl_qmap_end( p_tbl ) ) > @@ -480,13 +436,13 @@ osm_pkey_mgr_process( > if ( osm_node_get_type( osm_port_get_parent_node( p_port ) ) != > IB_NODE_TYPE_SWITCH ) > { > - osm_pkey_mgr_update_peer_port( p_mgr, p_port, > - !p_mgr->p_subn->opt.no_partition_enforcement); > + pkey_mgr_update_peer_port( &p_osm->log, &p_osm->sm.req, &p_osm->subn, > + p_port, !p_osm->subn.opt.no_partition_enforcement ); > } > } > > _err: > - CL_PLOCK_RELEASE( p_mgr->p_lock ); > - OSM_LOG_EXIT( p_mgr->p_log ); > + CL_PLOCK_RELEASE( &p_osm->lock ); > + OSM_LOG_EXIT( &p_osm->log ); > return ( signal ); > } > diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c > index 9c10651..99e5627 100644 > --- a/osm/opensm/osm_sm.c > +++ b/osm/opensm/osm_sm.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. > + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. > * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. > * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
> * > @@ -66,7 +66,6 @@ #include > #include > #include > #include > -#include > #include > #include > #include > @@ -160,7 +159,6 @@ osm_sm_construct( > osm_state_mgr_construct( &p_sm->state_mgr ); > osm_state_mgr_ctrl_construct( &p_sm->state_mgr_ctrl ); > osm_drop_mgr_construct( &p_sm->drop_mgr ); > - osm_pkey_mgr_construct( &p_sm->pkey_mgr ); > osm_lft_rcv_construct( &p_sm->lft_rcv ); > osm_lft_rcv_ctrl_construct( &p_sm->lft_rcv_ctrl ); > osm_mft_rcv_construct( &p_sm->mft_rcv ); > @@ -250,7 +248,6 @@ osm_sm_destroy( > osm_ucast_mgr_destroy( &p_sm->ucast_mgr ); > osm_link_mgr_destroy( &p_sm->link_mgr ); > osm_drop_mgr_destroy( &p_sm->drop_mgr ); > - osm_pkey_mgr_destroy( &p_sm->pkey_mgr ); > osm_lft_rcv_destroy( &p_sm->lft_rcv ); > osm_mft_rcv_destroy( &p_sm->mft_rcv ); > osm_slvl_rcv_destroy( &p_sm->slvl_rcv ); > @@ -408,7 +405,6 @@ osm_sm_init( > &p_sm->mcast_mgr, > &p_sm->link_mgr, > &p_sm->drop_mgr, > - &p_sm->pkey_mgr, > &p_sm->req, > p_stats, > &p_sm->sm_state_mgr, > @@ -431,12 +427,6 @@ osm_sm_init( > if( status != IB_SUCCESS ) > goto Exit; > > - status = osm_pkey_mgr_init( &p_sm->pkey_mgr, > - p_sm->p_subn, > - p_sm->p_log, &p_sm->req, p_sm->p_lock ); > - if( status != IB_SUCCESS ) > - goto Exit; > - > status = osm_lft_rcv_init( &p_sm->lft_rcv, p_subn, p_log, p_lock ); > if( status != IB_SUCCESS ) > goto Exit; > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > index 083185c..1aefc0b 100644 > --- a/osm/opensm/osm_state_mgr.c > +++ b/osm/opensm/osm_state_mgr.c > @@ -1,5 +1,5 @@ > /* > - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. > + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. > * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. > * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
> * > @@ -64,6 +64,7 @@ #include > #include > #include > #include > +#include > #include > #include > #include > @@ -108,7 +109,6 @@ osm_state_mgr_init( > IN osm_mcast_mgr_t * const p_mcast_mgr, > IN osm_link_mgr_t * const p_link_mgr, > IN osm_drop_mgr_t * const p_drop_mgr, > - IN osm_pkey_mgr_t * const p_pkey_mgr, > IN osm_req_t * const p_req, > IN osm_stats_t * const p_stats, > IN osm_sm_state_mgr_t * const p_sm_state_mgr, > @@ -128,7 +128,6 @@ osm_state_mgr_init( > CL_ASSERT( p_mcast_mgr ); > CL_ASSERT( p_link_mgr ); > CL_ASSERT( p_drop_mgr ); > - CL_ASSERT( p_pkey_mgr ); > CL_ASSERT( p_req ); > CL_ASSERT( p_stats ); > CL_ASSERT( p_sm_state_mgr ); > @@ -145,7 +144,6 @@ osm_state_mgr_init( > p_mgr->p_mcast_mgr = p_mcast_mgr; > p_mgr->p_link_mgr = p_link_mgr; > p_mgr->p_drop_mgr = p_drop_mgr; > - p_mgr->p_pkey_mgr = p_pkey_mgr; > p_mgr->p_mad_ctrl = p_mad_ctrl; > p_mgr->p_req = p_req; > p_mgr->p_stats = p_stats; > @@ -2235,7 +2233,7 @@ osm_state_mgr_process( > OSM_SM_SIGNAL_DISCOVERY_COMPLETED ); > > /* the returned signal might be DONE or DONE_PENDING */ > - signal = osm_pkey_mgr_process( p_mgr->p_pkey_mgr ); > + signal = osm_pkey_mgr_process( p_mgr->p_subn->p_osm ); > break; > > default: From eitan at mellanox.co.il Tue May 2 01:45:02 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 2 May 2006 11:45:02 +0300 Subject: [openib-general] RE: [PATCH 3/4] opensm: pkey manager performance improvement Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAD6@mtlexch01.mtl.com> Hi Sasha, This is an important improvement to the previous algorithm. I read through but am not sure this does not break the logic. I hope we can get the random test flow written soon - to gain confidence. But since we did not thoroughly test the previous one - it is OK to commit this in my mind. Thanks Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. 
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Sunday, April 23, 2006 5:26 PM > To: Hal Rosenstock > Cc: openib-general at openib.org; Eitan Zahavi; Ofer Gigi; Yael Kalka > Subject: [PATCH 3/4] opensm: pkey manager performance improvement > > > Send changed pkey table blocks to ports only after full update and not > after each pkey value change/update. > > Signed-off-by: Sasha Khapyorsky > --- > > osm/include/opensm/osm_pkey.h | 51 +++++++++ > osm/opensm/osm_pkey.c | 32 ++++++ > osm/opensm/osm_pkey_mgr.c | 233 ++++++++++++++++++++--------------------- > 3 files changed, 197 insertions(+), 119 deletions(-) > > diff --git a/osm/include/opensm/osm_pkey.h b/osm/include/opensm/osm_pkey.h > index d4ee9a1..f5e8c11 100644 > --- a/osm/include/opensm/osm_pkey.h > +++ b/osm/include/opensm/osm_pkey.h > @@ -90,16 +90,28 @@ struct _osm_physp; > typedef struct _osm_pkey_tbl > { > cl_ptr_vector_t blocks; > + cl_ptr_vector_t new_blocks; > cl_map_t keys; > } osm_pkey_tbl_t; > /* > * FIELDS > * blocks > -* The IBA defined blocks of pkey values > +* The IBA defined blocks of pkey values, updated from the net > +* > +* new_blocks > +* The blocks of pkey values, will be used for updates by SM > * > * keys > * A set holding all keys > * > +* NOTES > +* 'blocks' vector should be used to store pkey values obtained from > +* the port and SM pkey manager should not change it directly, for this > +* purpose 'new_blocks' should be used. 
> +* > +* The only pkey values stored in 'blocks' vector will be mapped with > +* 'keys' map > +* > *********/ > > /****f* OpenSM: osm_pkey_tbl_construct > @@ -214,6 +226,43 @@ static inline ib_pkey_table_t *osm_pkey_ > * > *********/ > > +/****f* OpenSM: osm_pkey_tbl_new_block_get > +* NAME > +* osm_pkey_tbl_new_block_get > +* > +* DESCRIPTION > +* The same as above but for new block > +* > +* SYNOPSIS > +*/ > +static inline ib_pkey_table_t *osm_pkey_tbl_new_block_get( > + const osm_pkey_tbl_t *p_pkey_tbl, uint16_t block) > +{ > + return (block < cl_ptr_vector_get_size(&p_pkey_tbl->new_blocks)) ? > + cl_ptr_vector_get(&p_pkey_tbl->new_blocks, block) : NULL; > +}; > +/* > + *********/ > + > +/****f* OpenSM: osm_pkey_tbl_sync_new_blocks > +* NAME > +* osm_pkey_tbl_sync_new_blocks > +* > +* DESCRIPTION > +* Syncs new_blocks vector content with current pkey table blocks > +* > +* SYNOPSIS > +*/ > +void osm_pkey_tbl_sync_new_blocks( > + const osm_pkey_tbl_t *p_pkey_tbl); > +/* > +* p_pkey_tbl > +* [in] Pointer to osm_pkey_tbl_t object. 
> +* > +* NOTES > +* > +*********/ > + > /****f* OpenSM: osm_pkey_tbl_set > * NAME > * osm_pkey_tbl_set > diff --git a/osm/opensm/osm_pkey.c b/osm/opensm/osm_pkey.c > index 5a4ca0d..d661bd6 100644 > --- a/osm/opensm/osm_pkey.c > +++ b/osm/opensm/osm_pkey.c > @@ -67,6 +67,7 @@ void osm_pkey_tbl_construct( > IN osm_pkey_tbl_t *p_pkey_tbl) > { > cl_ptr_vector_construct( &p_pkey_tbl->blocks ); > + cl_ptr_vector_construct( &p_pkey_tbl->new_blocks ); > cl_map_construct( &p_pkey_tbl->keys ); > } > > @@ -82,6 +83,11 @@ void osm_pkey_tbl_destroy( > cl_free(cl_ptr_vector_get( &p_pkey_tbl->blocks, i )); > cl_ptr_vector_destroy( &p_pkey_tbl->blocks ); > > + num_blocks = (uint16_t)(cl_ptr_vector_get_size( &p_pkey_tbl->new_blocks )); > + for (i = 0; i < num_blocks; i++) > + cl_free(cl_ptr_vector_get( &p_pkey_tbl->new_blocks, i )); > + cl_ptr_vector_destroy( &p_pkey_tbl->new_blocks ); > + > cl_map_remove_all( &p_pkey_tbl->keys ); > cl_map_destroy( &p_pkey_tbl->keys ); > } > @@ -92,12 +98,38 @@ int osm_pkey_tbl_init( > IN osm_pkey_tbl_t *p_pkey_tbl) > { > cl_ptr_vector_init( &p_pkey_tbl->blocks, 0, 1); > + cl_ptr_vector_init( &p_pkey_tbl->new_blocks, 0, 1); > cl_map_init( &p_pkey_tbl->keys, 1 ); > return(IB_SUCCESS); > } > > /********************************************************************** > **********************************************************************/ > +void osm_pkey_tbl_sync_new_blocks( > + IN const osm_pkey_tbl_t *p_pkey_tbl) > +{ > + ib_pkey_table_t *p_block, *p_new_block; > + int16_t b, num_blocks, new_blocks; > + > + num_blocks = cl_ptr_vector_get_size(&p_pkey_tbl->blocks); > + new_blocks = cl_ptr_vector_get_size(&p_pkey_tbl->new_blocks); > + > + for (b = 0; b < num_blocks; b++) { > + p_block = cl_ptr_vector_get(&p_pkey_tbl->blocks, b); > + if ( b < new_blocks ) > + p_new_block = cl_ptr_vector_get(&p_pkey_tbl->new_blocks, b); > + else { > + p_new_block = (ib_pkey_table_t *)cl_zalloc(sizeof(*p_new_block)); > + if (!p_new_block) > + break; > + 
cl_ptr_vector_set(&((osm_pkey_tbl_t *)p_pkey_tbl)->new_blocks, b, > p_new_block); > + } > + cl_memcpy(p_new_block, p_block, sizeof(*p_new_block)); > + } > +} > + > +/********************************************************************** > + **********************************************************************/ > int osm_pkey_tbl_set( > IN osm_pkey_tbl_t *p_pkey_tbl, > IN uint16_t block, > diff --git a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > index 7b3da26..da8dfa8 100644 > --- a/osm/opensm/osm_pkey_mgr.c > +++ b/osm/opensm/osm_pkey_mgr.c > @@ -131,86 +131,45 @@ pkey_mgr_enforce_partition( > **********************************************************************/ > > /* > - * Send a new entry for the pkey table for this port when this pkey > + * Prepare a new entry for the pkey table for this port when this pkey > * does not exist. Update existed entry when membership was changed. > */ > > -static boolean_t > -pkey_mgr_process_physical_port( > +static void pkey_mgr_process_physical_port( > IN osm_log_t *p_log, > IN const osm_req_t *p_req, > IN const ib_net16_t pkey, > IN osm_physp_t *p_physp ) > { > - boolean_t return_val = FALSE; /* TRUE if pkey was inserted or updated */ > - ib_api_status_t status; > osm_node_t *p_node = osm_physp_get_node_ptr( p_physp ); > - ib_pkey_table_t *block = NULL; > + ib_pkey_table_t *block; > uint16_t block_index; > uint16_t num_of_blocks; > const osm_pkey_tbl_t *p_pkey_tbl; > ib_net16_t *p_orig_pkey; > + char *stat = NULL; > uint32_t i; > - boolean_t block_found = FALSE; > - > - OSM_LOG_ENTER( p_log, pkey_mgr_process_physical_port ); > > p_pkey_tbl = osm_physp_get_pkey_tbl( p_physp ); > num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl ); > > p_orig_pkey = cl_map_get( &p_pkey_tbl->keys, ib_pkey_get_base( pkey ) ); > > - if ( p_orig_pkey && *p_orig_pkey == pkey ) > - { > - if ( osm_log_is_active( p_log, OSM_LOG_VERBOSE ) ) > - { > - osm_log( p_log, OSM_LOG_VERBOSE, > - "pkey_mgr_process_physical_port: " > - "No need 
to insert pkey 0x%04x for node 0x%016" PRIx64 > - " port %u\n", > - cl_ntoh16( pkey ), > - cl_ntoh64( osm_node_get_node_guid( p_node ) ), > - osm_physp_get_port_num( p_physp ) ); > - } > - goto _done; > - } > - else if ( !p_orig_pkey ) > + if ( !p_orig_pkey ) > { > for ( block_index = 0; block_index < num_of_blocks; block_index++ ) > { > - block = osm_pkey_tbl_block_get( p_pkey_tbl, block_index ); > + block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index ); > for ( i = 0; i < IB_NUM_PKEY_ELEMENTS_IN_BLOCK; i++ ) > { > if ( ib_pkey_is_invalid( block->pkey_entry[i] ) ) > { > block->pkey_entry[i] = pkey; > - block_found = TRUE; > - break; > + stat = "inserted"; > + goto _done; > } > } > - if ( block_found ) > - { > - break; > - } > } > - } > - else > - { > - *p_orig_pkey = pkey; > - for ( block_index = 0; block_index < num_of_blocks; block_index++ ) > - { > - block = osm_pkey_tbl_block_get( p_pkey_tbl, block_index ); > - i = p_orig_pkey - block->pkey_entry; > - if ( i < IB_NUM_PKEY_ELEMENTS_IN_BLOCK ) > - { > - block_found = TRUE; > - break; > - } > - } > - } > - > - if ( block_found == FALSE ) > - { > osm_log( p_log, OSM_LOG_ERROR, > "pkey_mgr_process_physical_port: ERR 0501: " > "No empty pkey entry was found to insert 0x%04x for node " > @@ -218,46 +177,40 @@ pkey_mgr_process_physical_port( > cl_ntoh16( pkey ), > cl_ntoh64( osm_node_get_node_guid( p_node ) ), > osm_physp_get_port_num( p_physp ) ); > - goto _done; > } > - > - status = > - pkey_mgr_update_pkey_entry( p_req, p_physp, block, block_index ); > - > - if ( status != IB_SUCCESS ) > + else if ( *p_orig_pkey != pkey ) > { > - osm_log( p_log, OSM_LOG_ERROR, > - "pkey_mgr_process_physical_port: " > - "pkey_mgr_update_pkey_entry() failed to update " > - "pkey table block %d for node 0x%016" PRIx64 " port %u\n", > - block_index, > - cl_ntoh64( osm_node_get_node_guid( p_node ) ), > - osm_physp_get_port_num( p_physp ) ); > - goto _done; > + for ( block_index = 0; block_index < num_of_blocks; block_index++ ) > 
+ { > + /* we need real block (not just new_block) in order > + * to resolve block/pkey indices */ > + block = osm_pkey_tbl_block_get( p_pkey_tbl, block_index ); > + i = p_orig_pkey - block->pkey_entry; > + if (i < IB_NUM_PKEY_ELEMENTS_IN_BLOCK) { > + block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index ); > + block->pkey_entry[i] = pkey; > + stat = "updated"; > + goto _done; > + } > + } > } > > - return_val = TRUE; /* pkey was inserted/updated */ > - > - if ( osm_log_is_active( p_log, OSM_LOG_VERBOSE ) ) > - { > + _done: > + if (stat) { > osm_log( p_log, OSM_LOG_VERBOSE, > "pkey_mgr_process_physical_port: " > - "pkey 0x%04x was inserted for node 0x%016" PRIx64 > + "pkey 0x%04x was %s for node 0x%016" PRIx64 > " port %u\n", > - cl_ntoh16( pkey ), > + cl_ntoh16( pkey ), stat, > cl_ntoh64( osm_node_get_node_guid( p_node ) ), > osm_physp_get_port_num( p_physp ) ); > } > - > - _done: > - OSM_LOG_EXIT( p_log ); > - return ( return_val ); > } > > > /********************************************************************** > **********************************************************************/ > -static void > +static boolean_t > pkey_mgr_update_peer_port( > osm_log_t *p_log, > const osm_req_t *p_req, > @@ -274,21 +227,22 @@ pkey_mgr_update_peer_port( > uint16_t block_index; > uint16_t num_of_blocks; > ib_api_status_t status = IB_SUCCESS; > + boolean_t ret_val = FALSE; > > p = osm_port_get_default_phys_ptr( p_port ); > if ( !osm_physp_is_valid( p ) ) > - return; > + return FALSE; > peer = osm_physp_get_remote( p ); > if ( !peer || !osm_physp_is_valid( peer ) ) > - return; > + return FALSE; > p_node = osm_physp_get_node_ptr( peer ); > if ( osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH ) > - return; > + return FALSE; > > p_sw = osm_get_switch_by_guid( p_subn, osm_node_get_node_guid( p_node )); > if (!p_sw || !(p_si = osm_switch_get_si_ptr( p_sw )) || > !p_si->enforce_cap) > - return; > + return FALSE; > > if (pkey_mgr_enforce_partition( p_req, peer, enforce ) != 
IB_SUCCESS) { > osm_log( p_log, OSM_LOG_ERROR, > @@ -300,7 +254,7 @@ pkey_mgr_update_peer_port( > } > > if (enforce == FALSE) > - return; > + return FALSE; > > p_pkey_tbl = osm_physp_get_pkey_tbl( p ); > p_peer_pkey_tbl = osm_physp_get_pkey_tbl( peer ); > @@ -310,15 +264,15 @@ pkey_mgr_update_peer_port( > > for ( block_index = 0; block_index < num_of_blocks; block_index++ ) > { > - block = osm_pkey_tbl_block_get( p_pkey_tbl, block_index ); > + block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index ); > peer_block = osm_pkey_tbl_block_get( p_peer_pkey_tbl, block_index ); > - if ( cl_memcmp( peer_block, block, sizeof( *block ) ) ) > + if ( cl_memcmp( peer_block, block, sizeof( *peer_block ) ) ) > { > - cl_memcpy( peer_block, block, sizeof( *block ) ); > status = > - pkey_mgr_update_pkey_entry( p_req, peer, peer_block, > - block_index ); > - if ( status != IB_SUCCESS ) > + pkey_mgr_update_pkey_entry( p_req, peer, block, block_index ); > + if ( status == IB_SUCCESS ) > + ret_val = TRUE; > + else > osm_log( p_log, OSM_LOG_ERROR, > "pkey_mgr_update_peer_port: " > "pkey_mgr_update_pkey_entry() failed to update " > @@ -330,7 +284,7 @@ pkey_mgr_update_peer_port( > } > } > > - if ( num_of_blocks && status == IB_SUCCESS && > + if ( ret_val == TRUE && > osm_log_is_active( p_log, OSM_LOG_VERBOSE ) ) > { > osm_log( p_log, OSM_LOG_VERBOSE, > @@ -340,11 +294,61 @@ pkey_mgr_update_peer_port( > cl_ntoh64( osm_node_get_node_guid( p_node ) ), > osm_physp_get_port_num( peer ) ); > } > + > + return ret_val; > } > > /********************************************************************** > **********************************************************************/ > -static boolean_t > +static boolean_t pkey_mgr_update_port( > + osm_log_t *p_log, > + osm_req_t *p_req, > + const osm_port_t * const p_port ) > +{ > + osm_physp_t *p; > + osm_node_t *p_node; > + ib_pkey_table_t *block, *new_block; > + const osm_pkey_tbl_t *p_pkey_tbl; > + uint16_t block_index; > + uint16_t num_of_blocks; > + 
ib_api_status_t status; > + boolean_t ret_val = FALSE; > + > + p = osm_port_get_default_phys_ptr( p_port ); > + if ( !osm_physp_is_valid( p ) ) > + return FALSE; > + > + p_pkey_tbl = osm_physp_get_pkey_tbl(p); > + num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl ); > + > + for ( block_index = 0; block_index < num_of_blocks; block_index++ ) > + { > + block = osm_pkey_tbl_block_get( p_pkey_tbl, block_index ); > + new_block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index ); > + > + if (!new_block || !cl_memcmp( new_block, block, sizeof( *block ) ) ) > + continue; > + > + status = > + pkey_mgr_update_pkey_entry( p_req, p, new_block, block_index ); > + if (status == IB_SUCCESS) > + ret_val = TRUE; > + else > + osm_log( p_log, OSM_LOG_ERROR, > + "pkey_mgr_update_port: " > + "pkey_mgr_update_pkey_entry() failed to update " > + "pkey table block %d for node 0x%016" PRIx64 " port %u\n", > + block_index, > + cl_ntoh64( osm_node_get_node_guid( p_node ) ), > + osm_physp_get_port_num( p ) ); > + } > + > + return ret_val; > +} > + > +/********************************************************************** > + **********************************************************************/ > +static void > pkey_mgr_process_partition_table( > osm_log_t *p_log, > const osm_req_t *p_req, > @@ -356,7 +360,6 @@ pkey_mgr_process_partition_table( > cl_map_iterator_t i, i_next; > ib_net16_t pkey = p_prtn->pkey; > osm_physp_t *p_physp; > - boolean_t result = FALSE; > > if ( full ) > pkey = cl_hton16( cl_ntoh16( pkey ) | 0x8000 ); > @@ -367,23 +370,9 @@ pkey_mgr_process_partition_table( > i = i_next; > i_next = cl_map_next( i ); > p_physp = cl_map_obj( i ); > - if ( p_physp && osm_physp_is_valid( p_physp ) && > - pkey_mgr_process_physical_port( p_log, p_req, pkey, p_physp ) ) > - { > - result = TRUE; > - if ( osm_log_is_active( p_log, OSM_LOG_VERBOSE ) ) > - osm_log( p_log, OSM_LOG_VERBOSE, > - "pkey_mgr_process_partition_table: " > - "Adding 0x%04x to pkey table of node " > - "0x%016" 
PRIx64 " port %u\n", > - cl_ntoh16( pkey ), > - cl_ntoh64( osm_node_get_node_guid > - ( osm_physp_get_node_ptr( p_physp ) ) ), > - osm_physp_get_port_num( p_physp ) ); > - } > + if ( p_physp && osm_physp_is_valid( p_physp ) ) > + pkey_mgr_process_physical_port( p_log, p_req, pkey, p_physp ); > } > - > - return result; > } > > /********************************************************************** > @@ -397,6 +386,7 @@ osm_pkey_mgr_process( > osm_prtn_t *p_prtn; > osm_port_t *p_port; > osm_signal_t signal = OSM_SIGNAL_DONE; > + osm_physp_t *p_physp; > > CL_ASSERT( p_osm ); > > @@ -411,34 +401,41 @@ osm_pkey_mgr_process( > goto _err; > } > > - p_tbl = &p_osm->subn.prtn_pkey_tbl; > + p_tbl = &p_osm->subn.port_guid_tbl; > + p_next = cl_qmap_head( p_tbl ); > + while ( p_next != cl_qmap_end( p_tbl ) ) > + { > + p_port = ( osm_port_t * ) p_next; > + p_next = cl_qmap_next( p_next ); > + p_physp = osm_port_get_default_phys_ptr( p_port ); > + if (osm_physp_is_valid( p_physp ) ) > + osm_pkey_tbl_sync_new_blocks(osm_physp_get_pkey_tbl(p_physp)); > + } > > + p_tbl = &p_osm->subn.prtn_pkey_tbl; > p_next = cl_qmap_head( p_tbl ); > while ( p_next != cl_qmap_end( p_tbl ) ) > { > p_prtn = ( osm_prtn_t * ) p_next; > p_next = cl_qmap_next( p_next ); > - > - if ( pkey_mgr_process_partition_table( &p_osm->log, &p_osm->sm.req, p_prtn, > FALSE ) ) > - signal = OSM_SIGNAL_DONE_PENDING; > - if ( pkey_mgr_process_partition_table( &p_osm->log, &p_osm->sm.req, p_prtn, > TRUE ) ) > - signal = OSM_SIGNAL_DONE_PENDING; > + pkey_mgr_process_partition_table( &p_osm->log, &p_osm->sm.req, p_prtn, > FALSE ); > + pkey_mgr_process_partition_table( &p_osm->log, &p_osm->sm.req, p_prtn, > TRUE ); > } > > p_tbl = &p_osm->subn.port_guid_tbl; > - > p_next = cl_qmap_head( p_tbl ); > while ( p_next != cl_qmap_end( p_tbl ) ) > { > p_port = ( osm_port_t * ) p_next; > p_next = cl_qmap_next( p_next ); > - > - if ( osm_node_get_type( osm_port_get_parent_node( p_port ) ) != > - IB_NODE_TYPE_SWITCH ) > - { > - 
pkey_mgr_update_peer_port( &p_osm->log, &p_osm->sm.req, &p_osm->subn, > - p_port, !p_osm->subn.opt.no_partition_enforcement ); > - } > + if (pkey_mgr_update_port(&p_osm->log, &p_osm->sm.req, p_port)) > + signal = OSM_SIGNAL_DONE_PENDING; > + if (osm_node_get_type( osm_port_get_parent_node( p_port ) ) != > + IB_NODE_TYPE_SWITCH && > + pkey_mgr_update_peer_port( &p_osm->log, &p_osm->sm.req, > + &p_osm->subn, p_port, > + !p_osm->subn.opt.no_partition_enforcement )) > + signal = OSM_SIGNAL_DONE_PENDING; > } > > _err: > From eitan at mellanox.co.il Tue May 2 01:55:47 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 2 May 2006 11:55:47 +0300 Subject: [openib-general] RE: [PATCH 4/4] opensm: no need to wait for pkey_mgr Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAD7@mtlexch01.mtl.com> Hi Sasha, This patch breaks the basic concept for OpenSM to report "SUBNET UP" only if all SubnMgt.Set were sent successfully. I do not think we want to remove this feature. In detail (you probably know that so it is just clarification for the audience): OpenSM uses SMP MADs to initialize the fabric components. SMP delivery reliability is provided through the fact each Set gets a GetResp. But the actual delivery of the SMP is not guaranteed. The layers below OpenSM provide "retry" capability such that if a GetResp is not received in some time window the Set is resent. However, there is no guarantee that even with multiple retries the packet will reach the destination. But in that case OpenSM eventually receives a transaction timeout error. When OpenSM fails to set some of the fabric components it does not give up. Instead it will report "errors during initialization" and will restart its fabric configuration sequence. This patch breaks this concept: if OpenSM does not wait for the partition setting to occur before completing the fabric configuration "FULL SWEEP" it will not report "errors during initialization" nor start a new sweep. 
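The reliability model described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not OpenSM code: the `sketch_*` names, the `answered_on_attempt` field, and the attempt limit of 3 are all hypothetical, and a real Set travels through the MAD layer rather than a loop over an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attempt budget: the first send plus two retries. */
#define SKETCH_MAX_ATTEMPTS 3u

/* Hypothetical model of one Set transaction. The GetResp arrives on
 * attempt number answered_on_attempt (counting from 1); 0 means that
 * no response ever arrives. */
typedef struct {
	unsigned answered_on_attempt;
} sketch_set_t;

/* Send one Set, resending until a GetResp is seen or the attempt
 * budget is spent. Returns false on a transaction timeout. */
bool sketch_send_set(const sketch_set_t *set)
{
	unsigned attempt;

	for (attempt = 1; attempt <= SKETCH_MAX_ATTEMPTS; attempt++)
		if (set->answered_on_attempt == attempt)
			return true;
	return false;
}

/* One configuration sweep over n Sets. Returns true only when every
 * Set was answered; false stands for "errors during initialization",
 * after which the caller would start a new sweep. */
bool sketch_sweep(const sketch_set_t *sets, size_t n)
{
	bool all_answered = true;
	size_t i;

	assert(sets != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (!sketch_send_set(&sets[i]))
			all_answered = false; /* keep going, remember the failure */
	return all_answered;
}
```

In this sketch the sweep result depends on every Set having been waited for. If the pkey table Sets were sent without waiting for their outcome, a timeout on one of them would never reach the sweep result, which is the concern raised above.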
Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Sunday, April 23, 2006 5:26 PM > To: Hal Rosenstock > Cc: openib-general at openib.org; Eitan Zahavi; Ofer Gigi; Yael Kalka > Subject: [PATCH 4/4] opensm: no need to wait for pkey_mgr > > > Don't wait for pkey tables update responses in partition manager - > we may just continue resweep process. > > Signed-off-by: Sasha Khapyorsky > --- > > osm/opensm/osm_pkey_mgr.c | 66 +++++++++++++++++--------------------------- > osm/opensm/osm_state_mgr.c | 41 ++------------------------- > 2 files changed, 29 insertions(+), 78 deletions(-) > > diff --git a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > index da8dfa8..167b4c1 100644 > --- a/osm/opensm/osm_pkey_mgr.c > +++ b/osm/opensm/osm_pkey_mgr.c > @@ -135,7 +135,8 @@ pkey_mgr_enforce_partition( > * does not exist. Update existed entry when membership was changed. 
> */ > > -static void pkey_mgr_process_physical_port( > +static void > +pkey_mgr_process_physical_port( > IN osm_log_t *p_log, > IN const osm_req_t *p_req, > IN const ib_net16_t pkey, > @@ -210,7 +211,7 @@ static void pkey_mgr_process_physical_po > > /********************************************************************** > **********************************************************************/ > -static boolean_t > +static void > pkey_mgr_update_peer_port( > osm_log_t *p_log, > const osm_req_t *p_req, > @@ -227,22 +228,21 @@ pkey_mgr_update_peer_port( > uint16_t block_index; > uint16_t num_of_blocks; > ib_api_status_t status = IB_SUCCESS; > - boolean_t ret_val = FALSE; > > p = osm_port_get_default_phys_ptr( p_port ); > if ( !osm_physp_is_valid( p ) ) > - return FALSE; > + return; > peer = osm_physp_get_remote( p ); > if ( !peer || !osm_physp_is_valid( peer ) ) > - return FALSE; > + return; > p_node = osm_physp_get_node_ptr( peer ); > if ( osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH ) > - return FALSE; > + return; > > p_sw = osm_get_switch_by_guid( p_subn, osm_node_get_node_guid( p_node )); > if (!p_sw || !(p_si = osm_switch_get_si_ptr( p_sw )) || > !p_si->enforce_cap) > - return FALSE; > + return; > > if (pkey_mgr_enforce_partition( p_req, peer, enforce ) != IB_SUCCESS) { > osm_log( p_log, OSM_LOG_ERROR, > @@ -254,7 +254,7 @@ pkey_mgr_update_peer_port( > } > > if (enforce == FALSE) > - return FALSE; > + return; > > p_pkey_tbl = osm_physp_get_pkey_tbl( p ); > p_peer_pkey_tbl = osm_physp_get_pkey_tbl( peer ); > @@ -271,36 +271,30 @@ pkey_mgr_update_peer_port( > status = > pkey_mgr_update_pkey_entry( p_req, peer, block, block_index ); > if ( status == IB_SUCCESS ) > - ret_val = TRUE; > + osm_log( p_log, OSM_LOG_VERBOSE, > + "pkey_mgr_update_peer_port: " > + "pkey table block %u was updated for node 0x%016" PRIx64 > + " port %u\n", > + block_index, > + cl_ntoh64( osm_node_get_node_guid( p_node ) ), > + osm_physp_get_port_num( peer ) ); > else > osm_log( p_log, 
OSM_LOG_ERROR, > "pkey_mgr_update_peer_port: " > "pkey_mgr_update_pkey_entry() failed to update " > - "pkey table block %d for node 0x%016" PRIx64 > + "pkey table block %u for node 0x%016" PRIx64 > " port %u\n", > block_index, > cl_ntoh64( osm_node_get_node_guid( p_node ) ), > osm_physp_get_port_num( peer ) ); > } > } > - > - if ( ret_val == TRUE && > - osm_log_is_active( p_log, OSM_LOG_VERBOSE ) ) > - { > - osm_log( p_log, OSM_LOG_VERBOSE, > - "pkey_mgr_update_peer_port: " > - "pkey table was updated for node 0x%016" PRIx64 > - " port %u\n", > - cl_ntoh64( osm_node_get_node_guid( p_node ) ), > - osm_physp_get_port_num( peer ) ); > - } > - > - return ret_val; > } > > /********************************************************************** > **********************************************************************/ > -static boolean_t pkey_mgr_update_port( > +static void > +pkey_mgr_update_port( > osm_log_t *p_log, > osm_req_t *p_req, > const osm_port_t * const p_port ) > @@ -312,11 +306,10 @@ static boolean_t pkey_mgr_update_port( > uint16_t block_index; > uint16_t num_of_blocks; > ib_api_status_t status; > - boolean_t ret_val = FALSE; > > p = osm_port_get_default_phys_ptr( p_port ); > if ( !osm_physp_is_valid( p ) ) > - return FALSE; > + return; > > p_pkey_tbl = osm_physp_get_pkey_tbl(p); > num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl ); > @@ -331,9 +324,7 @@ static boolean_t pkey_mgr_update_port( > > status = > pkey_mgr_update_pkey_entry( p_req, p, new_block, block_index ); > - if (status == IB_SUCCESS) > - ret_val = TRUE; > - else > + if (status != IB_SUCCESS) > osm_log( p_log, OSM_LOG_ERROR, > "pkey_mgr_update_port: " > "pkey_mgr_update_pkey_entry() failed to update " > @@ -342,8 +333,6 @@ static boolean_t pkey_mgr_update_port( > cl_ntoh64( osm_node_get_node_guid( p_node ) ), > osm_physp_get_port_num( p ) ); > } > - > - return ret_val; > } > > /********************************************************************** > @@ -385,7 +374,6 @@ 
osm_pkey_mgr_process( > cl_map_item_t *p_next; > osm_prtn_t *p_prtn; > osm_port_t *p_port; > - osm_signal_t signal = OSM_SIGNAL_DONE; > osm_physp_t *p_physp; > > CL_ASSERT( p_osm ); > @@ -428,18 +416,16 @@ osm_pkey_mgr_process( > { > p_port = ( osm_port_t * ) p_next; > p_next = cl_qmap_next( p_next ); > - if (pkey_mgr_update_port(&p_osm->log, &p_osm->sm.req, p_port)) > - signal = OSM_SIGNAL_DONE_PENDING; > + pkey_mgr_update_port(&p_osm->log, &p_osm->sm.req, p_port); > if (osm_node_get_type( osm_port_get_parent_node( p_port ) ) != > - IB_NODE_TYPE_SWITCH && > - pkey_mgr_update_peer_port( &p_osm->log, &p_osm->sm.req, > - &p_osm->subn, p_port, > - !p_osm->subn.opt.no_partition_enforcement )) > - signal = OSM_SIGNAL_DONE_PENDING; > + IB_NODE_TYPE_SWITCH ) > + pkey_mgr_update_peer_port( &p_osm->log, &p_osm->sm.req, > + &p_osm->subn, p_port, > + !p_osm->subn.opt.no_partition_enforcement ); > } > > _err: > CL_PLOCK_RELEASE( &p_osm->lock ); > OSM_LOG_EXIT( &p_osm->log ); > - return ( signal ); > + return OSM_SIGNAL_DONE; > } > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > index 1aefc0b..194e51e 100644 > --- a/osm/opensm/osm_state_mgr.c > +++ b/osm/opensm/osm_state_mgr.c > @@ -2232,8 +2232,10 @@ osm_state_mgr_process( > osm_sm_state_mgr_process( p_mgr->p_sm_state_mgr, > OSM_SM_SIGNAL_DISCOVERY_COMPLETED ); > > - /* the returned signal might be DONE or DONE_PENDING */ > + /* the returned signal will be always DONE */ > signal = osm_pkey_mgr_process( p_mgr->p_subn->p_osm ); > + p_mgr->state = OSM_SM_STATE_SET_PKEY_DONE; > + > break; > > default: > @@ -2243,43 +2245,6 @@ osm_state_mgr_process( > } > break; > > - case OSM_SM_STATE_SET_PKEY: > - switch ( signal ) > - { > - case OSM_SIGNAL_DONE: > - p_mgr->state = OSM_SM_STATE_SET_PKEY_DONE; > - break; > - > - case OSM_SIGNAL_DONE_PENDING: > - /* > - * There are outstanding transactions, so we > - * must wait for the wire to clear. 
> - */ > - p_mgr->state = OSM_SM_STATE_SET_PKEY_WAIT; > - signal = OSM_SIGNAL_NONE; > - break; > - > - default: > - __osm_state_mgr_signal_error( p_mgr, signal ); > - signal = OSM_SIGNAL_NONE; > - break; > - } > - break; > - > - case OSM_SM_STATE_SET_PKEY_WAIT: > - switch ( signal ) > - { > - case OSM_SIGNAL_NO_PENDING_TRANSACTIONS: > - p_mgr->state = OSM_SM_STATE_SET_PKEY_DONE; > - break; > - > - default: > - __osm_state_mgr_signal_error( p_mgr, signal ); > - signal = OSM_SIGNAL_NONE; > - break; > - } > - break; > - > case OSM_SM_STATE_SET_PKEY_DONE: > switch ( signal ) > { From RAISCH at de.ibm.com Tue May 2 02:30:35 2006 From: RAISCH at de.ibm.com (Christoph Raisch) Date: Tue, 2 May 2006 11:30:35 +0200 Subject: [openib-general] Re: [PATCH 13/16] ehca: firmware InfiniBand interface In-Reply-To: <17489.18630.75412.66803@cargo.ozlabs.ibm.com> Message-ID: We started like that to get a clean interface between the register intensive h_calls and the driver code. We're in the middle of the tradeoff "nice interface" vs strict fencing of data structures from one code piece to another. Initially these functions, which only move paramaters from the stack into registers and back, were inline functions. So the compiler collapsed the function call into "nothing", which won't work if you use a struct *. Somewhen during code reviews people agreed that having this many inline functions leads to large header files which isn't a good idea either. We're about to change that interface again, so what should be the max number of parameters in a function call? The limit in existing kernel code is somewhere between 5-8 (just as a reminder, 8 is the max nr of parameters to be passed by register on ppc) christoph raisch Paul Mackerras wrote on 28.04.2006 00:42:14: > Jörn Engel writes: > > > 25 parameters? If you tell me which drugs were involved in this code, > > I know what to stay away from. 
> > You really need to ask the firmware architects that, since this is > basically a single firmware call. > > Mind you, since a lot of the parameters are used to return individual > bytes or half-words, which are then put into structures, it might be > better to pass the pointers to the structures and let the wrapper put > the values straight into the structures. > > Paul. From hch at infradead.org Tue May 2 06:35:07 2006 From: hch at infradead.org (Christoph Hellwig) Date: Tue, 2 May 2006 14:35:07 +0100 Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine In-Reply-To: References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> <1146509646.20760.63.camel@laptopd505.fenrus.org> Message-ID: <20060502133507.GA26704@infradead.org> On Mon, May 01, 2006 at 12:00:00PM -0700, Roland Dreier wrote: > Arjan> do you really NEED the vaddr? (most of the time linux > Arjan> drivers don't need it, while other OSes do) If you really > Arjan> need it you should grab it at dma_map time ... (and > Arjan> realize that it's not kernel addressable per se ;) > > Yes, they need some kind of vaddr. > > It's kind of a layering problem. The IB stack assumes that IB devices > have a DMA engine that deals with bus addresses. But the ipath driver > has to simulate this by using a memcpy on the CPU to move data to the > PCI device. > > I really don't know what the right solution is. Maybe having some way > to override the dma mapping operations so that the ipath driver can > keep the info it needs? Or stop doing the dma mapping in the IB upper level drivers. 
I told you that we'll get broken hardware that doesn't want dma mapping
in the upper level driver, and pathscale created exactly that :)

From rdreier at cisco.com Tue May 2 07:24:18 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 02 May 2006 07:24:18 -0700
Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine
In-Reply-To: <20060502133507.GA26704@infradead.org> (Christoph Hellwig's message of "Tue, 2 May 2006 14:35:07 +0100")
References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> <1146509646.20760.63.camel@laptopd505.fenrus.org> <20060502133507.GA26704@infradead.org>
Message-ID:

    Christoph> Or stop doing the dma mapping in the IB upper level
    Christoph> drivers. I told you that we'll get broken hardware
    Christoph> that doesn't want dma mapping in the upper level
    Christoph> driver, and pathscale created exactly that :)

But see my earlier mail to Arjan about RDMA -- what address can a
protocol (eg SRP initiator) put in a message that the other side will
use to initiate a remote DMA operation? It seems to me it has to be a
bus address, and that means that the protocol has to do the DMA mapping.

 - R.

From hch at infradead.org Tue May 2 07:27:29 2006
From: hch at infradead.org (Christoph Hellwig)
Date: Tue, 2 May 2006 15:27:29 +0100
Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine
In-Reply-To:
References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> <1146509646.20760.63.camel@laptopd505.fenrus.org> <20060502133507.GA26704@infradead.org>
Message-ID: <20060502142729.GA29721@infradead.org>

On Tue, May 02, 2006 at 07:24:18AM -0700, Roland Dreier wrote:
> Christoph> Or stop doing the dma mapping in the IB upper level
> Christoph> drivers.
> Christoph> I told you that we'll get broken hardware
> Christoph> that doesn't want dma mapping in the upper level
> Christoph> driver, and pathscale created exactly that :)
>
> But see my earlier mail to Arjan about RDMA -- what address can a
> protocol (eg SRP initiator) put in a message that the other side will
> use to initiate a remote DMA operation? It seems to me it has to be a
> bus address, and that means that the protocol has to do the DMA mapping.

Then we're back to the discussion on why RDMA is a fundamentally flawed
approach, but we already knew that. The usual workaround is to only allow
RDMA operations to registered memory windows for which we can use the
normal dma operation. There's also the *dac* pci dma operations that
can avoid iommu overhead if you support 64bit addressing. But for all
this to work dma mapping fundamentally needs to be handled by the low
level driver.

From alan at lxorguk.ukuu.org.uk Tue May 2 07:55:04 2006
From: alan at lxorguk.ukuu.org.uk (Alan Cox)
Date: Tue, 02 May 2006 15:55:04 +0100
Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine
In-Reply-To:
References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> <1146509646.20760.63.camel@laptopd505.fenrus.org> <20060502133507.GA26704@infradead.org>
Message-ID: <1146581705.3519.61.camel@localhost.localdomain>

On Maw, 2006-05-02 at 07:24 -0700, Roland Dreier wrote:
> But see my earlier mail to Arjan about RDMA -- what address can a
> protocol (eg SRP initiator) put in a message that the other side will
> use to initiate a remote DMA operation? It seems to me it has to be a
> bus address, and that means that the protocol has to do the DMA mapping.

For most drivers properly, but you are making assumptions again. Why
can't a driver which is doing its own mapping not also do its own rdma
cookie handling ? You opt out of mapping being done for you, then you
get opted out of defaults for other stuff too.
Alan

From rdreier at cisco.com Tue May 2 07:58:22 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 02 May 2006 07:58:22 -0700
Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine
In-Reply-To: <1146581705.3519.61.camel@localhost.localdomain> (Alan Cox's message of "Tue, 02 May 2006 15:55:04 +0100")
References: <1ab168913f0fea5d18b4.1145913781@eng-12.pathscale.com> <1146509646.20760.63.camel@laptopd505.fenrus.org> <20060502133507.GA26704@infradead.org> <1146581705.3519.61.camel@localhost.localdomain>
Message-ID:

    Alan> For most drivers properly, but you are making assumptions
    Alan> again. Why can't a driver which is doing its own mapping not
    Alan> also do its own rdma cookie handling ? You opt out of
    Alan> mapping being done for you, then you get opted out of
    Alan> defaults for other stuff too.

You're right, and that was what I was driving at in my earlier message
when I talked about overriding the dma mapping operations for a
device. That would let ipath or whatever create its own RDMA cookies,
and keep track of the struct page or kernel virtual address of the
original memory, so it can do memcpy when needed.

I don't think the idea lets you push mapping down into the low-level
driver, though. Take the SRP initiator as a specific example. The
SCSI midlayer gives SRP a SCSI command to send. The SRP initiator
formats that into an SRP message, with a "memory descriptor" (address
and RDMA cookie) for the buffer associated with the SCSI command, and
tells the low-level driver to send that message to the target. The
target then performs RDMA into that buffer, sending back only the RDMA
cookie and address.

So unless you teach every low-level driver how to snoop inside SRP
messages (along with NFS/RDMA, iSER and all the other protocols), I
don't see where the low-level driver has a chance to do the mapping.

 - R.
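
To make the "memory descriptor" in the message above concrete, here is a
minimal sketch in C. It assumes an SRP-style direct descriptor made of a
64-bit address, a 32-bit RDMA key (the "cookie") and a 32-bit length, all
big-endian on the wire. The names mem_desc, mem_desc_pack and
mem_desc_unpack are made up for illustration; this is not the kernel's
SRP initiator code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed wire size: 8-byte address + 4-byte key + 4-byte length. */
#define MEM_DESC_WIRE_LEN 16

/* Hypothetical host-order view of one memory descriptor. */
struct mem_desc {
	uint64_t addr;	/* address the remote side will name in its RDMA */
	uint32_t rkey;	/* RDMA cookie naming the registered region */
	uint32_t len;	/* buffer length in bytes */
};

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a 32-bit big-endian value. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* What the protocol driver does before handing the message down:
 * the address and key are already baked into the payload here. */
static void mem_desc_pack(uint8_t out[MEM_DESC_WIRE_LEN],
			  const struct mem_desc *d)
{
	put_be32(out, (uint32_t)(d->addr >> 32));
	put_be32(out + 4, (uint32_t)d->addr);
	put_be32(out + 8, d->rkey);
	put_be32(out + 12, d->len);
}

/* What the target does on receipt: recover address, key and length,
 * which are all it has to go on when it later issues the RDMA. */
static void mem_desc_unpack(struct mem_desc *d,
			    const uint8_t in[MEM_DESC_WIRE_LEN])
{
	d->addr = ((uint64_t)get_be32(in) << 32) | get_be32(in + 4);
	d->rkey = get_be32(in + 8);
	d->len = get_be32(in + 12);
}
```

The relevant point for the thread is that addr is written into the
payload by the protocol driver, before the low-level driver sees the
message as anything other than opaque bytes. Whatever value goes there
has to be one the device can resolve when the target's RDMA arrives.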

From caitlinb at broadcom.com Tue May 2 09:10:21 2006
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Tue, 2 May 2006 09:10:21 -0700
Subject: [openib-general] Re: [PATCH 5 of 13] ipath - use proper address translation routine
Message-ID: <54AD0F12E08D1541B826BE97C98F99F143B391@NT-SJCA-0751.brcm.ad.broadcom.com>

openib-general-bounces at openib.org wrote:
> proper address translation routine
>
> On Tue, May 02, 2006 at 07:24:18AM -0700, Roland Dreier wrote:
>> Christoph> Or stop doing the dma mapping in the IB upper level
>> Christoph> drivers. I told you that we'll get broken hardware
>> Christoph> that doesn't want dma mapping in the upper level
>> Christoph> driver, and pathscale created exactly that :)
>>
>> But see my earlier mail to Arjan about RDMA -- what address can a
>> protocol (eg SRP initiator) put in a message that the other side will
>> use to initiate a remote DMA operation? It seems to me it has to be
>> a bus address, and that means that the protocol has to do the DMA
>> mapping.
>
> Then we're back to the discussion on why RDMA is a
> fundamentally flawed approach, but we already knew that. The
> usual workaround is to only allow RDMA operations to
> registered memory windows for which we can use the normal dma
> operation. There's also the *dac* pci dma operations that
> can avoid iommu overhead if you support 64bit addressing.
> But for all this to work dma mapping fundamentally needs to
> be handled by the low level driver.

This is not a flaw in the RDMA model. All the RDMA Model requires
is exposing virtual addresses that the device can translate back
to host memory for remote operations. The protocols do not specify
what the backing of an R-Key or STag is. And the local interface
only requires that a L-Key / MR Stag be backed by translations
from Key/TAG and Address to a memory reference that the hardware
is capable of using immediately.

So the problem here is that the upper layer driver does not properly
understand what type of address the specific driver requires (and/or
of the driver/device to understand what type of addresses it will
be given). That's a failure to document interface requirements,
not a failure of the RDMA model.

From rdreier at cisco.com Tue May 2 09:38:16 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 02 May 2006 09:38:16 -0700
Subject: [openib-general] Re: [PATCH 07/12] SRP: Changing ibsrpdm
In-Reply-To: <20060501112927.GH17552@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 14:29:27 +0300")
References: <20060501112927.GH17552@mellanox.co.il>
Message-ID:

Thanks, applied

From rdreier at cisco.com Tue May 2 09:42:18 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 02 May 2006 09:42:18 -0700
Subject: [openib-general] Re: [PATCH 09/12] SRP: Changing ibsrpdm
In-Reply-To: <20060501113019.GJ17552@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 14:30:19 +0300")
References: <20060501113019.GJ17552@mellanox.co.il>
Message-ID:

    Ishai> alloca man page on my system says: The alloca() function is
    Ishai> machine and compiler dependent. On many systems its
    Ishai> implementation is buggy. Its use is discouraged. Lets not
    Ishai> use it.

The man page is talking about non-Linux systems I think. I don't know
of any problems with any modern Linux libc so I think it's better to
have something automatically leak proof.

 - R.

From jgunthorpe at obsidianresearch.com Tue May 2 09:48:41 2006
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 2 May 2006 10:48:41 -0600
Subject: [openib-general] Re: [PATCH 09/12] SRP: Changing ibsrpdm
In-Reply-To:
References: <20060501113019.GJ17552@mellanox.co.il>
Message-ID: <20060502164841.GL12964@obsidianresearch.com>

On Tue, May 02, 2006 at 09:42:18AM -0700, Roland Dreier wrote:
> Ishai> alloca man page on my system says: The alloca() function is
> Ishai> machine and compiler dependent.
> Ishai> On many systems its
> Ishai> implementation is buggy. Its use is discouraged. Lets not
> Ishai> use it.
> The man page is talking about non-Linux systems I think. I don't know
> of any problems with any modern Linux libc so I think it's better to
> have something automatically leak proof.

As a note the standards compliant way to get 'alloca' is via C99
runtime sized arrays, ie:

void foo(int len)
{
	char bar[len];
}

GCC has supported this feature from C99 for a very long time as long
as it isn't turned off by a compiler flag. alloca was never
standardized by ISO or POSIX.

--
Jason Gunthorpe (780)4406067x832
Chief Technology Officer, Obsidian Research Corp Edmonton, Canada

From rpandit at silverstorm.com Tue May 2 10:35:47 2006
From: rpandit at silverstorm.com (Ranjit Pandit)
Date: Tue, 2 May 2006 10:35:47 -0700
Subject: [openib-general] re RDS missing features
In-Reply-To: <44571705.9000208@voltaire.com>
References: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> <44571705.9000208@voltaire.com>
Message-ID: <96f8e60e0605021035j24cb9d61jbce4f5f582acfe35@mail.gmail.com>

On 5/2/06, Or Gerlitz wrote:
> Ranjit Pandit wrote:
> > Yes.
> > There is no issue. It's just next in line for me to implement.
>
> So what's remained to implement? if the app attempt to send data to
> 127.0.0.1 or a local IPoIB address then you are opening a connection to
> this address over the CMA and in the passive "side" you just do
> rdma_listen without binding to any device. In other words, no change on
> the active side and a simplification of the passive side to support
> this. Do i miss something here?

Loopback connections can be optimized by not going to the HCA.
In b-copy mode we can directly copy sends into destination sockets on
the same node.

>
> >> +2 by failover, are you referring to APM?
> >> that is failover between IB
> >> pathes to/from the same HCA
> >> over which the original connection/QP was established or you are talking
> >> on failover between HCAs
> >
> > Failover within and across HCAs. APM does not work for failover across
> > HCAs.
>
> I see. Can you remind me ... where is the location of the reference gen1
> RDS code? does it support failover?

Yes, Rds reference implementation implements failover across HCAs.
It was checked into contrib/silverstorm/rds. r3471 was the first
checkin and then a few more updates were made with bug fixes.

>
> Also, for within the HCA failover, are you talking on APM or basically,
> you apply the same failover scheme between to ports no matter if they
> are on on the same HCA or on different HCAs?

Keep it simple ie., apply the same failover scheme between two ports
whether on same HCA or not.

>
> Are you aware to something in the openib infrastructure which is missing
> for the failover design of RDS? if you specify the design/requirements i
> am sure people on this list can quickly say if something is missing...
>

For failover Rds need support for the following:
1. Ability to assign single IP address to multiple IB ports
2. Address resolution mechanism should return multiple paths for the
   same destination IP address.

On SilverStorm stack a single IP address can be assigned to two ports
in the system. When a path fails, RDS can re-establish connection to
the same destination IP address...ipoib_path( dst_ip) returns all
possible paths to the destination ip.

Does the CMA handle multiple paths to a destination IP? It does not
need to return multiple paths to Rds. For now, even if it picks the
first available path that should be sufficient.

> Or.
>
> >> [openfabrics-ewg] Before we can start testing - we needto ensure that
> >> RDS is fully ported.
> >>
> >> Pandit, Ranjit rpandit at silverstorm.com
> >>
> >> Following features are yet to be implemented in OpenFabric Rds:
> >>
> >> 1. Failover
> >> 2. Loopback connections
> >> 3. support for /proc fs like Rds config, stats and info.
> >>
> >>
> >>
> >> Ranjit
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From jlentini at netapp.com Tue May 2 11:21:28 2006
From: jlentini at netapp.com (James Lentini)
Date: Tue, 2 May 2006 14:21:28 -0400 (EDT)
Subject: [openib-general] Re: [PATCH] [RFC] dapltest change for iwarp
In-Reply-To: <1146248558.28503.20.camel@stevo-desktop>
References: <1146248558.28503.20.camel@stevo-desktop>
Message-ID:

On Fri, 28 Apr 2006, Steve Wise wrote:
> James,
>
> This patch changes the dapltest transaction test to force the client
> side (the side that dat_ep_connect()) to send the first RDMA message.
> This ensures that the IWARP MPA protocol requirements are met.
>
> I'm presenting this for discussion and possible inclusion in the
> trunk.
>
> A transport independent application should be designed to work over all
> transports and should therefore utilize the only the common features.
> This implies that the application should always initiate RDMA exchanges
> starting with the client, to avoid MPA problems.
>
> Comments?

Steve,

Thanks for pointing me to the MPA spec. requirement. At this time, I
don't see a way around this. Although I like the idea of transparently
satisfying this requirement as part of the iWARP CMs connection
establishment, that doesn't appear to be viable or standard.

Any objections to using this compressed version of your patch? I
updated the comment to explain the flow control and used the
same portion of code for both the client and server's receive.

test/dapltest/test/dapl_transaction_test.c =================================================================== --- test/dapltest/test/dapl_transaction_test.c (revision 6735) +++ test/dapltest/test/dapl_transaction_test.c (working copy) @@ -972,38 +972,42 @@ retry: } /* end foreach op */ /* - * Send our memory info (synchronously) + * Send our memory info. The client performs the first send to comply + * with the iWARP MPA protocol's "Connection Startup Rules". */ DT_Tdep_PT_Debug (1,(phead,"Test[" F64x "]: Sending %s Memory Info\n", test_ptr->base_port, test_ptr->is_server ? "Server" : "Client")); - /* post the send buffer */ - if (!DT_post_send_buffer (phead, + if (!test_ptr->is_server ) { + + /* post the send buffer */ + if (!DT_post_send_buffer (phead, test_ptr->ep_context[i].ep_handle, test_ptr->ep_context[i].bp, RMI_SEND_BUFFER_ID, buff_size)) - { - /* error message printed by DT_post_send_buffer */ - goto test_failure; - } - /* reap the send and verify it */ - dto_cookie.as_64 = LZERO; - dto_cookie.as_ptr = - (DAT_PVOID) DT_Bpool_GetBuffer ( - test_ptr->ep_context[i].bp, - RMI_SEND_BUFFER_ID); - if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || - !DT_dto_check ( phead, + { + /* error message printed by DT_post_send_buffer */ + goto test_failure; + } + /* reap the send and verify it */ + dto_cookie.as_64 = LZERO; + dto_cookie.as_ptr = + (DAT_PVOID) DT_Bpool_GetBuffer ( + test_ptr->ep_context[i].bp, + RMI_SEND_BUFFER_ID); + if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || + !DT_dto_check ( phead, &dto_stat, test_ptr->ep_context[i].ep_handle, buff_size, dto_cookie, test_ptr->is_server ? 
"Client_Mem_Info_Send" : "Server_Mem_Info_Send")) - { - goto test_failure; + { + goto test_failure; + } } /* @@ -1029,6 +1033,36 @@ retry: goto test_failure; } + if (test_ptr->is_server ) { + /* post the send buffer */ + if (!DT_post_send_buffer (phead, + test_ptr->ep_context[i].ep_handle, + test_ptr->ep_context[i].bp, + RMI_SEND_BUFFER_ID, + buff_size)) + { + /* error message printed by DT_post_send_buffer */ + goto test_failure; + } + /* reap the send and verify it */ + dto_cookie.as_64 = LZERO; + dto_cookie.as_ptr = + (DAT_PVOID) DT_Bpool_GetBuffer ( + test_ptr->ep_context[i].bp, + RMI_SEND_BUFFER_ID); + if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || + !DT_dto_check ( phead, + &dto_stat, + test_ptr->ep_context[i].ep_handle, + buff_size, + dto_cookie, + test_ptr->is_server ? "Client_Mem_Info_Send" + : "Server_Mem_Info_Send")) + { + goto test_failure; + } + } + /* * Extract what we need */ From rdreier at cisco.com Tue May 2 11:52:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 May 2006 11:52:23 -0700 Subject: [openib-general] Re: [PATCH 10/12] SRP: Changing ibsrpdm In-Reply-To: <20060501113045.GK17552@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 14:30:45 +0300") References: <20060501113045.GK17552@mellanox.co.il> Message-ID: Thanks, I committed a cleaned up version of this. From rdreier at cisco.com Tue May 2 11:53:40 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 May 2006 11:53:40 -0700 Subject: [openib-general] Re: [PATCH 03/12] SRP: Changing ibsrpdm In-Reply-To: <20060501112710.GD17552@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 14:27:10 +0300") References: <20060501112710.GD17552@mellanox.co.il> Message-ID: Ishai> It is nicer to perform the init_work just before the call Ishai> to schedule_work. I disagree... it seems cleaner to initialize the work structure only once, instead of redoing it every time we schedule it. - R. 

From rdreier at cisco.com Tue May 2 11:55:52 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 02 May 2006 11:55:52 -0700
Subject: [openib-general] Re: [PATCH 05/12] SRP: Changing ibsrpdm
In-Reply-To: <20060501112812.GF17552@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 1 May 2006 14:28:12 +0300")
References: <20060501112812.GF17552@mellanox.co.il>
Message-ID:

Seems like it would be simpler to have a "remove" attribute in the
individual SCSI host's sysfs directory (ie put it in the
srp_host_attrs), rather than having to search for a given target among
all the target ports we have.

 - R.

From koop at cse.ohio-state.edu Tue May 2 12:31:29 2006
From: koop at cse.ohio-state.edu (Matthew Koop)
Date: Tue, 2 May 2006 15:31:29 -0400 (EDT)
Subject: [openib-general] Problem running mpdboot command in MVAPICH2 v0.9.3-RC0
In-Reply-To:
Message-ID:

Albert,

Thanks for sending the information. We'll have to take a look at it and
get back to you.

Thanks,

Matthew Koop

> Thanks for helping out on this issue. As requested, the info is
> attached.
>
> Note: The info was obtained from compilations of v0.9.2 and v0.9.3
> back-to-back. Thus, system and setup are exactly the same. Afterwards,
> mpdboot works on v0.9.2 but NOT v0.9.3
> -----Original Message-----
> From: Matthew Koop [mailto:koop at cse.ohio-state.edu]
> Sent: Monday, May 01, 2006 6:56 PM
> To: Albert To
> Cc: wei huang; openib-general at openib.org
> Subject: RE: [openib-general] Problem running mpdboot command in
> MVAPICH2 v0.9.3-RC0
>
> Albert,
>
> Has anything changed with your system since compiling MVAPICH2? I'm a
> bit confused why it would have worked with 0.9.2 and not 0.9.3-RC0.
> There wasn't any change between 0.9.2 and 0.9.3-RC0 that should create
> this type of issue.
>
> Can you try re-compiling and re-running? If you could also send along
> the configure.log generated it may help us look into this issue.
> Also,
> can you just send the output from ls -l /usr/local/lib, just to make
> sure there isn't any problems there?
>
> Thanks,
>
> Matthew Koop
>
> -
> Network-Based Computing Laboratory
> Ohio State University
>
> >
> > Hi Wei,
> >
> > Thanks for a prompt reply.
> >
> > Yes, I did originally export the LD_LIBRARY_PATH in .bashrc as
> followed:
> > export LD_LIBRARY_PATH=/usr/local/lib
> >
> > I've also tried your suggestion:
> > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
> >
> > In either case, issue still exists. Given the same setup, I did NOT
> > see this issue in v0.9.2 (obtained from
> > https://openib.org/svn/gen2/trunk/src/userspace/mpi/mvapich2-gen2).
> >
> > Thanks,
> > Albert
> >
> > ----Original Message-----
> > From: wei huang [mailto:huanwei at cse.ohio-state.edu]
> > Sent: Saturday, April 29, 2006 2:09 PM
> > To: Albert To
> > Cc: openib-general at openib.org
> > Subject: Re: [openib-general] Problem running mpdboot command in
> > MVAPICH2 v0.9.3-RC0
> >
> > Hi Albert,
> >
> > Not sure if you export /usr/local/lib to LD_LIBRARY_PATH manually or
> > it is in your bashrc.
> >
> > Could you please try to put
> > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
> >
> > in your .bashrc (assume using bash) and try again?
> >
> > Thanks.
> >
> > Regards,
> > Wei Huang
> >
> > 774 Dreese Lab, 2015 Neil Ave,
> > Dept. of Computer Science and Engineering Ohio State University OH
> > 43210
> > Tel: (614)292-8501
> >
> >
> > On Fri, 28 Apr 2006, Albert To wrote:
> >
> > > Hi,
> > >
> > > I downloaded and compiled the MVAPICH2 v0.9.3-RC0 using
> > > make.mvapich2.gen2 script. The script finished without any errors.
> > > However, I received "mpdboot: error while loading shared libraries:
> > > libibverbs.so.1: cannot open shared object file: No such file or
> > > directory" error while executing mpdboot -n 2 -f mpd.hosts. I
> > > checked
> >
> > > library file libibverbs.so.1 and found it in /usr/local/lib folder.
> > > LD_LIBRARY_PATH is already set to /usr/local/bin, but that didn't > > help. > > > > > > Is there another environment variable that I need to set to make > > > mpdboot works? Thanks in advance for your help. > > > > > > -Albert > > > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > > From swise at opengridcomputing.com Tue May 2 12:32:05 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 02 May 2006 14:32:05 -0500 Subject: [openib-general] Re: [PATCH] [RFC] dapltest change for iwarp In-Reply-To: References: <1146248558.28503.20.camel@stevo-desktop> Message-ID: <1146598325.19826.22.camel@stevo-desktop> > Steve, > > Thanks for pointing me to the MPA spec. requirement. At this time, I > don't see a way around this. Although I like the idea of transparently > satisfying this requirement as part of the iWARP CMs connection > establishment, that doesn't appear to be viable or standard. > > Any objections to using this compressed version of your patch? No objections. > I > updated the comment to explain the flow control and used the > same portion of code for both the client and server's receive. > Do the other dapltest programs need this too? Like the perf tests and limit tests? I have only run the transaction tests over cxgb3... > test/dapltest/test/dapl_transaction_test.c > =================================================================== > --- test/dapltest/test/dapl_transaction_test.c (revision 6735) > +++ test/dapltest/test/dapl_transaction_test.c (working copy) > @@ -972,38 +972,42 @@ retry: > } /* end foreach op */ > > /* > - * Send our memory info (synchronously) > + * Send our memory info. The client performs the first send to comply > + * with the iWARP MPA protocol's "Connection Startup Rules". 
> */ > DT_Tdep_PT_Debug (1,(phead,"Test[" F64x "]: Sending %s Memory Info\n", > test_ptr->base_port, > test_ptr->is_server ? "Server" : "Client")); > > - /* post the send buffer */ > - if (!DT_post_send_buffer (phead, > + if (!test_ptr->is_server ) { > + > + /* post the send buffer */ > + if (!DT_post_send_buffer (phead, > test_ptr->ep_context[i].ep_handle, > test_ptr->ep_context[i].bp, > RMI_SEND_BUFFER_ID, > buff_size)) > - { > - /* error message printed by DT_post_send_buffer */ > - goto test_failure; > - } > - /* reap the send and verify it */ > - dto_cookie.as_64 = LZERO; > - dto_cookie.as_ptr = > - (DAT_PVOID) DT_Bpool_GetBuffer ( > - test_ptr->ep_context[i].bp, > - RMI_SEND_BUFFER_ID); > - if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || > - !DT_dto_check ( phead, > + { > + /* error message printed by DT_post_send_buffer */ > + goto test_failure; > + } > + /* reap the send and verify it */ > + dto_cookie.as_64 = LZERO; > + dto_cookie.as_ptr = > + (DAT_PVOID) DT_Bpool_GetBuffer ( > + test_ptr->ep_context[i].bp, > + RMI_SEND_BUFFER_ID); > + if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || > + !DT_dto_check ( phead, > &dto_stat, > test_ptr->ep_context[i].ep_handle, > buff_size, > dto_cookie, > test_ptr->is_server ? 
"Client_Mem_Info_Send" > : "Server_Mem_Info_Send")) > - { > - goto test_failure; > + { > + goto test_failure; > + } > } > > /* > @@ -1029,6 +1033,36 @@ retry: > goto test_failure; > } > > + if (test_ptr->is_server ) { > + /* post the send buffer */ > + if (!DT_post_send_buffer (phead, > + test_ptr->ep_context[i].ep_handle, > + test_ptr->ep_context[i].bp, > + RMI_SEND_BUFFER_ID, > + buff_size)) > + { > + /* error message printed by DT_post_send_buffer */ > + goto test_failure; > + } > + /* reap the send and verify it */ > + dto_cookie.as_64 = LZERO; > + dto_cookie.as_ptr = > + (DAT_PVOID) DT_Bpool_GetBuffer ( > + test_ptr->ep_context[i].bp, > + RMI_SEND_BUFFER_ID); > + if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || > + !DT_dto_check ( phead, > + &dto_stat, > + test_ptr->ep_context[i].ep_handle, > + buff_size, > + dto_cookie, > + test_ptr->is_server ? "Client_Mem_Info_Send" > + : "Server_Mem_Info_Send")) > + { > + goto test_failure; > + } > + } > + > /* > * Extract what we need > */ From mst at mellanox.co.il Tue May 2 12:34:31 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 2 May 2006 22:34:31 +0300 Subject: [openib-general] Re: [PATCH 09/12] SRP: Changing ibsrpdm In-Reply-To: References: <20060501113019.GJ17552@mellanox.co.il> Message-ID: <20060502193431.GB2980@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH 09/12] SRP: Changing ibsrpdm > > Ishai> alloca man page on my system says: The alloca() function is > Ishai> machine and compiler dependent. On many systems its > Ishai> implementation is buggy. Its use is discouraged. Lets not > Ishai> use it. > > The man page is talking about non-Linux systems I think. I don't know > of any problems with any modern Linux libc so I think it's better to > have something automatically leak proof. Fine, but note that what Ishai did is also leak-proof. Since the size is constant SIZE_OF_QUERY_RESPONSE, what's the reason for playing tricks with alloca? 
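As a rough illustration of the point being made here, the C sketch below contrasts a fixed-size automatic array with `alloca()` for a buffer whose size is a compile-time constant. Both release their storage when the function returns, so neither can leak. The value given to `SIZE_OF_QUERY_RESPONSE`, the helper `fill_response`, and both function names are assumptions made for this example and are not taken from ibsrpdm.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical value for illustration; the real constant in ibsrpdm may differ. */
#define SIZE_OF_QUERY_RESPONSE 256

/* Hypothetical stand-in for a query that fills a caller-supplied buffer. */
static size_t fill_response(void *buf, size_t len)
{
	assert(buf != NULL);
	memset(buf, 0xab, len);
	return len;
}

/* Fixed-size automatic array: the storage is released when the function returns. */
static size_t query_with_array(void)
{
	uint8_t buf[SIZE_OF_QUERY_RESPONSE];

	return fill_response(buf, sizeof buf);
}

/* alloca(): also released at function return, but the size is passed at run time. */
static size_t query_with_alloca(void)
{
	void *buf = alloca(SIZE_OF_QUERY_RESPONSE);

	return fill_response(buf, SIZE_OF_QUERY_RESPONSE);
}
```

With a constant size the two variants behave the same from the caller's side; the difference is mostly one of style and of how the buffer's type is expressed.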
-- MST From rdreier at cisco.com Tue May 2 12:51:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 May 2006 12:51:17 -0700 Subject: [openib-general] Re: [PATCH 09/12] SRP: Changing ibsrpdm In-Reply-To: <20060502193431.GB2980@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 2 May 2006 22:34:31 +0300") References: <20060501113019.GJ17552@mellanox.co.il> <20060502193431.GB2980@mellanox.co.il> Message-ID: Michael> Fine, but note that what Ishai did is also leak-proof. Michael> Since the size is constant SIZE_OF_QUERY_RESPONSE, what's Michael> the reason for playing tricks with alloca? It's ugly to declare an array of uint8_t and then cast to void *. And I don't really consider alloca() to be a trick. Anyway, I guess I'll fix it up to get rid of that one alloca call. From mst at mellanox.co.il Tue May 2 14:57:26 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 3 May 2006 00:57:26 +0300 Subject: [openib-general] Re: [PATCH 09/12] SRP: Changing ibsrpdm In-Reply-To: References: <20060501113019.GJ17552@mellanox.co.il> <20060502193431.GB2980@mellanox.co.il> Message-ID: <20060502215726.GA20261@mellanox.co.il> Quoting r. Roland Dreier : > It's ugly to declare an array of uint8_t and then cast to void *. That's true. Maybe use a proper structure with actual query format? -- MST From jlentini at netapp.com Tue May 2 14:23:40 2006 From: jlentini at netapp.com (James Lentini) Date: Tue, 2 May 2006 17:23:40 -0400 (EDT) Subject: [openib-general] Re: [PATCH] [RFC] dapltest change for iwarp In-Reply-To: <1146598325.19826.22.camel@stevo-desktop> References: <1146248558.28503.20.camel@stevo-desktop> <1146598325.19826.22.camel@stevo-desktop> Message-ID: On Tue, 2 May 2006, Steve Wise wrote: > > > Steve, > > > > Thanks for pointing me to the MPA spec. requirement. At this time, I > > don't see a way around this. 
Although I like the idea of transparently > > satisfying this requirement as part of the iWARP CMs connection > > establishment, that doesn't appear to be viable or standard. > > > > Any objections to using this compressed version of your patch? > > No objections. Committed in revision 6873. > > I updated the comment to explain the flow control and used the > > same portion of code for both the client and server's receive. > > Do the other dapltest programs need this too? Like the perf tests and > limit tests? I have only run the transaction tests over cxgb3... Yes, there are probably several places where this rule will need to be enforced. > > test/dapltest/test/dapl_transaction_test.c > > =================================================================== > > --- test/dapltest/test/dapl_transaction_test.c (revision 6735) > > +++ test/dapltest/test/dapl_transaction_test.c (working copy) > > @@ -972,38 +972,42 @@ retry: > > } /* end foreach op */ > > > > /* > > - * Send our memory info (synchronously) > > + * Send our memory info. The client performs the first send to comply > > + * with the iWARP MPA protocol's "Connection Startup Rules". > > */ > > DT_Tdep_PT_Debug (1,(phead,"Test[" F64x "]: Sending %s Memory Info\n", > > test_ptr->base_port, > > test_ptr->is_server ? 
"Server" : "Client")); > > > > - /* post the send buffer */ > > - if (!DT_post_send_buffer (phead, > > + if (!test_ptr->is_server ) { > > + > > + /* post the send buffer */ > > + if (!DT_post_send_buffer (phead, > > test_ptr->ep_context[i].ep_handle, > > test_ptr->ep_context[i].bp, > > RMI_SEND_BUFFER_ID, > > buff_size)) > > - { > > - /* error message printed by DT_post_send_buffer */ > > - goto test_failure; > > - } > > - /* reap the send and verify it */ > > - dto_cookie.as_64 = LZERO; > > - dto_cookie.as_ptr = > > - (DAT_PVOID) DT_Bpool_GetBuffer ( > > - test_ptr->ep_context[i].bp, > > - RMI_SEND_BUFFER_ID); > > - if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || > > - !DT_dto_check ( phead, > > + { > > + /* error message printed by DT_post_send_buffer */ > > + goto test_failure; > > + } > > + /* reap the send and verify it */ > > + dto_cookie.as_64 = LZERO; > > + dto_cookie.as_ptr = > > + (DAT_PVOID) DT_Bpool_GetBuffer ( > > + test_ptr->ep_context[i].bp, > > + RMI_SEND_BUFFER_ID); > > + if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || > > + !DT_dto_check ( phead, > > &dto_stat, > > test_ptr->ep_context[i].ep_handle, > > buff_size, > > dto_cookie, > > test_ptr->is_server ? 
"Client_Mem_Info_Send" > > : "Server_Mem_Info_Send")) > > - { > > - goto test_failure; > > + { > > + goto test_failure; > > + } > > } > > > > /* > > @@ -1029,6 +1033,36 @@ retry: > > goto test_failure; > > } > > > > + if (test_ptr->is_server ) { > > + /* post the send buffer */ > > + if (!DT_post_send_buffer (phead, > > + test_ptr->ep_context[i].ep_handle, > > + test_ptr->ep_context[i].bp, > > + RMI_SEND_BUFFER_ID, > > + buff_size)) > > + { > > + /* error message printed by DT_post_send_buffer */ > > + goto test_failure; > > + } > > + /* reap the send and verify it */ > > + dto_cookie.as_64 = LZERO; > > + dto_cookie.as_ptr = > > + (DAT_PVOID) DT_Bpool_GetBuffer ( > > + test_ptr->ep_context[i].bp, > > + RMI_SEND_BUFFER_ID); > > + if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || > > + !DT_dto_check ( phead, > > + &dto_stat, > > + test_ptr->ep_context[i].ep_handle, > > + buff_size, > > + dto_cookie, > > + test_ptr->is_server ? "Client_Mem_Info_Send" > > + : "Server_Mem_Info_Send")) > > + { > > + goto test_failure; > > + } > > + } > > + > > /* > > * Extract what we need > > */ > From rdreier at cisco.com Tue May 2 14:57:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 May 2006 14:57:36 -0700 Subject: [openib-general] [ANNOUNCE] libibverbs 1.0.3 released Message-ID: I just made a 1.0.3 release of libibverbs and pushed it out to the relevant channels, which means that it should appear on http://openib.org/downloads/ shortly. Binary packages will also appear in Debian and Fedora Extras when the builds complete. Changes since 1.0.2 include: - Reduce dependency on libsysfs. Introduce ibv_get_sysfs_path() and ibv_read_sysfs_file() functions for low-level driver use. All libsysfs use will be removed in libibverbs 1.1. - Deprecate ib_XXX symbols and introduce ibv_XXX versions for better consistency in the API. The ib_XXX versions will be removed in libibverbs 1.1. - Add ibv_rate_to_mult() and mult_to_ibv_rate() functions. 
- Fix problems building for pre-V9 sparc ISAs. - Other misc fixes and cleanups. See the ChangeLog in the package for full details. Thanks, Roland From hozer at hozed.org Tue May 2 15:15:45 2006 From: hozer at hozed.org (Troy Benjegerdes) Date: Tue, 2 May 2006 17:15:45 -0500 Subject: [openib-general] Re: TSO and IPoIB performance degradation In-Reply-To: <20060428001629.GA3364@greglaptop.internal.keyresearch.com> References: <20060320090629.GA11352@mellanox.co.il> <20060320.015500.72136710.davem@davemloft.net> <20060320102234.GV29929@mellanox.co.il> <20060320.023704.70907203.davem@davemloft.net> <20060427041323.GX15855@narn.hozed.org> <20060427072352.GB1805@greglaptop.hsd1.ca.comcast.net> <20060427232240.GB3265@esmail.cup.hp.com> <20060428001629.GA3364@greglaptop.internal.keyresearch.com> Message-ID: <20060502221530.GC15855@narn.hozed.org> On Thu, Apr 27, 2006 at 05:16:29PM -0700, Greg Lindahl wrote: > On Thu, Apr 27, 2006 at 04:22:40PM -0700, Grant Grundler wrote: > > > Anything preventnig such a gateway from routing SDP to ethernet? > > Those gateways obviously will grok IB protocols. > > I'm asking becuase I don't understand/know if there is a real > > barrier to an IB -> ethernet gateway _without_ IPoIB. > > I don't know if a SDP to ethernet gateway even exists, but I do know > that it's a lot more work than just an IPoIB to ethernet gateway -- > the gateway is going to have to pass all its data through a TCP stack. > So I would expect SDP to ethernet to not run very fast, especially on > a gateway with lots of streams going. And this is exactly the reason that we should not be playing games with "infiniband specific" TCP optimizations. If you stay on the IB network, use SDP or verbs. If you are going to cross networks, you want to be running the full host TCP stack that has been well tested and is robust to all the kinds of failures you see crossing networks. 
This does not mean that it won't be fast, but you *will* have more overhead than on a single network fabric. If someone has a configuration where full TCP processing on the host is a bottleneck and not the IPoIB to ethernet gateway, then let's have this discussion again. But I don't believe such a configuration actually exists anywhere. If you think you have some problem like this, I would love to be able to run some benchmarks on the system. From rpandit at silverstorm.com Tue May 2 17:10:22 2006 From: rpandit at silverstorm.com (Ranjit Pandit) Date: Tue, 2 May 2006 17:10:22 -0700 Subject: [openib-general] [PATCH] RDMA CM: assign port numbers when binding a cm_id to an address In-Reply-To: References: Message-ID: <96f8e60e0605021710ka1db225gd6300197a71d9718@mail.gmail.com> Sean, Rds uses RDMA_PS_UDP. Here is a patch to add that. Signed-off-by: Ranjit Pandit Index: cma.c =================================================================== --- cma.c (revision 6737) +++ cma.c (working copy) @@ -62,6 +62,7 @@ static struct workqueue_struct *cma_wq; static DEFINE_IDR(sdp_ps); static DEFINE_IDR(tcp_ps); +static DEFINE_IDR(udp_ps); struct cma_device { struct list_head list; @@ -1427,6 +1428,9 @@ case RDMA_PS_TCP: ps = &tcp_ps; break; + case RDMA_PS_UDP: + ps = &udp_ps; + break; default: return -EPROTONOSUPPORT; } @@ -1831,6 +1835,7 @@ destroy_workqueue(cma_wq); idr_destroy(&sdp_ps); idr_destroy(&tcp_ps); + idr_destroy(&udp_ps); } module_init(cma_init); > >> +static int cma_get_port(struct rdma_id_private *id_priv) > >> +{ > >> + struct idr *ps; > >> + int ret; > >> + > >> + switch (id_priv->id.ps) { > >> + case RDMA_PS_SDP: > >> + ps = &sdp_ps; > >> + break; > >> + case RDMA_PS_TCP: > >> + ps = &tcp_ps; > >> + break; > >> + default: > >> + return -EPROTONOSUPPORT; > >> + } > > > >Do you plan to add support for UDP and SCTP since they have > >rdma_port_space values? Is it as simple as adding a UDP and SCTP idr? 
> > Adding the port space values should be as simple as adding the UDP / SCTP idr's. > I'm just not as sure that SCTP support is handled in other locations in the > code, and I don't think it makes sense to have UDP connections. I deferred both > of these for now. > - Sean From huanwei at cse.ohio-state.edu Tue May 2 18:23:25 2006 From: huanwei at cse.ohio-state.edu (wei huang) Date: Tue, 2 May 2006 21:23:25 -0400 (EDT) Subject: [openib-general] Problem running mpdboot command in MVAPICH2 v0.9.3-RC0 In-Reply-To: Message-ID: Hi Albert, It seems a bit weird that you cannot find the library while the path is included in LD_LIBRARY_PATH. To make sure that this is not an environment setup problem, would you please run a small test from one of your cluster nodes: ssh $mc ibv_devinfo where $mc is each of the machines in your hostfile. If everything is fine, you should be able to ssh to each machine and get InfiniBand device information. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Mon, 1 May 2006, Albert To wrote: > Hi Matthew, > > Thanks for helping out on this issue. As requested, the info is > attached. > > Note: The info was obtained from compilations of v0.9.2 and v0.9.3 > back-to-back. Thus, system and setup are exactly the same. 
Afterwards, > mpdboot works on v0.9.2 but NOT v0.9.3 > > Thanks, > Albert > > -----Original Message----- > From: Matthew Koop [mailto:koop at cse.ohio-state.edu] > Sent: Monday, May 01, 2006 6:56 PM > To: Albert To > Cc: wei huang; openib-general at openib.org > Subject: RE: [openib-general] Problem running mpdboot command in > MVAPICH2 v0.9.3-RC0 > > Albert, > > Has anything changed with your system since compiling MVAPICH2? I'm a > bit confused why it would have worked with 0.9.2 and not 0.9.3-RC0. > There wasn't any change between 0.9.2 and 0.9.3-RC0 that should create > this type of issue. > > Can you try re-compiling and re-running? If you could also send along > the configure.log generated it may help us look into this issue. Also, > can you just send the output from ls -l /usr/local/lib, just to make > sure there isn't any problems there? > > Thanks, > > Matthew Koop > > - > Network-Based Computing Laboratory > Ohio State University > > > > > Hi Wei, > > > > Thanks for a prompt reply. > > > > Yes, I did originally export the LD_LIBRARY_PATH in .bashrc as > followed: > > export LD_LIBRARY_PATH=/usr/local/lib > > > > I've also tried your suggestion: > > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH > > > > In either case, issue still exists. Given the same setup, I did NOT > > see this issue in v0.9.2 (obtained from > > https://openib.org/svn/gen2/trunk/src/userspace/mpi/mvapich2-gen2). > > > > Thanks, > > Albert > > > > ----Original Message----- > > From: wei huang [mailto:huanwei at cse.ohio-state.edu] > > Sent: Saturday, April 29, 2006 2:09 PM > > To: Albert To > > Cc: openib-general at openib.org > > Subject: Re: [openib-general] Problem running mpdboot command in > > MVAPICH2 v0.9.3-RC0 > > > > Hi Albert, > > > > Not sure if you export /usr/local/lib to LD_LIBRARY_PATH manually or > > it is in your bashrc. 
> > > > Could you please try to put > > export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH > > > > in your .bashrc (assume using bash) and try again? > > > > Thanks. > > > > Regards, > > Wei Huang > > > > 774 Dreese Lab, 2015 Neil Ave, > > Dept. of Computer Science and Engineering Ohio State University OH > > 43210 > > Tel: (614)292-8501 > > > > > > On Fri, 28 Apr 2006, Albert To wrote: > > > > > Hi, > > > > > > I downloaded and compiled the MVAPICH2 v0.9.3-RC0 using > > > make.mvapich2.gen2 script. The script finished without any errors. > > > However, I received "mpdboot: error while loading shared libraries: > > > libibverbs.so.1: cannot open shared object file: No such file or > > > directory" error while executing mpdboot -n 2 -f mpd.hosts. I > > > checked > > > > > library file libibverbs.so.1 and found it in /usr/local/lib folder. > > > LD_LIBRARY_PATH is already set to /usr/local/bin, but that didn't > > help. > > > > > > Is there another environment variable that I need to set to make > > > mpdboot works? Thanks in advance for your help. > > > > > > -Albert > > > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > > From sean.hefty at intel.com Tue May 2 20:48:28 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 May 2006 20:48:28 -0700 Subject: [openib-general] [PATCH] RDMA CM: assign port numbers when binding a cm_id to an address In-Reply-To: <96f8e60e0605021710ka1db225gd6300197a71d9718@mail.gmail.com> Message-ID: >Rds uses RDMA_PS_UDP. >Here is a patch to add that. I thought that RDS established a connection. (Maybe it should be called a channel multiplexing service?) I don't think that we want to use the RDMA UDP port space for connected QPs. That should be reserved for UD QPs. 
Can't RDS sit over the TCP port space? It seems that RDS would use TCP, rather than UDP, if implemented over sockets, since it requires reliability from the lower layer. - Sean From sean.hefty at intel.com Tue May 2 21:10:45 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 May 2006 21:10:45 -0700 Subject: [openib-general] re cma upcalls serialization / disconnected eventquestion In-Reply-To: <44570EF7.5060109@voltaire.com> Message-ID: >I see, so just to make sure: following rmda_connect i will get always >see one of {ESTABLISHED, REJECTED, CONNECT_ERROR} ? Or DEVICE_REMOVAL, but those are the typical callbacks. - Sean From sean.hefty at intel.com Tue May 2 21:15:39 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 May 2006 21:15:39 -0700 Subject: [openib-general] RFC: detecting duplicate MAD requests In-Reply-To: <1146508240.24063.825.camel@hal.voltaire.com> Message-ID: >> And I don't believe that case 3 exists either, but would end up being treated >as >> DS RMPP by the implementation. > >Why ? Just wondering... > >> If case 3 doesn't exist, then I think we can >> come up with a generic way to identify DS RMPP that doesn't require checking >> class or methods. > >How ? The MAD layer can assume that any RMPP request that expects a response is DS RMPP. This is a simple check that doesn't involve looking at the class or method. If the check is incorrect, and it's an RMPP request followed by a non-RMPP response, then we send an extra ACK of the final ACK. I'm not sure if the extra ACK would cause any problems. >> If case 3 does exist, then I think we'll need class / method >> checking to identify DS RMPP. > >By doesn't exist, do you mean not possible in the architecture or no >current use cases like this ? No use cases. 
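The generic check described above can be sketched in C as follows. This is only an illustration under stated assumptions: `struct mad_request`, its two fields, and `is_ds_rmpp` are hypothetical names invented for this example and are not the kernel MAD layer's actual types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of an outgoing MAD; not the kernel's ib_mad types. */
struct mad_request {
	bool rmpp_active;      /* the request is sent using RMPP */
	bool expects_response; /* the sender waits for a reply */
};

/*
 * Generic check: any RMPP request that expects a response is treated as
 * DS RMPP. Neither the management class nor the method is inspected.
 */
static bool is_ds_rmpp(const struct mad_request *req)
{
	assert(req != NULL);
	return req->rmpp_active && req->expects_response;
}
```

If the assumption turns out to be wrong for some request (an RMPP request answered by a non-RMPP response), the cost described in the message is one extra ACK of the final ACK.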
- Sean From sean.hefty at intel.com Tue May 2 21:24:33 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 May 2006 21:24:33 -0700 Subject: [openib-general] RE: Re: [PATCH v2] mad: use GID/LID on requester sidewhen matching responses to requests In-Reply-To: <20060430012531.GA15584@mellanox.co.il> Message-ID: >It probably would be better to commit it as a separate patch -- one >idea per patch." > >so I understand he's fine with it, and the comment was with regard to >how to commit this - first core files, then MAD files. > >Anyway, its trivial to split the patch, if you want help with that let me know. I'm a little slow to respond because I'm out of the office. I see that you split the patches, so I'll look at them tomorrow. - Sean From sean.hefty at intel.com Tue May 2 21:27:55 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 May 2006 21:27:55 -0700 Subject: [openib-general] [RFC] [PATCH 1/3] RDMA CM:add rdma_get/set_optioncalls to get/set path records In-Reply-To: Message-ID: >Agreed... as an interface to userspace, get/set opt makes sense, but >inside the kernel you just end up with a dispatch function that >demultiplexes things to the real work. So I think the real work >functions should be the kernel API. The dispatch function would still be there for userspace, but I'll export the kernel functions. What I'm more concerned with is whether it's acceptable to call copy_to_user / copy_from_user on kernel memory? - Sean From rdreier at cisco.com Tue May 2 21:34:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 02 May 2006 21:34:15 -0700 Subject: [openib-general] [RFC] [PATCH 1/3] RDMA CM:add rdma_get/set_optioncalls to get/set path records In-Reply-To: (Sean Hefty's message of "Tue, 2 May 2006 21:27:55 -0700") References: Message-ID: Sean> What I'm more concerned with is whether it's acceptable to Sean> call copy_to_user / copy_from_user on kernel memory? No, I don't think you should do that. 
It might work by chance but it really sounds like something is wrong with the design if you want to do that. For one thing it will make it impossible to have sparse check __user annotations. Why wouldn't you just do the copying in the place where you dispatch user functions to the kernel handlers? - R. From Nitin.Hande at Sun.COM Wed May 3 00:36:37 2006 From: Nitin.Hande at Sun.COM (Nitin Hande) Date: Wed, 03 May 2006 00:36:37 -0700 Subject: [openib-general] OpenIB Linux and Solaris In-Reply-To: References: Message-ID: <44585D85.1000304@sun.com> Paul, Solaris has its own stack implementation of the IB components. We do run some basic interoperability tests on various components (I can confirm this for IPoIB) between the Solaris and OpenIB stacks. Thanks Nitin Paul Baxter wrote: > Can anybody comment on recent experience regarding inter-operability > of OpenIB/Linux on Intel 64 bit x86 and Sparx/Solaris using the Sun > stack? > > Ideally inter-operability would start at IPoIB and the SM but perhaps > extend to inter-operable SDP? > > Has anyone tried porting some userspace low-level OpenIB verbs comms > (UC, RDMA write specifically). I am not experienced enough to > understand if there are any gotchas with unimplemented features in > either the Solaris or OpenIB implementations. I've not had much luck > investigating the equivalent of userspace verbs support for Solaris. > > I notice that some of the commercial stacks offer cross platform > support and say 'OpenIB support, Linux, Windows, Solaris and Mac OS > X'. I suspect this might be subtle wording as I don't think OpenIB is > ported toSolaris, ULP support is using some Solaris 10 drivers instead? > > Advice/ URL pointers appreciated (Google wasn't my friend!) 
> > Paul Baxter From hofegger at ips.at Wed May 3 02:51:05 2006 From: hofegger at ips.at (Patrick Hofegger) Date: Wed, 03 May 2006 11:51:05 +0200 Subject: [openib-general] reserve_mtt_segs:Cannot reserve 511 MTT segments Message-ID: <44587D09.9090403@ips.at> Hello, Please look at the following syslog output. About 30 minutes later, the computer froze completely. A reboot resumed normal operations, but the problem is serious, as the syslog shows. Thanks in advance for your help. May 2 14:38:29 k02 kernel: THH(4): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/tptm.c[262]: reserve_mtt_segs: Cannot reserve 511 MTT segments (506 dynamic MTT segments left) May 2 14:38:29 k02 kernel: THH(4): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/tptm.c[1114]: alloc_reg_pages: Out of MTT entries May 2 14:38:29 k02 kernel: THH(1): XHH_mrwm_register_mr: rc=HH_EAGAIN May 2 14:38:29 k02 kernel: VIPKL(1): [MM_create_mr]:MM_mr_get_keys failed May 2 14:38:29 k02 kernel: THH(4): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/tptm.c[262]: reserve_mtt_segs: Cannot reserve 511 MTT segments (506 dynamic MTT segments left) May 2 14:38:29 k02 kernel: THH(4): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/tptm.c[1114]: alloc_reg_pages: Out of MTT entries May 2 14:38:29 k02 kernel: THH(1): XHH_mrwm_register_mr: rc=HH_EAGAIN May 2 14:38:29 k02 kernel: VIPKL(1): [MM_create_mr]:MM_mr_get_keys failed May 2 14:38:30 k02 kernel: THH(4): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/tptm.c[262]: reserve_mtt_segs: Cannot reserve 511 MTT segments (506 dynamic MTT segments left) May 2 14:38:30 k02 kernel: THH(4): 
var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/tptm.c[1114]: alloc_reg_pages: Out of MTT entries May 2 14:38:30 k02 kernel: THH(1): XHH_mrwm_register_mr: rc=HH_EAGAIN (many more messages cut) -- Ing. Patrick Hofegger Systems Engineer IPS GmbH Franzosengraben 10 A - 1030 Vienna T# +43 1 796 86 86 - 52 F# +43 1 796 86 86 - 15 http://www.ips.at 
From akpm at osdl.org Wed May 3 05:43:12 2006 From: akpm at osdl.org (Andrew Morton) Date: Wed, 3 May 2006 05:43:12 -0700 Subject: [openib-general] Re: [PATCH 00/16] ehca: IBM eHCA InfiniBand Device Driver In-Reply-To: <20060427125726.GK32127@wohnheim.fh-wedel.de> References: <4450B378.9000705@de.ibm.com> <20060427125726.GK32127@wohnheim.fh-wedel.de> Message-ID: <20060503054312.b3978297.akpm@osdl.org> On Thu, 27 Apr 2006 14:57:26 +0200 Jörn Engel wrote: > Don't expect much cheer and rejoicing over this. I suspect that akpm > or Linus will either want the 17 patches merged into one or have a > patchset where every single patch leaves the kernel in a working > state, including working eHCA driver. It doesn't matter in this case. The "don't break the build at any stage of a series" preference exists because it's extremely irritating to hit a won't-build in the middle of a git-bisect operation. But anybody who is bisection searching for a bug won't want to enable a brand-new driver in their config, so no problems. From RAISCH at de.ibm.com Wed May 3 06:56:34 2006 From: RAISCH at de.ibm.com (Christoph Raisch) Date: Wed, 3 May 2006 15:56:34 +0200 Subject: [openib-general] Re: [PATCH 13/16] ehca: firmware InfiniBand interface In-Reply-To: <17489.18630.75412.66803@cargo.ozlabs.ibm.com> Message-ID: Paul Mackerras wrote on 28.04.2006 00:42:14: > Mind you, since a lot of the parameters are used to return individual > bytes or half-words, which are then put into structures, it might be > better to pass the pointers to the structures and let the wrapper put > the values straight into the structures. > > Paul. As Paul already mentioned we can't change the firmware interface. 
...so we would propose the following solution: For the two h_call wrappers with more than 8 parameters we'll change to the following signature: hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle, struct ehca_cq *cq, /* used for input and output parameters */ const struct ipz_eq_handle eq_handle); hipz_h_alloc_resource_qp(const struct ipz_adapter_handle adapter_handle, struct ehca_qp * qp, /* used for input and output parameters */ struct ehca_alloc_qp_params * param); /*input params not in ehca_qp*/ hipz_h_alloc_resource_mr(const struct ipz_adapter_handle adapter_handle, struct ehca_mr *mr, const u64 vaddr, const u64 length, const u32 access_ctrl, const struct ipz_pd pd); u64 hipz_h_query_mr(const struct ipz_adapter_handle adapter_handle, struct ehca_mr *mr); u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle, struct ehca_mr *mr, const u64 vaddr_in, const u64 length, const u32 access_ctrl, const struct ipz_pd pd, const u64 mr_addr_cb); u64 hipz_h_register_smr(const struct ipz_adapter_handle adapter_handle, struct ehca_mr *mr, struct ehca_mr *orig_mr, const u64 vaddr_in, const u32 access_ctrl, const struct ipz_pd pd); What do you think about this solution? Gruss / Regards . . . Christoph Raisch From krause at cup.hp.com Wed May 3 09:50:11 2006 From: krause at cup.hp.com (Michael Krause) Date: Wed, 03 May 2006 09:50:11 -0700 Subject: [openib-general] re RDS missing features In-Reply-To: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> References: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> Message-ID: <6.2.0.14.2.20060503094518.02c59c38@esmail.cup.hp.com> At 10:42 AM 5/1/2006, Ranjit Pandit wrote: >On 5/1/06, Or Gerlitz wrote: >>Can you elaborate on each of the features, specifically the following >>points are of interest to us: >> >>+1 so you running Oracle Loopback traffic over RDS sockets? if yes, what >>the issue here? 
>>the openib CMA supports listen/connect on loopback addresses (eg >>127.0.0.1 or IPoIB local address) > >Yes. >There is no issue. It's just next in line for me to implement. > >> >>+2 by failover, are you referring to APM? that is failover between IB >>pathes to/from the same HCA >>over which the original connection/QP was established or you are talking >>on failover between HCAs > >Failover within and across HCAs. APM does not work for failover across HCAs. That is because two different types of failover are being discussed. APM is completely transparent to the IB RC connections, so there is no disruption or loss of data. Failover across HCAs is, in effect, replaying ULP transactions across a new RC connection. Without an application / ULP level acknowledgement, there is still a hole in the RDS proposal, one that has been raised and acknowledged in the past, as recently as the Sonoma get-together. I still have not seen a response to my inquiry about this ULP and its API changes being at least comprehended beyond the Oracle usage model, and perhaps being reviewed within the IETF, given that it represents changes in API and communication semantics. If the goal is to have RDS be a generic service, then it should be reviewed and validated by other potential consumers as well as by those subsystems that may be impacted. Mike >>+3 is the no support for /proc like for RDS an issue to run crload or >>demo Oracle (that is specific tuning >> and usage of non defaults is needed for any/optimal operation) > >No, this does not affect core functionality. You should be able to run >Oracle or crload without this feature. > >That was a list of things that still need to be implemented for GA and >not just demo > >> >>Or. >> >>[openfabrics-ewg] Before we can start testing - we needto ensure that >>RDS is fully ported. >> >>Pandit, Ranjit rpandit at silverstorm.com >> >>Following features are yet to be implemented in OpenFabric Rds: >> >> 1. Failover >>2. Loopback connections >>3. 
support for /proc fs like Rds config, stats and info. >> >> >> >>Ranjit >> >> >> >>_______________________________________________ >>openib-general mailing list >>openib-general at openib.org >>http://openib.org/mailman/listinfo/openib-general >> >>To unsubscribe, please visit >>http://openib.org/mailman/listinfo/openib-general >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.hefty at intel.com Wed May 3 09:55:55 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 3 May 2006 09:55:55 -0700 Subject: [openib-general] [RFC] [PATCH 1/3] RDMA CM:add rdma_get/set_optioncalls to get/set path records In-Reply-To: Message-ID: >No, I don't think you should do that. It might work by chance but it >really sounds like something is wrong with the design if you want to >do that. For one thing it will make it impossible to have sparse >check __user annotations. It did work with my tests, but wasn't sure if it was guaranteed to work. I had a lot of trouble following the networking stack, but it seemed to pass char * in places, then cast to __user. >Why wouldn't you just do the copying in the place where you dispatch >user functions to the kernel handlers? I wanted to avoid duplicating the functionality, and protect against device removal, which is handled by the lower module. I will rework the patch. - Sean From sean.hefty at intel.com Wed May 3 10:04:34 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 3 May 2006 10:04:34 -0700 Subject: [openib-general] [rds]: there is a kernel oops while loading theRDS module in kernel 2.6.11 In-Reply-To: <200604301129.55041.dotanb@mellanox.co.il> Message-ID: >Trace:{:ib_local_sa:sa_db_init+148} Can you reproduce this just loading the ib_local_sa module? 
- Sean From mshefty at ichips.intel.com Wed May 3 10:40:31 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 03 May 2006 10:40:31 -0700 Subject: [openib-general] Re: [PATCH] SRP: Avoid a potential deadlock In-Reply-To: References: <20060501113548.GN17552@mellanox.co.il> Message-ID: <4458EB0F.6060506@ichips.intel.com> Roland Dreier wrote: > I thought that after the DREP is received, the CM will go through > timewait and we will eventually get a TIMEWAIT_EXIT event (with a > completion). Am I wrong? Have you actually seen this deadlock happen > in practice? This should be the case. TIMEWAIT_EXIT should follow DREP_RECEIVED. - Sean From halr at voltaire.com Wed May 3 10:50:59 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 03 May 2006 13:50:59 -0400 Subject: [openib-general] sdp code in trunk In-Reply-To: <20060501153555.GK3032@mellanox.co.il> References: <20060501153555.GK3032@mellanox.co.il> Message-ID: <1146678659.4719.29930.camel@hal.voltaire.com> On Mon, 2006-05-01 at 11:35, Michael S. Tsirkin wrote: > Hello! > I have replaced the SDP code on trunk with new, much smaller code base, > based on CMA. Note that only bcopy mode is supported. > The old sdp code has been moved to > https://openib.org/svn/gen2/branches/sdp_historic Isn't the usual way to first put out an RFC indicating you intend on doing this to solicit feedback as others may be using/depending on the trunk version ? > Please note that smaller LOC count does not mean less bugs yet - in fact, while > the CMA code (mostly sdp_cma.c) is ready and works well for me, the data > transfer part is in active development, and I'm aware of several race > condition/data corruption issues which prevent it from being generally useful > just yet, and which I am in the process of addressing. Just wondering what the rush was to replace the existing trunk code with some less capable and more buggy ? Should this have waited until that was not the case ? 
-- Hal From mshefty at ichips.intel.com Wed May 3 11:11:06 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 03 May 2006 11:11:06 -0700 Subject: [openib-general] [PATCH 1 of 2] add lmc cache In-Reply-To: <20060430012834.GA15657@mellanox.co.il> References: <20060430012834.GA15657@mellanox.co.il> Message-ID: <4458F23A.7070900@ichips.intel.com> Michael S. Tsirkin wrote: > Add LMC cache. Committed - thanks! From mshefty at ichips.intel.com Wed May 3 11:44:54 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 03 May 2006 11:44:54 -0700 Subject: [openib-general] [PATCH 2 of 2] mad: check GID/LID when searching for request In-Reply-To: <20060430012922.GA15663@mellanox.co.il> References: <20060430012922.GA15663@mellanox.co.il> Message-ID: <4458FA26.3030100@ichips.intel.com> Applied - thanks. - Sean From ralphc at pathscale.com Wed May 3 11:58:17 2006 From: ralphc at pathscale.com (Ralph Campbell) Date: Wed, 03 May 2006 11:58:17 -0700 Subject: [openib-general] [PATCH] fix core dump if umad_register() fails Message-ID: <1146682697.13951.139.camel@brick.internal.keyresearch.com> This patch fixes a core dump problem if debugging is on and umad_register() fails. 
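
[Editorial illustration, not part of the original message or patch.] The patch changes one conversion in the DEBUG format string from %s to 0x%x for the management class argument. Assuming that argument is an integer, as the replacement conversion suggests, %s would make printf treat the integer as a string pointer, which is undefined behaviour and a plausible source of the core dump. A minimal sketch of the corrected formatting follows; the helper name format_reg_failure is hypothetical, and the glibc-specific %m conversion used in the original message is left out to keep the sketch portable.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from libibumad): formats a registration
 * failure message with the management class printed as a hex integer.
 * Printing an integer with %s instead would be undefined behaviour.
 */
static int format_reg_failure(char *buf, size_t len, int portid, int qpn,
			      int mgmt_class, int version, unsigned int oui)
{
	return snprintf(buf, len,
			"portid %d registering qp %d class 0x%x version %d oui 0x%x failed",
			portid, qpn, (unsigned int) mgmt_class, version, oui);
}
```

Compilers such as gcc can flag this kind of mismatch at build time when format checking is enabled for the logging function.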
Signed-off-by: Ralph Campbell

Index: src/userspace/management/libibumad/src/umad.c
===================================================================
--- src/userspace/management/libibumad/src/umad.c (revision 6885)
+++ src/userspace/management/libibumad/src/umad.c (working copy)
@@ -888,7 +888,7 @@
 		return req.id;		/* return agentid */
 	}
-	DEBUG("portid %d registering qp %d class %s version %d oui 0x%x failed: %m",
+	DEBUG("portid %d registering qp %d class 0x%x version %d oui 0x%x failed: %m",
 		portid, req.qpn, req.mgmt_class, req.mgmt_class_version, oui);
 	return -EPERM;
 }

--
Ralph Campbell

From halr at voltaire.com Wed May 3 12:59:49 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 03 May 2006 15:59:49 -0400
Subject: [openib-general] [PATCH] fix core dump if umad_register() fails
In-Reply-To: <1146682697.13951.139.camel@brick.internal.keyresearch.com>
References: <1146682697.13951.139.camel@brick.internal.keyresearch.com>
Message-ID: <1146686387.4719.31403.camel@hal.voltaire.com>

On Wed, 2006-05-03 at 14:58, Ralph Campbell wrote:
> This patch fixes a core dump problem if debugging is on and
> umad_register() fails.
>
> Signed-off-by: Ralph Campbell

Thanks. Applied to both trunk and 1.0 branch.

Also, found another instance of same problem and fixed it.

-- Hal

From ralphc at pathscale.com Wed May 3 13:34:33 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Wed, 03 May 2006 13:34:33 -0700
Subject: [openib-general] [PATCH] fix madrpc_init() to use correct rmpp version
Message-ID: <1146688473.13951.143.camel@brick.internal.keyresearch.com>

I noticed that perfquery wasn't working and tracked it down to
madrpc_init() using the wrong rmpp version.
This patch fixes the problem.
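
[Editorial illustration, not part of the original message or patch.] The patch moves the declaration of rmpp_version from function scope into the body of the registration loop, so the value is reset for every management class. The sketch below shows the general pattern under stated assumptions: the constant EXAMPLE_SA_CLASS, the helper pick_rmpp_versions, and the rule that only that one class uses RMPP version 1 are all hypothetical stand-ins, since the body of the original loop is not shown in the patch.

```c
#include <stddef.h>

/* Hypothetical class id standing in for a class that needs RMPP. */
enum { EXAMPLE_SA_CLASS = 3 };

/*
 * Hypothetical helper: records the RMPP version chosen for each class.
 * Declaring rmpp_version inside the loop resets it to 0 on every
 * iteration, so a value set for one class cannot leak into the next.
 */
static void pick_rmpp_versions(const int *classes, size_t n, int *out)
{
	for (size_t i = 0; i < n; i++) {
		int rmpp_version = 0;	/* fresh value per class */

		if (classes[i] == EXAMPLE_SA_CLASS)
			rmpp_version = 1;
		out[i] = rmpp_version;
	}
}
```

With the variable declared once outside the loop, every class listed after EXAMPLE_SA_CLASS would inherit version 1 in this sketch, which matches the kind of symptom described in the message.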
Signed-off-by: Ralph Campbell

Index: src/userspace/management/libibmad/src/rpc.c
===================================================================
--- src/userspace/management/libibmad/src/rpc.c (revision 6885)
+++ src/userspace/management/libibmad/src/rpc.c (working copy)
@@ -269,8 +269,6 @@
 void
 madrpc_init(char *dev_name, int dev_port, int *mgmt_classes, int num_classes)
 {
-	int rmpp_version = 0;
-
 	if (umad_init() < 0)
 		IBPANIC("can't init UMAD library");
@@ -278,6 +276,7 @@
 		IBPANIC("can't open UMAD port (%s:%d)", dev_name, dev_port);
 	while (num_classes--) {
+		int rmpp_version = 0;
 		int mgmt = *mgmt_classes++;
 		if (mgmt == IB_SA_CLASS)

--
Ralph Campbell

From ralphc at pathscale.com Wed May 3 14:30:10 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Wed, 03 May 2006 14:30:10 -0700
Subject: [openib-general] [PATCH] sysfs display for local_link_integrity_errors broken
Message-ID: <1146691810.13951.149.camel@brick.internal.keyresearch.com>

The code to display local_link_integrity_errors and
excessive_buffer_overrun_errors in
/sys/class/infiniband//ports//counters/
had a bug extracting the 4 bit values.
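
[Editorial illustration, not part of the original message or patch.] The fix replaces the shift amount (offset % 4) with (4 - (offset % 8)) when extracting a 4-bit counter from a byte. The sketch below compares the two expressions on a plain byte array, assuming that bit offsets count from the most significant bit of each byte, which is what the corrected expression implies. The helper names get_nibble and get_nibble_old are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper using the corrected shift: a field at bit offset
 * 0 within a byte is the high nibble (shift 4), a field at bit offset 4
 * is the low nibble (shift 0).
 */
static unsigned int get_nibble(const uint8_t *data, size_t bit_offset)
{
	assert(bit_offset % 4 == 0);	/* 4-bit fields are nibble aligned */
	return (data[bit_offset / 8] >> (4 - (bit_offset % 8))) & 0xf;
}

/*
 * The pre-patch shift, kept for comparison: for nibble-aligned offsets
 * (bit_offset % 4) is always 0, so the low nibble is returned even when
 * the high nibble was requested.
 */
static unsigned int get_nibble_old(const uint8_t *data, size_t bit_offset)
{
	return (data[bit_offset / 8] >> (bit_offset % 4)) & 0xf;
}
```

For a byte 0xA5, the corrected helper yields 0xA at bit offset 0 and 0x5 at bit offset 4, while the old expression yields 0x5 for both, so two counters sharing one byte would display the same value.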
Signed-off-by: Ralph Campbell

Index: src/linux-kernel/infiniband/core/sysfs.c
===================================================================
--- src/linux-kernel/infiniband/core/sysfs.c (revision 6885)
+++ src/linux-kernel/infiniband/core/sysfs.c (working copy)
@@ -336,7 +336,7 @@
 	switch (width) {
 	case 4:
 		ret = sprintf(buf, "%u\n", (out_mad->data[40 + offset / 8] >>
-					    (offset % 4)) & 0xf);
+					    (4 - (offset % 8))) & 0xf);
 		break;
 	case 8:
 		ret = sprintf(buf, "%u\n", out_mad->data[40 + offset / 8]);

--
Ralph Campbell

From halr at voltaire.com Wed May 3 14:28:26 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 03 May 2006 17:28:26 -0400
Subject: [openib-general] [PATCH] fix madrpc_init() to use correct rmpp version
In-Reply-To: <1146688473.13951.143.camel@brick.internal.keyresearch.com>
References: <1146688473.13951.143.camel@brick.internal.keyresearch.com>
Message-ID: <1146691705.4719.32525.camel@hal.voltaire.com>

On Wed, 2006-05-03 at 16:34, Ralph Campbell wrote:
> I noticed that perfquery wasn't working and tracked it down to
> madrpc_init() using the wrong rmpp version.
> This patch fixes the problem.
>
> Signed-off-by: Ralph Campbell

Thanks. Applied to both trunk and 1.0 branch.

-- Hal

From sweitzen at cisco.com Wed May 3 16:05:03 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 3 May 2006 16:05:03 -0700
Subject: [openib-general] RE: [openfabrics-ewg] Current OFED kernel snapshot - problems in backporting SRP to RH4
Message-ID:

> > Known issues:
> > 1. ipath installation fails on 2.6.9 - 2.6.11* kernels
> > 2. OSU MPI compilation fails on SLES10, PPC64
> > 3. SRP is not supported on 2.6.9 - 2.6.13* kernels - Ishai
> will follow up with details
> > 4. Open MPI RPM build process fails - Jeff, will you be
> able to send us fixes by Wed?

Do we have any progress on the MPI and SRP issues?
Scott From sweitzen at cisco.com Wed May 3 16:15:30 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 3 May 2006 16:15:30 -0700 Subject: [openib-general] uDAPL not supported on ppc64? Message-ID: > > I get this trying to compile uDAPL using install.sh with IBED > > 1.0 rc3 on RHEL4 U2 2.6.9-22 ppc64: > > > > WARNING: Dapl is not supported on PPC64 arcitecture > > WARNING: Dapl is not supported on PPC64 arcitecture > > > There are include files that map DAT-defined types > to architecture appropriate choices. Just fill in > the correct choices for PPC64 and submit a patch. > > Don't be afraid to ask for clarification on the > semantics of any types, but with the examples > already given it should be fairly clear. > I opened bug #48 for this issue. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From sweitzen at cisco.com Wed May 3 16:49:55 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 3 May 2006 16:49:55 -0700 Subject: [openib-general] IPoIB ifconfig HWaddr blank on RHEL4 U3? Message-ID: OFED 1.0 rc3 on RHEL4 U3. IPoIB is working, but I just noticed the HWaddr is 00-00-00-00-00-00-00-00-00-00-00-00-00-00-0, shouldn't this have the GID? 
[root at svbu-qa1850-4 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:13:72:50:B7:D1
          inet addr:172.29.238.49  Bcast:172.29.239.255  Mask:255.255.252.0
          inet6 addr: fe80::213:72ff:fe50:b7d1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:31232 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13122 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20539406 (19.5 MiB)  TX bytes:1415914 (1.3 MiB)
          Base address:0xdcc0 Memory:dfae0000-dfb00000

ib0       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-0
          inet addr:192.168.2.49  Bcast:192.168.3.255  Mask:255.255.252.0
          inet6 addr: fe80::202:c902:21:51d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:839425 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4384118 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:44110930 (42.0 MiB)  TX bytes:8046551416 (7.4 GiB)

ib1       Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-0
          inet addr:192.168.4.49  Bcast:192.168.5.255  Mask:255.255.254.0
          inet6 addr: fe80::202:c902:21:51e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:364 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:46824 (45.7 KiB)  TX bytes:408 (408.0 b)

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

From sashak at voltaire.com Wed May 3 17:05:39 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 4 May 2006 03:05:39 +0300
Subject: [openib-general] [PATCH] opensm: prevent ports duplication in partition config
Message-ID: <20060504000539.GC3689@sashak.voltaire.com>

Hello Hal,

Here is a fix for the case when a port is repeatedly configured as a
member of the same partition. If the membership is different, this may
break the pkey table update code.

Sasha.
Signed-off-by: Sasha Khapyorsky
---

 osm/opensm/osm_prtn.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/osm/opensm/osm_prtn.c b/osm/opensm/osm_prtn.c
index d2ed77b..1ebecb7 100644
--- a/osm/opensm/osm_prtn.c
+++ b/osm/opensm/osm_prtn.c
@@ -127,7 +127,8 @@ ib_api_status_t osm_prtn_add_port(osm_lo
 		return status;
 	}
-	if (osm_prtn_is_guid(p, guid)) {
+	if (cl_map_remove(&p->part_guid_tbl, guid) ||
+	    cl_map_remove(&p->full_guid_tbl, guid)) {
 		osm_log(p_log, OSM_LOG_VERBOSE, "osm_prtn_add_port: "
 			"port 0x%" PRIx64 " already in "
 			"partition \'%s\' (0x%04x). Will overwrite\n",

From rdreier at cisco.com Wed May 3 17:04:17 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 03 May 2006 17:04:17 -0700
Subject: [openib-general] Re: [openfabrics-ewg] IPoIB ifconfig HWaddr blank on RHEL4 U3?
In-Reply-To: (Scott Weitzenkamp's message of "Wed, 3 May 2006 16:49:55 -0700")
References:
Message-ID:

    Scott> OFED 1.0 rc3 on RHEL4 U3. IPoIB is working, but I just
    Scott> noticed the HWaddr is
    Scott> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-0, shouldn't this
    Scott> have the GID?

This is basically the same bug as the tcpdump problem on RHEL kernels
I think. "ip addr" should show it properly.

 - R.

From sweitzen at cisco.com Wed May 3 17:09:02 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 3 May 2006 17:09:02 -0700
Subject: [openib-general] RE: [openfabrics-ewg] Current OFED kernel snapshot
Message-ID:

> > > Known issues:
> > > 1. ipath installation fails on 2.6.9 - 2.6.11* kernels
> > > 2. OSU MPI compilation fails on SLES10, PPC64
> > > 3. SRP is not supported on 2.6.9 - 2.6.13* kernels - Ishai
> > will follow up with details
> > > 4. Open MPI RPM build process fails - Jeff, will you be
> > able to send us fixes by Wed?
>
> Do we have any progress on the MPI and SRP issues?

I opened bug #49 regarding OSU MPI not compiling on PPC64, it's
assigned to the default owner huanwei at cse.ohio-state.edu.
http://openib.org/bugzilla/show_bug.cgi?id=49 Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From sashak at voltaire.com Wed May 3 17:26:25 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 May 2006 03:26:25 +0300 Subject: [openib-general] Re: [PATCH 2/4] opensm: remove unused osm_pkey_mgr_t object In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAD5@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAD5@mtlexch01.mtl.com> Message-ID: <20060504002625.GD3689@sashak.voltaire.com> Hello Eitan, On 11:41 Tue 02 May , Eitan Zahavi wrote: > > I really do not like this patch. I think that although it does not break > the code TODAY, it will be reversed later. > OpenSM uses the concept of "manager" for each of the algorithms used. > One could claim that all these managers are redundant and could be > replaced by an extension to the osm object. This is true but will result > with a non clear boundary between the managers. "manager" concept is fine, but I don't see how useless structure should help in implementing this. OTOH there are tons of duplications and unnecessary code in OpenSM today - we need to improve this. > Although there is no right or wrong on this kind of issues, I think that > the winning argument is that today OpenSM is written according to the > above simple rule. Hmm, so what is your argument - "it is so now, don't change it"? But the goal is to improve things. Sasha. 
From sashak at voltaire.com Wed May 3 17:51:33 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 May 2006 03:51:33 +0300 Subject: [openib-general] Re: [PATCH 4/4] opensm: no need to wait for pkey_mgr In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAD7@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAD7@mtlexch01.mtl.com> Message-ID: <20060504005133.GE3689@sashak.voltaire.com> Hello Eitan, On 11:55 Tue 02 May , Eitan Zahavi wrote: > > This patch breaks the basic concept for OpenSM to report "SUBNET UP" > only if all SubnMgt.Set were sent successfully. I do not think we want > to remove this feature. This patch does not break the basic concept (or just in case if there are bugs). Instead, pkey updater after sending all set requests will let for the next OpenSM resweep component to run (specifically it is LID assignment manager). Then final WAIT state will "collect" responses from both components. I think my subject was unclear, and better name would be "parallel pkey and lid managers execution" or like this. Of course such "parallelism" is possible only with "independent" components (may be more than two BTW), where functionality of one component does not depend on results of other component's work. So actual question with this patch is could "pkey manager" and "lid assignment" run in parallel? Sasha. 
From krkumar2 at in.ibm.com Wed May 3 22:35:11 2006 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Thu, 4 May 2006 11:05:11 +0530 Subject: [openib-general] [RFC] [PATCH] ib_unregister_client() Message-ID: Whenever a client is started, it registers with each device present on the system. Every client is always registered exactly once (RFC part here - so far no client registers itself twice but should we care for such a scenario happening in future?), hence there is one context per client and ib_unregister_client() can break on matching that client. diff -ruNp a/core/device.c b/core/device.c --- a/core/device.c 2006-05-04 10:49:03.000000000 +0530 +++ b/core/device.c 2006-05-04 10:51:50.000000000 +0530 @@ -349,6 +349,7 @@ void ib_unregister_client(struct ib_clie if (context->client == client) { list_del(&context->list); kfree(context); + break; } spin_unlock_irqrestore(&device->client_data_lock, flags); } Thanks, - KK -------------- next part -------------- A non-text attachment was scrubbed...
Name: unregister_client.patch Type: application/octet-stream Size: 388 bytes Desc: not available URL: From eitan at mellanox.co.il Wed May 3 23:01:17 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 4 May 2006 09:01:17 +0300 Subject: [openib-general] RE: [PATCH 2/4] opensm: remove unused osm_pkey_mgr_t object Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAE9@mtlexch01.mtl.com> Hi Sasha, I think changing the basic concept of a "manager" in OpenSM is not just a cleanup issue. I am for improving the code - but not for breaking its basic architecture. If you find dead code or unused code - let's fix it. But please try to keep the "structure" untouched. I have a many ideas for how OpenSM could be re-written in a better way too. (Like avoiding SA code duplication by using C++ or C virtual functions) but I do not think it is a small change - but rather a big one (actually a re-write). One day we might decide a re-write of the SM is required but this should not be taken lightly as it would probably take a significant effort and a few years to get back to the current status. Eitan Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Thursday, May 04, 2006 3:26 AM > To: Eitan Zahavi > Cc: Hal Rosenstock; openib-general at openib.org; Ofer Gigi; Yael Kalka > Subject: Re: [PATCH 2/4] opensm: remove unused osm_pkey_mgr_t object > > Hello Eitan, > > On 11:41 Tue 02 May , Eitan Zahavi wrote: > > > > I really do not like this patch. I think that although it does not break > > the code TODAY, it will be reversed later. > > OpenSM uses the concept of "manager" for each of the algorithms used. > > One could claim that all these managers are redundant and could be > > replaced by an extension to the osm object. This is true but will result > > with a non clear boundary between the managers. 
> > "manager" concept is fine, but I don't see how useless structure should > help in implementing this. OTOH there are tons of duplications and > unnecessary code in OpenSM today - we need to improve this. > > > Although there is no right or wrong on this kind of issues, I think that > > the winning argument is that today OpenSM is written according to the > > above simple rule. > > Hmm, so what is your argument - "it is so now, don't change it"? But the > goal is to improve things. > > Sasha. From eitan at mellanox.co.il Wed May 3 23:11:16 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 4 May 2006 09:11:16 +0300 Subject: [openib-general] RE: [PATCH 4/4] opensm: no need to wait for pkey_mgr Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAEA@mtlexch01.mtl.com> Hi Sasha, So how would the partition manager signal to the LID manager that some transaction were made and needed to be waited on? Normally a manager reports back the need for waiting in its return code that is assigned as the next signal to the state machine. So if this signal is not provided from the partition manager (which by your patch returns void) how would the state machine actually wait for it if the LID manager returns OSM_SIGNAL_DONE ? Although this can be fixed by forwarding the result of the partition manager to affect the LID manager next signal calculation I would rather not do it. Keeping the state machine as "linear" as possible have great merits in avoiding extra complexity and bugs. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. 
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Thursday, May 04, 2006 3:52 AM > To: Eitan Zahavi > Cc: Hal Rosenstock; openib-general at openib.org; Ofer Gigi; Yael Kalka > Subject: Re: [PATCH 4/4] opensm: no need to wait for pkey_mgr > > Hello Eitan, > > On 11:55 Tue 02 May , Eitan Zahavi wrote: > > > > This patch breaks the basic concept for OpenSM to report "SUBNET UP" > > only if all SubnMgt.Set were sent successfully. I do not think we want > > to remove this feature. > > This patch does not break the basic concept (or just in case if there > are bugs). Instead, pkey updater after sending all set requests will let > for the next OpenSM resweep component to run (specifically it is LID > assignment manager). Then final WAIT state will "collect" responses from > both components. > > I think my subject was unclear, and better name would be "parallel pkey > and lid managers execution" or like this. > > Of course such "parallelism" is possible only with "independent" > components (may be more than two BTW), where functionality of one > component does not depend on results of other component's work. > > So actual question with this patch is could "pkey manager" and "lid > assignment" run in parallel? > > Sasha. From bugzilla-daemon at openib.org Thu May 4 01:15:09 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 4 May 2006 01:15:09 -0700 (PDT) Subject: [openib-general] [Bug 36] enabling the rdma_ucm causes kernel oops in the host Message-ID: <20060504081509.76B8A2283D7@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=36 dotanb at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at openib.org Thu May 4 01:15:29 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 4 May 2006 01:15:29 -0700 (PDT) Subject: [openib-general] [Bug 36] enabling the rdma_ucm causes kernel oops in the host Message-ID: <20060504081529.C499422854A@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=36 dotanb at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Thu May 4 01:16:16 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 4 May 2006 01:16:16 -0700 (PDT) Subject: [openib-general] [Bug 36] enabling the rdma_ucm causes kernel oops in the host Message-ID: <20060504081616.19F4B2283D7@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=36 ------- Additional Comments From dotanb at mellanox.co.il 2006-05-04 01:16 ------- issue was closed, the name of the workqueue was too long (and causes kernel oops in older kernels < 2.6.12). ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tziporet at mellanox.co.il Thu May 4 03:14:30 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 04 May 2006 13:14:30 +0300 Subject: [openib-general] sdp code in trunk In-Reply-To: <1146678659.4719.29930.camel@hal.voltaire.com> References: <20060501153555.GK3032@mellanox.co.il> <1146678659.4719.29930.camel@hal.voltaire.com> Message-ID: <4459D406.4070109@mellanox.co.il> Hal Rosenstock wrote: > Just wondering what the rush was to replace the existing trunk code with > some less capable and more buggy ? Should this have waited until that > was not the case ? 
> > > The new code is going to be part of OFED 1.0 thus I think it is important that everybody will review and report issues against it. Old code is still available and any one can use it if needed. Note also that while the previous code was 16074 lines the new code is only 2033 and its much cleaner and written in Linux standards. Tziporet From tziporet at mellanox.co.il Thu May 4 03:16:53 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 04 May 2006 13:16:53 +0300 Subject: [openib-general] Do we want the change of not using libsysfs in OFED? In-Reply-To: References: <44562B46.6010304@mellanox.co.il> Message-ID: <4459D495.3060202@mellanox.co.il> Roland Dreier wrote: > Actually the trunk still uses libsysfs, it just uses it less. There > should be no functional change, but of course there's always the > chance of regression. So there's no strong reason to merge the > changes from the trunk. > Thus we will not take the new changes to OFED unless you will put the kernel changes into git for 2.6.17. Tziporet From halr at voltaire.com Thu May 4 03:17:06 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 May 2006 06:17:06 -0400 Subject: [openib-general] sdp code in trunk In-Reply-To: <4459D406.4070109@mellanox.co.il> References: <20060501153555.GK3032@mellanox.co.il> <1146678659.4719.29930.camel@hal.voltaire.com> <4459D406.4070109@mellanox.co.il> Message-ID: <1146737826.4719.40931.camel@hal.voltaire.com> On Thu, 2006-05-04 at 06:14, Tziporet Koren wrote: > Hal Rosenstock wrote: > > Just wondering what the rush was to replace the existing trunk code with > > some less capable and more buggy ? Should this have waited until that > > was not the case ? > > > > > > > The new code is going to be part of OFED 1.0 Is it in the upcoming OFED rc ? > thus I think it is > important that everybody will review and report issues against it. 
It could have been done the other way 'round as well with the new SDP on a new branch as other ULPs have done prior to being ready for the trunk and all the same goals that you mention accomplished. > Old code is still available and any one can use it if needed. > Note also that while the previous code was 16074 lines the new code is > only 2033 and its much cleaner and written in Linux standards. Not exactly a fair comparison until the features are the same. I am confident there will be fewer LOC though. -- Hal > Tziporet > > > From glebn at voltaire.com Thu May 4 04:47:15 2006 From: glebn at voltaire.com (Gleb Natapov) Date: Thu, 4 May 2006 14:47:15 +0300 Subject: [openib-general] [RFC][PATCH] adding call to madvise Message-ID: <20060504114715.GE5319@minantech.com> Hello Roland, Included patch adds call to madvise(MADV_DO[NT]FORK) to libibverbs and libmthca.
In libibverbs it uses memory.c to do reference counting on overlapping user registrations and in libmthca it marks all internal qp/cq memory. The MADV_DOFORK/MADV_DONTFORK defines not yet propagate to libc so I added them in local header files just to be able to compile. I think the proper way to handle this is in configure. Suggestions are welcome. Note that this patch also changes ABI since struct ibv_mr is bigger now. Index: libibverbs/include/infiniband/verbs.h =================================================================== --- libibverbs/include/infiniband/verbs.h (revision 6750) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -289,6 +289,8 @@ struct ibv_mr { uint32_t handle; uint32_t lkey; uint32_t rkey; + void *addr; + size_t length; }; struct ibv_global_route { Index: libibverbs/src/verbs.c =================================================================== --- libibverbs/src/verbs.c (revision 6750) +++ libibverbs/src/verbs.c (working copy) @@ -154,10 +154,15 @@ struct ibv_mr *ibv_reg_mr(struct ibv_pd { struct ibv_mr *mr; + ibv_dontfork_range(addr, length); mr = pd->context->ops.reg_mr(pd, addr, length, access); if (mr) { mr->context = pd->context; mr->pd = pd; + mr->addr = addr; + mr->length = length; + } else { + ibv_dofork_range(addr, length); } return mr; @@ -165,7 +170,12 @@ struct ibv_mr *ibv_reg_mr(struct ibv_pd int ibv_dereg_mr(struct ibv_mr *mr) { - return mr->context->ops.dereg_mr(mr); + int rc = mr->context->ops.dereg_mr(mr); + + if (!rc) + ibv_dofork_range(mr->addr, mr->length); + + return rc; } static struct ibv_comp_channel *ibv_create_comp_channel_v2(struct ibv_context *context) Index: libibverbs/src/ibverbs.h =================================================================== --- libibverbs/src/ibverbs.h (revision 6750) +++ libibverbs/src/ibverbs.h (working copy) @@ -61,8 +61,8 @@ extern HIDDEN int abi_ver; extern HIDDEN int ibverbs_init(struct ibv_device ***list); extern HIDDEN int ibv_init_mem_map(void); -extern HIDDEN int 
ibv_lock_range(void *base, size_t size); -extern HIDDEN int ibv_unlock_range(void *base, size_t size); +extern HIDDEN int ibv_dontfork_range(void *base, size_t size); +extern HIDDEN int ibv_dofork_range(void *base, size_t size); #define IBV_INIT_CMD(cmd, size, opcode) \ do { \ @@ -85,4 +85,11 @@ extern HIDDEN int ibv_unlock_range(void (cmd)->response = (uintptr_t) (out); \ } while (0) +#ifndef MADV_DONTFORK +#define MADV_DONTFORK 10 +#endif +#ifndef MADV_DOFORK +#define MADV_DOFORK 11 +#endif + #endif /* IB_VERBS_H */ Index: libibverbs/src/memory.c =================================================================== --- libibverbs/src/memory.c (revision 6750) +++ libibverbs/src/memory.c (working copy) @@ -136,7 +136,7 @@ static void __mm_remove(struct ibv_mem_n node->next->prev = node->prev; } -int ibv_lock_range(void *base, size_t size) +int ibv_dontfork_range(void *base, size_t size) { uintptr_t start, end; struct ibv_mem_node *node, *tmp; @@ -187,8 +187,8 @@ int ibv_lock_range(void *base, size_t si if (node->refcnt++ == 0) { - ret = mlock((void *) node->start, - node->end - node->start + 1); + ret = madvise((void *) node->start, + node->end - node->start + 1, MADV_DONTFORK); if (ret) goto out; } @@ -202,7 +202,7 @@ out: return ret; } -int ibv_unlock_range(void *base, size_t size) +int ibv_dofork_range(void *base, size_t size) { uintptr_t start, end; struct ibv_mem_node *node, *tmp; @@ -226,8 +226,8 @@ int ibv_unlock_range(void *base, size_t while (node && node->end <= end) { if (--node->refcnt == 0) { - ret = munlock((void *) node->start, - node->end - node->start + 1); + ret = madvise((void *) node->start, + node->end - node->start + 1, MADV_DOFORK); } if (__mm_prev(node) && node->refcnt == __mm_prev(node)->refcnt) { Index: libmthca/src/mthca.h =================================================================== --- libmthca/src/mthca.h (revision 6750) +++ libmthca/src/mthca.h (working copy) @@ -341,4 +341,10 @@ void mthca_free_av(struct mthca_ah *ah); int 
mthca_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); int mthca_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); +#ifndef MADV_DONTFORK +#define MADV_DONTFORK 10 +#endif +#ifndef MADV_DOFORK +#define MADV_DOFORK 11 +#endif #endif /* MTHCA_H */ Index: libmthca/src/verbs.c =================================================================== --- libmthca/src/verbs.c (revision 6750) +++ libmthca/src/verbs.c (working copy) @@ -134,6 +134,9 @@ static struct ibv_mr *__mthca_reg_mr(str return NULL; } + mr->addr = addr; + mr->length = length; + return mr; } @@ -188,6 +191,7 @@ struct ibv_cq *mthca_create_cq(struct ib if (!cq->buf) goto err; + madvise(cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DONTFORK); cq->mr = __mthca_reg_mr(to_mctx(context)->pd, cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE, 0, IBV_ACCESS_LOCAL_WRITE); @@ -247,6 +251,7 @@ err_unreg: mthca_dereg_mr(cq->mr); err_buf: + madvise(cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DOFORK); free(cq->buf); err: @@ -278,6 +283,7 @@ int mthca_resize_cq(struct ibv_cq *ibcq, goto out; } + madvise(buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DONTFORK); mr = __mthca_reg_mr(to_mctx(ibcq->context)->pd, buf, cqe * MTHCA_CQ_ENTRY_SIZE, 0, IBV_ACCESS_LOCAL_WRITE); @@ -302,12 +308,14 @@ int mthca_resize_cq(struct ibv_cq *ibcq, mthca_cq_resize_copy_cqes(cq, buf, old_cqe); mthca_dereg_mr(cq->mr); + madvise(cq->mr->addr, cq->mr->length, MADV_DOFORK); free(cq->buf); cq->buf = buf; cq->mr = mr; out: + madvise(buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DOFORK); pthread_spin_unlock(&cq->lock); return ret; } @@ -328,6 +336,7 @@ int mthca_destroy_cq(struct ibv_cq *cq) } mthca_dereg_mr(to_mcq(cq)->mr); + madvise(to_mcq(cq)->mr->addr, to_mcq(cq)->mr->length, MADV_DOFORK); free(to_mcq(cq)->buf); free(to_mcq(cq)); @@ -381,6 +390,7 @@ struct ibv_srq *mthca_create_srq(struct if (mthca_alloc_srq_buf(pd, &attr->attr, srq)) goto err; + madvise(srq->buf, srq->buf_size, MADV_DONTFORK); srq->mr = __mthca_reg_mr(pd, srq->buf, srq->buf_size, 0, 
0); if (!srq->mr) goto err_free; @@ -421,6 +431,7 @@ err_unreg: mthca_dereg_mr(srq->mr); err_free: + madvise(srq->buf, srq->buf_size, MADV_DOFORK); free(srq->wrid); free(srq->buf); @@ -460,6 +471,7 @@ int mthca_destroy_srq(struct ibv_srq *sr to_msrq(srq)->db_index); mthca_dereg_mr(to_msrq(srq)->mr); + madvise(to_msrq(srq)->mr->addr, to_msrq(srq)->mr->length, MADV_DOFORK); free(to_msrq(srq)->buf); free(to_msrq(srq)->wrid); @@ -499,6 +511,7 @@ struct ibv_qp *mthca_create_qp(struct ib pthread_spin_init(&qp->rq.lock, PTHREAD_PROCESS_PRIVATE)) goto err_free; + madvise(qp->buf, qp->buf_size, MADV_DONTFORK); qp->mr = __mthca_reg_mr(pd, qp->buf, qp->buf_size, 0, 0); if (!qp->mr) goto err_free; @@ -565,6 +578,7 @@ err_unreg: mthca_dereg_mr(qp->mr); err_free: + madvise(qp->buf, qp->buf_size, MADV_DOFORK); free(qp->wrid); free(qp->buf); @@ -647,6 +661,7 @@ int mthca_destroy_qp(struct ibv_qp *qp) } mthca_dereg_mr(to_mqp(qp)->mr); + madvise(to_mqp(qp)->mr->addr, to_mqp(qp)->mr->length, MADV_DOFORK); free(to_mqp(qp)->buf); free(to_mqp(qp)->wrid); Index: libmthca/src/ah.c =================================================================== --- libmthca/src/ah.c (revision 6750) +++ libmthca/src/ah.c (working copy) @@ -64,8 +64,10 @@ static struct mthca_ah_page *__add_page( return NULL; } + madvise(page->buf, page_size, MADV_DONTFORK); page->mr = mthca_reg_mr(&pd->ibv_pd, page->buf, page_size, 0); if (!page->mr) { + madvise(page->buf, page_size, MADV_DOFORK); free(page->buf); free(page); return NULL; @@ -183,6 +185,7 @@ void mthca_free_av(struct mthca_ah *ah) page->next->prev = page->prev; mthca_dereg_mr(page->mr); + madvise(page->mr->addr, page->mr->length, MADV_DOFORK); free(page->buf); free(page); } -- Gleb. From monis at voltaire.com Thu May 4 04:59:19 2006 From: monis at voltaire.com (Moni Shoua) Date: Thu, 04 May 2006 14:59:19 +0300 Subject: [openib-general] Re: [openfabrics-ewg] IPoIB ifconfig HWaddr blank on RHEL4 U3? 
In-Reply-To: References: Message-ID: <4459EC97.6050309@voltaire.com> An HTML attachment was scrubbed... URL: From ogerlitz at voltaire.com Thu May 4 06:00:52 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 04 May 2006 16:00:52 +0300 Subject: [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction In-Reply-To: <445606D5.20907@voltaire.com> References: <445606D5.20907@voltaire.com> Message-ID: <4459FB04.3090302@voltaire.com> Or Gerlitz wrote: > Sean Hefty wrote: >>> +static void iser_disconnected_handler(struct rdma_cm_id *cma_id) >>> +{ >>> + struct iser_conn *ib_conn; >>> + >>> + ib_conn = (struct iser_conn *)cma_id->context; >>> + ib_conn->disc_evt_flag = 1; >>> + >>> + /* If this event is unsolicited this means that the conn is >>> being */ >>> + /* terminated asynchronously from the iSCSI layer's >>> perspective. */ >>> + if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) { >>> + atomic_set(&ib_conn->state, ISER_CONN_DOWN); >>> + wake_up_interruptible(&ib_conn->wait); >>> + } else { >>> + if (atomic_read(&ib_conn->state) == ISER_CONN_UP) { >>> + atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); >>> + iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, >>> + ISCSI_ERR_CONN_FAILED); >>> + } >>> + /* Complete the termination process if no posts are pending */ >>> + if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) && >>> + (atomic_read(&ib_conn->post_send_buf_count) == 0)) { >>> + atomic_set(&ib_conn->state, ISER_CONN_DOWN); >>> + wake_up_interruptible(&ib_conn->wait); >>> + } >>> + } >> Are there races here between reading ib_conn->state and setting it? >> Could it have changed in between the atomic_read() and atomic_set()? > It seems that indeed a race is possible here, i am rethinking now on the > implementation of the ib connection states moves, thanks for pointing this. 
Following a review and the clarification i have got from you re cma callbacks serialization, i have committed this change which removes unneeded state checks from two flows (disconnect handler and connect error) Or. r6900 | ogerlitz | 2006-05-04 11:06:24 +0300 (Thu, 04 May 2006) | 7 lines two fixes to iser ib conn state management: +1 when getting DISCONNECTED cma event, iser's state can't be PENDING +2 when connect_error is called, iser's state is PENDING, no need to check it Signed-off-by: Or Gerlitz context; - if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) { - atomic_set(&ib_conn->state, ISER_CONN_DOWN); - wake_up_interruptible(&ib_conn->wait); - } else - iser_err("Unexpected evt for conn.state: %d\n", - atomic_read(&ib_conn->state)); + atomic_set(&ib_conn->state, ISER_CONN_DOWN); + wake_up_interruptible(&ib_conn->wait); } static void iser_addr_handler(struct rdma_cm_id *cma_id) @@ -386,21 +382,16 @@ static void iser_disconnected_handler(st /* If this event is unsolicited this means that the conn is being */ /* terminated asynchronously from the iSCSI layer's perspective. 
*/ - if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) { + if (atomic_read(&ib_conn->state) == ISER_CONN_UP) { + atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); + iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, + ISCSI_ERR_CONN_FAILED); + } + /* Complete the termination process if no posts are pending */ + if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) && + (atomic_read(&ib_conn->post_send_buf_count) == 0)) { atomic_set(&ib_conn->state, ISER_CONN_DOWN); wake_up_interruptible(&ib_conn->wait); - } else { - if (atomic_read(&ib_conn->state) == ISER_CONN_UP) { - atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); - iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, - ISCSI_ERR_CONN_FAILED); - } - /* Complete the termination process if no posts are pending */ - if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) && - (atomic_read(&ib_conn->post_send_buf_count) == 0)) { - atomic_set(&ib_conn->state, ISER_CONN_DOWN); - wake_up_interruptible(&ib_conn->wait); - } } } From ogerlitz at voltaire.com Thu May 4 06:06:09 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 04 May 2006 16:06:09 +0300 Subject: [openib-general] [PATCH 5/6] iser RDMA CM (CMA) and IB verbs interaction In-Reply-To: <4459FB04.3090302@voltaire.com> References: <445606D5.20907@voltaire.com> <4459FB04.3090302@voltaire.com> Message-ID: <4459FC41.3070700@voltaire.com> Or Gerlitz wrote: > Or Gerlitz wrote: >> Sean Hefty wrote: >>>> +static void iser_disconnected_handler(struct rdma_cm_id *cma_id) >>>> +{ >>>> + struct iser_conn *ib_conn; >>>> + >>>> + ib_conn = (struct iser_conn *)cma_id->context; >>>> + ib_conn->disc_evt_flag = 1; >>>> + >>>> + /* If this event is unsolicited this means that the conn is >>>> being */ >>>> + /* terminated asynchronously from the iSCSI layer's >>>> perspective. 
*/ >>>> + if (atomic_read(&ib_conn->state) == ISER_CONN_PENDING) { >>>> + atomic_set(&ib_conn->state, ISER_CONN_DOWN); >>>> + wake_up_interruptible(&ib_conn->wait); >>>> + } else { >>>> + if (atomic_read(&ib_conn->state) == ISER_CONN_UP) { >>>> + atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); >>>> + iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, >>>> + ISCSI_ERR_CONN_FAILED); >>>> + } >>>> + /* Complete the termination process if no posts are pending */ >>>> + if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) && >>>> + (atomic_read(&ib_conn->post_send_buf_count) == 0)) { >>>> + atomic_set(&ib_conn->state, ISER_CONN_DOWN); >>>> + wake_up_interruptible(&ib_conn->wait); >>>> + } >>>> + } > >>> Are there races here between reading ib_conn->state and setting it? >>> Could it have changed in between the atomic_read() and atomic_set()? >> It seems that indeed a race is possible here, i am rethinking now on >> the implementation of the ib connection states moves, thanks for >> pointing this. > Following a review and the clarification i have got from you re cma > callbacks serialization, i have committed this change which removes > unneeded state checks from two flows (disconnect handler and connect error) This is the actual fix to the possible races you were pointing on, thanks for your feedback. Or. r6924 | ogerlitz | 2006-05-04 16:03:21 +0300 (Thu, 04 May 2006) | 7 lines changed iser ib conn state management to be done with an int variable keeping the state and a lock. When a related race is possible the lock is used to check (comp) or change (comp_exch) the state. When no race can happen the state is just examined or changed. 
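The compare (comp) and compare-and-exchange (comp_exch) pattern described in this commit message can be sketched in user space as follows. This is only an illustration under stated assumptions, not the iSER code: the names conn, conn_state, conn_init, conn_state_comp and conn_state_comp_exch are hypothetical, and a pthread mutex stands in for the kernel spinlock used in the driver.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the connection states named in the thread. */
enum conn_state {
	CONN_INIT,
	CONN_PENDING,
	CONN_UP,
	CONN_TERMINATING,
	CONN_DOWN
};

/* A plain state variable protected by a lock (assumed layout). */
struct conn {
	pthread_mutex_t lock;
	enum conn_state state;
};

static void conn_init(struct conn *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->state = CONN_INIT;
}

/* Return 1 if the state equals comp; the read happens under the lock. */
static int conn_state_comp(struct conn *c, enum conn_state comp)
{
	int ret;

	pthread_mutex_lock(&c->lock);
	ret = (c->state == comp);
	pthread_mutex_unlock(&c->lock);
	return ret;
}

/*
 * If the state equals comp, set it to exch and return 1; otherwise leave
 * it unchanged and return 0. The check and the update happen under the
 * same lock hold, so no other thread can change the state in between.
 */
static int conn_state_comp_exch(struct conn *c, enum conn_state comp,
				enum conn_state exch)
{
	int ret;

	pthread_mutex_lock(&c->lock);
	ret = (c->state == comp);
	if (ret)
		c->state = exch;
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

If two paths both attempt the UP to TERMINATING transition through conn_state_comp_exch(), only the first caller gets 1 back, so follow-up work tied to that transition runs once. A separate read followed by a separate write would not give that guarantee.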
Signed-off-by: Or Gerlitz lock); + ret = (ib_conn->state == comp); + spin_unlock_bh(&ib_conn->lock); + return ret; +} + +static int iser_conn_state_comp_exch(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp, + enum iser_ib_conn_state exch) +{ + int ret; + + spin_lock_bh(&ib_conn->lock); + if ((ret = (ib_conn->state == comp))) + ib_conn->state = exch; + spin_unlock_bh(&ib_conn->lock); + return ret; +} + /** * triggers start of the disconnect procedures and wait for them to be done */ @@ -294,12 +318,17 @@ void iser_conn_terminate(struct iser_con { int err = 0; - atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); - err = rdma_disconnect(ib_conn->cma_id); - if (err) - iser_bug("Failed to disconnect, conn: 0x%p err %d\n",ib_conn,err); + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) { + err = rdma_disconnect(ib_conn->cma_id); + if (err) + iser_err("Failed to disconnect, conn: 0x%p err %d\n", + ib_conn,err); + + } + wait_event_interruptible(ib_conn->wait, - (atomic_read(&ib_conn->state) == ISER_CONN_DOWN)); + ib_conn->state == ISER_CONN_DOWN); iser_conn_release(ib_conn); } @@ -309,7 +338,7 @@ static void iser_connect_error(struct rd struct iser_conn *ib_conn; ib_conn = (struct iser_conn *)cma_id->context; - atomic_set(&ib_conn->state, ISER_CONN_DOWN); + ib_conn->state = ISER_CONN_DOWN; wake_up_interruptible(&ib_conn->wait); } @@ -369,7 +398,7 @@ static void iser_connected_handler(struc struct iser_conn *ib_conn; ib_conn = (struct iser_conn *)cma_id->context; - atomic_set(&ib_conn->state, ISER_CONN_UP); + ib_conn->state = ISER_CONN_UP; wake_up_interruptible(&ib_conn->wait); } @@ -380,17 +409,17 @@ static void iser_disconnected_handler(st ib_conn = (struct iser_conn *)cma_id->context; ib_conn->disc_evt_flag = 1; - /* If this event is unsolicited this means that the conn is being */ - /* terminated asynchronously from the iSCSI layer's perspective. 
*/ - if (atomic_read(&ib_conn->state) == ISER_CONN_UP) { - atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); + /* getting here when the state is UP means that the conn is being * + * terminated asynchronously from the iSCSI layer's perspective. */ + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, ISCSI_ERR_CONN_FAILED); - } + /* Complete the termination process if no posts are pending */ if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) && (atomic_read(&ib_conn->post_send_buf_count) == 0)) { - atomic_set(&ib_conn->state, ISER_CONN_DOWN); + ib_conn->state = ISER_CONN_DOWN; wake_up_interruptible(&ib_conn->wait); } } @@ -444,13 +473,14 @@ int iser_conn_init(struct iser_conn **ib iser_err("can't alloc memory for struct iser_conn\n"); return -ENOMEM; } - atomic_set(&ib_conn->state, ISER_CONN_INIT); + ib_conn->state = ISER_CONN_INIT; init_waitqueue_head(&ib_conn->wait); atomic_set(&ib_conn->post_recv_buf_count, 0); atomic_set(&ib_conn->post_send_buf_count, 0); INIT_WORK(&ib_conn->comperror_work, iser_comp_error_worker, ib_conn); INIT_LIST_HEAD(&ib_conn->conn_list); + spin_lock_init(&ib_conn->lock); *ibconn = ib_conn; return 0; @@ -477,7 +507,7 @@ int iser_connect(struct iser_conn *ib_ iser_err("connecting to: %d.%d.%d.%d, port 0x%x\n", NIPQUAD(dst_addr->sin_addr), dst_addr->sin_port); - atomic_set(&ib_conn->state, ISER_CONN_PENDING); + ib_conn->state = ISER_CONN_PENDING; ib_conn->cma_id = rdma_create_id(iser_cma_handler, (void *)ib_conn, @@ -498,9 +528,9 @@ int iser_connect(struct iser_conn *ib_ if (!non_blocking) { wait_event_interruptible(ib_conn->wait, - atomic_read(&ib_conn->state) != ISER_CONN_PENDING); + (ib_conn->state != ISER_CONN_PENDING)); - if (atomic_read(&ib_conn->state) != ISER_CONN_UP) { + if (ib_conn->state != ISER_CONN_UP) { err = -EIO; goto connect_failure; } @@ -514,7 +544,7 @@ int iser_connect(struct iser_conn *ib_ id_failure: ib_conn->cma_id = NULL; addr_failure: 
- atomic_set(&ib_conn->state, ISER_CONN_DOWN); + ib_conn->state = ISER_CONN_DOWN; connect_failure: iser_conn_release(ib_conn); return err; @@ -527,7 +557,7 @@ void iser_conn_release(struct iser_conn { struct iser_device *device = ib_conn->device; - BUG_ON(atomic_read(&ib_conn->state) != ISER_CONN_DOWN); + BUG_ON(ib_conn->state != ISER_CONN_DOWN); mutex_lock(&ig.connlist_mutex); list_del(&ib_conn->conn_list); @@ -719,16 +749,17 @@ static void iser_comp_error_worker(void { struct iser_conn *ib_conn = data; - if (atomic_read(&ib_conn->state) == ISER_CONN_UP) { - atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); + /* getting here when the state is UP means that the conn is being * + * terminated asynchronously from the iSCSI layer's perspective. */ + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, ISCSI_ERR_CONN_FAILED); - } /* complete the termination process if disconnect event was delivered * * note there are no more non completed posts to the QP */ if (ib_conn->disc_evt_flag) { - atomic_set(&ib_conn->state, ISER_CONN_DOWN); + ib_conn->state = ISER_CONN_DOWN; wake_up_interruptible(&ib_conn->wait); } } Index: iser_initiator.c =================================================================== --- iser_initiator.c (revision 6900) +++ iser_initiator.c (revision 6924) @@ -370,7 +370,7 @@ int iser_send_command(struct iscsi_conn struct iscsi_cmd *hdr = ctask->hdr; struct scsi_cmnd *sc = ctask->sc; - if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) { + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); return -EPERM; } @@ -454,7 +454,7 @@ int iser_send_data_out(struct iscsi_conn unsigned int itt; int err = 0; - if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) { + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { iser_err("Failed to send, conn: 0x%p is not up\n", 
iser_conn->ib_conn); return -EPERM; } @@ -528,7 +528,7 @@ int iser_send_control(struct iscsi_conn struct iser_regd_buf *regd_buf; struct iser_device *device; - if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) { + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); return -EPERM; } Index: iscsi_iser.c =================================================================== --- iscsi_iser.c (revision 6900) +++ iscsi_iser.c (revision 6924) @@ -649,13 +649,13 @@ iscsi_iser_ep_poll(__u64 ep_handle, int return -EINVAL; rc = wait_event_interruptible_timeout(ib_conn->wait, - atomic_read(&ib_conn->state) == ISER_CONN_UP, + ib_conn->state == ISER_CONN_UP, msecs_to_jiffies(timeout_ms)); /* if conn establishment failed, return error code to iscsi */ if (!rc && - (atomic_read(&ib_conn->state) == ISER_CONN_TERMINATING || - atomic_read(&ib_conn->state) == ISER_CONN_DOWN)) + (ib_conn->state == ISER_CONN_TERMINATING || + ib_conn->state == ISER_CONN_DOWN)) rc = -1; iser_err("ib conn %p rc = %d\n", ib_conn, rc); @@ -676,12 +676,9 @@ iscsi_iser_ep_disconnect(__u64 ep_handle if (!ib_conn) return; - iser_err("ib conn %p state %d\n",ib_conn, atomic_read(&ib_conn->state)); + iser_err("ib conn %p state %d\n",ib_conn, ib_conn->state); - if (atomic_read(&ib_conn->state) == ISER_CONN_UP) - iser_conn_terminate(ib_conn); - else - iser_conn_release(ib_conn); + iser_conn_terminate(ib_conn); } static struct scsi_host_template iscsi_iser_sht = { From ogerlitz at voltaire.com Thu May 4 06:09:31 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 4 May 2006 16:09:31 +0300 (IDT) Subject: [openib-general] [PATCH] change iser ib connection state management Message-ID: Sean, This is the patch i just sent to openib and lkml, i see now that the other email had some line wrapping problems, so here's again. Or. 
r6924 | ogerlitz | 2006-05-04 16:03:21 +0300 (Thu, 04 May 2006) | 7 lines changed iser ib conn state management to be done with an int variable keeping the state and a lock. When a related race is possible the lock is used to check (comp) or change (comp_exch) the state. When no race can happen the state is just examined or changed. Signed-off-by: Or Gerlitz lock); + ret = (ib_conn->state == comp); + spin_unlock_bh(&ib_conn->lock); + return ret; +} + +static int iser_conn_state_comp_exch(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp, + enum iser_ib_conn_state exch) +{ + int ret; + + spin_lock_bh(&ib_conn->lock); + if ((ret = (ib_conn->state == comp))) + ib_conn->state = exch; + spin_unlock_bh(&ib_conn->lock); + return ret; +} + /** * triggers start of the disconnect procedures and wait for them to be done */ @@ -294,12 +318,17 @@ void iser_conn_terminate(struct iser_con { int err = 0; - atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); - err = rdma_disconnect(ib_conn->cma_id); - if (err) - iser_bug("Failed to disconnect, conn: 0x%p err %d\n",ib_conn,err); + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) { + err = rdma_disconnect(ib_conn->cma_id); + if (err) + iser_err("Failed to disconnect, conn: 0x%p err %d\n", + ib_conn,err); + + } + wait_event_interruptible(ib_conn->wait, - (atomic_read(&ib_conn->state) == ISER_CONN_DOWN)); + ib_conn->state == ISER_CONN_DOWN); iser_conn_release(ib_conn); } @@ -309,7 +338,7 @@ static void iser_connect_error(struct rd struct iser_conn *ib_conn; ib_conn = (struct iser_conn *)cma_id->context; - atomic_set(&ib_conn->state, ISER_CONN_DOWN); + ib_conn->state = ISER_CONN_DOWN; wake_up_interruptible(&ib_conn->wait); } @@ -369,7 +398,7 @@ static void iser_connected_handler(struc struct iser_conn *ib_conn; ib_conn = (struct iser_conn *)cma_id->context; - atomic_set(&ib_conn->state, ISER_CONN_UP); + ib_conn->state = ISER_CONN_UP; wake_up_interruptible(&ib_conn->wait); } @@ -380,17 +409,17 @@ 
static void iser_disconnected_handler(st ib_conn = (struct iser_conn *)cma_id->context; ib_conn->disc_evt_flag = 1; - /* If this event is unsolicited this means that the conn is being */ - /* terminated asynchronously from the iSCSI layer's perspective. */ - if (atomic_read(&ib_conn->state) == ISER_CONN_UP) { - atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); + /* getting here when the state is UP means that the conn is being * + * terminated asynchronously from the iSCSI layer's perspective. */ + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, ISCSI_ERR_CONN_FAILED); - } + /* Complete the termination process if no posts are pending */ if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) && (atomic_read(&ib_conn->post_send_buf_count) == 0)) { - atomic_set(&ib_conn->state, ISER_CONN_DOWN); + ib_conn->state = ISER_CONN_DOWN; wake_up_interruptible(&ib_conn->wait); } } @@ -444,13 +473,14 @@ int iser_conn_init(struct iser_conn **ib iser_err("can't alloc memory for struct iser_conn\n"); return -ENOMEM; } - atomic_set(&ib_conn->state, ISER_CONN_INIT); + ib_conn->state = ISER_CONN_INIT; init_waitqueue_head(&ib_conn->wait); atomic_set(&ib_conn->post_recv_buf_count, 0); atomic_set(&ib_conn->post_send_buf_count, 0); INIT_WORK(&ib_conn->comperror_work, iser_comp_error_worker, ib_conn); INIT_LIST_HEAD(&ib_conn->conn_list); + spin_lock_init(&ib_conn->lock); *ibconn = ib_conn; return 0; @@ -477,7 +507,7 @@ int iser_connect(struct iser_conn *ib_ iser_err("connecting to: %d.%d.%d.%d, port 0x%x\n", NIPQUAD(dst_addr->sin_addr), dst_addr->sin_port); - atomic_set(&ib_conn->state, ISER_CONN_PENDING); + ib_conn->state = ISER_CONN_PENDING; ib_conn->cma_id = rdma_create_id(iser_cma_handler, (void *)ib_conn, @@ -498,9 +528,9 @@ int iser_connect(struct iser_conn *ib_ if (!non_blocking) { wait_event_interruptible(ib_conn->wait, - atomic_read(&ib_conn->state) != ISER_CONN_PENDING); + (ib_conn->state != 
ISER_CONN_PENDING)); - if (atomic_read(&ib_conn->state) != ISER_CONN_UP) { + if (ib_conn->state != ISER_CONN_UP) { err = -EIO; goto connect_failure; } @@ -514,7 +544,7 @@ int iser_connect(struct iser_conn *ib_ id_failure: ib_conn->cma_id = NULL; addr_failure: - atomic_set(&ib_conn->state, ISER_CONN_DOWN); + ib_conn->state = ISER_CONN_DOWN; connect_failure: iser_conn_release(ib_conn); return err; @@ -527,7 +557,7 @@ void iser_conn_release(struct iser_conn { struct iser_device *device = ib_conn->device; - BUG_ON(atomic_read(&ib_conn->state) != ISER_CONN_DOWN); + BUG_ON(ib_conn->state != ISER_CONN_DOWN); mutex_lock(&ig.connlist_mutex); list_del(&ib_conn->conn_list); @@ -719,16 +749,17 @@ static void iser_comp_error_worker(void { struct iser_conn *ib_conn = data; - if (atomic_read(&ib_conn->state) == ISER_CONN_UP) { - atomic_set(&ib_conn->state, ISER_CONN_TERMINATING); + /* getting here when the state is UP means that the conn is being * + * terminated asynchronously from the iSCSI layer's perspective. 
*/ + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, ISCSI_ERR_CONN_FAILED); - } /* complete the termination process if disconnect event was delivered * * note there are no more non completed posts to the QP */ if (ib_conn->disc_evt_flag) { - atomic_set(&ib_conn->state, ISER_CONN_DOWN); + ib_conn->state = ISER_CONN_DOWN; wake_up_interruptible(&ib_conn->wait); } } Index: iser_initiator.c =================================================================== --- iser_initiator.c (revision 6900) +++ iser_initiator.c (revision 6924) @@ -370,7 +370,7 @@ int iser_send_command(struct iscsi_conn struct iscsi_cmd *hdr = ctask->hdr; struct scsi_cmnd *sc = ctask->sc; - if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) { + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); return -EPERM; } @@ -454,7 +454,7 @@ int iser_send_data_out(struct iscsi_conn unsigned int itt; int err = 0; - if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) { + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); return -EPERM; } @@ -528,7 +528,7 @@ int iser_send_control(struct iscsi_conn struct iser_regd_buf *regd_buf; struct iser_device *device; - if (atomic_read(&iser_conn->ib_conn->state) != ISER_CONN_UP) { + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); return -EPERM; } Index: iscsi_iser.c =================================================================== --- iscsi_iser.c (revision 6900) +++ iscsi_iser.c (revision 6924) @@ -649,13 +649,13 @@ iscsi_iser_ep_poll(__u64 ep_handle, int return -EINVAL; rc = wait_event_interruptible_timeout(ib_conn->wait, - atomic_read(&ib_conn->state) == ISER_CONN_UP, + ib_conn->state == ISER_CONN_UP, 
msecs_to_jiffies(timeout_ms)); /* if conn establishment failed, return error code to iscsi */ if (!rc && - (atomic_read(&ib_conn->state) == ISER_CONN_TERMINATING || - atomic_read(&ib_conn->state) == ISER_CONN_DOWN)) + (ib_conn->state == ISER_CONN_TERMINATING || + ib_conn->state == ISER_CONN_DOWN)) rc = -1; iser_err("ib conn %p rc = %d\n", ib_conn, rc); @@ -676,12 +676,9 @@ iscsi_iser_ep_disconnect(__u64 ep_handle if (!ib_conn) return; - iser_err("ib conn %p state %d\n",ib_conn, atomic_read(&ib_conn->state)); + iser_err("ib conn %p state %d\n",ib_conn, ib_conn->state); - if (atomic_read(&ib_conn->state) == ISER_CONN_UP) - iser_conn_terminate(ib_conn); - else - iser_conn_release(ib_conn); + iser_conn_terminate(ib_conn); } static struct scsi_host_template iscsi_iser_sht = { From ishai at mellanox.co.il Thu May 4 06:53:36 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Thu, 4 May 2006 16:53:36 +0300 Subject: [openib-general] SRP: changes to ibsrpdm Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30208D2EE@mtlexch01.mtl.com> Hi, After implementing and submitting the ibsrpdm patches I got the following important remarks: 1) Sometimes there is a need to add a target twice. 2) It is an unnecessary complication for the kernel to look for a target in the target list in order to remove it. 3) There is a conceptual problem in the list_target query: in sysfs, each file should report only one value. Before implementing the fixes according to these remarks, I want to see if there are any comments on the changes I'm going to make. I'm going to change ibsrpdm and the SRP driver code in the following manner: 1) There is going to be an attribute for a target indicating whether it was added by the daemon (named daemons). Only the daemon should add targets with this attribute set. 2) The kernel will not allow the daemon to add the same target twice. Regular activation of add_target can add multiple instances of the same target. 3) The list_target query will be removed.
The information will be in several directories in sysfs (one for each target) - Roland, Vu, can you send me a pointer that explains which target information can be found in sysfs today? 4) I'll change the way I've implemented the activation of remove target. There will be a remove target file in each existing target directory in sysfs; echoing 1 to this file will remove the corresponding target. 5) The daemon will remove only targets that it added (those that have the daemons attribute set). 6) Adding execution modes to ibsrpdm: a) When activated without flags or with the -c flag, ibsrpdm will execute once (as before) and display the targets in the network. b) When activated with the -l flag, ibsrpdm will run in a loop and display at each cycle the targets that joined the network and the targets that left the network. c) When activated with the -l and -a flags, ibsrpdm will run as a daemon that adds targets that join the network. d) When activated with the -l and -r flags, ibsrpdm will run as a daemon that removes targets that leave the network. e) When activated with the -l, -a and -r flags, ibsrpdm will run as a daemon that adds targets that join the network and removes targets that leave the network. The reason we need option b is that customers may want to add the targets in a certain order (for binding purposes). Any comments? Ishai Rabinovitz From vishnu at cse.ohio-state.edu Thu May 4 08:07:31 2006 From: vishnu at cse.ohio-state.edu (Abhinav Vishnu) Date: Thu, 4 May 2006 11:07:31 -0400 (EDT) Subject: [openib-general] Re: [mvapich-discuss] RE: [openfabrics-ewg] Current OFED kernel snapshot In-Reply-To: Message-ID: Hello All, Thanks for reporting the compilation problem of MVAPICH on PPC64. I looked at the bugzilla entry #49 at openib.org. The CFLAGS which have been used for compilation are indicated below.
-D_DDR_ -DCH_GEN2 -DMEMORY_SCALE -D_AFFINITY_ -g -Wall -D_PCI_EX_ -D_SMALL_CLUSTER -D_SMP_ -D_SMP_RNDV_ -DVIADEV_RPUT_SUPPORT -DEARLY_SEND_COMPLETION -DLAZY_MEM_UNREGISTER -D_IA64_ The DDR flag is expected to be enabled for DDR Mellanox HCAs; similarly, PCI_EX is expected to be enabled for PCI-Express based HCAs. Also, starting with MVAPICH-0.9.7, for scalability to ultra-scale clusters, we have defined the MEMORY_SCALE flag, which is a combination of SRQ and ADAPTIVE_RDMA_FAST_PATH. However, AFAIK, SRQ is not available for PPC64. We would also recommend using -D_PPC64_ as the CFLAG for the architecture. In order to get optimal performance, we have unified scripts for different architectures/platforms, which are available in the top directory of MVAPICH: make.mvapich.gen2 and make.mvapich.gen2_multirail. As an example, the flags generated by the script for PPC64 would be: -D_PPC64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -DLAZY_MEM_UNREGISTER -DCH_GEN2 -D_SMP_ -D_SMP_RNDV_ -D_PCI_X_ -D_SDR_ We would strongly encourage using this script for compilation on PPC64. In addition, there seems to be an assembler problem, which could possibly be a gcc configuration problem: /tmp/ccTRXdQu.s: Assembler messages: /tmp/ccTRXdQu.s:127: Error: Unrecognized opcode: `mf' Please let us know if the problem persists when using the top-level make script. Thanks, -- Abhinav ------------------------------- Abhinav Vishnu, Graduate Research Associate, Department Of Comp. Sc. & Engg. The Ohio State University. ------------------------------- On Wed, 3 May 2006, Scott Weitzenkamp (sweitzen) wrote: > > > > Known issues: > > > > 1. ipath installation fails on 2.6.9 - 2.6.11* kernels > > > > 2. OSU MPI compilation fails on SLES10, PPC64 > > > > 3. SRP is not supported on 2.6.9 - 2.6.13* kernels - Ishai > > > will follow up with details > > > > 4. Open MPI RPM build process fails - Jeff, will you be > > > able to send us fixes by Wed?
> > > > Do we have any progress on the MPI and SRP issues? > > I opened bug #49 regarding OSU MPI not compiling on PPC64, it's assigned > to the default owner huanwei at cse.ohio-state.edu. > > http://openib.org/bugzilla/show_bug.cgi?id=49 > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss at cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From rdreier at cisco.com Thu May 4 08:24:18 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 08:24:18 -0700 Subject: [openib-general] Re: [mvapich-discuss] RE: [openfabrics-ewg] Current OFED kernel snapshot In-Reply-To: (Abhinav Vishnu's message of "Thu, 4 May 2006 11:07:31 -0400 (EDT)") References: Message-ID: Abhinav> The DDR flag is expected to be enabled for DDR mellanox Abhinav> HCAs, similarly PCI_EX is expected to enabled for Abhinav> PCI-Express Based HCAs. Also, starting MVAPICH-0.9.7, for Abhinav> scalability to ultra-scale clusters, we have defined Abhinav> MEMORY_SCALE flag, which is a combination of SRQ and Abhinav> ADAPTIVE_RDMA_FAST_PATH. However, AFAIK, SRQ is not Abhinav> available for PPC64. Why would SRQ not be available for ppc64? The low-level drivers are identical. And why would DDR and/or PCI Express not be available for ppc64? - R. From jackm at mellanox.co.il Thu May 4 08:38:50 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 4 May 2006 18:38:50 +0300 Subject: [openib-general] [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) Message-ID: <200605041838.50612.jackm@mellanox.co.il> Roland, Please apply this patch ASAP, so that we can do "make dist" (see userspace/management/diags/configure.in for an example of how this patch solves the problem). Thanks! 
Jack

---
Allow disabling libcheck, so that "make dist" can be run without installing the ibverbs package first.

Signed-off-by: Jack Morgenstein

Index: latest/src/userspace/libmthca/configure.in
===================================================================
--- latest.orig/src/userspace/libmthca/configure.in (revision 6935)
+++ latest/src/userspace/libmthca/configure.in (working copy)
@@ -8,12 +8,21 @@ AM_CONFIG_HEADER(config.h)
 AM_INIT_AUTOMAKE(libmthca, 1.0.2)
 AM_PROG_LIBTOOL
 
+AC_ARG_ENABLE(libcheck, [ --disable-libcheck do not test for presence of ib libraries],
+[ if test x$enableval = xno ; then
+ disable_libcheck=yes
+ fi
+])
+
 dnl Checks for programs
 AC_PROG_CC
 
+if test "$disable_libcheck" != "yes"
+then
 dnl Checks for libraries
 AC_CHECK_LIB(ibverbs, ibv_get_device_list, [],
 AC_MSG_ERROR([ibv_get_device_list() not found. libmthca requires libibverbs.]))
+fi
 
 dnl Checks for header files.
 AC_CHECK_HEADER(infiniband/driver.h, [],

From tziporet at mellanox.co.il Thu May 4 09:03:59 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 4 May 2006 19:03:59 +0300
Subject: [openib-general] OFED-1.0-rc4 is available
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6EC4@mtlexch01.mtl.com>

Hi All,

We have prepared OFED 1.0 RC4.

Release location: *https://openib.org/svn/gen2/branches/1.0/ofed/releases*
File: OFED-1.0-rc4.tgz

*BUILD_ID:*
OFED-1.0-rc4

openib-1.0 (REV=6922)
# User space
https://openib.org/svn/gen2/branches/1.0/src/userspace
# Kernel space
git/HEAD: ref: refs/heads/for-2.6.17
commit 9817d207dc13e3a9fc0287bbd36bdfa3cffe5ed4
https://openib.org/svn/gen2/branches/1.0/ofed/tags/rc4/linux-kernel

# MPI
mpi_osu-0.9.7-mlx2.1.0.tgz
openmpi-1.1a3-1.src.rpm
mpitests-1.0-0.src.rpm

*OSes:*
* RH EL4 up2: 2.6.9-22.ELsmp
* RH EL4 up3: 2.6.9-34.ELsmp
* Fedora C4: 2.6.11-1.1369_FC4
* SLES10 beta 7: 2.6.16-rc5-git9-2-smp
* SUSE 10 Pro: 2.6.13-15-smp
* kernel.org: 2.6.16

*Systems:*
* x86_64
* x86
* ia64
* ppc64

*Main changes from RC3:*
1. Kernel code based on git (see BUILD_ID for version)
2. SRP - with new features: FMR, tunable parameters, SRP daemon (see details below)
3. Open MPI - new package based on 1.1a3
4. RDS - new version from main trunk
5. Standard network configuration: Network configuration scripts ifcfg-ib*, located under /etc/sysconfig/network-scripts (.../network for SuSE)
6. /etc/modprobe.conf updated with:
   alias ib0 ib_ipoib
   ...
   alias net-pf-26 ib_sdp
   Note: this causes unloading the ipoib module to fail on SuSE (hotplug issue)
7. ipath driver available on RH EL4 (up2 and up3).
8. uDAPL is available on RH EL4 (up2 and up3).
9. Documentation updated (the installation guide is not complete).
10. pdsh has been removed from the package.
11. libibat has been removed.
12. Bug Fixes

*Package limitations:*
1. iSER is working on SuSE SLES 10 Beta8 only
2. SDP has not been upgraded yet.
3. MPI OSU and Open MPI compilation fails on PPC64
4. ipath driver compilation fails on FedoraC4.

*SRP details:*
SRP new features and changes:
* FMR support was added.
* Added an interface for removing a target in the initiator.
* Added an interface for querying the connected targets of an initiator.
* The SRP tool ibsrpdm can execute as a daemon. The daemon adds new targets that join the network and removes targets that leave the network.
* ibsrpdm can use new SM feature: enhanced capability mask matching (errata MGTWG8372)

Limitation:
* Attempting to add the same target twice fails - to be fixed in RC5.
* The implementation of the target list query must be modified. The new implementation should create a directory in sysfs for each target - to be fixed in RC5.

Please send us any issues you encounter and/or test results.

Thanks
Tziporet & Vlad
From vishnu at cse.ohio-state.edu Thu May 4 09:27:27 2006
From: vishnu at cse.ohio-state.edu (Abhinav Vishnu)
Date: Thu, 4 May 2006 12:27:27 -0400 (EDT)
Subject: [openib-general] Re: [mvapich-discuss] RE: [openfabrics-ewg] Current OFED kernel snapshot
In-Reply-To:
Message-ID:

Hi Roland,

> Abhinav> The DDR flag is expected to be enabled for DDR mellanox
> Abhinav> HCAs, similarly PCI_EX is expected to enabled for
> Abhinav> PCI-Express Based HCAs. Also, starting MVAPICH-0.9.7, for
> Abhinav> scalability to ultra-scale clusters, we have defined
> Abhinav> MEMORY_SCALE flag, which is a combination of SRQ and
> Abhinav> ADAPTIVE_RDMA_FAST_PATH. However, AFAIK, SRQ is not
> Abhinav> available for PPC64.
>
> Why would SRQ not be available for ppc64? The low-level drivers are
> identical.
>
> And why would DDR and/or PCI Express not be available for ppc64?
>
> - R.
>

By referring to the PPC64 architecture, I meant the IBM HCAs (4x/12x) running on the GX/GX+ bus. To the best of my knowledge, these HCAs do not support the features mentioned above.

Thanks,

-- Abhinav

*** Forgot to CC this mail to everyone in the initial thread ***

Hello All,

Thanks for reporting the compilation problem of MVAPICH on PPC64. I looked at the bugzilla entry #49 at openib.org. The CFLAGS which have been used for compilation are indicated below.

-D_DDR_ -DCH_GEN2 -DMEMORY_SCALE -D_AFFINITY_ -g -Wall -D_PCI_EX_ -D_SMALL_CLUSTER -D_SMP_
  ^^^^^            ^^^^^^^^^^^^^                         ^^^^^^^^
-D_SMP_RNDV_ -DVIADEV_RPUT_SUPPORT -DEARLY_SEND_COMPLETION -DLAZY_MEM_UNREGISTER -D_IA64_
                                                                                   ^^^^^^

The DDR flag is expected to be enabled for DDR Mellanox HCAs; similarly, PCI_EX is expected to be enabled for PCI-Express based HCAs. Also, starting with MVAPICH-0.9.7, for scalability to ultra-scale clusters, we have defined the MEMORY_SCALE flag, which is a combination of SRQ and ADAPTIVE_RDMA_FAST_PATH. However, AFAIK, SRQ is not available for PPC64. We would also recommend using -D_PPC64_ as the CFLAG for the architecture.
In order to get the optimal performance, we have a unified script for different architectures/platforms which is available in the top directory of MVAPICH; make.mvapich.gen2 and make.mvapich.gen2_multirail. As an example, the flags generated by the script for PPC64 would be: -D_PPC64_ -DEARLY_SEND_COMPLETION -DMEMORY_SCALE -DVIADEV_RPUT_SUPPORT -DLAZY_MEM_UNREGISTER -DCH_GEN2 -D_SMP_ -D_SMP_RNDV_ -D_PCI_X_ -D_SDR_ We would strongly encourage for this script to be used for compilation on PPC64. In addition, there seems to be an assembler problem, could possibly be a gcc configuration problem? /tmp/ccTRXdQu.s: Assembler messages: /tmp/ccTRXdQu.s:127: Error: Unrecognized opcode: `mf' Please let us know if the problem persists by using the top level make script. Thanks, -- Abhinav On Wed, 3 May 2006, Scott Weitzenkamp (sweitzen) wrote: > > > > Known issues: > > > > 1. ipath installation fails on 2.6.9 - 2.6.11* kernels > > > > 2. OSU MPI compilation fails on SLES10, PPC64 > > > > 3. SRP is not supported on 2.6.9 - 2.6.13* kernels - Ishai > > > will follow up with details > > > > 4. Open MPI RPM build process fails - Jeff, will you be > > > able to send us fixes by Wed? > > > > Do we have any progress on the MPI and SRP issues? > > I opened bug #49 regarding OSU MPI not compiling on PPC64, it's assigned > to the default owner huanwei at cse.ohio-state.edu. 
> > http://openib.org/bugzilla/show_bug.cgi?id=49 > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > _______________________________________________ > mvapich-discuss mailing list > mvapich-discuss at cse.ohio-state.edu > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss > From rdreier at cisco.com Thu May 4 09:31:22 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 09:31:22 -0700 Subject: [openib-general] Re: [mvapich-discuss] RE: [openfabrics-ewg] Current OFED kernel snapshot In-Reply-To: (Abhinav Vishnu's message of "Thu, 4 May 2006 12:27:27 -0400 (EDT)") References: Message-ID: Abhinav> By referring to the PPC64 architecture, i was mentioning Abhinav> about the IBM HCAs(4x/12x) running on GX/GX+ Bus. To the Abhinav> best of my knowledge, these HCAs do not support the Abhinav> features mentioned above. Hmm, making this a compile-time thing seems like a problem then. Some ppc64 systems have IBM eHCAs and some have Mellanox and/or PathScale HCAs. Shouldn't the same MPI package work on all of these systems? - R. From mshefty at ichips.intel.com Thu May 4 09:31:39 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 04 May 2006 09:31:39 -0700 Subject: [openib-general] sdp code in trunk In-Reply-To: <1146737826.4719.40931.camel@hal.voltaire.com> References: <20060501153555.GK3032@mellanox.co.il> <1146678659.4719.29930.camel@hal.voltaire.com> <4459D406.4070109@mellanox.co.il> <1146737826.4719.40931.camel@hal.voltaire.com> Message-ID: <445A2C6B.2070302@ichips.intel.com> Hal Rosenstock wrote: > It could have been done the other way 'round as well with the new SDP on > a new branch as other ULPs have done prior to being ready for the trunk > and all the same goals that you mention accomplished. I agree with Hal. If there was a good chance that the code was more stable, then checking it into the trunk would have been fine, even with fewer features. 
From the initial post, though, this didn't sound like the case. - Sean From gjohnson at lanl.gov Thu May 4 09:34:16 2006 From: gjohnson at lanl.gov (Greg Johnson) Date: Thu, 4 May 2006 10:34:16 -0600 Subject: [openib-general] Dump and load routes with opensm? Message-ID: <20060504163416.GA2201@durango.c3.lanl.gov> Is there currently a way to dump and load routes with opensm? If not, how would I go about writing one? Thanks, Greg From dledford at redhat.com Thu May 4 09:36:37 2006 From: dledford at redhat.com (Doug Ledford) Date: Thu, 4 May 2006 12:36:37 -0400 Subject: [openib-general] Re: [mvapich-discuss] RE: [openfabrics-ewg] Current OFED kernel snapshot In-Reply-To: References: Message-ID: <20060504163637.GC16018@redhat.com> On Thu, May 04, 2006 at 09:31:22AM -0700, Roland Dreier wrote: > Abhinav> By referring to the PPC64 architecture, i was mentioning > Abhinav> about the IBM HCAs(4x/12x) running on GX/GX+ Bus. To the > Abhinav> best of my knowledge, these HCAs do not support the > Abhinav> features mentioned above. > > Hmm, making this a compile-time thing seems like a problem then. Some > ppc64 systems have IBM eHCAs and some have Mellanox and/or PathScale > HCAs. Shouldn't the same MPI package work on all of these systems? /agree Hardware detection and optimization should always be runtime for all available hardware on a platform IMHO. -- Doug Ledford Red Hat, Inc. 1801 Varsity Dr. Raleigh, NC 27606 From mshefty at ichips.intel.com Thu May 4 09:37:15 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 04 May 2006 09:37:15 -0700 Subject: [openib-general] Re: [PATCH] change iser ib connection state management In-Reply-To: References: Message-ID: <445A2DBB.6020509@ichips.intel.com> Or Gerlitz wrote: > changed iser ib conn state management to be done with an int variable > keeping the state and a lock. When a related race is possible the lock is > used to check (comp) or change (comp_exch) the state. 
When no race can > happen the state is just examined or changed. These look fine to me. - Sean From halr at voltaire.com Thu May 4 09:39:38 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 May 2006 12:39:38 -0400 Subject: [openib-general] Dump and load routes with opensm? In-Reply-To: <20060504163416.GA2201@durango.c3.lanl.gov> References: <20060504163416.GA2201@durango.c3.lanl.gov> Message-ID: <1146760777.4719.45651.camel@hal.voltaire.com> Hi Greg, On Thu, 2006-05-04 at 12:34, Greg Johnson wrote: > Is there currently a way to dump and load routes with opensm? If not, > how would I go about writing one? Is it really routes or stable LIDs you want ? LIDs are stored in /var/cache/osm/guid2lid and restored from there when OpenSM is started assuming the reassign LIDs option (-r or --reassign_lids) is not used when invoking OpenSM. -- Hal From rdreier at cisco.com Thu May 4 09:46:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 09:46:28 -0700 Subject: [openib-general] Re: [openfabrics-ewg] SRP: changes to ibsrpdm In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30208D2EE@mtlexch01.mtl.com> (Ishai Rabinovitz's message of "Thu, 4 May 2006 16:53:36 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30208D2EE@mtlexch01.mtl.com> Message-ID: Ishai> 1) There is going to be an attribute for a target Ishai> indicating if it was added by the daemon (Named Ishai> daemons). Only the daemon should add targets with this Ishai> attribute set. Ishai> 2) The kernel will not allow the daemon to add the same Ishai> target twice. Regular activation of add_target can add Ishai> multiple instances of the same target. This seems like a strange design to me. Why can't the daemon keep track of which targets it has added and not add a target twice? Is there some reason that the kernel has to be involved? - R. 
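Roland's question above (why the daemon cannot simply remember what it has added) can be pictured with a small in-memory set kept by the daemon itself. What follows is only a minimal sketch under stated assumptions: it assumes a target can be identified by an (id_ext, ioc_guid) pair, and the names target_id, target_set, target_set_add and target_set_remove are hypothetical, not taken from ibsrpdm or the SRP initiator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target identity; the choice of fields is an assumption
 * made for illustration, not ibsrpdm's actual bookkeeping. */
struct target_id {
    uint64_t id_ext;
    uint64_t ioc_guid;
};

#define MAX_TRACKED 256

/* Fixed-size set of the targets this daemon instance has added. */
struct target_set {
    struct target_id entries[MAX_TRACKED];
    size_t count;
};

static bool target_equal(const struct target_id *a, const struct target_id *b)
{
    return a->id_ext == b->id_ext && a->ioc_guid == b->ioc_guid;
}

/* Record a target. Returns true if it was newly recorded (the caller
 * would then go on to add it), false if it is already tracked or the
 * table is full. */
bool target_set_add(struct target_set *set, struct target_id id)
{
    for (size_t i = 0; i < set->count; i++)
        if (target_equal(&set->entries[i], &id))
            return false;
    if (set->count == MAX_TRACKED)
        return false;
    set->entries[set->count++] = id;
    return true;
}

/* Forget a target. Returns true only if this daemon had recorded it,
 * so targets added by other means are never reported as removable. */
bool target_set_remove(struct target_set *set, struct target_id id)
{
    for (size_t i = 0; i < set->count; i++) {
        if (target_equal(&set->entries[i], &id)) {
            /* Fill the hole with the last entry; order is not kept. */
            set->entries[i] = set->entries[--set->count];
            return true;
        }
    }
    return false;
}
```

With bookkeeping of this kind, both "do not add the same target twice" and "remove only targets the daemon added" become user-space checks. Whether that is sufficient without kernel involvement (for example across daemon restarts) is exactly the open question in the thread.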
From rdreier at cisco.com Thu May 4 09:50:48 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 09:50:48 -0700 Subject: [openib-general] Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) In-Reply-To: <200605041838.50612.jackm@mellanox.co.il> (Jack Morgenstein's message of "Thu, 4 May 2006 18:38:50 +0300") References: <200605041838.50612.jackm@mellanox.co.il> Message-ID: Jack> Roland, Please apply this patch ASAP, so that we can do Jack> "make dist" (see userspace/management/diags/configure.in for Jack> an example of how this patch solves the problem). Why do you want to do "make dist" anyway? This seems like a really strange configure option ("please allow broken setups"??). - R. From rdreier at cisco.com Thu May 4 09:53:40 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 09:53:40 -0700 Subject: [openib-general] [PATCH] sysfs display for local_link_integrity_errors broken In-Reply-To: <1146691810.13951.149.camel@brick.internal.keyresearch.com> (Ralph Campbell's message of "Wed, 03 May 2006 14:30:10 -0700") References: <1146691810.13951.149.camel@brick.internal.keyresearch.com> Message-ID: Thanks, applied. From gjohnson at lanl.gov Thu May 4 09:55:41 2006 From: gjohnson at lanl.gov (Greg Johnson) Date: Thu, 4 May 2006 10:55:41 -0600 Subject: [openib-general] Dump and load routes with opensm? In-Reply-To: <1146760777.4719.45651.camel@hal.voltaire.com> References: <20060504163416.GA2201@durango.c3.lanl.gov> <1146760777.4719.45651.camel@hal.voltaire.com> Message-ID: <20060504165541.GB2201@durango.c3.lanl.gov> On Thu, May 04, 2006 at 12:39:38PM -0400, Hal Rosenstock wrote: > Hi Greg, > > On Thu, 2006-05-04 at 12:34, Greg Johnson wrote: > > Is there currently a way to dump and load routes with opensm? If not, > > how would I go about writing one? > > Is it really routes or stable LIDs you want ? I actually want routes. 
I have queried them with ibtraceroute and ibroute, but we need routes for the whole fabric. BTW, if you call ibtraceroute thousands of times it stops working. Maybe a problem in the MAD driver? > LIDs are stored in /var/cache/osm/guid2lid and restored from there when > OpenSM is started assuming the reassign LIDs option (-r or > --reassign_lids) is not used when invoking OpenSM. Thanks, that's good to know. Greg From rdreier at cisco.com Thu May 4 09:57:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 09:57:17 -0700 Subject: [openib-general] Re: the QP attribute have a double defenition of the QP primary port In-Reply-To: <200604301110.42718.dotanb@mellanox.co.il> (Dotan Barak's message of "Sun, 30 Apr 2006 11:10:42 +0300") References: <200604301110.42718.dotanb@mellanox.co.il> Message-ID: Dotan> In the IB spec: in the transition RESET->INIT, the primary Dotan> port is required. in the transition INIT->RTR, the address Dotan> vector is required for connected QPs. Dotan> In the driver: in the transition RESET->INIT, the primary Dotan> port is required (the mask IBV_QP_PORT is required) in the Dotan> transition INIT->RTR, the port number is one of the Dotan> attributes of the address vector (which mean there are 2 Dotan> attributes which define the QP port number). Dotan> I think that there are 2 problems with this implementation: Dotan> 1) the user can use two different values for those port Dotan> numbers (in mthca driver, the port number that was defined Dotan> in the address vector will be used) 2) the user can define Dotan> / change the QP port number in the transition INIT->RTR Dotan> (which means a IB spec violation) Probably the low-level drivers should ignore the port number in the primary address vector for INIT->RTR. That would fix both problems, right? - R. 
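The fix Roland suggests above, having the low-level driver ignore the port number carried in the primary address vector on INIT->RTR, can be illustrated with a toy state machine. This is a sketch only: the toy_* types and functions are hypothetical stand-ins and are not the libibverbs or mthca data structures.

```c
#include <stdint.h>

/* Toy stand-ins for QP state and address-vector attributes; these are
 * illustrative assumptions, not the real verbs structures. */
enum toy_qp_state { TOY_RESET, TOY_INIT, TOY_RTR };

struct toy_ah_attr {
    uint16_t dlid;
    uint8_t  port_num;   /* second, redundant copy of the port number */
};

struct toy_qp {
    enum toy_qp_state state;
    uint8_t  port_num;   /* primary port, fixed at RESET->INIT */
    uint16_t dlid;
};

/* RESET->INIT: the only transition allowed to set the primary port. */
int toy_reset_to_init(struct toy_qp *qp, uint8_t port_num)
{
    if (qp->state != TOY_RESET)
        return -1;
    qp->port_num = port_num;
    qp->state = TOY_INIT;
    return 0;
}

/* INIT->RTR: take the destination from the address vector, but
 * deliberately ignore ah->port_num so the port chosen at RESET->INIT
 * cannot be changed or contradicted here. */
int toy_init_to_rtr(struct toy_qp *qp, const struct toy_ah_attr *ah)
{
    if (qp->state != TOY_INIT)
        return -1;
    qp->dlid = ah->dlid;
    qp->state = TOY_RTR;
    return 0;
}
```

In this toy model a caller that passes a different port in the address vector still ends up with the port set at RESET->INIT, which addresses both of the problems Dotan lists: conflicting values and a port change during INIT->RTR.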
From rdreier at cisco.com Thu May 4 10:06:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 10:06:26 -0700 Subject: [openib-general] [RFC] [PATCH] ib_unregister_client() In-Reply-To: (Krishna Kumar2's message of "Thu, 4 May 2006 11:05:11 +0530") References: Message-ID: I think this is a valid change but on the other hand I don't see much motivation to apply it. It slightly optimizes a slow path at the cost of slightly enlarging the code, which doesn't seem like a good tradeoff to me. Am I off base? - R. From mshefty at ichips.intel.com Thu May 4 10:07:26 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 04 May 2006 10:07:26 -0700 Subject: [openib-general] SA MultiPathRecord v.PathRecord In-Reply-To: <1146760956.4719.45698.camel@hal.voltaire.com> References: <1146760956.4719.45698.camel@hal.voltaire.com> Message-ID: <445A34CE.4060700@ichips.intel.com> Moving discussion to list. Hal Rosenstock wrote: > MPR does allow for better selection of paths for APM though. Beyond adding the Independence Selector, does MPR add anything else? I assume that the Independence Selector works when paths between multiple GIDs are requested. Have you thought about how to expose this capability to a client in such a way that they can make use of it? To me, this feature seems most useful for all-to-all type connections, but would require some sort of coordination between connecting end-points in order to have fault independent connections between different nodes. E.g. the connection from A to B is independent from the connection between C and D. Also, for large fabrics, even MPR seems limited by the GID counts being 8-bits. - Sean From halr at voltaire.com Thu May 4 10:08:54 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 May 2006 13:08:54 -0400 Subject: [openib-general] Dump and load routes with opensm? 
In-Reply-To: <20060504165541.GB2201@durango.c3.lanl.gov> References: <20060504163416.GA2201@durango.c3.lanl.gov> <1146760777.4719.45651.camel@hal.voltaire.com> <20060504165541.GB2201@durango.c3.lanl.gov> Message-ID: <1146762523.4719.46028.camel@hal.voltaire.com> On Thu, 2006-05-04 at 12:55, Greg Johnson wrote: > On Thu, May 04, 2006 at 12:39:38PM -0400, Hal Rosenstock wrote: > > Hi Greg, > > > > On Thu, 2006-05-04 at 12:34, Greg Johnson wrote: > > > Is there currently a way to dump and load routes with opensm? If not, > > > how would I go about writing one? > > > > Is it really routes or stable LIDs you want ? > > I actually want routes. OpenSM calculates the unicast and multicast routes and populates the (unicast and multicast) forwarding tables. > I have queried them with ibtraceroute and ibroute, ibroute dumps the forwarding tables and ibtracert traces the path from a source to a destination so these are displaying how OpenSM has setup the fabric which is a function of the routing algorithm chosen and the physical topology (which may be dynamic). > but we need routes for the whole fabric. Unicast, multicast, or both ? Just to look at ? There is no way to load these into OpenSM. > BTW, if you call ibtraceroute thousands of times it stops working. Any more info on how it fails ? What version of diags, the management libraries, and the kernel (including OpenIB svn if that is being used) are you using ? > Maybe a problem in the MAD driver? Not sure as I haven't seen this. -- Hal > > LIDs are stored in /var/cache/osm/guid2lid and restored from there when > > OpenSM is started assuming the reassign LIDs option (-r or > > --reassign_lids) is not used when invoking OpenSM. > > Thanks, that's good to know. > > Greg From mst at mellanox.co.il Thu May 4 10:15:43 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 4 May 2006 20:15:43 +0300 Subject: [openib-general] Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) In-Reply-To: References: <200605041838.50612.jackm@mellanox.co.il> Message-ID: <20060504171543.GB4682@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) > > Jack> Roland, Please apply this patch ASAP, so that we can do > Jack> "make dist" (see userspace/management/diags/configure.in for > Jack> an example of how this patch solves the problem). > > Why do you want to do "make dist" anyway? This seems like a really > strange configure option ("please allow broken setups"??). I think that configure/make/make install is a de-facto standard for software installation from source tarballs. -- MST From rdreier at cisco.com Thu May 4 10:18:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 10:18:24 -0700 Subject: [openib-general] Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) In-Reply-To: <20060504171543.GB4682@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 4 May 2006 20:15:43 +0300") References: <200605041838.50612.jackm@mellanox.co.il> <20060504171543.GB4682@mellanox.co.il> Message-ID: Michael> I think that configure/make/make install is a de-facto Michael> standard for software installation from source tarballs. Yes, but the "make configure succeed on a system missing a dependency" option is far from a de facto standard. In fact I don't know of any packages that have such an option. If you want a libmthca tarball and you can't be bothered to install libibverbs, what's wrong with http://openib.org/downloads.html? - R. 
From bos at pathscale.com Thu May 4 10:23:55 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Thu, 04 May 2006 10:23:55 -0700 Subject: [openib-general] OFED-1.0-rc4 is available In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6EC4@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6EC4@mtlexch01.mtl.com> Message-ID: <1146763435.5106.28.camel@localhost.localdomain> On Thu, 2006-05-04 at 19:03 +0300, Tziporet Koren wrote: > We have prepared OFED 1.0 RC4. And I've tagged OpenIB 1.0 RC4 (recall that we skipped over RC3 so that the two sets of release candidates would come into sync). For people on openib-general who I think have not been informed of this, we plan to issue one more release candidate of both OFED and OpenIB, due to the large number of changes between the previous release candidate and RC4. References: <20060504163416.GA2201@durango.c3.lanl.gov> <1146760777.4719.45651.camel@hal.voltaire.com> <20060504165541.GB2201@durango.c3.lanl.gov> <1146762523.4719.46028.camel@hal.voltaire.com> Message-ID: <20060504172656.GC2201@durango.c3.lanl.gov> On Thu, May 04, 2006 at 01:08:54PM -0400, Hal Rosenstock wrote: > On Thu, 2006-05-04 at 12:55, Greg Johnson wrote: > > I actually want routes. > > OpenSM calculates the unicast and multicast routes and populates the > (unicast and multicast) forwarding tables. > > > I have queried them with ibtraceroute and ibroute, > > ibroute dumps the forwarding tables and ibtracert traces the path from a > source to a destination so these are displaying how OpenSM has setup the > fabric which is a function of the routing algorithm chosen and the > physical topology (which may be dynamic). > > > but we need routes for the whole fabric. > > Unicast, multicast, or both ? Just to look at ? There is no way to load > these into OpenSM. At this point we are only interested in unicast routes. We would like to be able to dump and load the forwarding tables. We have a single 288 port switch chassis for our cluster. 
We would like to be able to load routes for two reasons. One is to be able to do testing with a fixed set of routes. The other is that we would like to program our own routes into the switches. > > BTW, if you call ibtraceroute thousands of times it stops working. > > Any more info on how it fails ? What version of diags, the management > libraries, and the kernel (including OpenIB svn if that is being used) > are you using ? I'll have to make it fail again and get back to you. Thanks, Greg From halr at voltaire.com Thu May 4 10:31:12 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 May 2006 13:31:12 -0400 Subject: [openib-general] Re: SA MultiPathRecord v.PathRecord In-Reply-To: <445A34CE.4060700@ichips.intel.com> References: <1146760956.4719.45698.camel@hal.voltaire.com> <445A34CE.4060700@ichips.intel.com> Message-ID: <1146763871.4719.46353.camel@hal.voltaire.com> On Thu, 2006-05-04 at 13:07, Sean Hefty wrote: > Moving discussion to list. > > Hal Rosenstock wrote: > > MPR does allow for better selection of paths for APM though. > > Beyond adding the Independence Selector, does MPR add anything else? Yes, there is 1.2 erratum which extends SA MPR by adding scope for both the S/DGID which allows for choosing GIDs from same HCA or from same system (based on SystemImageGUID) as well as explicit which is the way it is in the 1.2 spec. > I assume that the Independence Selector works when paths between multiple GIDs > are requested. Yes but with scope set to more than just explicit, this can be between individual GIDs (and the SM calculates the rest so the end client doesn't need to figure out the other GIDs). > Have you thought about how to expose this capability to a client > in such a way that they can make use of it? No; not yet. > To me, this feature seems most useful for all-to-all type connections, but would > require some sort of coordination between connecting end-points in order to have > fault independent connections between different nodes. E.g. 
the connection from > A to B is independent from the connection between C and D. I don't understand what end node coordination you are referring to here in terms of this. > Also, for large fabrics, even MPR seems limited by the GID counts being 8-bits. If there was a half (one of source and dest wildcarded) and full world (both source and dest wildcarded) capability like PathRecord that would not be a limit. -- Hal > - Sean From sashak at voltaire.com Thu May 4 10:54:44 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 May 2006 20:54:44 +0300 Subject: [openib-general] Re: [PATCH 4/4] opensm: no need to wait for pkey_mgr In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAEA@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAEA@mtlexch01.mtl.com> Message-ID: <20060504175444.GD5227@sashak.voltaire.com> Hi Eitan, On 09:11 Thu 04 May , Eitan Zahavi wrote: > > So how would the partition manager signal to the LID manager that some > transaction were made and needed to be waited on? It does not. > Normally a manager reports back the need for waiting in its return code > that is assigned as the next signal to the state machine. > > So if this signal is not provided from the partition manager (which by > your patch returns void) how would the state machine actually wait for > it if the LID manager returns OSM_SIGNAL_DONE ? Then this will wait at the next "wait point" (link_manager iirc). But then if if we will continue there w/out waiting too, and so on... I will need to check, not sure that there is "unconditional wait point" somewhere. Anyway, I see your point now. > Although this can be fixed by forwarding the result of the partition > manager to affect the LID manager next signal calculation Right, or just simple conditional wait for outstanding transactions if any SMPs were sent and not yet responded. Actually then we may group even more independent resweeper components for parallel execution in simpler way. 
> I would rather > not do it. Keeping the state machine as "linear" as possible have great > merits in avoiding extra complexity and bugs. This may only simplify the state machine. Other obvious advantage of this is performance. Sasha. From halr at voltaire.com Thu May 4 10:48:29 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 May 2006 13:48:29 -0400 Subject: [openib-general] Dump and load routes with opensm? In-Reply-To: <20060504172656.GC2201@durango.c3.lanl.gov> References: <20060504163416.GA2201@durango.c3.lanl.gov> <1146760777.4719.45651.camel@hal.voltaire.com> <20060504165541.GB2201@durango.c3.lanl.gov> <1146762523.4719.46028.camel@hal.voltaire.com> <20060504172656.GC2201@durango.c3.lanl.gov> Message-ID: <1146764909.4719.46578.camel@hal.voltaire.com> On Thu, 2006-05-04 at 13:26, Greg Johnson wrote: > On Thu, May 04, 2006 at 01:08:54PM -0400, Hal Rosenstock wrote: > > On Thu, 2006-05-04 at 12:55, Greg Johnson wrote: > > > I actually want routes. > > > > OpenSM calculates the unicast and multicast routes and populates the > > (unicast and multicast) forwarding tables. > > > > > I have queried them with ibtraceroute and ibroute, > > > > ibroute dumps the forwarding tables and ibtracert traces the path from a > > source to a destination so these are displaying how OpenSM has setup the > > fabric which is a function of the routing algorithm chosen and the > > physical topology (which may be dynamic). > > > > > but we need routes for the whole fabric. > > > > Unicast, multicast, or both ? Just to look at ? There is no way to load > > these into OpenSM. > > At this point we are only interested in unicast routes. We would like > to be able to dump and load the forwarding tables. You can dump them (via ibroute) just not load them. > We have a single 288 port switch chassis for our cluster. We would like to be able to > load routes for two reasons. One is to be able to do testing with a > fixed set of routes. So the topology is fixed and no links ever fail ? 
> The other is that we would like to program our own > routes into the switches. Once the fabric is up, what are the requirements ? Do you need SA queries (e.g. PathRecords) to work ? > > > BTW, if you call ibtraceroute thousands of times it stops working. > > > > Any more info on how it fails ? What version of diags, the management > > libraries, and the kernel (including OpenIB svn if that is being used) > > are you using ? > > I'll have to make it fail again and get back to you. Thanks. -- Hal > > Thanks, > Greg From mshefty at ichips.intel.com Thu May 4 10:54:58 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 04 May 2006 10:54:58 -0700 Subject: [openib-general] Re: SA MultiPathRecord v.PathRecord In-Reply-To: <1146763871.4719.46353.camel@hal.voltaire.com> References: <1146760956.4719.45698.camel@hal.voltaire.com> <445A34CE.4060700@ichips.intel.com> <1146763871.4719.46353.camel@hal.voltaire.com> Message-ID: <445A3FF2.6050508@ichips.intel.com> Hal Rosenstock wrote: >>To me, this feature seems most useful for all-to-all type connections, but would >>require some sort of coordination between connecting end-points in order to have >>fault independent connections between different nodes. E.g. the connection from >>A to B is independent from the connection between C and D. > > I don't understand what end node coordination you are referring to here > in terms of this. Today, connections are established from the local node to some given destination without considering what other connections may exist between other nodes. For example, an app may select a path to connect A to B based on the connection from C to D. I'm just trying to determine how an implementation can make use of MPR to understand the best way to expose it. The use of MPR over just path records seems complex. 
- Sean From halr at voltaire.com Thu May 4 11:06:56 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 May 2006 14:06:56 -0400 Subject: [openib-general] Re: SA MultiPathRecord v.PathRecord In-Reply-To: <445A3FF2.6050508@ichips.intel.com> References: <1146760956.4719.45698.camel@hal.voltaire.com> <445A34CE.4060700@ichips.intel.com> <1146763871.4719.46353.camel@hal.voltaire.com> <445A3FF2.6050508@ichips.intel.com> Message-ID: <1146766014.4719.46836.camel@hal.voltaire.com> On Thu, 2006-05-04 at 13:54, Sean Hefty wrote: > Hal Rosenstock wrote: > >>To me, this feature seems most useful for all-to-all type connections, but would > >>require some sort of coordination between connecting end-points in order to have > >>fault independent connections between different nodes. E.g. the connection from > >>A to B is independent from the connection between C and D. > > > > I don't understand what end node coordination you are referring to here > > in terms of this. > > Today, connections are established from the local node to some given destination > without considering what other connections may exist between other nodes. For > example, an app may select a path to connect A to B based on the connection from > C to D. So each node needs to know the entire connection mesh (and traffic pattern) to help it choose a path ? I'm not sure how MPR v. PR relates to this exactly other than there is a current difference in how the half world paths would be obtained using MPR v. PR. > I'm just trying to determine how an implementation can make use of MPR to > understand the best way to expose it. The use of MPR over just path records > seems complex. In terms of what ? 
-- Hal > - Sean From mshefty at ichips.intel.com Thu May 4 11:38:35 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 04 May 2006 11:38:35 -0700 Subject: [openib-general] Re: SA MultiPathRecord v.PathRecord In-Reply-To: <1146766014.4719.46836.camel@hal.voltaire.com> References: <1146760956.4719.45698.camel@hal.voltaire.com> <445A34CE.4060700@ichips.intel.com> <1146763871.4719.46353.camel@hal.voltaire.com> <445A3FF2.6050508@ichips.intel.com> <1146766014.4719.46836.camel@hal.voltaire.com> Message-ID: <445A4A2B.2080802@ichips.intel.com> Hal Rosenstock wrote: >>I'm just trying to determine how an implementation can make use of MPR to >>understand the best way to expose it. The use of MPR over just path records >>seems complex. > > In terms of what ? To be clear, that wasn't a criticism, just a statement that MPR provides additional information to process. If the goal of exposing MPR is just to provide disjoint paths for failover purposes, that's a relatively simple task. If the goal is to select paths to optimize traffic flow across the fabric, it's a fairly complex operation, that I'm not sure we can provide value beyond basic MAD support. - Sean From gjohnson at lanl.gov Thu May 4 11:41:25 2006 From: gjohnson at lanl.gov (Greg Johnson) Date: Thu, 4 May 2006 12:41:25 -0600 Subject: [openib-general] Dump and load routes with opensm? 
In-Reply-To: <1146764909.4719.46578.camel@hal.voltaire.com> References: <20060504163416.GA2201@durango.c3.lanl.gov> <1146760777.4719.45651.camel@hal.voltaire.com> <20060504165541.GB2201@durango.c3.lanl.gov> <1146762523.4719.46028.camel@hal.voltaire.com> <20060504172656.GC2201@durango.c3.lanl.gov> <1146764909.4719.46578.camel@hal.voltaire.com> Message-ID: <20060504184125.GE2201@durango.c3.lanl.gov> On Thu, May 04, 2006 at 01:48:29PM -0400, Hal Rosenstock wrote: > On Thu, 2006-05-04 at 13:26, Greg Johnson wrote: > > On Thu, May 04, 2006 at 01:08:54PM -0400, Hal Rosenstock wrote: > > > On Thu, 2006-05-04 at 12:55, Greg Johnson wrote: > > > > I actually want routes. > > > > > > OpenSM calculates the unicast and multicast routes and populates the > > > (unicast and multicast) forwarding tables. > > > > > > > I have queried them with ibtraceroute and ibroute, > > > > > > ibroute dumps the forwarding tables and ibtracert traces the path from a > > > source to a destination so these are displaying how OpenSM has setup the > > > fabric which is a function of the routing algorithm chosen and the > > > physical topology (which may be dynamic). > > > > > > > but we need routes for the whole fabric. > > > > > > Unicast, multicast, or both ? Just to look at ? There is no way to load > > > these into OpenSM. > > > > At this point we are only interested in unicast routes. We would like > > to be able to dump and load the forwarding tables. > > You can dump them (via ibroute) just not load them. Right. I imagine a tool that would dump the routes to a file that could be reloaded later. If I could edit the file, I could load my own routes as well. > > We have a single 288 port switch chassis for our cluster. We would like to be able to > > load routes for two reasons. One is to be able to do testing with a > > fixed set of routes. > > So the topology is fixed and no links ever fail ? Yes, basically. We want this for testing, not production (at this point). 
We can handle faults manually for now. The internal topology of the switch chassis is fixed. > > The other is that we would like to program our own > > routes into the switches. > > Once the fabric is up, what are the requirements ? Do you need SA > queries (e.g. PathRecords) to work ? I'm not sure if we need SA queries. What are they good for? Basically, we want to be able to run MPI over the IB fabric. We don't need anything else. I'm not sure if MVAPICH or OpenMPI use SA queries internally. Greg From halr at voltaire.com Thu May 4 11:52:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 May 2006 14:52:07 -0400 Subject: [openib-general] Dump and load routes with opensm? In-Reply-To: <20060504184125.GE2201@durango.c3.lanl.gov> References: <20060504163416.GA2201@durango.c3.lanl.gov> <1146760777.4719.45651.camel@hal.voltaire.com> <20060504165541.GB2201@durango.c3.lanl.gov> <1146762523.4719.46028.camel@hal.voltaire.com> <20060504172656.GC2201@durango.c3.lanl.gov> <1146764909.4719.46578.camel@hal.voltaire.com> <20060504184125.GE2201@durango.c3.lanl.gov> Message-ID: <1146768720.4719.47421.camel@hal.voltaire.com> On Thu, 2006-05-04 at 14:41, Greg Johnson wrote: > On Thu, May 04, 2006 at 01:48:29PM -0400, Hal Rosenstock wrote: > > On Thu, 2006-05-04 at 13:26, Greg Johnson wrote: > > > On Thu, May 04, 2006 at 01:08:54PM -0400, Hal Rosenstock wrote: > > > > On Thu, 2006-05-04 at 12:55, Greg Johnson wrote: > > > > > I actually want routes. > > > > > > > > OpenSM calculates the unicast and multicast routes and populates the > > > > (unicast and multicast) forwarding tables. > > > > > > > > > I have queried them with ibtraceroute and ibroute, > > > > > > > > ibroute dumps the forwarding tables and ibtracert traces the path from a > > > > source to a destination so these are displaying how OpenSM has setup the > > > > fabric which is a function of the routing algorithm chosen and the > > > > physical topology (which may be dynamic). 
> > > > > > > > > but we need routes for the whole fabric. > > > > > > > > Unicast, multicast, or both ? Just to look at ? There is no way to load > > > > these into OpenSM. > > > > > > At this point we are only interested in unicast routes. We would like > > > to be able to dump and load the forwarding tables. > > > > You can dump them (via ibroute) just not load them. > > Right. I imagine a tool that would dump the routes to a file that could > be reloaded later. If I could edit the file, I could load my own > routes as well. > > > > We have a single 288 port switch chassis for our cluster. We would like to be able to > > > load routes for two reasons. One is to be able to do testing with a > > > fixed set of routes. > > > > So the topology is fixed and no links ever fail ? > > Yes, basically. We want this for testing, not production (at this > point). We can handle faults manually for now. The internal topology > of the switch chassis is fixed. > > > > The other is that we would like to program our own > > > routes into the switches. > > > > Once the fabric is up, what are the requirements ? Do you need SA > > queries (e.g. PathRecords) to work ? > > I'm not sure if we need SA queries. What are they good for? Helping to set up connections, etc. > Basically, we want to be able to run MPI over the IB fabric. We don't need > anything else. I'm not sure if MVAPICH or OpenMPI use SA queries > internally. I don't think they do currently but will in the near term future. I'm not sure whether MVAPICH supports multicast but that also would require SA support. 
-- Hal > Greg From halr at voltaire.com Thu May 4 11:56:19 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 May 2006 14:56:19 -0400 Subject: [openib-general] Re: SA MultiPathRecord v.PathRecord In-Reply-To: <445A4A2B.2080802@ichips.intel.com> References: <1146760956.4719.45698.camel@hal.voltaire.com> <445A34CE.4060700@ichips.intel.com> <1146763871.4719.46353.camel@hal.voltaire.com> <445A3FF2.6050508@ichips.intel.com> <1146766014.4719.46836.camel@hal.voltaire.com> <445A4A2B.2080802@ichips.intel.com> Message-ID: <1146768905.4719.47467.camel@hal.voltaire.com> On Thu, 2006-05-04 at 14:38, Sean Hefty wrote: > Hal Rosenstock wrote: > >>I'm just trying to determine how an implementation can make use of MPR to > >>understand the best way to expose it. The use of MPR over just path records > >>seems complex. > > > > In terms of what ? > > To be clear, that wasn't a criticism, just a statement that MPR provides > additional information to process. It certainly does have extra information in terms of the request, not the response. > If the goal of exposing MPR is just to > provide disjoint paths for failover purposes, that's a relatively simple task. Yes, this part/goal is clear. > If the goal is to select paths to optimize traffic flow across the fabric, it's > a fairly complex operation, When there are more paths available than those needed to satisfy the request, which ones are returned is implementation dependent, as is the information used to decide which paths to return. > that I'm not sure we can provide value beyond basic MAD support. By that do you mean expose the PRs returned ? -- Hal > - Sean > From mst at mellanox.co.il Thu May 4 13:04:50 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 4 May 2006 23:04:50 +0300 Subject: [openib-general] Re: sdp code in trunk In-Reply-To: <445A2C6B.2070302@ichips.intel.com> References: <20060501153555.GK3032@mellanox.co.il> <1146678659.4719.29930.camel@hal.voltaire.com> <4459D406.4070109@mellanox.co.il> <1146737826.4719.40931.camel@hal.voltaire.com> <445A2C6B.2070302@ichips.intel.com> Message-ID: <20060504200450.GC4682@mellanox.co.il> Quoting r. Sean Hefty : > If there was a good chance that the code was more > stable, then checking it into the trunk would have been fine, even with > fewer features. From the initial post, though, this didn't sound like the > case. It's actually quite stable for me now. What is missing to run real applications is the shutdown/graceful close feature implementation, which I'm working on. -- MST From mst at mellanox.co.il Thu May 4 13:05:51 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 4 May 2006 23:05:51 +0300 Subject: [openib-general] Re: [openfabrics-ewg] SRP: changes to ibsrpdm In-Reply-To: References: <6AB138A2AB8C8E4A98B9C0C3D52670E30208D2EE@mtlexch01.mtl.com> Message-ID: <20060504200551.GD4682@mellanox.co.il> Quoting r. Roland Dreier : > Why can't the daemon keep > track of which targets it has added and not add a target twice? Is > there some reason that the kernel has to be involved? What if the daemon is killed/restarted? -- MST From mst at mellanox.co.il Thu May 4 13:11:20 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 4 May 2006 23:11:20 +0300 Subject: [openib-general] Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) In-Reply-To: References: <200605041838.50612.jackm@mellanox.co.il> <20060504171543.GB4682@mellanox.co.il> Message-ID: <20060504201120.GE4682@mellanox.co.il> Quoting r.
Roland Dreier : > Subject: Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) > > Michael> I think that configure/make/make install is a de-facto > Michael> standard for software installation from source tarballs. > > Yes, but the "make configure succeed on a system missing a dependency" > option is far from a de facto standard. In fact I don't know of any > packages that have such an option. > > If you want a libmthca tarball and you can't be bothered to install > libibverbs, what's wrong with http://openib.org/downloads.html? Sorry, I was replying out of context. I think the ability to compile packages as a regular user, and then install them all as root is required by some of our users. That's why we have all the disable-libcheck flags in configure e.g. in management. -- MST From rdreier at cisco.com Thu May 4 13:13:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 13:13:17 -0700 Subject: [openib-general] Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) In-Reply-To: <20060504201120.GE4682@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 4 May 2006 23:11:20 +0300") References: <200605041838.50612.jackm@mellanox.co.il> <20060504171543.GB4682@mellanox.co.il> <20060504201120.GE4682@mellanox.co.il> Message-ID: Michael> Sorry, I was replying out of context. I think the Michael> ability to compile packages as a regular user, and then Michael> install them all as root is required by some of our Michael> users. How does disabling the check help you? The build will just fail at compile time instead of configure time. - R. From mst at mellanox.co.il Thu May 4 13:24:43 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 4 May 2006 23:24:43 +0300 Subject: [openib-general] Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) In-Reply-To: References: <200605041838.50612.jackm@mellanox.co.il> <20060504171543.GB4682@mellanox.co.il> <20060504201120.GE4682@mellanox.co.il> Message-ID: <20060504202443.GF4682@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) > > Michael> Sorry, I was replying out of context. I think the > Michael> ability to compile packages as a regular user, and then > Michael> install them all as root is required by some of our > Michael> users. > > How does disabling the check help you? The build will just fail at > compile time instead of configure time. But make dist will pass, so you can prepare tarballs without installing (and compiling) things. -- MST From mshefty at ichips.intel.com Thu May 4 13:27:47 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 04 May 2006 13:27:47 -0700 Subject: [openib-general] Re: SA MultiPathRecord v.PathRecord In-Reply-To: <1146768905.4719.47467.camel@hal.voltaire.com> References: <1146760956.4719.45698.camel@hal.voltaire.com> <445A34CE.4060700@ichips.intel.com> <1146763871.4719.46353.camel@hal.voltaire.com> <445A3FF2.6050508@ichips.intel.com> <1146766014.4719.46836.camel@hal.voltaire.com> <445A4A2B.2080802@ichips.intel.com> <1146768905.4719.47467.camel@hal.voltaire.com> Message-ID: <445A63C3.8060808@ichips.intel.com> Hal Rosenstock wrote: >> that I'm not sure we can provide value beyond basic MAD support. > > By that do you mean expose the PRs returned ? That, along with exposing the details of the query used to obtain the path records. I'm wondering if we can come up with a sensible abstraction for path records returned by an MPR query, or if a MAD send/receive interface is the best that we can realistically do.
I'm trying to determine how to integrate MPR queries with something like the RDMA CM. - Sean From rdreier at cisco.com Thu May 4 13:28:06 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 13:28:06 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <4449487A.3080004@mellanox.com> (Vu Pham's message of "Fri, 21 Apr 2006 14:02:50 -0700") References: <443C4934.7080400@mellanox.com> <443E9A88.7020302@mellanox.com> <443FC35F.6080301@mellanox.com> <44450F89.7020500@mellanox.com> <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> Message-ID: > 1. srp_unmap_data() and srp_remove_req() for .eh_abort_handler(scmnd) > a. abort get timeout or > b. req->cmd_done or > c. !req->tsk_status > 2. we should do step (1) for .eh_abort_handler(scmnd) only and don't > do step 1 for .eh_device_reset_handler(scmnd) since same scsi command > is used for all .eh_handler() > 3. scsi command is used in all .eh_handler() will be freed by scsi > midlayer at the end of error handling sequences > 4. If we don't do step 1, scsi command which is used in all > .eh_handler() and freed is still in our pending queue and is > referenced in srp_reconnect_target() / reinit request ring So I finally got a chance to look at this in detail. It does look like we should remove the request in (1) if the command finishes or the abort succeeds. However, if the abort times out then the command is still out there -- shouldn't we wait for the eh_device_reset_handler and then flush all matching commands there? And I don't understand (4) -- isn't srp_reconnect_target() being called from srp_reset_host() as part of the error handling sequence? Unless I'm misreading the code in scsi_error.c, commands don't get freed (assuming all aborts and device resets fail) before then. What am I missing?
In your case, where the abort and device reset fail and then the host reset gets called, where was the command getting freed? Thanks, Roland From rdreier at cisco.com Thu May 4 13:31:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 13:31:26 -0700 Subject: [openib-general] Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) In-Reply-To: <20060504202443.GF4682@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 4 May 2006 23:24:43 +0300") References: <200605041838.50612.jackm@mellanox.co.il> <20060504171543.GB4682@mellanox.co.il> <20060504201120.GE4682@mellanox.co.il> <20060504202443.GF4682@mellanox.co.il> Message-ID: Michael> But make dist will pass, so you can prepare tarballs Michael> without installing (and compiling) things. I guess we're back to where we started then. Why do you want to do that? If you want a tarball, why can't you just download a tarball? The previous example of building as a normal user obviously isn't the reason, since this flag won't let you do that. An "--enable-cryptic-build-failure" flag doesn't seem a good thing to distribute by default. - R. From mst at mellanox.co.il Thu May 4 13:36:42 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 4 May 2006 23:36:42 +0300 Subject: [openib-general] Re: Re: [PATCH] libmthca: allow make dist without needing to install libibverbs first (option to disable libcheck) In-Reply-To: References: <200605041838.50612.jackm@mellanox.co.il> <20060504171543.GB4682@mellanox.co.il> <20060504201120.GE4682@mellanox.co.il> <20060504202443.GF4682@mellanox.co.il> Message-ID: <20060504203642.GG4682@mellanox.co.il> Quoting r. Roland Dreier : > If you want a tarball, why can't you just download a tarball? Well, we are trying to build the tarballs. Someone has to do this :) It would be nice not to check dependencies at that point, since we are not going to build on the same machine. 
-- MST From or.gerlitz at gmail.com Thu May 4 13:54:18 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 4 May 2006 22:54:18 +0200 Subject: [openib-general] re RDS missing features In-Reply-To: <96f8e60e0605021035j24cb9d61jbce4f5f582acfe35@mail.gmail.com> References: <96f8e60e0605011042ve9bb9m5e9675256a11eacd@mail.gmail.com> <44571705.9000208@voltaire.com> <96f8e60e0605021035j24cb9d61jbce4f5f582acfe35@mail.gmail.com> Message-ID: <15ddcffd0605041354x4436201dseeb16644b216f4b0@mail.gmail.com> On 5/2/06, Ranjit Pandit wrote: > On 5/2/06, Or Gerlitz wrote: > > Ranjit Pandit wrote: > Loopback connections can be optimized by not going to the HCA. > In b-copy mode we can directly copy sends into destination sockets on > the same node. So it's a missing optimization, but the functionality is there; that is, it is currently possible to run netperf/crload/oracle over RDS when both client and server are on the same node? >> I see. Can you remind me ... where is the location of the reference gen1 >> RDS code? does it support failover? > Yes, Rds reference implementation implements failover across HCAs. > It was checked into contrib/silverstorm/rds. > r3471 was the first checkin and then a few more updates were made with > bug fixes. Sorry, but I can't find it. Are you referring to the code under https://openib.org/svn/trunk/contrib/silverstorm/rds/ ? It does not seem to me to be your GEN1 code; am I wrong? Please send me a pointer. > Keep it simple ie., apply the same failover scheme between two ports > whether on same HCA or not. >> Are you aware to something in the openib infrastructure which is missing >> for the failover design of RDS? if you specify the design/requirements i >> am sure people on this list can quickly say if something is missing... > For failover Rds need support for the following: > 1. Ability to assign single IP address to multiple IB ports > 2. Address resolution mechanism should return multiple paths for the > same destination IP address.
Cool, I like that. > On SilverStorm stack a single IP address can be assigned to two ports > in the system. > When a path fails, RDS can re-establish connection to the same > destination IP address...ipoib_path( dst_ip) returns all possible > paths to the destination ip. I see. > Does the CMA handle multiple paths to a destination IP? > It does not need to return multiple paths to Rds. For now, even if it > picks the first available path that should be sufficient. I guess by "should be sufficient" you mean for everything but failover? Sean, what will the current ib_addr module implementation do if it is asked to send an ARP and there is more than one network interface for the subnet over which the resolution is needed? Will it send the ARP on the first match or over all the devices? It seems that if there's no issue with setting up two network/ipoib devices with the same address, the RDS failover scheme should work over the CMA, provided the callback following rdma_resolve_addr is able to return N > 1 SRC / DST GIDs and IB devices. Does that make sense to you? And if I have (say) two GID pairs, then I would just need to call rdma_resolve_route twice and then rdma_connect twice, correct? Or. From or.gerlitz at gmail.com Thu May 4 14:04:36 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 4 May 2006 23:04:36 +0200 Subject: [openib-general] Re: RDS RX buf allocation why on RX callback flow? Message-ID: <15ddcffd0605041404r55cd1c1evfc42923212e9a624@mail.gmail.com> On 4/27/06, Ranjit Pandit wrote: > On 4/27/06, Leonid Arsh wrote: >> During the run we get error messages in dmesg on the server side. >> Have you seen anything like this? >> Please see the dmesg output below: > I will see if I can reproduce it. I think the issue here is not to reproduce it (easy) but to understand/discuss the design. You are doing a GFP_ATOMIC allocation in the rx callback flow, which can of course fail, but since this is hard irq context you can't use GFP_KERNEL.
Since RDS is meant to offload Oracle IPC, which is somewhat transactional by nature (at least the cache fusions), would it make sense to you for the server side to post initial rx buffers and then, for each TX, post another rx buffer just before (or after) posting the send? The same goes for the client side: before (or after) posting a tx, post an rx. This would move the rx posting to thread (process) context, where you can use GFP_KERNEL. What about the other types of Oracle IPC, do they also have a transactional (req/resp) nature? Or. > > swapper: page allocation failure. order:1, mode:0x20 > > > > Call Trace: {__alloc_pages+662} > > {smp_apic_timer_interrupt+54} > > {apic_timer_interrupt+132} > > {cache_grow+288} > > {cache_alloc_refill+419} > > {kmem_cache_alloc+87} > > {:ib_rds:rds_alloc_buf+16} > > {:ib_rds:rds_alloc_recv_buffer+12} > > {:ib_rds:rds_post_new_recv+23} > > {:ib_rds:rds_recv_completion+85} > > {:ib_rds:rds_cq_callback+87} > > {:ib_mthca:mthca_eq_int+119} > > {do_IRQ+50} {ret_from_intr+0} > > {:ib_mthca:mthca_tavor_interrupt+91} > > {handle_IRQ_event+41} > > {__do_IRQ+156} > > {do_IRQ+45} {ret_from_intr+0} > > {mwait_idle+54} > > {cpu_idle+93} > > {start_secondary+1131} From xma at us.ibm.com Thu May 4 14:15:27 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 4 May 2006 14:15:27 -0700 Subject: [openib-general] Re: [PATCH] IPoIB splitting CQ, increase both send/recv poll NUM_WC & interval In-Reply-To: <10e223bf0604300351r3cd8767fmddd7d8b7a2e130ad@mail.gmail.com> Message-ID: Hello Roland, I have finally finished some of my tests over UP<->UP and UP<->SMP. One of the mthca drivers couldn't come up on SMP after I updated the firmware to 3.4.0. I got the error below while loading the ib_mthca module: May 4 13:42:09 elm3b100 kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) May 4 13:42:09 elm3b100 kernel: ib_mthca: Initializing ?????<8e>,?
May 4 13:42:09 elm3b100 kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 96 (level, low) -> IRQ 19 May 4 13:42:09 elm3b100 kernel: ?????<8e>,?: Missing DCS, aborting. It has no problem to bring it up on UP kernel. Any clue? UP<->UP results are good. The one netperf performance could tune up to around 80% throughput improvement with 4-5% latency increase on linux 2.6.16 UP kernel. UP<->SMP has the similar result as SMP with one stream, increase CPU doesn't help the performance since the driver is the bottleneck. I will submit these patches for review soon. In the meanwhile I will continue to test them on different configuration. It would be helpful if anyone can test on large node cluster with different adapters. (multiple streams? two ports? multiple links?) Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From rdreier at cisco.com Thu May 4 14:23:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 14:23:00 -0700 Subject: [openib-general] Re: [PATCH] IPoIB splitting CQ, increase both send/recv poll NUM_WC & interval In-Reply-To: (Shirley Ma's message of "Thu, 4 May 2006 14:15:27 -0700") References: Message-ID: > May 4 13:42:09 elm3b100 kernel: ib_mthca: Initializing ?????<8e>,? This is coming from printk(KERN_INFO PFX "Initializing %s\n", pci_name(pdev)); so something is screwed up. Perhaps your ib_mthca module needs to be recompiled to match the SMP kernel? Shirley> I will submit these patches for review soon. In the Shirley> meanwhile I will continue to test them on different Shirley> configuration. It would be helpful if anyone can test on Shirley> large node cluster with different adapters. (multiple Shirley> streams? two ports? multiple links?) Kind of hard for anyone to do anything with the patches until you post them... - R.
From rdreier at cisco.com Thu May 4 14:29:54 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 14:29:54 -0700 Subject: [openib-general] [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: <4450A196.2050901@de.ibm.com> (Heiko J. Schick's message of "Thu, 27 Apr 2006 12:48:54 +0200") References: <4450A196.2050901@de.ibm.com> Message-ID: > +void ehca_queue_comp_task(struct ehca_comp_pool *pool, struct ehca_cq *__cq) > +{ > + int cpu; > + int cpu_id; > + struct ehca_cpu_comp_task *cct; > + unsigned long flags_cct; > + unsigned long flags_cq; > + > + cpu = get_cpu(); > + cpu_id = find_next_online_cpu(pool); > + > + EDEB_EN(7, "pool=%p cq=%p cq_nr=%x CPU=%x:%x:%x:%x", > + pool, __cq, __cq->cq_number, > + cpu, cpu_id, num_online_cpus(), num_possible_cpus()); > + > + BUG_ON(!cpu_online(cpu_id)); > + > + cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); > + > + spin_lock_irqsave(&cct->task_lock, flags_cct); > + spin_lock_irqsave(&__cq->task_lock, flags_cq); > + > + if (__cq->nr_callbacks == 0) { > + __cq->nr_callbacks++; > + list_add_tail(&__cq->entry, &cct->cq_list); > + wake_up(&cct->wait_queue); > + } > + else > + __cq->nr_callbacks++; > + > + spin_unlock_irqrestore(&__cq->task_lock, flags_cq); > + spin_unlock_irqrestore(&cct->task_lock, flags_cct); > + > + put_cpu(); > + > + EDEB_EX(7, "cct=%p", cct); > + > + return; > +} I never read the ehca completion event handling code very carefully until now. But I was motivated by Shirley's work on IPoIB to take a closer look. It seems that you are deferring completion event dispatch into threads spread across all the CPUs. This seems like a very strange thing to me -- you are adding latency and possibly causing cacheline pingpong. It may help throughput in some cases to spread the work across multiple CPUs but it seems strange to me to do this in the low-level driver. 
My intuition would be that it would be better to do this in the higher levels, and leave open the possibility for protocols that want the lowest possible latency to be called directly from the interrupt handler. What was the thinking that led to this design? - R. From weiny2 at llnl.gov Thu May 4 15:16:19 2006 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 04 May 2006 15:16:19 -0700 Subject: [openib-general] "ib0 already has pending op 2" ??? Message-ID: <20060504151619.46171cc5.weiny2@llnl.gov> I have put svn 6829 on our test cluster here and I am getting this error again. (I used to get it a while ago but it went away. I figured something had been fixed as time went on.) This stack was backported to 2.6.9 using Woody's patch. May 4 14:37:27 odev1 ib_at: ib_dev_ats_op: dev (ffffffffa00dbcc0) ib0 already has pending op 2 What does this mean and how can I get rid of it? ibv_rc_pingpong is not working and ib0 is up but not working either. I saw this some time ago and a reboot would clear it up. Is this something stuck in the hardware? I tried a power off and that did not clear it up either. 
# odev1 /root > cat /sys/class/infiniband/mthca0/hw_rev a1 # odev1 /root > cat /sys/class/infiniband/mthca0/fw_ver 3.3.2 # odev1 /root > cat /sys/class/infiniband/mthca0/hca_type MT23108 # odev1 /root > cat /sys/class/infiniband/mthca0/node_type 1: CA # odev1 /root > cat /sys/class/infiniband/mthca0/board_id VLT0010020001 Thanks, Ira Weiny LLNL From bugzilla-daemon at openib.org Thu May 4 15:34:30 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 4 May 2006 15:34:30 -0700 (PDT) Subject: [openib-general] [Bug 50] New: 1.0rc4 build fails Message-ID: <20060504223430.CAB0B2283D7@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=50 Summary: 1.0rc4 build fails Product: OpenFabrics Linux Version: gen2 Platform: X86-64 OS/Version: 2.6.9 Status: NEW Severity: normal Priority: P2 Component: IB Core AssignedTo: bugzilla at openib.org ReportedBy: weiny2 at llnl.gov I downloaded the 1.0rc4 from: https://openib.org/svn/gen2/branches/1.0/ofed/releases/ and followed the build instructions on a RHEL4 U3 installation. 
The following message appeared: ERROR: Failed executing "env NETWORK_CONF_DIR=/etc/sysconfig/network-scripts /tmp/openib-1.0/build_rpm.sh --prefix /usr/local/ofed --build_root /var/tmp/OFED --packages ib_ipath ib_ipoib ib_mthca ib_rds ib_sdp ib_verbs -- --with-libibcm --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs --with-libipathverbs --with-libmthca --with-mstflint --with-openib-diags --with-opensm --with-perftest -kver 2.6.9-39chaos --ksrc /lib/modules/2.6.9-39chaos/build " Digging through the log files I find the following compile error: gcc -Wp,-MD,/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/.user_mad.o.d -nostdinc -iwithprefix include -D__KERNEL__ -I/var/tmp/OFED/tmp/openib/openib/include -I/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/include -Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Os -g -Wdeclaration-after-statement -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -funit-at-a-time -I/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/include -I/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/ulp/ipoib -I/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/ulp/kdapl -I/var/tmp/OFED/tmp/openib/openib/drivers/infiniband/debug -D__nocast= -DMODULE -DKBUILD_BASENAME=user_mad -DKBUILD_MODNAME=ib_umad -c -o /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/.tmp_user_mad.o /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/user_mad.c /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/user_mad.c:832: warning: static declaration of 'class_create' follows non-static declaration include/linux/device.h:242: warning: previous declaration of 'class_create' was here /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/user_mad.c:866: warning: static declaration of 'class_destroy' follows non-static declaration include/linux/device.h:243: warning: previous declaration 
of 'class_destroy' was here /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/user_mad.c:892: error: conflicting types for 'class_device_create' include/linux/device.h:248: error: previous declaration of 'class_device_create' was here /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/user_mad.c:892: error: conflicting types for 'class_device_create' include/linux/device.h:248: error: previous declaration of 'class_device_create' was here make[3]: *** [/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/user_mad.o] Error 1 ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sean.hefty at intel.com Thu May 4 15:51:20 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 4 May 2006 15:51:20 -0700 Subject: [openib-general] [PATCH 1/3] rdma cm: allow user to specify path record for connections Message-ID: Allow a user to set the route used by a connection established over Infiniband. 
Signed-off-by: Sean Hefty --- Index: core/cma.c =================================================================== --- core/cma.c (revision 6884) +++ core/cma.c (working copy) @@ -34,8 +34,11 @@ #include #include #include + #include + #include +#include #include #include #include @@ -1183,6 +1186,30 @@ err1: return ret; } +int rdma_set_ib_paths(struct rdma_cm_id *id, + struct ib_sa_path_rec *path_rec, int num_paths) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ROUTE_RESOLVED)) + return -EINVAL; + + id->route.path_rec = kmalloc(sizeof *path_rec * num_paths, GFP_KERNEL); + if (!id->route.path_rec) { + ret = -ENOMEM; + goto err; + } + + memcpy(id->route.path_rec, path_rec, sizeof *path_rec * num_paths); + return 0; +err: + cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_ADDR_RESOLVED); + return ret; +} +EXPORT_SYMBOL(rdma_set_ib_paths); + int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) { struct rdma_id_private *id_priv; Index: include/rdma/rdma_cm_ib.h =================================================================== --- include/rdma/rdma_cm_ib.h (revision 0) +++ include/rdma/rdma_cm_ib.h (revision 0) @@ -0,0 +1,48 @@ +/* + * Copyright (c) 2006 Intel Corporation. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. 
+ * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. + * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. + * + */ + +#if !defined(RDMA_CM_IB_H) +#define RDMA_CM_IB_H + +#include + +/** + * rdma_set_ib_paths - Manually sets the path records used to establish a + * connection. + * @id: Connection identifier associated with the request. + * @path_rec: Reference to the path record + * + * This call permits a user to specify routing information for rdma_cm_id's + * bound to Infiniband devices. It is called on the client side of a + * connection and replaces the call to rdma_resolve_route. + */ +int rdma_set_ib_paths(struct rdma_cm_id *id, + struct ib_sa_path_rec *path_rec, int num_paths); + +#endif /* RDMA_CM_IB_H */ + Property changes on: include/rdma/rdma_cm_ib.h ___________________________________________________________________ Name: svn:executable + * From robert.j.woodruff at intel.com Thu May 4 15:56:52 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 4 May 2006 15:56:52 -0700 Subject: [openib-general] OFED-1.0-rc4 is available Message-ID: <1AC79F16F5C5284499BB9591B33D6F00079C64F4@orsmsx408> Tziporet wrote, >Hi All, >We have prepared OFED 1.0 RC4. I took a version of the OFED RC4 kernel code, gen2/branches/1.0/ofed/tags/rc4/linux-kernel applied my latest backport patch (for svn6829), which applied fine. and built a kernel RPM for testing. Then I took the 1.0 userspace code and built it. I found that using the cma version of uDAPL did not work and caused a core dump. Using the newer userspace cma.c code fixes the problem. I applied this patch and it fixed the problem. Not sure if anyone cares about having the rdma_cm in OFED, but if they do, I think it needs this fix. 
woody --- cma.c 2006-04-07 10:15:20.000000000 -0700 +++ /home/woody/gen2/trunk/src/userspace/librdmacm/src/cma.c 2006-05-04 16:24:00.701184088 -0700 @@ -109,6 +109,7 @@ struct cma_id_private { struct rdma_cm_id id; struct cma_device *cma_dev; int events_completed; + int connect_error; pthread_cond_t cond; pthread_mutex_t mut; uint32_t handle; @@ -150,10 +151,8 @@ static int check_abi_version(void) return -ENODEV; } - strncat(path, "/class/misc/rdma_cm/abi_version", sizeof path); - if (sysfs_read_attribute_value(path, val, sizeof val)) - abi_ver = 1; /* ABI version wasn't available until version 2 */ - else + strncat(path, "/class/infiniband_ucma/abi_version", sizeof path); + if (!sysfs_read_attribute_value(path, val, sizeof val)) abi_ver = strtol(val, NULL, 10); if (abi_ver < RDMA_USER_CM_MIN_ABI_VERSION || @@ -435,11 +434,9 @@ int rdma_bind_addr(struct rdma_cm_id *id if (ret != size) return (ret > 0) ? -ENODATA : ret; - if (abi_ver > 1) { - ret = ucma_query_route(id); - if (ret) - return ret; - } + ret = ucma_query_route(id); + if (ret) + return ret; memcpy(&id->route.addr.src_addr, addr, addrlen); return 0; @@ -689,7 +686,7 @@ int rdma_listen(struct rdma_cm_id *id, i if (ret != size) return (ret > 0) ? 
-ENODATA : ret; - return 0; + return ucma_query_route(id); } int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) @@ -924,17 +921,27 @@ retry: evt->status = ucma_process_conn_resp(id_priv); if (!evt->status) evt->event = RDMA_CM_EVENT_ESTABLISHED; - else + else { evt->event = RDMA_CM_EVENT_CONNECT_ERROR; + id_priv->connect_error = 1; + } break; case RDMA_CM_EVENT_ESTABLISHED: evt->status = ucma_process_establish(&id_priv->id); - if (evt->status) + if (evt->status) { evt->event = RDMA_CM_EVENT_CONNECT_ERROR; + id_priv->connect_error = 1; + } break; case RDMA_CM_EVENT_REJECTED: + if (id_priv->connect_error) + goto retry; ucma_modify_qp_err(evt->id); break; + case RDMA_CM_EVENT_DISCONNECTED: + if (id_priv->connect_error) + goto retry; + break; default: break; } From sean.hefty at intel.com Thu May 4 16:13:45 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 4 May 2006 16:13:45 -0700 Subject: [openib-general] [PATCH 2/3] ucma: add kernel support for get/set options In-Reply-To: Message-ID: Add kernel support for retrieving and setting options on an rdma_cm_id. This provides functionality conceptually similar to getsockopt/setsockopt. Add an option to retrieve possible routes (path records) for a connection. A client may use this to select the path that a connection will use when it is established.
Signed-off-by: Sean Hefty --- Index: core/ucma.c =================================================================== --- core/ucma.c (revision 6884) +++ core/ucma.c (working copy) @@ -41,6 +41,8 @@ #include #include +#include "ucma_ib.h" + MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("RDMA Userspace Connection Manager Access"); MODULE_LICENSE("Dual BSD/GPL"); @@ -656,6 +658,83 @@ out: return ret; } +static ssize_t ucma_get_option(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_get_option cmd; + struct rdma_ucm_get_option_resp resp; + struct ucma_context *ctx; + int ret; + + if (out_len < sizeof(resp)) + return -ENOSPC; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + resp.optlen = cmd.optlen; + + switch (cmd.level) { + case RDMA_PROTO_IP: + ret = -ENOSYS; + break; + case RDMA_PROTO_IB: + ret = ucma_get_ib_option(ctx->cm_id, cmd.optname, + (void *) (unsigned long) cmd.optval, + &resp.optlen); + break; + default: + ret = -EINVAL; + break; + } + + if (ret) + goto out; + + if (copy_to_user((void __user *)(unsigned long)cmd.response, + &resp, sizeof(resp))) + ret = -EFAULT; +out: + ucma_put_ctx(ctx); + return ret; +} + +static ssize_t ucma_set_option(struct ucma_file *file, const char __user *inbuf, + int in_len, int out_len) +{ + struct rdma_ucm_set_option cmd; + struct ucma_context *ctx; + int ret; + + if (copy_from_user(&cmd, inbuf, sizeof(cmd))) + return -EFAULT; + + ctx = ucma_get_ctx(file, cmd.id); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + + switch (cmd.level) { + case RDMA_PROTO_IP: + ret = -ENOSYS; + break; + case RDMA_PROTO_IB: + ret = ucma_set_ib_option(ctx->cm_id, cmd.optname, + (void *) (unsigned long) cmd.optval, + cmd.optlen); + break; + default: + ret = -EINVAL; + break; + } + + ucma_put_ctx(ctx); + return ret; +} + static ssize_t (*ucma_cmd_table[])(struct ucma_file *file, const char __user *inbuf, 
int in_len, int out_len) = { @@ -671,7 +750,9 @@ static ssize_t (*ucma_cmd_table[])(struc [RDMA_USER_CM_CMD_REJECT] = ucma_reject, [RDMA_USER_CM_CMD_DISCONNECT] = ucma_disconnect, [RDMA_USER_CM_CMD_INIT_QP_ATTR] = ucma_init_qp_attr, - [RDMA_USER_CM_CMD_GET_EVENT] = ucma_get_event + [RDMA_USER_CM_CMD_GET_EVENT] = ucma_get_event, + [RDMA_USER_CM_CMD_GET_OPTION] = ucma_get_option, + [RDMA_USER_CM_CMD_SET_OPTION] = ucma_set_option }; static ssize_t ucma_write(struct file *filp, const char __user *buf, Index: core/ucma_ib.c =================================================================== --- core/ucma_ib.c (revision 0) +++ core/ucma_ib.c (revision 0) @@ -0,0 +1,144 @@ +/* + * Copyright (c) 2006 Intel Corporation. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. + * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. + * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. 
+ * + */ + +#include +#include +#include +#include +#include + +#include "ucma_ib.h" + +static int ucma_get_paths(struct rdma_cm_id *id, + void __user *paths, size_t *len) +{ + struct ib_sa_cursor *cursor; + struct ib_sa_path_rec *path; + struct ib_user_path_rec user_path; + union ib_gid *gid; + int left, ret = 0; + u16 pkey; + + if (!id->device) + return -ENODEV; + + gid = ib_addr_get_dgid(&id->route.addr.dev_addr); + pkey = ib_addr_get_pkey(&id->route.addr.dev_addr); + cursor = ib_create_path_cursor(id->device, id->port_num, gid); + if (IS_ERR(cursor)) + return PTR_ERR(cursor); + + gid = ib_addr_get_sgid(&id->route.addr.dev_addr); + left = *len; + *len = 0; + + for (path = ib_get_next_sa_attr(&cursor); path; + path = ib_get_next_sa_attr(&cursor)) { + if (pkey == path->pkey && + !memcmp(gid, path->sgid.raw, sizeof *gid)) { + if (paths) { + ib_copy_path_rec_to_user(&user_path, path); + if (copy_to_user(paths, &user_path, + sizeof(user_path))) { + ret = -EFAULT; + break; + } + left -= sizeof(user_path); + if (left < sizeof(user_path)) + break; + paths += sizeof(user_path); + } + *len += sizeof(user_path); + } + } + + ib_free_sa_cursor(cursor); + return ret; +} + +int ucma_get_ib_option(struct rdma_cm_id *id, int optname, + void *optval, size_t *optlen) +{ + switch (optname) { + case IB_PATH_OPTIONS: + return ucma_get_paths(id, optval, optlen); + default: + return -EINVAL; + } +} + +static int ucma_set_paths(struct rdma_cm_id *id, + void __user *paths, size_t len) +{ + struct ib_sa_path_rec *path_rec; + struct ib_user_path_rec *user_path; + int ret, num_paths, i; + + if (len == sizeof(*user_path)) + num_paths = 1; + else if (len == (sizeof(*user_path) << 1)) + num_paths = 2; + else + return -EINVAL; + + path_rec = kmalloc(sizeof *path_rec * num_paths, GFP_KERNEL); + if (!path_rec) + return -ENOMEM; + + user_path = kmalloc(sizeof *user_path * num_paths, GFP_KERNEL); + if (!user_path) { + ret = -ENOMEM; + goto out; + } + + if (copy_from_user(user_path, paths, sizeof 
*user_path * num_paths)) { + ret = -EFAULT; + goto out2; + } + + for (i = 0; i < num_paths; i++) + ib_copy_path_rec_from_user(path_rec + i, user_path + i); + + ret = rdma_set_ib_paths(id, path_rec, num_paths); +out2: + kfree(user_path); +out: + kfree(path_rec); + return ret; +} + +int ucma_set_ib_option(struct rdma_cm_id *id, int optname, + void *optval, size_t optlen) +{ + switch (optname) { + case IB_PATH_OPTIONS: + return ucma_set_paths(id, optval, optlen); + default: + return -EINVAL; + } +} Property changes on: core/ucma_ib.c ___________________________________________________________________ Name: svn:executable + * Index: core/ucma_ib.h =================================================================== --- core/ucma_ib.h (revision 0) +++ core/ucma_ib.h (revision 0) @@ -0,0 +1,40 @@ +/* + * Copyright (c) 2006 Intel Corporation. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. + * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. + * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. 
+ * + */ + +#if !defined(UCMA_IB_H) +#define UCMA_IB_H + +#include + +int ucma_get_ib_option(struct rdma_cm_id *id, int optname, + void *optval, size_t *optlen); + +int ucma_set_ib_option(struct rdma_cm_id *id, int optname, + void *optval, size_t optlen); + +#endif /* UCMA_IB_H */ Property changes on: core/ucma_ib.h ___________________________________________________________________ Name: svn:executable + * Index: core/Makefile =================================================================== --- core/Makefile (revision 6884) +++ core/Makefile (working copy) @@ -22,7 +22,7 @@ ib_cm-y := cm.o rdma_cm-y := cma.o -rdma_ucm-y := ucma.o +rdma_ucm-y := ucma.o ucma_ib.o ib_addr-y := addr.o Index: include/rdma/rdma_user_cm.h =================================================================== --- include/rdma/rdma_user_cm.h (revision 6884) +++ include/rdma/rdma_user_cm.h (working copy) @@ -55,7 +55,9 @@ enum { RDMA_USER_CM_CMD_REJECT, RDMA_USER_CM_CMD_DISCONNECT, RDMA_USER_CM_CMD_INIT_QP_ATTR, - RDMA_USER_CM_CMD_GET_EVENT + RDMA_USER_CM_CMD_GET_EVENT, + RDMA_USER_CM_CMD_GET_OPTION, + RDMA_USER_CM_CMD_SET_OPTION, }; /* @@ -183,4 +185,36 @@ struct rdma_ucm_event_resp { __u8 private_data[RDMA_MAX_PRIVATE_DATA]; }; +struct rdma_ucm_get_option { + __u64 response; + __u64 optval; + __u32 id; + __u32 level; + __u32 optname; + __u32 optlen; +}; + +/* Protocol levels for get/set options. */ +enum { + RDMA_PROTO_IP = 0, + RDMA_PROTO_IB = 1, +}; + +/* IB specific option names for get/set. 
*/ +enum { + IB_PATH_OPTIONS = 1, +}; + +struct rdma_ucm_get_option_resp { + __u32 optlen; +}; + +struct rdma_ucm_set_option { + __u64 optval; + __u32 id; + __u32 level; + __u32 optname; + __u32 optlen; +}; + #endif /* RDMA_USER_CM_H */ From sean.hefty at intel.com Thu May 4 16:15:59 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 4 May 2006 16:15:59 -0700 Subject: [openib-general] [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: Message-ID: Add routines to the userspace RDMA CM library to get/set transport specific options. Add an option to retrieve possible path records for a connection, and set which path a connection will be established on. Signed-off-by: Sean Hefty --- Index: include/rdma/rdma_cma_abi.h =================================================================== --- include/rdma/rdma_cma_abi.h (revision 6335) +++ include/rdma/rdma_cma_abi.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -57,7 +57,9 @@ enum { UCMA_CMD_REJECT, UCMA_CMD_DISCONNECT, UCMA_CMD_INIT_QP_ATTR, - UCMA_CMD_GET_EVENT + UCMA_CMD_GET_EVENT, + UCMA_CMD_GET_OPTION, + UCMA_CMD_SET_OPTION, }; struct ucma_abi_cmd_hdr { @@ -182,4 +184,25 @@ struct ucma_abi_event_resp { __u8 private_data[RDMA_MAX_PRIVATE_DATA]; }; +struct ucma_abi_get_option { + __u64 response; + __u64 optval; + __u32 id; + __u32 level; + __u32 optname; + __u32 optlen; +}; + +struct ucma_abi_get_option_resp { + __u32 optlen; +}; + +struct ucma_abi_set_option { + __u64 optval; + __u32 id; + __u32 level; + __u32 optname; + __u32 optlen; +}; + #endif /* RDMA_CMA_ABI_H */ Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 5693) +++ include/rdma/rdma_cma.h (working copy) @@ -1,6 +1,6 @@ /* * Copyright (c) 2005 Voltaire Inc. All rights reserved. - * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. * * This Software is licensed under one of the following licenses: * @@ -54,6 +54,17 @@ enum rdma_cm_event_type { RDMA_CM_EVENT_DEVICE_REMOVAL, }; +/* Protocol levels for get/set options. */ +enum { + RDMA_PROTO_IP = 0, + RDMA_PROTO_IB = 1, +}; + +/* IB specific option names for get/set. */ +enum { + IB_PATH_OPTIONS = 1, +}; + struct ib_addr { union ibv_gid sgid; union ibv_gid dgid; @@ -219,4 +230,27 @@ int rdma_ack_cm_event(struct rdma_cm_eve int rdma_get_fd(void); +/** + * rdma_get_option - Retrieve options for an rdma_cm_id. + * @id: Communication identifier to retrieve option for. + * @level: Protocol level of the option to retrieve. + * @optname: Name of the option to retrieve. + * @optval: Buffer to receive the returned options. + * @optlen: On input, the size of the %optval buffer. On output, the + * size of the returned data. 
+ */ +int rdma_get_option(struct rdma_cm_id *id, int level, int optname, + void *optval, size_t *optlen); + +/** + * rdma_set_option - Set options for an rdma_cm_id. + * @id: Communication identifier to set option for. + * @level: Protocol level of the option to set. + * @optname: Name of the option to set. + * @optval: Reference to the option data. + * @optlen: The size of the %optval buffer. + */ +int rdma_set_option(struct rdma_cm_id *id, int level, int optname, + void *optval, size_t optlen); + #endif /* RDMA_CMA_H */ Index: src/cma.c =================================================================== --- src/cma.c (revision 6674) +++ src/cma.c (working copy) @@ -963,3 +963,51 @@ int rdma_get_fd(void) return cma_fd; } + +int rdma_get_option(struct rdma_cm_id *id, int level, int optname, + void *optval, size_t *optlen) +{ + struct ucma_abi_get_option_resp *resp; + struct ucma_abi_get_option *cmd; + struct cma_id_private *id_priv; + void *msg; + int ret, size; + + CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_GET_OPTION, size); + id_priv = container_of(id, struct cma_id_private, id); + cmd->id = id_priv->handle; + cmd->optval = (uintptr_t) optval; + cmd->level = level; + cmd->optname = optname; + cmd->optlen = *optlen; + + ret = write(cma_fd, msg, size); + if (ret != size) + return (ret > 0) ? -ENODATA : ret; + + *optlen = resp->optlen; + return 0; +} + +int rdma_set_option(struct rdma_cm_id *id, int level, int optname, + void *optval, size_t optlen) +{ + struct ucma_abi_set_option *cmd; + struct cma_id_private *id_priv; + void *msg; + int ret, size; + + CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_SET_OPTION, size); + id_priv = container_of(id, struct cma_id_private, id); + cmd->id = id_priv->handle; + cmd->optval = (uintptr_t) optval; + cmd->level = level; + cmd->optname = optname; + cmd->optlen = optlen; + + ret = write(cma_fd, msg, size); + if (ret != size) + return (ret > 0) ? 
-ENODATA : ret; + + return 0; +} Index: src/librdmacm.map =================================================================== --- src/librdmacm.map (revision 5693) +++ src/librdmacm.map (working copy) @@ -15,5 +15,7 @@ RDMACM_1.0 { rdma_get_cm_event; rdma_ack_cm_event; rdma_get_fd; + rdma_get_option; + rdma_set_option; local: *; }; From halr at voltaire.com Thu May 4 16:54:04 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 04 May 2006 19:54:04 -0400 Subject: [openib-general] "ib0 already has pending op 2" ??? In-Reply-To: <20060504151619.46171cc5.weiny2@llnl.gov> References: <20060504151619.46171cc5.weiny2@llnl.gov> Message-ID: <1146786843.4719.51603.camel@hal.voltaire.com> Hi Ira, On Thu, 2006-05-04 at 18:16, Ira Weiny wrote: > I have put svn 6829 on our test cluster here and I am getting this error again. > (I used to get it a while ago but it went away. I figured something had been > fixed as time went on.) This stack was backported to 2.6.9 using Woody's > patch. > > May 4 14:37:27 odev1 ib_at: ib_dev_ats_op: dev (ffffffffa00dbcc0) ib0 already has pending op 2 > > What does this mean and how can I get rid of it? IBAT has been deprecated. You shouldn't need to load and run this. > ibv_rc_pingpong is not > working and ib0 is up but not working either. I saw this some time ago and a > reboot would clear it up. Is this something stuck in the hardware? I tried a > power off and that did not clear it up either. > > # odev1 /root > cat /sys/class/infiniband/mthca0/hw_rev > a1 > # odev1 /root > cat /sys/class/infiniband/mthca0/fw_ver > 3.3.2 > # odev1 /root > cat /sys/class/infiniband/mthca0/hca_type > MT23108 > # odev1 /root > cat /sys/class/infiniband/mthca0/node_type > 1: CA > # odev1 /root > cat /sys/class/infiniband/mthca0/board_id > VLT0010020001 Are you running OpenSM? Any errors in the log? Are you running with -V? That might indicate an issue.
-- Hal > Thanks, > Ira Weiny > LLNL > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From xma at us.ibm.com Thu May 4 17:42:48 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 4 May 2006 17:42:48 -0700 Subject: [openib-general] Re: [PATCH] IPoIB splitting CQ, increase both send/recv poll NUM_WC & interval In-Reply-To: Message-ID: Roland, Retried and rebuilt SMP (32) kernel. Got below error. There is no problem on 64 kernel on other node. The mthca driver is from SVN 69XX tree. ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) ib_mthca: Initializing 0000:05:00.0 ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 96 (level, low) -> IRQ 19 Device mthca0 is missing mandatory function query_device ib_mthca: probe of 0000:05:00.0 failed with error -22 Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From rdreier at cisco.com Thu May 4 18:11:27 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 04 May 2006 18:11:27 -0700 Subject: [openib-general] Re: [PATCH] IPoIB splitting CQ, increase both send/recv poll NUM_WC & interval In-Reply-To: (Shirley Ma's message of "Thu, 4 May 2006 17:42:48 -0700") References: Message-ID: > Retried and rebuilt SMP (32) kernel. Got below error. There is no problem > on 64 kernel on other node. The mthca driver is from SVN 69XX tree.
> > ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) > ib_mthca: Initializing 0000:05:00.0 > ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 96 (level, low) -> IRQ 19 > Device mthca0 is missing mandatory function query_device > ib_mthca: probe of 0000:05:00.0 failed with error -22 You have a mismatch between versions of the IB midlayer and the mthca driver. I would suggest doing a completely clean build and making sure that the kernel and all your modules on the target are definitely up-to-date. - R. From mamidala at cse.ohio-state.edu Thu May 4 18:39:42 2006 From: mamidala at cse.ohio-state.edu (amith rajith mamidala) Date: Thu, 4 May 2006 21:39:42 -0400 (EDT) Subject: [openib-general] Dump and load routes with opensm? In-Reply-To: <1146768720.4719.47421.camel@hal.voltaire.com> Message-ID: MVAPICH uses multicast support. It doesnot use SA queries currently.
But, we plan to use this feature for multicast in the future, Thanks, Amith On 4 May 2006, Hal Rosenstock wrote: > On Thu, 2006-05-04 at 14:41, Greg Johnson wrote: > > On Thu, May 04, 2006 at 01:48:29PM -0400, Hal Rosenstock wrote: > > > On Thu, 2006-05-04 at 13:26, Greg Johnson wrote: > > > > On Thu, May 04, 2006 at 01:08:54PM -0400, Hal Rosenstock wrote: > > > > > On Thu, 2006-05-04 at 12:55, Greg Johnson wrote: > > > > > > I actually want routes. > > > > > > > > > > OpenSM calculates the unicast and multicast routes and populates the > > > > > (unicast and multicast) forwarding tables. > > > > > > > > > > > I have queried them with ibtraceroute and ibroute, > > > > > > > > > > ibroute dumps the forwarding tables and ibtracert traces the path from a > > > > > source to a destination so these are displaying how OpenSM has setup the > > > > > fabric which is a function of the routing algorithm chosen and the > > > > > physical topology (which may be dynamic). > > > > > > > > > > > but we need routes for the whole fabric. > > > > > > > > > > Unicast, multicast, or both ? Just to look at ? There is no way to load > > > > > these into OpenSM. > > > > > > > > At this point we are only interested in unicast routes. We would like > > > > to be able to dump and load the forwarding tables. > > > > > > You can dump them (via ibroute) just not load them. > > > > Right. I imagine a tool that would dump the routes to a file that could > > be reloaded later. If I could edit the file, I could load my own > > routes as well. > > > > > > We have a single 288 port switch chassis for our cluster. We would like to be able to > > > > load routes for two reasons. One is to be able to do testing with a > > > > fixed set of routes. > > > > > > So the topology is fixed and no links ever fail ? > > > > Yes, basically. We want this for testing, not production (at this > > point). We can handle faults manually for now. The internal topology > > of the switch chassis is fixed. 
> > > > > > The other is that we would like to program our own > > > > routes into the switches. > > > > > > Once the fabric is up, what are the requirements ? Do you need SA > > > queries (e.g. PathRecords) to work ? > > > > I'm not sure if we need SA queries. What are they good for? > > Helping to set up connections, etc. > > > Basically, we want to be able to run MPI over the IB fabric. We don't need > > anything else. I'm not sure if MVAPICH or OpenMPI use SA queries > > internally. > > I don't think they do currently but will in the near term future. I'm > not sure whether MVAPICH supports multicast but that also would require > SA support. > > -- Hal > > > Greg > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From xma at us.ibm.com Thu May 4 18:53:00 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 4 May 2006 18:53:00 -0700 Subject: [openib-general] Re: [PATCH] IPoIB splitting CQ, increase both send/recv poll NUM_WC & interval In-Reply-To: Message-ID: Roland, Thanks. You are right. After a clean build, it works. I had too many build trees on each node, different kernels, different patches, different SVN trees. which confused me. :( I will clean my patches and start to submit them for you to review. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpandit at silverstorm.com Thu May 4 21:37:11 2006 From: rpandit at silverstorm.com (Ranjit Pandit) Date: Thu, 4 May 2006 21:37:11 -0700 Subject: [openib-general] Re: RDS RX buf allocation why on RX callback flow? 
In-Reply-To: <15ddcffd0605041404r55cd1c1evfc42923212e9a624@mail.gmail.com> References: <15ddcffd0605041404r55cd1c1evfc42923212e9a624@mail.gmail.com> Message-ID: <96f8e60e0605042137i34fdc175h6a8d509eabecb71f@mail.gmail.com> On 5/4/06, Or Gerlitz wrote: > On 4/27/06, Ranjit Pandit wrote: > > On 4/27/06, Leonid Arsh wrote: > > >> During the run we get error messages in dmesg on the server side. > >> Have you seen anything like this? > >> Please see the dmesg output below: > > > I will see if I can reproduce it. > > I think the issue here is not to reproduce it (easy) but to understand/discuss > the design. You are doing GFP_ATOMIC allocation in the rx callback > flow which can > fail of course but since this is hard irq context you can't use GFP_KERNEL. That is correct. The allocations have to be GFP_ATOMIC since they are happening in interrupt context. > > Since RDS comes to offload Oracle IPC which is somehow transactional > by nature (at least the cache fusions) does it make sense to you in > the server side to post initial rx buffers and then for each TX before > (after) post sending it just post another rx buffer. Same for the > client side before (after) posting tx post an rx. This would make the > rx posting from thread (process) context and you can use GFP_KERNEL. There is no notion of client/server in RDS. Only passive/active based on which node first initiated the connection. Once the connection is established it's more peer-to-peer in terms of data movement. We can't depend on the Tx path to refill the Rx queue as the application could decide not to send anything but only receive. We attempt to allocate a new Rx buffer in the interrupt context. If the allocations fail, and we go below a certain low water mark, we should wake up a thread to refill the Rx queue. Currently I'm re-posting an Rx buffer when it's done being read by recv_msg(). In this particular case though any ideas why the kernel should complain when a GFP_ATOMIC allocation fails??
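[Editorial sketch] The refill policy described in the reply above (try a non-sleeping allocation in the completion path, and wake a worker once the posted count drops below a low water mark) can be modelled in user space roughly as follows. This is only an illustration under stated assumptions, not the RDS code: the names rx_ring, rx_on_completion and rx_refill_worker are hypothetical, the boolean argument stands in for the result of a GFP_ATOMIC allocation, and the worker_woken flag stands in for waking a kernel thread that is allowed to sleep.

```c
/* User-space model of the low-water-mark refill idea; NOT the RDS driver.
 * All names here are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>

struct rx_ring {
    int posted;        /* receive buffers currently posted */
    int target;        /* number of buffers we want posted */
    int low_water;     /* wake the refill worker below this count */
    bool worker_woken; /* stands in for waking a kernel thread */
};

/* Completion path (interrupt context in a real driver): one posted buffer
 * was consumed, so try to replace it without sleeping. atomic_alloc_ok
 * models whether the non-sleeping allocation succeeded. */
static void rx_on_completion(struct rx_ring *r, bool atomic_alloc_ok)
{
    assert(r->posted > 0);
    r->posted--;
    if (atomic_alloc_ok) {
        r->posted++;            /* replacement buffer posted right away */
        return;
    }
    if (r->posted < r->low_water)
        r->worker_woken = true; /* defer the refill to process context */
}

/* Worker path (process context): may sleep, so it can use blocking
 * allocations and top the ring back up to the target.
 * Returns the number of buffers it posted. */
static int rx_refill_worker(struct rx_ring *r)
{
    int added = r->target - r->posted;
    r->posted = r->target;
    r->worker_woken = false;
    return added;
}
```

The point of the split is that the completion path never blocks, while the only path that may sleep is the worker, which runs solely when the ring is at risk of running dry.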
> > What about the other tpyes of Oracle IPC, do they also have > transactional (req/resp) nature? > > Or. > > > > swapper: page allocation failure. order:1, mode:0x20 > > > > > > Call Trace: {__alloc_pages+662} > > > {smp_apic_timer_interrupt+54} > > > {apic_timer_interrupt+132} > > > {cache_grow+288} > > > {cache_alloc_refill+419} > > > {kmem_cache_alloc+87} > > > {:ib_rds:rds_alloc_buf+16} > > > {:ib_rds:rds_alloc_recv_buffer+12} > > > {:ib_rds:rds_post_new_recv+23} > > > {:ib_rds:rds_recv_completion+85} > > > {:ib_rds:rds_cq_callback+87} > > > {:ib_mthca:mthca_eq_int+119} > > > {do_IRQ+50} {ret_from_intr+0} > > > {:ib_mthca:mthca_tavor_interrupt+91} > > > {handle_IRQ_event+41} > > > {__do_IRQ+156} > > > {do_IRQ+45} {ret_from_intr+0} > > > {mwait_idle+54} > > > {cpu_idle+93} > > > {start_secondary+1131} > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general >
From sweitzen at cisco.com Thu May 4 22:58:01 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 4 May 2006 22:58:01 -0700 Subject: [openib-general] please create "Open MPI" component in OF bugzilla (third request) Message-ID: Bryan, can you please create this component, or give me admin privs in bugzilla to do it myself? Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From bugzilla-daemon at openib.org Fri May 5 00:08:45 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Fri, 5 May 2006 00:08:45 -0700 (PDT) Subject: [openib-general] [Bug 50] 1.0rc4 build fails on 2.6.9-39chaos Message-ID: <20060505070845.A366B2283D7@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=50 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|1.0rc4 build fails |1.0rc4 build fails on 2.6.9- | |39chaos ------- Additional Comments From sweitzen at cisco.com 2006-05-05 00:08 ------- I was able to build OFED 1.0 rc4 fine on RH kernel 2.6.9-34.ELsmp. What is 2.6.9-39chaos? Perhaps the problem is with that or any RH 2.6.9-39 kernel. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From ishai at mellanox.co.il Fri May 5 05:50:54 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Fri, 5 May 2006 15:50:54 +0300 Subject: [openib-general] Re: [openfabrics-ewg] SRP: changes to ibsrpdm In-Reply-To: <20060504200551.GD4682@mellanox.co.il> References: <20060504200551.GD4682@mellanox.co.il> Message-ID: <20060505125054.GA9818@mellanox.co.il> On Thu, May 04, 2006 at 11:05:51PM +0300, Michael S. Tsirkin wrote: > Quoting r.
Roland Dreier : > > Why can't the daemon keep > > track of which targets it has added and not add a target twice? Is > > there some reason that the kernel has to be involved? > > What if the daemon is killed/restarted? > > -- > MST What if there are two daemons by accident? -- Ishai Rabinovitz From schihei at de.ibm.com Fri May 5 06:05:45 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Fri, 05 May 2006 15:05:45 +0200 Subject: [openib-general] [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: References: <4450A196.2050901@de.ibm.com> Message-ID: <445B4DA9.9040601@de.ibm.com> Hello Roland, Roland Dreier wrote: > It seems that you are deferring completion event dispatch into threads > spread across all the CPUs. This seems like a very strange thing to > me -- you are adding latency and possibly causing cacheline pingpong. > > It may help throughput in some cases to spread the work across > multiple CPUs but it seems strange to me to do this in the low-level > driver. My intuition would be that it would be better to do this in > the higher levels, and leave open the possibility for protocols that > want the lowest possible latency to be called directly from the > interrupt handler. We've implemented this "spread CQ callbacks across multiple CPUs" functionality to get better throughput on an SMP system, as you have seen. Originally, we had the same idea as you mentioned, that it would be better to do this in the higher levels. The point is that so far we can't see any simple possibility for how this can be done in the OpenIB stack, the TCP/IP network layer or somewhere in the Linux kernel. For example: For IPoIB we get the best throughput when we do the CQ callbacks on different CPUs and do not stay on the same CPU. In other papers and slides (see [1]) you can see similar approaches. I think such an implementation or functionality could require non-trivial changes. This could also be related to other I/O traffic.
[1]: Speeding up Networking, Van Jacobson and Bob Felderman, http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf Regards, Heiko From rdreier at cisco.com Fri May 5 07:42:14 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 05 May 2006 07:42:14 -0700 Subject: [openib-general] Re: [openfabrics-ewg] SRP: changes to ibsrpdm In-Reply-To: <20060505125054.GA9818@mellanox.co.il> (Ishai Rabinovitz's message of "Fri, 5 May 2006 15:50:54 +0300") References: <20060504200551.GD4682@mellanox.co.il> <20060505125054.GA9818@mellanox.co.il> Message-ID: Ishai> What if there are two daemons by accident? I think you just lose then. If you run two copies of sendmail then all hell will break loose too but I don't think that's a real issue. I guess the daemon should try to check if it's already running just to be safe but I don't think this is a real issue. - R. From rdreier at cisco.com Fri May 5 07:43:44 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 05 May 2006 07:43:44 -0700 Subject: [openib-general] Re: [openfabrics-ewg] SRP: changes to ibsrpdm In-Reply-To: <20060504200551.GD4682@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 4 May 2006 23:05:51 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30208D2EE@mtlexch01.mtl.com> <20060504200551.GD4682@mellanox.co.il> Message-ID: >> Why can't the daemon keep track of which targets it has added >> and not add a target twice? Is there some reason that the >> kernel has to be involved? Michael> What if the daemon is killed/restarted? I guess that's a good point. So I guess the daemon should keep track of the targets that exist by tracking sysfs (and look at which ones are there already). But I still don't think we should have half of the daemon's policy in the kernel. - R. 
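[Editorial sketch] The bookkeeping suggested in the message above (the daemon checks which targets already exist before adding one, so a restart does not add a target twice) could look roughly like this. It is a minimal sketch under stated assumptions: target_is_new is a hypothetical helper, the ID strings used with it are made up, and how the list of existing targets is obtained (for example by scanning sysfs at startup) is deliberately left out because the thread does not specify it.

```c
/* Hypothetical helper; the names and the ID format are illustrative only. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `candidate` is not among the `n` IDs in `existing`.
 * `existing` models the targets the daemon found already present when it
 * started, so that a restarted daemon skips them instead of re-adding. */
static bool target_is_new(const char *const *existing, size_t n,
                          const char *candidate)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(existing[i], candidate) == 0)
            return false; /* already added: do not add it again */
    return true;
}
```

A daemon following this approach would rebuild the existing list every time it starts, which keeps the policy in user space rather than in the kernel.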
From rdreier at cisco.com Fri May 5 07:49:10 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 05 May 2006 07:49:10 -0700 Subject: [openib-general] [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: <445B4DA9.9040601@de.ibm.com> (Heiko J. Schick's message of "Fri, 05 May 2006 15:05:45 +0200") References: <4450A196.2050901@de.ibm.com> <445B4DA9.9040601@de.ibm.com> Message-ID: Heiko> Originally, we had the same idea as you mentioned, that it Heiko> would be better to do this in the higher levels. The point Heiko> is that so far we can't see any simple possibility for how Heiko> this can be done in the OpenIB stack, the TCP/IP network layer or Heiko> somewhere in the Linux kernel. Heiko> For example: For IPoIB we get the best throughput when we Heiko> do the CQ callbacks on different CPUs and do not stay on Heiko> the same CPU. So why not do it in IPoIB then? This approach is not optimal globally. For example, uverbs event dispatch is just going to queue an event and wake up the process waiting for events, and doing this on some random CPU not related to where the process will run is clearly the worst possible way to dispatch the event. Heiko> In other papers and slides (see [1]) you can see similar Heiko> approaches. Heiko> [1]: Speeding up Networking, Van Jacobson and Bob Heiko> Felderman, Heiko> http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf I think you've misunderstood this paper. It's about maximizing CPU locality and pushing processing directly into the consumer. In the context of slide 9, what you've done is sort of like adding another control loop inside the kernel, since you dispatch from interrupt handler to driver thread to final consumer. So I would argue that your approach is exactly the opposite of what VJ is advocating. - R.
From bos at pathscale.com Fri May 5 08:02:27 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 05 May 2006 08:02:27 -0700 Subject: [openib-general] Re: please create "Open MPI" component in OF bugzilla (third request) In-Reply-To: References: Message-ID: <1146841347.4965.3.camel@chalcedony.pathscale.com> On Thu, 2006-05-04 at 22:58 -0700, Scott Weitzenkamp (sweitzen) wrote: > Bryan, can you please create this component, 'tis done. You're the default owner; let me know if you want it to be someone else. Thanks, please make Jeff Squyres (jsquyres at cisco.com) the default owner. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Bryan O'Sullivan [mailto:bos at pathscale.com] > Sent: Friday, May 05, 2006 8:02 AM > To: Scott Weitzenkamp (sweitzen) > Cc: openib-general at openib.org; openfabrics-ewg at openib.org > Subject: Re: please create "Open MPI" component in OF > bugzilla (thirdrequest) > > On Thu, 2006-05-04 at 22:58 -0700, Scott Weitzenkamp (sweitzen) wrote: > > Bryan, can you please create this component, > > 'tis done. You're the default owner; let me know if you want it to be > someone else. > > From bos at pathscale.com Fri May 5 08:44:24 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 05 May 2006 08:44:24 -0700 Subject: [openib-general] RE: please create "Open MPI" component in OF bugzilla (thirdrequest) In-Reply-To: References: Message-ID: <1146843864.4965.29.camel@chalcedony.pathscale.com> On Fri, 2006-05-05 at 08:36 -0700, Scott Weitzenkamp (sweitzen) wrote: > Thanks, please make Jeff Squyres (jsquyres at cisco.com) the default owner. Done. 
From mshefty at ichips.intel.com Fri May 5 10:55:31 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 05 May 2006 10:55:31 -0700 Subject: [openib-general] [PATCH 1/3] rdma cm: allow user to specify path record for connections In-Reply-To: References: Message-ID: <445B9193.9060102@ichips.intel.com> This patch series has been committed. - Sean From halr at voltaire.com Fri May 5 10:52:15 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 05 May 2006 13:52:15 -0400 Subject: [openib-general] [RFC] [PATCH] OpenSM: Add support for SA MultiPathRecord Message-ID: <1146851533.4719.67172.camel@hal.voltaire.com> OpenSM: Add support for SA MultiPathRecord Add the optional support for SA MultiPathRecord. This is an initial implementation. Note that this capability is not enabled in the build just yet. Signed-off-by: Hal Rosenstock Index: osm/include/opensm/osm_sa_multipath_record.h =================================================================== --- osm/include/opensm/osm_sa_multipath_record.h (revision 0) +++ osm/include/opensm/osm_sa_multipath_record.h (revision 0) @@ -0,0 +1,275 @@ +/* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. + * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id$ + */ + + +/* + * Abstract: + * Declaration of osm_mpr_rcv_t. + * This object represents the MultiPathRecord Receiver object. + * attribute from a node. + * This object is part of the OpenSM family of objects. + * + * Environment: + * Linux User Mode + * + */ + +#ifndef _OSM_MPR_RCV_H_ +#define _OSM_MPR_RCV_H_ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef __cplusplus +# define BEGIN_C_DECLS extern "C" { +# define END_C_DECLS } +#else /* !__cplusplus */ +# define BEGIN_C_DECLS +# define END_C_DECLS +#endif /* __cplusplus */ + +BEGIN_C_DECLS + +/****h* OpenSM/MultiPath Record Receiver +* NAME +* MultiPath Record Receiver +* +* DESCRIPTION +* The MultiPath Record Receiver object encapsulates the information +* needed to receive the PathRecord request from a node. +* +* The MultiPath Record Receiver object is thread safe. +* +* This object should be treated as opaque and should be +* manipulated only through the provided functions. +* +* AUTHOR +* Hal Rosenstock, Voltaire +* +*********/ + +/****s* OpenSM: MultiPath Record Receiver/osm_mpr_rcv_t +* NAME +* osm_mpr_rcv_t +* +* DESCRIPTION +* MultiPath Record Receiver structure. 
+* +* This object should be treated as opaque and should +* be manipulated only through the provided functions. +* +* SYNOPSIS +*/ +typedef struct _osm_mpr_rcv +{ + osm_subn_t *p_subn; + osm_sa_resp_t *p_resp; + osm_mad_pool_t *p_mad_pool; + osm_log_t *p_log; + cl_plock_t *p_lock; + cl_qlock_pool_t pr_pool; +} osm_mpr_rcv_t; +/* +* FIELDS +* p_subn +* Pointer to the Subnet object for this subnet. +* +* p_gen_req_ctrl +* Pointer to the generic request controller. +* +* p_log +* Pointer to the log object. +* +* p_lock +* Pointer to the serializing lock. +* +* pr_pool +* Pool of multipath record objects used to generate query responses. +* +* SEE ALSO +* MultiPath Record Receiver object +*********/ + +/****f* OpenSM: MultiPath Record Receiver/osm_mpr_rcv_construct +* NAME +* osm_mpr_rcv_construct +* +* DESCRIPTION +* This function constructs a MultiPath Record Receiver object. +* +* SYNOPSIS +*/ +void +osm_mpr_rcv_construct( + IN osm_mpr_rcv_t* const p_rcv ); +/* +* PARAMETERS +* p_rcv +* [in] Pointer to a MultiPath Record Receiver object to construct. +* +* RETURN VALUE +* This function does not return a value. +* +* NOTES +* Allows calling osm_mpr_rcv_init, osm_mpr_rcv_destroy +* +* Calling osm_mpr_rcv_construct is a prerequisite to calling any other +* method except osm_mpr_rcv_init. +* +* SEE ALSO +* MultiPath Record Receiver object, osm_mpr_rcv_init, osm_mpr_rcv_destroy +*********/ + +/****f* OpenSM: MultiPath Record Receiver/osm_mpr_rcv_destroy +* NAME +* osm_mpr_rcv_destroy +* +* DESCRIPTION +* The osm_mpr_rcv_destroy function destroys the object, releasing +* all resources. +* +* SYNOPSIS +*/ +void +osm_mpr_rcv_destroy( + IN osm_mpr_rcv_t* const p_rcv ); +/* +* PARAMETERS +* p_rcv +* [in] Pointer to the object to destroy. +* +* RETURN VALUE +* This function does not return a value. +* +* NOTES +* Performs any necessary cleanup of the specified +* MultiPath Record Receiver object. +* Further operations should not be attempted on the destroyed object. 
+* This function should only be called after a call to +* osm_mpr_rcv_construct or osm_mpr_rcv_init. +* +* SEE ALSO +* MultiPath Record Receiver object, osm_mpr_rcv_construct, +* osm_mpr_rcv_init +*********/ + +/****f* OpenSM: MultiPath Record Receiver/osm_mpr_rcv_init +* NAME +* osm_mpr_rcv_init +* +* DESCRIPTION +* The osm_mpr_rcv_init function initializes a +* MultiPath Record Receiver object for use. +* +* SYNOPSIS +*/ +ib_api_status_t +osm_mpr_rcv_init( + IN osm_mpr_rcv_t* const p_rcv, + IN osm_sa_resp_t* const p_resp, + IN osm_mad_pool_t* const p_mad_pool, + IN osm_subn_t* const p_subn, + IN osm_log_t* const p_log, + IN cl_plock_t* const p_lock ); +/* +* PARAMETERS +* p_rcv +* [in] Pointer to an osm_mpr_rcv_t object to initialize. +* +* p_subn +* [in] Pointer to the Subnet object for this subnet. +* +* p_log +* [in] Pointer to the log object. +* +* p_lock +* [in] Pointer to the OpenSM serializing lock. +* +* RETURN VALUES +* IB_SUCCESS if the MultiPath Record Receiver object was initialized +* successfully. +* +* NOTES +* Allows calling other MultiPath Record Receiver methods. +* +* SEE ALSO +* MultiPath Record Receiver object, osm_mpr_rcv_construct, +* osm_mpr_rcv_destroy +*********/ + +/****f* OpenSM: MultiPath Record Receiver/osm_mpr_rcv_process +* NAME +* osm_mpr_rcv_process +* +* DESCRIPTION +* Process the MultiPathRecord request. +* +* SYNOPSIS +*/ +void +osm_mpr_rcv_process( + IN osm_mpr_rcv_t* const p_rcv, + IN osm_madw_t* const p_madw ); +/* +* PARAMETERS +* p_rcv +* [in] Pointer to an osm_mpr_rcv_t object. +* +* p_madw +* [in] Pointer to the MAD Wrapper containing the MAD +* that contains the node's MultiPathRecord attribute. +* +* RETURN VALUES +* IB_SUCCESS if the MultiPathRecord processing was successful. +* +* NOTES +* This function processes a MultiPathRecord attribute. 
+* +* SEE ALSO +* MultiPath Record Receiver, Node Info Response Controller +*********/ + +END_C_DECLS + +#endif /* _OSM_MPR_RCV_H_ */ Property changes on: osm/include/opensm/osm_sa_multipath_record.h ___________________________________________________________________ Name: svn:keywords + Id Index: osm/include/opensm/osm_helper.h =================================================================== --- osm/include/opensm/osm_helper.h (revision 6920) +++ osm/include/opensm/osm_helper.h (working copy) @@ -217,6 +217,12 @@ osm_dump_path_record( IN const osm_log_level_t log_level ); void +osm_dump_multipath_record( + IN osm_log_t* const p_log, + IN const ib_multipath_rec_t* const p_mpr, + IN const osm_log_level_t log_level ); + +void osm_dump_node_record( IN osm_log_t* const p_log, IN const ib_node_record_t* const p_nr, Index: osm/include/opensm/osm_sa.h =================================================================== --- osm/include/opensm/osm_sa.h (revision 6920) +++ osm/include/opensm/osm_sa.h (working copy) @@ -67,6 +67,7 @@ #include #include #include +#include #include #include #include @@ -167,7 +168,11 @@ typedef struct _osm_sa osm_mcmr_rcv_ctrl_t mcmr_rcv_ctlr; osm_sr_rcv_t sr_rcv; osm_sr_rcv_ctrl_t sr_rcv_ctrl; - +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + osm_mpr_rcv_t mpr_rcv; + osm_mpr_rcv_ctrl_t mpr_rcv_ctrl; +#endif + /* InformInfo Receiver */ osm_infr_rcv_t infr_rcv; osm_infr_rcv_ctrl_t infr_rcv_ctrl; Index: osm/include/opensm/osm_msgdef.h =================================================================== --- osm/include/opensm/osm_msgdef.h (revision 6920) +++ osm/include/opensm/osm_msgdef.h (working copy) @@ -192,6 +192,9 @@ enum OSM_MSG_MAD_VL_ARB, OSM_MSG_MAD_SLVL, OSM_MSG_MAD_GUIDINFO_RECORD, +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + OSM_MSG_MAD_MULTIPATH_RECORD, +#endif OSM_MSG_MAX }; Index: osm/include/opensm/osm_sa_multipath_record_ctrl.h 
=================================================================== --- osm/include/opensm/osm_sa_multipath_record_ctrl.h (revision 0) +++ osm/include/opensm/osm_sa_multipath_record_ctrl.h (revision 0) @@ -0,0 +1,263 @@ +/* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. + * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id$ + */ + + +/* + * Abstract: + * Declaration of osm_mpr_rcv_ctrl_t. + * This object represents a controller that receives the IBA + * MultiPathRecord attribute from a node. + * This object is part of the OpenSM family of objects. 
+ * + * Environment: + * Linux User Mode + * + */ + + +#ifndef _OSM_MPRCTRL_H_ +#define _OSM_MPRCTRL_H_ + + +#include +#include +#include +#include +#include + +#ifdef __cplusplus +# define BEGIN_C_DECLS extern "C" { +# define END_C_DECLS } +#else /* !__cplusplus */ +# define BEGIN_C_DECLS +# define END_C_DECLS +#endif /* __cplusplus */ + +BEGIN_C_DECLS + +/****h* OpenSM/MultiPath Record Receive Controller +* NAME +* MultiPath Record Receive Controller +* +* DESCRIPTION +* The MultiPath Record Receive Controller object encapsulates +* the information needed to receive the MultiPathRecord attribute from a node. +* +* The MultiPath record Receive Controller object is thread safe. +* +* This object should be treated as opaque and should be +* manipulated only through the provided functions. +* +* AUTHOR +* Hal Rosenstock, Voltaire +* +*********/ +/****s* OpenSM: MultiPath Record Receive Controller/osm_mpr_rcv_ctrl_t +* NAME +* osm_mpr_rcv_ctrl_t +* +* DESCRIPTION +* MultiPath Record Receive Controller structure. +* +* This object should be treated as opaque and should +* be manipulated only through the provided functions. +* +* SYNOPSIS +*/ +typedef struct _osm_mpr_rcv_ctrl +{ + osm_mpr_rcv_t *p_rcv; + osm_log_t *p_log; + cl_dispatcher_t *p_disp; + cl_disp_reg_handle_t h_disp; + +} osm_mpr_rcv_ctrl_t; +/* +* FIELDS +* p_rcv +* Pointer to the MultiPath Record Receiver object. +* +* p_log +* Pointer to the log object. +* +* p_disp +* Pointer to the Dispatcher. +* +* h_disp +* Handle returned from dispatcher registration. +* +* SEE ALSO +* MultiPath Record Receive Controller object +* MultiPath Record Receiver object +*********/ + +/****f* OpenSM: MultiPath Record Receive Controller/osm_pr_rcv_ctrl_construct +* NAME +* osm_mpr_rcv_ctrl_construct +* +* DESCRIPTION +* This function constructs a MultiPath Record Receive Controller object. 
+* +* SYNOPSIS +*/ +void osm_mpr_rcv_ctrl_construct( + IN osm_mpr_rcv_ctrl_t* const p_ctrl ); +/* +* PARAMETERS +* p_ctrl +* [in] Pointer to a MultiPath Record Receive Controller +* object to construct. +* +* RETURN VALUE +* This function does not return a value. +* +* NOTES +* Allows calling osm_mpr_rcv_ctrl_init, osm_mpr_rcv_ctrl_destroy, +* and osm_mpr_rcv_ctrl_is_inited. +* +* Calling osm_mpr_rcv_ctrl_construct is a prerequisite to calling any +* other method except osm_mpr_rcv_ctrl_init. +* +* SEE ALSO +* MultiPath Record Receive Controller object, osm_mpr_rcv_ctrl_init, +* osm_mpr_rcv_ctrl_destroy, osm_mpr_rcv_ctrl_is_inited +*********/ + +/****f* OpenSM: MultiPath Record Receive Controller/osm_mpr_rcv_ctrl_destroy +* NAME +* osm_mpr_rcv_ctrl_destroy +* +* DESCRIPTION +* The osm_mpr_rcv_ctrl_destroy function destroys the object, releasing +* all resources. +* +* SYNOPSIS +*/ +void osm_mpr_rcv_ctrl_destroy( + IN osm_mpr_rcv_ctrl_t* const p_ctrl ); +/* +* PARAMETERS +* p_ctrl +* [in] Pointer to the object to destroy. +* +* RETURN VALUE +* This function does not return a value. +* +* NOTES +* Performs any necessary cleanup of the specified +* MultiPath Record Receive Controller object. +* Further operations should not be attempted on the destroyed object. +* This function should only be called after a call to +* osm_mpr_rcv_ctrl_construct or osm_mpr_rcv_ctrl_init. +* +* SEE ALSO +* MultiPath Record Receive Controller object, osm_mpr_rcv_ctrl_construct, +* osm_mpr_rcv_ctrl_init +*********/ + +/****f* OpenSM: MultiPath Record Receive Controller/osm_mpr_rcv_ctrl_init +* NAME +* osm_mpr_rcv_ctrl_init +* +* DESCRIPTION +* The osm_mpr_rcv_ctrl_init function initializes a +* MultiPath Record Receive Controller object for use. 
+* +* SYNOPSIS +*/ +ib_api_status_t osm_mpr_rcv_ctrl_init( + IN osm_mpr_rcv_ctrl_t* const p_ctrl, + IN osm_mpr_rcv_t* const p_rcv, + IN osm_log_t* const p_log, + IN cl_dispatcher_t* const p_disp ); +/* +* PARAMETERS +* p_ctrl +* [in] Pointer to an osm_mpr_rcv_ctrl_t object to initialize. +* +* p_rcv +* [in] Pointer to an osm_mpr_t object. +* +* p_log +* [in] Pointer to the log object. +* +* p_disp +* [in] Pointer to the OpenSM central Dispatcher. +* +* RETURN VALUES +* CL_SUCCESS if the MultiPath Record Receive Controller object was +* initialized successfully. +* +* NOTES +* Allows calling other MultiPath Record Receive Controller methods. +* +* SEE ALSO +* MultiPath Record Receive Controller object, osm_pr_rcv_ctrl_construct, +* osm_mpr_rcv_ctrl_destroy, osm_mpr_rcv_ctrl_is_inited +*********/ + +/****f* OpenSM: MultiPath Record Receive Controller/osm_mpr_rcv_ctrl_is_inited +* NAME +* osm_mpr_rcv_ctrl_is_inited +* +* DESCRIPTION +* Indicates if the object has been initialized with osm_mpr_rcv_ctrl_init. +* +* SYNOPSIS +*/ +boolean_t osm_mpr_rcv_ctrl_is_inited( + IN const osm_mpr_rcv_ctrl_t* const p_ctrl ); +/* +* PARAMETERS +* p_ctrl +* [in] Pointer to an osm_mpr_rcv_ctrl_t object. +* +* RETURN VALUES +* TRUE if the object was initialized successfully, +* FALSE otherwise. +* +* NOTES +* The osm_mpr_rcv_ctrl_construct or osm_mpr_rcv_ctrl_init must be +* called before using this function. 
+* +* SEE ALSO +* MultiPath Record Receive Controller object, osm_mpr_rcv_ctrl_construct, +* osm_mpr_rcv_ctrl_init +*********/ + +END_C_DECLS + +#endif /* _OSM_MPRCTRL_H_ */ Property changes on: osm/include/opensm/osm_sa_multipath_record_ctrl.h ___________________________________________________________________ Name: svn:keywords + Id Index: osm/include/Makefile.am =================================================================== --- osm/include/Makefile.am (revision 6920) +++ osm/include/Makefile.am (working copy) @@ -8,6 +8,7 @@ EXTRA_DIST = \ $(srcdir)/opensm/osm_version.h \ $(srcdir)/opensm/osm_sa_portinfo_record_ctrl.h \ $(srcdir)/opensm/osm_sa_guidinfo_record_ctrl.h \ + $(srcdir)/opensm/osm_sa_multipath_record_ctrl.h \ $(srcdir)/opensm/osm_sa_path_record.h \ $(srcdir)/opensm/osm_lid_mgr.h \ $(srcdir)/opensm/osm_vl_arb_rcv.h \ @@ -37,6 +38,7 @@ EXTRA_DIST = \ $(srcdir)/opensm/osm_helper.h \ $(srcdir)/opensm/osm_sa_portinfo_record.h \ $(srcdir)/opensm/osm_sa_guidinfo_record.h \ + $(srcdir)/opensm/osm_sa_multipath_record.h \ $(srcdir)/opensm/osm_sa_service_record.h \ $(srcdir)/opensm/osm_sa_response.h \ $(srcdir)/opensm/osm_node.h \ Index: osm/include/iba/ib_types.h =================================================================== --- osm/include/iba/ib_types.h (revision 6920) +++ osm/include/iba/ib_types.h (working copy) @@ -1615,6 +1615,17 @@ ib_class_is_rmpp( * SOURCE */ #define IB_PATH_REC_SELECTOR_MASK 0xC0 +/****d* IBA Base: Constants/IB_MULTIPATH_REC_SELECTOR_MASK +* NAME +* IB_MULTIPATH_REC_SELECTOR_MASK +* +* DESCRIPTION +* Mask for the selector field for multipath record MTU, rate, +* and packet lifetime. 
+* +* SOURCE +*/ +#define IB_MULTIPATH_REC_SELECTOR_MASK 0xC0 /**********/ /****d* IBA Base: Constants/IB_PATH_REC_BASE_MASK * NAME @@ -1628,6 +1639,18 @@ ib_class_is_rmpp( */ #define IB_PATH_REC_BASE_MASK 0x3F /**********/ +/****d* IBA Base: Constants/IB_MULTIPATH_REC_BASE_MASK +* NAME +* IB_MULTIPATH_REC_BASE_MASK +* +* DESCRIPTION +* Mask for the base value field for multipath record MTU, rate, +* and packet lifetime. +* +* SOURCE +*/ +#define IB_MULTIPATH_REC_BASE_MASK 0x3F +/**********/ /****h* IBA Base/Type Definitions * NAME @@ -2401,6 +2424,72 @@ typedef struct _ib_path_rec #define IB_GIR_COMPMASK_GID6 (CL_HTON64(((uint64_t)1)<<10)) #define IB_GIR_COMPMASK_GID7 (CL_HTON64(((uint64_t)1)<<11)) +/* MultiPath Record Component Masks */ +#define IB_MPR_COMPMASK_RAWTRAFFIC (CL_HTON64(((uint64_t)1)<<0)) +#define IB_MPR_COMPMASK_RESV0 (CL_HTON64(((uint64_t)1)<<1)) +#define IB_MPR_COMPMASK_FLOWLABEL (CL_HTON64(((uint64_t)1)<<2)) +#define IB_MPR_COMPMASK_HOPLIMIT (CL_HTON64(((uint64_t)1)<<3)) +#define IB_MPR_COMPMASK_TCLASS (CL_HTON64(((uint64_t)1)<<4)) +#define IB_MPR_COMPMASK_REVERSIBLE (CL_HTON64(((uint64_t)1)<<5)) +#define IB_MPR_COMPMASK_NUMBPATH (CL_HTON64(((uint64_t)1)<<6)) +#define IB_MPR_COMPMASK_PKEY (CL_HTON64(((uint64_t)1)<<7)) +#define IB_MPR_COMPMASK_RESV1 (CL_HTON64(((uint64_t)1)<<8)) +#define IB_MPR_COMPMASK_SL (CL_HTON64(((uint64_t)1)<<9)) +#define IB_MPR_COMPMASK_MTUSELEC (CL_HTON64(((uint64_t)1)<<10)) +#define IB_MPR_COMPMASK_MTU (CL_HTON64(((uint64_t)1)<<11)) +#define IB_MPR_COMPMASK_RATESELEC (CL_HTON64(((uint64_t)1)<<12)) +#define IB_MPR_COMPMASK_RATE (CL_HTON64(((uint64_t)1)<<13)) +#define IB_MPR_COMPMASK_PKTLIFETIMESELEC (CL_HTON64(((uint64_t)1)<<14)) +#define IB_MPR_COMPMASK_PKTLIFETIME (CL_HTON64(((uint64_t)1)<<15)) +#define IB_MPR_COMPMASK_RESV2 (CL_HTON64(((uint64_t)1)<<16)) +#define IB_MPR_COMPMASK_INDEPSELEC (CL_HTON64(((uint64_t)1)<<17)) +#define IB_MPR_COMPMASK_RESV3 (CL_HTON64(((uint64_t)1)<<18)) +#define IB_MPR_COMPMASK_SGIDCOUNT 
(CL_HTON64(((uint64_t)1)<<19)) +#define IB_MPR_COMPMASK_DGIDCOUNT (CL_HTON64(((uint64_t)1)<<20)) +#define IB_MPR_COMPMASK_RESV4 (CL_HTON64(((uint64_t)1)<<21)) +#define IB_MPR_COMPMASK_SDGID1 (CL_HTON64(((uint64_t)1)<<22)) +#define IB_MPR_COMPMASK_SDGID2 (CL_HTON64(((uint64_t)1)<<23)) +#define IB_MPR_COMPMASK_SDGID3 (CL_HTON64(((uint64_t)1)<<24)) +#define IB_MPR_COMPMASK_SDGID4 (CL_HTON64(((uint64_t)1)<<25)) +#define IB_MPR_COMPMASK_SDGID5 (CL_HTON64(((uint64_t)1)<<26)) +#define IB_MPR_COMPMASK_SDGID6 (CL_HTON64(((uint64_t)1)<<27)) +#define IB_MPR_COMPMASK_SDGID7 (CL_HTON64(((uint64_t)1)<<28)) +#define IB_MPR_COMPMASK_SDGID8 (CL_HTON64(((uint64_t)1)<<29)) +#define IB_MPR_COMPMASK_SDGID9 (CL_HTON64(((uint64_t)1)<<30)) +#define IB_MPR_COMPMASK_SDGID10 (CL_HTON64(((uint64_t)1)<<31)) +#define IB_MPR_COMPMASK_SDGID11 (CL_HTON64(((uint64_t)1)<<32)) +#define IB_MPR_COMPMASK_SDGID12 (CL_HTON64(((uint64_t)1)<<33)) +#define IB_MPR_COMPMASK_SDGID13 (CL_HTON64(((uint64_t)1)<<34)) +#define IB_MPR_COMPMASK_SDGID14 (CL_HTON64(((uint64_t)1)<<35)) +#define IB_MPR_COMPMASK_SDGID15 (CL_HTON64(((uint64_t)1)<<36)) +#define IB_MPR_COMPMASK_SDGID16 (CL_HTON64(((uint64_t)1)<<37)) +#define IB_MPR_COMPMASK_SDGID17 (CL_HTON64(((uint64_t)1)<<38)) +#define IB_MPR_COMPMASK_SDGID18 (CL_HTON64(((uint64_t)1)<<39)) +#define IB_MPR_COMPMASK_SDGID19 (CL_HTON64(((uint64_t)1)<<40)) +#define IB_MPR_COMPMASK_SDGID20 (CL_HTON64(((uint64_t)1)<<41)) +#define IB_MPR_COMPMASK_SDGID21 (CL_HTON64(((uint64_t)1)<<42)) +#define IB_MPR_COMPMASK_SDGID22 (CL_HTON64(((uint64_t)1)<<43)) +#define IB_MPR_COMPMASK_SDGID23 (CL_HTON64(((uint64_t)1)<<44)) +#define IB_MPR_COMPMASK_SDGID24 (CL_HTON64(((uint64_t)1)<<45)) +#define IB_MPR_COMPMASK_SDGID25 (CL_HTON64(((uint64_t)1)<<46)) +#define IB_MPR_COMPMASK_SDGID26 (CL_HTON64(((uint64_t)1)<<47)) +#define IB_MPR_COMPMASK_SDGID27 (CL_HTON64(((uint64_t)1)<<48)) +#define IB_MPR_COMPMASK_SDGID28 (CL_HTON64(((uint64_t)1)<<49)) +#define IB_MPR_COMPMASK_SDGID29 
(CL_HTON64(((uint64_t)1)<<50)) +#define IB_MPR_COMPMASK_SDGID30 (CL_HTON64(((uint64_t)1)<<51)) +#define IB_MPR_COMPMASK_SDGID31 (CL_HTON64(((uint64_t)1)<<52)) +#define IB_MPR_COMPMASK_SDGID32 (CL_HTON64(((uint64_t)1)<<53)) +#define IB_MPR_COMPMASK_SDGID33 (CL_HTON64(((uint64_t)1)<<54)) +#define IB_MPR_COMPMASK_SDGID34 (CL_HTON64(((uint64_t)1)<<55)) +#define IB_MPR_COMPMASK_SDGID35 (CL_HTON64(((uint64_t)1)<<56)) +#define IB_MPR_COMPMASK_SDGID36 (CL_HTON64(((uint64_t)1)<<57)) +#define IB_MPR_COMPMASK_SDGID37 (CL_HTON64(((uint64_t)1)<<58)) +#define IB_MPR_COMPMASK_SDGID38 (CL_HTON64(((uint64_t)1)<<59)) +#define IB_MPR_COMPMASK_SDGID39 (CL_HTON64(((uint64_t)1)<<60)) +#define IB_MPR_COMPMASK_SDGID40 (CL_HTON64(((uint64_t)1)<<61)) +#define IB_MPR_COMPMASK_SDGID41 (CL_HTON64(((uint64_t)1)<<62)) +#define IB_MPR_COMPMASK_SDGID42 (CL_HTON64(((uint64_t)1)<<63)) + /****f* IBA Base: Types/ib_path_rec_init_local * NAME * ib_path_rec_init_local @@ -5508,6 +5597,316 @@ typedef struct _ib_guidinfo_record } PACK_SUFFIX ib_guidinfo_record_t; #include +#define IB_MULTIPATH_MAX_GIDS 11 /* Support max that can fit into first MAD (for now) */ + +#include +typedef struct _ib_multipath_rec_t +{ + ib_net32_t hop_flow_raw; + uint8_t tclass; + uint8_t num_path; + ib_net16_t pkey; + uint8_t resv0; + uint8_t sl; + uint8_t mtu; + uint8_t rate; + uint8_t pkt_life; + uint8_t resv1; + uint8_t independence; /* formerly resv2 */ + uint8_t sgid_count; + uint8_t dgid_count; + uint8_t resv3[7]; + ib_gid_t gids[IB_MULTIPATH_MAX_GIDS]; +} PACK_SUFFIX ib_multipath_rec_t; +#include +/* +* FIELDS +* hop_flow_raw +* Global routing parameters: hop count, flow label and raw bit. +* +* tclass +* Another global routing parameter. +* +* num_path +* Reversible path - 1 bit to say if path is reversible. +* num_path [6:0] In queries, maximum number of paths to return. +* In responses, undefined. +* +* pkey +* Partition key (P_Key) to use on this path. +* +* sl +* Service level to use on this path. 
+* +* mtu +* MTU and MTU selector fields to use on this path. +* +* rate +* Rate and rate selector fields to use on this path. +* +* pkt_life +* Packet lifetime +* +* preference +* Indicates the relative merit of this path versus other path +* records returned from the SA. Lower numbers are better. +* +* SEE ALSO +*********/ + +/****f* IBA Base: Types/ib_multipath_rec_num_path +* NAME +* ib_multipath_rec_num_path +* +* DESCRIPTION +* Get max number of paths to return. +* +* SYNOPSIS +*/ +static inline uint8_t +ib_multipath_rec_num_path( + IN const ib_multipath_rec_t* const p_rec ) +{ + return( p_rec->num_path & 0x7F ); +} +/* +* PARAMETERS +* p_rec +* [in] Pointer to the multipath record object. +* +* RETURN VALUES +* Maximum number of paths to return for each unique SGID_DGID combination. +* +* NOTES +* +* SEE ALSO +* ib_multipath_rec_t +*********/ + +/****f* IBA Base: Types/ib_multipath_rec_sl +* NAME +* ib_multipath_rec_sl +* +* DESCRIPTION +* Get multipath service level. +* +* SYNOPSIS +*/ +static inline uint8_t +ib_multipath_rec_sl( + IN const ib_multipath_rec_t* const p_rec ) +{ + /* sl is a single byte in ib_multipath_rec_t, so no byte swap is needed */ + return( (uint8_t)(p_rec->sl & 0xF) ); +} +/* +* PARAMETERS +* p_rec +* [in] Pointer to the multipath record object. +* +* RETURN VALUES +* SL. +* +* NOTES +* +* SEE ALSO +* ib_multipath_rec_t +*********/ + +/****f* IBA Base: Types/ib_multipath_rec_mtu +* NAME +* ib_multipath_rec_mtu +* +* DESCRIPTION +* Get encoded path MTU. +* +* SYNOPSIS +*/ +static inline uint8_t +ib_multipath_rec_mtu( + IN const ib_multipath_rec_t* const p_rec ) +{ + return( (uint8_t)(p_rec->mtu & IB_MULTIPATH_REC_BASE_MASK) ); +} +/* +* PARAMETERS +* p_rec +* [in] Pointer to the multipath record object. +* +* RETURN VALUES +* Encoded path MTU.
+* 1: 256 +* 2: 512 +* 3: 1024 +* 4: 2048 +* 5: 4096 +* others: reserved +* +* NOTES +* +* SEE ALSO +* ib_multipath_rec_t +*********/ + +/****f* IBA Base: Types/ib_multipath_rec_mtu_sel +* NAME +* ib_multipath_rec_mtu_sel +* +* DESCRIPTION +* Get encoded multipath MTU selector. +* +* SYNOPSIS +*/ +static inline uint8_t +ib_multipath_rec_mtu_sel( + IN const ib_multipath_rec_t* const p_rec ) +{ + return( (uint8_t)((p_rec->mtu & IB_MULTIPATH_REC_SELECTOR_MASK) >> 6) ); +} +/* +* PARAMETERS +* p_rec +* [in] Pointer to the multipath record object. +* +* RETURN VALUES +* Encoded path MTU selector value (for queries). +* 0: greater than MTU specified +* 1: less than MTU specified +* 2: exactly the MTU specified +* 3: largest MTU available +* +* NOTES +* +* SEE ALSO +* ib_multipath_rec_t +*********/ + +/****f* IBA Base: Types/ib_multipath_rec_rate +* NAME +* ib_multipath_rec_rate +* +* DESCRIPTION +* Get encoded multipath rate. +* +* SYNOPSIS +*/ +static inline uint8_t +ib_multipath_rec_rate( + IN const ib_multipath_rec_t* const p_rec ) +{ + return( (uint8_t)(p_rec->rate & IB_MULTIPATH_REC_BASE_MASK) ); +} +/* +* PARAMETERS +* p_rec +* [in] Pointer to the multipath record object. +* +* RETURN VALUES +* Encoded multipath rate. +* 2: 2.5 Gb/sec. +* 3: 10 Gb/sec. +* 4: 30 Gb/sec. +* others: reserved +* +* NOTES +* +* SEE ALSO +* ib_multipath_rec_t +*********/ + +/****f* IBA Base: Types/ib_multipath_rec_rate_sel +* NAME +* ib_multipath_rec_rate_sel +* +* DESCRIPTION +* Get encoded multipath rate selector. +* +* SYNOPSIS +*/ +static inline uint8_t +ib_multipath_rec_rate_sel( + IN const ib_multipath_rec_t* const p_rec ) +{ + return( (uint8_t)((p_rec->rate & IB_MULTIPATH_REC_SELECTOR_MASK) >> 6) ); +} +/* +* PARAMETERS +* p_rec +* [in] Pointer to the multipath record object. +* +* RETURN VALUES +* Encoded path rate selector value (for queries). 
+* 0: greater than rate specified +* 1: less than rate specified +* 2: exactly the rate specified +* 3: largest rate available +* +* NOTES +* +* SEE ALSO +* ib_multipath_rec_t +*********/ + +/****f* IBA Base: Types/ib_multipath_rec_pkt_life +* NAME +* ib_multipath_rec_pkt_life +* +* DESCRIPTION +* Get encoded multipath pkt_life. +* +* SYNOPSIS +*/ +static inline uint8_t +ib_multipath_rec_pkt_life( + IN const ib_multipath_rec_t* const p_rec ) +{ + return( (uint8_t)(p_rec->pkt_life & IB_MULTIPATH_REC_BASE_MASK) ); +} +/* +* PARAMETERS +* p_rec +* [in] Pointer to the multipath record object. +* +* RETURN VALUES +* Encoded multipath pkt_life = 4.096 ode must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id$ + */ + + +/* + * Abstract: + * Implementation of osm_mpr_rcv_t. + * This object represents the MultiPath Record Receiver object. + * This object is part of the opensm family of objects. 
+ * + * Environment: + * Linux User Mode + * + */ + +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define OSM_MPR_RCV_POOL_MIN_SIZE 64 +#define OSM_MPR_RCV_POOL_GROW_SIZE 64 + +#define OSM_SA_MPR_MAX_NUM_PATH 127 + +typedef struct _osm_mpr_item +{ + cl_pool_item_t pool_item; + const osm_port_t *p_src_port; + const osm_port_t *p_dest_port; + int hops; + ib_path_rec_t path_rec; +} osm_mpr_item_t; + +typedef struct _osm_path_parms +{ + ib_net16_t pkey; + uint8_t mtu; + uint8_t rate; + uint8_t sl; + uint8_t pkt_life; + boolean_t reversible; + int hops; +} osm_path_parms_t; + + +/********************************************************************** + **********************************************************************/ +void +osm_mpr_rcv_construct( + IN osm_mpr_rcv_t* const p_rcv ) +{ + cl_memclr( p_rcv, sizeof(*p_rcv) ); + cl_qlock_pool_construct( &p_rcv->pr_pool ); +} + +/********************************************************************** + **********************************************************************/ +void +osm_mpr_rcv_destroy( + IN osm_mpr_rcv_t* const p_rcv ) +{ + OSM_LOG_ENTER( p_rcv->p_log, osm_mpr_rcv_destroy ); + cl_qlock_pool_destroy( &p_rcv->pr_pool ); + OSM_LOG_EXIT( p_rcv->p_log ); +} + +/********************************************************************** + **********************************************************************/ +ib_api_status_t +osm_mpr_rcv_init( + IN osm_mpr_rcv_t* const p_rcv, + IN osm_sa_resp_t* const p_resp, + IN osm_mad_pool_t* const p_mad_pool, + IN osm_subn_t* const p_subn, + IN osm_log_t* const p_log, + IN cl_plock_t* const p_lock ) +{ + ib_api_status_t status; + + OSM_LOG_ENTER( p_log, osm_mpr_rcv_init ); + + osm_mpr_rcv_construct( p_rcv ); + + p_rcv->p_log = p_log; + p_rcv->p_subn = p_subn; + 
p_rcv->p_lock = p_lock; + p_rcv->p_resp = p_resp; + p_rcv->p_mad_pool = p_mad_pool; + + status = cl_qlock_pool_init( &p_rcv->pr_pool, + OSM_MPR_RCV_POOL_MIN_SIZE, + 0, + OSM_MPR_RCV_POOL_GROW_SIZE, + sizeof(osm_mpr_item_t), + NULL, NULL, NULL ); + + OSM_LOG_EXIT( p_rcv->p_log ); + return( status ); +} + +/********************************************************************** + **********************************************************************/ +static ib_api_status_t +__osm_mpr_rcv_get_path_parms( + IN osm_mpr_rcv_t* const p_rcv, + IN const ib_multipath_rec_t* const p_mpr, + IN const osm_port_t* const p_src_port, + IN const osm_port_t* const p_dest_port, + IN const uint16_t dest_lid_ho, + IN const ib_net64_t comp_mask, + OUT osm_path_parms_t* const p_parms ) +{ + ib_net64_t node_guid; + const osm_node_t* p_node; + const osm_physp_t* p_physp; + const osm_physp_t* p_dest_physp; + const osm_switch_t* p_sw; + const ib_port_info_t* p_pi; + const cl_qmap_t* p_sw_tbl; + ib_slvl_table_t* p_slvl_tbl; + ib_api_status_t status = IB_SUCCESS; + uint8_t mtu; + uint8_t rate; + uint8_t pkt_life; + uint8_t required_mtu; + uint8_t required_rate; + uint16_t required_pkey; + uint8_t required_sl; + uint8_t required_pkt_life; + ib_net16_t dest_lid; + int hops = 0; + int in_port_num = 0; + uint8_t vl; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_get_path_parms ); + + dest_lid = cl_hton16( dest_lid_ho ); + + p_dest_physp = osm_port_get_default_phys_ptr( p_dest_port ); + p_physp = osm_port_get_default_phys_ptr( p_src_port ); + p_pi = osm_physp_get_port_info_ptr( p_physp ); + p_sw_tbl = &p_rcv->p_subn->sw_guid_tbl; + + mtu = ib_port_info_get_neighbor_mtu( p_pi ); + rate = ib_port_info_compute_rate( p_pi ); + + if ( comp_mask & IB_MPR_COMPMASK_SL ) + required_sl = ib_multipath_rec_sl( p_mpr ); + else + required_sl = OSM_DEFAULT_SL; + + if ( comp_mask & IB_MPR_COMPMASK_PKEY ) { + required_pkey = p_mpr->pkey; + if ( !osm_physp_has_pkey( p_rcv->p_log, required_pkey, p_physp ) || + 
!osm_physp_has_pkey( p_rcv->p_log, required_pkey, p_dest_physp ) ) { + osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, + "__osm_mpr_rcv_get_path_parms: " + "path not found for PKEY = 0x%x\n" + "\t\tsrc %Lx dst %Lx\n", + required_pkey, + osm_physp_get_port_guid( p_physp ), + osm_physp_get_port_guid( p_dest_physp ) ); + + status = IB_NOT_FOUND; + goto Exit; + } + } else + required_pkey = IB_DEFAULT_PKEY; + + /* + Walk the subnet object from source to destination, + tracking the most restrictive rate and mtu values along the way... + + If source port node is a switch, then p_physp should + point to the port that routes the destination lid + */ + + p_node = osm_physp_get_node_ptr( p_physp ); + + if ( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ) + { + p_sw = (osm_switch_t *)cl_qmap_get( p_sw_tbl, + osm_node_get_node_guid( p_node ) ); + + if( p_sw == (osm_switch_t *)cl_qmap_end( p_sw_tbl ) ) + { + status = IB_ERROR; + goto Exit; + } + + /* + * If the dest_lid_ho is equal to the lid of the switch pointed by + * p_sw then p_physp will be the physical port of the switch port zero. 
+ */ + p_physp = osm_switch_get_route_by_lid( p_sw, cl_ntoh16( dest_lid_ho ) ); + if ( p_physp == 0 ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 4514: " + "Can't find routing to LID 0x%X from switch for guid 0x%016" PRIx64 "\n", + dest_lid_ho, + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); + status = IB_ERROR; + goto Exit; + } + } + + /* + * Same as above + */ + p_node = osm_physp_get_node_ptr( p_dest_physp ); + + if ( osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH ) + { + p_sw = (osm_switch_t *)cl_qmap_get( p_sw_tbl, + osm_node_get_node_guid( p_node ) ); + + if ( p_sw == (osm_switch_t *)cl_qmap_end( p_sw_tbl ) ) + { + status = IB_ERROR; + goto Exit; + } + + p_dest_physp = osm_switch_get_route_by_lid( p_sw, cl_ntoh16( dest_lid_ho ) ); + + if ( p_dest_physp == 0 ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 4515: " + "Can't find routing to LID 0x%X from switch for guid 0x%016" PRIx64 "\n", + dest_lid_ho, + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); + status = IB_ERROR; + goto Exit; + } + + } + + while ( p_physp != p_dest_physp ) + { + p_physp = osm_physp_get_remote( p_physp ); + + if ( p_physp == 0 ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 4505: " + "Can't find remote phys port when routing to LID 0x%X from node guid 0x%016" PRIx64 "\n", + dest_lid_ho, + cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); + status = IB_ERROR; + goto Exit; + } + + hops++; + + /* + This is point to point case (no switch in between) + */ + if ( p_physp == p_dest_physp ) + break; + + p_node = osm_physp_get_node_ptr( p_physp ); + + if ( osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH ) + { + /* + There is some sort of problem in the subnet object! + If this isn't a switch, we should have reached + the destination by now! 
+ */ + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 4503: " + "Internal error, bad path\n" ); + status = IB_ERROR; + goto Exit; + } + + node_guid = osm_node_get_node_guid( p_node ); + p_sw = (osm_switch_t*)cl_qmap_get( p_sw_tbl, node_guid ); + + if ( p_sw == (osm_switch_t*)cl_qmap_end( p_sw_tbl ) ) + { + /* + There is some sort of problem in the subnet object! + */ + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 4504: " + "Internal error, no switch for guid 0x%016" PRIx64 "\n", + cl_ntoh64( node_guid ) ); + status = IB_ERROR; + goto Exit; + } + + /* + Check parameters for the ingress port in this switch. + */ + p_pi = osm_physp_get_port_info_ptr( p_physp ); + + if ( mtu > ib_port_info_get_mtu_cap( p_pi ) ) + { + mtu = ib_port_info_get_mtu_cap( p_pi ); + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_path_parms: " + "New smallest MTU = %u at intervening port 0x%016" PRIx64 "\n", + mtu, + osm_physp_get_port_guid( p_physp ) ); + } + } + + if ( rate > ib_port_info_compute_rate( p_pi ) ) + { + rate = ib_port_info_compute_rate( p_pi ); + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_path_parms: " + "New smallest rate = %u at intervening port 0x%016" PRIx64 "\n", + rate, + osm_physp_get_port_guid( p_physp ) ); + } + } + + /* + Continue with the egress port on this switch. 
+ */ + p_physp = osm_switch_get_route_by_lid( p_sw, dest_lid ); + + if ( p_physp == 0 ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_path_parms: ERR 4516: " + "Dead end on path to LID 0x%X from switch for guid 0x%016" PRIx64 "\n", + dest_lid_ho, + cl_ntoh64( node_guid ) ); + status = IB_ERROR; + goto Exit; + } + + CL_ASSERT( p_physp ); + CL_ASSERT( osm_physp_is_valid( p_physp ) ); + + if ( comp_mask & IB_MPR_COMPMASK_SL ) { + in_port_num = osm_physp_get_port_num( p_physp ); + p_slvl_tbl = osm_physp_get_slvl_tbl( p_physp, in_port_num ); + vl = ib_slvl_table_get( p_slvl_tbl, required_sl ); + if (vl == IB_DROP_VL) { /* discard packet */ + osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, + "__osm_mpr_rcv_get_path_parms: Path not found for SL %d\n" + "\t\tin_port_num %d port_guid %Lx\n", + required_sl, in_port_num, + osm_physp_get_port_guid( p_physp ) ); + status = IB_NOT_FOUND; + goto Exit; + } + } + + p_pi = osm_physp_get_port_info_ptr( p_physp ); + + if ( mtu > ib_port_info_get_mtu_cap( p_pi ) ) + { + mtu = ib_port_info_get_mtu_cap( p_pi ); + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_path_parms: " + "New smallest MTU = %u at intervening port 0x%016" PRIx64 "\n", + mtu, + osm_physp_get_port_guid( p_physp ) ); + } + } + + if ( rate > ib_port_info_compute_rate( p_pi ) ) + { + rate = ib_port_info_compute_rate( p_pi ); + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_path_parms: " + "New smallest rate = %u at intervening port 0x%016" PRIx64 "\n", + rate, + osm_physp_get_port_guid( p_physp ) ); + } + } + + } + + /* + p_physp now points to the destination + */ + p_pi = osm_physp_get_port_info_ptr( p_physp ); + + if ( mtu > ib_port_info_get_mtu_cap( p_pi ) ) + { + mtu = ib_port_info_get_mtu_cap( p_pi ); + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + 
"__osm_mpr_rcv_get_path_parms: " + "New smallest MTU = %u at destination port 0x%016" PRIx64 "\n", + mtu, + osm_physp_get_port_guid( p_physp ) ); + } + } + + if ( rate > ib_port_info_compute_rate( p_pi ) ) + { + rate = ib_port_info_compute_rate( p_pi ); + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_path_parms: " + "New smallest rate = %u at destination port 0x%016" PRIx64 "\n", + rate, + osm_physp_get_port_guid( p_physp ) ); + } + } + + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_path_parms: " + "Path min MTU = %u, min rate = %u\n", mtu, rate ); + } + + /* + Determine if these values meet the user criteria + */ + + /* we silently ignore cases where only the MTU selector is defined */ + if ( ( comp_mask & IB_MPR_COMPMASK_MTUSELEC ) && + ( comp_mask & IB_MPR_COMPMASK_MTU ) ) + { + required_mtu = ib_multipath_rec_mtu( p_mpr ); + switch ( ib_multipath_rec_mtu_sel( p_mpr ) ) + { + case 0: /* must be greater than */ + if ( mtu <= required_mtu ) + status = IB_NOT_FOUND; + break; + + case 1: /* must be less than */ + if ( mtu >= required_mtu ) + status = IB_NOT_FOUND; + break; + + case 2: /* exact match */ + if ( mtu != required_mtu ) + status = IB_NOT_FOUND; + break; + + case 3: /* largest available */ + /* can't be disqualified by this one */ + break; + + default: + /* if we're here, there's a bug in ib_multipath_rec_mtu_sel() */ + CL_ASSERT( FALSE ); + status = IB_ERROR; + break; + } + } + + /* we silently ignore cases where only the Rate selector is defined */ + if ( ( comp_mask & IB_MPR_COMPMASK_RATESELEC ) && + ( comp_mask & IB_MPR_COMPMASK_RATE ) ) + { + required_rate = ib_multipath_rec_rate( p_mpr ); + switch ( ib_multipath_rec_rate_sel( p_mpr ) ) + { + case 0: /* must be greater than */ + if ( rate <= required_rate ) + status = IB_NOT_FOUND; + break; + + case 1: /* must be less than */ + if ( rate >= required_rate
) + status = IB_NOT_FOUND; + break; + + case 2: /* exact match */ + if ( rate != required_rate ) + status = IB_NOT_FOUND; + break; + + case 3: /* largest available */ + /* can't be disqualified by this one */ + break; + + default: + /* if we're here, there's a bug in ib_multipath_rec_rate_sel() */ + CL_ASSERT( FALSE ); + status = IB_ERROR; + break; + } + } + + /* Verify the pkt_life_time */ + /* According to spec definition IBA 1.2 Table 205 PacketLifeTime description, + for loopback paths, packetLifeTime shall be zero. */ + if ( p_src_port == p_dest_port ) + pkt_life = 0; /* loopback */ + else + pkt_life = OSM_DEFAULT_SUBNET_TIMEOUT; + + /* we silently ignore cases where only the PktLife selector is defined */ + if ( ( comp_mask & IB_MPR_COMPMASK_PKTLIFETIMESELEC ) && + ( comp_mask & IB_MPR_COMPMASK_PKTLIFETIME ) ) + { + required_pkt_life = ib_multipath_rec_pkt_life( p_mpr ); + switch ( ib_multipath_rec_pkt_life_sel( p_mpr ) ) + { + case 0: /* must be greater than */ + if ( pkt_life <= required_pkt_life ) + status = IB_NOT_FOUND; + break; + + case 1: /* must be less than */ + if ( pkt_life >= required_pkt_life ) + status = IB_NOT_FOUND; + break; + + case 2: /* exact match */ + if ( pkt_life != required_pkt_life ) + status = IB_NOT_FOUND; + break; + + case 3: /* smallest available */ + /* can't be disqualified by this one */ + break; + + default: + /* if we're here, there's a bug in ib_multipath_rec_pkt_life_sel() */ + CL_ASSERT( FALSE ); + status = IB_ERROR; + break; + } + } + + if (status != IB_SUCCESS) + goto Exit; + + p_parms->mtu = mtu; + p_parms->rate = rate; + p_parms->pkey = required_pkey; + p_parms->pkt_life = pkt_life; + p_parms->sl = required_sl; + p_parms->hops = hops; + + Exit: + OSM_LOG_EXIT( p_rcv->p_log ); + return( status ); +} + +/********************************************************************** + **********************************************************************/ +static void +__osm_mpr_rcv_build_pr( + IN osm_mpr_rcv_t* const p_rcv, + IN const
osm_port_t* const p_src_port, + IN const osm_port_t* const p_dest_port, + IN const uint16_t src_lid_ho, + IN const uint16_t dest_lid_ho, + IN const uint8_t preference, + IN const osm_path_parms_t* const p_parms, + OUT ib_path_rec_t* const p_pr ) +{ + const osm_physp_t* p_src_physp; + const osm_physp_t* p_dest_physp; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_build_pr ); + + p_src_physp = osm_port_get_default_phys_ptr( p_src_port ); + p_dest_physp = osm_port_get_default_phys_ptr( p_dest_port ); + + p_pr->dgid.unicast.prefix = osm_physp_get_subnet_prefix( p_dest_physp ); + p_pr->dgid.unicast.interface_id = osm_physp_get_port_guid( p_dest_physp ); + + p_pr->sgid.unicast.prefix = osm_physp_get_subnet_prefix( p_src_physp ); + p_pr->sgid.unicast.interface_id = osm_physp_get_port_guid( p_src_physp ); + + p_pr->dlid = cl_hton16( dest_lid_ho ); + p_pr->slid = cl_hton16( src_lid_ho ); + + p_pr->pkey = p_parms->pkey; + p_pr->sl = p_parms->sl; + p_pr->mtu = (uint8_t)( p_parms->mtu | 0x80 ); + p_pr->rate = (uint8_t)( p_parms->rate | 0x80 ); + + /* According to 1.2 spec definition Table 205 PacketLifeTime description, + for loopback paths, packetLifeTime shall be zero. 
*/ + if ( p_src_port == p_dest_port ) + p_pr->pkt_life = 0x80; /* loopback */ + else + p_pr->pkt_life = (uint8_t)( p_parms->pkt_life | 0x80 ); + + p_pr->preference = preference; + + /* always return num_path = 0 so this is only the reversible component */ + if ( p_parms->reversible ) + p_pr->num_path = 0x80; + + OSM_LOG_EXIT( p_rcv->p_log ); +} + +/********************************************************************** + **********************************************************************/ +static osm_mpr_item_t* +__osm_mpr_rcv_get_lid_pair_path( + IN osm_mpr_rcv_t* const p_rcv, + IN const ib_multipath_rec_t* const p_mpr, + IN const osm_port_t* const p_src_port, + IN const osm_port_t* const p_dest_port, + IN const uint16_t src_lid_ho, + IN const uint16_t dest_lid_ho, + IN const ib_net64_t comp_mask, + IN const uint8_t preference ) +{ + osm_path_parms_t path_parms; + osm_path_parms_t rev_path_parms; + osm_mpr_item_t *p_pr_item; + ib_api_status_t status, rev_path_status; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_get_lid_pair_path ); + + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_lid_pair_path: " + "Src LID 0x%X, Dest LID 0x%X\n", + src_lid_ho, dest_lid_ho ); + } + + p_pr_item = (osm_mpr_item_t*)cl_qlock_pool_get( &p_rcv->pr_pool ); + if ( p_pr_item == NULL ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_lid_pair_path: ERR 4501: " + "Unable to allocate path record\n" ); + goto Exit; + } + + status = __osm_mpr_rcv_get_path_parms( p_rcv, p_mpr, p_src_port, + p_dest_port, dest_lid_ho, + comp_mask, &path_parms ); + + if ( status != IB_SUCCESS ) + { + cl_qlock_pool_put( &p_rcv->pr_pool, &p_pr_item->pool_item ); + p_pr_item = NULL; + goto Exit; + } + + /* now try the reversible path */ + rev_path_status = __osm_mpr_rcv_get_path_parms( p_rcv, p_mpr, p_dest_port, + p_src_port, src_lid_ho, + comp_mask, &rev_path_parms ); + path_parms.reversible = ( rev_path_status == 
IB_SUCCESS ); + + /* did we get a Reversible Path compmask ? */ + /* + NOTE that if the reversible component = 0, it is a don't care + rather than requiring non-reversible paths ... + see Vol1 Ver1.2 p900 l16 + */ + if ( comp_mask & IB_MPR_COMPMASK_REVERSIBLE ) + if ( (! path_parms.reversible && ( p_mpr->num_path & 0x80 ) ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_lid_pair_path: " + "Requested reversible path but failed to get one\n"); + + cl_qlock_pool_put( &p_rcv->pr_pool, &p_pr_item->pool_item ); + p_pr_item = NULL; + goto Exit; + }; + + p_pr_item->p_src_port = p_src_port; + p_pr_item->p_dest_port = p_dest_port; + p_pr_item->hops = path_parms.hops; + + __osm_mpr_rcv_build_pr( p_rcv, p_src_port, p_dest_port, src_lid_ho, + dest_lid_ho, preference, &path_parms, + &p_pr_item->path_rec ); + + Exit: + OSM_LOG_EXIT( p_rcv->p_log ); + return( p_pr_item ); +} + +/********************************************************************** + **********************************************************************/ +static uint32_t +__osm_mpr_rcv_get_port_pair_paths( + IN osm_mpr_rcv_t* const p_rcv, + IN const ib_multipath_rec_t* const p_mpr, + IN const osm_port_t* const p_req_port, + IN const osm_port_t* const p_src_port, + IN const osm_port_t* const p_dest_port, + IN const uint32_t rem_paths, + IN const ib_net64_t comp_mask, + IN cl_qlist_t* const p_list ) +{ + osm_mpr_item_t* p_pr_item; + uint16_t src_lid_min_ho; + uint16_t src_lid_max_ho; + uint16_t dest_lid_min_ho; + uint16_t dest_lid_max_ho; + uint16_t src_lid_ho; + uint16_t dest_lid_ho; + uint32_t path_num = 0; + uint8_t preference; + uintn_t src_offset; + uintn_t dest_offset; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_get_port_pair_paths ); + + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_port_pair_paths: " + "Src port 0x%016" PRIx64 ", " + "Dst port 0x%016" PRIx64 "\n", + cl_ntoh64( osm_port_get_guid( p_src_port ) ),
+ cl_ntoh64( osm_port_get_guid( p_dest_port ) ) ); + } + + /* Check that the req_port, src_port and dest_port all share a + pkey. The check is done on the default physical port of the ports. */ + if ( osm_port_share_pkey(p_rcv->p_log, p_req_port, p_src_port ) == FALSE || + osm_port_share_pkey(p_rcv->p_log, p_req_port, p_dest_port ) == FALSE || + osm_port_share_pkey(p_rcv->p_log, p_src_port, p_dest_port ) == FALSE ) + { + /* One of the pairs doesn't share a pkey so the path is disqualified. */ + goto Exit; + } + + /* + We shouldn't be here if the paths are disqualified in some way... + Thus, we assume every possible connection is valid. + + We desire to return high-quality paths first. + In OpenSM, higher quality means the least overlap with other paths. + This is achieved in practice by returning paths with + a different LID value on each end, which means these + paths are more redundant than paths with the same LID repeated + on one side. For example, in OpenSM the paths between two + endpoints with LMC = 1 might be as follows: + + Port A, LID 1 <-> Port B, LID 3 + Port A, LID 1 <-> Port B, LID 4 + Port A, LID 2 <-> Port B, LID 3 + Port A, LID 2 <-> Port B, LID 4 + + The OpenSM unicast routing algorithms attempt to disperse each path + to as varied a physical path as is reasonable. 1<->3 and 1<->4 have + more physical overlap (hence less redundancy) than 1<->3 and 2<->4. + + OpenSM ranks paths in three preference groups: + + Preference Value Description + ---------------- ------------------------------------------- + 0 Redundant in both directions with other + pref value = 0 paths + + 1 Redundant in one direction with other + pref value = 0 and pref value = 1 paths + + 2 Not redundant in either direction with + other paths + + 3-FF Unused + + + SA clients don't need to know these details, only that the lower + preference paths are preferred, as stated in the spec.
The paths + may not actually be physically redundant depending on the topology + of the subnet, but the point of LMC > 0 is to offer redundancy, + so I assume the subnet is physically appropriate for the specified + LMC value. A more advanced implementation could inspect for physical + redundancy, but I'm not going to bother with that now. + */ + + osm_port_get_lid_range_ho( p_src_port, &src_lid_min_ho, &src_lid_max_ho ); + osm_port_get_lid_range_ho( p_dest_port, &dest_lid_min_ho, &dest_lid_max_ho ); + + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_port_pair_paths: " + "Src LID [0x%X-0x%X], " + "Dest LID [0x%X-0x%X]\n", + src_lid_min_ho, src_lid_max_ho, + dest_lid_min_ho, dest_lid_max_ho ); + } + + src_lid_ho = src_lid_min_ho; + dest_lid_ho = dest_lid_min_ho; + + /* + Preferred paths come first in OpenSM + */ + preference = 0; + + while ( path_num < rem_paths ) + { + /* + These paths are "fully redundant" + */ + p_pr_item = __osm_mpr_rcv_get_lid_pair_path( p_rcv, p_mpr, + p_src_port, p_dest_port, + src_lid_ho, dest_lid_ho, + comp_mask, preference ); + + if ( p_pr_item ) + { + cl_qlist_insert_tail( p_list, + (cl_list_item_t*)&p_pr_item->pool_item ); + ++path_num; + } + + if ( ++src_lid_ho > src_lid_max_ho ) + break; + + if ( ++dest_lid_ho > dest_lid_max_ho ) + break; + } + + /* + Check if we've accumulated all the paths that the user cares to see + */ + if ( path_num == rem_paths ) + goto Exit; + + /* + Don't bother reporting preference 1 paths for now. + It's more trouble than it's worth and can only occur + if ports have different LMC values, which isn't supported + by OpenSM right now anyway. 
+ */ + preference = 2; + src_lid_ho = src_lid_min_ho; + dest_lid_ho = dest_lid_min_ho; + src_offset = 0; + dest_offset = 0; + + /* + Iterate over the remaining paths + */ + while ( path_num < rem_paths ) + { + dest_offset++; + dest_lid_ho++; + + if ( dest_lid_ho > dest_lid_max_ho ) + { + src_offset++; + src_lid_ho++; + + if ( src_lid_ho > src_lid_max_ho ) + break; /* done */ + + dest_offset = 0; + dest_lid_ho = dest_lid_min_ho; + } + + /* + These paths are "fully non-redundant" with paths already + identified above and consequently not of much value. + + Don't return paths we already identified above, as indicated + by the offset values being equal. + */ + if ( src_offset == dest_offset ) + continue; /* already reported */ + + p_pr_item = __osm_mpr_rcv_get_lid_pair_path( p_rcv, p_mpr, + p_src_port, p_dest_port, + src_lid_ho, dest_lid_ho, + comp_mask, preference ); + + if ( p_pr_item ) + { + cl_qlist_insert_tail( p_list, + (cl_list_item_t*)&p_pr_item->pool_item ); + ++path_num; + } + } + + Exit: + OSM_LOG_EXIT( p_rcv->p_log ); + return path_num; +} + +#undef min +#define min(x,y) (((x) < (y)) ? 
(x) : (y)) + +/********************************************************************** + **********************************************************************/ +static osm_mpr_item_t* +__osm_mpr_rcv_get_apm_port_pair_paths( + IN osm_mpr_rcv_t* const p_rcv, + IN const ib_multipath_rec_t* const p_mpr, + IN const osm_port_t* const p_src_port, + IN const osm_port_t* const p_dest_port, + IN int base_offs, + IN const ib_net64_t comp_mask, + IN cl_qlist_t* const p_list ) +{ + osm_mpr_item_t* p_pr_item = 0; + uint16_t src_lid_min_ho; + uint16_t src_lid_max_ho; + uint16_t dest_lid_min_ho; + uint16_t dest_lid_max_ho; + uint16_t src_lid_ho; + uint16_t dest_lid_ho; + uintn_t iterations; + int src_lids, dest_lids; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_get_apm_port_pair_paths ); + + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_apm_port_pair_paths: " + "Src port 0x%016" PRIx64 ", " + "Dst port 0x%016" PRIx64 ", base offs %d\n", + cl_ntoh64( osm_port_get_guid( p_src_port ) ), + cl_ntoh64( osm_port_get_guid( p_dest_port ) ), + base_offs ); + } + + osm_port_get_lid_range_ho( p_src_port, &src_lid_min_ho, &src_lid_max_ho ); + osm_port_get_lid_range_ho( p_dest_port, &dest_lid_min_ho, &dest_lid_max_ho ); + + src_lid_ho = src_lid_min_ho; + dest_lid_ho = dest_lid_min_ho; + + src_lids = src_lid_max_ho - src_lid_min_ho + 1; + dest_lids = dest_lid_max_ho - dest_lid_min_ho + 1; + + src_lid_ho += base_offs % src_lids; + dest_lid_ho += base_offs % dest_lids; + + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_apm_port_pair_paths: " + "Src LIDs [0x%X-0x%X] hashed %d, " + "Dest LIDs [0x%X-0x%X] hashed %d\n", + src_lid_min_ho, src_lid_max_ho, src_lid_ho, + dest_lid_min_ho, dest_lid_max_ho, dest_lid_ho ); + + iterations = min( src_lids, dest_lids ); + + while ( iterations-- ) + { + /* + These paths are "fully redundant" + */ + p_pr_item = __osm_mpr_rcv_get_lid_pair_path( p_rcv, p_mpr, + p_src_port, 
p_dest_port, + src_lid_ho, dest_lid_ho, + comp_mask, 0 ); + + if ( p_pr_item ) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_get_apm_port_pair_paths: " + "Found matching path from Src LID 0x%X to Dest LID 0x%X with %d hops\n", + src_lid_ho, dest_lid_ho, p_pr_item->hops); + break; + } + + if ( ++src_lid_ho > src_lid_max_ho ) + src_lid_ho = src_lid_min_ho; + + if ( ++dest_lid_ho > dest_lid_max_ho ) + dest_lid_ho = dest_lid_min_ho; + } + + OSM_LOG_EXIT( p_rcv->p_log ); + return p_pr_item; +} + +/********************************************************************** + **********************************************************************/ +static ib_net16_t +__osm_mpr_rcv_get_gids( + IN osm_mpr_rcv_t* const p_rcv, + IN const ib_gid_t * gids, + IN int ngids, + OUT osm_port_t** pp_port ) +{ + osm_port_t *p_port; + ib_net16_t ib_status = IB_SUCCESS; + int i; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_get_gids ); + + for ( i = 0; i < ngids; i++, gids++ ) { + p_port = (osm_port_t *)cl_qmap_get( &p_rcv->p_subn->port_guid_tbl, + gids->unicast.interface_id ); + if ( !p_port || + p_port == (osm_port_t *)cl_qmap_end( &p_rcv->p_subn->port_guid_tbl ) ) { + /* + This 'error' is the client's fault (bad gid) so + don't enter it as an error in our own log. + Return an error response to the client. 
+ */ + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_get_gids: ERR 4506: " + "No port with GUID = 0x%016" PRIx64 "\n", + cl_ntoh64( gids->unicast.interface_id ) ); + + ib_status = IB_SA_MAD_STATUS_INVALID_GID; + goto Exit; + } + + pp_port[i] = p_port; + } + + Exit: + OSM_LOG_EXIT(p_rcv->p_log); + + return ib_status; +} + +/********************************************************************** + **********************************************************************/ +static ib_net16_t +__osm_mpr_rcv_get_end_points( + IN osm_mpr_rcv_t* const p_rcv, + IN const osm_madw_t* const p_madw, + OUT osm_port_t ** pp_ports, + OUT int * nsrc, + OUT int * ndest ) +{ + const ib_multipath_rec_t* p_mpr; + const ib_sa_mad_t* p_sa_mad; + ib_net64_t comp_mask; + ib_net16_t sa_status = IB_SA_MAD_STATUS_SUCCESS; + ib_gid_t * gids; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_get_end_points ); + + /* + Determine what fields are valid and then get a pointer + to the source and destination port objects, if possible. + */ + p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); + p_mpr = (ib_multipath_rec_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); + gids = (ib_gid_t *)p_mpr->gids; + + comp_mask = p_sa_mad->comp_mask; + + /* + Check a few easy disqualifying cases up front before getting + into the endpoints. + */ + + /* SDGIDs could be checked for multicast disqualification. 
*/ + + *nsrc = *ndest = 0; + + if ( comp_mask & IB_MPR_COMPMASK_SGIDCOUNT ) { + *nsrc = p_mpr->sgid_count; + if ( *nsrc > IB_MULTIPATH_MAX_GIDS ) + *nsrc = IB_MULTIPATH_MAX_GIDS; + sa_status = __osm_mpr_rcv_get_gids( p_rcv, gids, *nsrc, pp_ports ); + if ( sa_status != IB_SUCCESS ) + goto Exit; + } + + if ( comp_mask & IB_MPR_COMPMASK_DGIDCOUNT ) { + *ndest = p_mpr->dgid_count; + if ( *ndest + *nsrc > IB_MULTIPATH_MAX_GIDS ) + *ndest = IB_MULTIPATH_MAX_GIDS - *nsrc; + sa_status = __osm_mpr_rcv_get_gids( p_rcv, gids + *nsrc, *ndest, + pp_ports + *nsrc ); + } + + Exit: + OSM_LOG_EXIT( p_rcv->p_log ); + return( sa_status ); +} + +#define __hash_lids(a, b, lmc) \ + (((((a) >> (lmc)) << 4) | ((b) >> (lmc))) % 103) + +/********************************************************************** + **********************************************************************/ +static void +__osm_mpr_rcv_get_apm_paths( + IN osm_mpr_rcv_t* const p_rcv, + IN const ib_multipath_rec_t* const p_mpr, + IN const osm_port_t* const p_req_port, + IN osm_port_t ** _pp_ports, + IN const ib_net64_t comp_mask, + IN cl_qlist_t* const p_list ) +{ + osm_port_t *pp_ports[4]; + osm_mpr_item_t *matrix[2][2]; + int base_offs, src_lid_ho, dest_lid_ho; + int sumA, sumB, minA, minB; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_get_apm_paths ); + + /* + * We want to: + * 1. use different lid offsets (from base) for the resultant paths + * to increase the probability of redundant paths or in case + * of Clos - to ensure it (different offset => different spine!) + * 2. keep paths consistent regardless of direction and order of ports + * 3. distribute the lid offsets to balance the load + * So, we sort the ports (within the srcs, and within the dests), + * hash the lids of S0, D0 (after the sort), and call __osm_mpr_rcv_get_apm_port_pair_paths + * with base_lid for S0, D0 and base_lid + 1 for S1, D1. This way we will + * always get the same offsets - order independent, and make sure different spines are used. 
+ * Note that the diagonals on a Clos have the same number of hops, so it doesn't + * really matter which diagonal we use. + */ + if ( _pp_ports[0]->guid < _pp_ports[1]->guid ) { + pp_ports[0] = _pp_ports[0]; + pp_ports[1] = _pp_ports[1]; + } else { + pp_ports[0] = _pp_ports[1]; + pp_ports[1] = _pp_ports[0]; + } + if ( _pp_ports[2]->guid < _pp_ports[3]->guid ) { + pp_ports[2] = _pp_ports[2]; + pp_ports[3] = _pp_ports[3]; + } else { + pp_ports[2] = _pp_ports[3]; + pp_ports[3] = _pp_ports[2]; + } + + src_lid_ho = osm_port_get_base_lid( pp_ports[0] ); + dest_lid_ho = osm_port_get_base_lid( pp_ports[2] ); + + base_offs = src_lid_ho < dest_lid_ho ? + __hash_lids( src_lid_ho, dest_lid_ho, p_rcv->p_subn->opt.lmc ) : + __hash_lids( dest_lid_ho, src_lid_ho, p_rcv->p_subn->opt.lmc ); + + matrix[0][0] = __osm_mpr_rcv_get_apm_port_pair_paths( p_rcv, p_mpr, pp_ports[0], + pp_ports[2], base_offs, comp_mask , p_list ); + matrix[0][1] = __osm_mpr_rcv_get_apm_port_pair_paths( p_rcv, p_mpr, pp_ports[0], + pp_ports[3], base_offs, comp_mask, p_list ); + matrix[1][0] = __osm_mpr_rcv_get_apm_port_pair_paths( p_rcv, p_mpr, pp_ports[1], + pp_ports[2], base_offs+1, comp_mask, p_list ); + matrix[1][1] = __osm_mpr_rcv_get_apm_port_pair_paths( p_rcv, p_mpr, pp_ports[1], + pp_ports[3], base_offs+1, comp_mask, p_list ); + + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_mpr_rcv_get_apm_paths: " + "APM matrix:\n" + "\t{0,0} 0x%X->0x%X (%d)\t| {0,1} 0x%X->0x%X (%d)\n" + "\t{1,0} 0x%X->0x%X (%d)\t| {1,1} 0x%X->0x%X (%d)\n", + matrix[0][0]->path_rec.slid, matrix[0][0]->path_rec.dlid, matrix[0][0]->hops, + matrix[0][1]->path_rec.slid, matrix[0][1]->path_rec.dlid, matrix[0][1]->hops, + matrix[1][0]->path_rec.slid, matrix[1][0]->path_rec.dlid, matrix[1][0]->hops, + matrix[1][1]->path_rec.slid, matrix[1][1]->path_rec.dlid, matrix[1][1]->hops ); + + /* check diagonal A {(0,0), (1,1)} */ + sumA = matrix[0][0]->hops + matrix[1][1]->hops; + minA = min( matrix[0][0]->hops, matrix[1][1]->hops ); + + /* check 
diagonal B {(0,1), (1,0)} */ + sumB = matrix[0][1]->hops + matrix[1][0]->hops; + minB = min( matrix[0][1]->hops, matrix[1][0]->hops ); + + /* and the winner is... */ + if ( minA <= minB || ( minA == minB && sumA < sumB ) ) { + /* Diag A */ + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_mpr_rcv_get_apm_paths: " + "Diag {0,0} & {1,1} is the best:\n" + "\t{0,0} 0x%X->0x%X (%d)\t & {1,1} 0x%X->0x%X (%d)\n", + matrix[0][0]->path_rec.slid, matrix[0][0]->path_rec.dlid, matrix[0][0]->hops, + matrix[1][1]->path_rec.slid, matrix[1][1]->path_rec.dlid, matrix[1][1]->hops ); + cl_qlist_insert_tail( p_list, + (cl_list_item_t*)&matrix[0][0]->pool_item ); + cl_qlist_insert_tail( p_list, + (cl_list_item_t*)&matrix[1][1]->pool_item ); + cl_qlock_pool_put( &p_rcv->pr_pool, &matrix[0][1]->pool_item ); + cl_qlock_pool_put( &p_rcv->pr_pool, &matrix[1][0]->pool_item ); + } else { + /* Diag B */ + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_mpr_rcv_get_apm_paths: " + "Diag {0,1} & {1,0} is the best:\n" + "\t{0,1} 0x%X->0x%X (%d)\t & {1,0} 0x%X->0x%X (%d)\n", + matrix[0][1]->path_rec.slid, matrix[0][1]->path_rec.dlid, matrix[0][1]->hops, + matrix[1][0]->path_rec.slid, matrix[1][0]->path_rec.dlid, matrix[1][0]->hops ); + cl_qlist_insert_tail( p_list, + (cl_list_item_t*)&matrix[0][1]->pool_item ); + cl_qlist_insert_tail( p_list, + (cl_list_item_t*)&matrix[1][0]->pool_item ); + cl_qlock_pool_put( &p_rcv->pr_pool, &matrix[0][0]->pool_item ); + cl_qlock_pool_put( &p_rcv->pr_pool, &matrix[1][1]->pool_item ); + } + + OSM_LOG_EXIT( p_rcv->p_log ); +} + +/********************************************************************** + **********************************************************************/ +static void +__osm_mpr_rcv_process_pairs( + IN osm_mpr_rcv_t* const p_rcv, + IN const ib_multipath_rec_t* const p_mpr, + IN osm_port_t* const p_req_port, + IN osm_port_t ** pp_ports, + IN const int nsrc, + IN const int ndest, + IN const ib_net64_t comp_mask, + IN cl_qlist_t* const p_list ) +{ + 
osm_port_t **pp_src_port, **pp_es; + osm_port_t **pp_dest_port, **pp_ed; + uint32_t max_paths, num_paths, total_paths = 0; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_process_pairs ); + + if ( comp_mask & IB_MPR_COMPMASK_NUMBPATH ) + max_paths = p_mpr->num_path & 0x7F; + else + max_paths = OSM_SA_MPR_MAX_NUM_PATH; + + for ( pp_src_port = pp_ports, pp_es = pp_ports + nsrc; pp_src_port < pp_es; pp_src_port++ ) + { + for ( pp_dest_port = pp_es, pp_ed = pp_es + ndest; pp_dest_port < pp_ed; pp_dest_port++ ) + { + num_paths = __osm_mpr_rcv_get_port_pair_paths( p_rcv, p_mpr, p_req_port, + *pp_src_port, *pp_dest_port, + max_paths - total_paths, + comp_mask, p_list ); + total_paths += num_paths; + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_mpr_rcv_process_pairs: " + "%d paths %d total paths %d max paths\n", + num_paths, total_paths, max_paths ); + /* Just take first NumbPaths found */ + if (total_paths >= max_paths) + goto Exit; + } + } + + Exit: + OSM_LOG_EXIT( p_rcv->p_log ); +} + +/********************************************************************** + **********************************************************************/ +static void +__osm_mpr_rcv_respond( + IN osm_mpr_rcv_t* const p_rcv, + IN const osm_madw_t* const p_madw, + IN cl_qlist_t* const p_list ) +{ + osm_madw_t* p_resp_madw; + const ib_sa_mad_t* p_sa_mad; + ib_sa_mad_t* p_resp_sa_mad; + size_t num_rec; + size_t mad_size; + ib_path_rec_t* p_resp_pr; + ib_multipath_rec_t* p_mpr; + ib_api_status_t status; + osm_mpr_item_t* p_mpr_item; + uint32_t i; + + OSM_LOG_ENTER( p_rcv->p_log, __osm_mpr_rcv_respond ); + + p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); + p_mpr = (ib_multipath_rec_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); + + num_rec = cl_qlist_count( p_list ); + + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_mpr_rcv_respond: " + "Generating response with %u records\n", num_rec ); + + mad_size = IB_SA_MAD_HDR_SIZE + num_rec * sizeof(ib_path_rec_t); + + /* + Get a MAD to reply. 
Address of Mad is in the received mad_wrapper + */ + p_resp_madw = osm_mad_pool_get( p_rcv->p_mad_pool, p_madw->h_bind, + mad_size, &p_madw->mad_addr ); + + if ( !p_resp_madw ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_respond: " + "ERR 4502: Unable to allocate MAD\n" ); + + for ( i = 0; i < num_rec; i++ ) + { + p_mpr_item = (osm_mpr_item_t*)cl_qlist_remove_head( p_list ); + cl_qlock_pool_put( &p_rcv->pr_pool, &p_mpr_item->pool_item ); + } + + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_NO_RESOURCES ); + goto Exit; + } + + p_resp_sa_mad = osm_madw_get_sa_mad_ptr( p_resp_madw ); + + cl_memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); + p_resp_sa_mad->method |= IB_MAD_METHOD_RESP_MASK; + /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ + p_resp_sa_mad->sm_key = 0; + + /* + o15-0.2.7: If MultiPath is supported, then SA shall respond to a + SubnAdmGetMulti() containing a valid MultiPathRecord attribute with + a set of zero or more PathRecords satisfying the constraints indicated + in the MultiPathRecord received. The PathRecord Attribute ID shall be + used in the response. 
+ */ + p_resp_sa_mad->attr_id = IB_MAD_ATTR_PATH_RECORD; + p_resp_sa_mad->attr_offset = ib_get_attr_offset( sizeof(ib_path_rec_t) ); + + p_resp_sa_mad->rmpp_flags = IB_RMPP_FLAG_ACTIVE; + + p_resp_pr = (ib_path_rec_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); + + for ( i = 0; i < num_rec; i++ ) + { + p_mpr_item = (osm_mpr_item_t*)cl_qlist_remove_head( p_list ); + + /* Copy the Path Records from the list into the MAD */ + *p_resp_pr = p_mpr_item->path_rec; + + cl_qlock_pool_put( &p_rcv->pr_pool, &p_mpr_item->pool_item ); + p_resp_pr++; + } + + CL_ASSERT( cl_is_qlist_empty( p_list ) ); + + osm_dump_sa_mad( p_rcv->p_log, p_resp_sa_mad, OSM_LOG_FRAMES ); + + status = osm_vendor_send( p_resp_madw->h_bind, p_resp_madw, FALSE ); + + if ( status != IB_SUCCESS ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_mpr_rcv_respond: ERR 4507: " + "Unable to send MAD (%s)\n", ib_get_err_str( status ) ); + /* osm_mad_pool_put( p_rcv->p_mad_pool, p_resp_madw ); */ + } + + Exit: + OSM_LOG_EXIT( p_rcv->p_log ); +} + +/********************************************************************** + **********************************************************************/ +void +osm_mpr_rcv_process( + IN osm_mpr_rcv_t* const p_rcv, + IN osm_madw_t* const p_madw ) +{ + const ib_multipath_rec_t* p_mpr; + const ib_sa_mad_t* p_sa_mad; + osm_port_t* requester_port; + osm_port_t* pp_ports[IB_MULTIPATH_MAX_GIDS]; + cl_qlist_t pr_list; + ib_net16_t sa_status; + int nsrc, ndest; + + OSM_LOG_ENTER( p_rcv->p_log, osm_mpr_rcv_process ); + + CL_ASSERT( p_madw ); + + /* update the requester physical port. 
*/ + requester_port = osm_get_port_by_mad_addr( p_rcv->p_log, p_rcv->p_subn, + osm_madw_get_mad_addr_ptr( p_madw ) ); + if ( requester_port == NULL ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_mpr_rcv_process: ERR 4516: " + "Cannot find requester physical port\n" ); + goto Exit; + } + + p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); + p_mpr = (ib_multipath_rec_t*)ib_sa_mad_get_payload_ptr( p_sa_mad ); + + CL_ASSERT( p_sa_mad->attr_id == IB_MAD_ATTR_MULTIPATH_RECORD ); + + if ( ( p_sa_mad->rmpp_flags & IB_RMPP_FLAG_ACTIVE ) != IB_RMPP_FLAG_ACTIVE ) + { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_mpr_rcv_process: ERR 4510: " + "Invalid request as RMPP_FLAG_ACTIVE is not set\n" ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + goto Exit; + } + + if ( p_sa_mad->method != IB_MAD_METHOD_GETMULTI ) { + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_mpr_rcv_process: ERR 4513: " + "Unsupported Method (%s)\n", + ib_get_sa_method_str( p_sa_mad->method ) ); + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + goto Exit; + } + + if ( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) + osm_dump_multipath_record( p_rcv->p_log, p_mpr, OSM_LOG_DEBUG ); + + cl_qlist_init( &pr_list ); + + /* + Most SA functions (including this one) are read-only on the + subnet object, so we grab the lock non-exclusively. 
+ */ + cl_plock_acquire( p_rcv->p_lock ); + + sa_status = __osm_mpr_rcv_get_end_points( p_rcv, p_madw, pp_ports, + &nsrc, &ndest ); + + if ( sa_status != IB_SA_MAD_STATUS_SUCCESS || !nsrc || !ndest ) + { + if ( sa_status == IB_SA_MAD_STATUS_SUCCESS && ( !nsrc || !ndest ) ) + osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_mpr_rcv_process_cb: ERR 4512: " + "__osm_mpr_rcv_get_end_points failed, not enough GIDs " + "(nsrc %d ndest %d)\n", + nsrc, ndest); + cl_plock_release( p_rcv->p_lock ); + if ( sa_status == IB_SA_MAD_STATUS_SUCCESS ) + osm_sa_send_error( p_rcv->p_resp, p_madw, IB_SA_MAD_STATUS_REQ_INVALID ); + else + osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status ); + goto Exit; + } + + /* APM request */ + if ( nsrc == 2 && ndest == 2 && ( p_mpr->num_path & 0x7F ) == 2 ) + __osm_mpr_rcv_get_apm_paths( p_rcv, p_mpr, requester_port, pp_ports, + p_sa_mad->comp_mask, &pr_list ); + else + __osm_mpr_rcv_process_pairs( p_rcv, p_mpr, requester_port, pp_ports, + nsrc, ndest, + p_sa_mad->comp_mask, &pr_list ); + + cl_plock_release( p_rcv->p_lock ); + __osm_mpr_rcv_respond( p_rcv, p_madw, &pr_list ); + + Exit: + OSM_LOG_EXIT( p_rcv->p_log ); +} +#endif Property changes on: osm/opensm/osm_sa_multipath_record.c ___________________________________________________________________ Name: svn:keywords + Id Index: osm/opensm/osm_helper.c =================================================================== --- osm/opensm/osm_helper.c (revision 6920) +++ osm/opensm/osm_helper.c (working copy) @@ -952,6 +952,79 @@ osm_dump_path_record( /********************************************************************** **********************************************************************/ void +osm_dump_multipath_record( + IN osm_log_t* const p_log, + IN const ib_multipath_rec_t* const p_mpr, + IN const osm_log_level_t log_level ) +{ + int i; + char buf_line[1024]; + ib_gid_t const *p_gid; + + if( osm_log_is_active( p_log, log_level ) ) + { + cl_memclr(buf_line, sizeof(buf_line)); + p_gid = 
p_mpr->gids; + if ( p_mpr->sgid_count ) + { + for (i = 0; i < p_mpr->sgid_count; i++) + { + sprintf( buf_line, "%s\t\t\t\tsgid%02d.................." + "0x%016" PRIx64 " : 0x%016" PRIx64 "\n", + buf_line, i + 1, cl_ntoh64( p_gid->unicast.prefix ), + cl_ntoh64( p_gid->unicast.interface_id ) ); + p_gid++; + } + } + if ( p_mpr->dgid_count ) + { + for (i = 0; i < p_mpr->dgid_count; i++) + { + sprintf( buf_line, "%s\t\t\t\tdgid%02d.................." + "0x%016" PRIx64 " : 0x%016" PRIx64 "\n", + buf_line, i + 1, cl_ntoh64( p_gid->unicast.prefix ), + cl_ntoh64( p_gid->unicast.interface_id ) ); + p_gid++; + } + } + osm_log( p_log, log_level, + "MultiPathRecord dump:\n" + "\t\t\t\thop_flow_raw............0x%X\n" + "\t\t\t\ttclass..................0x%X\n" + "\t\t\t\tnum_path_revers.........0x%X\n" + "\t\t\t\tpkey....................0x%X\n" + "\t\t\t\tresv0...................0x%X\n" + "\t\t\t\tsl......................0x%X\n" + "\t\t\t\tmtu.....................0x%X\n" + "\t\t\t\trate....................0x%X\n" + "\t\t\t\tpkt_life................0x%X\n" + "\t\t\t\tresv1...................0x%X\n" + "\t\t\t\tindependence............0x%X\n" + "\t\t\t\tsgid_count..............0x%X\n" + "\t\t\t\tdgid_count..............0x%X\n" + "%s\n" + "", + cl_ntoh32( p_mpr->hop_flow_raw ), + p_mpr->tclass, + p_mpr->num_path, + cl_ntoh16( p_mpr->pkey ), + p_mpr->resv0, + cl_ntoh16( p_mpr->sl ), + p_mpr->mtu, + p_mpr->rate, + p_mpr->pkt_life, + p_mpr->resv1, + p_mpr->independence, + p_mpr->sgid_count, + p_mpr->dgid_count, + buf_line + ); + } +} + +/********************************************************************** + **********************************************************************/ +void osm_dump_mc_record( IN osm_log_t* const p_log, IN const ib_member_rec_t* const p_mcmr, Index: osm/opensm/osm_sa_response.c =================================================================== --- osm/opensm/osm_sa_response.c (revision 6920) +++ osm/opensm/osm_sa_response.c (working copy) @@ -154,6 +154,13 
@@ osm_sa_send_error( */ p_resp_sa_mad->sm_key = 0; + /* + * o15-0.2.7 - The PathRecord Attribute ID shall be used in + * the response (to a SubnAdmGetMulti(MultiPathRecord) + */ + if( p_resp_sa_mad->attr_id == IB_MAD_ATTR_MULTIPATH_RECORD ) + p_resp_sa_mad->attr_id = IB_MAD_ATTR_PATH_RECORD; + if( osm_log_is_active( p_resp->p_log, OSM_LOG_FRAMES ) ) osm_dump_sa_mad( p_resp->p_log, p_resp_sa_mad, OSM_LOG_FRAMES ); Index: osm/opensm/osm_sa.c =================================================================== --- osm/opensm/osm_sa.c (revision 6920) +++ osm/opensm/osm_sa.c (working copy) @@ -98,6 +98,11 @@ osm_sa_construct( osm_pr_rcv_construct( &p_sa->pr_rcv ); osm_pr_rcv_ctrl_construct( &p_sa->pr_rcv_ctrl ); +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + osm_mpr_rcv_construct( &p_sa->mpr_rcv ); + osm_mpr_rcv_ctrl_construct( &p_sa->mpr_rcv_ctrl ); +#endif + osm_smir_rcv_construct( &p_sa->smir_rcv ); osm_smir_ctrl_construct( &p_sa->smir_ctrl ); @@ -141,6 +146,9 @@ osm_sa_shutdown( osm_gir_rcv_ctrl_destroy( &p_sa->gir_rcv_ctrl ); osm_lr_rcv_ctrl_destroy( &p_sa->lr_rcv_ctrl ); osm_pr_rcv_ctrl_destroy( &p_sa->pr_rcv_ctrl ); +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + osm_mpr_rcv_ctrl_destroy( &p_sa->mpr_rcv_ctrl ); +#endif osm_smir_ctrl_destroy( &p_sa->smir_ctrl ); osm_mcmr_rcv_ctrl_destroy( &p_sa->mcmr_rcv_ctlr); osm_sr_rcv_ctrl_destroy( &p_sa->sr_rcv_ctrl ); @@ -169,6 +177,9 @@ osm_sa_destroy( osm_gir_rcv_destroy( &p_sa->gir_rcv ); osm_lr_rcv_destroy( &p_sa->lr_rcv ); osm_pr_rcv_destroy( &p_sa->pr_rcv ); +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + osm_mpr_rcv_destroy( &p_sa->mpr_rcv ); +#endif osm_smir_rcv_destroy( &p_sa->smir_rcv ); osm_mcmr_rcv_destroy(&p_sa->mcmr_rcv); osm_sr_rcv_destroy( &p_sa->sr_rcv ); @@ -335,6 +346,26 @@ osm_sa_init( if( status != IB_SUCCESS ) goto Exit; +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + status = osm_mpr_rcv_init( + &p_sa->mpr_rcv, + &p_sa->resp, + 
p_sa->p_mad_pool, + p_subn, + p_log, + p_lock ); + if( status != IB_SUCCESS ) + goto Exit; + + status = osm_mpr_rcv_ctrl_init( + &p_sa->mpr_rcv_ctrl, + &p_sa->mpr_rcv, + p_log, + p_disp ); + if( status != IB_SUCCESS ) + goto Exit; +#endif + status = osm_smir_rcv_init( &p_sa->smir_rcv, &p_sa->resp, Index: osm/opensm/osm_sa_class_port_info.c =================================================================== --- osm/opensm/osm_sa_class_port_info.c (revision 6920) +++ osm/opensm/osm_sa_class_port_info.c (working copy) @@ -219,8 +219,14 @@ __osm_cpi_rcv_respond( */ /* Note host notation replaced later */ +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) p_resp_cpi->cap_mask = OSM_CAP_IS_SUBN_GET_SET_NOTICE_SUP | - OSM_CAP_IS_PORT_INFO_CAPMASK_MATCH_SUPPORTED; + OSM_CAP_IS_PORT_INFO_CAPMASK_MATCH_SUPPORTED | + OSM_CAP_IS_MULTIPATH_SUP; +#else + p_resp_cpi->cap_mask = OSM_CAP_IS_SUBN_GET_SET_NOTICE_SUP | + OSM_CAP_IS_PORT_INFO_CAPMASK_MATCH_SUPPORTED; +#endif if (p_rcv->p_subn->opt.no_multicast_option != TRUE) p_resp_cpi->cap_mask |= OSM_CAP_IS_UD_MCAST_SUP; p_resp_cpi->cap_mask = cl_hton16(p_resp_cpi->cap_mask); Index: osm/opensm/libopensm.map =================================================================== --- osm/opensm/libopensm.map (revision 6920) +++ osm/opensm/libopensm.map (working copy) @@ -21,6 +21,7 @@ OPENSM_1.0 { osm_dump_node_info; osm_dump_node_record; osm_dump_path_record; + osm_dump_multipath_record; osm_dump_mc_record; osm_dump_service_record; osm_dump_inform_info; Index: osm/opensm/osm_sa_mad_ctrl.c =================================================================== --- osm/opensm/osm_sa_mad_ctrl.c (revision 6920) +++ osm/opensm/osm_sa_mad_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. 
All rights reserved. * @@ -209,6 +209,12 @@ __osm_sa_mad_ctrl_process( msg_id = OSM_MSG_MAD_GUIDINFO_RECORD; break; +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + case IB_MAD_ATTR_MULTIPATH_RECORD: + msg_id = OSM_MSG_MAD_MULTIPATH_RECORD; + break; +#endif + default: osm_log( p_ctrl->p_log, OSM_LOG_ERROR, "__osm_sa_mad_ctrl_process: ERR 1A01: " @@ -370,6 +376,9 @@ __osm_sa_mad_ctrl_rcv_callback( case IB_MAD_METHOD_GET: case IB_MAD_METHOD_GETTABLE: +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + case IB_MAD_METHOD_GETMULTI: +#endif case IB_MAD_METHOD_SET: case IB_MAD_METHOD_DELETE: __osm_sa_mad_ctrl_process( p_ctrl, p_madw ); Index: osm/opensm/osm_sa_multipath_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_multipath_record_ctrl.c (revision 0) +++ osm/opensm/osm_sa_multipath_record_ctrl.c (revision 0) @@ -0,0 +1,133 @@ +/* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. + * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id$ + */ + + +/* + * Abstract: + * Implementation of osm_mpr_rcv_ctrl_t. + * This object represents the MultiPathRecord request controller object. + * This object is part of the opensm family of objects. + * + * Environment: + * Linux User Mode + * + */ + +/* + Next available error code: 0x203 +*/ + +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include + +/********************************************************************** + **********************************************************************/ +static void +__osm_mpr_rcv_ctrl_disp_callback( + IN void *context, + IN void *p_data ) +{ + /* ignore return status when invoked via the dispatcher */ + osm_mpr_rcv_process( ((osm_mpr_rcv_ctrl_t*)context)->p_rcv, + (osm_madw_t*)p_data ); +} + +/********************************************************************** + **********************************************************************/ +void +osm_mpr_rcv_ctrl_construct( + IN osm_mpr_rcv_ctrl_t* const p_ctrl ) +{ + cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; +} + +/********************************************************************** + **********************************************************************/ +void +osm_mpr_rcv_ctrl_destroy( + IN osm_mpr_rcv_ctrl_t* const p_ctrl ) +{ + CL_ASSERT( p_ctrl ); + cl_disp_unregister( p_ctrl->h_disp ); +} + 
+/********************************************************************** + **********************************************************************/ +ib_api_status_t +osm_mpr_rcv_ctrl_init( + IN osm_mpr_rcv_ctrl_t* const p_ctrl, + IN osm_mpr_rcv_t* const p_rcv, + IN osm_log_t* const p_log, + IN cl_dispatcher_t* const p_disp ) +{ + ib_api_status_t status = IB_SUCCESS; + + OSM_LOG_ENTER( p_log, osm_mpr_rcv_ctrl_init ); + + osm_mpr_rcv_ctrl_construct( p_ctrl ); + p_ctrl->p_log = p_log; + p_ctrl->p_rcv = p_rcv; + p_ctrl->p_disp = p_disp; + + p_ctrl->h_disp = cl_disp_register( + p_disp, + OSM_MSG_MAD_MULTIPATH_RECORD, + __osm_mpr_rcv_ctrl_disp_callback, + p_ctrl ); + + if( p_ctrl->h_disp == CL_DISP_INVALID_HANDLE ) + { + osm_log( p_log, OSM_LOG_ERROR, + "osm_mpr_rcv_ctrl_init: ERR 4B01: " + "Dispatcher registration failed\n" ); + status = IB_INSUFFICIENT_RESOURCES; + goto Exit; + } + + Exit: + OSM_LOG_EXIT( p_log ); + return( status ); +} + +#endif Property changes on: osm/opensm/osm_sa_multipath_record_ctrl.c ___________________________________________________________________ Name: svn:keywords + Id Index: osm/opensm/Makefile.am =================================================================== --- osm/opensm/Makefile.am (revision 6920) +++ osm/opensm/Makefile.am (working copy) @@ -69,6 +73,7 @@ opensm_SOURCES = main.c osm_console.c os osm_sa_pkey_record.c osm_sa_pkey_record_ctrl.c \ osm_sa_portinfo_record.c osm_sa_portinfo_record_ctrl.c \ osm_sa_guidinfo_record.c osm_sa_guidinfo_record_ctrl.c \ + osm_sa_multipath_record.c osm_sa_multipath_record_ctrl.c \ osm_sa_response.c osm_sa_service_record.c \ osm_sa_service_record_ctrl.c osm_sa_slvl_record.c \ osm_sa_slvl_record_ctrl.c osm_sa_sminfo_record.c \ From vuhuong at mellanox.com Fri May 5 11:53:50 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Fri, 05 May 2006 11:53:50 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: References: <443C4934.7080400@mellanox.com> 
 <443E9A88.7020302@mellanox.com> <443FC35F.6080301@mellanox.com> <44450F89.7020500@mellanox.com> <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com>
Message-ID: <445B9F3E.7060601@mellanox.com>

Roland Dreier wrote:
> > 1. srp_unmap_data() and srp_remove_req() for .eh_abort_handler(scmnd)
> > a. abort get timeout or
> > b. req->cmd_done or
> > c. !req->tsk_status
> > 2. we should do step (1) for .eh_abort_handler(scmnd) only and don't
> > do step 1 for .eh_device_reset_handler(scmnd) since same scsi command
> > is used for all .eh_handler()
> > 3. scsi command is used in all .eh_handler() will be freed by scsi
> > midlayer at the end of error handling sequences
> > 4. If we don't do step 1, scsi command which is used in all
> > .eh_handler() and freed is still in our pending queue and is
> > referenced in srp_reconnect_target() / reinit request ring
>
> So I finally got a chance to look at this in detail. It does look
> like we should remove the request in (1) if the command finishes or
> the abort succeeds. However if the abort times out then the command
> is still out there -- shouldn't we wait for the
> eh_device_reset_handler and then flush all matching commands there?
>

We should remove the request in (1) if the command finishes or the abort succeeds, for eh_abort_handler only.
For the abort-timeout case, we can either remove the command in eh_abort_handler or wait for eh_device_reset_handler (this may be better, since it gives the command more time and a chance to complete).

> And I don't understand (4) -- isn't srp_reconnect_target() being
> called from srp_reset_host() as part of the error handling sequence?
> Unless I'm misreading the code in scsi_error.c, commands don't get
> freed (assuming all aborts and device resets fail) before then.
>
> What am I missing?
> In your case, where the abort and device reset
> fail and then the host reset gets called, where was the command
> getting freed?

Reading scsi_error.c again, I find this logic for our case (please correct me if I'm wrong):
1. eh_abort_handler and eh_device_reset_handler fail with a timeout; eh_host_reset_handler succeeds
2. scsi_eh_host_reset goes on with scsi_eh_try_stu & scsi_eh_tur
3. either scsi_eh_try_stu or scsi_eh_tur will reuse the scsi command and call scsi_send_eh_cmnd to send the STU or TUR command
4. scsi_send_eh_cmnd calls srp_queuecommand, which will get a new req, reset the scsi_done pointer to scsi_eh_done, and add the req to req_queue for this same scsi command with a different opcode (i.e. STU or TUR)
5. In my case I got QP event 1 - so scsi_send_eh_cmnd will hit the timeout case and call eh_abort_handler for this scsi command with opcode STU or TUR
6. scsi_eh_try_stu & scsi_eh_tur will restore the old scsi command with scsi_set_cmd_retry; however, srp has already changed the scsi_done and host_scribble pointers and cannot restore the old ones
8. scsi_eh_host_reset fails and scsi_eh_offline_sdevs is called
9. scsi_eh_offline_sdevs calls scsi_eh_finish_cmd, which moves the scsi command to done_q, and the scsi command is freed in done_q
10. However, the srp req carrying this scsi command is still in our req_queue.
The next eh_host_reset_handler will re-init the req_queue and use the scsi command pointer (this is the use-after-free crash that we see)

Bottom line: my previous patch still does not address the logic above - I'll rework the patch and send it to you later for review.

Vu

From vuhuong at mellanox.com Fri May 5 12:00:56 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Fri, 05 May 2006 12:00:56 -0700
Subject: [openib-general][patch review] srp: fmr implementation,
In-Reply-To: <445B9F3E.7060601@mellanox.com>
References: <443C4934.7080400@mellanox.com> <443E9A88.7020302@mellanox.com> <443FC35F.6080301@mellanox.com> <44450F89.7020500@mellanox.com> <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com>
Message-ID: <445BA0E8.7010104@mellanox.com>

>
> reading scsi_error.c again, I find this logic for our case (please
> correct me if I'm wrong)
> 1. eh_abort_handler and eh_device_reset_handler fail with a timeout;
> eh_host_reset_handler succeeds
> 2. scsi_eh_host_reset goes on with scsi_eh_try_stu & scsi_eh_tur
> 3. either scsi_eh_try_stu or scsi_eh_tur will reuse the scsi command and
> call scsi_send_eh_cmnd to send the STU or TUR command
> 4. scsi_send_eh_cmnd calls srp_queuecommand, which will get a new req,
> reset the scsi_done pointer to scsi_eh_done, and add the req to req_queue
> for this same scsi command with a different opcode (i.e. STU or TUR)
> 5. In my case I got QP event 1 - so scsi_send_eh_cmnd will hit the
> timeout case and call eh_abort_handler for this scsi command with opcode
> STU or TUR
> 6. scsi_eh_try_stu & scsi_eh_tur will restore the old scsi command
> with scsi_set_cmd_retry; however, srp has already changed the scsi_done
> and host_scribble pointers and cannot restore the old ones
> 8. scsi_eh_host_reset fails and scsi_eh_offline_sdevs is called
> 9.
scsi_eh_offline_sdevs calls scsi_eh_finish_cmd, which moves the scsi
> command to done_q, and the scsi command is freed in done_q
> 10. However, the srp req carrying this scsi command is still in our
> req_queue. The next eh_host_reset_handler will re-init the req_queue and
> use the scsi command pointer (this is the use-after-free crash that we
> see)
>
> Bottom line: my previous patch still does not address the logic above -
> I'll rework the patch and send it to you later for review
>

One correction: my previous patch does address the issue, since the abort of the TUR or STU command times out and I remove the req; therefore the req was no longer in req_queue, and the subsequent eh_host_reset_handler did not run into the use-after-free.

Vu

From rdreier at cisco.com Fri May 5 13:15:01 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 05 May 2006 13:15:01 -0700
Subject: [openib-general][patch review] srp: fmr implementation,
In-Reply-To: <445B9F3E.7060601@mellanox.com> (Vu Pham's message of "Fri, 05 May 2006 11:53:50 -0700")
References: <443C4934.7080400@mellanox.com> <443E9A88.7020302@mellanox.com> <443FC35F.6080301@mellanox.com> <44450F89.7020500@mellanox.com> <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com>
Message-ID:

> reading scsi_error.c again, I find this logic for our case (please
> correct me if I'm wrong)
> 1. eh_abort_handler and eh_device_reset_handler fail with a timeout;
> eh_host_reset_handler succeeds

But you're crashing inside the call to srp_reconnect_target() in srp_reset_host(), right? So eh_host_reset_handler() never finishes. (The crash is because the loop that flushes the req_queue in srp_reconnect_target() hits a stale SCSI command)

Or am I misunderstanding what you see?

 - R.
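
The stale-command failure discussed in this thread can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not the ib_srp code: every name in it (model_cmd, model_req, model_target, model_queue, model_remove, model_flush) is hypothetical, and a fixed array stands in for the driver's request ring. It shows why a request has to be dropped from the pending set once its command is finished or aborted: a later flush touches only the commands that are still tracked.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_RING_SIZE 4

/* Hypothetical stand-in for a SCSI command owned by the midlayer. */
struct model_cmd {
	int result;      /* status written by the driver on completion */
	int done_calls;  /* how many times the driver completed this command */
};

/* Hypothetical stand-in for a driver request slot. */
struct model_req {
	struct model_cmd *cmd;  /* NULL while the slot is free */
};

struct model_target {
	struct model_req ring[MODEL_RING_SIZE];
};

/* Mark every slot free. */
static void model_init(struct model_target *t)
{
	size_t i;

	for (i = 0; i < MODEL_RING_SIZE; ++i)
		t->ring[i].cmd = NULL;
}

/* Track cmd in the first free slot; return its index, or -1 if full. */
static int model_queue(struct model_target *t, struct model_cmd *cmd)
{
	int i;

	for (i = 0; i < MODEL_RING_SIZE; ++i) {
		if (!t->ring[i].cmd) {
			t->ring[i].cmd = cmd;
			return i;
		}
	}
	return -1;
}

/*
 * Drop the driver's reference once the command is finished or aborted.
 * Skipping this step is what would leave a stale pointer for the flush.
 */
static void model_remove(struct model_target *t, int index)
{
	if (index >= 0 && index < MODEL_RING_SIZE)
		t->ring[index].cmd = NULL;
}

/* Complete every command still tracked; return how many were flushed. */
static int model_flush(struct model_target *t, int status)
{
	int i, flushed = 0;

	for (i = 0; i < MODEL_RING_SIZE; ++i) {
		struct model_cmd *cmd = t->ring[i].cmd;

		if (!cmd)
			continue;
		cmd->result = status;
		cmd->done_calls++;
		t->ring[i].cmd = NULL;
		flushed++;
	}
	return flushed;
}
```

In this model, queueing two commands, calling model_remove() on the first, and then calling model_flush() completes only the second. If model_remove() were skipped for a command the owner had already freed, model_flush() would write through a dangling pointer, which is the kind of failure described above for the flush loop in srp_reconnect_target().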
From rdreier at cisco.com Fri May 5 13:27:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 05 May 2006 13:27:23 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <445BA0E8.7010104@mellanox.com> (Vu Pham's message of "Fri, 05 May 2006 12:00:56 -0700") References: <443E9A88.7020302@mellanox.com> <443FC35F.6080301@mellanox.com> <44450F89.7020500@mellanox.com> <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com> <445BA0E8.7010104@mellanox.com> Message-ID: I'm testing the patch below, which I think does the right thing for aborts and device resets. What do you think? diff-tree 70e37d747deddb32d3b3560b42c5fd26a6118967 (from ebdb3a3d034fdea7c296f96fb0b07093ff75a989) Author: Roland Dreier Date: Fri May 5 13:26:13 2006 -0700 IB/srp: Fix tracking of pending requests during error handling Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 5bb5574..c32ce43 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -409,6 +409,34 @@ static int srp_connect_target(struct srp } } +static void srp_unmap_data(struct scsi_cmnd *scmnd, + struct srp_target_port *target, + struct srp_request *req) +{ + struct scatterlist *scat; + int nents; + + if (!scmnd->request_buffer || + (scmnd->sc_data_direction != DMA_TO_DEVICE && + scmnd->sc_data_direction != DMA_FROM_DEVICE)) + return; + + /* + * This handling of non-SG commands can be killed when the + * SCSI midlayer no longer generates non-SG commands. 
+ */ + if (likely(scmnd->use_sg)) { + nents = scmnd->use_sg; + scat = scmnd->request_buffer; + } else { + nents = 1; + scat = &req->fake_sg; + } + + dma_unmap_sg(target->srp_host->dev->dma_device, scat, nents, + scmnd->sc_data_direction); +} + static int srp_reconnect_target(struct srp_target_port *target) { struct ib_cm_id *new_cm_id; @@ -455,16 +483,16 @@ static int srp_reconnect_target(struct s list_for_each_entry(req, &target->req_queue, list) { req->scmnd->result = DID_RESET << 16; req->scmnd->scsi_done(req->scmnd); + srp_unmap_data(req->scmnd, target, req); } target->rx_head = 0; target->tx_head = 0; target->tx_tail = 0; - target->req_head = 0; - for (i = 0; i < SRP_SQ_SIZE - 1; ++i) - target->req_ring[i].next = i + 1; - target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->free_reqs); INIT_LIST_HEAD(&target->req_queue); + for (i = 0; i < SRP_SQ_SIZE; ++i) + list_add_tail(&target->req_ring[i].list, &target->free_reqs); ret = srp_connect_target(target); if (ret) @@ -589,40 +617,10 @@ static int srp_map_data(struct scsi_cmnd return len; } -static void srp_unmap_data(struct scsi_cmnd *scmnd, - struct srp_target_port *target, - struct srp_request *req) -{ - struct scatterlist *scat; - int nents; - - if (!scmnd->request_buffer || - (scmnd->sc_data_direction != DMA_TO_DEVICE && - scmnd->sc_data_direction != DMA_FROM_DEVICE)) - return; - - /* - * This handling of non-SG commands can be killed when the - * SCSI midlayer no longer generates non-SG commands. 
- */ - if (likely(scmnd->use_sg)) { - nents = scmnd->use_sg; - scat = scmnd->request_buffer; - } else { - nents = 1; - scat = &req->fake_sg; - } - - dma_unmap_sg(target->srp_host->dev->dma_device, scat, nents, - scmnd->sc_data_direction); -} - -static void srp_remove_req(struct srp_target_port *target, struct srp_request *req, - int index) +static void srp_remove_req(struct srp_target_port *target, struct srp_request *req) { - list_del(&req->list); - req->next = target->req_head; - target->req_head = index; + srp_unmap_data(req->scmnd, target, req); + list_move_tail(&req->list, &target->free_reqs); } static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp) @@ -647,7 +645,7 @@ static void srp_process_rsp(struct srp_t req->tsk_status = rsp->data[3]; complete(&req->done); } else { - scmnd = req->scmnd; + scmnd = req->scmnd; if (!scmnd) printk(KERN_ERR "Null scmnd for RSP w/tag %016llx\n", (unsigned long long) rsp->tag); @@ -665,14 +663,11 @@ static void srp_process_rsp(struct srp_t else if (rsp->flags & (SRP_RSP_FLAG_DIOVER | SRP_RSP_FLAG_DIUNDER)) scmnd->resid = be32_to_cpu(rsp->data_in_res_cnt); - srp_unmap_data(scmnd, target, req); - if (!req->tsk_mgmt) { - req->scmnd = NULL; scmnd->host_scribble = (void *) -1L; scmnd->scsi_done(scmnd); - srp_remove_req(target, req, rsp->tag & ~SRP_TAG_TSK_MGMT); + srp_remove_req(target, req); } else req->cmd_done = 1; } @@ -859,7 +854,6 @@ static int srp_queuecommand(struct scsi_ struct srp_request *req; struct srp_iu *iu; struct srp_cmd *cmd; - long req_index; int len; if (target->state == SRP_TARGET_CONNECTING) @@ -879,22 +873,20 @@ static int srp_queuecommand(struct scsi_ dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, SRP_MAX_IU_LEN, DMA_TO_DEVICE); - req_index = target->req_head; + req = list_entry(target->free_reqs.next, struct srp_request, list); scmnd->scsi_done = done; scmnd->result = 0; - scmnd->host_scribble = (void *) req_index; + scmnd->host_scribble = (void *) (long) 
req->index; cmd = iu->buf; memset(cmd, 0, sizeof *cmd); cmd->opcode = SRP_CMD; cmd->lun = cpu_to_be64((u64) scmnd->device->lun << 48); - cmd->tag = req_index; + cmd->tag = req->index; memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len); - req = &target->req_ring[req_index]; - req->scmnd = scmnd; req->cmd = iu; req->cmd_done = 0; @@ -919,8 +911,7 @@ static int srp_queuecommand(struct scsi_ goto err_unmap; } - target->req_head = req->next; - list_add_tail(&req->list, &target->req_queue); + list_move_tail(&req->list, &target->req_queue); return 0; @@ -1143,30 +1134,20 @@ static int srp_cm_handler(struct ib_cm_i return 0; } -static int srp_send_tsk_mgmt(struct scsi_cmnd *scmnd, u8 func) +static int srp_send_tsk_mgmt(struct srp_target_port *target, + struct srp_request *req, u8 func) { - struct srp_target_port *target = host_to_target(scmnd->device->host); - struct srp_request *req; struct srp_iu *iu; struct srp_tsk_mgmt *tsk_mgmt; - int req_index; - int ret = FAILED; spin_lock_irq(target->scsi_host->host_lock); if (target->state == SRP_TARGET_DEAD || target->state == SRP_TARGET_REMOVED) { - scmnd->result = DID_BAD_TARGET << 16; + req->scmnd->result = DID_BAD_TARGET << 16; goto out; } - if (scmnd->host_scribble == (void *) -1L) - goto out; - - req_index = (long) scmnd->host_scribble; - printk(KERN_ERR "Abort for req_index %d\n", req_index); - - req = &target->req_ring[req_index]; init_completion(&req->done); iu = __srp_get_tx_iu(target); @@ -1177,10 +1158,10 @@ static int srp_send_tsk_mgmt(struct scsi memset(tsk_mgmt, 0, sizeof *tsk_mgmt); tsk_mgmt->opcode = SRP_TSK_MGMT; - tsk_mgmt->lun = cpu_to_be64((u64) scmnd->device->lun << 48); - tsk_mgmt->tag = req_index | SRP_TAG_TSK_MGMT; + tsk_mgmt->lun = cpu_to_be64((u64) req->scmnd->device->lun << 48); + tsk_mgmt->tag = req->index | SRP_TAG_TSK_MGMT; tsk_mgmt->tsk_mgmt_func = func; - tsk_mgmt->task_tag = req_index; + tsk_mgmt->task_tag = req->index; if (__srp_post_send(target, iu, sizeof *tsk_mgmt)) goto out; @@ -1188,37 +1169,85 
@@ static int srp_send_tsk_mgmt(struct scsi req->tsk_mgmt = iu; spin_unlock_irq(target->scsi_host->host_lock); + if (!wait_for_completion_timeout(&req->done, msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS))) - return FAILED; - spin_lock_irq(target->scsi_host->host_lock); + return -1; - if (req->cmd_done) { - srp_remove_req(target, req, req_index); - scmnd->scsi_done(scmnd); - } else if (!req->tsk_status) { - srp_remove_req(target, req, req_index); - scmnd->result = DID_ABORT << 16; - ret = SUCCESS; - } + return 0; out: spin_unlock_irq(target->scsi_host->host_lock); - return ret; + return -1; +} + +static int srp_find_req(struct srp_target_port *target, + struct scsi_cmnd *scmnd, + struct srp_request **req) +{ + if (scmnd->host_scribble == (void *) -1L) + return -1; + + *req = &target->req_ring[(long) scmnd->host_scribble]; + + return 0; } static int srp_abort(struct scsi_cmnd *scmnd) { + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req; + int ret = SUCCESS; + printk(KERN_ERR "SRP abort called\n"); - return srp_send_tsk_mgmt(scmnd, SRP_TSK_ABORT_TASK); + if (srp_find_req(target, scmnd, &req)) + return FAILED; + if (srp_send_tsk_mgmt(target, req, SRP_TSK_ABORT_TASK)) + return FAILED; + + spin_lock_irq(target->scsi_host->host_lock); + + if (req->cmd_done) { + srp_remove_req(target, req); + scmnd->scsi_done(scmnd); + } else if (!req->tsk_status) { + srp_remove_req(target, req); + scmnd->result = DID_ABORT << 16; + } else + ret = FAILED; + + spin_unlock_irq(target->scsi_host->host_lock); + + return ret; } static int srp_reset_device(struct scsi_cmnd *scmnd) { + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req, *tmp; + printk(KERN_ERR "SRP reset_device called\n"); - return srp_send_tsk_mgmt(scmnd, SRP_TSK_LUN_RESET); + if (srp_find_req(target, scmnd, &req)) + return FAILED; + if (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET)) + return FAILED; + if (req->tsk_status) + return FAILED; + 
+ spin_lock_irq(target->scsi_host->host_lock); + + list_for_each_entry_safe(req, tmp, &target->req_queue, list) + if (req->scmnd->device == scmnd->device) { + req->scmnd->result = DID_RESET << 16; + scmnd->scsi_done(scmnd); + srp_remove_req(target, req); + } + + spin_unlock_irq(target->scsi_host->host_lock); + + return SUCCESS; } static int srp_reset_host(struct scsi_cmnd *scmnd) @@ -1518,10 +1547,12 @@ static ssize_t srp_create_target(struct INIT_WORK(&target->work, srp_reconnect_work, target); - for (i = 0; i < SRP_SQ_SIZE - 1; ++i) - target->req_ring[i].next = i + 1; - target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->free_reqs); INIT_LIST_HEAD(&target->req_queue); + for (i = 0; i < SRP_SQ_SIZE; ++i) { + target->req_ring[i].index = i; + list_add_tail(&target->req_ring[i].list, &target->free_reqs); + } ret = srp_parse_options(buf, target); if (ret) diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index bd7f7c3..c5cd43a 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -101,7 +101,7 @@ struct srp_request { */ struct scatterlist fake_sg; struct completion done; - short next; + short index; u8 cmd_done; u8 tsk_status; }; @@ -133,7 +133,7 @@ struct srp_target_port { unsigned tx_tail; struct srp_iu *tx_ring[SRP_SQ_SIZE + 1]; - int req_head; + struct list_head free_reqs; struct list_head req_queue; struct srp_request req_ring[SRP_SQ_SIZE]; From sweitzen at cisco.com Fri May 5 13:50:33 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Fri, 5 May 2006 13:50:33 -0700 Subject: [openib-general] OFED-1.0-rc4 is available Message-ID: I see the failure too, and opened bug #57 for it. 
http://openib.org/bugzilla/show_bug.cgi?id=57

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of
> Woodruff, Robert J
> Sent: Thursday, May 04, 2006 3:57 PM
> To: Hefty, Sean; Davis, Arlin R
> Cc: openfabrics-ewg at openib.org; openib-general
> Subject: RE: [openib-general] OFED-1.0-rc4 is available
>
> Tziporet wrote,
>
> >Hi All,
>
> >We have prepared OFED 1.0 RC4.
>
> I took a version of the OFED RC4 kernel code,
> gen2/branches/1.0/ofed/tags/rc4/linux-kernel
> applied my latest backport patch (for svn6829), which applied fine.
> and built a kernel RPM for testing.
>
> Then I took the 1.0 userspace code and built it.
>
> I found that using the cma version of uDAPL did not work
> and caused a core dump. Using the newer userspace cma.c code
> fixes the problem. I applied this patch and it fixed the
> problem.
>
> Not sure if anyone cares about having the rdma_cm in OFED, but
> if they do, I think it needs this fix.
> > woody > > --- cma.c 2006-04-07 10:15:20.000000000 -0700 > +++ /home/woody/gen2/trunk/src/userspace/librdmacm/src/cma.c > 2006-05-04 16:24:00.701184088 -0700 > @@ -109,6 +109,7 @@ struct cma_id_private { > struct rdma_cm_id id; > struct cma_device *cma_dev; > int events_completed; > + int connect_error; > pthread_cond_t cond; > pthread_mutex_t mut; > uint32_t handle; > @@ -150,10 +151,8 @@ static int check_abi_version(void) > return -ENODEV; > } > > - strncat(path, "/class/misc/rdma_cm/abi_version", sizeof path); > - if (sysfs_read_attribute_value(path, val, sizeof val)) > - abi_ver = 1; /* ABI version wasn't available until > version 2 */ > - else > + strncat(path, "/class/infiniband_ucma/abi_version", sizeof > path); > + if (!sysfs_read_attribute_value(path, val, sizeof val)) > abi_ver = strtol(val, NULL, 10); > > if (abi_ver < RDMA_USER_CM_MIN_ABI_VERSION || > @@ -435,11 +434,9 @@ int rdma_bind_addr(struct rdma_cm_id *id > if (ret != size) > return (ret > 0) ? -ENODATA : ret; > > - if (abi_ver > 1) { > - ret = ucma_query_route(id); > - if (ret) > - return ret; > - } > + ret = ucma_query_route(id); > + if (ret) > + return ret; > > memcpy(&id->route.addr.src_addr, addr, addrlen); > return 0; > @@ -689,7 +686,7 @@ int rdma_listen(struct rdma_cm_id *id, i > if (ret != size) > return (ret > 0) ? 
-ENODATA : ret; > > - return 0; > + return ucma_query_route(id); > } > > int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param > *conn_param) > @@ -924,17 +921,27 @@ retry: > evt->status = ucma_process_conn_resp(id_priv); > if (!evt->status) > evt->event = RDMA_CM_EVENT_ESTABLISHED; > - else > + else { > evt->event = RDMA_CM_EVENT_CONNECT_ERROR; > + id_priv->connect_error = 1; > + } > break; > case RDMA_CM_EVENT_ESTABLISHED: > evt->status = ucma_process_establish(&id_priv->id); > - if (evt->status) > + if (evt->status) { > evt->event = RDMA_CM_EVENT_CONNECT_ERROR; > + id_priv->connect_error = 1; > + } > break; > case RDMA_CM_EVENT_REJECTED: > + if (id_priv->connect_error) > + goto retry; > ucma_modify_qp_err(evt->id); > break; > + case RDMA_CM_EVENT_DISCONNECTED: > + if (id_priv->connect_error) > + goto retry; > + break; > default: > break; > } > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From sean.hefty at intel.com Fri May 5 14:11:44 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 5 May 2006 14:11:44 -0700 Subject: [openib-general] [PATCH 1/2] librdmacm: add event channels Message-ID: Introduce event channels to the userspace RDMA CM. Event channels allow the user to direct communication events to a specific fd. These are similar in concept to the completion channels in the userspace verbs library. Event channels give users greater control over event processing, allowing different threads to process events for different rdma_cm_id's. 
Signed-off-by: Sean Hefty --- Index: src/cma.c =================================================================== --- src/cma.c (revision 6950) +++ src/cma.c (working copy) @@ -120,7 +120,6 @@ static struct dlist *cma_dev_list; static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; static int ucma_initialized; static int abi_ver; -int cma_fd; #define container_of(ptr, type, field) \ ((type *) ((void *)ptr - offsetof(type, field))) @@ -136,9 +135,6 @@ static void ucma_cleanup(void) dlist_destroy(cma_dev_list); cma_dev_list = NULL; } - - if (cma_fd > 0) - close(cma_fd); } static int check_abi_version(void) @@ -176,13 +172,6 @@ static int ucma_init(void) if (ucma_initialized) goto out; - cma_fd = open("/dev/infiniband/rdma_cm", O_RDWR); - if (cma_fd < 0) { - printf("CMA: unable to open /dev/infiniband/rdma_cm\n"); - ret = -ENOENT; - goto err; - } - ret = check_abi_version(); if (ret) goto err; @@ -241,6 +230,34 @@ static void __attribute__((destructor)) ucma_cleanup(); } +struct rdma_event_channel *rdma_create_event_channel() +{ + struct rdma_event_channel *channel; + + if (!ucma_initialized && ucma_init()) + return NULL; + + channel = malloc(sizeof *channel); + if (!channel) + return NULL; + + channel->fd = open("/dev/infiniband/rdma_cm", O_RDWR); + if (channel->fd < 0) { + printf("CMA: unable to open /dev/infiniband/rdma_cm\n"); + goto err; + } + return channel; +err: + free(channel); + return NULL; +} + +int rdma_destroy_event_channel(struct rdma_event_channel *channel) +{ + close(channel->fd); + free(channel); +} + static int ucma_get_device(struct cma_id_private *id_priv, uint64_t guid) { struct cma_device *cma_dev; @@ -264,7 +281,8 @@ static void ucma_free_id(struct cma_id_p free(id_priv); } -static struct cma_id_private *ucma_alloc_id(void *context) +static struct cma_id_private *ucma_alloc_id(struct rdma_event_channel *channel, + void *context) { struct cma_id_private *id_priv; @@ -274,6 +292,7 @@ static struct cma_id_private *ucma_alloc memset(id_priv, 0, 
sizeof *id_priv); id_priv->id.context = context; + id_priv->id.channel = channel; pthread_mutex_init(&id_priv->mut, NULL); if (pthread_cond_init(&id_priv->cond, NULL)) goto err; @@ -284,7 +303,8 @@ err: ucma_free_id(id_priv); return NULL; } -int rdma_create_id(struct rdma_cm_id **id, void *context) +int rdma_create_id(struct rdma_event_channel *channel, + struct rdma_cm_id **id, void *context) { struct ucma_abi_create_id_resp *resp; struct ucma_abi_create_id *cmd; @@ -296,14 +316,14 @@ int rdma_create_id(struct rdma_cm_id **i if (ret) return ret; - id_priv = ucma_alloc_id(context); + id_priv = ucma_alloc_id(channel, context); if (!id_priv) return -ENOMEM; CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_CREATE_ID, size); cmd->uid = (uintptr_t) id_priv; - ret = write(cma_fd, msg, size); + ret = write(channel->fd, msg, size); if (ret != size) goto err; @@ -315,7 +335,7 @@ err: ucma_free_id(id_priv); return ret; } -static int ucma_destroy_kern_id(uint32_t handle) +static int ucma_destroy_kern_id(int fd, uint32_t handle) { struct ucma_abi_destroy_id_resp *resp; struct ucma_abi_destroy_id *cmd; @@ -325,7 +345,7 @@ static int ucma_destroy_kern_id(uint32_t CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_DESTROY_ID, size); cmd->id = handle; - ret = write(cma_fd, msg, size); + ret = write(fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -338,7 +358,7 @@ int rdma_destroy_id(struct rdma_cm_id *i int ret; id_priv = container_of(id, struct cma_id_private, id); - ret = ucma_destroy_kern_id(id_priv->handle); + ret = ucma_destroy_kern_id(id->channel->fd, id_priv->handle); if (ret < 0) return ret; @@ -378,7 +398,7 @@ static int ucma_query_route(struct rdma_ id_priv = container_of(id, struct cma_id_private, id); cmd->id = id_priv->handle; - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? 
-ENODATA : ret; @@ -430,7 +450,7 @@ int rdma_bind_addr(struct rdma_cm_id *id cmd->id = id_priv->handle; memcpy(&cmd->addr, addr, addrlen); - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -462,7 +482,7 @@ int rdma_resolve_addr(struct rdma_cm_id memcpy(&cmd->dst_addr, dst_addr, daddrlen); cmd->timeout_ms = timeout_ms; - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -482,7 +502,7 @@ int rdma_resolve_route(struct rdma_cm_id cmd->id = id_priv->handle; cmd->timeout_ms = timeout_ms; - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -503,7 +523,7 @@ static int rdma_init_qp_attr(struct rdma cmd->id = id_priv->handle; cmd->qp_state = qp_attr->qp_state; - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -663,7 +683,7 @@ int rdma_connect(struct rdma_cm_id *id, cmd->id = id_priv->handle; ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, id->qp); - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -682,7 +702,7 @@ int rdma_listen(struct rdma_cm_id *id, i cmd->id = id_priv->handle; cmd->backlog = backlog; - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -706,7 +726,7 @@ int rdma_accept(struct rdma_cm_id *id, s cmd->uid = (uintptr_t) id_priv; ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, id->qp); - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) { ucma_modify_qp_err(id); return (ret > 0) ? 
-ENODATA : ret; @@ -733,7 +753,7 @@ int rdma_reject(struct rdma_cm_id *id, c } else cmd->private_data_len = 0; - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -755,7 +775,7 @@ int rdma_disconnect(struct rdma_cm_id *i id_priv = container_of(id, struct cma_id_private, id); cmd->id = id_priv->handle; - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -806,9 +826,9 @@ static int ucma_process_conn_req(struct int ret; listen_id_priv = container_of(event->id, struct cma_id_private, id); - id_priv = ucma_alloc_id(event->id->context); + id_priv = ucma_alloc_id(event->id->channel, event->id->context); if (!id_priv) { - ucma_destroy_kern_id(handle); + ucma_destroy_kern_id(event->id->channel->fd, handle); ret = -ENOMEM; goto err; } @@ -846,7 +866,7 @@ static int ucma_process_conn_resp(struct CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_ACCEPT, size); cmd->id = id_priv->handle; - ret = write(cma_fd, msg, size); + ret = write(id_priv->id.channel->fd, msg, size); if (ret != size) { ret = (ret > 0) ? -ENODATA : ret; goto err; @@ -869,7 +889,8 @@ static int ucma_process_establish(struct return ret; } -int rdma_get_cm_event(struct rdma_cm_event **event) +int rdma_get_cm_event(struct rdma_event_channel *channel, + struct rdma_cm_event **event) { struct ucma_abi_event_resp *resp; struct ucma_abi_get_event *cmd; @@ -891,7 +912,7 @@ int rdma_get_cm_event(struct rdma_cm_eve retry: CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_GET_EVENT, size); - ret = write(cma_fd, msg, size); + ret = write(channel->fd, msg, size); if (ret != size) { ret = (ret > 0) ? -ENODATA : ret; goto err; @@ -953,17 +974,6 @@ err: return ret; } -int rdma_get_fd(void) -{ - int ret; - - ret = ucma_initialized ? 
0 : ucma_init(); - if (ret) - return ret; - - return cma_fd; -} - int rdma_get_option(struct rdma_cm_id *id, int level, int optname, void *optval, size_t *optlen) { @@ -981,7 +991,7 @@ int rdma_get_option(struct rdma_cm_id *i cmd->optname = optname; cmd->optlen = *optlen; - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; @@ -1005,7 +1015,7 @@ int rdma_set_option(struct rdma_cm_id *i cmd->optname = optname; cmd->optlen = optlen; - ret = write(cma_fd, msg, size); + ret = write(id->channel->fd, msg, size); if (ret != size) return (ret > 0) ? -ENODATA : ret; Index: src/librdmacm.map =================================================================== --- src/librdmacm.map (revision 6950) +++ src/librdmacm.map (working copy) @@ -1,5 +1,7 @@ RDMACM_1.0 { global: + rdma_create_event_channel; + rdma_destroy_event_channel; rdma_create_id; rdma_destroy_id; rdma_bind_addr; @@ -14,7 +16,6 @@ RDMACM_1.0 { rdma_disconnect; rdma_get_cm_event; rdma_ack_cm_event; - rdma_get_fd; rdma_get_option; rdma_set_option; local: *; Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 6950) +++ include/rdma/rdma_cma.h (working copy) @@ -85,8 +85,13 @@ struct rdma_route { int num_paths; }; +struct rdma_event_channel { + int fd; +}; + struct rdma_cm_id { struct ibv_context *verbs; + struct rdma_event_channel *channel; void *context; struct ibv_qp *qp; struct rdma_route route; @@ -102,8 +107,33 @@ struct rdma_cm_event { uint8_t private_data_len; }; -int rdma_create_id(struct rdma_cm_id **id, void *context); +/** + * rdma_create_event_channel - Open a channel used to report communication + * events. + */ +struct rdma_event_channel *rdma_create_event_channel(); + +/** + * rdma_destroy_event_channel - Close the event communication channel. + * @channel: The communication channel to destroy. 
+ */ +int rdma_destroy_event_channel(struct rdma_event_channel *channel); +/** + * rdma_create_id - Allocate a communication identifier. + * @channel: The communication channel that events associated with the + * allocated rdma_cm_id will be reported on. + * @id: A reference where the allocated communication identifier will be + * returned. + * @context: User specified context associated with the rdma_cm_id. + */ +int rdma_create_id(struct rdma_event_channel *channel, + struct rdma_cm_id **id, void *context); + +/** + * rdma_destroy_id - Release a communication identifier. + * @id: The communication identifier to destroy. + */ int rdma_destroy_id(struct rdma_cm_id *id); /** @@ -209,6 +239,7 @@ int rdma_disconnect(struct rdma_cm_id *i /** * rdma_get_cm_event - Retrieves the next pending communications event, * if no event is pending waits for an event. + * @channel: Event channel to check for events. * @event: Allocated information about the next communication event. * Event should be freed using rdma_ack_cm_event() * @@ -216,7 +247,8 @@ int rdma_disconnect(struct rdma_cm_id *i * in the allocation of a new @rdma_cm_id. * Clients are responsible for destroying the new @rdma_cm_id. */ -int rdma_get_cm_event(struct rdma_cm_event **event); +int rdma_get_cm_event(struct rdma_event_channel *channel, + struct rdma_cm_event **event); /** * rdma_ack_cm_event - Free a communications event. @@ -228,8 +260,6 @@ int rdma_get_cm_event(struct rdma_cm_eve */ int rdma_ack_cm_event(struct rdma_cm_event *event); -int rdma_get_fd(void); - /** * rdma_get_option - Retrieve options for an rdma_cm_id. * @id: Communication identifier to retrieve option for. From sean.hefty at intel.com Fri May 5 14:14:06 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 5 May 2006 14:14:06 -0700 Subject: [openib-general] [PATCH 2/2] librdmacm: update test programs to use event channels In-Reply-To: Message-ID: Update test programs to use the event channel interfaces. 
Signed-off-by: Sean Hefty --- Index: examples/rping.c =================================================================== --- examples/rping.c (revision 5693) +++ examples/rping.c (working copy) @@ -141,6 +141,7 @@ struct rping_cb { /* CM stuff */ pthread_t cmthread; + struct rdma_event_channel *cm_channel; struct rdma_cm_id *cm_id; /* connection on client side,*/ /* listener on service side. */ struct rdma_cm_id *child_cm_id; /* connection on server side */ @@ -532,11 +533,12 @@ err1: static void *cm_thread(void *arg) { + struct rping_cb *cb = arg; struct rdma_cm_event *event; int ret; while (1) { - ret = rdma_get_cm_event(&event); + ret = rdma_get_cm_event(cb->cm_channel, &event); if (ret) { fprintf(stderr, "rdma_get_cm_event err %d\n", ret); exit(ret); @@ -1017,11 +1019,18 @@ int main(int argc, char *argv[]) goto out; } - ret = rdma_create_id(&cb->cm_id, cb); + cb->cm_channel = rdma_create_event_channel(); + if (!cb->cm_channel) { + ret = errno; + fprintf(stderr, "rdma_create_event_channel error %d\n", ret); + goto out; + } + + ret = rdma_create_id(cb->cm_channel, &cb->cm_id, cb); if (ret) { ret = errno; fprintf(stderr, "rdma_create_id error %d\n", ret); - goto out; + goto out2; } DEBUG_LOG("created cm_id %p\n", cb->cm_id); @@ -1034,6 +1043,8 @@ int main(int argc, char *argv[]) DEBUG_LOG("destroy cm_id %p\n", cb->cm_id); rdma_destroy_id(cb->cm_id); +out2: + rdma_destroy_event_channel(cb->cm_channel); out: free(cb); return ret; Index: examples/cmatose.c =================================================================== --- examples/cmatose.c (revision 6183) +++ examples/cmatose.c (working copy) @@ -69,6 +69,7 @@ struct cmatest_node { }; struct cmatest { + struct rdma_event_channel *channel; struct cmatest_node *nodes; int conn_index; int connects_left; @@ -375,7 +376,8 @@ static int alloc_nodes(void) for (i = 0; i < connections; i++) { test.nodes[i].id = i; if (!is_server) { - ret = rdma_create_id(&test.nodes[i].cma_id, + ret = rdma_create_id(test.channel, + 
&test.nodes[i].cma_id, &test.nodes[i]); if (ret) goto err; @@ -424,7 +426,7 @@ static void connect_events(void) int err = 0; while (test.connects_left && !err) { - err = rdma_get_cm_event(&event); + err = rdma_get_cm_event(test.channel, &event); if (!err) { cma_handler(event->id, event); rdma_ack_cm_event(event); @@ -438,7 +440,7 @@ static void disconnect_events(void) int err = 0; while (test.disconnects_left && !err) { - err = rdma_get_cm_event(&event); + err = rdma_get_cm_event(test.channel, &event); if (!err) { cma_handler(event->id, event); rdma_ack_cm_event(event); @@ -452,7 +454,7 @@ static void run_server(void) int i, ret; printf("cmatose: starting server\n"); - ret = rdma_create_id(&listen_id, &test); + ret = rdma_create_id(test.channel, &listen_id, &test); if (ret) { printf("cmatose: listen request failed\n"); return; @@ -582,6 +584,13 @@ int main(int argc, char **argv) test.src_addr = (struct sockaddr *) &test.src_in; test.connects_left = connections; test.disconnects_left = connections; + + test.channel = rdma_create_event_channel(); + if (!test.channel) { + printf("failed to create event channel\n"); + exit(1); + } + if (alloc_nodes()) exit(1); @@ -592,5 +601,6 @@ int main(int argc, char **argv) printf("test complete\n"); destroy_nodes(); + rdma_destroy_event_channel(test.channel); return 0; } From robert.j.woodruff at intel.com Fri May 5 14:15:52 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Fri, 5 May 2006 14:15:52 -0700 Subject: [openib-general] OFED-1.0-rc4 is available Message-ID: <1AC79F16F5C5284499BB9591B33D6F00079F9D93@orsmsx408> Does anyone know what the largest cluster is that has been run reliably with recent versions of the openib code, either the trunk or OFED, using OpenSM as the subnet manager ? I have someone that is building a 256 node cluster and want to know if they can rely on OpenSM or if they should use the SM in the switch. If folks could share their recent experiences that would be great. 
woody

From vuhuong at mellanox.com Fri May 5 14:19:42 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Fri, 05 May 2006 14:19:42 -0700
Subject: [openib-general][patch review] srp: fmr implementation,
In-Reply-To:
References: <443C4934.7080400@mellanox.com> <443E9A88.7020302@mellanox.com>
 <443FC35F.6080301@mellanox.com> <44450F89.7020500@mellanox.com>
 <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com>
 <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com>
 <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com>
 <445B9F3E.7060601@mellanox.com>
Message-ID: <445BC16E.2090100@mellanox.com>

Roland Dreier wrote:
> > reading scsi_error.c again, I find this logic for our case (please
> > correct me if I'm wrong)
> > 1. eh_abort_handler and eh_device_reset_handler fail with timeout;
> > eh_host_reset_handler successes
>
> But you're crashing inside the call to srp_reconnect_target() in
> srp_reset_host(), right? So eh_host_reset_handler() never finishes.
>
> (The crash is because the loop that flushes the req_queue in
> srp_reconnect_target() hits a stale SCSI command)
>
> Or am I misunderstanding what you see?
>

The first eh_host_reset_handler() will finish without a problem; however,
the next eh_host_reset_handler() will hit a stale SCSI command.

Vu

From robert.j.woodruff at intel.com Fri May 5 14:57:23 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 5 May 2006 14:57:23 -0700
Subject: [openib-general] Qlogic (Pathscale driver) on OFED RC4
Message-ID: <1AC79F16F5C5284499BB9591B33D6F00079F9E3B@orsmsx408>

When I try to run ibv_rc_pingpong on OFED RC4 on Qlogic PCI-Express cards
in an Intel Xeon 3.6 GHz (Lindenhurst) platform on a 2.6.9-34EL
(Red Hat EL4-U3) kernel, I get the following error.
This is the same problem I see on the trunk, 6829.
[root at rkl-12 bin]# ./ibv_rc_pingpong
  local address: LID 0x0004, QPN 0x000003, PSN 0x143bc0
  remote address: LID 0x0003, QPN 0x000004, PSN 0xa49b38
Failed status 12 for wr_id 2

Can someone open a bug on this one in Bugzilla ?

woody

From bos at pathscale.com Fri May 5 15:10:10 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 05 May 2006 15:10:10 -0700
Subject: [openib-general] Re: Qlogic (Pathscale driver) on OFED RC4
In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F00079F9E3B@orsmsx408>
References: <1AC79F16F5C5284499BB9591B33D6F00079F9E3B@orsmsx408>
Message-ID: <1146867010.2970.35.camel@chalcedony.pathscale.com>

On Fri, 2006-05-05 at 14:57 -0700, Woodruff, Robert J wrote:
> Can someone open a bug on this one in Bugzilla ?

Sure.

http://openib.org/bugzilla/show_bug.cgi?id=60

Ranjit, are you the default owner for RDS bugs?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

From surs at cse.ohio-state.edu Fri May 5 15:21:29 2006
From: surs at cse.ohio-state.edu (Sayantan Sur)
Date: Fri, 5 May 2006 18:21:29 -0400
Subject: [openib-general] Re: [mvapich-discuss] RE: [openfabrics-ewg] Current OFED kernel snapshot
In-Reply-To:
References:
Message-ID: <20060505222126.GC10255@cse.ohio-state.edu>

Hi,

* On May,10 Roland Dreier wrote :
> Abhinav> By referring to the PPC64 architecture, i was mentioning
> Abhinav> about the IBM HCAs(4x/12x) running on GX/GX+ Bus. To the
> Abhinav> best of my knowledge, these HCAs do not support the
> Abhinav> features mentioned above.
>
> Hmm, making this a compile-time thing seems like a problem then. Some
> ppc64 systems have IBM eHCAs and some have Mellanox and/or PathScale
> HCAs. Shouldn't the same MPI package work on all of these systems?

We've just incorporated an autodetection utility as a part of our build
script (make.mvapich.gen2).
It essentially reads the type of HCA from the standard location /sys/class/infiniband//hca_type. Using this script, we have decoupled the choice of architecture and the InfiniBand card in the system. We believe using this script, MVAPICH will be able to work on PPC64 systems connected with Mellanox/Pathscale cards. On our installation of the IBM PPC64 systems, there is no `hca_type' file present. If that file is there, then the above mentioned "decoupling" thing can be done easily by trivially modifying our script. Does anyone have a PPC64 installation which has that file? With OpenIB/Gen2 standardized method of exporting HCA types, we will soon have runtime detection and optimization. Thanks, Sayantan. -- http://www.cse.ohio-state.edu/~surs From arlin.r.davis at intel.com Fri May 5 15:45:30 2006 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 5 May 2006 15:45:30 -0700 Subject: [openib-general] [PATCH] update uDAPL openib_cma provider to work with new uCMA event channels Message-ID: James, Update the uDAPL openib_cma provider to work with the new uCMA event channel interface. I ran a full set of Intel-MPI test suites with these latest changes and it looks fine. Sync up with Sean on commits. 
Signed-off by: Arlin Davis Index: dapl/openib_cma/dapl_ib_util.c =================================================================== --- dapl/openib_cma/dapl_ib_util.c (revision 6942) +++ dapl/openib_cma/dapl_ib_util.c (working copy) @@ -67,6 +67,7 @@ static const char rcsid[] = "$Id: $"; int g_dapl_loopback_connection = 0; int g_ib_pipe[2]; +struct rdma_event_channel *g_cm_events = NULL; ib_thread_state_t g_ib_thread_state = 0; DAPL_OS_THREAD g_ib_thread; DAPL_OS_LOCK g_hca_lock; @@ -184,6 +185,7 @@ int32_t dapls_ib_release(void) { dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapl_ib_release: \n"); dapli_ib_thread_destroy(); + rdma_destroy_event_channel(g_cm_events); return 0; } @@ -214,9 +216,17 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_N dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " open_hca: %s - %p\n", hca_name, hca_ptr); + /* Setup the global cm event channel */ + dapl_os_lock(&g_hca_lock); + if (g_cm_events == NULL) { + g_cm_events = rdma_create_event_channel(); + if (g_cm_events == NULL) + return DAT_INTERNAL_ERROR; + } + dapl_os_unlock(&g_hca_lock); + if (dapli_ib_thread_init()) return DAT_INTERNAL_ERROR; - /* HCA name will be hostname or IP address */ if (getipaddr((char*)hca_name, @@ -224,14 +234,13 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_N sizeof(DAT_SOCK_ADDR6))) return DAT_INVALID_ADDRESS; - /* cm_id will bind local device/GID based on IP address */ - if (rdma_create_id(&cm_id, (void*)hca_ptr)) + if (rdma_create_id(g_cm_events, &cm_id, (void*)hca_ptr)) return DAT_INTERNAL_ERROR; ret = rdma_bind_addr(cm_id, (struct sockaddr *)&hca_ptr->hca_address); - if (ret) { + if ((ret) || (cm_id->verbs == NULL)) { rdma_destroy_id(cm_id); dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " open_hca: ERR bind (%d) %s \n", @@ -551,8 +560,8 @@ int dapli_ib_thread_init(void) } /* uCMA events non-blocking */ - opts = fcntl(rdma_get_fd(), F_GETFL); /* uCMA */ - if (opts < 0 || fcntl(rdma_get_fd(), + opts = fcntl(g_cm_events->fd, F_GETFL); /* uCMA */ + if (opts < 0 || fcntl(g_cm_events->fd, F_SETFL, opts | 
O_NONBLOCK) < 0) { dapl_dbg_log (DAPL_DBG_TYPE_ERR, " dapl_ib_init: ERR with uCMA FD\n" ); @@ -741,7 +750,7 @@ void dapli_thread(void *arg) dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " ib_thread(%d,0x%x): ENTER: pipe %d ucma %d\n", - getpid(), g_ib_thread, g_ib_pipe[0], rdma_get_fd()); + getpid(), g_ib_thread, g_ib_pipe[0], g_cm_events->fd); /* Poll across pipe, CM, AT never changes */ dapl_os_lock( &g_hca_lock ); @@ -749,7 +758,7 @@ void dapli_thread(void *arg) ufds[0].fd = g_ib_pipe[0]; /* pipe */ ufds[0].events = POLLIN; - ufds[1].fd = rdma_get_fd(); /* uCMA */ + ufds[1].fd = g_cm_events->fd; /* uCMA */ ufds[1].events = POLLIN; while (g_ib_thread_state == IB_THREAD_RUN) { Index: dapl/openib_cma/dapl_ib_cm.c =================================================================== --- dapl/openib_cma/dapl_ib_cm.c (revision 6942) +++ dapl/openib_cma/dapl_ib_cm.c (working copy) @@ -59,6 +59,8 @@ #include #include +extern struct rdma_event_channel *g_cm_events; + /* local prototypes */ static struct dapl_cm_id * dapli_req_recv(struct dapl_cm_id *conn, struct rdma_cm_event *event); @@ -614,7 +616,7 @@ dapls_ib_setup_conn_listener(IN DAPL_IA dapl_os_lock_init(&conn->lock); /* create CM_ID, bind to local device, create QP */ - if (rdma_create_id(&conn->cm_id, (void*)conn)) { + if (rdma_create_id(g_cm_events, &conn->cm_id, (void*)conn)) { dapl_os_free(conn, sizeof(*conn)); return(dapl_convert_errno(errno,"setup_listener")); } @@ -1067,7 +1069,7 @@ void dapli_cma_event_cb(void) dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " cm_event()\n"); /* process one CM event, fairness */ - if(!rdma_get_cm_event(&event)) { + if(!rdma_get_cm_event(g_cm_events, &event)) { struct dapl_cm_id *conn; /* set proper conn from cm_id context*/ Index: dapl/openib_cma/dapl_ib_qp.c =================================================================== --- dapl/openib_cma/dapl_ib_qp.c (revision 6942) +++ dapl/openib_cma/dapl_ib_qp.c (working copy) @@ -35,6 +35,8 @@ #include "dapl.h" #include "dapl_adapter_util.h" +extern 
struct rdma_event_channel *g_cm_events;
+
 /*
  * dapl_ib_qp_alloc
  *
@@ -128,7 +130,7 @@ DAT_RETURN dapls_ib_qp_alloc(IN DAPL_IA
 	dapl_os_lock_init(&conn->lock);
 
 	/* create CM_ID, bind to local device, create QP */
-	if (rdma_create_id(&cm_id, (void*)conn)) {
+	if (rdma_create_id(g_cm_events, &cm_id, (void*)conn)) {
 		dapl_os_free(conn, sizeof(*conn));
 		return(dapl_convert_errno(errno, "create_qp"));
 	}

From vuhuong at mellanox.com Fri May 5 16:07:58 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Fri, 05 May 2006 16:07:58 -0700
Subject: [openib-general][patch review] srp: fmr implementation,
In-Reply-To:
References: <443E9A88.7020302@mellanox.com> <443FC35F.6080301@mellanox.com>
 <44450F89.7020500@mellanox.com> <44451D32.1010106@mellanox.com>
 <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com>
 <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com>
 <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com>
 <445BA0E8.7010104@mellanox.com>
Message-ID: <445BDACE.8000205@mellanox.com>

Roland Dreier wrote:
> I'm testing the patch below, which I think does the right thing for
> aborts and device resets. What do you think?
>

It still does not address the issue I pointed out in my previous email.
The first eh_host_reset_handler() succeeds, and right away
scsi_eh_host_reset() sends a start-stop-unit (STU) or test-unit-ready
(TUR) command using the same SCSI command. This STU or TUR command gets
stuck in our queue, times out, and is aborted. The abort of the STU or
TUR command times out as well, and the original SCSI command is freed.
Because we delay the clean-up of the associated request to
eh_device_reset_handler() instead of doing it in eh_abort_handler(), the
request is still in our queue. The LUN is marked offline, so the next
eh_device_reset_handler() for the same LUN is never called, and the next
eh_host_reset_handler() hits a use-after-free bug.
You can see the log below May 5 16:36:22 lab105 kernel: ib_srp: failed receive status 5 May 5 16:36:24 lab105 kernel: ib_srp: connection closed May 5 16:36:24 lab105 kernel: ib_mthca 0000:05:00.0: CQ overrun on CQN 040082 May 5 16:36:24 lab105 kernel: ib_srp: QP event 1 May 5 16:36:24 lab105 last message repeated 2 times May 5 16:36:54 lab105 kernel: SRP abort called May 5 16:36:59 lab105 kernel: SRP reset_device called May 5 16:37:04 lab105 kernel: ib_srp: SRP reset_host called May 5 16:37:06 lab105 kernel: ib_srp: connection closed May 5 16:37:16 lab105 kernel: ib_srp: QP event 1 May 5 16:37:16 lab105 last message repeated 3 times May 5 16:37:26 lab105 kernel: SRP abort called May 5 16:37:26 lab105 kernel: ib_srp: QP event 1 May 5 16:37:31 lab105 kernel: sd 6:0:0:1: scsi: Device offlined - not ready after error recovery May 5 16:37:31 lab105 kernel: sd 6:0:0:1: rejecting I/O to offline device May 5 16:37:31 lab105 kernel: Buffer I/O error on device sdd, logical block 0 May 5 16:37:31 lab105 kernel: Buffer I/O error on device sdd, logical block 1 May 5 16:37:31 lab105 kernel: sd 6:0:0:1: rejecting I/O to offline device May 5 16:37:31 lab105 kernel: Buffer I/O error on device sdd, logical block 0 May 5 16:37:31 lab105 kernel: ib_srp: QP event 1 May 5 16:37:31 lab105 kernel: ib_srp: QP event 1 May 5 16:38:01 lab105 kernel: SRP abort called May 5 16:38:06 lab105 kernel: SRP reset_device called May 5 16:38:11 lab105 kernel: ib_srp: SRP reset_host called May 5 16:38:13 scsi_eh_6[27704]: Oops 11012296146944 [1] Modules linked in: ib_srp ib_cm ib_sa ib_mthca ib_mad ib_core nls_utf8 evdev joydev sg st sr_mod ide_cd cdrom usbserial parport_pc lp parport thermal processor ipv6 fan button d Pid: 27704, CPU 1, comm: scsi_eh_6 psr : 00001210081a6018 ifs : 800000000000058e ip : [] Not tainted lab105 kernel: iip is at srp_reconnect_target+0x2b1/0x5e0 [ib_srp] unat: 0000000000000000 pfs : 000000000000058e rsc : 0000000000000003 rnat: 0000000000000000 bsps: 0000000000000000 pr : 
0000000000009141 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000002022755e0 b6 : a00000010000faa0 b7 : a000000202215ac0 f6 : 1003e6b6b6b6b6b6b6b6b f7 : 0ffdd8000000000000000 f8 : 1003e00000000000014c8 f9 : 1003e0000000000000013 f10 : 1003e0000000000000000 f11 : 1003e0000000000000000 r1 : a0000002022782f8 r2 : e0000001f0d6bb48 r3 : e0000001f0d6b9e8 r8 : 0000000000000000 r9 : a00000010090d8f0 r10 : a00000010090d8f8 r11 : 0000000000000001 r12 : e0000001b03d7d00 r13 : e0000001b03d0000 r14 : a00000010090d900 r15 : e0000001b03d0000 r16 : 0000000000000001 r17 : 0000000000000001 r18 : e0000001b03d0fa4 r19 : a00000010090d908 r20 : ffffffffffffffff r21 : 0000000000000008 r22 : e0000000752c4400 r23 : e000000060ee5688 r24 : 0000000000000080 r25 : e0000000752c441f r26 : a000000202215ac0 r27 : e00000018a40e1e0 r28 : e00000018a40e000 r29 : e000000060ee55e8 r30 : e0000001b1fd8ac0 r31 : e0000001b1fd8a28 Call Trace: b_srp: connectio [] show_stack+0x80/0xa0 sp=e0000001b03d7880 bsp=e0000001b03d1330 n closed [] show_regs+0x840/0x880 sp=e0000001b03d7a50 bsp=e0000001b03d12d0 [] die+0x1b0/0x2e0 sp=e0000001b03d7a60 bsp=e0000001b03d1288 [] ia64_do_page_fault+0x970/0xae0 sp=e0000001b03d7a80 bsp=e0000001b03d1220 [] ia64_leave_kernel+0x0/0x280 sp=e0000001b03d7b30 bsp=e0000001b03d1220 [] srp_reconnect_target+0x2b0/0x5e0 [ib_srp] sp=e0000001b03d7d00 bsp=e0000001b03d11a8 [] srp_reset_host+0x60/0xa0 [ib_srp] sp=e0000001b03d7dc0 bsp=e0000001b03d1180 [] scsi_try_host_reset+0xd0/0x240 [scsi_mod] sp=e0000001b03d7dc0 bsp=e0000001b03d1150 [] scsi_error_handler+0x17f0/0x2220 [scsi_mod] sp=e0000001b03d7dc0 bsp=e0000001b03d1068 [] kthread+0x220/0x280 sp=e0000001b03d7e10 bsp=e0000001b03d1028 [] kernel_thread_helper+0xe0/0x100 sp=e0000001b03d7e30 bsp=e0000001b03d1000 [] start_kernel_thread+0x20/0x40 sp=e0000001b03d7e30 bsp=e0000001b03d1000 May 5 16:38:14 lab105 kernel: Unable to handle kernel paging request at virtual address 
6b6b6b6b6b6b6b6b May 5 16:38:14 lab105 kernel: scsi_eh_6[27704]: Oops 11012296146944 [1] May 5 16:38:14 lab105 kernel: Modules linked in: ib_srp ib_cm ib_sa ib_mthca ib_mad ib_core nls_utf8 evdev joydev sg st sr_mod ide_cd cdrom usbserial parport_pc lp parport thed May 5 16:38:14 lab105 kernel: May 5 16:38:14 lab105 kernel: Pid: 27704, CPU 1, comm: scsi_eh_6 May 5 16:38:14 lab105 kernel: psr : 00001210081a6018 ifs : 800000000000058e ip : [] Not tainted May 5 16:38:14 lab105 kernel: ip is at srp_reconnect_target+0x2b1/0x5e0 [ib_srp] May 5 16:38:14 lab105 kernel: unat: 0000000000000000 pfs : 000000000000058e rsc : 0000000000000003 May 5 16:38:14 lab105 kernel: rnat: 0000000000000000 bsps: 0000000000000000 pr : 0000000000009141 May 5 16:38:14 lab105 kernel: ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f May 5 16:38:14 lab105 kernel: csd : 0000000000000000 ssd : 0000000000000000 May 5 16:38:14 lab105 kernel: b0 : a0000002022755e0 b6 : a00000010000faa0 b7 : a000000202215ac0 May 5 16:38:14 lab105 kernel: f6 : 1003e6b6b6b6b6b6b6b6b f7 : 0ffdd8000000000000000 May 5 16:38:14 lab105 kernel: f8 : 1003e00000000000014c8 f9 : 1003e0000000000000013 May 5 16:38:14 lab105 kernel: f10 : 1003e0000000000000000 f11 : 1003e0000000000000000 May 5 16:38:14 lab105 kernel: r1 : a0000002022782f8 r2 : e0000001f0d6bb48 r3 : e0000001f0d6b9e8 May 5 16:38:14 lab105 kernel: r8 : 0000000000000000 r9 : a00000010090d8f0 r10 : a00000010090d8f8 May 5 16:38:14 lab105 kernel: r11 : 0000000000000001 r12 : e0000001b03d7d00 r13 : e0000001b03d0000 May 5 16:38:14 lab105 kernel: r14 : a00000010090d900 r15 : e0000001b03d0000 r16 : 0000000000000001 May 5 16:38:14 lab105 kernel: r17 : 0000000000000001 r18 : e0000001b03d0fa4 r19 : a00000010090d908 May 5 16:38:14 lab105 kernel: r20 : ffffffffffffffff r21 : 0000000000000008 r22 : e0000000752c4400 May 5 16:38:14 lab105 kernel: r23 : e000000060ee5688 r24 : 0000000000000080 r25 : e0000000752c441f May 5 16:38:14 lab105 kernel: r26 : 
a000000202215ac0 r27 : e00000018a40e1e0 r28 : e00000018a40e000 May 5 16:38:14 lab105 kernel: r29 : e000000060ee55e8 r30 : e0000001b1fd8ac0 r31 : e0000001b1fd8a28 May 5 16:38:14 lab105 kernel: May 5 16:38:14 lab105 kernel: Call Trace: May 5 16:38:14 lab105 kernel: [] show_stack+0x80/0xa0 May 5 16:38:14 lab105 kernel: sp=e0000001b03d7880 bsp=e0000001b03d1330 May 5 16:38:14 lab105 kernel: [] show_regs+0x840/0x880 May 5 16:38:14 lab105 kernel: sp=e0000001b03d7a50 bsp=e0000001b03d12d0 May 5 16:38:14 lab105 kernel: [] die+0x1b0/0x2e0 May 5 16:38:14 lab105 kernel: sp=e0000001b03d7a60 bsp=e0000001b03d1288 May 5 16:38:14 lab105 kernel: [] ia64_do_page_fault+0x970/0xae0 May 5 16:38:14 lab105 kernel: sp=e0000001b03d7a80 bsp=e0000001b03d1220 May 5 16:38:14 lab105 kernel: [] ia64_leave_kernel+0x0/0x280 May 5 16:38:14 lab105 kernel: sp=e0000001b03d7b30 bsp=e0000001b03d1220 May 5 16:38:14 lab105 kernel: [] srp_reconnect_target+0x2b0/0x5e0 [ib_srp] May 5 16:38:14 lab105 kernel: sp=e0000001b03d7d00 bsp=e0000001b03d11a8 May 5 16:38:14 lab105 kernel: [] srp_reset_host+0x60/0xa0 [ib_srp] May 5 16:38:14 lab105 kernel: sp=e0000001b03d7dc0 bsp=e0000001b03d1180 May 5 16:38:14 lab105 kernel: [] scsi_try_host_reset+0xd0/0x240 [scsi_mod] May 5 16:38:14 lab105 kernel: sp=e0000001b03d7dc0 bsp=e0000001b03d1150 May 5 16:38:14 lab105 kernel: [] scsi_error_handler+0x17f0/0x2220 [scsi_mod] May 5 16:38:14 lab105 kernel: sp=e0000001b03d7dc0 bsp=e0000001b03d1068 May 5 16:38:14 lab105 kernel: [] kthread+0x220/0x280 May 5 16:38:14 lab105 kernel: sp=e0000001b03d7e10 bsp=e0000001b03d1028 May 5 16:38:14 lab105 kernel: [] kernel_thread_helper+0xe0/0x100 May 5 16:38:14 lab105 kernel: sp=e0000001b03d7e30 bsp=e0000001b03d1000 May 5 16:38:14 lab105 kernel: [] start_kernel_thread+0x20/0x40 May 5 16:38:14 lab105 kernel: sp=e0000001b03d7e30 bsp=e0000001b03d1000 This patch apply on top of your patch will fix the problem. 
IB/srp: Fix tracking of pending requests during error handling Signed-off-by: Vu Pham -------------- next part -------------- A non-text attachment was scrubbed... Name: srp_eh.patch Type: text/x-patch Size: 1859 bytes Desc: not available URL: From bugzilla-daemon at openib.org Fri May 5 16:41:44 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Fri, 5 May 2006 16:41:44 -0700 (PDT) Subject: [openib-general] [Bug 50] 1.0rc4 build fails on 2.6.9-39chaos Message-ID: <20060505234144.1536A2283EB@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=50 weiny2 at llnl.gov changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Additional Comments From weiny2 at llnl.gov 2006-05-05 16:41 ------- Sorry, apparently we have added these for other devices. Thanks, Ira ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Fri May 5 16:58:26 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Fri, 5 May 2006 16:58:26 -0700 (PDT) Subject: [openib-general] [Bug 50] 1.0rc4 build fails on 2.6.9-39chaos Message-ID: <20060505235826.B7C6F2283EB@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=50 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED ------- Additional Comments From sweitzen at cisco.com 2006-05-05 16:58 ------- Close out INVALID bug. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From rdreier at cisco.com Fri May 5 18:33:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 05 May 2006 18:33:09 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <445BDACE.8000205@mellanox.com> (Vu Pham's message of "Fri, 05 May 2006 16:07:58 -0700") References: <443FC35F.6080301@mellanox.com> <44450F89.7020500@mellanox.com> <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com> <445BA0E8.7010104@mellanox.com> <445BDACE.8000205@mellanox.com> Message-ID: Vu> It still does not address the issue pointed out from my Vu> previous email - the first eh_host_reset_handler() success, Vu> right away scsi_eh_host_reset() send start-stop-unit or Vu> test-unit-ready command using the same scsi command. This stu Vu> or tur command stuck in our queue, get timeout and get Vu> aborted. The abortion of stu or tur command once again get Vu> timeout. The original scsi command get freed. We delay the Vu> clean-up of the associated request in Vu> eh_device_reset_handler() instead of in eh_abort_handler() so Vu> it's still in our queue. The lun is marked offline. The next Vu> eh_device_reset_handler() for the same lun won't be Vu> called. The next eh_reset_host_handler() will hit Vu> used-after-free bug. You can see the log below I'm still confused. Even the original eh_reset_host_handler implementation will throw away all commands in the SRP queue, because it does: for (i = 0; i < SRP_SQ_SIZE - 1; ++i) target->req_ring[i].next = i + 1; target->req_ring[SRP_SQ_SIZE - 1].next = -1; INIT_LIST_HEAD(&target->req_queue); and the new patched version does list_for_each_entry(req, &target->req_queue, list) { req->scmnd->result = DID_RESET << 16; req->scmnd->scsi_done(req->scmnd); srp_unmap_data(req->scmnd, target, req); } on top of that. 
So after srp_reconnect_target() returns, SRP has no requests in its
queue. The only way that a command could be put in the queue is if
the SCSI midlayer passes it back into the queuecommand functions.

I know I'm being dense but could you explain it one more time?

Also, this really worries me:

    Vu> May 5 16:36:24 lab105 kernel: ib_mthca 0000:05:00.0: CQ overrun on CQN 040082

Do you know what's causing this?

 - R.

From bos at pathscale.com Fri May 5 22:27:41 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 05 May 2006 22:27:41 -0700
Subject: [openib-general] Re: Need OpenIB bugzilla component for RDS
In-Reply-To:
References:
Message-ID: <1146893261.1045.18.camel@localhost.localdomain>

On Fri, 2006-05-05 at 15:23 -0700, Scott Weitzenkamp (sweitzen) wrote:
> Ranjit, are you the default owner for RDS bugs?

I'm going to hold off creating the component until I know who the owner
is :-)

References: <443FC35F.6080301@mellanox.com> <44450F89.7020500@mellanox.com>
 <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com>
 <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com>
 <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com>
 <445B9F3E.7060601@mellanox.com> <445BA0E8.7010104@mellanox.com>
 <445BDACE.8000205@mellanox.com>
Message-ID: <445C3C5E.9080207@mellanox.com>

>
> So after srp_reconnect_target() returns, SRP has no requests in its
> queue. The only way that a command could be put in the queue is if
> the SCSI midlayer passes it back into the queuecommand functions.

Yes, this is exactly what is happening.

static int scsi_eh_host_reset(struct list_head *work_q,
                              struct list_head *done_q)
{
        ...
        rtn = scsi_try_host_reset(scmd);
        if (rtn == SUCCESS) {
                list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
                        if (!scsi_device_online(scmd->device) ||
                            (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) ||
                            !scsi_eh_tur(scmd))
        ...
}

1st scsi_try_host_reset() --> srp_host_reset() --> srp_reconnect_target() return SUCCESS.
Then scsi_eh_try_stu() or scsi_eh_tur() is called right after

scsi_eh_try_stu or scsi_eh_tur --> scsi_send_eh_cmnd() --> srp_queuecommand()

>
> Also, this really worries me:
>
> Vu> May 5 16:36:24 lab105 kernel: ib_mthca 0000:05:00.0: CQ overrun on CQN 040082
>
> Do you know what's causing this?
>

I'm guessing that we get QP event 1 without handling it and we keep
posting requests.

Vu

From rdreier at cisco.com Sat May 6 08:01:14 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Sat, 06 May 2006 08:01:14 -0700
Subject: [openib-general][patch review] srp: fmr implementation,
In-Reply-To: <445C3C5E.9080207@mellanox.com> (Vu Pham's message of "Fri, 05 May 2006 23:04:14 -0700")
References: <44450F89.7020500@mellanox.com> <44451D32.1010106@mellanox.com>
 <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com>
 <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com>
 <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com>
 <445BA0E8.7010104@mellanox.com> <445BDACE.8000205@mellanox.com>
 <445C3C5E.9080207@mellanox.com>
Message-ID:

 > 1st scsi_try_host_reset() --> srp_host_reset() -->
 > srp_reconnect_target() return SUCCESS. Then scsi_eh_try_stu() or
 > scsi_eh_tur() is called right after
 >
 > scsi_eh_try_stu or scsi_eh_tur --> scsi_send_eh_cmnd() -->
 > srp_queuecommand()

But after srp_reconnect_target(), both SRP's and the midlayer's queue
of pending commands should be completely empty, since I put

	list_for_each_entry(req, &target->req_queue, list) {
		req->scmnd->result = DID_RESET << 16;
		req->scmnd->scsi_done(req->scmnd);
		srp_unmap_data(req->scmnd, target, req);
	}

and

	INIT_LIST_HEAD(&target->free_reqs);
	INIT_LIST_HEAD(&target->req_queue);
	for (i = 0; i < SRP_SQ_SIZE; ++i)
		list_add_tail(&target->req_ring[i].list, &target->free_reqs);

in there. Why doesn't that work to kill all the pending commands?

 - R.
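
As a reading aid for the exchange above, here is a minimal user-space
sketch of the flush-and-reinitialize pattern under discussion. It is not
the ib_srp code and makes several assumptions: struct target, struct req,
struct cmd, target_init(), target_queue(), target_flush(), SQ_SIZE and
RESULT_RESET are all hypothetical names invented for illustration, and a
plain singly linked list stands in for the kernel's list_head. It only
shows that a flush leaves the pending queue empty, and that anything
queued afterwards (as the midlayer does with its STU/TUR command) is
pending again and still points at the caller's command.

```c
#include <stddef.h>

#define SQ_SIZE      4     /* hypothetical ring size, not SRP_SQ_SIZE */
#define RESULT_RESET 0x08  /* hypothetical stand-in for a reset status */

/* Stand-in for a SCSI command owned by the caller (the "midlayer"). */
struct cmd {
	int result;
	int done;
};

/* One slot of the transport's request ring. */
struct req {
	struct cmd *cmd;   /* borrowed pointer; the caller may free it */
	struct req *next;
};

struct target {
	struct req ring[SQ_SIZE];
	struct req *pending;    /* requests currently queued */
	struct req *free_list;  /* unused ring slots */
};

/* Put every ring slot on the free list and empty the pending queue. */
void target_init(struct target *t)
{
	t->pending = NULL;
	t->free_list = NULL;
	for (int i = SQ_SIZE - 1; i >= 0; --i) {
		t->ring[i].cmd = NULL;
		t->ring[i].next = t->free_list;
		t->free_list = &t->ring[i];
	}
}

/* Queue a command; returns 0 on success, -1 if the ring is full. */
int target_queue(struct target *t, struct cmd *c)
{
	struct req *r = t->free_list;

	if (!r)
		return -1;
	t->free_list = r->next;
	r->cmd = c;
	r->next = t->pending;
	t->pending = r;
	return 0;
}

/*
 * Complete every pending request with a reset status, then rebuild the
 * free list. Returns the number of requests completed. Every r->cmd is
 * dereferenced here, so a pending request whose command was already
 * freed by its owner would be a use-after-free at this point.
 */
int target_flush(struct target *t)
{
	int n = 0;

	for (struct req *r = t->pending; r; r = r->next) {
		r->cmd->result = RESULT_RESET;
		r->cmd->done = 1;
		n++;
	}
	target_init(t);
	return n;
}
```

In this model, calling target_flush() and then target_queue() with a new
command leaves exactly one pending request again; whether that request
is cleaned up before its command is released is the question the two
messages above are arguing about.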
From bugzilla-daemon at openib.org Sat May 6 12:05:10 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Sat, 6 May 2006 12:05:10 -0700 (PDT) Subject: [openib-general] [Bug 33] Ping fails on ib1 interface - IBED - RC3 Message-ID: <20060506190510.8942E2283DF@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=33 ------- Additional Comments From sweitzen at cisco.com 2006-05-06 12:05 ------- Are you still seeing this with rc4? I have not seen this using OpenIB IPoIB and the Cisco IB SM. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eitan at mellanox.co.il Sat May 6 22:31:53 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 7 May 2006 08:31:53 +0300 Subject: [openib-general] Dump and load routes with opensm? Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAFE@mtlexch01.mtl.com> Hi Greg, A plan for OpenSM to support loading unicast routes already exists. To do it we need to develop a scheme where the routes file also holds the topology using GUIDs such that the discovered topology is compared to the saved one. I have outlined the algorithm in the attached writeup on OpenSM routing algorithms section 6: "Incremental Algorithms". The right place to implement this will be to have the osm_db* enhanced to support this new DB domain. Then the osm_ucast_mgr.c will need to initialize the DB and use it while routing. Regarding dump of existing tables: If you run opensm -V or -D 0x43 you should get the file /tmp/osm.fdbs with that dump. You can also use a more SM independent method for obtaining the tables : if you run ibdiagnet you should get the file /tmp/ibdiagnet.fdbs with similar format. Eitan Greg Johnson wrote: > On Thu, May 04, 2006 at 12:39:38PM -0400, Hal Rosenstock wrote: > >>Hi Greg, >> >>On Thu, 2006-05-04 at 12:34, Greg Johnson wrote: >> >>>Is there currently a way to dump and load routes with opensm? 
If not, >>>how would I go about writing one? >> >>Is it really routes or stable LIDs you want ? > > > I actually want routes. I have queried them with ibtraceroute and > ibroute, but we need routes for the whole fabric. > > BTW, if you call ibtraceroute thousands of times it stops working. > Maybe a problem in the MAD driver? > > > >>LIDs are stored in /var/cache/osm/guid2lid and restored from there when >>OpenSM is started assuming the reassign LIDs option (-r or >>--reassign_lids) is not used when invoking OpenSM. > > > Thanks, that's good to know. > > Greg > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: OpenSM ROUTING.txt URL: From sweitzen at cisco.com Sun May 7 00:29:27 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 7 May 2006 00:29:27 -0700 Subject: [openib-general] OFED-1.0-rc4 is available Message-ID: We see many problems with RC4, we've only tested i686 and x86_64 so far (just started ia64 and ppc64). 1) If the new SDP does not make it in until RC5, I don't see how we can avoid an RC6. 2) SRP is missing on RHEL4 (http://openib.org/bugzilla/show_bug.cgi?id=51 ), which is a big regression. 3) RDS does not load on RHEL4 U3 (http://openib.org/bugzilla/show_bug.cgi?id=61 ). 4) No MPI for ppc64. 5) Open MPI is not working well for Pallas benchmarks, more details to follow. 
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren Sent: Thursday, May 04, 2006 9:04 AM To: openfabrics-ewg at openib.org Cc: openib-general Subject: [openib-general] OFED-1.0-rc4 is available Hi All, We have prepared OFED 1.0 RC4. Release location: *https://openib.org/svn/gen2/branches/1.0/ofed/releases* File: OFED-1.0-rc4.tgz *BUILD_ID:* OFED-1.0-rc4 openib-1.0 (REV=6922) # User space https://openib.org/svn/gen2/branches/1.0/src/userspace # Kernel space git/HEAD: ref: refs/heads/for-2.6.17 commit 9817d207dc13e3a9fc0287bbd36bdfa3cffe5ed4 https://openib.org/svn/gen2/branches/1.0/ofed/tags/rc4/linux-kernel # MPI mpi_osu-0.9.7-mlx2.1.0.tgz openmpi-1.1a3-1.src.rpm mpitests-1.0-0.src.rpm *OSes:* * RH EL4 up2: 2.6.9-22.ELsmp * RH EL4 up3: 2.6.9-34.ELsmp * Fedora C4: 2.6.11-1.1369_FC4 * SLES10 beta 7: 2.6.16-rc5-git9-2-smp * SUSE 10 Pro: 2.6.13-15-smp * kernel.org: 2.6.16 *Systems:* * x86_64 * x86 * ia64 * ppc64 *Main changes from RC3:* 1. Kernel code based on git (see BUILD_ID for version) 2. SRP - with new features: FMR, tunable parameters, SRP daemon (see details below) 3. Open MPI - new package based on 1.1a3 4. RDS - new version from main trunk 5. Standard network configuration: Network configuration scripts ifcfg-ib*, located under /etc/sysconfig/network-scripts (.../network for SuSE) 6. /etc/modprobe.conf updated with: alias ib0 ib_ipoib ... alias net-pf-26 ib_sdp Note: this causes unloading the ipoib module to fail on SuSE (hotplug issue) 7. ipath driver available on RH EL4 (up2 and up3). 8. uDAPL is available on RH EL4 (up2 and up3). 9. Documentation updated (the installation guide is not complete). 10. pdsh has been removed from the package. 11. libibat has been removed. 12. Bug Fixes *Package limitations:* 1. iSER is working on SuSE SLES 10 Beta8 only 2. 
SDP has not been upgraded yet. 3. MPI OSU and Open MPI compilation fails on PPC64 4. ipath driver compilation fails on FedoraC4. *SRP details:* SRP new features and changes: * FMR support was added. * Added an interface for removing a target in the initiator. * Added an interface for querying the connected targets of an initiator. * The SRP tool ibsrpdm can execute as a daemon. The daemon adds new targets that join the network and removes targets that leave the network. * ibsrpdm can use new SM feature: enhanced capability mask matching (errata MGTWG8372) Limitation: * Attempting to add the same target twice fails - to be fixed in RC5. * The implementation of the target list query must be modified. The new implementation should create a directory in sysfs for each target - to be fixed in RC5. Please send us any issues you encounter and/or test results. Thanks Tziporet & Vlad -------------- next part -------------- An HTML attachment was scrubbed... URL: From tziporet at mellanox.co.il Sun May 7 00:46:46 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 7 May 2006 10:46:46 +0300 Subject: [openib-general] OFED-1.0-rc4 is available Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6ECE@mtlexch01.mtl.com> Thanks for your input. See inside. Tziporet -----Original Message----- From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Sunday, May 07, 2006 10:29 AM To: Tziporet Koren; openfabrics-ewg at openib.org Cc: openib-general Subject: RE: [openib-general] OFED-1.0-rc4 is available We see many problems with RC4, we've only tested i686 and x86_64 so far (just started ia64 and ppc64). 1) If the new SDP does not make it in until RC5, I don't see how we can avoid an RC6. I agree - and this is what I meant in my mail last week. We plan RC5 with SDP and then RC6 for showstopper bugs. 2) SRP is missing on RHEL4 (http://openib.org/bugzilla/show_bug.cgi?id=51), which is a big regression. 
It was removed on purpose since when running simple tests system is hanged due to the spin-lock issue in kernel 2.6.9 (see discussion of Ishai and Doug about this). We can put it back but I don't see a point in putting something that easily halts the machine or causing kernel oops. Ishai is working to debug it and once it is solved we will put it back. What is your opinion on this? 3) RDS does not load on RHEL4 U3 (http://openib.org/bugzilla/show_bug.cgi?id=61). SilverStorm should fix this. 4) No MPI for ppc64. Pasha will try to work with OSU and solve it this week. 5) Open MPI is not working well for Pallas benchmarks, more details to follow. That's Jeff's -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jackm at mellanox.co.il Sun May 7 01:57:55 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Sun, 7 May 2006 11:57:55 +0300 Subject: [openib-general] [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: References: Message-ID: <200605071157.55936.jackm@mellanox.co.il> On Friday 05 May 2006 02:15, Sean Hefty wrote: > Add routines to the userspace RDMA CM library to get/set transport > specific options. > > Add an option to retrieve possible path records for a connection, and > set which path a connection will be established on. > Sean, I noticed that in this update to librdmacm, you added functionality (to the ABI). Should we use this revision (6949/6950 of the openib trunk) of the rdma_cm in the upcoming OFED (branch) release? If yes, I will update both the kernel and userspace rdma_cm (in the branch) for OFED RC5. If not, please change the trunk ABI version (RDMA_USER_CM_ABI_VERSION) to 2 in file linux_kernel/driver/infiniband/include/rdma/rdma_user_cm.h (since the branch rdma_cm kernel module will be in the field -- and it will not support the get/set transport-specific options), and also change RDMA_USER_CM_MAX_ABI_VERSION to 2 in userspace file rdma_cma_abi.h, and add appropriate kernel abi version checking in userspace for the userspace functions unavailable with a kernel rdma_cm module which supports only ABI version 1. - Jack From eitan at mellanox.co.il Sun May 7 03:52:11 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 7 May 2006 13:52:11 +0300 Subject: [openib-general] OFED-1.0-rc4 is available Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB09@mtlexch01.mtl.com> Hi Woody, There were several >= 256 nodes clusters that were brought up lately with the same OpenSM core of OFED-1.0-rc4. 
Eitan Woodruff, Robert J wrote: > Does anyone know what the largest cluster is that has been > run reliably with recent versions of the openib code, either the > trunk or OFED, using OpenSM as the subnet manager ? > > I have someone that is building a 256 node cluster and want to > know if they can rely on OpenSM or if they should use the > SM in the switch. > > If folks could share their recent experiences that would be > great. > > woody > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From tziporet at mellanox.co.il Sun May 7 04:30:17 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 07 May 2006 14:30:17 +0300 Subject: [openib-general] OFED-1.0-rc4 is available In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F00079C64F4@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F00079C64F4@orsmsx408> Message-ID: <445DDA49.2070201@mellanox.co.il> Woodruff, Robert J wrote: > I took a version of the OFED RC4 kernel code, > gen2/branches/1.0/ofed/tags/rc4/linux-kernel > applied my latest backport patch (for svn6829), which applied fine. > and built a kernel RPM for testing. > > Then I took the 1.0 userspace code and built it. > > I found that using the cma version of uDAPL did not work > and caused a core dump. Using the newer userspace cma.c code > fixes the problem. I applied this patch and it fixed the > problem. > > Not sure if anyone cares about having the rdma_cm in OFED, but > if they do, I think it needs this fix. > > woody > > Thanks Woody, We took this patch to the branch and it will be in RC5 Tziporet From mst at mellanox.co.il Sun May 7 07:37:05 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 7 May 2006 17:37:05 +0300 Subject: [openib-general] cm crash Message-ID: <20060507143705.GI3032@mellanox.co.il> Hello! 
I have observed the crash below. From SDP messages, this seems to happen when I am calling rdma_destroy_cm_id and at the same time a handler is running and cma is getting non-zero return code from the callback. My analysis of the failure: cm_process_work does: cm_deref_id(cm_id_priv); if (ret) ib_destroy_cm_id(&cm_id_priv->id); assume that another thread calls ib_destroy_cm_id. Now wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); kfree(cm_id_priv->compare_data); kfree(cm_id_priv->private_data); kfree(cm_id_priv); once the reference count reaches 0, this thread will wake. We now have two threads running destroy on the same id! Forwarded message from "Michael S. Tsirkin" ----- Subject: crash Date: Sun, 7 May 2006 16:37:43 +0300 From: "Michael S. Tsirkin" Reply-To: "Michael S. Tsirkin" sdp_sock(32769:0): socket is being torn down idr_remove called for id=1064 which is not allocated. sdp_sock(32769:0): socket is being torn down idr_remove called for id=1064 which is not allocated. Call Trace: {idr_remove+278} {:ib_cm:ib_destroy_cm_id+476} {cache_free_debugcheck+568} {:ib_cm:cm_process_work+206} {:ib_cm:cm_work_handler+2732} {run_workqueue+167} {:ib_cm:cm_work_handler+0} {worker_thread+0} {worker_thread+255} {default_wake_function+0} {default_wake_function+0} {worker_thread+0} {kthread+210} {child_rip+8} {kthread+0} {child_rip+0} Slab corruption: start=ffff81017a99ab10, len=512 -- What item should I pick to always win in rock, scissors, paper? ----- End forwarded message ----- -- MST From mst at mellanox.co.il Sun May 7 08:24:18 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 7 May 2006 18:24:18 +0300 Subject: [openib-general] Re: cm crash In-Reply-To: <20060507143705.GI3032@mellanox.co.il> References: <20060507143705.GI3032@mellanox.co.il> Message-ID: <20060507152418.GK3032@mellanox.co.il> Another possible issue: static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { if (atomic_dec_and_test(&cm_id_priv->refcount)) wake_up(&cm_id_priv->wait); } A thread could test the refcount after atomic_dec_and_test but before wake_up(&cm_id_priv->wait), and remove cm_id_priv. This would result in use after free. -- MST From mst at mellanox.co.il Sun May 7 08:54:47 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 7 May 2006 18:54:47 +0300 Subject: [openib-general] [PATCH] fix 2 race conditions in ib_destroy_cm_id In-Reply-To: <20060507152418.GK3032@mellanox.co.il> References: <20060507143705.GI3032@mellanox.co.il> <20060507152418.GK3032@mellanox.co.il> Message-ID: <20060507155447.GL3032@mellanox.co.il> Fix two issues in CM. 1. crash if cm id is destroyed from handler because of non-0 return code, and at the same time from user thread by direct call to ib_destroy_cm_id. 2. use after free if ib_destroy_cm_id tests the refcount after cm_deref_id has decremented the reference count but before it has called wake_up. I'm sure the first one has caused crashes for me, and I suspect the second one caused a system hang. Signed-off-by: Michael S.
Tsirkin Index: linux-2.6.16/drivers/infiniband/core/cm.c =================================================================== --- linux-2.6.16.orig/drivers/infiniband/core/cm.c 2006-05-07 20:41:38.000000000 +0300 +++ linux-2.6.16/drivers/infiniband/core/cm.c 2006-05-07 20:43:28.000000000 +0300 @@ -159,8 +159,12 @@ static void cm_work_handler(void *data); static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { + unsigned long flags; + + spin_lock_irqsave(&cm_id_priv->lock, flags); if (atomic_dec_and_test(&cm_id_priv->refcount)) wake_up(&cm_id_priv->wait); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); } static int cm_alloc_msg(struct cm_id_private *cm_id_priv, @@ -320,13 +324,22 @@ static int cm_alloc_id(struct cm_id_priv return ret; } -static void cm_free_id(__be32 local_id) +static int cm_free_id(__be32 local_id) { + struct cm_id_private *cm_id_priv; unsigned long flags; + int rc; spin_lock_irqsave(&cm.lock, flags); - idr_remove(&cm.local_id_table, (__force int) local_id); + cm_id_priv = idr_find(&cm.local_id_table, (__force int) local_id); + if (unlikely(!cm_id_priv)) + rc = -EINVAL; + else { + rc = 0; + idr_remove(&cm.local_id_table, (__force int) local_id); + } spin_unlock_irqrestore(&cm.lock, flags); + return rc; } static struct cm_id_private * cm_get_id(__be32 local_id, __be32 remote_id) @@ -710,11 +723,12 @@ static void cm_reset_to_idle(struct cm_i } } -void ib_destroy_cm_id(struct ib_cm_id *cm_id) +static void __ib_destroy_cm_id(struct ib_cm_id *cm_id, int flush) { struct cm_id_private *cm_id_priv; struct cm_work *work; unsigned long flags; + int rc; cm_id_priv = container_of(cm_id, struct cm_id_private, id); retest: @@ -775,15 +789,32 @@ retest: break; } - cm_free_id(cm_id->local_id); - atomic_dec(&cm_id_priv->refcount); - wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); + rc = cm_free_id(cm_id->local_id); + cm_deref_id(cm_id_priv); + if (unlikely(rc) && flush) { + /* A handler has removed the id from the idr. 
+ Make sure it is not still running. */ + flush_workqueue(&cm.wq); + return; + } + + wait_event(cm_id_priv->wait, ({ + spin_lock_irq(&cm_id_priv->lock); + rc = !atomic_read(&cm_id_priv->refcount); + spin_unlock_irq(&cm_id_priv->lock); + rc; + })); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); kfree(cm_id_priv->compare_data); kfree(cm_id_priv->private_data); kfree(cm_id_priv); } + +void ib_destroy_cm_id(struct ib_cm_id *cm_id) +{ + __ib_destroy_cm_id(cm_id, 1); +} EXPORT_SYMBOL(ib_destroy_cm_id); int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask, @@ -1171,8 +1202,8 @@ static void cm_process_work(struct cm_id cm_free_work(work); } cm_deref_id(cm_id_priv); - if (ret) - ib_destroy_cm_id(&cm_id_priv->id); + if (ret) /* don't flush: we are running from cm.wq */ + __ib_destroy_cm_id(&cm_id_priv->id, 0); } static void cm_format_mra(struct cm_mra_msg *mra_msg, -- MST From mst at mellanox.co.il Sun May 7 08:57:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 7 May 2006 18:57:54 +0300 Subject: [openib-general] Re: [PATCH] fix 2 race conditions in ib_destroy_cm_id In-Reply-To: <20060507155447.GL3032@mellanox.co.il> References: <20060507143705.GI3032@mellanox.co.il> <20060507152418.GK3032@mellanox.co.il> <20060507155447.GL3032@mellanox.co.il> Message-ID: <20060507155754.GM3032@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: [PATCH] fix 2 race conditions in ib_destroy_cm_id > > Fix two issues in CM. > 1. crash if cm id is destroyed from handler because of non-0 return code, > and at the same time from user thread by direct call to ib_destroy_cm_id. > 2. use after free if ib_destroy_cm_id tests the refcount after cm_deref_id has > decremented the reference count but before it has called wake_up. > > I'm sure the first one has caused crashes for me, and I suspect > the second one caused a system hang. > > Signed-off-by: Michael S.
Tsirkin I have sent this by mistake - the patch is still under test. Please disregard. Thanks, -- MST From sweitzen at cisco.com Sun May 7 09:25:57 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Sun, 7 May 2006 09:25:57 -0700 Subject: [openib-general] OFED-1.0-rc4 is available Message-ID: 2) SRP is missing on RHEL4 (http://openib.org/bugzilla/show_bug.cgi?id=51), which is a big regression. It was removed on purpose since when running simple tests system is hanged due to the spin-lock issue in kernel 2.6.9 (see discussion of Ishai and Doug about this). We can put it back but I don't see a point in putting something that easily halts the machine or causing kernel oops. Ishai is working to debug it and once it is solved we will put it back. What is your opinion on this? [Scott Weitzenkamp (sweitzen)] In OFED 1.0 rc2, SRP was working on RHEL4 2.6.9, at least on x86_64. All other SRP implementations I know of work on RHEL4 2.6.9. I think OFED SRP on RHEL4 2.6.9 is a must have. SRP in OFED consists of too many patches right now, we need to get the functionality checked into SVN not just patched on top of SVN (see bug 62 that I just filed, http://openib.org/bugzilla/show_bug.cgi?id=62). Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpandit at silverstorm.com Sun May 7 10:07:29 2006 From: rpandit at silverstorm.com (Ranjit Pandit) Date: Sun, 7 May 2006 10:07:29 -0700 Subject: [openib-general] Re: Need OpenIB bugzilla component for RDS In-Reply-To: <1146893261.1045.18.camel@localhost.localdomain> References: <1146893261.1045.18.camel@localhost.localdomain> Message-ID: <96f8e60e0605071007u41ba9789g6f52b08e5be0fcd8@mail.gmail.com> Bryan, Please mark me as the default owner of RDS bugs. Ranjit On 5/5/06, Bryan O'Sullivan wrote: > On Fri, 2006-05-05 at 15:23 -0700, Scott Weitzenkamp (sweitzen) wrote: > > Ranjit, are you the default owner for RDS bugs? 
> > I'm going to hold off creating the component until I know who the owner > is :-) > From xma at us.ibm.com Sun May 7 17:26:46 2006 From: xma at us.ibm.com (Shirley Ma) Date: Sun, 7 May 2006 17:26:46 -0700 Subject: [openib-general] Re: sdp code in trunk In-Reply-To: <20060504200450.GC4682@mellanox.co.il> Message-ID: Hello Michael, In your previous email http://openib.org/pipermail/openib-general/2006-February/017121.html you mentioned you were planning to use a single CQ. > - Single CQ, perform all CQ polling from interrupt context I went through the SDP code, and split CQs are still used. I am thinking of removing the SDP tx/rx_rings after I am done with IPoIB. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From sean.hefty at intel.com Sun May 7 19:57:43 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 7 May 2006 19:57:43 -0700 Subject: [openib-general] RE: cm crash In-Reply-To: <20060507143705.GI3032@mellanox.co.il> Message-ID: >cm_process_work does: > > cm_deref_id(cm_id_priv); > if (ret) > ib_destroy_cm_id(&cm_id_priv->id); > >assume that another thread calls ib_destroy_cm_id. >Now > > wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); > while ((work = cm_dequeue_work(cm_id_priv)) != NULL) > cm_free_work(work); > kfree(cm_id_priv->compare_data); > kfree(cm_id_priv->private_data); > kfree(cm_id_priv); > >once the reference count reaches 0, this thread will wake. >We now have two threads running destroy on the same id! This is a user issue where they try to destroy the cm_id twice.
A user cannot call ib_destroy_cm_id() and return non-zero from a callback on that same ID. We cannot fix this in the CM. If the thread calling ib_destroy_cm_id() is delayed, then the callback handler will return, and cleanup will occur. The thread calling ib_destroy_cm_id() will then reference invalid memory. - Sean From sean.hefty at intel.com Sun May 7 20:14:32 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 7 May 2006 20:14:32 -0700 Subject: [openib-general] RE: cm crash In-Reply-To: <20060507152418.GK3032@mellanox.co.il> Message-ID: >Another possible issue: > >static inline void cm_deref_id(struct cm_id_private *cm_id_priv) >{ > if (atomic_dec_and_test(&cm_id_priv->refcount)) > wake_up(&cm_id_priv->wait); >} > >A thread could test the refcount after atomic_dec_and_test but before >wake_up(&cm_id_priv->wait), and remove cm_id_priv. >This would result in use after free. I believe that this same code structure is in other places in the IB code. Maybe we're using wait_event() incorrectly to make destruction synchronous? Is there some other wait call that can work here, or an atomic dec_and_test_wake_up that can protect against this? - Sean From mst at mellanox.co.il Sun May 7 22:41:23 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 08:41:23 +0300 Subject: [openib-general] Re: sdp code in trunk In-Reply-To: References: <20060504200450.GC4682@mellanox.co.il> Message-ID: <20060508054123.GA19660@mellanox.co.il> Quoting r. Shirley Ma : > Hello Michael, > > In your previous email http://openib.org/pipermail/openib-general/2006-February/017121.html > You mentioned you were planning to use single CQ. > > - Single CQ, perform all CQ polling from interrupt context > > I went through SDP code, splitting CQ is still used. I'm quite sure there's a single CQ per QP in SDP at the moment. The structure is similar to IPoIB. -- MST From mst at mellanox.co.il Sun May 7 22:42:01 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 8 May 2006 08:42:01 +0300 Subject: [openib-general] Re: cm crash In-Reply-To: References: <20060507152418.GK3032@mellanox.co.il> Message-ID: <20060508054201.GB19660@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: cm crash > > >Another possible issue: > > > >static inline void cm_deref_id(struct cm_id_private *cm_id_priv) > >{ > > if (atomic_dec_and_test(&cm_id_priv->refcount)) > > wake_up(&cm_id_priv->wait); > >} > > > >A thread could test the refcount after atomic_dec_and_test but before > >wake_up(&cm_id_priv->wait), and remove cm_id_priv. > >This would result in use after free. > > I believe that this same code structure is in other places in the IB code. Where? > Maybe we're using wait_event() incorrectly to make destruction synchronous? Is > there some other wait call that can work here, or an atomic dec_and_test_wake_up > that can protect against this? > > - Sean > -- MST From mst at mellanox.co.il Sun May 7 22:45:29 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 08:45:29 +0300 Subject: [openib-general] [PATCH] cm refcount race fix Message-ID: <20060508054529.GC19660@mellanox.co.il> Sean, please review. Are there other places in code that have the same race? -- Fix race condition in CM. use after free if ib_destroy_cm_id tests the refcount after cm_deref_id has decremented the reference count but before it has called wake_up. Signed-off-by: Michael S. 
Tsirkin Index: linux-2.6.16/drivers/infiniband/core/cm.c =================================================================== --- linux-2.6.16.orig/drivers/infiniband/core/cm.c 2006-05-07 22:01:23.000000000 +0300 +++ linux-2.6.16/drivers/infiniband/core/cm.c 2006-05-07 22:01:40.000000000 +0300 @@ -159,8 +159,12 @@ static void cm_work_handler(void *data); static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { + unsigned long flags; + + spin_lock_irqsave(&cm_id_priv->lock, flags); if (atomic_dec_and_test(&cm_id_priv->refcount)) wake_up(&cm_id_priv->wait); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); } static int cm_alloc_msg(struct cm_id_private *cm_id_priv, @@ -778,6 +782,10 @@ retest: cm_free_id(cm_id->local_id); atomic_dec(&cm_id_priv->refcount); wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); + + /* Make sure cm_deref_id is not in progress */ + spin_lock_irq(&cm_id_priv->lock); + spin_unlock_irq(&cm_id_priv->lock); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); kfree(cm_id_priv->compare_data); -- MST From oferg at mellanox.co.il Mon May 8 00:05:49 2006 From: oferg at mellanox.co.il (Ofer Gigi) Date: Mon, 8 May 2006 10:05:49 +0300 Subject: [openib-general] [PATCH] osm_vendor_mlx.c : missing pointer check Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30208D944@mtlexch01.mtl.com> Hi Hal, Bug fix: When the driver is down and you try to raise opensm - it exits in segmentation fault. The reason for this is that __osm_vendor_internal_unbind is called with h_bind==0. Thanks Ofer G. 
Signed-off-by: Ofer Gigi Index: osm_vendor_mlx.c =================================================================== --- osm_vendor_mlx.c (revision 6640) +++ osm_vendor_mlx.c (working copy) @@ -357,7 +357,10 @@ osm_vendor_unbind(IN osm_bind_handle_t cl_qlist_remove_item(p_bh_list,p_item); if (p_obj) cl_free(p_obj); + if (h_bind != 0) + { __osm_vendor_internal_unbind(h_bind); + } OSM_LOG_EXIT(p_log); } From sweitzen at cisco.com Mon May 8 00:29:07 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 8 May 2006 00:29:07 -0700 Subject: [openfabrics-ewg] RE: [openib-general] OFED-1.0-rc4 is available Message-ID: 5) Open MPI is not working well for Pallas benchmarks, more details to follow [Scott Weitzenkamp (sweitzen)] First bug filed, Open MPI does not work when more than one IB port is active on a host (http://openib.org/bugzilla/show_bug.cgi?id=64). Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: From vlad at mellanox.co.il Mon May 8 00:41:03 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 08 May 2006 10:41:03 +0300 Subject: [openib-general] Re: [mvapich-discuss] RE: [openfabrics-ewg] Current OFED kernel snapshot In-Reply-To: <20060505222126.GC10255@cse.ohio-state.edu> References: <20060505222126.GC10255@cse.ohio-state.edu> Message-ID: <445EF60F.2070907@mellanox.co.il> Sayantan Sur wrote: > Hi, > > * On May,10 Roland Dreier wrote : > >> Abhinav> By referring to the PPC64 architecture, i was mentioning >> Abhinav> about the IBM HCAs(4x/12x) running on GX/GX+ Bus. To the >> Abhinav> best of my knowledge, these HCAs do not support the >> Abhinav> features mentioned above. >> >> Hmm, making this a compile-time thing seems like a problem then. Some >> ppc64 systems have IBM eHCAs and some have Mellanox and/or PathScale >> HCAs. Shouldn't the same MPI package work on all of these systems? 
>> > > We've just incorporated an autodetection utility as a part of our build > script (make.mvapich.gen2). It essentially reads the type of HCA from > the standard location /sys/class/infiniband//hca_type. Using this > script, we have decoupled the choice of architecture and the InfiniBand > card in the system. We believe using this script, MVAPICH will be able > to work on PPC64 systems connected with Mellanox/Pathscale cards. > > On our installation of the IBM PPC64 systems, there is no `hca_type' > file present. If that file is there, then the above mentioned > "decoupling" thing can be done easily by trivially modifying our script. > Does anyone have a PPC64 installation which has that file? > > With OpenIB/Gen2 standardized method of exporting HCA types, we will > soon have runtime detection and optimization. > > Thanks, > Sayantan. > > Sayantan, 1. What if HCA driver is not loaded, it can happen when you are going to install OFED for the first time and HCA driver is not present? I think, you should rely on "lspci" instead of /sys/... . 2. There is an option to build RPMs using OFED build.sh script on machine without HCAs. So, in this case HCA type should be passed as a parameter. Regards, Vladimir From mst at mellanox.co.il Mon May 8 01:45:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 11:45:06 +0300 Subject: [openib-general] Re: cm crash In-Reply-To: References: <20060507143705.GI3032@mellanox.co.il> Message-ID: <20060508084506.GC20207@mellanox.co.il> Quoting r. Sean Hefty : > This is a user issue where they try to destroy the cm_id twice. A user cannot > call ib_destroy_cm_id() and return non-zero from a callback on that same ID. > > We cannot fix this in the CM. If the thread calling ib_destroy_cm_id() is > delayed, then the callback handler will return, and cleanup will occur. The > thread calling ib_destroy_cm_id() will then reference invalid memory. Hmm. Let's look at CMA - I'm a little more familiar with it. 
There is a case of RDMA_CM_EVENT_CONNECT_REQUEST, where a new cma_id was created for me by cm. In this case I don't track the id so won't be destroying the id. So it seems I must do: return event->event == RDMA_CM_EVENT_CONNECT_REQUEST ? -EINVAL : 0; Idea: could there be a special code (positive code?) that would mean "I am destroying this id, don't proceed"? This should destroy the id only if CMA created it as a result of RDMA_CM_EVENT_CONNECT_REQUEST. Does this make sense? -- MST From mst at mellanox.co.il Mon May 8 01:53:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 11:53:01 +0300 Subject: [openib-general] CMA: compliancy issue? Message-ID: <20060508085301.GD20207@mellanox.co.il> Hello, Sean! SDP spec states: CA4-24.2.3: The connecting peer shall terminate the connection attempt if ExtMaxAdverts of the HAH is set to zero. This means that SDP must examine the HAH before RTU is sent. But CMA currently sends RTU from cma_rep_recv, before notifying the user. I think the same problem might affect other ULPs as well: CMA should notify the user *before* responding with a CM message, and should supply a way for the user to reject the connection, with or without destroying the CMA ID. -- MST From bugzilla-daemon at openib.org Mon May 8 04:07:36 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 8 May 2006 04:07:36 -0700 (PDT) Subject: [openib-general] [Bug 62] OFED 1.0 rc4: too many SRP patches, get this code checked in Message-ID: <20060508110736.D663B22834D@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=62 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|tziporet at mellanox.co.il |bugzilla at openib.org ------- Additional Comments From tziporet at mellanox.co.il 2006-05-08 04:07 ------- This is correct.
The reason was that Roland was on vacation and we could not get the code checked-in for RC4 and we wanted the new code to be tested as part of RC4. This week Roland and Vu are working to get the code to the trunk thus RC5 will include less SRP patches. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Mon May 8 04:09:17 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 8 May 2006 04:09:17 -0700 (PDT) Subject: [openib-general] [Bug 62] OFED 1.0 rc4: too many SRP patches, get this code checked in Message-ID: <20060508110917.8F3A3228422@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=62 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |rolandd at cisco.com ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From ogerlitz at voltaire.com Mon May 8 04:52:04 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 08 May 2006 14:52:04 +0300 Subject: [openib-general] CMA: compliancy issue? In-Reply-To: <20060508085301.GD20207@mellanox.co.il> References: <20060508085301.GD20207@mellanox.co.il> Message-ID: <445F30E4.9020109@voltaire.com> Michael S. Tsirkin wrote: > CMA currently sends RTU from cma_rep_recv, before notifying > the user. > I think the same problem might affect other ULPs as well: CMA > should notify the user *before* responsing with CM message, and > should supply a way for the user to reject the connection, > with or without destroying the CMA ID. Please read a little deeper into the documentation of rdma_create_qp and rdma_accept in rdma_cm.h and also see the code. 
The design of the CMA says that if the ULP wants the CMA to do the QP state transitions, it should call rdma_create_qp; the CMA will then not deliver a REP event but will instead modify the QP state to RTR and RTS and send an RTU. And vice versa: if the ULP wants to get a callback on REP, it should ***not*** call rdma_create_qp and should instead do the state changes itself. Or. From tziporet at mellanox.co.il Mon May 8 04:56:05 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 08 May 2006 14:56:05 +0300 Subject: [openib-general] OFED-1.0-rc4 is available In-Reply-To: References: Message-ID: <445F31D5.9090602@mellanox.co.il> Scott Weitzenkamp (sweitzen) wrote: > > > > > SRP in OFED consists of too many patches right now, we need to get > the functionality checked into SVN not just patched on top of SVN > (see bug 62 that I just > filed, http://openib.org/bugzilla/show_bug.cgi?id=62). > > > > Scott > > > I am going to take this up with Roland, Vu and Ishai, who are currently working on SRP. I agree that in RC5 we must have only small bug fixes and not features as patches. Tziporet From halr at voltaire.com Mon May 8 05:47:03 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 08 May 2006 08:47:03 -0400 Subject: [openib-general] Re: [PATCH] osm_vendor_mlx.c : missing pointer check In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30208D944@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30208D944@mtlexch01.mtl.com> Message-ID: <1147092422.4719.132117.camel@hal.voltaire.com> On Mon, 2006-05-08 at 03:05, Ofer Gigi wrote: > Hi Hal, > > Bug fix: > When the driver is down and you try to raise opensm - it exits in > segmentation > fault. The reason for this is that __osm_vendor_internal_unbind is > called > with h_bind==0. > > Thanks Thanks. Applied to both trunk and 1.0 branch. -- Hal > > Ofer G. > > Signed-off-by: Ofer Gigi From mst at mellanox.co.il Mon May 8 06:11:56 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 8 May 2006 16:11:56 +0300 Subject: [openib-general] CMA: compliancy issue? In-Reply-To: <445F30E4.9020109@voltaire.com> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> Message-ID: <20060508131156.GA21036@mellanox.co.il> Quoting r. Or Gerlitz : > The design of the CMA say that if the ULP wants the CMA to do the QP > states transitions it should call rdma_create_qp and later the CMA will > not deliver up event of REP but instead modify the QP state to RTR, RTS > and send an RTU. And vise versa, if the ULP want to get a callback on > REP it should ***not*** call rdma_create_qp and rather do the state > changes by itself. I think you are mistaken. Here's code from cma.c case IB_CM_REP_RECEIVED: if (id_priv->id.qp) { status = cma_rep_recv(id_priv); event = status ? RDMA_CM_EVENT_CONNECT_ERROR : RDMA_CM_EVENT_ESTABLISHED; } else event = RDMA_CM_EVENT_CONNECT_RESPONSE; So CMA passes the REP to the ULP inside the RDMA_CM_EVENT_ESTABLISHED event. This works fine for me: on seeing RDMA_CM_EVENT_ESTABLISHED, SDP actually looks at private data and gets all the parameters it needs from the REP. What I am saying is that CMA should call the ULP on the REP *before* sending RTU, and the API should give the ULP an option to reject the connection based on private data. And no, I don't want to handle state transitions myself; I think this capability is mainly useful for userspace. -- MST From ogerlitz at voltaire.com Mon May 8 06:20:36 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 08 May 2006 16:20:36 +0300 Subject: [openib-general] CMA: compliancy issue? In-Reply-To: <20060508131156.GA21036@mellanox.co.il> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> <20060508131156.GA21036@mellanox.co.il> Message-ID: <445F45A4.4080308@voltaire.com> Michael S. Tsirkin wrote: > Quoting r.
Or Gerlitz : >> The design of the CMA say that if the ULP wants the CMA to do the QP >> states transitions it should call rdma_create_qp and later the CMA will >> not deliver up event of REP but instead modify the QP state to RTR, RTS >> and send an RTU. And vise versa, if the ULP want to get a callback on >> REP it should ***not*** call rdma_create_qp and rather do the state >> changes by itself. > > I think you are mistaken. Here's code from cma.c > > case IB_CM_REP_RECEIVED: > if (id_priv->id.qp) { > status = cma_rep_recv(id_priv); > event = status ? RDMA_CM_EVENT_CONNECT_ERROR : > RDMA_CM_EVENT_ESTABLISHED; > } else > event = RDMA_CM_EVENT_CONNECT_RESPONSE; > > So CMA passes the REP to the ULP as inside RDMA_CM_EVENT_ESTABLISHED event. no no, see the code or my parsing of it for you below. If the ULP has created a QP via the CMA then the CMA will send RTU and deliver up ESTABLISHED event, else the CMA will deliver up CONNECT_RESPONSE event and only later when the ULP calls rdma_accept the CMA will send the RTU. Or. From mst at mellanox.co.il Mon May 8 06:28:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 16:28:04 +0300 Subject: [openib-general] CMA: port 2 loopback problems Message-ID: <20060508132803.GB21036@mellanox.co.il> Sean, I am seeing the following problem: I have a dual-port HCA with IPoIB interfaces ib0 on port 1 and ib1 on port 2. port 1 is down and port 2 is up, and I try creating a connection to the loopback address 127.0.0.1. The problem I am seeing is that I am getting RDMA_CM_EVENT_ROUTE_ERROR. Apparently CMA attempts address resolution through port 1, which fails. 1. A quick way to solve this would be to change addr_resolve_local to only look at devices which are UP. This is slightly ugly however, since the device could only be momentarily DOWN. 2. A nicer, but harder, way would be to find *all* appropriate devices, and start address resolution through all of them at the same time. 
The first one to get resolved is returned to the user; the others should be cancelled. 3. Finally, it is possible to cause CMA to avoid performing SA queries for access to a local device. I think the second approach is clearly the best thing to do, but it is also quite nontrivial. Sean, would you be interested in a patch implementing the first approach as a stopgap measure? -- MST From mst at mellanox.co.il Mon May 8 06:31:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 16:31:54 +0300 Subject: [openib-general] CMA: compliancy issue? In-Reply-To: <445F45A4.4080308@voltaire.com> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> Message-ID: <20060508133154.GC21036@mellanox.co.il> Quoting r. Or Gerlitz : > >So CMA passes the REP to the ULP as inside RDMA_CM_EVENT_ESTABLISHED event. > > no no, see the code or my parsing of it for you below. > > If the ULP has created a QP via the CMA then the CMA will send RTU and > deliver up ESTABLISHED event, else the CMA will deliver up > CONNECT_RESPONSE event and only later when the ULP calls rdma_accept the > CMA will send the RTU. Correct, that's what I am saying. I want to create a QP via CMA (no sense in duplicating functionality), but I need CMA to send RTU *after* delivering the ESTABLISHED event, not before as it does now, and I need a way to tell CMA to reject the connection after my handler looks at private data. -- MST From halr at voltaire.com Mon May 8 06:27:33 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 08 May 2006 09:27:33 -0400 Subject: [openib-general] Dump and load routes with opensm?
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAFE@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BAFE@mtlexch01.mtl.com> Message-ID: <1147094852.4719.132940.camel@hal.voltaire.com> Hi Eitan, On Sun, 2006-05-07 at 01:31, Eitan Zahavi wrote: > Hi Greg, > > A plan for OpenSM to support loading unicast routes already exists. What's the timeframe for implementing this ? Has the implementation already started ? Is it what you write below and had posted to the list last summer ? > To do it we need to develop a scheme where the routes file also holds > the topology using GUIDs > such that the discovered topology is compared to the saved one. I have > outlined the algorithm in > the attached writeup on OpenSM routing algorithms section 6: > "Incremental Algorithms". Wouldn't it be better to separate out the persistency and incremental issues ? Also, I sent numerous questions and comments on this writeup some time ago. > The right place to implement this will be to have the osm_db* enhanced > to support this new DB domain. > Then the osm_ucast_mgr.c will need to initialize the DB and use it while > routing. Also, changes versus this either need disabling or handling. -- Hal > Regarding dump of existing tables: > > If you run opensm -V or -D 0x43 you should get the file /tmp/osm.fdbs > with that dump. > You can also use a more SM independent method for obtaining the tables : > if you run ibdiagnet you should get the file /tmp/ibdiagnet.fdbs with > similar format. > > Eitan > > Greg Johnson wrote: > > On Thu, May 04, 2006 at 12:39:38PM -0400, Hal Rosenstock wrote: > > > >>Hi Greg, > >> > >>On Thu, 2006-05-04 at 12:34, Greg Johnson wrote: > >> > >>>Is there currently a way to dump and load routes with opensm? If > not, > >>>how would I go about writing one? > >> > >>Is it really routes or stable LIDs you want ? > > > > > > I actually want routes. I have queried them with ibtraceroute and > > ibroute, but we need routes for the whole fabric. 
> >
> > BTW, if you call ibtraceroute thousands of times it stops working.
> > Maybe a problem in the MAD driver?
> >
> >
> >
> >>LIDs are stored in /var/cache/osm/guid2lid and restored from there when
> >>OpenSM is started assuming the reassign LIDs option (-r or
> >>--reassign_lids) is not used when invoking OpenSM.
> >
> >
> > Thanks, that's good to know.
> >
> > Greg
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >
>
> ______________________________________________________________________
>
> OpenSM Unicast Routing Enhancements for Scalability
> =====================================================
>
> Authors: Eitan Zahavi, Yael Kalka
> Date: Aug 2005.
>
> Table of contents:
> 1. Overview
> 2. Notation
> 3. Current Algorithms
> 4. Proposed Routing Algorithms
> 5. Min Hop Tables Implementation
> 6. Incremental Routing
> 7. Routing Persistency
>
> 1. Overview:
> ------------
>
> OpenSM currently uses a two-stage routing algorithm for unicast
> forwarding table calculation. As shown later, these algorithms are
> O(N^3). The observed run time of the OpenSM routing stage was ~2.5 min
> on an 1100-node cluster. The purpose of this memo is to present to the
> community the proposed work for enhancing the OpenSM routing engine.
>
> 2. Notation:
> ------------
> The following notations are used throughout this document:
> S = Number of switch devices in the system
> P = Number of ports each switch node has
> H = Number of HCA ports connected to the fabric
> L = Number of HCAs connected to each leaf switch device.
>     Normal values are 1/2P to 3/4P
> D = Fat Tree depth
>
> 3. Current Algorithms:
> ----------------------
> OpenSM provides two routing algorithms: Minimal Hop and Up/Down.
> Neither of them scales with cluster size, and both can consume large
> run time (minutes) and memory (GB). This section provides meta code for
> these algorithms and order calculation.
>
> 3.1 Min Hop algorithm analysis:
> The Min Hop algorithm is divided into two stages: computation of
> min-hop tables on every switch and LFT output port assignment.
>
> Step 1: Computation of Min-Hop Tables on each switch
>
> The memory consumed is S*(S+H)*(P+2) bytes. On a 10K-node cluster with
> 2500 switch devices this ends up as 812 MB (using LMC=0).
>
> Meta algorithm:
>   For each HCA mark its remote neighbor switch port with hop 1.
>   For each switch mark itself port 0 as hop 0
>   While changes
>     For each switch
>       For each port
>         For each LID
>           Propagate remote port hop as hop +1 if smaller or undefined
>
> The order of this step: O(S*P*(S+H)*(D+1))
>
> Step 2: Assigning output port:
>   For each switch
>     For each LID
>       For each Port
>         Is it the one with min count
>
> The order of this step: O(S*(S+H)*P)
>
> 3.2 Up/Down algorithm analysis
> The Up/Down algorithm depends on the ability to rank the fabric nodes
> from root to leaf of the tree. To get that ranking it runs a
> heuristic that is based on the Min Hop tables. So the memory and
> complexity are identical to the Min-Hop first step to start with.
>
> Once ranking is performed, the algorithm does a BFS from every HCA and
> fills in the Min Hop tables again. The Up/Down traversal rule is enforced
> during the BFS such that only valid turns are allowed.
>
> Meta algorithm:
>   For each HCA
>     Get connected Switch
>     For each Switch in NextSwitches
>       For each Port
>         Check if direction is OK. Check if not visited
>
> The order is O(H*S*P)
>
> To finalize output port assignment the second step of the Min Hop
> algorithm is invoked.
>
>
> 4. Proposed Algorithm:
> ----------------------
>
> Inspecting the routing problem we have noticed the following
> attributes:
> a.
>    Using Min-Hop tables for keeping intermediate routing information
>    has a disadvantage in terms of memory consumption. However, any
>    incremental routing algorithm (for handling fabric changes after
>    first setup), or routing persistence solution could use this
>    information and gain speed.
> b. Since we need to fill in LFT tables that are of the order S*(S+H)
>    the algorithm is lower bounded by O(H^2).
> c. A persistence based solution which uses previously routed fabric
>    data and is able to handle simple incremental changes will provide
>    a much faster runtime as topology match will require O(S*P)
>    (traversing all links once)
> d. Since the minimum hops information is identical for a switch and
>    all the HCAs connected to it - there is no point in building "min
>    hop" tables for HCAs. During the "output port" assignment stage,
>    the HCAs connected to each switch are considered and routed.
>
> The result of "a" is that several algorithms that are superior in
> memory footprint and skip any "hop table" stage are not considered for
> implementation.
>
> To support "d" we needed to provide an index to each switch such that
> the "min hop" tables are dense (previously they were indexed by LID).
> The new index is stored on the switch object and thus allows lookup of
> a switch's "min hop" table by its index. An array of switch pointers by
> index supports the reverse lookup.
>
> The suggested algorithm is broken into the following 3 stages:
> * Root nodes identification heuristics
> * Min Hop tables computation
> * Output port assignment
>
> 4.1 Root nodes identification heuristics:
> This step is only required under the AND of the following two
> conditions:
> * Up/Down routing is required
> * The user does not provide a file with guids of the tree "root nodes"
>
> This heuristic for recognizing the tree roots is based on histograms
> of the HCAs' distance from every switch,
> i.e. how many HCAs are 1 hop, 2 hops from the switch.
> In order to fill in these histograms on all switches we need to BFS
> from every leaf switch and propagate the number of HCAs connected to it:
>
> Meta algorithm:
>   For each switch
>     For each Port
>       If connected to HCAs count them
>     If any HCA
>       Init BFS to start with current switch
>       set hop count to 0
>       While there are switches in BFS list
>         increment hop count
>         For each switch in BFS List
>           Add the number of HCAs to the histogram at the current hop count
>           For each port
>             If remote port switch not visited
>               Add the switch to the BFS Next Step List
>         Once finished all this step list use next steps
>
> The order of this step is: O(S*P + H/L*S*P) = O(H*S)
>
> 4.2 Min Hop tables computation:
> This step is mandatory and has a slightly different flavor for the
> case of Up/Down routing.
>
> The algorithm starts from every leaf switch and traverses BFS wise
> through the fabric.
>
> Meta algorithm:
>   foreach switch in the fabric
>     clear the "Rank" vector for all switches.
>     start BFS with the given switch.
>     set rank to 0
>     while any switch in BFS list
>     | foreach switch in BFS switch list
>     | |foreach port (valid, active, not unhealthy)
>     | |  if remote side is a switch:
>     | |    if rank of remote side is 0 or = rank + 1
>     | |      set the remote port entry MinHopPort for this switch
>     | |    if rank of remote side is 0, i.e. never visited
>     | |      set the remote switch rank to rank + 1
>     | |      add the remote switch to next BFS switches
>     | |------
>     | switch between the current and next switches list
>     | increment rank
>     |------
>
> The order of this algorithm: O(P*S^2)
>
> An algorithm that merges the Up/Down step criteria is not yet written
> for this stage. But the idea is to make each step keep track of any
> previous step down, such that a step up will be prohibited in this case.
>
> 4.3 Output port assignment:
> This step provides the actual LFT value assignment for each switch.
> To do that we access the built "min hop" tables and track port usage.
>
> Meta algorithm:
>   foreach switch in the fabric
>     clear the port subscription vector (track number of paths subscribed)
>     foreach target switch in the "min hop" table
>       get the list of min-hop ports
>       foreach end-node attached (HCA connected to it and itself)
>         if lmc > 0 init tracking of remote system and node for selecting
>           disjoint paths for same end-node different LID LSBs
>         get min-subscribed (and disjoint) port marked min-hop target switch
>         track port usage in port-subscription (opt. if target LID is not a switch)
>
> Order of this step:
> Currently the selection of the output port by min-subscription is
> trivial and requires O(P) so the overall order is
> O(S*S*(1+L)*P) <= O(16*P*S^2)
>
> One could obtain the list of marked min-hop ports and then use a
> modified cyclic list for avoiding the search for min subscription in
> the case of LMC > 0. In that case the order could be reduced to:
> O(S*S*(1+L)) ~= O(S*H)
>
> 5. Min Hop Table Implementation:
> --------------------------------
> The proposed algorithm does not require storing the number of hops
> arriving at the switch port - but only the fact that a port is on the
> min hop path. This allows for another memory usage reduction if the min
> hop table holds boolean values.
>
> The issue then is an efficient iterator over the boolean (bit)
> array. The tradeoff is thus the usual one of memory versus runtime.
>
> (Does anybody know of a fast boolean array lookup implementation?)
>
> 6. Incremental Routing:
> -----------------------
> Once the fabric is routed we can define an algorithm for performing
> incremental routing changes. An obvious case is when a link is
> declared unhealthy or one of the ports is dropped, assuming the
> recognition of the change is done by some other algorithm. The
> following cases apply:
> * If the link connects an HCA and a switch the HCA is unreachable. No
>   routing change required.
> * If the link is between switches:
>   * If there is at least one other link between these switches:
>     o Spread all routes going through the failing port to the other
>       ports connecting to the same switch.
>   * If there is no other link between these switches:
>     o Go back to all switches that feed into each one of the switches
>       (feed in means they route some target lids through the switch)
>       but only those that route lids that go through the failing port. Check
>       to see if there is another port that goes to a different switch
>       to route that lid to. If there is no other way do nothing.
>
> How do we support topology changes like moving an HCA from one Switch
> to another?
>
> 7. Routing Persistency:
> -----------------------
> To make the subnet initialization faster, one could store the existing
> routing solution and use it without any calculation.
>
> The issue is of course what conditions make the stored routing
> obsolete. To maximize the usefulness of the stored information we
> propose to store the Min Hop tables rather than the final port
> assignment. It is assumed that after restart there might be a need for
> modifications to LMC and routing which will invalidate the LFT
> anyway. To enable "cache invalidation criteria" the persistent
> database should include information that could be used to easily check
> whether the fabric was altered in a way that invalidates the MinHop tables.
>
> The stored information should hold for each switch in the fabric (by guid)
> the list of ports and the guids and port numbers on the remote side.
> To validate there are no significant changes, the discovered set
> of switches is checked to match the stored information. Table 1
> describes the possible changes and their effect on the validity of
> the MinHop tables.
>
> Table 1 - Connectivity Changes effect on Routing Info Validity
>
> Change           | Effect on MinHop Tables    | Effect on LFT and MFT   |
> -------------------------------------------------------------------------
> New Switch found | Invalidates (might connect | Invalidates             |
>                  | more HCAs and carries more |                         |
>                  | routing resources)         |                         |
> -------------------------------------------------------------------------
> Missing Switch   | Invalidates (MinHops might | Invalidates             |
>                  | be broken a few steps away)|                         |
> -------------------------------------------------------------------------
> New cable found  | Does not invalidate        | Does not invalidate     |
> -------------------------------------------------------------------------
> Missing Cable    | Invalidates only if there  | Invalidates all LIDs    |
> (SW to SW)       | is no other cable          | going through that port |
>                  | connecting the switches    |                         |
> -------------------------------------------------------------------------
> Missing Cable    | Does not invalidate        | Does not invalidate     |
> (SW to HCA)      |                            |                         |
> -------------------------------------------------------------------------
> New HCA          | Does not invalidate        | Does not invalidate     |
> -------------------------------------------------------------------------
> Missing HCA      | Does not invalidate        | Does not invalidate     |
> -------------------------------------------------------------------------
> LID Changes      | Does not invalidate        | Invalidates the modified|
>                  |                            | LIDs                    |
> -------------------------------------------------------------------------
>
> Special marking for "root nodes" should cache the results of the first
> step for Up/Down routing. These nodes should be invalidated on any
> missing or additional switch conditions.
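
A minimal sketch of the BFS in section 4.2 of the writeup quoted above, for illustration only. It is not OpenSM code: `struct sw`, `MAX_SW`, `MAX_PORTS`, `init_switch`, `link_switches` and `min_hop_bfs` are hypothetical names introduced here. The sketch also assumes that ports are numbered from 0, that unvisited switches carry rank -1 (the meta code uses 0), and that only switch-to-switch links are modeled.

```c
#include <assert.h>
#include <string.h>

#define MAX_SW    64   /* illustrative limits, not OpenSM constants */
#define MAX_PORTS 36

/* Hypothetical, simplified switch record: one entry per port. */
struct sw {
    int nports;
    int peer_sw[MAX_PORTS];    /* index of the remote switch, or -1 */
    int peer_port[MAX_PORTS];  /* port number on the remote switch */
};

/* Reset a switch so that none of its ports is connected. */
void init_switch(struct sw *s, int nports)
{
    assert(nports >= 0 && nports <= MAX_PORTS);
    s->nports = nports;
    for (int p = 0; p < MAX_PORTS; p++) {
        s->peer_sw[p] = -1;
        s->peer_port[p] = -1;
    }
}

/* Record a cable between port pa of switch a and port pb of switch b. */
void link_switches(struct sw *f, int a, int pa, int b, int pb)
{
    f[a].peer_sw[pa] = b;
    f[a].peer_port[pa] = pb;
    f[b].peer_sw[pb] = a;
    f[b].peer_port[pb] = pa;
}

/*
 * BFS from switch `src`. On return:
 *   rank[v]          = hop count from src to v, or -1 if unreachable
 *   on_min_hop[v][q] = 1 if port q of switch v lies on a min-hop path
 *                      towards src, 0 otherwise
 */
void min_hop_bfs(const struct sw *fabric, int nsw, int src,
                 int rank[MAX_SW],
                 unsigned char on_min_hop[MAX_SW][MAX_PORTS])
{
    int queue[MAX_SW];
    int head = 0, tail = 0;

    assert(nsw > 0 && nsw <= MAX_SW && src >= 0 && src < nsw);

    for (int i = 0; i < nsw; i++)
        rank[i] = -1;
    memset(on_min_hop, 0, (size_t)nsw * MAX_PORTS);

    rank[src] = 0;
    queue[tail++] = src;

    /* Each switch is enqueued at most once, so the queue cannot overflow. */
    while (head < tail) {
        int u = queue[head++];

        for (int p = 0; p < fabric[u].nports; p++) {
            int v = fabric[u].peer_sw[p];

            if (v < 0)
                continue;  /* port not connected to a switch */

            /* Remote side is unvisited, or already known to be one hop
             * further out: its port facing u is on a min-hop path. */
            if (rank[v] == -1 || rank[v] == rank[u] + 1) {
                on_min_hop[v][fabric[u].peer_port[p]] = 1;
                if (rank[v] == -1) {
                    rank[v] = rank[u] + 1;
                    queue[tail++] = v;
                }
            }
        }
    }
}
```

Each call visits every port of every reachable switch once, i.e. O(S*P) per source switch, so running it from every switch gives the O(P*S^2) order stated in the writeup. Keeping only a per-port flag instead of a hop count corresponds to the memory reduction discussed in section 5.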
> > From dotanb at mellanox.co.il Mon May 8 06:37:24 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Mon, 8 May 2006 16:37:24 +0300 Subject: [openib-general] Re: the QP attribute have a double defenition of the QP primary port In-Reply-To: References: <200604301110.42718.dotanb@mellanox.co.il> Message-ID: <200605081637.24625.dotanb@mellanox.co.il> On Thursday 04 May 2006 19:57, Roland Dreier wrote: > > Dotan> I think that there are 2 problems with this implementation: > Dotan> 1) the user can use two different values for those port > Dotan> numbers (in mthca driver, the port number that was defined > Dotan> in the address vector will be used) 2) the user can define > Dotan> / change the QP port number in the transition INIT->RTR > Dotan> (which means a IB spec violation) > > Probably the low-level drivers should ignore the port number in the > primary address vector for INIT->RTR. That would fix both problems, right? > > - R. Yes, it will fix both problems. But I think that this API is confusing (for a new IB user). Dotan From ogerlitz at voltaire.com Mon May 8 06:41:56 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 08 May 2006 16:41:56 +0300 Subject: [openib-general] CMA: compliancy issue? In-Reply-To: <20060508133154.GC21036@mellanox.co.il> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> <20060508133154.GC21036@mellanox.co.il> Message-ID: <445F4AA4.2020802@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> If the ULP has created a QP via the CMA then the CMA will send RTU and >> deliver up ESTABLISHED event, else the CMA will deliver up >> CONNECT_RESPONSE event and only later when the ULP calls rdma_accept the >> CMA will send the RTU. > Correct, that's what I am saying.
> I want to create a QP via CMA (no sense in duplicating functionality), but I > need CMA to sent RTU *after* delivering ESTABLISHED event, not before as it does > now, and I need a way to tell CMA to reject the connection after my handler > looks at private data. Please note that if the ULP manages the QP states, it can call rdma_reject after getting the REP instead of rdma_accept, and this will do what you need. The idea behind the original CM***A*** design was to have MAX abstraction at the price of MIN (ZERO) impact on the ULP. We were thinking here that after getting the REP it's fine to send RTU and deliver up ESTABLISHED. You are suggesting a design change in the CMA which would also affect the current CMA ULP consumers: iSER, RDS, NFSoRDMA and Lustre. Or. From mst at mellanox.co.il Mon May 8 06:49:00 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 16:49:00 +0300 Subject: [openib-general] CMA: compliancy issue? In-Reply-To: <445F4AA4.2020802@voltaire.com> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> <20060508133154.GC21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> Message-ID: <20060508134900.GD21036@mellanox.co.il> Quoting r. Or Gerlitz : > You are suggesting a design change in the CMA which would effect also > the current CMA ULP consumers: iSER, RDS, NFSoRDMA and Lustre. I don't see how it will affect these ULPs: I don't see why ULPs that don't check private data would care whether the RTU is sent before or after the handler. In any case, CMA is still in early stages of development so it's natural to expect API changes. -- MST From eitan at mellanox.co.il Mon May 8 07:00:22 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 8 May 2006 17:00:22 +0300 Subject: [openib-general] Dump and load routes with opensm?
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB24@mtlexch01.mtl.com> > > > > A plan for OpenSM to support loading unicast routes already exists. > > What's the timeframe for implementing this ? Has the implementation > already started ? Is it what you write below and had posted to the list > last summer ? [EZ] Yes: it is the same proposal. No: the work has not started. > > > To do it we need to develop a scheme where the routes file also holds > > the topology using GUIDs > > such that the discovered topology is compared to the saved one. I have > > outlined the algorithm in > > the attached writeup on OpenSM routing algorithms section 6: > > "Incremental Algorithms". > > Wouldn't it be better to separate out the persistency and incremental > issues ? [EZ] I think that once you load the routes in the file you probably need to compare the topology used for generating them with the current one. The complexity level of "incremental" support - i.e. the ability to gracefully handle some topology change - might be built with time. So first implementation might be limited to compare with reference topology and reroute on any change. But it is a bit crude in my mind. > > Also, I sent numerous questions and comments on this writeup some time > ago. [EZ] I thought I did answer them. > > > The right place to implement this will be to have the osm_db* enhanced > > to support this new DB domain. > > Then the osm_ucast_mgr.c will need to initialize the DB and use it while > > routing. > > Also, changes versus this either need disabling or handling. [EZ] Not following you here. If you mean we need to be able to "shut off" this new capability - sure. > > -- Hal > > > Regarding dump of existing tables: > > > > If you run opensm -V or -D 0x43 you should get the file /tmp/osm.fdbs > > with that dump. > > You can also use a more SM independent method for obtaining the tables : > > if you run ibdiagnet you should get the file /tmp/ibdiagnet.fdbs with > > similar format. 
> >
> > Eitan
> >
> > Greg Johnson wrote:
> > > On Thu, May 04, 2006 at 12:39:38PM -0400, Hal Rosenstock wrote:
> > >
> > >>Hi Greg,
> > >>
> > >>On Thu, 2006-05-04 at 12:34, Greg Johnson wrote:
> > >>
> > >>>Is there currently a way to dump and load routes with opensm? If not,
> > >>>how would I go about writing one?
> > >>
> > >>Is it really routes or stable LIDs you want ?
> > >
> > >
> > > I actually want routes. I have queried them with ibtraceroute and
> > > ibroute, but we need routes for the whole fabric.
> > >
> > > BTW, if you call ibtraceroute thousands of times it stops working.
> > > Maybe a problem in the MAD driver?
> > >
> > >
> > >
> > >>LIDs are stored in /var/cache/osm/guid2lid and restored from there when
> > >>OpenSM is started assuming the reassign LIDs option (-r or
> > >>--reassign_lids) is not used when invoking OpenSM.
> > >
> > >
> > > Thanks, that's good to know.
> > >
> > > Greg
> > > _______________________________________________
> > > openib-general mailing list
> > > openib-general at openib.org
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> > >
> >
> >
> > ______________________________________________________________________
> >
> > OpenSM Unicast Routing Enhancements for Scalability
> > =====================================================
> >
> > Authors: Eitan Zahavi, Yael Kalka
> > Date: Aug 2005.
> >
> > Table of contents:
> > 1. Overview
> > 2. Notation
> > 3. Current Algorithms
> > 4. Proposed Routing Algorithms
> > 5. Min Hop Tables Implementation
> > 6. Incremental Routing
> > 7. Routing Persistence
> >
> > 1. Overview:
> > ------------
> >
> > OpenSM currently uses a two stage routing algorithm for unicast
> > forwarding tables calculation. As shown later these algorithms are
> > O(N^3). The observed run time of the OpenSM routing stage was ~2.5min on an
> > 1100 nodes cluster.
> > The purpose of this memo is to present to the community
> > the proposed work for enhancing the OpenSM routing engine.
> >
> > 2. Notation:
> > ------------
> > The following notations are used throughout this document:
> > S = Number of switch devices in the system
> > P = Number of ports each switch node has
> > H = Number of HCA ports connected to the fabric
> > L = Number of HCAs connected to each leaf switch device.
> >     Normal values are 1/2P to 3/4P
> > D = Fat Tree depth
> >
> > 3. Current Algorithms:
> > ----------------------
> > OpenSM provides two routing algorithms: Minimal Hop and Up/Down. Neither
> > of them scales with cluster size, and both can consume large run
> > time (minutes) and memory (GB). This section provides meta code for
> > these algorithms and order calculation.
> >
> > 3.1 Min Hop algorithm analysis:
> > The Min Hop algorithm is divided into two stages: computation of
> > min-hop tables on every switch and LFT output port assignment.
> >
> > Step 1: Computation of Min-Hop Tables on each switch
> >
> > The memory consumed is S*(S+H)*(P+2) bytes. On a 10K nodes cluster with
> > 2500 switch devices this ends up as 812 MB (using LMC=0).
> >
> > Meta algorithm:
> > For each HCA mark its remote neighbor switch port with hop 1.
> > For each switch mark itself port 0 as hop 0
> > While changes
> >   For each switch
> >     For each port
> >       For each LID
> >         Propagate remote port hop as hop +1 if smaller or undefined
> >
> > The order of this step: O(S*P*(S+H)*(D+1))
> >
> > Step 2: Assigning output port:
> > For each switch
> >   For each LID
> >     For each Port
> >       Is it the one with min count
> >
> > The order of this step: O(S*(S+H)*P)
> >
> > 3.2 Up/Down algorithm analysis
> > The Up/Down algorithm depends on the ability to rank the fabric nodes
> > from root to leaf of the tree. To get that ranking it runs a
> > heuristic that is based on the Min Hop tables. So the memory and
> > complexity are identical to the Min-Hop first step to start with.
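Editorial note on section 3.1 of the quoted memo: the two Min Hop steps (relaxing hop counts until nothing changes, then picking the port with the smallest count) can be sketched in a few lines of Python. This is only an illustration, not OpenSM code. The dictionary layouts, the function names and the three-switch example fabric are all assumptions made for the sketch, and switch/HCA names stand in for LIDs (LMC=0). The memo does not state P for its 812 MB example; P = 24 is assumed here because it is the value that reproduces that figure from S*(S+H)*(P+2).

```python
import math

def min_hop_tables(switch_links, hca_links):
    """Step 1: hops[switch][lid][port] = fewest hops to `lid` when leaving via `port`.

    switch_links: {switch: {port: neighbor_switch}}   (hypothetical layout)
    hca_links:    {hca: (switch, switch_port)}        (hypothetical layout)
    """
    lids = list(switch_links) + list(hca_links)
    hops = {sw: {lid: {} for lid in lids} for sw in switch_links}

    # For each HCA mark its neighbor switch port with hop 1.
    for hca, (sw, port) in hca_links.items():
        hops[sw][hca][port] = 1
    # For each switch mark itself, port 0, as hop 0.
    for sw in switch_links:
        hops[sw][sw][0] = 0

    def best(sw, lid):
        return min(hops[sw][lid].values(), default=math.inf)

    # While changes: propagate the remote side's best hop count + 1.
    changed = True
    while changed:
        changed = False
        for sw, ports in switch_links.items():
            for port, neighbor in ports.items():
                for lid in lids:
                    candidate = best(neighbor, lid) + 1
                    if candidate < hops[sw][lid].get(port, math.inf):
                        hops[sw][lid][port] = candidate
                        changed = True
    return hops

def assign_output_ports(hops):
    """Step 2: per switch and LID, pick a port with the minimum hop count.

    Ties go to the first such port found; no load balancing is attempted.
    """
    return {
        sw: {lid: min(ports, key=ports.get) for lid, ports in table.items() if ports}
        for sw, table in hops.items()
    }

def min_hop_memory_bytes(S, H, P):
    """Memory estimate from the memo: S*(S+H)*(P+2) bytes."""
    return S * (S + H) * (P + 2)

# Made-up fabric: two leaf switches (L1, L2) under one spine (SP), one HCA per leaf.
links = {"L1": {1: "SP"}, "L2": {1: "SP"}, "SP": {1: "L1", 2: "L2"}}
hcas = {"h1": ("L1", 2), "h2": ("L2", 2)}

lft = assign_output_ports(min_hop_tables(links, hcas))
print(lft["L2"]["h1"])                         # 1 (L2 reaches h1 through the spine)
print(min_hop_memory_bytes(2500, 10000, 24))   # 812500000
```

The relaxation loop only ever lowers a stored hop count, so it terminates once every entry equals the neighbor's best distance plus one.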
> >
> > Once ranking is performed the algorithm runs a BFS from every HCA and fills
> > in the Min Hop tables again. The Up/Down traversal rule is enforced during
> > the BFS such that only valid turns are allowed.
> >
> > Meta algorithm:
> > For each HCA
> >   Get connected Switch
> >   For each Switch in NextSwitches
> >     For each Port
> >       Check if direction is OK. Check if not visited
> >
> > The order is O(H*S*P)
> >
> > To finalize output port assignment the second step of the Min Hop
> > algorithm is invoked.
> >
> >
> > 4. Proposed Algorithm:
> > ----------------------
> >
> > Inspecting the routing problem we have noticed the following
> > attributes:
> > a. Using Min-Hop tables for keeping intermediate routing information
> >    has a disadvantage in terms of memory consumption. However, any
> >    incremental routing algorithm (for handling fabric changes after
> >    first setup), or routing persistence solution could use this
> >    information and gain speed.
> > b. Since we need to fill in LFT tables that are of the order S*(S+H)
> >    the algorithm is lower bounded by O(H^2).
> > c. A persistence based solution which uses previously routed fabric
> >    data and is able to handle simple incremental changes will provide
> >    a much faster runtime as topology match will require O(S*P)
> >    (traversing all links once)
> > d. Since the minimum hops information is identical for a switch and
> >    all the HCAs connected to it - there is no point in building "min
> >    hop" tables for HCAs. During the "output port" assignment stage,
> >    the HCAs connected to each switch are considered and routed.
> >
> > The result of "a" is that several algorithms that are superior in
> > memory footprint and skip any "hop table" stage are not considered for
> > implementation.
> >
> > To support "d" we needed to provide an index to each switch such that
> > the "min hop" tables are dense (previously they were indexed by LID).
> > The new index is stored on the switch object and thus allows lookup of
> > a switch "min hop" by its index. An array of switch pointers by index
> > supports the reverse lookup.
> >
> > The suggested algorithm is broken into the following 3 stages:
> > * Root nodes identification heuristics
> > * Min Hop tables computation
> > * Output port assignment
> >
> > 4.1 Root nodes identification heuristics:
> > This step is only required under the AND of the following two
> > conditions:
> > * Up/Down routing is required
> > * The user does not provide a file with guids of the tree "root nodes"
> >
> > The heuristic for recognizing the tree roots is based on histograms
> > of the HCAs' distance from every switch,
> > i.e. how many HCAs are 1 hop, 2 hops, etc. from the switch. In order to fill
> > in these histograms on all switches we need to BFS from every leaf
> > switch and propagate the number of HCAs connected to it:
> >
> > Meta algorithm:
> > For each switch
> >   For each Port
> >     If connected to HCAs count them
> >   If any HCA
> >     Init BFS to start with current switch
> >     set hop count to 0
> >     While there are switches in BFS list
> >       increment hop count
> >       For each switch in BFS List
> >         Add the number of HCAs to the histogram at the current hop count
> >         For each port
> >           If remote port switch not visited
> >             Add the switch to the BFS Next Step List
> >       Once finished all this step list use next steps
> >
> > The order of this step is: O(S*P + H/L*S*P) = O(H*S)
> >
> > 4.2 Min Hop tables computation:
> > This step is mandatory and has a slightly different flavor for the
> > case of Up/Down routing.
> >
> > The algorithm starts from every leaf switch and traverses BFS wise
> > through the fabric.
> >
> > Meta algorithm:
> > foreach switch in the fabric
> >   clear the "Rank" vector for all switches.
> >   start BFS with the given switch.
> >   set rank to 0
> >   while any switch in BFS list
> >   | foreach switch in BFS switch list
> >   | | foreach port (valid, active, not unhealthy)
> >   | |   if remote side is a switch:
> >   | |     if rank of remote side is 0 or = rank + 1
> >   | |       set the remote port entry MinHopPort for this switch
> >   | |     if rank of remote side is 0, i.e. never visited
> >   | |       set the remote switch rank to rank + 1
> >   | |       add the remote switch to next BFS switches
> >   | |------
> >   | switch between the current and next switches list
> >   | increment rank
> >   |------
> >
> > The order of this algorithm: O(P*S^2)
> >
> > The algorithm that merges the Up/Down step criteria is not yet written for this
> > stage. The idea is to make each step keep track of any previous
> > step down, such that a step up will be prohibited in this case.
> >
> > 4.3 Output port assignment:
> > This step provides the actual LFT value assignment for each switch.
> > To do that we access the built "min hop" tables and track port usage.
> >
> > Meta algorithm:
> > foreach switch in the fabric
> >   clear the port subscription vector (track number of paths subscribed)
> >   foreach target switch in the "min hop" table
> >     get the list of min-hop ports
> >     foreach end-node attached (HCA connected to it and itself)
> >       if lmc > 0 init tracking of remote system and node for selecting
> >         disjoint paths for same end-node different LID LSBs
> >       get min-subscribed (and disjoint) port marked min-hop target switch
> >       track port usage in port-subscription (opt. if target LID is not a switch)
> >
> > Order of this step:
> > Currently the selection of the output port by min-subscription is
> > trivial and requires O(P) so the overall order is
> > O(S*S*(1+L)*P) <= O(16*P*S^2)
> >
> > One could obtain the list of marked min-hop ports and then use a
> > modified cyclic list for avoiding the search for min subscription in
> > the case of LMC > 0. In that case the order could be reduced to:
> > O(S*S*(1+L)) ~= O(S*H)
> >
> > 5. Min Hop Table Implementation:
> > --------------------------------
> > The proposed algorithm does not require storing the number of hops
> > arriving at the switch port, only the fact that a port is on the min
> > hop path. This allows a further reduction in memory usage if the min
> > hop table holds boolean values.
> >
> > The issue then is an efficient iterator over the boolean (bit)
> > array. The tradeoff is thus the common one of memory versus runtime.
> >
> > (Does anybody know of a fast boolean array lookup implementation?)
> >
> > 6. Incremental Routing:
> > -----------------------
> > Once the fabric is routed we can define an algorithm for performing
> > incremental routing changes. An obvious case is when a link is
> > declared unhealthy or one of the ports is dropped. Assuming the
> > recognition of the change is done by some other algorithm, the
> > following cases apply:
> > * If the link connects an HCA and a switch, the HCA is unreachable. No
> >   routing change required.
> > * If the link is between switches:
> >   * If there is at least one other link between these switches:
> >     o Spread all routes going through the failing port to the other
> >       ports connecting to the same switch.
> >   * If there is no other link between these switches:
> >     o Go back to all switches that feed into each one of the switches
> >       (feed in means they route some target lids through the switch)
> >       but only those that route lids that go through the failing port. Check
> >       to see if there is another port that goes to a different switch
> >       to route that lid to. If there is no other way do nothing.
> >
> > How do we support topology changes like moving an HCA from one switch
> > to another?
> >
> > 7. Routing Persistence:
> > -----------------------
> > To make the subnet initialization faster, one could store the existing
> > routing solution and use it without any calculation.
> >
> > The issue is of course what conditions make the stored routing
> > obsolete.
> > To maximize the usefulness of the stored information we
> > propose to store the Min Hop tables rather than the final port
> > assignment. It is assumed that after restart there might be a need for
> > modifications to LMC and routing which will invalidate the LFT
> > anyway. To enable "cache invalidation criteria" the persistent
> > database should include information that could be used to easily check
> > that the fabric was not altered in a way that invalidates the MinHop tables.
> >
> > The stored information should hold for each switch in the fabric (by guid)
> > the list of ports and the guids and port numbers on the remote side.
> > To validate there are no significant changes, the discovered set
> > of switches is checked to match the stored information. Table 1
> > describes the possible changes and their effect on the validity of
> > the MinHop tables.
> >
> > Table 1 - Effect of Connectivity Changes on Routing Info Validity
> >
> > Change           | Effect on MinHop Tables    | Effect on LFT and MFT   |
> > -------------------------------------------------------------------------
> > New Switch found | Invalidates (might connect | Invalidates             |
> >                  | more HCAs and carries more |                         |
> >                  | routing resources)         |                         |
> > -------------------------------------------------------------------------
> > Missing Switch   | Invalidates (MinHops might | Invalidates             |
> >                  | be broken a few steps away)|                         |
> > -------------------------------------------------------------------------
> > New cable found  | Does not invalidate        | Does not invalidate     |
> > -------------------------------------------------------------------------
> > Missing Cable    | Invalidates only if there  | Invalidates all LIDs    |
> > (SW to SW)       | is no other cable          | going through that port |
> >                  | connecting the switches    |                         |
> > -------------------------------------------------------------------------
> > Missing Cable    | Does not invalidate        | Does not invalidate     |
> > (SW to HCA)      |                            |                         |
> > -------------------------------------------------------------------------
> > New HCA          | Does not invalidate        | Does not invalidate     |
> > -------------------------------------------------------------------------
> > Missing HCA      | Does not invalidate        | Does not invalidate     |
> > -------------------------------------------------------------------------
> > LID Changes      | Does not invalidate        | Invalidates the modified|
> >                  |                            | LIDs                    |
> > -------------------------------------------------------------------------
> >
> > Special marking for "root nodes" should cache the results of the first
> > step for Up/Down routing. These nodes should be invalidated on any
> > missing or additional switch conditions.
> >

From mst at mellanox.co.il Mon May 8 06:58:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 8 May 2006 16:58:55 +0300
Subject: [openib-general] rdma_cm.h: comment nits.
Message-ID: <20060508135855.GE21036@mellanox.co.il>

Two nits wrt rdma_cm.h:

/**
 *
 * rdma_reject - Called on the passive side to reject a connection request.
 */

It's OK to call rdma_reject on the active side as well, isn't it?

/**
 * rdma_cm_event_handler - Callback used to report user events.
 *
 * Notes: Users may not call rdma_destroy_id from this callback to destroy
 *   the passed in id, or a corresponding listen id. Returning a
 *   non-zero value from the callback will destroy the corresponding id.
 */

CMA will actually always destroy the passed in id, not the "corresponding id".

--
MST

From halr at voltaire.com Mon May 8 07:08:07 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 08 May 2006 10:08:07 -0400
Subject: [openib-general] Dump and load routes with opensm?
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB24@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB24@mtlexch01.mtl.com> Message-ID: <1147096946.4719.133600.camel@hal.voltaire.com> On Mon, 2006-05-08 at 10:00, Eitan Zahavi wrote: > > > > > > A plan for OpenSM to support loading unicast routes already exists. > > > > What's the timeframe for implementing this ? Has the implementation > > already started ? Is it what you write below and had posted to the > list > > last summer ? > [EZ] Yes: it is the same proposal. No: the work has not started. > > > > > To do it we need to develop a scheme where the routes file also > holds > > > the topology using GUIDs > > > such that the discovered topology is compared to the saved one. I > have > > > outlined the algorithm in > > > the attached writeup on OpenSM routing algorithms section 6: > > > "Incremental Algorithms". > > > > Wouldn't it be better to separate out the persistency and incremental > > issues ? > [EZ] I think that once you load the routes in the file you probably need > to compare the topology used for generating them with the current one. Absolutely. > The complexity level of "incremental" support - i.e. the ability to > gracefully handle some topology change - might be built with time. So > first implementation might be limited to compare with reference topology > and reroute on any change. The reroute on change might ultimately be the default but there might be an option to disable this. > But it is a bit crude in my mind. Yes. It would allow for the experimentation I think they desire though. > > Also, I sent numerous questions and comments on this writeup some time > > ago. > [EZ] I thought I did answer them. I don't think so but I need to dig this out of my email. It's been quite a while... > > > The right place to implement this will be to have the osm_db* > enhanced > > > to support this new DB domain. 
> > > Then the osm_ucast_mgr.c will need to initialize the DB and use it > while > > > routing. > > > > Also, changes versus this either need disabling or handling. > [EZ] Not following you here. If you mean we need to be able to "shut > off" this new capability - sure. I was referring to more than this (in terms of disabling the reroute on change). -- Hal > > -- Hal > > > > > Regarding dump of existing tables: > > > > > > If you run opensm -V or -D 0x43 you should get the file > /tmp/osm.fdbs > > > with that dump. > > > You can also use a more SM independent method for obtaining the > tables : > > > if you run ibdiagnet you should get the file /tmp/ibdiagnet.fdbs > with > > > similar format. > > > > > > Eitan > > > > > > Greg Johnson wrote: > > > > On Thu, May 04, 2006 at 12:39:38PM -0400, Hal Rosenstock wrote: > > > > > > > >>Hi Greg, > > > >> > > > >>On Thu, 2006-05-04 at 12:34, Greg Johnson wrote: > > > >> > > > >>>Is there currently a way to dump and load routes with opensm? If > > > not, > > > >>>how would I go about writing one? > > > >> > > > >>Is it really routes or stable LIDs you want ? > > > > > > > > > > > > I actually want routes. I have queried them with ibtraceroute and > > > > ibroute, but we need routes for the whole fabric. > > > > > > > > BTW, if you call ibtraceroute thousands of times it stops working. > > > > Maybe a problem in the MAD driver? > > > > > > > > > > > > > > > >>LIDs are stored in /var/cache/osm/guid2lid and restored from there > > > when > > > >>OpenSM is started assuming the reassign LIDs option (-r or > > > >>--reassign_lids) is not used when invoking OpenSM. > > > > > > > > > > > > Thanks, that's good to know. 
> > > > > > > > Greg > > > > _______________________________________________ > > > > openib-general mailing list > > > > openib-general at openib.org > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > > > > > ______________________________________________________________________ > > > > > > OpenSM Unicast Routing Enhancements for Scalability > > > ===================================================== > > > > > > Authors:Eitan Zahavi , Yael Kalka > > > > Date: Aug 2005. > > > > > > Table of contents: > > > 1. Overview > > > 2. Notation > > > 3. Current Algorithms > > > 4. Proposed Routing Algorithms > > > 5. Min Hop Tables Implementation > > > 6. Incremental Routing > > > 7. Routing Persistancy > > > > > > 1. Overview: > > > ------------ > > > > > > OpenSM currently uses a two stage routing algorithm for unicast > > > forwarding tables calculation. As shown later these algorithm are > > > O(N^3). Inspected run time of OpenSM routing stage was ~2.5min on > 1100 > > > nodes cluster. The purpose of this memo is to present the community > > > the proposed work for enhancing OpenSM routing engine. > > > > > > 2. Notation: > > > ------------ > > > The following notations are used throughout this document: > > > S = Number of switch devices in the system > > > P = Number of ports each switch node has > > > H = Number of HCA ports connected to the fabric > > > L = Number of HCAs connected to each Leaf switch device. > > > Normal values are 1/2P to 3/4P > > > D = Fat Tree depth > > > > > > 3. Current Algorithms: > > > ---------------------- > > > OpenSM provide two routing algorithms: Minimal Hop and Up/Down. Both > > > of them do not scale with cluster size and can consume both large > run > > > time (minutes) and memory (GB). This section provides meta code for > > > these algorithms and order calculation. 
> > > > > > 3.1 Min Hop algorithm analysis: > > > The Min Hop algorithm is divided into two stages: computation of > > > min-hop tables on every switch and LFT output port assignment. > > > > > > Step 1: Computation of Min-Hop Tables on each switch > > > > > > The memory consumed is S*(S+H)*(P+2)*Byte. On 10K nodes cluster with > > > 2500 switch devices this ends up as 812M-Byte (using LMC=0). > > > > > > Meta algorithm: > > > For each HCA mark its remote neighbor switch port with hop 1. > > > For each switch mark itself port 0 as hop 0 > > > While changes > > > For each switch > > > For each port > > > For each LID > > > Propagate remote port hop as hop +1 if smaller or undefined > > > > > > The order of this step: O(S*P*(S+H)*(D+1)) > > > > > > Step 2: Assigning output port: > > > For each switch > > > For each LID > > > For each Port > > > Is it the one with min count > > > > > > The order of this step: O(S*(S+H)*P) > > > > > > 3.2 Up/Down algorithm analysis > > > The Up/Down algorithm depends on the ability to rank the fabric > nodes > > > from root to leaf of the tree. To get that ranking it runs a > > > heuristics that is based on the Min Hop tables. So the memory and > > > complexity are identical to the Min-Hop first step to start with. > > > > > > Once ranking is performed the algorithm is BFS from every HCA and > fill > > > in the Min Hop tables again. Up/Down traversal rule is enforced > during > > > the BFS such that only valid turns are allowed. > > > > > > Meta algorithm: > > > For each HCA > > > Get connected Switch > > > For each Switch in NextSwitches > > > For each Port > > > Check if direction is OK. Check if not visited > > > > > > The order is O(H*S*P) > > > > > > To finalize output port assignment the second step of the Min Hop > > > algorithm is invoked. > > > > > > > > > 4. Proposed Algorithm: > > > ---------------------- > > > > > > Inspecting the routing problem we have noticed the following > > > attributes: > > > a. 
Using Min-Hop tables for keeping intermediate routing > information > > > has a disadvantage in terms of memory consumption. However, any > > > incremental routing algorithm (for handling fabric changes after > > > first setup), or routing persistence solution could use this > > > information and gain speed. > > > b. Since we need to fill in LFT tables that are of the order S*(S+H) > > > the algorithm is lower bounded by O(H^2). > > > c. A persistence based solution which uses previously routed fabric > > > data and is able to handle simple incremental changes will > provide > > > a much faster runtime as topology match will require O(S*P) > > > (traversing all links once) > > > d. Since the minimum hops information is identical for a switch and > > > all the HCAs connected to it - there is no point in building "min > > > hop" tables for HCAs. During the "output port" assignment stage, > > > the HCAs connected to each switch are considered and routed. > > > > > > The result of "a" is that several algorithms that are superior from > > > memory footprint and skip any "hop table" stage are not considered > for > > > implementation. > > > > > > To support "d" we needed to provide an index to each switch such > that > > > the "min hop" tables are dense (previously they were indexed by > LID). > > > The new index is stored on the switch object and thus allow lookup > of > > > a switch "min hop" by its index. An array of switch pointer by index > > > supports the reverse lookup. 
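Editorial note on the dense switch index described in the quoted paragraph above: a minimal Python sketch of the idea, assuming each switch is identified by its GUID. The function names, the sort-by-GUID ordering and the example GUIDs are all invented for illustration; the memo only says that the index is stored on the switch object and that an array gives the reverse lookup.

```python
def build_switch_index(switch_guids):
    """Give every switch a dense index 0..S-1 and keep the reverse mapping.

    Sorting by GUID is an arbitrary choice made here so the result is
    reproducible; any stable ordering would do.
    """
    index_to_guid = sorted(switch_guids)
    guid_to_index = {guid: i for i, guid in enumerate(index_to_guid)}
    return guid_to_index, index_to_guid

def new_min_hop_table(num_switches):
    """table[src][dst] = set of ports of switch `src` on a min-hop path to `dst`.

    Indexed by dense switch index, so the table has exactly S*S cells
    instead of one row per possible LID.
    """
    return [[set() for _ in range(num_switches)] for _ in range(num_switches)]

# Made-up GUIDs for three switches.
guids = [0x0002C90000000030, 0x0002C90000000010, 0x0002C90000000020]
guid_to_index, index_to_guid = build_switch_index(guids)
table = new_min_hop_table(len(guids))

src = guid_to_index[0x0002C90000000010]
dst = guid_to_index[0x0002C90000000030]
table[src][dst].add(1)   # port 1 of src lies on a min-hop path to dst
```

Because HCAs share the min-hop information of the switch they hang off, only switches need rows and columns here; the HCAs are handled later, during output port assignment.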
> > > > > > The suggested algorithm is broken into the following 3 stages: > > > * Root nodes identification heuristics > > > * Min Hop tables computation > > > * Output port assignment > > > > > > 4.1 Root nodes identification heuristics: > > > This step is only required under the AND of the following two > > > conditions: > > > * Up/Down routing is required > > > * The user does not provide a file with guids of the tree "root > nodes" > > > > > > This heuristics for recognizing the tree roots is based on > histograms > > > of the HCAs distance from every switch. > > > I.e. How many HCAs are 1 hop, 2 hops from the switch. In order to > fill > > > in these histograms on all switches we need to BFS from every leaf > > > switch and propagate the number of HCAs connected to it: > > > > > > Meta algorithm: > > > For each switch > > > For each Port > > > If connected to HCAs count them > > > If any HCA > > > Init BFS to start with current switch > > > set hop count to 0 > > > While there are switches in BFS list > > > increment hop count > > > For each switch in BFS List > > > Add the number of HCAs to the histogram at the current hop > count > > > For each port > > > If remote port switch not visited > > > Add the switch to the BFS Next Step List > > > Once finished all this step list use next steps > > > > > > The order of this step is: O(S*P + H/L*S*P) = O(*H*S) > > > > > > 4.2 Min Hop tables computation: > > > This step is mandatory and has a slightly different flavor for the > > > case of Up/Down routing. > > > > > > The algorithm starts from every leaf switch and traverses BFS wise > > > through the fabric. > > > > > > Meta algorithm: > > > foreach switch in the fabric > > > clear the "Rank" vector for all switches. > > > start BFS with the given switch. 
> > >   set rank to 0
> > >   while any switch in BFS list
> > >   | foreach switch in BFS switch list
> > >   | |foreach port (valid, active, not unhealthy)
> > >   | | if remote side is a switch:
> > >   | |   if rank of remote side is 0 or equals rank + 1
> > >   | |     set the remote port entry MinHopPort for this switch
> > >   | |   if rank of remote side is 0, i.e. never visited
> > >   | |     set the remote switch rank to rank + 1
> > >   | |     add the remote switch to next BFS switches
> > >   | |------
> > >   | switch between the current and next switches list
> > >   | increment rank
> > >   |------
> > >
> > > The order of this algorithm: O(P*S^2)
> > >
> > > An algorithm that merges the Up/Down step criteria has not yet been
> > > written for this stage. The idea is to make each step keep track of
> > > any previous step down, so that a later step up is prohibited.
> > >
> > > 4.3 Output port assignment:
> > > This step assigns the actual LFT values to each switch.
> > > To do that we access the built "min hop" tables and track port usage.
> > >
> > > Meta algorithm:
> > >   foreach switch in the fabric
> > >     clear the port subscription vector (track number of paths subscribed)
> > >     foreach target switch in the "min hop" table
> > >       get the list of min-hop ports
> > >       foreach end-node attached (HCA connected to it and itself)
> > >         if lmc > 0 init tracking of remote system and node for selecting
> > >           disjoint paths for same end-node different LID LSBs
> > >         get min-subscribed (and disjoint) port marked min-hop target switch
> > >         track port usage in port-subscription (opt. if target LID is not a switch)
> > >
> > > Order of this step:
> > > Currently the selection of the output port by min-subscription is
> > > trivial and requires O(P) so the overall order is
> > > O(S*S*(1+L)*P) <= O(16*P*S^2)
> > >
> > > One could obtain the list of marked min-hop ports and then use a
> > > modified cyclic list for avoiding the search for min subscription in
> > > the case of LMC > 0. In that case the order could be reduced to:
> > > O(S*S*(1+L)) ~= O(S*H)
> > >
> > > 5. Min Hop Table Implementation:
> > > --------------------------------
> > > The proposed algorithm does not require storing the number of hops
> > > arriving at the switch port - but only the fact that a port is on the
> > > min hop path. This allows a further memory reduction if the min hop
> > > table holds boolean values.
> > >
> > > The issue then is finding an efficient iterator over the boolean (bit)
> > > array. The tradeoff is thus the usual one of memory versus runtime.
> > >
> > > (Does anybody know of a fast boolean array lookup implementation?)
> > >
> > > 6. Incremental Routing:
> > > -----------------------
> > > Once the fabric is routed we can define an algorithm for performing
> > > incremental routing changes. An obvious case is when a link is
> > > declared unhealthy or one of the ports is dropped. Assuming the
> > > change is recognized by some other algorithm, the following cases
> > > apply:
> > > * If the link connects an HCA and a switch, the HCA is unreachable.
> > >   No routing change is required.
> > > * If the link is between switches:
> > >   * If there is at least one other link between these switches:
> > >     o Spread all routes going through the failing port to the other
> > >       ports connecting to the same switch.
> > >   * If there is no other link between these switches:
> > >     o Go back to all switches that feed into each one of the switches
> > >       (feed in means they route some target LIDs through the switch)
> > >       but only those that route LIDs that go through the failing port.
> > >       Check to see if there is another port that goes to a different
> > >       switch to route that LID to. If there is no other way do nothing.
> > >
> > > How do we support topology changes like moving an HCA from one switch
> > > to another?
> > >
> > > 7. Routing Persistence:
> > > -----------------------
> > > To make the subnet initialization faster, one could store the existing
> > > routing solution and use it without any calculation.
> > >
> > > The issue is of course which conditions make the stored routing
> > > obsolete. To maximize the usefulness of the stored information we
> > > propose to store the Min Hop tables rather than the final port
> > > assignment. It is assumed that after restart there might be a need for
> > > modifications to LMC and routing which will invalidate the LFT
> > > anyway. To enable "cache invalidation criteria" the persistent
> > > database should include information that could be used to easily check
> > > whether the fabric was altered in a way that invalidates the MinHop
> > > tables.
> > >
> > > The stored information should hold for each switch in the fabric (by
> > > guid) the list of ports and the guids and port numbers on the remote
> > > side. To validate there are no significant changes, the discovered set
> > > of switches is checked to match the stored information. Table 1
> > > describes the possible changes and their effect on the validity of
> > > the MinHop tables.
> > >
> > > Table 1 - Effect of Connectivity Changes on Routing Info Validity
> > >
> > > Change           | Effect on MinHop Tables     | Effect on LFT and MFT    |
> > > ---------------------------------------------------------------------------
> > > New Switch found | Invalidates (might connect  | Invalidates              |
> > >                  | more HCAs and carries more  |                          |
> > >                  | routing resources)          |                          |
> > > ---------------------------------------------------------------------------
> > > Missing Switch   | Invalidates (MinHops might  | Invalidates              |
> > >                  | be broken a few steps away) |                          |
> > > ---------------------------------------------------------------------------
> > > New cable found  | Does not invalidate         | Does not invalidate      |
> > > ---------------------------------------------------------------------------
> > > Missing Cable    | Invalidates only if there   | Invalidates all LIDs     |
> > > (SW to SW)       | is no other cable           | going through that port  |
> > >                  | connecting the switches     |                          |
> > > ---------------------------------------------------------------------------
> > > Missing Cable    | Does not invalidate         | Does not invalidate      |
> > > (SW to HCA)      |                             |                          |
> > > ---------------------------------------------------------------------------
> > > New HCA          | Does not invalidate         | Does not invalidate      |
> > > ---------------------------------------------------------------------------
> > > Missing HCA      | Does not invalidate         | Does not invalidate      |
> > > ---------------------------------------------------------------------------
> > > LID Changes      | Does not invalidate         | Invalidates the modified |
> > >                  |                             | LIDs                     |
> > > ---------------------------------------------------------------------------
> > >
> > > Special marking for "root nodes" should cache the results of the first
> > > step for Up/Down routing. These nodes should be invalidated on any
> > > missing or additional switch conditions.
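
[Editorial sketch, not part of the original message.] Table 1 above can be read as a lookup from a detected change to a verdict on the cached routing data. The C fragment below is only a minimal illustration of that reading. Every identifier in it (fabric_change, lft_effect, cache_verdict, classify_change) is hypothetical and is not an OpenSM name, and the "LFT and MFT" column is collapsed into a single enum for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical change types, one per row of Table 1. */
enum fabric_change {
	CHANGE_NEW_SWITCH,
	CHANGE_MISSING_SWITCH,
	CHANGE_NEW_CABLE,
	CHANGE_MISSING_CABLE_SW_SW,
	CHANGE_MISSING_CABLE_SW_HCA,
	CHANGE_NEW_HCA,
	CHANGE_MISSING_HCA,
	CHANGE_LID
};

/* Hypothetical encoding of the "Effect on LFT and MFT" column. */
enum lft_effect {
	LFT_KEEP,		/* does not invalidate */
	LFT_REDO_ALL,		/* invalidates */
	LFT_REDO_LIDS_ON_PORT,	/* invalidates all LIDs going through that port */
	LFT_REDO_MODIFIED_LIDS	/* invalidates the modified LIDs */
};

struct cache_verdict {
	bool min_hop_valid;	/* may the stored MinHop tables be reused? */
	enum lft_effect lft;
};

/*
 * parallel_link_remains only matters for a missing switch-to-switch
 * cable: true when another cable still connects the same two switches.
 */
struct cache_verdict classify_change(enum fabric_change change,
				     bool parallel_link_remains)
{
	struct cache_verdict v = { true, LFT_KEEP };

	switch (change) {
	case CHANGE_NEW_SWITCH:
	case CHANGE_MISSING_SWITCH:
		v.min_hop_valid = false;
		v.lft = LFT_REDO_ALL;
		break;
	case CHANGE_MISSING_CABLE_SW_SW:
		v.min_hop_valid = parallel_link_remains;
		v.lft = LFT_REDO_LIDS_ON_PORT;
		break;
	case CHANGE_LID:
		v.lft = LFT_REDO_MODIFIED_LIDS;
		break;
	case CHANGE_NEW_CABLE:
	case CHANGE_MISSING_CABLE_SW_HCA:
	case CHANGE_NEW_HCA:
	case CHANGE_MISSING_HCA:
		break;		/* neither table is invalidated */
	}
	return v;
}
```

A caller comparing the discovered fabric with the stored one could fold the verdicts of all detected changes together and recompute the MinHop tables as soon as any single verdict reports them invalid.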
> > > > > > From dotanb at mellanox.co.il Mon May 8 07:14:44 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Mon, 8 May 2006 17:14:44 +0300 Subject: [openib-general] [rds]: there is a kernel oops while loading theRDS module in kernel 2.6.11 In-Reply-To: References: Message-ID: <200605081714.44467.dotanb@mellanox.co.il> On Wednesday 03 May 2006 20:04, Sean Hefty wrote: > >Trace:{:ib_local_sa:sa_db_init+148} > > Can you reproduce this just loading the ib_local_sa module? > > - Sean we got this failure because of bad backport to kernel < 2.6.12 (this was found by jackm). the workqueue name was too long. thanks Dotan From bugzilla-daemon at openib.org Mon May 8 07:35:30 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 8 May 2006 07:35:30 -0700 (PDT) Subject: [openib-general] [Bug 65] New: ib_ipoib refuses to unload when alias exists in modprobe.conf Message-ID: <20060508143530.04B6222834D@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=65 Summary: ib_ipoib refuses to unload when alias exists in modprobe.conf Product: OpenFabrics Linux Version: gen2 Platform: X86-64 OS/Version: Other Status: NEW Severity: normal Priority: P2 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: monis at voltaire.com This happens on SuSE10 beta8 but not on RedHat when the line(s) alias ib0 ib_ipoib alias ib1 ib_ipoib exist in /etc/modprobe.conf then unloading ib_ipoib (modprobe -r) doesn't seem to fail (return status=0) but the module still remains up (maybe it is being loaded again automatically). dmesg shows ADDRCONF(NETDEV_UP): ib0: link is not ready ADDRCONF(NETDEV_UP): ib1: link is not ready ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From ogerlitz at voltaire.com Mon May 8 07:25:37 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 08 May 2006 17:25:37 +0300 Subject: [openib-general] CMA: compliancy issue? In-Reply-To: <20060508134900.GD21036@mellanox.co.il> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> <20060508133154.GC21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> Message-ID: <445F54E1.8000305@voltaire.com> Michael S. Tsirkin wrote: > Quoting r. Or Gerlitz : >> You are suggesting a design change in the CMA which would effect also >> the current CMA ULP consumers: iSER, RDS, NFSoRDMA and Lustre. > I don't see how it will affect these ULPs: don't see how ULPs that don't check > private data care whether RTU is sent before, or after, the handler. I ment to say that today they just get an ESTABLISHED event, and with the change you suggest they will get CONNECT_RESPONSE event and will need to call either rdma_accept or rdma_reject, that's a little change in the ULP state machine. > In any case, CMA is still in early stages of development so its natural to > expect API changes. The CMA is pushed upstream for 2.6.18 so i am not sure what do you mean by "early" here, its been under work for many months and its you that started to look/use it only lately. The CMA has very solid design and implementation, its not that down the road, design/API changes are not possible but saying it is in "early stages of development" is insulting, no less. Or. 
From jackm at mellanox.co.il Mon May 8 07:52:22 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Mon, 8 May 2006 17:52:22 +0300 Subject: [openib-general] [PATCH] libmthca: missing include file if libibverbs does not have ibv_read_sysfs_file Message-ID: <200605081752.22065.jackm@mellanox.co.il> include file needed in the case where libibverbs does not implement the function ibv_read_sysfs_file(). Signed-off-by: Jack Morgenstein Index: userspace/libmthca/src/mthca.c =================================================================== --- userspace/libmthca/src/mthca.c (revision 6980) +++ userspace/libmthca/src/mthca.c (working copy) @@ -44,6 +44,10 @@ #include #include +#ifndef HAVE_IBV_READ_SYSFS_FILE +#include +#endif + #include "mthca.h" #include "mthca-abi.h" From ogerlitz at voltaire.com Mon May 8 08:01:51 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 8 May 2006 18:01:51 +0300 (IDT) Subject: [openib-general] (no subject) Message-ID: Hi Sean, Looking in the code i have realized that it is a must for the CMA consumer to call rdma_disconnect to have the QP state moved into ERROR. Am i correct? with this understanding at hand, i have changed iSER code to call rdma_disconnect even if it got a DISCONNECTED event caused by the passive side initiating the disconnect flow (ie sending a DREQ), since otherwise in such case i never got the FLUSHES on the posted RX/TX WRs. This makes sense to me, just wanted to have your confirmation. Or. 
Index: iser_verbs.c =================================================================== --- iser_verbs.c (revision 6965) +++ iser_verbs.c (working copy) @@ -318,18 +318,20 @@ void iser_conn_terminate(struct iser_con { int err = 0; - if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, - ISER_CONN_TERMINATING)) { - err = rdma_disconnect(ib_conn->cma_id); - if (err) - iser_err("Failed to disconnect, conn: 0x%p err %d\n", - ib_conn,err); - - } - + /* change the ib conn state only if the conn is UP, however always call + * rdma_disconnect since this is the only way to cause the CMA to change + * the QP state to ERROR + */ + + iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, ISER_CONN_TERMINATING); + err = rdma_disconnect(ib_conn->cma_id); + if (err) + iser_err("Failed to disconnect, conn: 0x%p err %d\n", + ib_conn,err); + wait_event_interruptible(ib_conn->wait, ib_conn->state == ISER_CONN_DOWN); - + iser_conn_release(ib_conn); } From mst at mellanox.co.il Mon May 8 08:05:34 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 18:05:34 +0300 Subject: [openib-general] Re: CMA: compliancy issue? In-Reply-To: <445F54E1.8000305@voltaire.com> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> <20060508133154.GC21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> Message-ID: <20060508150534.GF21036@mellanox.co.il> Quoting r. Or Gerlitz : > >>You are suggesting a design change in the CMA which would effect also > >>the current CMA ULP consumers: iSER, RDS, NFSoRDMA and Lustre. > > >I don't see how it will affect these ULPs: don't see how ULPs that don't > >check private data care whether RTU is sent before, or after, the handler. 
> > I ment to say that today they just get an ESTABLISHED event, and with > the change you suggest they will get CONNECT_RESPONSE event and will > need to call either rdma_accept or rdma_reject, that's a little change > in the ULP state machine. That would work too, and it does not sound like a huge change. I was originally thinking along the lines of still using ESTABLISHED, and simply delaying RTU till after the handler is called. We would then need to teach CMA to perform reject instead of RTU if handler returns an error code. We even can have a flag to select the required behaviour, or even behave specially for SDP, although I don't think this makes a lot of sense. Sean, what looks best to you? -- MST From ogerlitz at voltaire.com Mon May 8 08:06:34 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 8 May 2006 18:06:34 +0300 (IDT) Subject: [openib-general] question on rdma_disconnect Message-ID: Resending, this time with a subject line. Or. ---------- Forwarded message ---------- Date: Mon, 8 May 2006 18:01:51 +0300 (IDT) From: Or Gerlitz To: mshefty at ichips.intel.com Cc: openib-general at openib.org Hi Sean, Looking in the code i have realized that it is a must for the CMA consumer to call rdma_disconnect to have the QP state moved into ERROR. Am i correct? with this understanding at hand, i have changed iSER code to call rdma_disconnect even if it got a DISCONNECTED event caused by the passive side initiating the disconnect flow (ie sending a DREQ), since otherwise in such case i never got the FLUSHES on the posted RX/TX WRs. This makes sense to me, just wanted to have your confirmation. Or. 
Index: iser_verbs.c =================================================================== --- iser_verbs.c (revision 6965) +++ iser_verbs.c (working copy) @@ -318,18 +318,20 @@ void iser_conn_terminate(struct iser_con { int err = 0; - if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, - ISER_CONN_TERMINATING)) { - err = rdma_disconnect(ib_conn->cma_id); - if (err) - iser_err("Failed to disconnect, conn: 0x%p err %d\n", - ib_conn,err); - - } - + /* change the ib conn state only if the conn is UP, however always call + * rdma_disconnect since this is the only way to cause the CMA to change + * the QP state to ERROR + */ + + iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, ISER_CONN_TERMINATING); + err = rdma_disconnect(ib_conn->cma_id); + if (err) + iser_err("Failed to disconnect, conn: 0x%p err %d\n", + ib_conn,err); + wait_event_interruptible(ib_conn->wait, ib_conn->state == ISER_CONN_DOWN); - + iser_conn_release(ib_conn); } From arkady at netapp.com Mon May 8 08:12:22 2006 From: arkady at netapp.com (Arkady Kanevsky) Date: Mon, 8 May 2006 10:12:22 -0500 Subject: [openib-general] error in mthca_reset? Message-ID: <200605081112.22410.arkady@netapp.com> There seems to be an error in mthca_reset. First, we store 64 double words (256 bytes) for both HCA and HCA bridge: for (i = 0; i < 64; ++i) { if (i == 22 || i == 23) continue; if (pci_read_config_dword(mdev->pdev, i * 4, hca_header + i)) { err = -ENODEV; mthca_err(mdev, "Couldn't save HCA " "PCI header, aborting.\n"); goto out; } } And we avoid storing 22 and 23 double word. Yet, when we restore we only restore first 16 double words: for (i = 0; i < 16; ++i) { if (i * 4 == PCI_COMMAND) continue; if (pci_write_config_dword(mdev->pdev, i * 4, hca_header[i])) { err = -ENODEV; mthca_err(mdev, "Couldn't restore HCA reg %x, " "aborting.\n", i); goto out; } } So, 1. Do we need to restore the whole 64 double words? If not, just store 16 double words that need to be restored instead of 64. 2.. 
Why do we bother with not storing double words 22 and 23 when we do not restore them? Are 22 and 23, we do not want to store, are not reallly double words but bytes, or words, or something else? Thanks, Arkady From jlentini at netapp.com Mon May 8 08:14:14 2006 From: jlentini at netapp.com (James Lentini) Date: Mon, 8 May 2006 11:14:14 -0400 (EDT) Subject: [openib-general] Re: [PATCH] update uDAPL openib_cma provider to work with new uCMA event channels In-Reply-To: References: Message-ID: On Fri, 5 May 2006, Arlin Davis wrote: > James, > > Update the uDAPL openib_cma provider to work with the new uCMA event > channel interface. I ran a full set of Intel-MPI test suites with > these latest changes and it looks fine. The changes look good to me. > Sync up with Sean on commits. I'm watching for Sean's commit. Did I miss it? From halr at voltaire.com Mon May 8 08:39:01 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 08 May 2006 11:39:01 -0400 Subject: [openib-general] Re: [PATCH] opensm: prevent ports duplication in partition config In-Reply-To: <20060504000539.GC3689@sashak.voltaire.com> References: <20060504000539.GC3689@sashak.voltaire.com> Message-ID: <1147101761.29999.959.camel@hal.voltaire.com> On Wed, 2006-05-03 at 20:05, Sasha Khapyorsky wrote: > Hello Hal, > > There is fix for case when port is repeatedly configured as member of > the same partition. If membership is different this may broke pkey tables > update code. Thanks. Applied to trunk only. -- Hal > Sasha. 
> > Signed-off-by: Sasha Khapyorsky From rdreier at cisco.com Mon May 8 08:50:04 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 08:50:04 -0700 Subject: [openib-general] Re: cm crash In-Reply-To: (Sean Hefty's message of "Sun, 7 May 2006 20:14:32 -0700") References: Message-ID: >static inline void cm_deref_id(struct cm_id_private *cm_id_priv) >{ > if (atomic_dec_and_test(&cm_id_priv->refcount)) > wake_up(&cm_id_priv->wait); >} > >A thread could test the refcount after atomic_dec_and_test but before >wake_up(&cm_id_priv->wait), and remove cm_id_priv. >This would result in use after free. Yes, there is a small race window there. Hmm, I wonder what the right way to fix that is. This construction is used a number of other places (eg in mthca). - R. From rdreier at cisco.com Mon May 8 08:50:52 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 08:50:52 -0700 Subject: [openib-general] cm crash In-Reply-To: <20060507143705.GI3032@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 7 May 2006 17:37:05 +0300") References: <20060507143705.GI3032@mellanox.co.il> Message-ID: Michael> Hello! I have observed the crash below. From SDP Michael> messages, this seems to happen when I am calling Michael> rdma_destroy_cm_id and at the same time a handler is Michael> running and cma is getting non-zero return code from the Michael> callback. This seems like a bug in SDP to me. Basically you are calling free on the same object twice, and I'm not convinced this should ever work. - R. From rdreier at cisco.com Mon May 8 08:53:01 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 08:53:01 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508054529.GC19660@mellanox.co.il> (Michael S. 
Tsirkin's message of "Mon, 8 May 2006 08:45:29 +0300") References: <20060508054529.GC19660@mellanox.co.il> Message-ID: > + /* Make sure cm_deref_id is not in progress */ > + spin_lock_irq(&cm_id_priv->lock); > + spin_unlock_irq(&cm_id_priv->lock); How does this help anything? cm_deref_id() can still be in progress, it just can't hold the lock during this empty section of code. - R. From tom at opengridcomputing.com Mon May 8 09:03:34 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Mon, 08 May 2006 11:03:34 -0500 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> Message-ID: <1147104214.15524.13.camel@trinity.ogc.int> On Mon, 2006-05-08 at 08:53 -0700, Roland Dreier wrote: > > + /* Make sure cm_deref_id is not in progress */ > > + spin_lock_irq(&cm_id_priv->lock); > > + spin_unlock_irq(&cm_id_priv->lock); > > How does this help anything? cm_deref_id() can still be in progress, > it just can't hold the lock during this empty section of code. > Maybe Michael means this... T1 spin_lock_irq(&obj->lock); if (atomic_dec_and_test(&obj->refcount)) wake_up(&obj->wait); spin_unlock_irq(&obj->lock); T2 atomic_dec(&obj->refcount); wait_event(&obj->wait, !atomic_read(&obj->refcount)) spin_lock_irq(&obj->lock); spin_unlock_irq(&obj->lock); kfree(obj); > - R. 
> _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Mon May 8 09:03:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 09:03:33 -0700 Subject: [openib-general] Re: [PATCH] libmthca: missing include file if libibverbs does not have ibv_read_sysfs_file In-Reply-To: <200605081752.22065.jackm@mellanox.co.il> (Jack Morgenstein's message of "Mon, 8 May 2006 17:52:22 +0300") References: <200605081752.22065.jackm@mellanox.co.il> Message-ID: Thanks, applied. From mst at mellanox.co.il Mon May 8 09:06:14 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 19:06:14 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> Message-ID: <20060508160614.GJ21036@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] cm refcount race fix > > > + /* Make sure cm_deref_id is not in progress */ > > + spin_lock_irq(&cm_id_priv->lock); > > + spin_unlock_irq(&cm_id_priv->lock); > > How does this help anything? cm_deref_id() can still be in progress, > it just can't hold the lock during this empty section of code. This does not parse: static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { + unsigned long flags; + + spin_lock_irqsave(&cm_id_priv->lock, flags); if (atomic_dec_and_test(&cm_id_priv->refcount)) wake_up(&cm_id_priv->wait); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); } cm_deref_id does nothing outside the lock. -- MST From mst at mellanox.co.il Mon May 8 09:07:13 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 8 May 2006 19:07:13 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <1147104214.15524.13.camel@trinity.ogc.int> References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> Message-ID: <20060508160713.GK21036@mellanox.co.il> Quoting r. Tom Tucker : > Subject: Re: [openib-general] Re: [PATCH] cm refcount race fix > > On Mon, 2006-05-08 at 08:53 -0700, Roland Dreier wrote: > > > + /* Make sure cm_deref_id is not in progress */ > > > + spin_lock_irq(&cm_id_priv->lock); > > > + spin_unlock_irq(&cm_id_priv->lock); > > > > How does this help anything? cm_deref_id() can still be in progress, > > it just can't hold the lock during this empty section of code. > > > > Maybe Michael means this... > > > T1 > > spin_lock_irq(&obj->lock); > if (atomic_dec_and_test(&obj->refcount)) > wake_up(&obj->wait); > spin_unlock_irq(&obj->lock); > > T2 > > atomic_dec(&obj->refcount); > wait_event(&obj->wait, !atomic_read(&obj->refcount)) > spin_lock_irq(&obj->lock); > spin_unlock_irq(&obj->lock); > kfree(obj); > Right, that's what the patch does. No? -- MST From mst at mellanox.co.il Mon May 8 09:08:31 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 19:08:31 +0300 Subject: [openib-general] Re: cm crash In-Reply-To: References: <20060507143705.GI3032@mellanox.co.il> Message-ID: <20060508160830.GL21036@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: cm crash > > Michael> Hello! I have observed the crash below. From SDP > Michael> messages, this seems to happen when I am calling > Michael> rdma_destroy_cm_id and at the same time a handler is > Michael> running and cma is getting non-zero return code from the > Michael> callback. > > This seems like a bug in SDP to me. Basically you are calling free on > the same object twice, and I'm not convinced this should ever work. Right. I've corrected this: I now only return error code from handler on request event. 
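
[Editorial sketch, not part of the original messages.] The refcount race discussed in the "cm refcount race fix" thread above comes from the waiter being able to see the count reach zero and free the object before the thread that dropped the last reference has finished its wake-up call. The fragment below is a userspace analogue using POSIX threads, not the kernel patch itself: struct obj, obj_get, obj_put, obj_destroy and run_demo are hypothetical names, and a mutex plus condition variable stand in for the kernel spinlock and wait queue. The point it illustrates is the one made in the thread: if the decrement and the wake-up both happen under the lock, the destroyer cannot get past the lock until the waker is done with the object.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object; plays the role of cm_id_priv in the thread above. */
struct obj {
	pthread_mutex_t lock;
	pthread_cond_t  wait;
	int             refcount;	/* protected by lock */
};

struct obj *obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->wait, NULL);
	o->refcount = 1;		/* the creator's reference */
	return o;
}

void obj_get(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	++o->refcount;
	pthread_mutex_unlock(&o->lock);
}

/* Decrement and wake-up are both done under the lock. */
void obj_put(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	if (--o->refcount == 0)
		pthread_cond_signal(&o->wait);
	pthread_mutex_unlock(&o->lock);
}

/* Drop the creator's reference, wait for all others, then free. */
void obj_destroy(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	--o->refcount;
	while (o->refcount != 0)
		pthread_cond_wait(&o->wait, &o->lock);
	pthread_mutex_unlock(&o->lock);

	pthread_cond_destroy(&o->wait);
	pthread_mutex_destroy(&o->lock);
	free(o);
}

static void *worker(void *arg)
{
	obj_put(arg);			/* must not touch the object afterwards */
	return NULL;
}

/* Returns the number of workers started, or -1 on bad input or no memory. */
int run_demo(int nworkers)
{
	pthread_t tid[8];
	struct obj *o;
	int i, started = 0;

	if (nworkers < 0 || nworkers > 8)
		return -1;
	o = obj_create();
	if (!o)
		return -1;
	for (i = 0; i < nworkers; i++) {
		obj_get(o);		/* reference handed to the worker */
		if (pthread_create(&tid[started], NULL, worker, o) == 0)
			started++;
		else
			obj_put(o);	/* thread never ran; give it back */
	}
	obj_destroy(o);			/* blocks until every worker has put */
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return started;
}
```

Holding a lock around every reference drop has a cost, which is presumably why the thread goes on to debate whether this is the right fix; the sketch only shows why the locked variant closes the window.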
-- MST From eitan at mellanox.co.il Mon May 8 09:14:16 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 8 May 2006 19:14:16 +0300 Subject: [openib-general] Dump and load routes with opensm? Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB26@mtlexch01.mtl.com> > > > > > > > > A plan for OpenSM to support loading unicast routes already exists. > > > > > > What's the timeframe for implementing this ? Has the implementation > > > already started ? Is it what you write below and had posted to the > > list > > > last summer ? > > [EZ] Yes: it is the same proposal. No: the work has not started. > > > > > > > To do it we need to develop a scheme where the routes file also > > holds > > > > the topology using GUIDs > > > > such that the discovered topology is compared to the saved one. I > > have > > > > outlined the algorithm in > > > > the attached writeup on OpenSM routing algorithms section 6: > > > > "Incremental Algorithms". > > > > > > Wouldn't it be better to separate out the persistency and incremental > > > issues ? > > [EZ] I think that once you load the routes in the file you probably need > > to compare the topology used for generating them with the current one. > > Absolutely. > > > The complexity level of "incremental" support - i.e. the ability to > > gracefully handle some topology change - might be built with time. So > > first implementation might be limited to compare with reference topology > > and reroute on any change. > > The reroute on change might ultimately be the default but there might be > an option to disable this. [EZ] OK I see. But this means you intentionally agree to have partial connectivity (i.e. no route from every host to every other host). > > > But it is a bit crude in my mind. > > Yes. It would allow for the experimentation I think they desire though. > > > > Also, I sent numerous questions and comments on this writeup some time > > > ago. > > [EZ] I thought I did answer them. 
> > I don't think so but I need to dig this out of my email. It's been quite > a while... [EZ] I'll try my archive too > > > > > The right place to implement this will be to have the osm_db* > > enhanced > > > > to support this new DB domain. > > > > Then the osm_ucast_mgr.c will need to initialize the DB and use it > > while > > > > routing. > > > > > > Also, changes versus this either need disabling or handling. > > [EZ] Not following you here. If you mean we need to be able to "shut > > off" this new capability - sure. > > I was referring to more than this (in terms of disabling the reroute on > change). [EZ] OK. > > -- Hal > > > > -- Hal > > > > > > > Regarding dump of existing tables: > > > > > > > > If you run opensm -V or -D 0x43 you should get the file > > /tmp/osm.fdbs > > > > with that dump. > > > > You can also use a more SM independent method for obtaining the > > tables : > > > > if you run ibdiagnet you should get the file /tmp/ibdiagnet.fdbs > > with > > > > similar format. > > > > > > > > Eitan > > > > > > > > Greg Johnson wrote: > > > > > On Thu, May 04, 2006 at 12:39:38PM -0400, Hal Rosenstock wrote: > > > > > > > > > >>Hi Greg, > > > > >> > > > > >>On Thu, 2006-05-04 at 12:34, Greg Johnson wrote: > > > > >> > > > > >>>Is there currently a way to dump and load routes with opensm? If > > > > not, > > > > >>>how would I go about writing one? > > > > >> > > > > >>Is it really routes or stable LIDs you want ? > > > > > > > > > > > > > > > I actually want routes. I have queried them with ibtraceroute and > > > > > ibroute, but we need routes for the whole fabric. > > > > > > > > > > BTW, if you call ibtraceroute thousands of times it stops working. > > > > > Maybe a problem in the MAD driver? > > > > > > > > > > > > > > > > > > > >>LIDs are stored in /var/cache/osm/guid2lid and restored from there > > > > when > > > > >>OpenSM is started assuming the reassign LIDs option (-r or > > > > >>--reassign_lids) is not used when invoking OpenSM. 
> > > > > > > > > > > > > > > Thanks, that's good to know. > > > > > > > > > > Greg > > > > > _______________________________________________ > > > > > openib-general mailing list > > > > > openib-general at openib.org > > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > To unsubscribe, please visit > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > > > > > > > > > > > > > > > > ______________________________________________________________________ > > > > > > > > OpenSM Unicast Routing Enhancements for Scalability > > > > ===================================================== > > > > > > > > Authors:Eitan Zahavi , Yael Kalka > > > > > > Date: Aug 2005. > > > > > > > > Table of contents: > > > > 1. Overview > > > > 2. Notation > > > > 3. Current Algorithms > > > > 4. Proposed Routing Algorithms > > > > 5. Min Hop Tables Implementation > > > > 6. Incremental Routing > > > > 7. Routing Persistancy > > > > > > > > 1. Overview: > > > > ------------ > > > > > > > > OpenSM currently uses a two stage routing algorithm for unicast > > > > forwarding tables calculation. As shown later these algorithm are > > > > O(N^3). Inspected run time of OpenSM routing stage was ~2.5min on > > 1100 > > > > nodes cluster. The purpose of this memo is to present the community > > > > the proposed work for enhancing OpenSM routing engine. > > > > > > > > 2. Notation: > > > > ------------ > > > > The following notations are used throughout this document: > > > > S = Number of switch devices in the system > > > > P = Number of ports each switch node has > > > > H = Number of HCA ports connected to the fabric > > > > L = Number of HCAs connected to each Leaf switch device. > > > > Normal values are 1/2P to 3/4P > > > > D = Fat Tree depth > > > > > > > > 3. Current Algorithms: > > > > ---------------------- > > > > OpenSM provide two routing algorithms: Minimal Hop and Up/Down. 
Both > > > > of them do not scale with cluster size and can consume both large > > run > > > > time (minutes) and memory (GB). This section provides meta code for > > > > these algorithms and order calculation. > > > > > > > > 3.1 Min Hop algorithm analysis: > > > > The Min Hop algorithm is divided into two stages: computation of > > > > min-hop tables on every switch and LFT output port assignment. > > > > > > > > Step 1: Computation of Min-Hop Tables on each switch > > > > > > > > The memory consumed is S*(S+H)*(P+2)*Byte. On 10K nodes cluster with > > > > 2500 switch devices this ends up as 812M-Byte (using LMC=0). > > > > > > > > Meta algorithm: > > > > For each HCA mark its remote neighbor switch port with hop 1. > > > > For each switch mark itself port 0 as hop 0 > > > > While changes > > > > For each switch > > > > For each port > > > > For each LID > > > > Propagate remote port hop as hop +1 if smaller or undefined > > > > > > > > The order of this step: O(S*P*(S+H)*(D+1)) > > > > > > > > Step 2: Assigning output port: > > > > For each switch > > > > For each LID > > > > For each Port > > > > Is it the one with min count > > > > > > > > The order of this step: O(S*(S+H)*P) > > > > > > > > 3.2 Up/Down algorithm analysis > > > > The Up/Down algorithm depends on the ability to rank the fabric > > nodes > > > > from root to leaf of the tree. To get that ranking it runs a > > > > heuristics that is based on the Min Hop tables. So the memory and > > > > complexity are identical to the Min-Hop first step to start with. > > > > > > > > Once ranking is performed the algorithm is BFS from every HCA and > > fill > > > > in the Min Hop tables again. Up/Down traversal rule is enforced > > during > > > > the BFS such that only valid turns are allowed. > > > > > > > > Meta algorithm: > > > > For each HCA > > > > Get connected Switch > > > > For each Switch in NextSwitches > > > > For each Port > > > > Check if direction is OK. 
Check if not visited > > > > > > > > The order is O(H*S*P) > > > > > > > > To finalize output port assignment the second step of the Min Hop > > > > algorithm is invoked. > > > > > > > > > > > > 4. Proposed Algorithm: > > > > ---------------------- > > > > > > > > Inspecting the routing problem we have noticed the following > > > > attributes: > > > > a. Using Min-Hop tables for keeping intermediate routing > > information > > > > has a disadvantage in terms of memory consumption. However, any > > > > incremental routing algorithm (for handling fabric changes after > > > > first setup), or routing persistence solution could use this > > > > information and gain speed. > > > > b. Since we need to fill in LFT tables that are of the order S*(S+H) > > > > the algorithm is lower bounded by O(H^2). > > > > c. A persistence based solution which uses previously routed fabric > > > > data and is able to handle simple incremental changes will > > provide > > > > a much faster runtime as topology match will require O(S*P) > > > > (traversing all links once) > > > > d. Since the minimum hops information is identical for a switch and > > > > all the HCAs connected to it - there is no point in building "min > > > > hop" tables for HCAs. During the "output port" assignment stage, > > > > the HCAs connected to each switch are considered and routed. > > > > > > > > The result of "a" is that several algorithms that are superior from > > > > memory footprint and skip any "hop table" stage are not considered > > for > > > > implementation. > > > > > > > > To support "d" we needed to provide an index to each switch such > > that > > > > the "min hop" tables are dense (previously they were indexed by > > LID). > > > > The new index is stored on the switch object and thus allow lookup > > of > > > > a switch "min hop" by its index. An array of switch pointer by index > > > > supports the reverse lookup. 
> > > > > > > > The suggested algorithm is broken into the following 3 stages: > > > > * Root nodes identification heuristics > > > > * Min Hop tables computation > > > > * Output port assignment > > > > > > > > 4.1 Root nodes identification heuristics: > > > > This step is only required under the AND of the following two > > > > conditions: > > > > * Up/Down routing is required > > > > * The user does not provide a file with guids of the tree "root > > nodes" > > > > > > > > This heuristics for recognizing the tree roots is based on > > histograms > > > > of the HCAs distance from every switch. > > > > I.e. How many HCAs are 1 hop, 2 hops from the switch. In order to > > fill > > > > in these histograms on all switches we need to BFS from every leaf > > > > switch and propagate the number of HCAs connected to it: > > > > > > > > Meta algorithm: > > > > For each switch > > > > For each Port > > > > If connected to HCAs count them > > > > If any HCA > > > > Init BFS to start with current switch > > > > set hop count to 0 > > > > While there are switches in BFS list > > > > increment hop count > > > > For each switch in BFS List > > > > Add the number of HCAs to the histogram at the current hop > > count > > > > For each port > > > > If remote port switch not visited > > > > Add the switch to the BFS Next Step List > > > > Once finished all this step list use next steps > > > > > > > > The order of this step is: O(S*P + H/L*S*P) = O(*H*S) > > > > > > > > 4.2 Min Hop tables computation: > > > > This step is mandatory and has a slightly different flavor for the > > > > case of Up/Down routing. > > > > > > > > The algorithm starts from every leaf switch and traverses BFS wise > > > > through the fabric. > > > > > > > > Meta algorithm: > > > > foreach switch in the fabric > > > > clear the "Rank" vector for all switches. > > > > start BFS with the given switch. 
> > > > set rank to 0 > > > > while any switch in BFS list > > > > | foreach switch in BFS switch list > > > > | |foreach port (valid, active, not unhealthy) > > > > | | if remote side is a switch: > > > > | | if rank of remote side 0 or = rank + 1 > > > > | | set the remote port entry MinHopPort for this switch > > > > | | if rank of remote side 0 i.e. never visited > > > > | | set the remote switch rank to rank + 1 > > > > | | add the remote switch to next BFS switches > > > > | |------ > > > > | switch between the current and next switches list > > > > | increment rank > > > > |------ > > > > > > > > The order of this algorithm: O(P*S^2) > > > > > > > > Algorithm that merges Up/Down step criteria not yet written for this > > > > stage. But the idea is to make each step keep track of any previous > > > > step down. Such that a step up will be prohibited in this case. > > > > > > > > 4.3 Output port assignment: > > > > This step provide actual LFT values assignment to each switch. > > > > To do that we access the built "min hop" tables and track port > > usage. > > > > > > > > Meta algorithm: > > > > foreach switch in the fabric > > > > clear the port subscription vector (track number of paths > > subscribed) > > > > foreach target switch in the "min hop" table > > > > get the list of min-hop ports > > > > foreach end-node attached (HCA connected to it and itself) > > > > if lmc > 0 init tracking of remote system and node for selecting > > > > disjoint paths for same end-node different LID LSBs > > > > get min-subscribed (and disjoint) port marked min-hop target > > switch > > > > track port usage in port-subscription (opt. 
if target LID is not > > a switch) > > > > > > > > Order of this step: > > > > Currently the selection of the output port by min-subscription is > > > > trivial and requires O(P) so the overall order is > > > > O(S*S*(1+L)*P) <= O(16*P*S^2) > > > > > > > > One could obtain the list of marked min-hop ports and then use a > > > > modified cyclic list for avoiding the search for min subscription in > > > > the case of LMC > 0. In that case the order could be reduced to: > > > > O(S*S*(1+L)) ~= O(S*H) > > > > > > > > 5. Min Hop Table Implementation: > > > > -------------------------------- > > > > The proposed algorithm does not require storing the number of hops > > > > arriving at the switch port - but only the fact a port is on the min > > > > hop path. This allows a further memory reduction if the min > > > > hop table holds boolean values. > > > > > > > > The issue then is finding an efficient iterator over the boolean (bits) > > > > array. The tradeoff is thus the usual one of memory versus runtime. > > > > > > > > (Does anybody know of a fast boolean array lookup implementation?) > > > > > > > > 6. Incremental Routing: > > > > ----------------------- > > > > Once the fabric is routed we can define an algorithm for performing > > > > incremental routing changes. An obvious case is when a link is > > > > declared unhealthy or one of the ports is dropped. Assuming the > > > > change is recognized by some other algorithm, the > > > > following cases apply: > > > > * If the link connects an HCA and a switch, the HCA is unreachable. No > > > > routing change required. > > > > * If the link is between switches: > > > > * If there is at least one other link between these switches: > > > > o Spread all routes going through the failing port to the other > > > > ports connecting to the same switch.
> > > > * If there is no other link between these switches > > > > o Go back to all switches that feed into each one of the > > switches > > > > (feed in means they route some target lids through the > > switch) > > > > but only those that route lids that go through the failing > > port. Check > > > > to see if there is another port that goes to a different > > switch > > > > to route that lid to. If there is no other way do nothing. > > > > > > > > How do we support topology changes like moving an HCA from one > > Switch > > > > to another? > > > > > > > > 7. Routing Persistence: > > > > ----------------------- > > > > To make the subnet initialization faster, one could store the > > existing > > > > routing solution and use it without any calculation. > > > > > > > > The issue is of course which conditions make the stored routing > > > > obsolete. To maximize the usefulness of the stored information we > > > > propose to store the Min Hop tables rather than the final port > > > > assignment. It is assumed that after restart there might be a need > > for > > > > modifications to LMC and routing which will invalidate the LFT > > > > anyway. To enable "cache invalidation criteria" the persistent > > > > database should include information that could be used to easily > > check > > > > if the fabric was not altered in a way that invalidates the MinHop > > tables. > > > > > > > > The stored information should hold for each switch in the fabric (by > > guid) > > > > the list of ports and the guids and port numbers on the remote side. > > > > To validate there are no significant changes, the discovered set > > > > of switches is checked to match the stored information. Table 1 > > > > describes the possible changes and their effect on the validity of > > > > the MinHop tables.
> > > >
> > > > Table 1 - Effect of Connectivity Changes on Routing Info Validity
> > > >
> > > > Change           | Effect on MinHop Tables     | Effect on LFT and MFT    |
> > > > ---------------------------------------------------------------------------
> > > > New Switch found | Invalidates (might connect  | Invalidates              |
> > > >                  | more HCAs and carries more  |                          |
> > > >                  | routing resources)          |                          |
> > > > ---------------------------------------------------------------------------
> > > > Missing Switch   | Invalidates (MinHops might  | Invalidates              |
> > > >                  | be broken a few steps away) |                          |
> > > > ---------------------------------------------------------------------------
> > > > New Cable found  | Does not invalidate         | Does not invalidate      |
> > > > ---------------------------------------------------------------------------
> > > > Missing Cable    | Invalidates only if there   | Invalidates all LIDs     |
> > > > (SW to SW)       | is no other cable           | going through that port  |
> > > >                  | connecting the switches     |                          |
> > > > ---------------------------------------------------------------------------
> > > > Missing Cable    | Does not invalidate         | Does not invalidate      |
> > > > (SW to HCA)      |                             |                          |
> > > > ---------------------------------------------------------------------------
> > > > New HCA          | Does not invalidate         | Does not invalidate      |
> > > > ---------------------------------------------------------------------------
> > > > Missing HCA      | Does not invalidate         | Does not invalidate      |
> > > > ---------------------------------------------------------------------------
> > > > LID Changes      | Does not invalidate         | Invalidates the modified |
> > > >                  |                             | LIDs                     |
> > > > ---------------------------------------------------------------------------
> > > >
> > > > Special marking for "root nodes" should cache the results of the first
> > > > step for Up/Down routing.
These nodes should be invalidated on any > > > > missing or additional switch conditions. > > > > > > > > From tom at opengridcomputing.com Mon May 8 09:15:19 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Mon, 08 May 2006 11:15:19 -0500 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508160713.GK21036@mellanox.co.il> References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> <20060508160713.GK21036@mellanox.co.il> Message-ID: <1147104919.15524.16.camel@trinity.ogc.int> sry...i responded to Roland's Q without reading the whole thread -- the pox on me! On Mon, 2006-05-08 at 19:07 +0300, Michael S. Tsirkin wrote: > Quoting r. Tom Tucker : > > Subject: Re: [openib-general] Re: [PATCH] cm refcount race fix > > > > On Mon, 2006-05-08 at 08:53 -0700, Roland Dreier wrote: > > > > + /* Make sure cm_deref_id is not in progress */ > > > > + spin_lock_irq(&cm_id_priv->lock); > > > > + spin_unlock_irq(&cm_id_priv->lock); > > > > > > How does this help anything? cm_deref_id() can still be in progress, > > > it just can't hold the lock during this empty section of code. > > > > > > > Maybe Michael means this... > > > > > > T1 > > > > spin_lock_irq(&obj->lock); > > if (atomic_dec_and_test(&obj->refcount)) > > wake_up(&obj->wait); > > spin_unlock_irq(&obj->lock); > > > > T2 > > > > atomic_dec(&obj->refcount); > > wait_event(&obj->wait, !atomic_read(&obj->refcount)) > > spin_lock_irq(&obj->lock); > > spin_unlock_irq(&obj->lock); > > kfree(obj); > > > > Right, that's what the patch does. No? 
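A minimal user-space sketch of the T1/T2 sequence above, assuming POSIX threads stand in for the kernel spinlock and wait queue. The names struct obj, obj_put, obj_destroy and run_demo are illustrative and are not taken from cm.c. Here the wait is a condition wait that re-acquires the lock before returning, so the separate empty lock/unlock pair is folded into the wait itself; the destroyer cannot get past it while the last obj_put() still holds the lock:

```c
#include <pthread.h>
#include <stdlib.h>

struct obj {
    pthread_mutex_t lock;
    pthread_cond_t wait;
    int refcount;           /* protected by lock in this sketch */
};

struct obj *obj_create(void)
{
    struct obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    pthread_mutex_init(&o->lock, NULL);
    pthread_cond_init(&o->wait, NULL);
    o->refcount = 1;        /* the creator's reference */
    return o;
}

void obj_get(struct obj *o)
{
    pthread_mutex_lock(&o->lock);
    o->refcount++;
    pthread_mutex_unlock(&o->lock);
}

/* T1: drop a reference; decrement and wake-up happen under the lock. */
void obj_put(struct obj *o)
{
    pthread_mutex_lock(&o->lock);
    if (--o->refcount == 0)
        pthread_cond_broadcast(&o->wait);
    pthread_mutex_unlock(&o->lock);
    /* o must not be touched after the unlock: it may already be freed. */
}

/* T2: drop the creator's reference, wait for the rest, then free. */
void obj_destroy(struct obj *o)
{
    pthread_mutex_lock(&o->lock);
    o->refcount--;
    while (o->refcount != 0)
        pthread_cond_wait(&o->wait, &o->lock);
    /* Holding the lock here means no obj_put() is still in progress. */
    pthread_mutex_unlock(&o->lock);
    pthread_mutex_destroy(&o->lock);
    pthread_cond_destroy(&o->wait);
    free(o);
}

static void *worker(void *arg)
{
    obj_put(arg);
    return NULL;
}

/* One worker releases a reference while the main thread destroys. */
int run_demo(void)
{
    pthread_t t;
    struct obj *o = obj_create();

    if (!o)
        return -1;
    obj_get(o);             /* reference handed to the worker */
    if (pthread_create(&t, NULL, worker, o) != 0) {
        obj_put(o);         /* give back the worker's reference */
        obj_destroy(o);
        return -1;
    }
    obj_destroy(o);
    pthread_join(t, NULL);
    return 0;
}
```

The sketch relies on the POSIX guarantee that a mutex may be destroyed as soon as it has been unlocked; whether the same ordering argument carries over to any particular kernel primitive is not something this example establishes.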
> From rdreier at cisco.com Mon May 8 09:15:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 09:15:13 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <1147104214.15524.13.camel@trinity.ogc.int> (Tom Tucker's message of "Mon, 08 May 2006 11:03:34 -0500") References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> Message-ID: > atomic_dec(&obj->refcount); > wait_event(&obj->wait, !atomic_read(&obj->refcount)) > spin_lock_irq(&obj->lock); > spin_unlock_irq(&obj->lock); > kfree(obj); Yeah, that seems to work. I wonder if there's a cleaner way though -- this sort of empty locked section is not exactly idiomatic. - R. From mst at mellanox.co.il Mon May 8 09:19:47 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 19:19:47 +0300 Subject: [openib-general] Re: Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> Message-ID: <20060508161947.GN21036@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Re: [PATCH] cm refcount race fix > > > atomic_dec(&obj->refcount); > > wait_event(&obj->wait, !atomic_read(&obj->refcount)) > > spin_lock_irq(&obj->lock); > > spin_unlock_irq(&obj->lock); > > kfree(obj); > > Yeah, that seems to work. I wonder if there's a cleaner way though -- > this sort of empty locked section is not exactly idiomatic. We can change refcount from atomic to a simple integer, protected by lock. And then wait_event(&obj->wait, ({ spin_lock_irq(&obj->lock); count = obj->refcount; spin_unlock_irq(&obj->lock); count; }) It's a big change though. 
-- MST From vuhuong at mellanox.com Mon May 8 09:19:32 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Mon, 08 May 2006 09:19:32 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: References: <44450F89.7020500@mellanox.com> <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com> <445BA0E8.7010104@mellanox.com> <445BDACE.8000205@mellanox.com> <445C3C5E.9080207@mellanox.com> Message-ID: <445F6F94.6000701@mellanox.com> Roland Dreier wrote: > > 1st scsi_try_host_reset() --> srp_host_reset() --> > > srp_reconnect_target() return SUCCESS. Then scsi_eh_try_stu() or > > scsi_eh_tur() is called right after > > > > scsi_eh_try_stu or scsi_eh_tur --> scsi_send_eh_cmnd() --> > > srp_queuecommand() > > But after srp_reconnect_target(), both SRP's and the midlayer's queue > of pending commands should be completely empty, since I put > > list_for_each_entry(req, &target->req_queue, list) { > req->scmnd->result = DID_RESET << 16; > req->scmnd->scsi_done(req->scmnd); > srp_unmap_data(req->scmnd, target, req); > } > > and > > INIT_LIST_HEAD(&target->free_reqs); > INIT_LIST_HEAD(&target->req_queue); > for (i = 0; i < SRP_SQ_SIZE; ++i) > list_add_tail(&target->req_ring[i].list, &target->free_reqs); > > in there. Why doesn't that work to kill all the pending commands? That works fine and kills all the pending commands; however right after srp_host_reset return, scsi error handling queue/send the stu or tur scsi command right away in the error handling flow of function scsi_eh_host_reset() Please re-read scsi_eh_host_reset() and scsi_try_host_reset() in scsi_error.c. Here is the logic scsi_eh_host_reset() --> scsi_try_host_reset() --> srp_host_reset() --- all pending command are killed. srp_host_reset() returns SUCCESS, scsi_try_host_reset() returns SUCCCESS. 
static int scsi_eh_host_reset(struct list_head *work_q, struct list_head *done_q) { ... rtn = scsi_try_host_reset(scmd); if (rtn == SUCCESS) { list_for_each_entry_safe(scmd, next, work_q, eh_entry) { if (!scsi_device_online(scmd->device) || (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) || !scsi_eh_tur(scmd)) ... } Since the (rtn == SUCCESS), scsi_eh_host_reset calls scsi_eh_try_stu() or scsi_eh_try_tur() which will call scsi_send_eh_cmnd() --> srp_queuecommand(). Now srp's request queue is not empty anymore. scsi_eh_try_stu or scsi_eh_try_tur get timeout, scsi midlayer tried to abort stu or tur command as well. Since we delay to clean in srp_reset_device(), srp's request queue is still not empty. This stu or tur command is freed by scsi midlayer. The next srp_host_reset() will try to clean srp's request queue with "old" request referencing to freed scsi command. If you still have question, I can call you or give me a call at (408) 916-0006 Vu From sean.hefty at intel.com Mon May 8 09:14:34 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 8 May 2006 09:14:34 -0700 Subject: [openib-general] Re: [PATCH] update uDAPL openib_cma provider to work with new uCMA event channels In-Reply-To: Message-ID: >> Sync up with Sean on commits. > >I'm watching for Sean's commit. Did I miss it? No - I will commit by noon PST today. - Sean From mst at mellanox.co.il Mon May 8 09:23:34 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 19:23:34 +0300 Subject: [openib-general] Re: Re: [PATCH] cm refcount race fix In-Reply-To: <20060508161947.GN21036@mellanox.co.il> References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> <20060508161947.GN21036@mellanox.co.il> Message-ID: <20060508162334.GO21036@mellanox.co.il> Quoting r. Michael S. Tsirkin : > We can change refcount from atomic to a simple integer, protected > by lock. 
And then > > wait_event(&obj->wait, ({ > spin_lock_irq(&obj->lock); > count = obj->refcount; > spin_unlock_irq(&obj->lock); > count; > }) Make that wait_event(&obj->wait, ({ spin_lock_irq(&obj->lock); count = obj->refcount; spin_unlock_irq(&obj->lock); !count; }) -- MST From rdreier at cisco.com Mon May 8 09:31:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 09:31:17 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <445F6F94.6000701@mellanox.com> (Vu Pham's message of "Mon, 08 May 2006 09:19:32 -0700") References: <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com> <445BA0E8.7010104@mellanox.com> <445BDACE.8000205@mellanox.com> <445C3C5E.9080207@mellanox.com> <445F6F94.6000701@mellanox.com> Message-ID: Vu> scsi_eh_try_stu or scsi_eh_try_tur get timeout, scsi midlayer Vu> tried to abort stu or tur command as well. Since we delay to Vu> clean in srp_reset_device(), srp's request queue is still not Vu> empty. This stu or tur command is freed by scsi midlayer. The Vu> next srp_host_reset() will try to clean srp's request queue Vu> with "old" request referencing to freed scsi command. This is where I get confused. We should be flushing out the command queue in srp_host_reset(), so the loop in scsi_error.c after resetting the host: rtn = scsi_try_host_reset(scmd); if (rtn == SUCCESS) { list_for_each_entry_safe(scmd, next, work_q, eh_entry) { if (!scsi_device_online(scmd->device) || (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) || !scsi_eh_tur(scmd)) scsi_eh_finish_cmd(scmd, done_q); } should not find any commands still queued. Your previous patch can't be the right fix. 
I think there are two things wrong with the changes below to srp_reset_device(): - You changed srp_abort to remove the request from SRP's queue, but then you look it up and use it again in srp_reset_device, which seems risky at best. - If srp_reset_device() succeeds, you don't flush all matching commands, so this will definitely leave some stale commands in SRP's queue. static int srp_reset_device(struct scsi_cmnd *scmnd) { struct srp_target_port *target = host_to_target(scmnd->device->host); - struct srp_request *req, *tmp; + struct srp_request *req; + int ret = SUCCESS; printk(KERN_ERR "SRP reset_device called\n"); - if (srp_find_req(target, scmnd, &req)) - return FAILED; - if (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET)) - return FAILED; - if (req->tsk_status) - return FAILED; - - spin_lock_irq(target->scsi_host->host_lock); - - list_for_each_entry_safe(req, tmp, &target->req_queue, list) - if (req->scmnd->device == scmnd->device) { - req->scmnd->result = DID_RESET << 16; - scmnd->scsi_done(scmnd); - srp_remove_req(target, req); - } - - spin_unlock_irq(target->scsi_host->host_lock); + if ((srp_find_req(target, scmnd, &req)) || + (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET)) || + (req->tsk_status)) + ret = FAILED; - return SUCCESS; + return ret; } From rdreier at cisco.com Mon May 8 09:39:55 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 09:39:55 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508160614.GJ21036@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 8 May 2006 19:06:14 +0300") References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> Message-ID: Michael> cm_deref_id does nothing outside the lock. 
But you could still have: /* Make sure cm_deref_id is not in progress */ spin_lock_irq(&cm_id_priv->lock); spin_unlock_irq(&cm_id_priv->lock); spin_lock_irqsave(&cm_id_priv->lock, flags); if (atomic_dec_and_test(&cm_id_priv->refcount)) wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); kfree(cm_id_priv->compare_data); wake_up(&cm_id_priv->wait); spin_unlock_irqrestore(&cm_id_priv->lock, flags); - R. From rdreier at cisco.com Mon May 8 09:41:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 09:41:36 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508162334.GO21036@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 8 May 2006 19:23:34 +0300") References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> <20060508161947.GN21036@mellanox.co.il> <20060508162334.GO21036@mellanox.co.il> Message-ID: Good idea. I think with static inline int get_obj_refcount(struct foo *obj) { int c; spin_lock_irq(&obj->lock); c = obj->refcount; spin_unlock_irq(&obj->lock); return c; } then wait_event(&obj->wait, !get_obj_refcount(&obj)); looks like a pretty clean solution. - R. From mst at mellanox.co.il Mon May 8 09:44:36 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 19:44:36 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> Message-ID: <20060508164436.GP21036@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] cm refcount race fix > > Michael> cm_deref_id does nothing outside the lock. 
> > But you could still have: > > /* Make sure cm_deref_id is not in progress */ > spin_lock_irq(&cm_id_priv->lock); > spin_unlock_irq(&cm_id_priv->lock); > > spin_lock_irqsave(&cm_id_priv->lock, flags); > if (atomic_dec_and_test(&cm_id_priv->refcount)) > > wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); > > while ((work = cm_dequeue_work(cm_id_priv)) != NULL) > cm_free_work(work); > kfree(cm_id_priv->compare_data); > > wake_up(&cm_id_priv->wait); > spin_unlock_irqrestore(&cm_id_priv->lock, flags); Now I am confused. That's not what my patch does. http://openib.org/pipermail/openib-general/2006-May/021040.html I do lock/unlock *after* refcount is 0: after wait_event. Please tell me what I am missing. -- MST From mst at mellanox.co.il Mon May 8 09:46:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 19:46:54 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> <20060508161947.GN21036@mellanox.co.il> <20060508162334.GO21036@mellanox.co.il> Message-ID: <20060508164654.GQ21036@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] cm refcount race fix > > Good idea. I think with > > static inline int get_obj_refcount(struct foo *obj) > { > int c; > > spin_lock_irq(&obj->lock); > c = obj->refcount; > spin_unlock_irq(&obj->lock); > > return c; > } > > then > > wait_event(&obj->wait, !get_obj_refcount(&obj)); > > looks like a pretty clean solution. Right, fine, but if we change refcount from atomic to int, we still need to edit all code and replace atomic_inc/atomic_dec with lock/modify/unlock. Would static inline int get_obj_refcount(struct foo *obj) { int c; spin_lock_irq(&obj->lock); c = atomic_read(&obj->refcount); spin_unlock_irq(&obj->lock); return c; } be acceptable? 
-- MST From rdreier at cisco.com Mon May 8 09:47:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 09:47:32 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508164436.GP21036@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 8 May 2006 19:44:36 +0300") References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> Message-ID: Michael> I do lock/unlock *after* refcount is 0: after wait_event. Michael> Please tell me what I am missing. Sorry, I misread your patch and then miscopied it somehow. You're right, it would work. But I think changing atomic_t to an integer protected by a lock it much cleaner anyway. - R. From vuhuong at mellanox.com Mon May 8 09:50:08 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Mon, 08 May 2006 09:50:08 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: References: <44451D32.1010106@mellanox.com> <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com> <445BA0E8.7010104@mellanox.com> <445BDACE.8000205@mellanox.com> <445C3C5E.9080207@mellanox.com> <445F6F94.6000701@mellanox.com> Message-ID: <445F76C0.3080905@mellanox.com> Roland Dreier wrote: > Vu> scsi_eh_try_stu or scsi_eh_try_tur get timeout, scsi midlayer > Vu> tried to abort stu or tur command as well. Since we delay to > Vu> clean in srp_reset_device(), srp's request queue is still not > Vu> empty. This stu or tur command is freed by scsi midlayer. The > Vu> next srp_host_reset() will try to clean srp's request queue > Vu> with "old" request referencing to freed scsi command. > > This is where I get confused. 
We should be flushing out the command > queue in srp_host_reset(), so the loop in scsi_error.c after resetting > the host: > > rtn = scsi_try_host_reset(scmd); > if (rtn == SUCCESS) { > list_for_each_entry_safe(scmd, next, work_q, eh_entry) { > if (!scsi_device_online(scmd->device) || > (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) || > !scsi_eh_tur(scmd)) > scsi_eh_finish_cmd(scmd, done_q); > } > > should not find any commands still queued. > Have you read scsi_eh_try_stu(scmnd) and scsi_eh_tur(scmnd)? These functions reuse the same scmnd, reformat it with a new cdb, and call srp_queuecommand(), which takes a new req and puts it on the request queue for this same scmnd with a different cdb. > Your previous patch can't be the right fix. I think there are two > things wrong with the changes below to srp_reset_device(): > > - You changed srp_abort to remove the request from SRP's queue, but > then you look it up and use it again in srp_reset_device, which > seems risky at best. I agree. Instead of looking up and reusing the same req's buffer to send the task management packet, I need a new buffer for it. > - If srp_reset_device() succeeds, you don't flush all matching > commands, so this will definitely leave some stale commands in > SRP's queue. > I already guarantee that all commands are flushed in srp_abort. Vu From mshefty at ichips.intel.com Mon May 8 09:51:27 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 09:51:27 -0700 Subject: [openib-general] rdma_cm.h: comment nits. In-Reply-To: <20060508135855.GE21036@mellanox.co.il> References: <20060508135855.GE21036@mellanox.co.il> Message-ID: <445F770F.6080703@ichips.intel.com> Michael S. Tsirkin wrote: > Two nits wrt rdma_cm.h: > > /** > * * rdma_reject - Called on the passive side to reject a connection request. > */ > > > It's OK to call rdma_reject on active side as well, isn't it? Yes - but only for users that are managing the QP states themselves.
> /** > * rdma_cm_event_handler - Callback used to report user events. > * > * Notes: Users may not call rdma_destroy_id from this callback to destroy > * the passed in id, or a corresponding listen id. Returning a > * non-zero value from the callback will destroy the corresponding id. > */ > > CMA will actually always destroy the passed in id, not the "corresponding id". Correct - the passed in id is destroyed. I will update the comments. - Sean From rdreier at cisco.com Mon May 8 09:54:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 09:54:07 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508164654.GQ21036@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 8 May 2006 19:46:54 +0300") References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> <20060508161947.GN21036@mellanox.co.il> <20060508162334.GO21036@mellanox.co.il> <20060508164654.GQ21036@mellanox.co.il> Message-ID: > static inline int get_obj_refcount(struct foo *obj) > { > int c; > > spin_lock_irq(&obj->lock); > c = atomic_read(&obj->refcount); > spin_unlock_irq(&obj->lock); > > return c; > } > > be acceptable? That looks pretty silly to me. Especially since you need the lock around the atomic_dec too. Changing the refcount to an int doesn't seem like that daunting a change to make... - R. From mst at mellanox.co.il Mon May 8 09:56:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 19:56:25 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> Message-ID: <20060508165625.GR21036@mellanox.co.il> Quoting r. Roland Dreier : > But I think changing atomic_t to an integer protected by a lock it > much cleaner anyway. We'll have to change each and every access to the locked version then though. It would be a big risky change. 
How about the following compromise? ---- Fix race condition in CM: a use after free occurs if ib_destroy_cm_id tests the refcount after cm_deref_id has decremented the reference count but before it has called wake_up. Signed-off-by: Michael S. Tsirkin Index: linux-2.6.16/drivers/infiniband/core/cm.c =================================================================== --- linux-2.6.16.orig/drivers/infiniband/core/cm.c 2006-05-08 19:16:10.000000000 +0300 +++ linux-2.6.16/drivers/infiniband/core/cm.c 2006-05-08 19:19:05.000000000 +0300 @@ -159,8 +159,12 @@ static void cm_work_handler(void *data); static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { + unsigned long flags; + + spin_lock_irqsave(&cm_id_priv->lock, flags); if (atomic_dec_and_test(&cm_id_priv->refcount)) wake_up(&cm_id_priv->wait); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); } static int cm_alloc_msg(struct cm_id_private *cm_id_priv, @@ -710,6 +714,16 @@ static void cm_reset_to_idle(struct cm_i } } +static int cm_get_count(struct cm_id_private *cm_id_priv) +{ + int count; + /* Lock makes sure cm_deref_id is not in progress */ + spin_lock_irq(&cm_id_priv->lock); + count = atomic_read(&cm_id_priv->refcount); + spin_unlock_irq(&cm_id_priv->lock); + return count; +} + void ib_destroy_cm_id(struct ib_cm_id *cm_id) { struct cm_id_private *cm_id_priv; @@ -777,7 +791,8 @@ retest: cm_free_id(cm_id->local_id); atomic_dec(&cm_id_priv->refcount); - wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); + wait_event(cm_id_priv->wait, !cm_get_count(cm_id_priv)); + while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); kfree(cm_id_priv->compare_data); -- MST From rdreier at cisco.com Mon May 8 09:57:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 09:57:08 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <445F76C0.3080905@mellanox.com> (Vu Pham's message of "Mon, 08 May 2006 09:50:08 -0700") References: 
<444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com> <445BA0E8.7010104@mellanox.com> <445BDACE.8000205@mellanox.com> <445C3C5E.9080207@mellanox.com> <445F6F94.6000701@mellanox.com> <445F76C0.3080905@mellanox.com> Message-ID: Vu> Have you read scsi_eh_try_stu(scmnd) and scsi_eh_tur(scmnd)? Vu> These functions use the same scmnd and reformat it with new Vu> cdb and call srp_queuecommand() which uses new req and put Vu> this new req in request queue for this same scmnd with Vu> different cdb Yes, but why are there any commands in the work_q at all? In other words, why is this loop entered at all? list_for_each_entry_safe(scmd, next, work_q, eh_entry) { if (!scsi_device_online(scmd->device) || (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) || !scsi_eh_tur(scmd)) scsi_eh_finish_cmd(scmd, done_q); } srp_reconnect_target() should get rid of all queued commands already: list_for_each_entry(req, &target->req_queue, list) { req->scmnd->result = DID_RESET << 16; req->scmnd->scsi_done(req->scmnd); srp_unmap_data(req->scmnd, target, req); } why does the midlayer have any commands around after that loop? - R. From mshefty at ichips.intel.com Mon May 8 09:58:15 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 09:58:15 -0700 Subject: [openib-general] Re: CMA disconnect In-Reply-To: References: Message-ID: <445F78A7.2020401@ichips.intel.com> Or Gerlitz wrote: > Looking in the code i have realized that it is a must for the CMA > consumer to call rdma_disconnect to have the QP state moved into ERROR. Maybe it would make sense for the CMA to transition the QP to the error state before destroying it? > Am i correct? 
with this understanding at hand, i have changed iSER code to > call rdma_disconnect even if it got a DISCONNECTED event caused by the > passive side initiating the disconnect flow (ie sending a DREQ), since > otherwise in such case i never got the FLUSHES on the posted RX/TX WRs. You are correct, rdma_disconnect() is required to transition the QP into the error state. The code is written such that both sides call it. - Sean From rdreier at cisco.com Mon May 8 10:00:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 10:00:38 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508165625.GR21036@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 8 May 2006 19:56:25 +0300") References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> Message-ID: Michael> We'll have to change each and every access to the locked Michael> version then though. It would be a big risky change. How Michael> about the following compromize? I have to disagree. I don't think it's big or risky at all... there's only 11 uses of refcount in cm.c, and one of them is initialization. If we can't get that right then we should just give up. - R. From mshefty at ichips.intel.com Mon May 8 10:02:23 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 10:02:23 -0700 Subject: [openib-general] Re: question on rdma_disconnect In-Reply-To: References: Message-ID: <445F799F.6030907@ichips.intel.com> Or Gerlitz wrote: > + /* change the ib conn state only if the conn is UP, however always call > + * rdma_disconnect since this is the only way to cause the CMA to change > + * the QP state to ERROR > + */ I updated the comments in the header file to state that rdma_disconnect() transitions that QP into the error state. - Sean From mst at mellanox.co.il Mon May 8 10:06:54 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 8 May 2006 20:06:54 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> <20060508161947.GN21036@mellanox.co.il> <20060508162334.GO21036@mellanox.co.il> <20060508164654.GQ21036@mellanox.co.il> Message-ID: <20060508170654.GS21036@mellanox.co.il> Quoting r. Roland Dreier : > Changing the refcount to an int doesn't seem like that daunting a > change to make... My problem is, we have the same bug in: mthca mad_rmpp.c cma.c mad.c ucm.c ucma.c multicast.c So changing all refcounts from atomic to int and making sure no one keeps the lock when calling atomic_dec/atomic_inc is a big task. As opposed to this, the wake_up/wait_event idiom is easy to spot and just touching it is easier. And while the trivially correct variant that I posted would be 2.6.17 material (or even -stable if I start running into this crash a lot), this bigger change seems too risky. But - if you are up to it ... Patch? -- MST From mst at mellanox.co.il Mon May 8 10:10:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 20:10:44 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> Message-ID: <20060508171044.GT21036@mellanox.co.il> Quoting r. Roland Dreier : > I have to disagree. I don't think it's big or risky at all... there's > only 11 uses of refcount in cm.c, and one of them is initialization. It's easy to fix cm. But 12 other files (8 modules) have the same race. Did you consider this? If yes - it's your call, that's the way we should do it. 
-- MST From rdreier at cisco.com Mon May 8 10:10:58 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 10:10:58 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508170654.GS21036@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 8 May 2006 20:06:54 +0300") References: <20060508054529.GC19660@mellanox.co.il> <1147104214.15524.13.camel@trinity.ogc.int> <20060508161947.GN21036@mellanox.co.il> <20060508162334.GO21036@mellanox.co.il> <20060508164654.GQ21036@mellanox.co.il> <20060508170654.GS21036@mellanox.co.il> Message-ID: Michael> And while the trivially correct variant that I posted Michael> would be 2.6.17 material (or even -stable if I start Michael> running into this crash a lot), this bigger change seems Michael> too risky. Michael> But - if you are up to it ... Patch? I will fix things up gradually. But most of it is very low priority -- I doubt that anyone has ever hit any of the mthca races for example. - R. From rdreier at cisco.com Mon May 8 10:12:20 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 10:12:20 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508171044.GT21036@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 8 May 2006 20:10:44 +0300") References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> Message-ID: Michael> It's easy to fix cm. But 12 other files (8 modules) have Michael> the same race. Did you consider this? If yes - it's your Michael> call, that's the way we should do it. I definitely think we should do this the right way. It's not worth saving a few minutes at the cost of having atomic operations protected by a spinlock. - R. From mst at mellanox.co.il Mon May 8 10:15:36 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 8 May 2006 20:15:36 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <1147104214.15524.13.camel@trinity.ogc.int> <20060508161947.GN21036@mellanox.co.il> <20060508162334.GO21036@mellanox.co.il> <20060508164654.GQ21036@mellanox.co.il> <20060508170654.GS21036@mellanox.co.il> Message-ID: <20060508171536.GU21036@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] cm refcount race fix > > Michael> And while the trivially correct variant that I posted > Michael> would be 2.6.17 material (or even -stable if I start > Michael> running into this crash a lot), this bigger change seems > Michael> too risky. > > Michael> But - if you are up to it ... Patch? > > I will fix things up gradually. But most of it is very low priority > -- I doubt that anyone has ever hit any of the mthca races for > example. Hmm ... with stress on CQ events we might get there :) I think I've hit the CM one, unlikely though it sounds. Maybe because I'm running with preempt on ... -- MST From vuhuong at mellanox.com Mon May 8 10:25:15 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Mon, 08 May 2006 10:25:15 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: References: <444566E7.8070907@mellanox.com> <444687E2.8020103@mellanox.com> <44490648.9070106@mellanox.com> <44491233.2010207@mellanox.com> <4449487A.3080004@mellanox.com> <445B9F3E.7060601@mellanox.com> <445BA0E8.7010104@mellanox.com> <445BDACE.8000205@mellanox.com> <445C3C5E.9080207@mellanox.com> <445F6F94.6000701@mellanox.com> <445F76C0.3080905@mellanox.com> Message-ID: <445F7EFB.3040006@mellanox.com> Roland Dreier wrote: > Vu> Have you read scsi_eh_try_stu(scmnd) and scsi_eh_tur(scmnd)? 
> Vu> These functions use the same scmnd and reformat it with new > Vu> cdb and call srp_queuecommand() which uses new req and put > Vu> this new req in request queue for this same scmnd with > Vu> different cdb > > Yes, but why are there any commands in the work_q at all? In other > words, why is this loop entered at all? > > list_for_each_entry_safe(scmd, next, work_q, eh_entry) { > if (!scsi_device_online(scmd->device) || > (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) || > !scsi_eh_tur(scmd)) > scsi_eh_finish_cmd(scmd, done_q); > } > The command is removed from work_q in scsi_eh_finish_cmd() if scsi_eh_abort_cmds() or scsi_eh_bus_device_reset() or scsi_eh_host_reset() returns SUCCESS. This requires at least one scsi_eh_tur or stu to be sent successfully. In our case scsi_eh_abort_cmds() and scsi_eh_bus_device_reset() failed. Therefore, if we enter scsi_eh_host_reset with 2 outstanding scmnds in work_q, we end up in this loop. > srp_reconnect_target() should get rid of all queued commands already: > > list_for_each_entry(req, &target->req_queue, list) { > req->scmnd->result = DID_RESET << 16; > req->scmnd->scsi_done(req->scmnd); > srp_unmap_data(req->scmnd, target, req); > } > > why does the midlayer have any commands around after that loop? > scmnd->scsi_done() does not guarantee that the scmnd is removed from the work_q of the scsi midlayer in the error handling flow. Moreover, the scmnd may be reformatted by the scsi midlayer to send a stu or tur command, if any, and scmnd->scsi_done has already been changed to scsi_eh_done() by srp. From mst at mellanox.co.il Mon May 8 10:36:55 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 8 May 2006 20:36:55 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> Message-ID: <20060508173655.GA24615@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] cm refcount race fix > > Michael> It's easy to fix cm. But 12 other files (8 modules) have > Michael> the same race. Did you consider this? If yes - it's your > Michael> call, that's the way we should do it. > > I definitely think we should do this the right way. It's not worth > saving a few minutes at the cost of having atomic operations protected > by a spinlock. > > - R. > Actually, in some places we can get rid of all the refcounting completely. Consider mthca_cq_event as an example - I think we can just: void mthca_cq_event(struct mthca_dev *dev, u32 cqn, enum ib_event_type event_type) { struct mthca_cq *cq; struct ib_event event; spin_lock(&dev->cq_table.lock); cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); if (!cq) { mthca_warn(dev, "Async event for bogus CQ %08x\n", cqn); goto unlock; } event.device = &dev->ib_dev; event.event = event_type; event.element.cq = &cq->ibcq; if (cq->ibcq.event_handler) cq->ibcq.event_handler(&event, cq->ibcq.cq_context); unlock: spin_unlock(&dev->cq_table.lock); } -- MST From mshefty at ichips.intel.com Mon May 8 10:49:00 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 10:49:00 -0700 Subject: [openib-general] Re: CMA: compliancy issue? 
In-Reply-To: <20060508150534.GF21036@mellanox.co.il> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> <20060508133154.GC21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> Message-ID: <445F848C.2000502@ichips.intel.com> Michael S. Tsirkin wrote: > I was originally thinking along the lines of still using ESTABLISHED, and simply > delaying RTU till after the handler is called. We would then need to teach CMA > to perform reject instead of RTU if handler returns an error code. > > We even can have a flag to select the required behaviour, or even > behave specially for SDP, although I don't think this makes a lot of sense. > > Sean, what looks best to you? I need to think about this more. I do think that the current API will work fine, but would require SDP to manage the QP transitions, which really isn't that much work. I will need to see what can be done about separating QP transitions from event reporting, if SDP wants to see all events, but still let the CMA manage the QP transitions. There's already specific checking for SDP in the CMA, so extending that doesn't sound that too bad, (considering the alternative to SDP specific checks is duplicating functionality). I don't like the idea of using the return code from the user's handler to send a reject or RTU when there are API calls that the user could just invoke. 
- Sean From mshefty at ichips.intel.com Mon May 8 10:53:12 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 10:53:12 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508173655.GA24615@mellanox.co.il> References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> Message-ID: <445F8588.5070002@ichips.intel.com> Michael S. Tsirkin wrote: > void mthca_cq_event(struct mthca_dev *dev, u32 cqn, > enum ib_event_type event_type) > { > struct mthca_cq *cq; > struct ib_event event; > > spin_lock(&dev->cq_table.lock); > > cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); > > if (!cq) { > mthca_warn(dev, "Async event for bogus CQ %08x\n", cqn); > goto unlock; > } > > event.device = &dev->ib_dev; > event.event = event_type; > event.element.cq = &cq->ibcq; > if (cq->ibcq.event_handler) > cq->ibcq.event_handler(&event, cq->ibcq.cq_context); > > unlock: > spin_unlock(&dev->cq_table.lock); > } As a general rule, I always avoid holding a lock around a user's callback. It restricts what a user can do too much. - Sean From sweitzen at cisco.com Mon May 8 10:53:56 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 8 May 2006 10:53:56 -0700 Subject: [openib-general] IPoIB not working on ppc64 RHEL4 U3 w/OFED 1.0 rc4? Message-ID: I can't seem to get IPoIB to work (didn't try earlier rc) on this combo, does anyone have it working? 
[root at svbu-qa-js20-1 ~]# lsmod | grep ib_ipoib ib_ipoib 70296 0 ib_sa 29536 1 ib_ipoib ib_core 81720 3 ib_ipoib,ib_sa,ib_mad [root at svbu-qa-js20-1 ~]# uname -a Linux svbu-qa-js20-1 2.6.9-34.EL #1 SMP Fri Feb 24 16:46:57 EST 2006 ppc64 ppc64 ppc64 GNU/Linux [root at svbu-qa-js20-1 ~]# ifconfig ib0 ib0: error fetching interface information: Device not found Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From sweitzen at cisco.com Mon May 8 10:57:49 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 8 May 2006 10:57:49 -0700 Subject: [openib-general] IPoIB not working on ppc64 RHEL4 U3 w/OFED 1.0 rc4? Message-ID: Never mind, user error on my part. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Scott Weitzenkamp (sweitzen) Sent: Monday, May 08, 2006 10:54 AM To: openfabrics-ewg at openib.org Cc: openib-general at openib.org Subject: [openib-general] IPoIB not working on ppc64 RHEL4 U3 w/OFED 1.0 rc4? I can't seem to get IPoIB to work (didn't try earlier rc) on this combo, does anyone have it working? [root at svbu-qa-js20-1 ~]# lsmod | grep ib_ipoib ib_ipoib 70296 0 ib_sa 29536 1 ib_ipoib ib_core 81720 3 ib_ipoib,ib_sa,ib_mad [root at svbu-qa-js20-1 ~]# uname -a Linux svbu-qa-js20-1 2.6.9-34.EL #1 SMP Fri Feb 24 16:46:57 EST 2006 ppc64 ppc64 ppc64 GNU/Linux [root at svbu-qa-js20-1 ~]# ifconfig ib0 ib0: error fetching interface information: Device not found Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems
From rdreier at cisco.com Mon May 8 11:01:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 11:01:23 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <445F8588.5070002@ichips.intel.com> (Sean Hefty's message of "Mon, 08 May 2006 10:53:12 -0700") References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> <445F8588.5070002@ichips.intel.com> Message-ID: Sean> As a general rule, I always avoid holding a lock around a Sean> user's callback. It restricts what a user can do too much. Yes, I agree. I use that rule as well. - R. From mshefty at ichips.intel.com Mon May 8 11:14:44 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 11:14:44 -0700 Subject: [openib-general] CMA: port 2 loopback problems In-Reply-To: <20060508132803.GB21036@mellanox.co.il> References: <20060508132803.GB21036@mellanox.co.il> Message-ID: <445F8A94.2080506@ichips.intel.com> Michael S. Tsirkin wrote: > Sean, I am seeing the following problem: I have a dual-port HCA > with IPoIB interfaces ib0 on port 1 and ib1 on port 2. > port 1 is down and port 2 is up, > and I try creating a connection to the loopback address 127.0.0.1. > > The problem I am seeing is that I am getting RDMA_CM_EVENT_ROUTE_ERROR. > Apparently CMA attempts address resolution through port 1, which fails. To be clear, loopback addresses are handled separately and do not perform address resolution. But they do use port 1 of the first local device in the CMA's device list. The problem is that path record lookup fails. Checking for a port that's active in cma_bind_loopback() seems like a reasonable approach for now. Is it possible to communicate between QPs on the same device if that device is disconnected from the fabric?
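The active-port check suggested above for cma_bind_loopback() might look roughly like the sketch below. It is a standalone illustration, not the CMA code: struct demo_device, enum demo_port_state and pick_loopback_port() are hypothetical names, and real code would presumably query the port state through the verbs layer instead of reading it from an array.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the device list; not the real CMA structures. */
enum demo_port_state { DEMO_PORT_DOWN, DEMO_PORT_ACTIVE };

struct demo_device {
	int num_ports;                      /* ports are numbered 1..num_ports */
	const enum demo_port_state *state;  /* state[p - 1] is the state of port p */
};

/*
 * Walk the device list in order and pick the first port that is active,
 * instead of unconditionally using port 1 of the first device.
 * Returns 0 and fills *dev_idx and *port on success, -1 if nothing is active.
 */
static int pick_loopback_port(const struct demo_device *devs, size_t ndevs,
			      size_t *dev_idx, int *port)
{
	size_t d;
	int p;

	for (d = 0; d < ndevs; d++) {
		for (p = 1; p <= devs[d].num_ports; p++) {
			if (devs[d].state[p - 1] == DEMO_PORT_ACTIVE) {
				*dev_idx = d;
				*port = p;
				return 0;
			}
		}
	}
	return -1;
}
```

With port 1 down and port 2 up, as in the report quoted above, this selects port 2 rather than failing on port 1.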
- Sean From vuhuong at mellanox.com Mon May 8 11:26:50 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Mon, 08 May 2006 11:26:50 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: References: Message-ID: <445F8D6A.6050503@mellanox.com> > + dma_pages[page_cnt++] = > + (sg_dma_address(&scat[i]) & dev->fmr_page_mask) + j; > + This fmr patch does not work for ia64 system because this dev->fmr_page_mask is defined as unsigned int. We should type cast it to u64 or define it as unsigned long > + struct ib_fmr_pool *fmr_pool; > + int fmr_page_shift; > + int fmr_page_size; > + unsigned int fmr_page_mask; unsigned long fmr_page_mask Vu From rdreier at cisco.com Mon May 8 11:35:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 11:35:09 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <445F8D6A.6050503@mellanox.com> (Vu Pham's message of "Mon, 08 May 2006 11:26:50 -0700") References: <445F8D6A.6050503@mellanox.com> Message-ID: Vu> This fmr patch does not work for ia64 system because this Vu> fmr_page_mask is defined as unsigned int. Great catch! Vu> We should type cast it to u64 or define it as unsigned long Casting it won't help because it will just get zero-extended. I think we need the following in ib_srp.h: unsigned long fmr_page_mask; and then in ib_srp.c: srp_dev->fmr_page_mask = ~((unsigned long) srp_dev->fmr_page_size - 1); does this work for you? Thanks, Roland From rdreier at cisco.com Mon May 8 12:07:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 12:07:59 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508173655.GA24615@mellanox.co.il> (Michael S. 
Tsirkin's message of "Mon, 8 May 2006 20:36:55 +0300") References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> Message-ID: It does seem we can simplify mthca_cq in a slightly different way. mthca_cq_clean() doesn't need to take a CQ reference, because we know the CQ can't go away before all associated QPs are gone, and at least one QP will stay around until mthca_cq_clean() returns. So the below patch is both a fix and a decent cleanup: --- infiniband/hw/mthca/mthca_provider.h (revision 6945) +++ infiniband/hw/mthca/mthca_provider.h (working copy) @@ -197,7 +197,7 @@ struct mthca_cq_resize { struct mthca_cq { struct ib_cq ibcq; spinlock_t lock; - atomic_t refcount; + int refcount; int cqn; u32 cons_index; struct mthca_cq_buf buf; --- infiniband/hw/mthca/mthca_dev.h (revision 6945) +++ infiniband/hw/mthca/mthca_dev.h (working copy) @@ -496,7 +496,7 @@ void mthca_free_cq(struct mthca_dev *dev void mthca_cq_completion(struct mthca_dev *dev, u32 cqn); void mthca_cq_event(struct mthca_dev *dev, u32 cqn, enum ib_event_type event_type); -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq); void mthca_cq_resize_copy_cqes(struct mthca_cq *cq); int mthca_alloc_cq_buf(struct mthca_dev *dev, struct mthca_cq_buf *buf, int nent); --- infiniband/hw/mthca/mthca_cq.c (revision 6945) +++ infiniband/hw/mthca/mthca_cq.c (working copy) @@ -234,14 +234,19 @@ void mthca_cq_event(struct mthca_dev *de { struct mthca_cq *cq; struct ib_event event; + unsigned long flags; - spin_lock(&dev->cq_table.lock); + spin_lock_irqsave(&dev->cq_table.lock, flags); cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); - 
spin_unlock(&dev->cq_table.lock); + if (cq) { + spin_lock(&cq->lock); + ++cq->refcount; + spin_unlock(&cq->lock); + } + + spin_unlock_irqrestore(&dev->cq_table.lock, flags); if (!cq) { mthca_warn(dev, "Async event for bogus CQ %08x\n", cqn); @@ -254,8 +259,10 @@ void mthca_cq_event(struct mthca_dev *de if (cq->ibcq.event_handler) cq->ibcq.event_handler(&event, cq->ibcq.cq_context); - if (atomic_dec_and_test(&cq->refcount)) + spin_lock_irqsave(&cq->lock, flags); + if (!--cq->refcount) wake_up(&cq->wait); + spin_unlock_irqrestore(&cq->lock, flags); } static inline int is_recv_cqe(struct mthca_cqe *cqe) @@ -267,23 +274,13 @@ static inline int is_recv_cqe(struct mth return !(cqe->is_send & 0x80); } -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq) { - struct mthca_cq *cq; struct mthca_cqe *cqe; u32 prod_index; int nfreed = 0; - spin_lock_irq(&dev->cq_table.lock); - cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); - spin_unlock_irq(&dev->cq_table.lock); - - if (!cq) - return; - spin_lock_irq(&cq->lock); /* @@ -301,7 +298,7 @@ void mthca_cq_clean(struct mthca_dev *de if (0) mthca_dbg(dev, "Cleaning QPN %06x from CQN %06x; ci %d, pi %d\n", - qpn, cqn, cq->cons_index, prod_index); + qpn, cq->cqn, cq->cons_index, prod_index); /* * Now sweep backwards through the CQ, removing CQ entries @@ -323,10 +320,6 @@ void mthca_cq_clean(struct mthca_dev *de cq->cons_index += nfreed; update_cons_index(dev, cq, nfreed); } - - spin_unlock_irq(&cq->lock); - if (atomic_dec_and_test(&cq->refcount)) - wake_up(&cq->wait); } void mthca_cq_resize_copy_cqes(struct mthca_cq *cq) @@ -821,7 +814,7 @@ int mthca_init_cq(struct mthca_dev *dev, } spin_lock_init(&cq->lock); - atomic_set(&cq->refcount, 1); + cq->refcount = 1; init_waitqueue_head(&cq->wait); memset(cq_context, 0, sizeof *cq_context); @@ -896,6 +889,17 @@ err_out: 
return err; } +static int get_cq_refcount(struct mthca_cq *cq) +{ + int c; + + spin_lock_irq(&cq->lock); + c = cq->refcount; + spin_unlock_irq(&cq->lock); + + return c; +} + void mthca_free_cq(struct mthca_dev *dev, struct mthca_cq *cq) { @@ -936,8 +940,11 @@ void mthca_free_cq(struct mthca_dev *dev else synchronize_irq(dev->pdev->irq); - atomic_dec(&cq->refcount); - wait_event(cq->wait, !atomic_read(&cq->refcount)); + spin_lock_irq(&cq->lock); + --cq->refcount; + spin_unlock_irq(&cq->lock); + + wait_event(cq->wait, !get_cq_refcount(cq)); if (cq->is_kernel) { mthca_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); --- infiniband/hw/mthca/mthca_qp.c (revision 6945) +++ infiniband/hw/mthca/mthca_qp.c (working copy) @@ -831,10 +831,10 @@ int mthca_modify_qp(struct ib_qp *ibqp, * entries and reinitialize the QP. */ if (new_state == IB_QPS_RESET && !qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); mthca_wq_init(&qp->sq); @@ -1356,10 +1356,10 @@ void mthca_free_qp(struct mthca_dev *dev * unref the mem-free tables and free the QPN in our table. */ if (!qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? 
to_msrq(qp->ibqp.srq) : NULL); mthca_free_memfree(dev, qp); From vuhuong at mellanox.com Mon May 8 12:12:07 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Mon, 08 May 2006 12:12:07 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: References: <445F8D6A.6050503@mellanox.com> Message-ID: <445F9807.7050106@mellanox.com> Roland Dreier wrote: > Vu> This fmr patch does not work for ia64 system because this > Vu> fmr_page_mask is defined as unsigned int. > > Great catch! > > Vu> We should type cast it to u64 or define it as unsigned long > > Casting it won't help because it will just get zero-extended. I think > we need the following in ib_srp.h: > > unsigned long fmr_page_mask; > > and then in ib_srp.c: > > srp_dev->fmr_page_mask = ~((unsigned long) srp_dev->fmr_page_size - 1); > > does this work for you? > Yes. Please commit the final fmr patch. Thanks, Vu From mshefty at ichips.intel.com Mon May 8 12:37:23 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 12:37:23 -0700 Subject: [openib-general] [PATCH] cm refcount race fix In-Reply-To: <20060508054529.GC19660@mellanox.co.il> References: <20060508054529.GC19660@mellanox.co.il> Message-ID: <445F9DF3.7070800@ichips.intel.com> Michael S. Tsirkin wrote: > static inline void cm_deref_id(struct cm_id_private *cm_id_priv) > { > + unsigned long flags; > + > + spin_lock_irqsave(&cm_id_priv->lock, flags); > if (atomic_dec_and_test(&cm_id_priv->refcount)) > wake_up(&cm_id_priv->wait); > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > } I don't have a fix for this yet, but the basic problem is that the code releases the reference on the cm_id_priv, then immediately accesses it on the next line. Maybe there's a way to have wait object separate from the cm_id? The way this is used, we almost want the wait object hidden. 
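One way to picture the constraint: whoever drops the last reference must be finished with the object before the waiter is allowed to free it. Below is a minimal userspace sketch of that idea, using a plain int counter guarded by a lock, in the spirit of the lock-protected refcount in the mthca patch earlier in this thread. The demo_obj names are hypothetical, and a pthread mutex and condition variable stand in for the kernel spinlock and wait queue, so this is an analogue under those assumptions and not a proposed fix for cm.c.

```c
#include <pthread.h>

/* Hypothetical object; a userspace analogue, not the ib_cm structures. */
struct demo_obj {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	int refcount;	/* plain int, only read or written under lock */
};

static void demo_obj_init(struct demo_obj *obj)
{
	pthread_mutex_init(&obj->lock, NULL);
	pthread_cond_init(&obj->wait, NULL);
	obj->refcount = 1;	/* reference held by the owner */
}

static void demo_obj_get(struct demo_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	++obj->refcount;
	pthread_mutex_unlock(&obj->lock);
}

/* Drop a reference; the wake-up is issued while the lock is still held. */
static void demo_obj_put(struct demo_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	if (!--obj->refcount)
		pthread_cond_signal(&obj->wait);
	pthread_mutex_unlock(&obj->lock);
}

/*
 * Owner side: drop the owner's reference, then sleep until every other
 * reference is gone. The count is re-checked under the same lock.
 */
static void demo_obj_wait_unused(struct demo_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	--obj->refcount;
	while (obj->refcount)
		pthread_cond_wait(&obj->wait, &obj->lock);
	pthread_mutex_unlock(&obj->lock);
}
```

Because the waiter re-checks the count under the same lock, it cannot return (and go on to free the object) until the final demo_obj_put() has released that lock.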
- Sean From tom at opengridcomputing.com Mon May 8 12:39:58 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Mon, 08 May 2006 14:39:58 -0500 Subject: [openib-general] Re: CMA disconnect In-Reply-To: <445F78A7.2020401@ichips.intel.com> References: <445F78A7.2020401@ichips.intel.com> Message-ID: <1147117198.15524.22.camel@trinity.ogc.int> On Mon, 2006-05-08 at 09:58 -0700, Sean Hefty wrote: > Or Gerlitz wrote: > > Looking in the code i have realized that it is a must for the CMA > > consumer to call rdma_disconnect to have the QP state moved into ERROR. > > Maybe it would make sense for the CMA to transition the QP to the error state > before destroying it? FYI: For iWARP, moving the QP to ERR is a CM RESET, and moving it to SQD is a graceful shutdown. Keeping these differences hidden from the app inside rdma_disconnect is probably a good idea. > > > Am i correct? with this understanding at hand, i have changed iSER code to > > call rdma_disconnect even if it got a DISCONNECTED event caused by the > > passive side initiating the disconnect flow (ie sending a DREQ), since > > otherwise in such case i never got the FLUSHES on the posted RX/TX WRs. > > You are correct, rdma_disconnect() is required to transition the QP into the > error state. The code is written such that both sides call it. > > - Sean From mst at mellanox.co.il Mon May 8 12:47:05 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 22:47:05 +0300 Subject: [openib-general] Re: CMA: compliancy issue?
In-Reply-To: <445F848C.2000502@ichips.intel.com> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> <20060508133154.GC21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> <445F848C.2000502@ichips.intel.com> Message-ID: <20060508194705.GA25527@mellanox.co.il> Quoting r. Sean Hefty : > I don't like the idea of using the return code from the user's handler to > send a reject or RTU when there are API calls that the user could just > invoke. Fine, but rdma_reject will have to alter the state then, to avoid RTU after REJ. -- MST From mst at mellanox.co.il Mon May 8 12:47:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 22:47:50 +0300 Subject: [openib-general] CMA: port 2 loopback problems In-Reply-To: <445F8A94.2080506@ichips.intel.com> References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> Message-ID: <20060508194750.GB25527@mellanox.co.il> Quoting r. Sean Hefty : > Is it possible to communicate between QPs on > the same device if that device is disconnected from the fabric? Yes. -- MST From mshefty at ichips.intel.com Mon May 8 12:55:50 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 12:55:50 -0700 Subject: [openib-general] Re: CMA: compliancy issue? 
In-Reply-To: <20060508194705.GA25527@mellanox.co.il> References: <20060508085301.GD20207@mellanox.co.il> <445F30E4.9020109@voltaire.com> <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> <20060508133154.GC21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> <445F848C.2000502@ichips.intel.com> <20060508194705.GA25527@mellanox.co.il> Message-ID: <445FA246.5090106@ichips.intel.com> Michael S. Tsirkin wrote: > Fine, but rdma_reject will have to alter the state then, to avoid RTU > after REJ. It should be able to work just as it does for userspace. The user either calls accept or reject. The IB CM will not send an RTU if a REJ has been sent. I think that the real issue here is that the CMA determines whether to hand the REP to the user based on if it has a QP. This was done to support userspace. SDP wants the CMA to manage its QP, but still wants to see the REP. What I need to determine is if it's easier/better for SDP to manage the QP itself, or to change the CMA to expose the REP when it manages the QP states. - Sean From sashak at voltaire.com Mon May 8 13:00:29 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 08 May 2006 23:00:29 +0300 Subject: [openib-general] [PATCH 0/2] opensm: low-level QoS implementation Message-ID: <20060508200029.28763.96450.stgit@sashak.voltaire.com> Hello, There is support for low level Quality of Service (QoS) parameters configuration and setup in OpenSM. Please comment. Thanks. Sasha. 
From sashak at voltaire.com Mon May 8 13:03:45 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 08 May 2006 23:03:45 +0300 Subject: [openib-general] [PATCH 1/2] opensm: low-level QoS configuration In-Reply-To: <20060508200029.28763.96450.stgit@sashak.voltaire.com> References: <20060508200029.28763.96450.stgit@sashak.voltaire.com> Message-ID: <20060508200344.28763.9861.stgit@sashak.voltaire.com> Trivial low-level QoS configuration parameters description, definition and processing. Signed-off-by: Sasha Khapyorsky --- osm/doc/qos-config.txt | 44 +++++++++++++ osm/include/opensm/osm_subnet.h | 81 ++++++++++++++++++++++++ osm/opensm/osm_subnet.c | 133 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 258 insertions(+), 0 deletions(-) diff --git a/osm/doc/qos-config.txt b/osm/doc/qos-config.txt new file mode 100644 index 0000000..d40dfb9 --- /dev/null +++ b/osm/doc/qos-config.txt @@ -0,0 +1,44 @@ +Trivial low level QoS configuration proposition. +=============================================== + +Basically we have set of QoS related low-level configuration parameters. +All those parameter names are prefixed by "qos_" string. There is full +list of such parameters: + + qos_max_vls - The number of maximum VLs will be on the Subnet + qos_high_limit - The limit of High Priority component of VL Arbitration + table (IBA 7.6.9) + qos_vlarb_low - High priority VL Arbitration table (IBA 7.6.9) template. + qos_vlarb_high - Low priority VL Arbitration table (IBA 7.6.9) template. + Both VL arbitration templates are pairs of VL and weight. + qos_sl2vl - SL2VL Mapping table (IBA 7.6.6) template. It is a list + of VLs corresponding to SLs 0-15. (Note the VL15 used + here means drop this SL). 
+ +Typical default values (hard-coded in OpenSM initialization) are: + + qos_max_vls=15 + qos_high_limit=0 + qos_vlarb_low=0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0 + qos_vlarb_high=0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4 + qos_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + +The syntax is compatible with rest of OpenSM configuration options and +values may be stored in OpenSM config file (cached options file). + +In addition to above we may to define separate QoS configuration +parameters sets for various target types. As targets we currently support +HCA, routers, switch external ports and switch's enhanced port 0. The +names of such specialized parameters are prefixed by "qos__" +string. There is full list of currently supported sets: + + qos_hca_ - QoS configuration parameters set for HCAs. + qos_rtr_ - parameters set for routers. + qos_sw0_ - parameters set for switches' port 0. + qos_swe_ - parameters set for switches' external ports. + +Examples: + + qos_sw0_max_vls=2 + qos_hca_sl2vl=0,1,2,3,5,5,5,12,12,0, + qos_swe_high_limit=0 diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h index 767e598..0da3f0c 100644 --- a/osm/include/opensm/osm_subnet.h +++ b/osm/include/opensm/osm_subnet.h @@ -180,6 +180,44 @@ typedef enum _osm_testability_modes } osm_testability_modes_t; /***********/ +/****s* OpenSM: Subnet/osm_qos_options_t +* NAME +* osm_qos_options_t +* +* DESCRIPTION +* Subnet QoS options structure. This structure contains the various +* QoS specific configuration parameters for the subnet. +* +* SYNOPSIS +*/ +typedef struct _osm_qos_options_t { + unsigned max_vls; + unsigned high_limit; + char *vlarb_high; + char *vlarb_low; + char *sl2vl; +} osm_qos_options_t; +/* +* FIELDS +* +* max_vls +* The number of maximum VLs on the Subnet +* +* high_limit +* The limit of High Priority component of VL Arbitration +* table (IBA 7.6.9) +* +* vlarb_high +* High priority VL Arbitration table template. +* +* vlarb_low +* Low priority VL Arbitration table template. 
+* +* sl2vl +* SL2VL Mapping table (IBA 7.6.6) template. +* +*********/ + /****s* OpenSM: Subnet/osm_subn_opt_t * NAME * osm_subn_opt_t @@ -242,6 +280,10 @@ typedef struct _osm_subn_opt char * updn_guid_file; boolean_t exit_on_fatal; boolean_t honor_guid2lid_file; + osm_qos_options_t qos_options; + osm_qos_options_t qos_hca_options; + osm_qos_options_t qos_sw0_options; + osm_qos_options_t qos_swe_options; } osm_subn_opt_t; /* * FIELDS @@ -394,6 +436,18 @@ typedef struct _osm_subn_opt * the file will be honored when SM is coming out of STANDBY. * By default this is FALSE. * +* qos_options +* Default set of QoS options +* +* qos_hca_options +* QoS options for HCA ports +* +* qos_sw0_options +* QoS options for switches' port 0 +* +* qos_swe_options +* QoS options for switches' external ports +* * SEE ALSO * Subnet object *********/ @@ -1016,6 +1070,33 @@ osm_subn_parse_conf_file( * osm_subn_is_inited *********/ +/****f* OpenSM: Subnet/osm_subn_rescan_conf_file +* NAME +* osm_subn_rescan_conf_file +* +* DESCRIPTION +* The osm_subn_rescan_conf_file function parses the configuration +* file and updates selected subnet options +* +* SYNOPSIS +*/ +void +osm_subn_rescan_conf_file( + IN osm_subn_opt_t* const p_opts ); +/* +* PARAMETERS +* +* p_opts +* [in] Pointer to the subnet options structure. 
+* +* RETURN VALUES +* None +* +* NOTES +* This uses the same file as osm_subn_parse_conf_file() +* +*********/ + /****f* OpenSM: Subnet/osm_subn_write_conf_file * NAME * osm_subn_write_conf_file diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c index ef64e05..f155df7 100644 --- a/osm/opensm/osm_subnet.c +++ b/osm/opensm/osm_subnet.c @@ -398,6 +398,19 @@ osm_get_port_by_guid( /********************************************************************** **********************************************************************/ +static void +subn_set_default_qos_options( + IN osm_qos_options_t *opt) +{ + opt->max_vls = 15; + opt->high_limit = 0; + opt->vlarb_high = "0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0"; + opt->vlarb_low = "0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4"; + opt->sl2vl = "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7"; +} + +/********************************************************************** + **********************************************************************/ void osm_subn_set_default_opt( IN osm_subn_opt_t* const p_opt ) @@ -457,6 +470,10 @@ osm_subn_set_default_opt( p_opt->updn_activate = FALSE; p_opt->updn_guid_file = NULL; p_opt->exit_on_fatal = TRUE; + subn_set_default_qos_options(&p_opt->qos_options); + subn_set_default_qos_options(&p_opt->qos_hca_options); + subn_set_default_qos_options(&p_opt->qos_sw0_options); + subn_set_default_qos_options(&p_opt->qos_swe_options); } /********************************************************************** @@ -619,6 +636,95 @@ __osm_subn_opts_unpack_charp( /********************************************************************** **********************************************************************/ +static void +subn_parse_qos_options( + IN const char *prefix, + IN char *p_key, + IN char *p_val_str, + IN osm_qos_options_t *opt) +{ + char name[256]; + snprintf(name, sizeof(name), "%s_max_vls", prefix); + __osm_subn_opts_unpack_uint32(name, p_key, p_val_str, &opt->max_vls); + snprintf(name, sizeof(name), "%s_high_limit", 
prefix); + __osm_subn_opts_unpack_uint32(name, p_key, p_val_str, &opt->high_limit); + snprintf(name, sizeof(name), "%s_vlarb_high", prefix); + __osm_subn_opts_unpack_charp(name, p_key, p_val_str, &opt->vlarb_high); + snprintf(name, sizeof(name), "%s_vlarb_low", prefix); + __osm_subn_opts_unpack_charp(name, p_key, p_val_str, &opt->vlarb_low); + snprintf(name, sizeof(name), "%s_sl2vl", prefix); + __osm_subn_opts_unpack_charp(name, p_key, p_val_str, &opt->sl2vl); +} + +static int +subn_dump_qos_options( + FILE *file, + const char *set_name, + const char *prefix, + osm_qos_options_t *opt) +{ + return fprintf(file, "# %s\n" + "%s_max_vls %u\n" + "%s_high_limit %u\n" + "%s_vlarb_high %s\n" + "%s_vlarb_low %s\n" + "%s_sl2vl %s\n", + set_name, + prefix, opt->max_vls, + prefix, opt->high_limit, + prefix, opt->vlarb_high, + prefix, opt->vlarb_low, + prefix, opt->sl2vl); +} + +/********************************************************************** + **********************************************************************/ +void +osm_subn_rescan_conf_file( + IN osm_subn_opt_t* const p_opts ) +{ + char *p_cache_dir = getenv("OSM_CACHE_DIR"); + char file_name[256]; + FILE *opts_file; + char line[1024]; + char *p_key, *p_val ,*p_last; + + /* try to open the options file from the cache dir */ + if (! 
p_cache_dir) p_cache_dir = OSM_DEFAULT_CACHE_DIR; + + strcpy(file_name, p_cache_dir); + strcat(file_name,"opensm.opts"); + + opts_file = fopen(file_name, "r"); + if (!opts_file) return; + + while (fgets(line, 1023, opts_file) != NULL) + { + /* get the first token */ + p_key = strtok_r(line, " \t\n", &p_last); + if (p_key) + { + p_val = strtok_r(NULL, " \t\n", &p_last); + + subn_parse_qos_options("qos", + p_key, p_val, &p_opts->qos_options); + + subn_parse_qos_options("qos_hca", + p_key, p_val, &p_opts->qos_hca_options); + + subn_parse_qos_options("qos_sw0", + p_key, p_val, &p_opts->qos_sw0_options); + + subn_parse_qos_options("qos_swe", + p_key, p_val, &p_opts->qos_swe_options); + + } + } + fclose(opts_file); +} + +/********************************************************************** + **********************************************************************/ void osm_subn_parse_conf_file( IN osm_subn_opt_t* const p_opts ) @@ -802,6 +908,18 @@ osm_subn_parse_conf_file( "honor_guid2lid_file", p_key, p_val, &p_opts->honor_guid2lid_file); + subn_parse_qos_options("qos", + p_key, p_val, &p_opts->qos_options); + + subn_parse_qos_options("qos_hca", + p_key, p_val, &p_opts->qos_hca_options); + + subn_parse_qos_options("qos_sw0", + p_key, p_val, &p_opts->qos_sw0_options); + + subn_parse_qos_options("qos_swe", + p_key, p_val, &p_opts->qos_swe_options); + } } fclose(opts_file); @@ -997,6 +1115,21 @@ osm_subn_write_conf_file( p_opts->exit_on_fatal ? 
"TRUE" : "FALSE" ); + fprintf( + opts_file, + "#\n# QoS OPTIONS\n#\n\n"); + subn_dump_qos_options(opts_file, + "QoS default options", "qos", &p_opts->qos_options); + fprintf(opts_file, "\n"); + subn_dump_qos_options(opts_file, + "QoS HCA options", "qos_hca", &p_opts->qos_hca_options); + fprintf(opts_file, "\n"); + subn_dump_qos_options(opts_file, + "QoS Switch Port 0 options", "qos_sw0", &p_opts->qos_sw0_options); + fprintf(opts_file, "\n"); + subn_dump_qos_options(opts_file, + "QoS Switch external ports options", "qos_swe", &p_opts->qos_swe_options); + /* optional string attributes ... */ fclose(opts_file); From sashak at voltaire.com Mon May 8 13:03:47 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 08 May 2006 23:03:47 +0300 Subject: [openib-general] [PATCH 2/2] opensm: basic QoS implementation In-Reply-To: <20060508200029.28763.96450.stgit@sashak.voltaire.com> References: <20060508200029.28763.96450.stgit@sashak.voltaire.com> Message-ID: <20060508200347.28763.78213.stgit@sashak.voltaire.com> Basic low-level QoS implementation. The main procedure (osm_qos_setup()) will be called from resweeper (after configuration refreshing). And then this will setup low level QoS related ports' attributes (PortInfo:VLHighLimit, VL*Arbitration and SL2VLMapping tables). Different port categories (HCA, switch external ports and switch port 0) will be updated according to provided configurations. 
Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_madw.h | 1 osm/opensm/Makefile.am | 2 osm/opensm/osm_qos.c | 439 +++++++++++++++++++++++++++++++++++++++++ osm/opensm/osm_state_mgr.c | 11 + 4 files changed, 452 insertions(+), 1 deletions(-) diff --git a/osm/include/opensm/osm_madw.h b/osm/include/opensm/osm_madw.h index 5b4ddab..4d92db4 100644 --- a/osm/include/opensm/osm_madw.h +++ b/osm/include/opensm/osm_madw.h @@ -352,6 +352,7 @@ typedef union _osm_madw_context osm_smi_context_t smi_context; osm_slvl_context_t slvl_context; osm_pkey_context_t pkey_context; + osm_vla_context_t vla_context; #ifndef OSM_VENDOR_INTF_OPENIB osm_arbitrary_context_t arb_context; #endif diff --git a/osm/opensm/Makefile.am b/osm/opensm/Makefile.am index e396dcf..ebb6295 100644 --- a/osm/opensm/Makefile.am +++ b/osm/opensm/Makefile.am @@ -81,7 +81,7 @@ opensm_SOURCES = main.c osm_console.c os osm_state_mgr_ctrl.c osm_subnet.c \ osm_sweep_fail_ctrl.c osm_sw_info_rcv.c \ osm_sw_info_rcv_ctrl.c osm_switch.c \ - osm_prtn.c osm_prtn_config.c \ + osm_prtn.c osm_prtn_config.c osm_qos.c \ osm_trap_rcv.c osm_trap_rcv_ctrl.c \ osm_ucast_mgr.c osm_ucast_updn.c \ osm_vl15intf.c osm_vl_arb_rcv.c \ diff --git a/osm/opensm/osm_qos.c b/osm/opensm/osm_qos.c new file mode 100644 index 0000000..be27b40 --- /dev/null +++ b/osm/opensm/osm_qos.c @@ -0,0 +1,439 @@ +/* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id$ + */ + +/* + * Abstract: + * Implementation of OpenSM QoS infrastructure primitives + * + * Environment: + * Linux User Mode + * + * $Revision$ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include + +#include +#include +#include +#include +#include + +struct qos_config { + uint8_t max_vls; + uint8_t vl_high_limit; + ib_vl_arb_table_t vlarb_high[2]; + ib_vl_arb_table_t vlarb_low[2]; + ib_slvl_table_t sl2vl; +}; + +static void qos_build_config(struct qos_config *cfg, + osm_qos_options_t * opt, osm_qos_options_t * dflt); + +/* + * QoS primitives + * + */ + +static ib_api_status_t vlarb_update_table_block(osm_req_t * p_req, + osm_physp_t * p, + unsigned port_num, + const ib_vl_arb_table_t *table_block, + unsigned block_length, + unsigned block_num) +{ + ib_vl_arb_table_t block; + osm_madw_context_t context; + uint32_t attr_mod; + ib_port_info_t *p_pi; + unsigned vl_mask; + int i; + + if (!(p_pi = osm_physp_get_port_info_ptr(p))) + return IB_ERROR; + + vl_mask = (1 << (ib_port_info_get_op_vls(p_pi) - 1)) - 1; + + cl_memset(&block, 0, sizeof(block)); + cl_memcpy(&block, table_block, + block_length * sizeof(block.vl_entry[0])); + for (i = 0; i < block_length; i++) + block.vl_entry[i].vl &= vl_mask; + + if (!cl_memcmp(&p->vl_arb[block_num], &block, + block_length * sizeof(block.vl_entry[0]))) + return IB_SUCCESS; + + context.vla_context.node_guid = + osm_node_get_node_guid(osm_physp_get_node_ptr(p)); + context.vla_context.port_guid = osm_physp_get_port_guid(p); + context.vla_context.set_method = TRUE; + attr_mod = ((block_num + 1) << 16) | port_num; + + return osm_req_set(p_req, osm_physp_get_dr_path_ptr(p), + (uint8_t *) & block, sizeof(block), + IB_MAD_ATTR_VL_ARBITRATION, + cl_hton32(attr_mod), CL_DISP_MSGID_NONE, &context); +} + +static ib_api_status_t vlarb_update(osm_req_t * p_req, + osm_physp_t * p, unsigned port_num, + const struct qos_config *qcfg) +{ + ib_api_status_t status = IB_SUCCESS; + ib_port_info_t *p_pi; + unsigned 
len; + + if (!(p_pi = osm_physp_get_port_info_ptr(p))) + return IB_ERROR; + + if (p_pi->vl_arb_low_cap > 0) { + len = p_pi->vl_arb_low_cap < IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK ? + p_pi->vl_arb_low_cap : IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; + if ((status = vlarb_update_table_block(p_req, p, port_num, + &qcfg->vlarb_low[0], + len, 0)) != IB_SUCCESS) + return status; + } + if (p_pi->vl_arb_low_cap > IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK) { + len = p_pi->vl_arb_low_cap % IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; + if ((status = vlarb_update_table_block(p_req, p, port_num, + &qcfg->vlarb_low[1], + len, 1)) != IB_SUCCESS) + return status; + } + if (p_pi->vl_arb_high_cap > 0) { + len = p_pi->vl_arb_high_cap < IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK ? + p_pi->vl_arb_high_cap : IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; + if ((status = vlarb_update_table_block(p_req, p, port_num, + &qcfg->vlarb_high[0], + len, 2)) != IB_SUCCESS) + return status; + } + if (p_pi->vl_arb_high_cap > IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK) { + len = p_pi->vl_arb_high_cap % IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; + if ((status = vlarb_update_table_block(p_req, p, port_num, + &qcfg->vlarb_high[1], + len, 3)) != IB_SUCCESS) + return status; + } + + return status; +} + +static ib_api_status_t sl2vl_update_table(osm_req_t * p_req, + osm_physp_t * p, uint8_t in_port, + uint8_t out_port, + const ib_slvl_table_t * sl2vl_table) +{ + osm_madw_context_t context; + ib_slvl_table_t tbl, *p_tbl; + osm_node_t *p_node = osm_physp_get_node_ptr(p); + uint32_t attr_mod; + ib_port_info_t *p_pi; + unsigned vl_mask; + uint8_t vl1, vl2; + int i; + + if (!(p_pi = osm_physp_get_port_info_ptr(p))) + return IB_ERROR; + + vl_mask = (1 << (ib_port_info_get_op_vls(p_pi) - 1)) - 1; + + for (i = 0; i < IB_MAX_NUM_VLS / 2; i++) { + vl1 = sl2vl_table->raw_vl_by_sl[i] >> 4; + vl2 = sl2vl_table->raw_vl_by_sl[i] & 0xf; + if (vl1 != 15) + vl1 &= vl_mask; + if (vl2 != 15) + vl2 &= vl_mask; + tbl.raw_vl_by_sl[i] = (vl1 << 4 ) | vl2 ; + } + + p_tbl = osm_physp_get_slvl_tbl(p, in_port); + 
if (p_tbl && !cl_memcmp(p_tbl, &tbl, sizeof(tbl))) + return IB_SUCCESS; + + context.slvl_context.node_guid = osm_node_get_node_guid(p_node); + context.slvl_context.port_guid = osm_physp_get_port_guid(p); + context.slvl_context.set_method = TRUE; + attr_mod = in_port << 8 | out_port; + return osm_req_set(p_req, osm_physp_get_dr_path_ptr(p), + (uint8_t *) & tbl, sizeof(tbl), + IB_MAD_ATTR_SLVL_TABLE, + cl_hton32(attr_mod), CL_DISP_MSGID_NONE, &context); +} + +static ib_api_status_t sl2vl_update(osm_req_t * p_req, + osm_physp_t * p, unsigned port_num, + const struct qos_config *qcfg) +{ + ib_api_status_t status; + unsigned i, num_ports; + ib_port_info_t *p_pi = osm_physp_get_port_info_ptr(p); + + if (p_pi && !(p_pi->capability_mask & IB_PORT_CAP_HAS_SL_MAP)) + return IB_SUCCESS; + + if (osm_node_get_type(osm_physp_get_node_ptr(p)) == IB_NODE_TYPE_SWITCH) + num_ports = osm_node_get_num_physp(osm_physp_get_node_ptr(p)); + else + num_ports = 1; + + for (i = 0; i < num_ports; i++) { + status = + sl2vl_update_table(p_req, p, i, port_num, &qcfg->sl2vl); + if (status != IB_SUCCESS) + return status; + } + + return IB_SUCCESS; +} + +static ib_api_status_t vl_high_limit_update(osm_req_t * p_req, + osm_physp_t * p, + const struct qos_config *qcfg) +{ + uint8_t payload[IB_SMP_DATA_SIZE]; + osm_madw_context_t context; + ib_port_info_t *p_pi; + + if (!(p_pi = osm_physp_get_port_info_ptr(p))) + return IB_ERROR; + + if (p_pi->vl_high_limit == qcfg->vl_high_limit) + return IB_SUCCESS; + + cl_memclr(payload, IB_SMP_DATA_SIZE); + cl_memcpy(payload, p_pi, sizeof(ib_port_info_t)); + + p_pi = (ib_port_info_t *) payload; + p_pi->state_info2 = 0; + ib_port_info_set_port_state(p_pi, IB_LINK_NO_CHANGE); + + p_pi->vl_high_limit = qcfg->vl_high_limit; + + context.pi_context.node_guid = + osm_node_get_node_guid(osm_physp_get_node_ptr(p)); + context.pi_context.port_guid = osm_physp_get_port_guid(p); + context.pi_context.set_method = TRUE; + context.pi_context.update_master_sm_base_lid = FALSE; + 
context.pi_context.ignore_errors = FALSE; + context.pi_context.light_sweep = FALSE; + + return osm_req_set(p_req, osm_physp_get_dr_path_ptr(p), + payload, sizeof(payload), IB_MAD_ATTR_PORT_INFO, + cl_hton32(osm_physp_get_port_num(p)), + CL_DISP_MSGID_NONE, &context); +} + +static ib_api_status_t qos_physp_setup(osm_log_t * p_log, osm_req_t * p_req, + osm_physp_t * p, unsigned port_num, + const struct qos_config *qcfg) +{ + ib_api_status_t status; + + /* OpVLs should be ok at this moment - just use it */ + + /* setup vl high limit */ + status = vl_high_limit_update(p_req, p, qcfg); + if (status != IB_SUCCESS) { + osm_log(p_log, OSM_LOG_ERROR, "qos_physp_setup: " + "failed to update VLHighLimit " + "for port %" PRIx64 " #%d\n", + cl_ntoh64(p->port_guid), port_num); + return status; + } + + /* setup VLArbitration */ + status = vlarb_update(p_req, p, port_num, qcfg); + if (status != IB_SUCCESS) { + osm_log(p_log, OSM_LOG_ERROR, "qos_physp_setup: " + "failed to update VLArbitration tables " + "for port %" PRIx64 " #%d\n", + cl_ntoh64(p->port_guid), port_num); + return status; + } + + /* setup Sl2VL tables */ + status = sl2vl_update(p_req, p, port_num, qcfg); + if (status != IB_SUCCESS) { + osm_log(p_log, OSM_LOG_ERROR, "qos_physp_setup: " + "failed to update SL2VLMapping tables " + "for port %" PRIx64 " #%d\n", + cl_ntoh64(p->port_guid), port_num); + return status; + } + + return IB_SUCCESS; +} + +osm_signal_t osm_qos_setup(osm_opensm_t * p_osm) +{ + struct qos_config hca_config, sw0_config, swe_config; + struct qos_config *cfg; + osm_switch_t *p_sw; + ib_switch_info_t *p_si; + cl_qmap_t *p_tbl; + cl_map_item_t *p_next; + osm_port_t *p_port; + uint32_t num_physp; + osm_physp_t *p_physp; + uint8_t node_type; + ib_api_status_t status; + uint32_t i; + + OSM_LOG_ENTER(&p_osm->log, osm_qos_setup); + + qos_build_config(&hca_config, &p_osm->subn.opt.qos_hca_options, + &p_osm->subn.opt.qos_options); + qos_build_config(&sw0_config, &p_osm->subn.opt.qos_sw0_options, + 
&p_osm->subn.opt.qos_options); + qos_build_config(&swe_config, &p_osm->subn.opt.qos_swe_options, + &p_osm->subn.opt.qos_options); + + cl_plock_excl_acquire(&p_osm->lock); + + p_tbl = &p_osm->subn.port_guid_tbl; + p_next = cl_qmap_head(p_tbl); + while (p_next != cl_qmap_end(p_tbl)) { + p_port = (osm_port_t *) p_next; + p_next = cl_qmap_next(p_next); + + node_type = osm_node_get_type(osm_port_get_parent_node(p_port)); + if (node_type == IB_NODE_TYPE_SWITCH) { + num_physp = osm_port_get_num_physp(p_port); + for (i = 1; i < num_physp; i++) { + p_physp = osm_port_get_phys_ptr(p_port, i); + if (!p_physp || !osm_physp_is_valid(p_physp)) + continue; + status = + qos_physp_setup(&p_osm->log, &p_osm->sm.req, + p_physp, i, &swe_config); + } + /* skip base port 0 */ + p_sw = osm_get_switch_by_guid(&p_osm->subn, + osm_port_get_guid(p_port)); + if (!p_sw || !(p_si = osm_switch_get_si_ptr(p_sw)) || + !ib_switch_info_is_enhanced_port_0(p_si)) + continue; + + cfg = &sw0_config; + } + else + cfg = &hca_config; + + p_physp = osm_port_get_default_phys_ptr(p_port); + if (!osm_physp_is_valid(p_physp)) + continue; + + status = qos_physp_setup(&p_osm->log, &p_osm->sm.req, + p_physp, 0, cfg); + } + + cl_plock_release(&p_osm->lock); + OSM_LOG_EXIT(&p_osm->log); + + return OSM_SIGNAL_DONE; +} + +/* + * QoS config stuff + * + */ + +static int parse_one_unsigned(char *str, char delim, unsigned *val) +{ + char *end; + *val = strtoul(str, &end, 0); + if (*end) + end++; + return end - str; +} + +static int parse_vlarb_entry(char *str, ib_vl_arb_element_t * e) +{ + unsigned val; + char *p = str; + p += parse_one_unsigned(p, ':', &val); + e->vl = val % 15; + p += parse_one_unsigned(p, ',', &val); + e->weight = val; + return p - str; +} + +static int parse_sl2vl_entry(char *str, uint8_t * raw) +{ + unsigned val1, val2; + char *p = str; + p += parse_one_unsigned(p, ',', &val1); + p += parse_one_unsigned(p, ',', &val2); + *raw = (val1 << 4) | (val2 & 0xf); + return p - str; +} + +static void 
qos_build_config(struct qos_config *cfg, + osm_qos_options_t * opt, osm_qos_options_t * dflt) +{ + int i; + char *p; + + memset(cfg, 0, sizeof(*cfg)); + + cfg->max_vls = opt->max_vls > 0 ? opt->max_vls : dflt->max_vls; + cfg->vl_high_limit = opt->high_limit; + + p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; + for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { + p += parse_vlarb_entry(p, + &cfg->vlarb_high[i/IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]. + vl_entry[i%IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]); + } + + p = opt->vlarb_low ? opt->vlarb_low : dflt->vlarb_low; + for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { + p += parse_vlarb_entry(p, + &cfg->vlarb_low[i/IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]. + vl_entry[i%IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]); + } + + p = opt->sl2vl ? opt->sl2vl : dflt->sl2vl; + for (i = 0; i < IB_MAX_NUM_VLS / 2; i++) + p += parse_sl2vl_entry(p, &cfg->sl2vl.raw_vl_by_sl[i]); + +} diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index 1aefc0b..ca46937 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -71,6 +71,10 @@ #include /********************************************************************** **********************************************************************/ +osm_signal_t osm_qos_setup(IN osm_opensm_t * p_osm); + +/********************************************************************** + **********************************************************************/ void osm_state_mgr_construct( IN osm_state_mgr_t * const p_mgr ) @@ -1971,6 +1975,9 @@ osm_state_mgr_process( * need to unset it. 
*/ p_mgr->p_subn->subnet_initialization_error = FALSE; + /* rescan configuration updates */ + osm_subn_rescan_conf_file(&p_mgr->p_subn->opt); + status = __osm_state_mgr_sweep_hop_0( p_mgr ); if( status == IB_SUCCESS ) { @@ -2234,6 +2241,10 @@ osm_state_mgr_process( /* the returned signal might be DONE or DONE_PENDING */ signal = osm_pkey_mgr_process( p_mgr->p_subn->p_osm ); + + /* the returned signal is always DONE */ + signal = osm_qos_setup(p_mgr->p_subn->p_osm); + break; default: From mshefty at ichips.intel.com Mon May 8 13:01:31 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 13:01:31 -0700 Subject: [openib-general] CMA: port 2 loopback problems In-Reply-To: <20060508194750.GB25527@mellanox.co.il> References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> Message-ID: <445FA39B.50107@ichips.intel.com> Michael S. Tsirkin wrote: >> Is it possible to communicate between QPs on >> the same device if that device is disconnected from the fabric? > > > Yes. What attributes do you use for the pkey index and address vector when connecting the QPs? I'm wondering if the correct solution to this issue isn't to create some sort of "loopback" path record. I'm not sure what changes would be required of the IB CM. - Sean From mst at mellanox.co.il Mon May 8 13:22:38 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 23:22:38 +0300 Subject: [openib-general] Re: CMA: compliancy issue? 
In-Reply-To: <445FA246.5090106@ichips.intel.com> References: <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> <20060508133154.GC21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> <445F848C.2000502@ichips.intel.com> <20060508194705.GA25527@mellanox.co.il> <445FA246.5090106@ichips.intel.com> Message-ID: <20060508202238.GC25527@mellanox.co.il> Quoting r. Sean Hefty : > I think that the real issue here is that the CMA determines whether to hand > the REP to the user based on if it has a QP. This was done to support > userspace. SDP wants the CMA to manage its QP, but still wants to see the > REP. I'm a bit confused. What we currently have is that CMA passes ESTABLISHED event to SDP on REP. If so I think we can leave it as it is. > What I need to determine is if it's easier/better for SDP to manage > the QP itself, or to change the CMA to expose the REP when it manages the > QP states. I actually think all we have to do is to change CMA behaviour on REP: send RTU after, and not before, calling user handler. Since other ULPs don't seem to care when RTU is sent, they will continue working. No? -- MST From mst at mellanox.co.il Mon May 8 13:29:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 23:29:04 +0300 Subject: [openib-general] CMA: port 2 loopback problems In-Reply-To: <445FA39B.50107@ichips.intel.com> References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> Message-ID: <20060508202904.GD25527@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] CMA: port 2 loopback problems > > Michael S. Tsirkin wrote: > >>Is it possible to communicate between QPs on > >>the same device if that device is disconnected from the fabric? > > > > > >Yes. 
> > What attributes do you use for the pkey index and address vector when > connecting the QPs? > > I'm wondering if the correct solution to this issue isn't to create some > sort of "loopback" path record. I'm not sure what changes would be > required of the IB CM. I thought about this too. People actually do expect loopback to work when link is down. I guess we could create "loopback" path record, with parameters such as SL editable from sysfs. And on port event we could try to update it from the SM. -- MST From mshefty at ichips.intel.com Mon May 8 13:32:36 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 13:32:36 -0700 Subject: [openib-general] Re: CMA: compliancy issue? In-Reply-To: <20060508202238.GC25527@mellanox.co.il> References: <20060508131156.GA21036@mellanox.co.il> <445F45A4.4080308@voltaire.com> <20060508133154.GC21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> <445F848C.2000502@ichips.intel.com> <20060508194705.GA25527@mellanox.co.il> <445FA246.5090106@ichips.intel.com> <20060508202238.GC25527@mellanox.co.il> Message-ID: <445FAAE4.4050600@ichips.intel.com> Michael S. Tsirkin wrote: > I actually think all we have to do is to change CMA behaviour on REP: > send RTU after, and not before, calling user handler. > Since other ULPs don't seem t care when RTU is sent, they will continue > working. I think that it makes more sense to give the user the CONNECT_RESPONSE event, and have them call rdma_accept() or rdma_reject(), versus allowing them to call rdma_reject() after an ESTABLISHED event. There is already code for this; it's just tied to the existence of a QP off the rdma_cm_id. - Sean From mst at mellanox.co.il Mon May 8 13:46:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 8 May 2006 23:46:43 +0300 Subject: [openib-general] Re: CMA: compliancy issue? 
In-Reply-To: <445FAAE4.4050600@ichips.intel.com> References: <20060508133154.GC21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> <445F848C.2000502@ichips.intel.com> <20060508194705.GA25527@mellanox.co.il> <445FA246.5090106@ichips.intel.com> <20060508202238.GC25527@mellanox.co.il> <445FAAE4.4050600@ichips.intel.com> Message-ID: <20060508204643.GA26276@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] Re: CMA: compliancy issue? > > Michael S. Tsirkin wrote: > >I actually think all we have to do is to change CMA behaviour on REP: > >send RTU after, and not before, calling user handler. > >Since other ULPs don't seem to care when RTU is sent, they will continue > >working. > > I think that it makes more sense to give the user the CONNECT_RESPONSE > event, and have them call rdma_accept() or rdma_reject(), versus allowing > them to call rdma_reject() after an ESTABLISHED event. There is already > code for this; it's just tied to the existence of a QP off the rdma_cm_id. OK, that's fine too. BTW I don't think SDP needs to see the RTU on the active side. -- MST From sashak at voltaire.com Mon May 8 14:37:07 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 09 May 2006 00:37:07 +0300 Subject: [openib-general] [PATCH] opensm: make some local functions static Message-ID: <20060508213707.4552.35457.stgit@sashak.voltaire.com> This patch makes some local functions static. One unused function was removed, and another that is currently unused was masked. 
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_node_info_rcv.c | 24 +++++++++--------- osm/opensm/osm_port_info_rcv.c | 2 +- osm/opensm/osm_sw_info_rcv.c | 52 +++++----------------------------------- 3 files changed, 20 insertions(+), 58 deletions(-) diff --git a/osm/opensm/osm_node_info_rcv.c b/osm/opensm/osm_node_info_rcv.c index 0b8a78f..943c352 100644 --- a/osm/opensm/osm_node_info_rcv.c +++ b/osm/opensm/osm_node_info_rcv.c @@ -74,7 +74,7 @@ #include /********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +static void __osm_ni_rcv_set_links( IN const osm_ni_rcv_t* const p_rcv, osm_node_t* p_node, @@ -287,7 +287,7 @@ __osm_ni_rcv_set_links( /********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +static void __osm_ni_rcv_process_new_node( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -352,7 +352,7 @@ __osm_ni_rcv_process_new_node( /********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +static void __osm_ni_rcv_get_node_desc( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -412,7 +412,7 @@ __osm_ni_rcv_get_node_desc( /********************************************************************** The plock must be held before calling this function. 
**********************************************************************/ -void +static void __osm_ni_rcv_process_new_ca( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -438,7 +438,7 @@ __osm_ni_rcv_process_new_ca( /********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +static void __osm_ni_rcv_process_ca_port( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -586,7 +586,7 @@ __osm_ni_rcv_process_ca_port( /********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +static void __osm_ni_rcv_process_existing_ca( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -602,7 +602,7 @@ __osm_ni_rcv_process_existing_ca( /********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +static void __osm_ni_rcv_process_new_router( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -617,7 +617,7 @@ __osm_ni_rcv_process_new_router( /********************************************************************** **********************************************************************/ -void +static void __osm_ni_rcv_process_switch( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -666,7 +666,7 @@ __osm_ni_rcv_process_switch( /********************************************************************** The plock must be held before calling this function. 
**********************************************************************/ -void +static void __osm_ni_rcv_process_existing_switch( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -713,7 +713,7 @@ __osm_ni_rcv_process_existing_switch( /********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +static void __osm_ni_rcv_process_new_switch( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -739,7 +739,7 @@ __osm_ni_rcv_process_new_switch( /********************************************************************** The plock must NOT be held before calling this function. **********************************************************************/ -void +static void __osm_ni_rcv_process_new( IN const osm_ni_rcv_t* const p_rcv, IN const osm_madw_t* const p_madw ) @@ -908,7 +908,7 @@ __osm_ni_rcv_process_new( /********************************************************************** The plock must be held before calling this function. 
**********************************************************************/ -void +static void __osm_ni_rcv_process_existing( IN const osm_ni_rcv_t* const p_rcv, IN osm_node_t* const p_node, diff --git a/osm/opensm/osm_port_info_rcv.c b/osm/opensm/osm_port_info_rcv.c index 658d99e..bc75e71 100644 --- a/osm/opensm/osm_port_info_rcv.c +++ b/osm/opensm/osm_port_info_rcv.c @@ -510,7 +510,7 @@ void osm_pkey_get_tables( /********************************************************************** **********************************************************************/ -void +static void __osm_pi_rcv_get_pkey_slvl_vla_tables( IN const osm_pi_rcv_t* const p_rcv, IN osm_node_t* const p_node, diff --git a/osm/opensm/osm_sw_info_rcv.c b/osm/opensm/osm_sw_info_rcv.c index cefcf28..ee7c744 100644 --- a/osm/opensm/osm_sw_info_rcv.c +++ b/osm/opensm/osm_sw_info_rcv.c @@ -63,49 +63,9 @@ #include #include /********************************************************************** - **********************************************************************/ -void -__osm_si_rcv_clear_sc_bit( - IN const osm_si_rcv_t* const p_rcv, - IN osm_node_t* const p_node, - IN ib_switch_info_t* const p_si ) -{ - uint8_t payload[IB_SMP_DATA_SIZE]; - ib_api_status_t status; - osm_madw_context_t context; - OSM_LOG_ENTER( p_rcv->p_log, __osm_si_rcv_clear_sc_bit ); - - context.si_context.node_guid = osm_node_get_node_guid( p_node ); - context.si_context.set_method = TRUE; - context.si_context.light_sweep = FALSE; - - cl_memcpy( payload, p_si, IB_SMP_DATA_SIZE ); - - status = osm_req_set( p_rcv->p_req, - osm_node_get_any_dr_path_ptr( p_node ), - payload, - sizeof(payload), - IB_MAD_ATTR_SWITCH_INFO, - 0, - CL_DISP_MSGID_NONE, - &context ); - - if( status != IB_SUCCESS ) - { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "__osm_si_rcv_clear_sc_bit: ERR 3601: " - "Unable to clear state change bit for switch " - "with GUID = 0x%" PRIx64 "\n", - cl_ntoh64( osm_node_get_node_guid( p_node ) ) ); - } - - OSM_LOG_EXIT( p_rcv->p_log 
); -} - -/********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +static void __osm_si_rcv_get_port_info( IN const osm_si_rcv_t* const p_rcv, IN osm_switch_t* const p_sw, @@ -178,7 +138,7 @@ __osm_si_rcv_get_port_info( /********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +static void __osm_si_rcv_get_fwd_tbl( IN const osm_si_rcv_t* const p_rcv, IN osm_switch_t* const p_sw ) @@ -242,7 +202,8 @@ __osm_si_rcv_get_fwd_tbl( /********************************************************************** The plock must be held before calling this function. **********************************************************************/ -void +#if 0 +static void __osm_si_rcv_get_mcast_fwd_tbl( IN const osm_si_rcv_t* const p_rcv, IN osm_switch_t* const p_sw ) @@ -347,11 +308,12 @@ __osm_si_rcv_get_mcast_fwd_tbl( Exit: OSM_LOG_EXIT( p_rcv->p_log ); } +#endif /********************************************************************** Lock must be held on entry to this function. **********************************************************************/ -void +static void __osm_si_rcv_process_new( IN const osm_si_rcv_t* const p_rcv, IN osm_node_t* const p_node, @@ -462,7 +424,7 @@ #endif Return 1 if the caller is expected to send a change_detected event. this can not be done internally as the event needs the lock... 
**********************************************************************/ -boolean_t +static boolean_t __osm_si_rcv_process_existing( IN const osm_si_rcv_t* const p_rcv, IN osm_node_t* const p_node, From rheflin at atipa.com Mon May 8 14:36:07 2006 From: rheflin at atipa.com (Roger Heflin) Date: Mon, 08 May 2006 16:36:07 -0500 Subject: [openib-general] Openmpi/xhpl kernel crash 2.6.17-rc3 with Pathscale htx Message-ID: <445FB9C7.8060507@atipa.com> Hello, Running hpl with openmpi over Infiniband gets me a crash. Using hpl, openmpi 1.0.2, openib, and the 2.6.17-rc3 kernel. I don't see the crash under ip over ib (ran for over an hour), the crash occurs immediately upon attempting to start xhpl. Here is the crash captured via the serial port: [ 144.713555] ----------- [cut here ] --------- [please bite here ] --------- [ 144.720550] Kernel BUG at drivers/infiniband/hw/ipath/ipath_layer.c:757 [ 144.727205] invalid opcode: 0000 [1] SMP [ 144.731334] CPU 0 [ 144.733419] Modules linked in: ipv6 autofs4 adm1026 hwmon_vid i2c_piix4 nfs lockd nfs_acl sunrpc dm_mirror dm_multipath dm_mod button battery ac ohci_hcd ehci_hcd i2c_nforce2 i2c_core shpchp snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc ib_ipoib ib_ipath ipath_core ib_uverbs ib_umad ib_ucm ib_sa ib_cm ib_mad ib_core tg3 floppy sata_svw ext3 jbd sata_nv libata sd_mod scsi_mod [ 144.774643] Pid: 4771, comm: xhpl Not tainted 2.6.17-rc3 #1 [ 144.780244] RIP: 0010:[] {:ipath_core:ipath_verbs_send+362} [ 144.788858] RSP: 0018:ffffffff8051be38 EFLAGS: 00010246 [ 144.794409] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8100df4a0150 [ 144.801574] RDX: ffffc200003b1078 RSI: 0000000000000000 RDI: ffff8100df4a0150 [ 144.808742] RBP: 0000000000000000 R08: ffff8100df4a0158 R09: 0000000000000018 [ 144.815910] R10: 0000000000000018 R11: 0000000000000246 R12: ffffc2000026f020 [ 144.823071] R13: 0000000000000000 R14: 0000000000000018 R15: 0000000000000000 [ 
144.830230] FS: 00002b750d6fcca0(0000) GS:ffffffff805ad000(0000) knlGS:0000000000000000 [ 144.838398] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 144.844190] CR2: 000000000047f050 CR3: 000000000cdb1000 CR4: 00000000000006e0 [ 144.851370] Process xhpl (pid: 4771, threadinfo ffff81000ceba000, task ffff81000cc9e880) [ 144.859504] Stack: ffffffff8059d900 ffff8100df4a0150 00000018dfef1000 ffff8100df4a0120 [ 144.867549] ffff8100df4a0000 ffffffff805f7d88 ffff8100df4a0098 0000000000000038 [ 144.875829] 0000000000000400 ffffffff8811869e [ 144.881079] Call Trace: {:ib_ipath:ipath_do_rc_send+348} [ 144.888727] {do_timer+58} {main_timer_handler+493} [ 144.897498] {tasklet_hi_action+105} {__do_softirq+80} [ 144.906525] {call_softirq+30} {do_softirq+47} [ 144.914854] {do_IRQ+62} {ret_from_intr+0} [ 144.923395] {kfree+417} {:ib_uverbs:ib_uverbs_poll_cq+409} [ 144.932867] {:ib_uverbs:ib_uverbs_write+196} {vfs_write+212} [ 144.942509] {sys_write+69} {system_call+126} [ 144.950997] [ 144.950998] Code: 0f 0b 68 84 21 10 88 c2 f5 02 eb 07 44 39 f3 41 0f 47 de 48 [ 144.960709] RIP {:ipath_core:ipath_verbs_send+362} RSP [ 144.969212] <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43 [ 144.977952] in_atomic():1, irqs_disabled():0 [ 144.982261] [ 144.982262] Call Trace: {__might_sleep+190} [ 144.990056] {flat_send_IPI_mask+0} {blocking_notifier_call_chain+31} [ 145.000411] {do_exit+34} {_spin_unlock_irqrestore+11} [ 145.009454] {do_divide_error+0} {do_invalid_op+145} [ 145.018334] {:ipath_core:ipath_verbs_send+362} [ 145.025102] {tcp_v4_do_rcv+43} {:tg3:tg3_interrupt_tagged+51} [ 145.034840] {error_exit+0} {:ipath_core:ipath_verbs_send+362} [ 145.044606] {:ipath_core:ipath_verbs_send+806} [ 145.051390] {:ib_ipath:ipath_do_rc_send+348} {do_timer+58} [ 145.060897] {main_timer_handler+493} {tasklet_hi_action+105} [ 145.070569] {__do_softirq+80} {call_softirq+30} [ 145.079121] {do_softirq+47} {do_IRQ+62} [ 145.086947] {ret_from_intr+0} {kfree+417} [ 
145.095523] {:ib_uverbs:ib_uverbs_poll_cq+409} [ 145.102291] {:ib_uverbs:ib_uverbs_write+196} {vfs_write+212} [ 145.111972] {sys_write+69} {system_call+126} [ 145.120482] Kernel panic - not syncing: Aiee, killing interrupt handler! [ 145.127265] /proc/interrupts looks like this: CPU0 CPU1 CPU2 CPU3 0: 107714 110040 109206 113504 IO-APIC-edge timer 1: 417 1287 405 1627 IO-APIC-edge i8042 8: 0 0 0 0 IO-APIC-edge rtc 9: 0 0 0 0 IO-APIC-level acpi 15: 50 0 0 23 IO-APIC-edge ide1 50: 0 0 0 0 IO-APIC-level libata, ohci_hcd:usb2 58: 0 0 0 0 IO-APIC-level libata 66: 0 0 0 0 IO-APIC-level libata 74: 15625 0 0 11 IO-APIC-level eth0 90: 551 0 0 0 IO-APIC-level ipath_core 98: 0 0 0 0 IO-APIC-level NVidia CK804 233: 249 904 1161 4180 IO-APIC-level libata, ehci_hcd:usb1 NMI: 107 124 406 483 LOC: 440388 440365 440341 440317 ERR: 0 MIS: 0 Any ideas? Roger From mst at mellanox.co.il Mon May 8 14:43:37 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 00:43:37 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> Message-ID: <20060508214337.GC26276@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] cm refcount race fix > > It does seem we can simplify mthca_cq in a slightly different way. > mthca_cq_clean() doesn't need to take a CQ reference, because we know > the CQ can't go away before all associated QPs are gone, and at least > one QP will stay around until mthca_cq_clean() returns. 
> > So the below patch is both a fix and a decent cleanup: > > --- infiniband/hw/mthca/mthca_provider.h (revision 6945) > +++ infiniband/hw/mthca/mthca_provider.h (working copy) > @@ -197,7 +197,7 @@ struct mthca_cq_resize { > struct mthca_cq { > struct ib_cq ibcq; > spinlock_t lock; > - atomic_t refcount; > + int refcount; > int cqn; > u32 cons_index; > struct mthca_cq_buf buf; > --- infiniband/hw/mthca/mthca_dev.h (revision 6945) > +++ infiniband/hw/mthca/mthca_dev.h (working copy) > @@ -496,7 +496,7 @@ void mthca_free_cq(struct mthca_dev *dev > void mthca_cq_completion(struct mthca_dev *dev, u32 cqn); > void mthca_cq_event(struct mthca_dev *dev, u32 cqn, > enum ib_event_type event_type); > -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, > +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, > struct mthca_srq *srq); > void mthca_cq_resize_copy_cqes(struct mthca_cq *cq); > int mthca_alloc_cq_buf(struct mthca_dev *dev, struct mthca_cq_buf *buf, int nent); > --- infiniband/hw/mthca/mthca_cq.c (revision 6945) > +++ infiniband/hw/mthca/mthca_cq.c (working copy) > @@ -234,14 +234,19 @@ void mthca_cq_event(struct mthca_dev *de > { > struct mthca_cq *cq; > struct ib_event event; > + unsigned long flags; > > - spin_lock(&dev->cq_table.lock); > + spin_lock_irqsave(&dev->cq_table.lock, flags); > > cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); > > - if (cq) > - atomic_inc(&cq->refcount); > - spin_unlock(&dev->cq_table.lock); > + if (cq) { > + spin_lock(&cq->lock); > + ++cq->refcount; > + spin_unlock(&cq->lock); > + } > + > + spin_unlock_irqrestore(&dev->cq_table.lock, flags); Hmm. I see you take cq->lock inside cq_table.lock OTOH in mthca_qp.c we have: spin_lock_irq(&send_cq->lock); if (send_cq != recv_cq) spin_lock(&recv_cq->lock); spin_lock(&dev->qp_table.lock); So qp_table.lock is taken inside cq->lock. 
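The nesting concern above can be illustrated outside the kernel. What follows is only a user-space sketch, not mthca code: pthread mutexes stand in for spinlocks, and `obj`, `obj_table`, `obj_init`, `obj_get`, `obj_put` and `obj_refcount_nested` are hypothetical names. It assumes one possible rule, namely that the table lock is always the innermost lock and that it also protects the reference count, so every path acquires the locks in the same order.

```c
/* User-space analogy only: pthread mutexes stand in for kernel spinlocks.
 * All names here are hypothetical and do not exist in mthca. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 8	/* power of two, so an index can be masked */

struct obj {
	pthread_mutex_t lock;	/* per-object lock (plays the role of cq->lock) */
	int refcount;		/* protected by table.lock, NOT by obj->lock */
};

struct obj_table {
	pthread_mutex_t lock;	/* table lock: by convention always innermost */
	struct obj *slot[TABLE_SIZE];
};

static struct obj_table table = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Publish an object; the table pointer counts as one reference. */
void obj_init(struct obj *o, int idx)
{
	pthread_mutex_init(&o->lock, NULL);

	pthread_mutex_lock(&table.lock);
	o->refcount = 1;
	table.slot[idx & (TABLE_SIZE - 1)] = o;
	pthread_mutex_unlock(&table.lock);
}

/* Look up an object and take a reference under the table lock. */
struct obj *obj_get(int idx)
{
	struct obj *o;

	pthread_mutex_lock(&table.lock);
	o = table.slot[idx & (TABLE_SIZE - 1)];
	if (o)
		++o->refcount;
	pthread_mutex_unlock(&table.lock);

	return o;
}

/* Drop a reference; returns the number of references left. */
int obj_put(struct obj *o)
{
	int remaining;

	pthread_mutex_lock(&table.lock);
	assert(o->refcount > 0);
	remaining = --o->refcount;
	pthread_mutex_unlock(&table.lock);

	return remaining;
}

/* Nested case: object lock outside, table lock inside, never the reverse. */
int obj_refcount_nested(struct obj *o)
{
	int c;

	pthread_mutex_lock(&o->lock);
	pthread_mutex_lock(&table.lock);
	c = o->refcount;
	pthread_mutex_unlock(&table.lock);
	pthread_mutex_unlock(&o->lock);

	return c;
}
```

In this sketch no function takes an object lock while holding the table lock, so the two orders contrasted above cannot both occur. A real driver would additionally need the IRQ-safe lock variants and a wait queue for the destroy path, which are left out here.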
I can't prove it's a problem, but the locking rules are getting confusing - it was better when all table locks were innermost. As a solution, we can decide that the cq refcount is protected by cq_table.lock. What do you say? -- MST From halr at voltaire.com Mon May 8 14:47:31 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 08 May 2006 17:47:31 -0400 Subject: [openib-general] Re: [PATCH] opensm: make some local functions static In-Reply-To: <20060508213707.4552.35457.stgit@sashak.voltaire.com> References: <20060508213707.4552.35457.stgit@sashak.voltaire.com> Message-ID: <1147124851.4485.5091.camel@hal.voltaire.com> On Mon, 2006-05-08 at 17:37, Sasha Khapyorsky wrote: > This patch makes some local functions static. One unused function was > cleaned up, other currently unused was masked. Thanks. Applied. One comment below: > Signed-off-by: Sasha Khapyorsky > --- > > osm/opensm/osm_node_info_rcv.c | 24 +++++++++--------- > osm/opensm/osm_port_info_rcv.c | 2 +- > osm/opensm/osm_sw_info_rcv.c | 52 +++++----------------------------------- > 3 files changed, 20 insertions(+), 58 deletions(-) [snip...] > diff --git a/osm/opensm/osm_port_info_rcv.c b/osm/opensm/osm_port_info_rcv.c > index 658d99e..bc75e71 100644 > --- a/osm/opensm/osm_port_info_rcv.c > +++ b/osm/opensm/osm_port_info_rcv.c [snip...] > @@ -242,7 +202,8 @@ __osm_si_rcv_get_fwd_tbl( > /********************************************************************** > The plock must be held before calling this function. > **********************************************************************/ > -void > +#if 0 > +static void > __osm_si_rcv_get_mcast_fwd_tbl( > IN const osm_si_rcv_t* const p_rcv, > IN osm_switch_t* const p_sw ) > @@ -347,11 +308,12 @@ __osm_si_rcv_get_mcast_fwd_tbl( > Exit: > OSM_LOG_EXIT( p_rcv->p_log ); > } > +#endif How come you just #if 0'd this out? [snip...]
-- Hal From rdreier at cisco.com Mon May 8 15:01:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 15:01:56 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508214337.GC26276@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 9 May 2006 00:43:37 +0300") References: <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> <20060508214337.GC26276@mellanox.co.il> Message-ID: Michael> As a solution, we can decide that the cq refcount is protected Michael> by cq_table.lock. What do you say? Yes, that makes sense. Like this (locking comments also fixed). I'll do the QP and SRQ stuff now, it's so easy... --- infiniband/hw/mthca/mthca_provider.h (revision 6999) +++ infiniband/hw/mthca/mthca_provider.h (working copy) @@ -139,11 +139,12 @@ struct mthca_ah { * a qp may be locked, with the send cq locked first. No other * nesting should be done. * - * Each struct mthca_cq/qp also has an atomic_t ref count. The - * pointer from the cq/qp_table to the struct counts as one reference. - * This reference also is good for access through the consumer API, so - * modifying the CQ/QP etc doesn't need to take another reference. - * Access because of a completion being polled does need a reference. + * Each struct mthca_cq/qp also has an ref count, protected by the + * corresponding table lock. The pointer from the cq/qp_table to the + * struct counts as one reference. This reference also is good for + * access through the consumer API, so modifying the CQ/QP etc doesn't + * need to take another reference. Access because of a completion + * being polled does need a reference. * * Finally, each struct mthca_cq/qp has a wait_queue_head_t for the * destroy function to sleep on.
@@ -159,8 +160,9 @@ struct mthca_ah { * - decrement ref count; if zero, wake up waiters * * To destroy a CQ/QP, we can do the following: - * - lock cq/qp_table, remove pointer, unlock cq/qp_table lock - * - decrement ref count + * - lock cq/qp_table + * - remove pointer and decrement ref count + * - unlock cq/qp_table lock * - wait_event until ref count is zero * * It is the consumer's responsibilty to make sure that no QP @@ -197,7 +199,7 @@ struct mthca_cq_resize { struct mthca_cq { struct ib_cq ibcq; spinlock_t lock; - atomic_t refcount; + int refcount; int cqn; u32 cons_index; struct mthca_cq_buf buf; --- infiniband/hw/mthca/mthca_dev.h (revision 6999) +++ infiniband/hw/mthca/mthca_dev.h (working copy) @@ -496,7 +496,7 @@ void mthca_free_cq(struct mthca_dev *dev void mthca_cq_completion(struct mthca_dev *dev, u32 cqn); void mthca_cq_event(struct mthca_dev *dev, u32 cqn, enum ib_event_type event_type); -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq); void mthca_cq_resize_copy_cqes(struct mthca_cq *cq); int mthca_alloc_cq_buf(struct mthca_dev *dev, struct mthca_cq_buf *buf, int nent); --- infiniband/hw/mthca/mthca_cq.c (revision 6999) +++ infiniband/hw/mthca/mthca_cq.c (working copy) @@ -238,9 +238,9 @@ void mthca_cq_event(struct mthca_dev *de spin_lock(&dev->cq_table.lock); cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); + ++cq->refcount; + spin_unlock(&dev->cq_table.lock); if (!cq) { @@ -254,8 +254,10 @@ void mthca_cq_event(struct mthca_dev *de if (cq->ibcq.event_handler) cq->ibcq.event_handler(&event, cq->ibcq.cq_context); - if (atomic_dec_and_test(&cq->refcount)) + spin_lock(&dev->cq_table.lock); + if (!--cq->refcount) wake_up(&cq->wait); + spin_unlock(&dev->cq_table.lock); } static inline int is_recv_cqe(struct mthca_cqe *cqe) @@ -267,23 +269,13 @@ static inline int is_recv_cqe(struct 
mth return !(cqe->is_send & 0x80); } -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq) { - struct mthca_cq *cq; struct mthca_cqe *cqe; u32 prod_index; int nfreed = 0; - spin_lock_irq(&dev->cq_table.lock); - cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); - spin_unlock_irq(&dev->cq_table.lock); - - if (!cq) - return; - spin_lock_irq(&cq->lock); /* @@ -301,7 +293,7 @@ void mthca_cq_clean(struct mthca_dev *de if (0) mthca_dbg(dev, "Cleaning QPN %06x from CQN %06x; ci %d, pi %d\n", - qpn, cqn, cq->cons_index, prod_index); + qpn, cq->cqn, cq->cons_index, prod_index); /* * Now sweep backwards through the CQ, removing CQ entries @@ -325,8 +317,6 @@ void mthca_cq_clean(struct mthca_dev *de } spin_unlock_irq(&cq->lock); - if (atomic_dec_and_test(&cq->refcount)) - wake_up(&cq->wait); } void mthca_cq_resize_copy_cqes(struct mthca_cq *cq) @@ -821,7 +811,7 @@ int mthca_init_cq(struct mthca_dev *dev, } spin_lock_init(&cq->lock); - atomic_set(&cq->refcount, 1); + cq->refcount = 1; init_waitqueue_head(&cq->wait); memset(cq_context, 0, sizeof *cq_context); @@ -896,6 +886,17 @@ err_out: return err; } +static int get_cq_refcount(struct mthca_dev *dev, struct mthca_cq *cq) +{ + int c; + + spin_lock_irq(&dev->cq_table.lock); + c = cq->refcount; + spin_unlock_irq(&dev->cq_table.lock); + + return c; +} + void mthca_free_cq(struct mthca_dev *dev, struct mthca_cq *cq) { @@ -929,6 +930,7 @@ void mthca_free_cq(struct mthca_dev *dev spin_lock_irq(&dev->cq_table.lock); mthca_array_clear(&dev->cq_table.cq, cq->cqn & (dev->limits.num_cqs - 1)); + --cq->refcount; spin_unlock_irq(&dev->cq_table.lock); if (dev->mthca_flags & MTHCA_FLAG_MSI_X) @@ -936,8 +938,7 @@ void mthca_free_cq(struct mthca_dev *dev else synchronize_irq(dev->pdev->irq); - atomic_dec(&cq->refcount); - wait_event(cq->wait, !atomic_read(&cq->refcount)); + 
wait_event(cq->wait, !get_cq_refcount(dev, cq)); if (cq->is_kernel) { mthca_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); --- infiniband/hw/mthca/mthca_qp.c (revision 6999) +++ infiniband/hw/mthca/mthca_qp.c (working copy) @@ -831,10 +831,10 @@ int mthca_modify_qp(struct ib_qp *ibqp, * entries and reinitialize the QP. */ if (new_state == IB_QPS_RESET && !qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); mthca_wq_init(&qp->sq); @@ -1356,10 +1356,10 @@ void mthca_free_qp(struct mthca_dev *dev * unref the mem-free tables and free the QPN in our table. */ if (!qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); mthca_free_memfree(dev, qp); From sashak at voltaire.com Mon May 8 15:07:23 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 9 May 2006 01:07:23 +0300 Subject: [openib-general] Re: [PATCH] opensm: make some local functions static In-Reply-To: <1147124851.4485.5091.camel@hal.voltaire.com> References: <20060508213707.4552.35457.stgit@sashak.voltaire.com> <1147124851.4485.5091.camel@hal.voltaire.com> Message-ID: <20060508220723.GA28740@sashak.voltaire.com> On 17:47 Mon 08 May , Hal Rosenstock wrote: > > > @@ -242,7 +202,8 @@ __osm_si_rcv_get_fwd_tbl( > > /********************************************************************** > > The plock must be held before calling this function. 
> > **********************************************************************/ > > -void > > +#if 0 > > +static void > > __osm_si_rcv_get_mcast_fwd_tbl( > > IN const osm_si_rcv_t* const p_rcv, > > IN osm_switch_t* const p_sw ) > > @@ -347,11 +308,12 @@ __osm_si_rcv_get_mcast_fwd_tbl( > > Exit: > > OSM_LOG_EXIT( p_rcv->p_log ); > > } > > +#endif > > How come you just #if 0'd this out ? Below there is the code which uses this function and it is under #if 0, so in order to prevent a warning and to be consistent with the existing code... :). Actually, if it is really not necessary, both the function and its call may be removed. Sasha. From mst at mellanox.co.il Mon May 8 15:05:57 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 01:05:57 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> <20060508214337.GC26276@mellanox.co.il> Message-ID: <20060508220556.GA1084@mellanox.co.il> Quoting r. Roland Dreier : > + * Access because of a completion > + * being polled does need a reference. Does it? -- MST From mshefty at ichips.intel.com Mon May 8 15:05:30 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 15:05:30 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508165625.GR21036@mellanox.co.il> References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> Message-ID: <445FC0AA.50503@ichips.intel.com> Michael S.
Tsirkin wrote: > static inline void cm_deref_id(struct cm_id_private *cm_id_priv) > { > + unsigned long flags; > + > + spin_lock_irqsave(&cm_id_priv->lock, flags); > if (atomic_dec_and_test(&cm_id_priv->refcount)) > wake_up(&cm_id_priv->wait); > + spin_unlock_irqrestore(&cm_id_priv->lock, flags); > } Could we use atomic_dec_and_lock() instead? This would keep refcount atomic, but use a spinlock to synchronize with destruction. - Sean From mst at mellanox.co.il Mon May 8 15:09:10 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 01:09:10 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> <20060508214337.GC26276@mellanox.co.il> Message-ID: <20060508220910.GB1084@mellanox.co.il> Quoting r. Roland Dreier : > +static int get_cq_refcount(struct mthca_dev *dev, struct mthca_cq *cq) inline? rename mthca_cq_refcount for consistency? -- MST From mst at mellanox.co.il Mon May 8 15:13:38 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 01:13:38 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <445FC0AA.50503@ichips.intel.com> References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <445FC0AA.50503@ichips.intel.com> Message-ID: <20060508221338.GC1084@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] Re: [PATCH] cm refcount race fix > > Michael S. 
Tsirkin wrote: > > static inline void cm_deref_id(struct cm_id_private *cm_id_priv) > > { > >+ unsigned long flags; > >+ > >+ spin_lock_irqsave(&cm_id_priv->lock, flags); > > if (atomic_dec_and_test(&cm_id_priv->refcount)) > > wake_up(&cm_id_priv->wait); > >+ spin_unlock_irqrestore(&cm_id_priv->lock, flags); > > } > > Could we use atomic_dec_and_lock() instead? This would keep refcount > atomic, but use a spinlock to synchronize with destruction. I think that'd work too. -- MST From mst at mellanox.co.il Mon May 8 15:14:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 01:14:25 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> <20060508214337.GC26276@mellanox.co.il> Message-ID: <20060508221425.GD1084@mellanox.co.il> Quoting r. Roland Dreier : > I'll do the QP and SRQ stuff now, it's so easy... Yep. -- MST From rdreier at cisco.com Mon May 8 15:14:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 15:14:24 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508220556.GA1084@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 9 May 2006 01:05:57 +0300") References: <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> <20060508214337.GC26276@mellanox.co.il> <20060508220556.GA1084@mellanox.co.il> Message-ID: Michael> Does it? Not anymore. I fixed the comment there too. Here's my latest patch, which fixes CQs, QPs and SRQs (everything in mthca, I think): --- infiniband/hw/mthca/mthca_provider.h (revision 6999) +++ infiniband/hw/mthca/mthca_provider.h (working copy) @@ -139,11 +139,12 @@ struct mthca_ah { * a qp may be locked, with the send cq locked first. 
No other * nesting should be done. * - * Each struct mthca_cq/qp also has an atomic_t ref count. The - * pointer from the cq/qp_table to the struct counts as one reference. - * This reference also is good for access through the consumer API, so - * modifying the CQ/QP etc doesn't need to take another reference. - * Access because of a completion being polled does need a reference. + * Each struct mthca_cq/qp also has an ref count, protected by the + * corresponding table lock. The pointer from the cq/qp_table to the + * struct counts as one reference. This reference also is good for + * access through the consumer API, so modifying the CQ/QP etc doesn't + * need to take another reference. Access to a QP because of a + * completion being polled does not need a reference either. * * Finally, each struct mthca_cq/qp has a wait_queue_head_t for the * destroy function to sleep on. @@ -159,8 +160,9 @@ struct mthca_ah { * - decrement ref count; if zero, wake up waiters * * To destroy a CQ/QP, we can do the following: - * - lock cq/qp_table, remove pointer, unlock cq/qp_table lock - * - decrement ref count + * - lock cq/qp_table + * - remove pointer and decrement ref count + * - unlock cq/qp_table lock * - wait_event until ref count is zero * * It is the consumer's responsibilty to make sure that no QP @@ -197,7 +199,7 @@ struct mthca_cq_resize { struct mthca_cq { struct ib_cq ibcq; spinlock_t lock; - atomic_t refcount; + int refcount; int cqn; u32 cons_index; struct mthca_cq_buf buf; @@ -217,7 +219,7 @@ struct mthca_cq { struct mthca_srq { struct ib_srq ibsrq; spinlock_t lock; - atomic_t refcount; + int refcount; int srqn; int max; int max_gs; @@ -254,7 +256,7 @@ struct mthca_wq { struct mthca_qp { struct ib_qp ibqp; - atomic_t refcount; + int refcount; u32 qpn; int is_direct; u8 port; /* for SQP and memfree use only */ --- infiniband/hw/mthca/mthca_srq.c (revision 6999) +++ infiniband/hw/mthca/mthca_srq.c (working copy) @@ -241,7 +241,7 @@ int mthca_alloc_srq(struct 
mthca_dev *de goto err_out_mailbox; spin_lock_init(&srq->lock); - atomic_set(&srq->refcount, 1); + srq->refcount = 1; init_waitqueue_head(&srq->wait); if (mthca_is_memfree(dev)) @@ -308,6 +308,17 @@ err_out: return err; } +static int get_srq_refcount(struct mthca_dev *dev, struct mthca_srq *srq) +{ + int c; + + spin_lock_irq(&dev->srq_table.lock); + c = srq->refcount; + spin_unlock_irq(&dev->srq_table.lock); + + return c; +} + void mthca_free_srq(struct mthca_dev *dev, struct mthca_srq *srq) { struct mthca_mailbox *mailbox; @@ -329,10 +340,10 @@ void mthca_free_srq(struct mthca_dev *de spin_lock_irq(&dev->srq_table.lock); mthca_array_clear(&dev->srq_table.srq, srq->srqn & (dev->limits.num_srqs - 1)); + --srq->refcount; spin_unlock_irq(&dev->srq_table.lock); - atomic_dec(&srq->refcount); - wait_event(srq->wait, !atomic_read(&srq->refcount)); + wait_event(srq->wait, !get_srq_refcount(dev, srq)); if (!srq->ibsrq.uobject) { mthca_free_srq_buf(dev, srq); @@ -414,7 +425,7 @@ void mthca_srq_event(struct mthca_dev *d spin_lock(&dev->srq_table.lock); srq = mthca_array_get(&dev->srq_table.srq, srqn & (dev->limits.num_srqs - 1)); if (srq) - atomic_inc(&srq->refcount); + ++srq->refcount; spin_unlock(&dev->srq_table.lock); if (!srq) { @@ -431,8 +442,10 @@ void mthca_srq_event(struct mthca_dev *d srq->ibsrq.event_handler(&event, srq->ibsrq.srq_context); out: - if (atomic_dec_and_test(&srq->refcount)) + spin_lock(&dev->srq_table.lock); + if (!--srq->refcount) wake_up(&srq->wait); + spin_unlock(&dev->srq_table.lock); } /* --- infiniband/hw/mthca/mthca_dev.h (revision 6999) +++ infiniband/hw/mthca/mthca_dev.h (working copy) @@ -496,7 +496,7 @@ void mthca_free_cq(struct mthca_dev *dev void mthca_cq_completion(struct mthca_dev *dev, u32 cqn); void mthca_cq_event(struct mthca_dev *dev, u32 cqn, enum ib_event_type event_type); -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq); 
void mthca_cq_resize_copy_cqes(struct mthca_cq *cq); int mthca_alloc_cq_buf(struct mthca_dev *dev, struct mthca_cq_buf *buf, int nent); --- infiniband/hw/mthca/mthca_cq.c (revision 6999) +++ infiniband/hw/mthca/mthca_cq.c (working copy) @@ -238,9 +238,9 @@ void mthca_cq_event(struct mthca_dev *de spin_lock(&dev->cq_table.lock); cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); + ++cq->refcount; + spin_unlock(&dev->cq_table.lock); if (!cq) { @@ -254,8 +254,10 @@ void mthca_cq_event(struct mthca_dev *de if (cq->ibcq.event_handler) cq->ibcq.event_handler(&event, cq->ibcq.cq_context); - if (atomic_dec_and_test(&cq->refcount)) + spin_lock(&dev->cq_table.lock); + if (!--cq->refcount) wake_up(&cq->wait); + spin_unlock(&dev->cq_table.lock); } static inline int is_recv_cqe(struct mthca_cqe *cqe) @@ -267,23 +269,13 @@ static inline int is_recv_cqe(struct mth return !(cqe->is_send & 0x80); } -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq) { - struct mthca_cq *cq; struct mthca_cqe *cqe; u32 prod_index; int nfreed = 0; - spin_lock_irq(&dev->cq_table.lock); - cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); - spin_unlock_irq(&dev->cq_table.lock); - - if (!cq) - return; - spin_lock_irq(&cq->lock); /* @@ -301,7 +293,7 @@ void mthca_cq_clean(struct mthca_dev *de if (0) mthca_dbg(dev, "Cleaning QPN %06x from CQN %06x; ci %d, pi %d\n", - qpn, cqn, cq->cons_index, prod_index); + qpn, cq->cqn, cq->cons_index, prod_index); /* * Now sweep backwards through the CQ, removing CQ entries @@ -325,8 +317,6 @@ void mthca_cq_clean(struct mthca_dev *de } spin_unlock_irq(&cq->lock); - if (atomic_dec_and_test(&cq->refcount)) - wake_up(&cq->wait); } void mthca_cq_resize_copy_cqes(struct mthca_cq *cq) @@ -821,7 +811,7 @@ int mthca_init_cq(struct mthca_dev *dev, } 
spin_lock_init(&cq->lock); - atomic_set(&cq->refcount, 1); + cq->refcount = 1; init_waitqueue_head(&cq->wait); memset(cq_context, 0, sizeof *cq_context); @@ -896,6 +886,17 @@ err_out: return err; } +static int get_cq_refcount(struct mthca_dev *dev, struct mthca_cq *cq) +{ + int c; + + spin_lock_irq(&dev->cq_table.lock); + c = cq->refcount; + spin_unlock_irq(&dev->cq_table.lock); + + return c; +} + void mthca_free_cq(struct mthca_dev *dev, struct mthca_cq *cq) { @@ -929,6 +930,7 @@ void mthca_free_cq(struct mthca_dev *dev spin_lock_irq(&dev->cq_table.lock); mthca_array_clear(&dev->cq_table.cq, cq->cqn & (dev->limits.num_cqs - 1)); + --cq->refcount; spin_unlock_irq(&dev->cq_table.lock); if (dev->mthca_flags & MTHCA_FLAG_MSI_X) @@ -936,8 +938,7 @@ void mthca_free_cq(struct mthca_dev *dev else synchronize_irq(dev->pdev->irq); - atomic_dec(&cq->refcount); - wait_event(cq->wait, !atomic_read(&cq->refcount)); + wait_event(cq->wait, !get_cq_refcount(dev, cq)); if (cq->is_kernel) { mthca_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); --- infiniband/hw/mthca/mthca_qp.c (revision 6999) +++ infiniband/hw/mthca/mthca_qp.c (working copy) @@ -238,7 +238,7 @@ void mthca_qp_event(struct mthca_dev *de spin_lock(&dev->qp_table.lock); qp = mthca_array_get(&dev->qp_table.qp, qpn & (dev->limits.num_qps - 1)); if (qp) - atomic_inc(&qp->refcount); + ++qp->refcount; spin_unlock(&dev->qp_table.lock); if (!qp) { @@ -255,8 +255,10 @@ void mthca_qp_event(struct mthca_dev *de if (qp->ibqp.event_handler) qp->ibqp.event_handler(&event, qp->ibqp.qp_context); - if (atomic_dec_and_test(&qp->refcount)) + spin_lock(&dev->qp_table.lock); + if (!--qp->refcount) wake_up(&qp->wait); + spin_unlock(&dev->qp_table.lock); } static int to_mthca_state(enum ib_qp_state ib_state) @@ -831,10 +833,10 @@ int mthca_modify_qp(struct ib_qp *ibqp, * entries and reinitialize the QP. 
*/ if (new_state == IB_QPS_RESET && !qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); mthca_wq_init(&qp->sq); @@ -1094,7 +1096,7 @@ static int mthca_alloc_qp_common(struct int ret; int i; - atomic_set(&qp->refcount, 1); + qp->refcount = 1; init_waitqueue_head(&qp->wait); qp->state = IB_QPS_RESET; qp->atomic_rd_en = 0; @@ -1316,6 +1318,17 @@ int mthca_alloc_sqp(struct mthca_dev *de return err; } +static int get_qp_refcount(struct mthca_dev *dev, struct mthca_qp *qp) +{ + int c; + + spin_lock_irq(&dev->qp_table.lock); + c = qp->refcount; + spin_unlock_irq(&dev->qp_table.lock); + + return c; +} + void mthca_free_qp(struct mthca_dev *dev, struct mthca_qp *qp) { @@ -1337,14 +1350,14 @@ void mthca_free_qp(struct mthca_dev *dev spin_lock(&dev->qp_table.lock); mthca_array_clear(&dev->qp_table.qp, qp->qpn & (dev->limits.num_qps - 1)); + --qp->refcount; spin_unlock(&dev->qp_table.lock); if (send_cq != recv_cq) spin_unlock(&recv_cq->lock); spin_unlock_irq(&send_cq->lock); - atomic_dec(&qp->refcount); - wait_event(qp->wait, !atomic_read(&qp->refcount)); + wait_event(qp->wait, !get_qp_refcount(dev, qp)); if (qp->state != IB_QPS_RESET) mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, @@ -1356,10 +1369,10 @@ void mthca_free_qp(struct mthca_dev *dev * unref the mem-free tables and free the QPN in our table. */ if (!qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? 
to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); mthca_free_memfree(dev, qp); From rdreier at cisco.com Mon May 8 15:15:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 15:15:21 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <445FC0AA.50503@ichips.intel.com> (Sean Hefty's message of "Mon, 08 May 2006 15:05:30 -0700") References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <445FC0AA.50503@ichips.intel.com> Message-ID: Sean> Could we use atomic_dec_and_lock() instead? This would keep Sean> refcount atomic, but use a spinlock to synchronize with Sean> destruction. Hmm, how does that help? Just going to a plain integer with a spinlock to protect it seems simple and clear. - R. From rdreier at cisco.com Mon May 8 15:17:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 15:17:21 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508220910.GB1084@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 9 May 2006 01:09:10 +0300") References: <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> <20060508214337.GC26276@mellanox.co.il> <20060508220910.GB1084@mellanox.co.il> Message-ID: Michael> inline? rename mthca_cq_refcount for consistency? The compiler should be smart enough to decided whether to inline it or not. I thought the get_cq_refcount name is consistent with get_cqe, get_wqe, etc. - R. 
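
The locking scheme in the mthca patch above (a plain integer refcount that is only touched under the table lock, with the destroy path sleeping until the count reaches zero) can be modelled in userspace roughly as follows. This is a minimal sketch under stated assumptions, not the driver code: struct obj, obj_get(), obj_put() and obj_free_wait() are hypothetical names, a pthread mutex stands in for the table spinlock, and a condition variable stands in for the kernel wait queue.

```c
#include <pthread.h>

/*
 * Userspace model only. table_lock plays the role of the per-table
 * spinlock (e.g. dev->cq_table.lock) and wait plays the role of the
 * object's wait queue. All names here are hypothetical.
 */
struct obj {
	int refcount;                 /* plain int, only touched under table_lock */
	pthread_mutex_t *table_lock;  /* lock that also guards the lookup table */
	pthread_cond_t wait;          /* destroyer sleeps here until refcount == 0 */
};

static void obj_init(struct obj *o, pthread_mutex_t *table_lock)
{
	o->refcount = 1;              /* initial reference owned by the creator */
	o->table_lock = table_lock;
	pthread_cond_init(&o->wait, NULL);
}

/* Event path: take a reference under the same lock that guards lookup. */
static void obj_get(struct obj *o)
{
	pthread_mutex_lock(o->table_lock);
	++o->refcount;
	pthread_mutex_unlock(o->table_lock);
}

/* Event path: drop the reference and wake the destroyer on the last put. */
static void obj_put(struct obj *o)
{
	pthread_mutex_lock(o->table_lock);
	if (!--o->refcount)
		pthread_cond_broadcast(&o->wait);
	pthread_mutex_unlock(o->table_lock);
}

/*
 * Destroy path: drop the initial reference under the lock, then wait
 * until every outstanding reference is gone. Returns the final count.
 */
static int obj_free_wait(struct obj *o)
{
	int c;

	pthread_mutex_lock(o->table_lock);
	--o->refcount;
	while (o->refcount)
		pthread_cond_wait(&o->wait, o->table_lock);
	c = o->refcount;
	pthread_mutex_unlock(o->table_lock);

	return c;
}
```

Because the decrement and the wake-up happen under the same lock that the destroyer takes to read the count, the destroyer cannot observe zero and free the object while an event handler is still between its decrement and its wake-up call.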
From sweitzen at cisco.com Mon May 8 15:36:41 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 8 May 2006 15:36:41 -0700 Subject: [openib-general] OFED-1.0-rc4 is available Message-ID: Open MPI compiles on PPC64, at least on RHEL4 it does. 3. MPI OSU and Open MPI compilation fails on PPC64 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Mon May 8 15:42:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 01:42:43 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> <20060508214337.GC26276@mellanox.co.il> <20060508220910.GB1084@mellanox.co.il> Message-ID: <20060508224242.GA1962@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] cm refcount race fix > > Michael> inline? rename mthca_cq_refcount for consistency? > > The compiler should be smart enough to decided whether to inline it or > not. OK I guess though since there's only one user we might as well help out. > I thought the get_cq_refcount name is consistent with get_cqe, > get_wqe, etc. Oh OK. -- MST From hjafri at ncsa.uiuc.edu Mon May 8 15:52:17 2006 From: hjafri at ncsa.uiuc.edu (Hassan M. Jafri) Date: Mon, 08 May 2006 17:52:17 -0500 Subject: [openib-general] ip over ib throughtput In-Reply-To: <20041229134351.GA3486@mellanox.co.il> References: <20041229134351.GA3486@mellanox.co.il> Message-ID: <445FCBA1.1030903@ncsa.uiuc.edu> I cant crank out more than 150 MB/sec with my 2.0 GHz xeons. verbs level benchmarks, however give decent numbers for bandwidth. With netperf, the server side CPU usage is 99% which is much higher than other posted bandwidth results on this thread. Any suggestions? 
Here is the complete configuration for my bandwidth tests

Kernel-2.6.15.4
netperf-2.3-3
OpenIB rev 6552
MTLP23108-CF128 Firmware 3.4.0
MSI-X is enabled for the HCA

------------------------------
Here is the netperf output

TCP STREAM TEST to 192.168.2.2
Recv   Send   Send                         Utilization     Service Demand
Socket Socket Message Elapsed              Send    Recv    Send    Recv
Size   Size   Size    Time    Throughput   local   remote  local   remote
bytes  bytes  bytes   secs.   MBytes /s    % T     % T     us/KB   us/KB

262142 262142 32768   10.01   151.32       59.66   99.84   7.700   12.886

-------------------------------
Here is ib0 config for one of the nodes

ib0       Link encap:UNSPEC  HWaddr 00-02-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c902:0:3ce9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:1724527 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9685456 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:89830114 (85.6 MiB)  TX bytes:2213308646 (2.0 GiB)

Michael S. Tsirkin wrote:
> Hi!
> What kind of performance do people see with ip over ib on gen2?
> I see about 100Mbyte/sec at 99% CPU utilisation on send,
> on an express card, Xeon 2.8GHz, SSE doorbells enabled.
>
> MST

From mshefty at ichips.intel.com Mon May 8 15:53:50 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 08 May 2006 15:53:50 -0700
Subject: [openib-general] Re: [PATCH] cm refcount race fix
In-Reply-To:
References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <445FC0AA.50503@ichips.intel.com>
Message-ID: <445FCBFE.7070400@ichips.intel.com>

Roland Dreier wrote:
> Sean> Could we use atomic_dec_and_lock() instead? This would keep
> Sean> refcount atomic, but use a spinlock to synchronize with
> Sean> destruction.
>
> Hmm, how does that help?
>
> Just going to a plain integer with a spinlock to protect it seems
> simple and clear.

Basically, I'm just trying to explore what options we have.

The cost of using a spinlock around an integer is that we end up serializing
everything with the larger lock. With the CM, sometimes the global CM lock is
being held when refcount is incremented, but there are places where only a lock
on the cm_id is held. And unless the id is being destroyed, there's no need to
acquire the lock.

Thinking about this more, it seems that we're wanting something similar to:

    initialize()
    {
        get destroy mutex
    }

    release()
    {
        if (atomic_dec_and_test(refcount))
            put destroy mutex; /* or signal event */
    }

    destroy()
    {
        release()
        get destroy mutex; /* wait for event to be signaled */
    }

Using an actual mutex gets ugly since it's held for a long time, and ends up
needing to be released in destroy(). And I don't see that there's an event
abstraction that would work.
- Sean

From xma at us.ibm.com Mon May 8 16:23:24 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 8 May 2006 16:23:24 -0700
Subject: [openib-general] ip over ib throughtput
In-Reply-To: <445FCBA1.1030903@ncsa.uiuc.edu>
Message-ID:

I am testing most of my patches. Under

1. Intel(R) Xeon(TM) CPU 2.80GHz, one cpu,
2. fw-23108-3_4_000-MHXL-CF128-T.bin
3. pci-x without msi_x enabled
4. kernel 2.6.16
5. netperf-2.4.0
6. SVN 68XX+several IPoIB patches

The best result I got so far:

Testing with the following command line:
netperf -l 60 -H 10.1.1.100 -t TCP_STREAM -i 10,2 -I 95,5 -- -m 16384 -s 349520 -S 349520

TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.100 (10.1.1.100) port 0 AF_INET : +/-2.5% @ 95% conf.
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec

699040 699040 16384   60.00   3668.07 (458MB/s)

cpu utilization was around 95%.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From bugzilla-daemon at openib.org Mon May 8 16:49:45 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 8 May 2006 16:49:45 -0700 (PDT) Subject: [openib-general] [Bug 68] New: OFED 1.0 rc4: kernel build failed in IB core on SUSE10 Message-ID: <20060508234945.017D82283DD@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=68 Summary: OFED 1.0 rc4: kernel build failed in IB core on SUSE10 Product: OpenFabrics Linux Version: gen2 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: IB Core AssignedTo: bugzilla at openib.org ReportedBy: kball at pathscale.com I see the following compiler error trying to install on SUSE 10. I will attach the debug_info file. /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/cm.c: In function 'ib_cm_cleanup': /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/cm.c:3366: error: implicit declaration of function 'idr_destroy' make[5]: *** [/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/cm.o] Error 1 make[4]: *** [/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core] Error 2 make[3]: *** [_module_/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband] Error 2 make[2]: *** [modules] Error 2 make[1]: *** [modules] Error 2 make[1]: Leaving directory `/usr/src/linux-2.6.13-15.7-obj/x86_64/smp' make: *** [kernel] Error 2 ERROR: Failed to execute: make kernel ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at openib.org Mon May 8 16:50:51 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 8 May 2006 16:50:51 -0700 (PDT) Subject: [openib-general] [Bug 68] OFED 1.0 rc4: kernel build failed in IB core on SUSE10 Message-ID: <20060508235051.CA22F2283DD@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=68 ------- Additional Comments From kball at pathscale.com 2006-05-08 16:50 ------- Created an attachment (id=12) --> (http://openib.org/bugzilla/attachment.cgi?id=12&action=view) debug_info.tgz ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mshefty at ichips.intel.com Mon May 8 16:46:18 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 08 May 2006 16:46:18 -0700 Subject: [openib-general] [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: <200605071157.55936.jackm@mellanox.co.il> References: <200605071157.55936.jackm@mellanox.co.il> Message-ID: <445FD84A.2050102@ichips.intel.com> Jack Morgenstein wrote: > Should we use this revision (6949/6950 of the openib trunk) of the rdma_cm in > the upcoming OFED (branch) release? The end-users should probably decide that. These changes use the local_sa, which has not been queued to be merged upstream yet. > (since the branch rdma_cm kernel module will be in the field -- and it will > not support the get/set transport-specific options), A user should simply get ENOSYS if they try to use those calls on an older kernel. All other calls should work fine. - Sean From Don.Albert at Bull.com Mon May 8 17:25:41 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Mon, 8 May 2006 17:25:41 -0700 Subject: [openib-general] NOP problem in ib_mthca on OFED RC4 Message-ID: Back in March I had a problem with initializing the ib_mthca driver on an EM64T system. 
The module loading would give an error of "ib_mthca 0000:03:00.0: NOP command failed to generate interrupt (IRQ 169), aborting." This appeared to be corrected when I updated the firmware on the Mellanox MT25208 HCA card.

The problem has reappeared with the OFED release, on the same system, but different software and a different HCA card.

I have a small testbed with two EM64T machines connected back-to-back with two Mellanox MT25204 single port DDR cards. I was successfully running the backported 2.6.9-34 kernel on RHEL4 Update 3, with a recent version of the OpenIB tree. Both systems would come up and the cards successfully initialized.

Over the weekend I moved to the 2.6.16 stock kernel, and then built and installed the OFED-1.0-rc4 release. One of the systems appears to come up ok, but the port stays in the "down" state. I assumed this was because the other end of the link (the other machine) was not up. The second machine boots, but I see the following in dmesg:

ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca: Initializing 0000:03:00.0
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:03:00.0 to 64
ib_mthca 0000:03:00.0: NOP command failed to generate interrupt (IRQ 169), aborting.
ib_mthca 0000:03:00.0: BIOS or ACPI interrupt routing problem?

When I had the problem previously, Roland Dreier suggested trying to load the ib_mthca module with "fw_cmd_doorbell=0", which did avoid the error then, and in fact does on this new problem. But the question is why? Updating the firmware on the old board seemed to have solved the problem before, but now it has occurred again on a fairly new card with recent firmware. Has anyone else seen this problem?

One thing that may have a bearing on this is that the "/sbin/lspci" command has also started issuing an error message relating to the PCI slot that the HCA is in.
Here is the message: pcilib: Resource 2 in /sys/bus/pci/devices/0000:03:00.0/resource has a 64-bit address, ignoring .... 03:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20) Do I need a new version of pcilib? I currently have pciutils-2.1.99.test8-3.1. -Don Albert- -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Mon May 8 17:51:01 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 17:51:01 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <445FCBFE.7070400@ichips.intel.com> (Sean Hefty's message of "Mon, 08 May 2006 15:53:50 -0700") References: <20060508054529.GC19660@mellanox.co.il> <20060508160614.GJ21036@mellanox.co.il> <20060508164436.GP21036@mellanox.co.il> <20060508165625.GR21036@mellanox.co.il> <445FC0AA.50503@ichips.intel.com> <445FCBFE.7070400@ichips.intel.com> Message-ID: Sean> The cost of using a spinlock around an integer is that we Sean> end up serializing everything with the larger lock. With Sean> the CM, sometimes the global CM lock is being held when Sean> refcount is incremented, but there are places where only a Sean> lock on the cm_id is held. And unless the id is being Sean> destroyed, there's no need to acquire the lock. Well, you can pick whatever fine-grained lock you want to protect the reference count with. atomic_dec_and_lock() doesn't really help with this, since you still need a spinlock. (And if you look at the implementation of atomic_dec_and_lock(), you can see that it takes the spinlock every time) Sean> Using an actual mutex gets ugly since it's held for a long Sean> time, and ends up needing to be released in destroy(). And Sean> I don't see that there's an event abstraction that would Sean> work. If you wanted to implement this, you would have to use a completion. A mutex can't be used because it must be released in process context with interrupts enabled. 
And a semaphore can't be used because there's an implicit use-after-free with semaphores (basically up() touches the semaphore memory after it calls wake_up()). - R. From rdreier at cisco.com Mon May 8 17:59:31 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 17:59:31 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060508224242.GA1962@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 9 May 2006 01:42:43 +0300") References: <20060508165625.GR21036@mellanox.co.il> <20060508171044.GT21036@mellanox.co.il> <20060508173655.GA24615@mellanox.co.il> <20060508214337.GC26276@mellanox.co.il> <20060508220910.GB1084@mellanox.co.il> <20060508224242.GA1962@mellanox.co.il> Message-ID: Michael> OK I guess though since there's only one user we might as Michael> well help out. Yeah, you're probably right. Old gcc might not be smart enough... From xma at us.ibm.com Mon May 8 18:18:53 2006 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 8 May 2006 18:18:53 -0700 Subject: [openib-general] profiling data on mthca Message-ID: Hello Roland, Below are kernel profiling data I got from my test. Do you know why on SMP, it is mthca_init_eq_table, but on UP it was mthca_tavor_interrupt? Something I did wrong? 

SMP Profiling through timer interrupt
====
samples  %        image name   app name  symbol name
412179   35.2048  vmlinux      vmlinux   mwait_idle
172398   14.7248  vmlinux      vmlinux   csum_partial_copy_generic
95974     8.1973  vmlinux      vmlinux   handle_IRQ_event
43462     3.7122  ib_mthca.ko  ib_mthca  mthca_init_eq_table
===================================================
29770     2.5427  tg3          tg3       (no symbols)
29169     2.4914  ib_mthca.ko  ib_mthca  mthca_poll_cq
28105     2.4005  ib_ipoib.ko  ib_ipoib  ipoib_start_xmit
24947     2.1308  vmlinux      vmlinux   ip_queue_xmit
23124     1.9751  vmlinux      vmlinux   kfree

UP Profiling through timer interrupt
========
samples  %        image name   app name  symbol name
412179   35.2048  vmlinux      vmlinux   mwait_idle
172398   14.7248  vmlinux      vmlinux   csum_partial_copy_generic
95974     8.1973  vmlinux      vmlinux   handle_IRQ_event
43462     3.7122  ib_mthca.ko  ib_mthca  mthca_tavor_interrupt
====================================================
30411     2.5974  ib_mthca.ko  ib_mthca  mthca_poll_cq
29770     2.5427  tg3          tg3       (no symbols)
27773     2.3721  ib_ipoib.ko  ib_ipoib  ipoib_start_xmit
24947     2.1308  vmlinux      vmlinux   ip_queue_xmit
23124     1.9751  vmlinux      vmlinux   kfree

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From rdreier at cisco.com Mon May 8 18:19:04 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 08 May 2006 18:19:04 -0700
Subject: [openib-general] profiling data on mthca
In-Reply-To: (Shirley Ma's message of "Mon, 8 May 2006 18:18:53 -0700")
References:
Message-ID:

Shirley> Hello Roland, Below are kernel profiling data I got from
Shirley> my test. Do you know why on SMP, it is
Shirley> mthca_init_eq_table, but on UP it was
Shirley> mthca_tavor_interrupt?

Shirley> Something I did wrong?

Probably something is messed up in how the addresses got converted to a symbolic name. Maybe the SMP numbers used the UP mthca symbol table?

From bugzilla-daemon at openib.org Mon May 8 18:38:06 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Mon, 8 May 2006 18:38:06 -0700 (PDT)
Subject: [openib-general] [Bug 72] New: OFED 1.0: Make IPoIB default configurations sane
Message-ID: <20060509013806.4D3EE2283DD@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=72

Summary: OFED 1.0: Make IPoIB default configurations sane
Product: OpenFabrics Linux
Version: gen2
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: IPoIB
AssignedTo: bugzilla at openib.org
ReportedBy: kball at pathscale.com

Right now if I run the install script, the default IP addresses brought up for ib0 and ib1 are based off of eth0 in a bizarre way; they increment the first octet of the IP address. This will set the IP addresses to some random public IP address. It should really only suggest something from one of the 3 private network address ranges:

10.0.0.0 - 10.255.255.255
172.16.0.0 - 172.31.255.255
or
192.168.0.0 - 192.168.255.255

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From xma at us.ibm.com Mon May 8 18:43:13 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Mon, 8 May 2006 18:43:13 -0700
Subject: [openib-general] profiling data on mthca
In-Reply-To:
Message-ID:

Got the reason. I made a new kernel without turning on oprofile. The weird thing is opcontrol didn't complain about the kernel and generated the results.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From sean.hefty at intel.com Mon May 8 19:49:07 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 8 May 2006 19:49:07 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: Message-ID: >If you wanted to implement this, you would have to use a completion. >A mutex can't be used because it must be released in process context >with interrupts enabled. And a semaphore can't be used because >there's an implicit use-after-free with semaphores (basically up() >touches the semaphore memory after it calls wake_up()). Ah, I was looking around the kernel include files for some sort of signaled event. A completion looks like it's exactly what we want. Would replacing wake_up() with complete() and wait_event() with wait_for_completion() work? - Sean From zhushisongzhu at yahoo.com Mon May 8 19:51:42 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Mon, 8 May 2006 19:51:42 -0700 (PDT) Subject: [openib-general] sdp test tools Message-ID: <20060509025142.83638.qmail@web36903.mail.mud.yahoo.com> I hope to test sdp connection capacity and its performance. Who has the test tools? Or which tool is more suitable? tks zhu shi song __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From sweitzen at cisco.com Mon May 8 19:55:00 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 8 May 2006 19:55:00 -0700 Subject: [openib-general] sdp test tools Message-ID: We use netperf with libsdp.so, for example: $ LD_PRELOAD=libsdp.so netperf ... 

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of zhu shi song
> Sent: Monday, May 08, 2006 7:52 PM
> To: openib-general at openib.org
> Subject: [openib-general] sdp test tools
>
> I hope to test sdp connection capacity and its
> performance. Who has the test tools? Or which tool
> is more suitable?
> tks
> zhu shi song

From rdreier at cisco.com Mon May 8 21:22:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 08 May 2006 21:22:32 -0700
Subject: [openib-general] Re: [PATCH] cm refcount race fix
In-Reply-To: (Sean Hefty's message of "Mon, 8 May 2006 19:49:07 -0700")
References:
Message-ID:

Sean> Ah, I was looking around the kernel include files for some
Sean> sort of signaled event. A completion looks like it's
Sean> exactly what we want. Would replacing wake_up() with
Sean> complete() and wait_event() with wait_for_completion() work?

Yeah, although you have to make sure that even the atomic_dec() in the
destroy function itself becomes

    if (atomic_dec_and_test())
        complete();

or else you'll probably wait forever... And you also have to add a
struct completion to your structure.

- R.
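
The completion-based variant discussed in this sub-thread (atomic refcount, complete() on the last put, wait_for_completion() in the destroy path) can be sketched in userspace as follows. This is an illustration under stated assumptions rather than the CM code: struct cm_obj and its helpers are hypothetical names, and the small struct completion below is a pthread-based stand-in for the kernel primitive of the same name. Note that the destroy path goes through the same put helper, which is the point made above about the destroy function's own decrement.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical refcounted object; not the real cm_id structure. */
struct cm_obj {
	atomic_int refcount;
	struct completion comp;   /* signalled when refcount drops to zero */
};

static void cm_obj_init(struct cm_obj *o)
{
	atomic_init(&o->refcount, 1);   /* initial reference held by the owner */
	init_completion(&o->comp);
}

static void cm_obj_get(struct cm_obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Equivalent of: if (atomic_dec_and_test(&refcount)) complete(&comp); */
static void cm_obj_put(struct cm_obj *o)
{
	/* atomic_fetch_sub returns the old value, so 1 means we hit zero. */
	if (atomic_fetch_sub(&o->refcount, 1) == 1)
		complete(&o->comp);
}

/*
 * Destroy path: dropping the initial reference must also go through
 * cm_obj_put(), otherwise nobody signals the completion when the
 * destroyer happens to hold the last reference.
 */
static void cm_obj_destroy(struct cm_obj *o)
{
	cm_obj_put(o);
	wait_for_completion(&o->comp);
}
```

The get/put fast path here takes no lock at all; only the final put and the destroyer touch the completion's internal lock, which is the trade-off against the plain-integer-under-spinlock approach.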

From or.gerlitz at gmail.com Mon May 8 21:38:27 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Tue, 9 May 2006 07:38:27 +0300
Subject: [openib-general] CMA: port 2 loopback problems
In-Reply-To: <20060508202904.GD25527@mellanox.co.il>
References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il>
Message-ID: <15ddcffd0605082138x366b6c42mf80ab20f3dcb5bf0@mail.gmail.com>

On 5/8/06, Michael S. Tsirkin wrote:
> Quoting r. Sean Hefty :
>>>>Is it possible to communicate between QPs on
>>>>the same device if that device is disconnected from the fabric?
>>>Yes.

Michael, can you educate me a little here: how does this loopback work? Is there a special case within the HCA saying "if QPX.DLID is MY LID then move packets directly from QPX.TX queue to QPY.RX queue" or is it something else?

> > I'm wondering if the correct solution to this issue isn't to create some
> > sort of "loopback" path record. I'm not sure what changes would be
> > required of the IB CM.
> I thought about this too. People actually do expect loopback to work when link
> is down. I guess we could create "loopback" path record, with parameters such
> as SL editable from sysfs.
> And on port event we could try to update it from the SM.

And trivially, if there's no such special-purpose rule as I have described above, this connection is broken when the port is UP, since the SM can change the port LID that was used in the PATH set into the QPs.

So why is it interesting to deal with the IB infrastructure working for non-active ports?

Or.

From rdreier at cisco.com Mon May 8 21:43:01 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 08 May 2006 21:43:01 -0700
Subject: [openib-general] CMA: port 2 loopback problems
In-Reply-To: <15ddcffd0605082138x366b6c42mf80ab20f3dcb5bf0@mail.gmail.com> (Or Gerlitz's message of "Tue, 9 May 2006 07:38:27 +0300")
References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <15ddcffd0605082138x366b6c42mf80ab20f3dcb5bf0@mail.gmail.com>
Message-ID:

Or> Michael, can you educate me a little here, how does this
Or> loopback works? is there a special case within the HCA saying
Or> "if QPX.DLID is MY LID then move packets directly from QPX.TX
Or> queue to QPY.RX queue" or is it something else?

Yes, see the compliance statement C17-18 on page 1028 of the IB spec:

C17-18: HCAs shall allow packets with a destination address the same
as that of the port on which the packet is issued. Such a loopback
packet shall not go onto the wire.

- R.

From or.gerlitz at gmail.com Mon May 8 21:43:21 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Tue, 9 May 2006 07:43:21 +0300
Subject: [openib-general] Re: CMA disconnect
In-Reply-To: <445F78A7.2020401@ichips.intel.com>
References: <445F78A7.2020401@ichips.intel.com>
Message-ID: <15ddcffd0605082143t26f1f049w9043d017678d2d90@mail.gmail.com>

On 5/8/06, Sean Hefty wrote:
> Or Gerlitz wrote:
> > Looking in the code i have realized that it is a must for the CMA
> > consumer to call rdma_disconnect to have the QP state moved into ERROR.
> Maybe it would make sense for the CMA to transition the QP to the error state
> before destroying it?

It makes sense to do it as a strict/cleanup policy, but **only** if the user did not cause the CMA to move the QP to ERROR **before** by calling rdma_disconnect.
For example in iSER code we wait to get both the DISCONNECTED event and completions/FLUSHES on all the posted RX/TX before calling rdma_destroy_id, so deffering the QP state change to ERROR will deadlock our design. Or. From or.gerlitz at gmail.com Mon May 8 21:49:18 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Tue, 9 May 2006 07:49:18 +0300 Subject: [openib-general] CMA: port 2 loopback problems In-Reply-To: References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <15ddcffd0605082138x366b6c42mf80ab20f3dcb5bf0@mail.gmail.com> Message-ID: <15ddcffd0605082149y2775701ap41fd5cceb21d10@mail.gmail.com> On 5/9/06, Roland Dreier wrote: > Or> Michael, can you educate me a little here, how does this > Or> loopback works? is there a special case within the HCA saying > Or> "if QPX.DLID is MY LID then move packets directly from QPX.TX > Or> queue to QPY.RX queue" or is it something else? > > Yes, see the compliance statement C17-18 on page 1028 of the IB spec: > > C17-18: HCAs shall allow packets with a destination address the same > as that of the port on which the packet is issued. Such a loopback > packet shall not go onto the wire. thanks! you have answered to me on a three years old question and its RTFM ... the IB "M" is size is sort of "F" so... anyway, what DLID should be set in the path of such QPs? also per your understanding (and experience...) what is happening when the port gets active, does it influence loopback QPs or they just keep working as they did before? Or. 
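As a minimal illustration of the C17-18 rule quoted above, the loopback decision can be sketched as an address comparison. This is only a sketch under stated assumptions: `dlid_is_local` is a hypothetical helper, not code from any HCA driver or firmware, and the LMC handling (a port answering to 2^LMC consecutive LIDs starting at its base LID) is an assumption added here that the thread itself does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from any driver): decide whether a packet
 * sent to 'dlid' from a port with base LID 'base_lid' and LID mask control
 * 'lmc' would be a loopback packet in the sense of C17-18.
 *
 * Assumption: the port owns the 2^lmc LIDs that share the upper
 * (16 - lmc) bits with its base LID.
 */
bool dlid_is_local(uint16_t dlid, uint16_t base_lid, uint8_t lmc)
{
    uint16_t mask;

    assert(lmc <= 7);                       /* LMC is a 3-bit field */
    mask = (uint16_t)~((1u << lmc) - 1u);   /* clear the low 'lmc' bits */
    return (uint16_t)(dlid & mask) == (uint16_t)(base_lid & mask);
}
```

With `lmc` equal to 0 this reduces to `dlid == base_lid`. It also shows why a LID change matters for an existing loopback connection: once the port's LID changes, a DLID that was programmed into the QP earlier no longer compares as local.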
From bos at pathscale.com Mon May 8 21:51:08 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 08 May 2006 21:51:08 -0700 Subject: [openib-general] Re: Need OpenIB bugzilla component for RDS In-Reply-To: <96f8e60e0605071007u41ba9789g6f52b08e5be0fcd8@mail.gmail.com> References: <1146893261.1045.18.camel@localhost.localdomain> <96f8e60e0605071007u41ba9789g6f52b08e5be0fcd8@mail.gmail.com> Message-ID: <1147150268.5987.0.camel@localhost.localdomain> On Sun, 2006-05-07 at 10:07 -0700, Ranjit Pandit wrote: > Please mark me as the default owner of RDS bugs. OK, you're in. Message-ID: The optimization of slow path saves more (executing) instructions than the number of extra instructions needed to achieve it :-). Also maintains consistency with other routines and code standard where a match implies "break". - KK Roland Dreier wrote on 05/04/2006 10:36:26 PM: > I think this is a valid change but on the other hand I don't see much > motivation to apply it. It slightly optimizes a slow path at the cost > of slightly enlarging the code, which doesn't seem like a good > tradeoff to me. Am I off base? > > - R. From rdreier at cisco.com Mon May 8 21:53:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 08 May 2006 21:53:24 -0700 Subject: [openib-general] CMA: port 2 loopback problems In-Reply-To: <15ddcffd0605082149y2775701ap41fd5cceb21d10@mail.gmail.com> (Or Gerlitz's message of "Tue, 9 May 2006 07:49:18 +0300") References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <15ddcffd0605082138x366b6c42mf80ab20f3dcb5bf0@mail.gmail.com> <15ddcffd0605082149y2775701ap41fd5cceb21d10@mail.gmail.com> Message-ID: Or> thanks! you have answered to me on a three years old question Or> and its RTFM ... the IB "M" is size is sort of "F" Or> so... anyway, what DLID should be set in the path of such QPs? 
Or> also per your understanding (and experience...) what is Or> happening when the port gets active, does it influence Or> loopback QPs or they just keep working as they did before? The DLID must be the LID of the port. If the LID changes when the port becomes active then packets sent to the old LID will start going into the fabric and either disappear or go to some other port. There was some discussion on openib-general a year or two ago about picking a pseudo-random LID for local ports when initializing HCAs, to allow loopback connections to stay alive in most cases (collisions are relatively unlikely and the SM should respect existing LIDs...). - R. From or.gerlitz at gmail.com Mon May 8 21:56:43 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Tue, 9 May 2006 07:56:43 +0300 Subject: [openib-general] Re: CMA: compliancy issue? In-Reply-To: <445FAAE4.4050600@ichips.intel.com> References: <20060508131156.GA21036@mellanox.co.il> <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> <445F848C.2000502@ichips.intel.com> <20060508194705.GA25527@mellanox.co.il> <445FA246.5090106@ichips.intel.com> <20060508202238.GC25527@mellanox.co.il> <445FAAE4.4050600@ichips.intel.com> Message-ID: <15ddcffd0605082156r4b9f6e94raf51d31d6664848e@mail.gmail.com> On 5/8/06, Sean Hefty wrote: > I think that it makes more sense to give the user the CONNECT_RESPONSE event, > and have them call rdma_accept() or rdma_reject(), versus allowing them to call > rdma_reject() after an ESTABLISHED event. There is already code for this; it's > just tied to the existence of a QP off the rdma_cm_id. 
I agree that it does not make sense to have REJ following ESTABLISHED, so it should either stay as it is now, or the CMA should be changed to deliver RESP, have the ULP call accept or reject, and send RTU or REJ based on that call, such that on the active side there is always one event of the set {REJ, RESPONSE, CONNECT_ERROR} and ESTABLISHED is delivered only on the passive side. From iSER point of view, this approach is fine, and it would allow for some future flexibility to reject the REP. We prefer to implement it only for 2.6.19, that is when 2.6.18-rc1 is out. Or. From or.gerlitz at gmail.com Mon May 8 22:03:00 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Tue, 9 May 2006 08:03:00 +0300 Subject: [openib-general] CMA: port 2 loopback problems In-Reply-To: References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <15ddcffd0605082138x366b6c42mf80ab20f3dcb5bf0@mail.gmail.com> <15ddcffd0605082149y2775701ap41fd5cceb21d10@mail.gmail.com> Message-ID: <15ddcffd0605082203k625ab6cayf0ae9b5f0af1d8e@mail.gmail.com> On 5/9/06, Roland Dreier wrote: > The DLID must be the LID of the port. > If the LID changes when the port becomes active then packets sent to > the old LID will start going into the fabric and either disappear or > go to some other port. OK, thanks, this is what I was thinking. > There was some discussion on openib-general a year or two ago about > picking a pseudo-random LID for local ports when initializing HCAs, to > allow loopback connections to stay alive in most cases (collisions are > relatively unlikely and the SM should respect existing LIDs...). Sure, I remember that. I think such an approach should be instantly **rejected** on the spot by the openib community. Among other reasons, IB is basically not meant for that.
The SM should set the LIDs of the fabric and its easy to come up with senarios which would not be supported under this approach, eg LMC > 0 would cause the SM to change the random LID, systematic derivation of the LID eg from the IP is problematic as only 48K of the 64K LIDs are legal for unicast,etc,etc you named it. Or. From mst at mellanox.co.il Mon May 8 22:28:33 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 08:28:33 +0300 Subject: [openib-general] Re: ip over ib throughtput In-Reply-To: <445FCBA1.1030903@ncsa.uiuc.edu> References: <20041229134351.GA3486@mellanox.co.il> <445FCBA1.1030903@ncsa.uiuc.edu> Message-ID: <20060509052833.GC17814@mellanox.co.il> Quoting r. Hassan M. Jafri : > Subject: Re: ip over ib throughtput > > I cant crank out more than 150 MB/sec with my 2.0 GHz xeons. verbs level > benchmarks, however give decent numbers for bandwidth. With netperf, the > server side CPU usage is 99% which is much higher than other posted > bandwidth results on this thread. Any suggestions? > > Here is the complete configuration for my bandwidth tests > > Kernel-2.6.15.4 > netperf-2.3-3 > OpenIB rev 6552 > MTLP23108-CF128 > Firmware 3.4.0 > MSI-X is enabled for the HCA Then something in TCP/IP stack configuration is consuming extra CPU cycles. Could be e.g. you have some kind of packet filtering installed? -- MST From mst at mellanox.co.il Mon May 8 22:32:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 08:32:01 +0300 Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4 In-Reply-To: References: Message-ID: <20060509053201.GD17814@mellanox.co.il> Quoting r. Don.Albert at Bull.com : > When I had the problem previously, Roland Drier suggested trying to load the ib_mthca module with "fw_cmd_doorbell=0", which did avoid the error then, and in fact does on this new problem. But the question is why? 
Updating the firmware on the old board seemed to have solved the problem before, but now it has occurred again on a fairly new card with recent firmware. Has anyone else seen this problem? Which FW revision do you have? -- MST From mst at mellanox.co.il Mon May 8 22:36:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 08:36:08 +0300 Subject: [openib-general] Re: Re: [PATCH] cm refcount race fix In-Reply-To: References: Message-ID: <20060509053607.GE17814@mellanox.co.il> Quoting r. Sean Hefty : > Ah, I was looking around the kernel include files for some sort of signaled > event. A completion looks like it's exactly what we want. Would replacing > wake_up() with complete() and wait_event() with wait_for_completion() work? Yea but notice wait_for_completion does not have a condition, so you have to if (!atomic_dec_and_test()) wait_for_completion() and don't forget you must initialize the condition before each use. -- MST From sweitzen at cisco.com Mon May 8 22:56:27 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Mon, 8 May 2006 22:56:27 -0700 Subject: [openib-general] Re: Need OpenIB bugzilla component for RDS Message-ID: I'm not seeing an RDS component yet... Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Bryan O'Sullivan [mailto:bos at pathscale.com] > Sent: Monday, May 08, 2006 9:51 PM > To: Ranjit Pandit > Cc: Scott Weitzenkamp (sweitzen); openib-general > Subject: Re: [openib-general] Re: Need OpenIB bugzilla > component for RDS > > On Sun, 2006-05-07 at 10:07 -0700, Ranjit Pandit wrote: > > > Please mark me as the default owner of RDS bugs. > > OK, you're in. 
> > From leonida at voltaire.com Mon May 8 23:09:58 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Tue, 9 May 2006 09:09:58 +0300 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space Message-ID: <20060509060958.GA482@voltaire.com> Roland, I'm reposting the Client Reregister event support patch for the kernel space. The patch defines the event and implements it in MTHCA. The event is implemented in software, as Michael proposed. It also moves the port_info structure definition from ipath_mad.c to ib_smi.h. Regards, Leonid Signed-off-by: Leonid Arsh Index: linux-kernel/infiniband/include/rdma/ib_verbs.h =================================================================== --- linux-kernel/infiniband/include/rdma/ib_verbs.h (revision 6969) +++ linux-kernel/infiniband/include/rdma/ib_verbs.h (working copy) @@ -283,7 +283,8 @@ IB_EVENT_SM_CHANGE, IB_EVENT_SRQ_ERR, IB_EVENT_SRQ_LIMIT_REACHED, - IB_EVENT_QP_LAST_WQE_REACHED + IB_EVENT_QP_LAST_WQE_REACHED, + IB_EVENT_CLIENT_REREGISTER }; struct ib_event { Index: linux-kernel/infiniband/include/rdma/ib_smi.h =================================================================== --- linux-kernel/infiniband/include/rdma/ib_smi.h (revision 6969) +++ linux-kernel/infiniband/include/rdma/ib_smi.h (working copy) @@ -91,4 +91,40 @@ return ((smp->status & IB_SMP_DIRECTION) == IB_SMP_DIRECTION); } +struct port_info { + __be64 mkey; + __be64 gid_prefix; + __be16 lid; + __be16 sm_lid; + __be32 cap_mask; + __be16 diag_code; + __be16 mkey_lease_period; + u8 local_port_num; + u8 link_width_enabled; + u8 link_width_supported; + u8 link_width_active; + u8 linkspeed_portstate; /* 4 bits, 4 bits */ + u8 portphysstate_linkdown; /* 4 bits, 4 bits */ + u8 mkeyprot_resv_lmc; /* 2 bits, 3 bits, 3 bits */ + u8 linkspeedactive_enabled; /* 4 bits, 4 bits */ + u8 neighbormtu_mastersmsl; /* 4 bits, 4 bits */ + u8 vlcap_inittype; /* 4 bits, 4 bits */ + u8 vl_high_limit; + u8 vl_arb_high_cap; + u8 vl_arb_low_cap; 
+ u8 inittypereply_mtucap; /* 4 bits, 4 bits */ + u8 vlstallcnt_hoqlife; /* 3 bits, 5 bits */ + u8 operationalvl_pei_peo_fpi_fpo; /* 4 bits, 1, 1, 1, 1 */ + __be16 mkey_violations; + __be16 pkey_violations; + __be16 qkey_violations; + u8 guid_cap; + u8 clientrereg_resv_subnetto; /* 1 bit, 2 bits, 5 bits */ + u8 resv_resptimevalue; /* 3 bits, 5 bits */ + u8 localphyerrors_overrunerrors; /* 4 bits, 4 bits */ + __be16 max_credit_hint; + u8 resv; + u8 link_roundtrip_latency[3]; +} __attribute__ ((packed)); + #endif /* IB_SMI_H */ Index: linux-kernel/infiniband/hw/ipath/ipath_mad.c =================================================================== --- linux-kernel/infiniband/hw/ipath/ipath_mad.c (revision 6969) +++ linux-kernel/infiniband/hw/ipath/ipath_mad.c (working copy) @@ -137,42 +137,6 @@ return reply(smp); } -struct port_info { - __be64 mkey; - __be64 gid_prefix; - __be16 lid; - __be16 sm_lid; - __be32 cap_mask; - __be16 diag_code; - __be16 mkey_lease_period; - u8 local_port_num; - u8 link_width_enabled; - u8 link_width_supported; - u8 link_width_active; - u8 linkspeed_portstate; /* 4 bits, 4 bits */ - u8 portphysstate_linkdown; /* 4 bits, 4 bits */ - u8 mkeyprot_resv_lmc; /* 2 bits, 3, 3 */ - u8 linkspeedactive_enabled; /* 4 bits, 4 bits */ - u8 neighbormtu_mastersmsl; /* 4 bits, 4 bits */ - u8 vlcap_inittype; /* 4 bits, 4 bits */ - u8 vl_high_limit; - u8 vl_arb_high_cap; - u8 vl_arb_low_cap; - u8 inittypereply_mtucap; /* 4 bits, 4 bits */ - u8 vlstallcnt_hoqlife; /* 3 bits, 5 bits */ - u8 operationalvl_pei_peo_fpi_fpo; /* 4 bits, 1, 1, 1, 1 */ - __be16 mkey_violations; - __be16 pkey_violations; - __be16 qkey_violations; - u8 guid_cap; - u8 clientrereg_resv_subnetto; /* 1 bit, 2 bits, 5 */ - u8 resv_resptimevalue; /* 3 bits, 5 bits */ - u8 localphyerrors_overrunerrors; /* 4 bits, 4 bits */ - __be16 max_credit_hint; - u8 resv; - u8 link_roundtrip_latency[3]; -} __attribute__ ((packed)); - static int recv_subn_get_portinfo(struct ib_smp *smp, struct ib_device 
*ibdev, u8 port) { Index: linux-kernel/infiniband/hw/mthca/mthca_mad.c =================================================================== --- linux-kernel/infiniband/hw/mthca/mthca_mad.c (revision 6969) +++ linux-kernel/infiniband/hw/mthca/mthca_mad.c (working copy) @@ -105,20 +105,28 @@ u8 port_num, struct ib_mad *mad) { - struct ib_event event; + struct ib_event event; + struct port_info *pinfo; if ((mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && mad->mad_hdr.method == IB_MGMT_METHOD_SET) { if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { + pinfo = (struct port_info *)((struct ib_smp *)mad)->data; + mthca_update_rate(to_mdev(ibdev), port_num); update_sm_ah(to_mdev(ibdev), port_num, - be16_to_cpup((__be16 *) (mad->data + 58)), - (*(u8 *) (mad->data + 76)) & 0xf); + be16_to_cpup(&pinfo->lid), + pinfo->neighbormtu_mastersmsl & 0xf); event.device = ibdev; - event.event = IB_EVENT_LID_CHANGE; event.element.port_num = port_num; + + if(pinfo->clientrereg_resv_subnetto & 0x80) + event.event = IB_EVENT_CLIENT_REREGISTER; + else + event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); } @@ -128,6 +136,7 @@ event.element.port_num = port_num; ib_dispatch_event(&event); } + } } From leonida at voltaire.com Mon May 8 23:22:38 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Tue, 9 May 2006 09:22:38 +0300 Subject: [openib-general][PATCH 2 of 3] repost: Client Reregister support for user space Message-ID: <20060509062238.GA721@voltaire.com> This is the second patch for the Client Reregister event support. The patch defines the event for the user space verbs. 
Signed-off-by: Leonid Arsh Index: userspace/libibverbs/include/infiniband/verbs.h =================================================================== --- userspace/libibverbs/include/infiniband/verbs.h (revision 8165) +++ userspace/libibverbs/include/infiniband/verbs.h (working copy) @@ -190,7 +190,8 @@ IBV_EVENT_SM_CHANGE, IBV_EVENT_SRQ_ERR, IBV_EVENT_SRQ_LIMIT_REACHED, - IBV_EVENT_QP_LAST_WQE_REACHED + IBV_EVENT_QP_LAST_WQE_REACHED, + IBV_EVENT_CLIENT_REREGISTER }; struct ibv_async_event { From tziporet at mellanox.co.il Mon May 8 23:35:21 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 9 May 2006 09:35:21 +0300 Subject: [openib-general] OFED-1.0-rc4 is available Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F06@mtlexch01.mtl.com> Thanks OSU MPI is also fixed now and will be available on PPC in rc5 -----Original Message----- From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] Sent: Tuesday, May 09, 2006 1:37 AM To: Tziporet Koren; openfabrics-ewg at openib.org Cc: openib-general Subject: RE: [openib-general] OFED-1.0-rc4 is available Open MPI compiles on PPC64, at least on RHEL4 it does. 3. MPI OSU and Open MPI compilation fails on PPC64 -------------- next part -------------- An HTML attachment was scrubbed... URL: From leonida at voltaire.com Mon May 8 23:28:46 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Tue, 9 May 2006 09:28:46 +0300 Subject: [openib-general][PATCH 3 of 3] repost: Client Reregister support for IPoIB Message-ID: <20060509062846.GA784@voltaire.com> This is the third patch for the Client Reregister event support. The patch implements the event usage in IPoIB. 
Signed-off-by: Leonid Arsh Index: linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c =================================================================== --- linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c (revision 6969) +++ linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c (working copy) @@ -255,7 +255,9 @@ record->event == IB_EVENT_PKEY_CHANGE || record->event == IB_EVENT_PORT_ACTIVE || record->event == IB_EVENT_LID_CHANGE || - record->event == IB_EVENT_SM_CHANGE) { + record->event == IB_EVENT_SM_CHANGE || + record->event == IB_EVENT_CLIENT_REREGISTER + ) { ipoib_dbg(priv, "Port state change event\n"); queue_work(ipoib_workqueue, &priv->flush_task); } From zhushisongzhu at yahoo.com Tue May 9 01:11:42 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Tue, 9 May 2006 01:11:42 -0700 (PDT) Subject: [openib-general] sdp kernel panic Message-ID: <20060509081142.59860.qmail@web36908.mail.mud.yahoo.com> OS: Redhat AMD64/EMT64 64-bit Enterprise Linux v3 update 4 IB: IBG2.0.1 downloaded from mellanox website HCA: MT 25204 APP: squid-2.5-stable13, ab(apache benchmark distributed with httpd 2.2.0) I use two HCA cards to connect two computers directly. on Machine B run command: LD_PRELOAD=libsdp.so squid -d 10 -f squid2.conf on Machine A run command: LD_PRELOAD=libsdp.so ab -c 2000 -n 2000 -X 193.12.10.14:3129 Machine A paniced. The last two lines show: Code: 48 8b 6b 10 49 89 dd 4c 8b 75 20 48 8b 43 40 8b 93 c0 00 00 RIP { :ib_mad:ib_mad_send_done_handler+20} RSP <00001003ccadd78> CR2: 0000000000000010 <0> Kernel Panic - not syncing: OOps who has tested the stability and performance of sdp when supporting large mount of concurrent connections? It seems sdp connection building is very slow. tks zhu shi song __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
From zhushisongzhu at yahoo.com Tue May 9 01:13:57 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Tue, 9 May 2006 01:13:57 -0700 (PDT) Subject: [openib-general] sdp kernel panic Message-ID: <20060509081357.59470.qmail@web36907.mail.mud.yahoo.com> OS: Redhat AMD64/EMT64 64-bit Enterprise Linux v3 update 4 IB: IBG2.0.1 downloaded from mellanox website HCA: MT 25204 APP: squid-2.5-stable13, ab(apache benchmark distributed with httpd 2.2.0) I use two HCA cards to connect two computers directly. on Machine B run command: LD_PRELOAD=libsdp.so squid -d 10 -f squid2.conf on Machine A run command: LD_PRELOAD=libsdp.so ab -c 2000 -n 2000 -X 193.12.10.14:3129 http://www.google.com/index.html Machine A paniced. The last two lines show: Code: 48 8b 6b 10 49 89 dd 4c 8b 75 20 48 8b 43 40 8b 93 c0 00 00 RIP { :ib_mad:ib_mad_send_done_handler+20} RSP <00001003ccadd78> CR2: 0000000000000010 <0> Kernel Panic - not syncing: OOps who has tested the stability and performance of sdp when supporting large mount of concurrent connections? It seems sdp connection building is very slow. tks zhu shi song From mst at mellanox.co.il Tue May 9 01:27:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 11:27:56 +0300 Subject: [openib-general] Re: Re: CMA: compliancy issue?
In-Reply-To: <15ddcffd0605082156r4b9f6e94raf51d31d6664848e@mail.gmail.com> References: <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> <445F848C.2000502@ichips.intel.com> <20060508194705.GA25527@mellanox.co.il> <445FA246.5090106@ichips.intel.com> <20060508202238.GC25527@mellanox.co.il> <445FAAE4.4050600@ichips.intel.com> <15ddcffd0605082156r4b9f6e94raf51d31d6664848e@mail.gmail.com> Message-ID: <20060509082756.GY21036@mellanox.co.il> Quoting r. Or Gerlitz : > >From iSER point of view, this approach is fine, and it would allow for > some future flexibility to reject the REP. We prefer to implement it > only for 2.6.19, that is when 2.6.18-rc1 is out. Let us start by implementing this in SVN trunk. Sean, if you agree too, can you do this? -- MST From jackm at mellanox.co.il Tue May 9 02:23:39 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Tue, 9 May 2006 12:23:39 +0300 Subject: [openib-general] [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: <445FD84A.2050102@ichips.intel.com> References: <200605071157.55936.jackm@mellanox.co.il> <445FD84A.2050102@ichips.intel.com> Message-ID: <200605091223.39564.jackm@mellanox.co.il> On Tuesday 09 May 2006 02:46, you wrote: > The end-users should probably decide that. These changes use the local_sa, > which has not been queued to be merged upstream yet. Use of local_sa in the rdma_cm kernel module is already patched out for OFED. local_sa is used ONLY in kernel cma.c ( static function cma_resolve_ib_route(), which calls ib_get_path_rec()). Icall to ib_get_path_rec() is eliminated, and we call cma_query_ib_route() directly. Sounds like I should take the current rdma_cm from the trunk for OFED (rather than leave rdma_cm at trunk revision number 6860 in OFED) -- and also take the new userlevel sources as well. 
What do you think -- is there anything else (besides local_sa) that might interfere with doing this? - Jack From zhushisongzhu at yahoo.com Tue May 9 03:03:37 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Tue, 9 May 2006 03:03:37 -0700 (PDT) Subject: [openib-general] sdp can't support many connections (>2000) Message-ID: <20060509100337.24254.qmail@web36913.mail.mud.yahoo.com> after I have upgraded firmware to MT25204 1.0.8 from 1.0.1, sdp doesn't cause the kernel panic again. But after executing LD_PRELOAD=libsdp.so ab -c 1000 -n 1000 -X 193.12.10.14:3129 http://www.google.com about three times continually, I can't connect Machine B from Machine A. I must reboot the two machines to restore sdp to work. I think sdp can't support many connections. Who knows what the problem exactly is ? How to solve the problem? tks zhu shi song From eitan at mellanox.co.il Tue May 9 04:20:21 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 9 May 2006 14:20:21 +0300 Subject: [openib-general] RE: [PATCH 0/2] opensm: low-level QoS implementation Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB33@mtlexch01.mtl.com> Hi Sasha, Hal, It is great that you work on QoS implementation. In general I see this simple extension of the SM capabilities as very useful one. But I think it would have been better if you first send out the RFC for the proposed functionality and only later implement it (as was done on the partition manager case). I have extracted the "description section" from the patch and here are my comments (prefixed [EZ]) to it: osm/doc/qos-config.txt: Trivial low level QoS configuration proposition.
===============================================

Basically we have a set of QoS-related low-level configuration parameters.

[EZ] I expected QoS parameters to be stored in a QoS Policy file, as was done in the partition case. The main reason for that is that I believe a simple set of parameters is going to be an oversimplification of the required functionality.

All those parameter names are prefixed by the "qos_" string. Here is the full list of such parameters:

qos_max_vls - The maximum number of VLs that will be on the subnet.
qos_high_limit - The limit of the High Priority component of the VL Arbitration table (IBA 7.6.9).
qos_vlarb_low - Low priority VL Arbitration table (IBA 7.6.9) template.
qos_vlarb_high - High priority VL Arbitration table (IBA 7.6.9) template. Both VL arbitration templates are lists of VL:weight pairs.
qos_sl2vl - SL2VL Mapping table (IBA 7.6.6) template. It is a list of VLs corresponding to SLs 0-15. (Note that VL15 used here means "drop this SL".)

Typical default values (hard-coded in OpenSM initialization) are:

qos_max_vls=15
qos_high_limit=0
qos_vlarb_low=0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0
qos_vlarb_high=0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4
qos_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7

The syntax is compatible with the rest of the OpenSM configuration options, and values may be stored in the OpenSM config file (cached options file).

[EZ] The above set of parameters is fine for "default" QoS support. I better understand the scope of this proposal now.

[EZ] Please note that an algorithm to validate the applicability of the above on the particular fabric is still required, as not all devices support the 16 VLs and not all devices must support a VLArb of 8 entries. In such cases we should at least provide an error describing why the provided setting is unrealizable.

[EZ] The default SL2VL map and VLArb tables are not consistent: the VLArb tables do not provide any entry for VL > 7, so the SLs >= 8 are not usable.
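
The consistency problem described in the last comment can be checked mechanically. The following is a minimal sketch, not OpenSM code: `qos_check` and its helpers are hypothetical names, the parsing covers only the `vl:weight,...` and `vl,vl,...` string forms shown above, and it assumes that a VL is served only if it has a nonzero weight in at least one of the two VLArb tables.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define QOS_NUM_SLS 16
#define QOS_DROP_VL 15  /* per qos-config.txt: VL15 in the SL2VL map means "drop this SL" */

/* Step past one list separator; returns NULL on an unexpected character. */
static const char *qos_next_item(const char *end)
{
    if (*end == ',')
        return end + 1;
    if (*end == '\0')
        return end;
    return NULL;
}

/* Parse "vl:weight,vl:weight,..." and set a bit in *vl_mask for every
 * data VL (0-14) that appears with a nonzero weight. Returns 0 or -1. */
static int qos_parse_vlarb(const char *s, uint16_t *vl_mask)
{
    while (s && *s) {
        char *end;
        unsigned long vl = strtoul(s, &end, 10);
        if (end == s || *end != ':' || vl > 14)
            return -1;
        s = end + 1;
        unsigned long weight = strtoul(s, &end, 10);
        if (end == s || weight > 255)
            return -1;
        if (weight != 0)
            *vl_mask |= (uint16_t)(1u << vl);
        s = qos_next_item(end);
    }
    return s ? 0 : -1;
}

/* Parse "vl,vl,..." (one VL per SL, starting at SL 0).
 * Returns the number of SLs parsed, or -1 on a malformed string. */
static int qos_parse_sl2vl(const char *s, uint8_t sl2vl[QOS_NUM_SLS])
{
    int n = 0;
    while (s && *s) {
        char *end;
        unsigned long vl = strtoul(s, &end, 10);
        if (end == s || vl > 15 || n >= QOS_NUM_SLS)
            return -1;
        sl2vl[n++] = (uint8_t)vl;
        s = qos_next_item(end);
    }
    return s ? n : -1;
}

/* Hypothetical check: on success returns 0 and sets bit N of *bad_sls for
 * every SL N that maps to a VL with no nonzero VLArb weight. SLs mapped to
 * VL15 are treated as intentionally dropped and are not reported. */
int qos_check(const char *sl2vl_str, const char *vlarb_low,
              const char *vlarb_high, uint16_t *bad_sls)
{
    uint8_t sl2vl[QOS_NUM_SLS];
    uint16_t served_vls = 0;
    int n, sl;

    assert(bad_sls != NULL);
    *bad_sls = 0;
    n = qos_parse_sl2vl(sl2vl_str, sl2vl);
    if (n < 0 || qos_parse_vlarb(vlarb_low, &served_vls) ||
        qos_parse_vlarb(vlarb_high, &served_vls))
        return -1;
    for (sl = 0; sl < n; sl++) {
        if (sl2vl[sl] == QOS_DROP_VL)
            continue;
        if (!(served_vls & (1u << sl2vl[sl])))
            *bad_sls |= (uint16_t)(1u << sl);
    }
    return 0;
}
```

Run against the default strings listed above, this sketch reports SLs 8 to 14 as unserved. SL 15 is not reported because the default map sends it to VL7, which does have an arbitration entry. A real validation would additionally have to compare the tables against each port's VLCap and VLArb table sizes, which this sketch does not attempt.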
In addition to above we may to define separate QoS configuration parameters sets for various target types. As targets we currently support HCA, routers, switch external ports and switch's enhanced port 0. The names of such specialized parameters are prefixed by "qos__" string. There is full list of currently supported sets: qos_hca_ - QoS configuration parameters set for HCAs. qos_rtr_ - parameters set for routers. qos_sw0_ - parameters set for switches' port 0. qos_swe_ - parameters set for switches' external ports. [EZ] I do not see how the above could be used. Instead I do see groups of nodes as being assigned different QoS levels. As we defined "groups of nodes" in the partition policy I would propose using the partitions as the means to define node groups. [EZ] So I propose to keep the "trivial" implementation without this level of control. Instead I would prefer having QoS Policy file defined such that these groups can be referred to. Examples: qos_sw0_max_vls=2 qos_hca_sl2vl=0,1,2,3,5,5,5,12,12,0, qos_swe_high_limit=0 [EZ] Another concept that is not represented in this proposal is the support of selecting QoS level for particular PathRecord queries. (or how does the ULP or Application obtain the SL). But I guess this falls under the second phase of the QoS support. I will follow-up with proposal for OSM QoS policy file syntax and functionality RFC for the next implementation steps. Thanks Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Monday, May 08, 2006 11:00 PM > To: Hal Rosenstock; openib-general at openib.org > Cc: Eitan Zahavi; Yael Kalka; Ofer Gigi; Eli Dorfman > Subject: [PATCH 0/2] opensm: low-level QoS implementation > > Hello, > > There is support for low level Quality of Service (QoS) parameters > configuration and setup in OpenSM. > > Please comment. Thanks. 
> > Sasha. From mst at mellanox.co.il Tue May 9 04:46:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 14:46:06 +0300 Subject: [openib-general] Re: sdp can't support many connections (>2000) In-Reply-To: <20060509100337.24254.qmail@web36913.mail.mud.yahoo.com> References: <20060509100337.24254.qmail@web36913.mail.mud.yahoo.com> Message-ID: <20060509114605.GB21036@mellanox.co.il> Quoting r. zhu shi song : > Subject: sdp can't support many connections (>2000) > > after I have upgraded firmware to MT25204 1.0.8 > from 1.0.1, sdp doesn't cause the kernel panic again. > But after executing LD_PRELOAD=libsdp.so ab -c 1000 -n > 1000 -X 193.12.10.14:3129 http://www.google.com about > three times continually, I can't connect Machine B > from Machine A. I must reboot the two machines to > restore sdp to work. > I think sdp can't support many connections. Who > knows what the problem exactly is ? How to solve the > problem? > > tks > zhu shi song I expect http://www.google.com isn't reacheable by SDP, is it? IIRC the SDP version that you use had problems in socket cleanup for cases where path resolution was timing out. -- MST From zhushisongzhu at yahoo.com Tue May 9 05:00:02 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Tue, 9 May 2006 05:00:02 -0700 (PDT) Subject: [openib-general] Re: sdp can't support many connections (>2000) In-Reply-To: <20060509114605.GB21036@mellanox.co.il> Message-ID: <20060509120002.85548.qmail@web36909.mail.mud.yahoo.com> ab send the request to squid cache server running on Machine B. Then squid send the real request to google website. So how can I upgrade my version to solve the problem? zhu --- "Michael S. Tsirkin" wrote: > Quoting r. zhu shi song : > > Subject: sdp can't support many connections > (>2000) > > > > after I have upgraded firmware to MT25204 > 1.0.8 > > from 1.0.1, sdp doesn't cause the kernel panic > again. 
> > But after executing LD_PRELOAD=libsdp.so ab -c > 1000 -n > > 1000 -X 193.12.10.14:3129 http://www.google.com > about > > three times continually, I can't connect Machine B > > from Machine A. I must reboot the two machines to > > restore sdp to work. > > I think sdp can't support many connections. > Who > > knows what the problem exactly is ? How to solve > the > > problem? > > > > tks > > zhu shi song > > I expect http://www.google.com isn't reacheable by > SDP, is it? > IIRC the SDP version that you use had problems in > socket cleanup > for cases where path resolution was timing out. > > -- > MST > From leonida at voltaire.com Tue May 9 05:06:13 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Tue, 9 May 2006 15:06:13 +0300 Subject: [openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters Message-ID: <20060509120613.GA3294@voltaire.com> Hello, we need a possibility to reset the port counters in /sys/class/infiniband/mthca0/ports/1/counters/. The attached patch implements the ability to reset the counters by writing to the sysfs counter files. The patch uses the same process_mad() mechanism as in the counter reading, but with IB_MGMT_METHOD_SET. The patch was checked on IBED-1.0-rc3 code, with an MTHCA adapter. I also checked the possibility of setting specific counter values, but always got a counter reset instead. The questions are: Is it a protocol or a firmware limitation that I couldn't set specific values? If there is a way to set specific values, should we implement it? Should we implement an ability to reset (or set) specific counters, just like I did in this patch?
(this can cause inconsistency between counter values, for example between port_xmit_data and port_xmit_packets) Should we create an additional sysfs entry for the counter reset purpose (like /sys/class/infiniband/mthca0/ports/1/reset_counters) instead? Regards, Leonid Signed-off-by: Leonid Arsh --- linux-kernel/infiniband/core/sysfs.c.orig 2006-05-07 23:07:10.000000000 +0300 +++ linux-kernel/infiniband/core/sysfs.c 2006-05-09 17:55:55.000000000 +0300 @@ -88,8 +88,24 @@ return port_attr->show(p, port_attr, buf); } +static ssize_t port_attr_store(struct kobject *kobj, + struct attribute *attr, const char *buf, size_t len) +{ + struct port_attribute *port_attr = + container_of(attr, struct port_attribute, attr); + struct ib_port *p = container_of(kobj, struct ib_port, kobj); + + if (!port_attr->store) + return -EIO; + if (!ibdev_is_alive(p->ibdev)) + return -ENODEV; + + return port_attr->store(p, port_attr, buf, len); +} + static struct sysfs_ops port_sysfs_ops = { - .show = port_attr_show + .show = port_attr_show, + .store = port_attr_store }; static ssize_t state_show(struct ib_port *p, struct port_attribute *unused, @@ -292,10 +308,65 @@ #define PORT_PMA_ATTR(_name, _counter, _width, _offset) \ struct port_table_attribute port_pma_attr_##_name = { \ - .attr = __ATTR(_name, S_IRUGO, show_pma_counter, NULL), \ + .attr = __ATTR(_name, S_IRUGO | S_IWUSR, \ + show_pma_counter, store_pma_counter), \ .index = (_offset) | ((_width) << 16) | ((_counter) << 24) \ } +static ssize_t store_pma_counter(struct ib_port *p, struct port_attribute *attr, + const char *buf, size_t count) +{ + struct port_table_attribute *tab_attr = + container_of(attr, struct port_table_attribute, attr); + int counter = (tab_attr->index >> 24) & 0xff; + struct ib_mad *in_mad = NULL; + struct ib_mad *out_mad = NULL; + ssize_t ret; + + if (!p->ibdev->process_mad) + { + printk("store_pma_counter() process_mad() == NULL"); + ret = -EINVAL; + goto out; + } + + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); 
+ out_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + if (!in_mad || !out_mad) { + printk("store_pma_counter() NOMEM"); + ret = -ENOMEM; + goto out; + } + + memset( in_mad, 0, sizeof *in_mad ); + + in_mad->mad_hdr.base_version = 1; + in_mad->mad_hdr.mgmt_class = IB_MGMT_CLASS_PERF_MGMT; + in_mad->mad_hdr.class_version = 1; + in_mad->mad_hdr.method = IB_MGMT_METHOD_SET; + in_mad->mad_hdr.attr_id = cpu_to_be16(0x12); /* PortCounters */ + + *(__be16 *)(in_mad->data+42) = cpu_to_be16( ((__u16)1) << counter ); /* CounterSelect field */ + + in_mad->data[41] = p->port_num; /* PortSelect field */ + + if ((p->ibdev->process_mad(p->ibdev, IB_MAD_IGNORE_MKEY, + p->port_num, NULL, NULL, in_mad, out_mad) & + (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) != + (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) { + printk("store_pma_counter() EINVAL"); + ret = -EINVAL; + goto out; + } + + ret = count; +out: + kfree(in_mad); + kfree(out_mad); + + return ret; +} + static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute *attr, char *buf) { From mst at mellanox.co.il Tue May 9 05:13:00 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 15:13:00 +0300 Subject: [openib-general] Re: sdp can't support many connections (>2000) In-Reply-To: <20060509120002.85548.qmail@web36909.mail.mud.yahoo.com> References: <20060509114605.GB21036@mellanox.co.il> <20060509120002.85548.qmail@web36909.mail.mud.yahoo.com> Message-ID: <20060509121300.GC21036@mellanox.co.il> Quoting r. zhu shi song : > Subject: Re: sdp can't support many connections (>2000) > > ab send the request to squid cache server running on > Machine B. Then squid send the real request to google > website. > So how can I upgrade my version to solve the > problem? > > zhu Try getting latest stack snapshot from svn. 
-- MST From schihei at de.ibm.com Tue May 9 05:35:28 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Tue, 09 May 2006 14:35:28 +0200 Subject: [openib-general] [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: References: <4450A196.2050901@de.ibm.com> <445B4DA9.9040601@de.ibm.com> Message-ID: <44608C90.30909@de.ibm.com> Roland Dreier wrote: > Heiko> Originaly, we had the same idea as you mentioned, that it > Heiko> would be better to do this in the higher levels. The point > Heiko> is that we can't see so far any simple posibility how this > Heiko> can done in the OpenIB stack, the TCP/IP network layer or > Heiko> somewhere in the Linux kernel. > > Heiko> For example: For IPoIB we get the best throughput when we > Heiko> do the CQ callbacks on different CPUs and not to stay on > Heiko> the same CPU. > > So why not do it in IPoIB then? This approach is not optimal > globally. For example, uverbs event dispatch is just going to queue > an event and wake up the process waiting for events, and doing this on > some random CPU not related to the where the process will run is > clearly the worst possible way to dispatch the event. Yes, I agree. It would not be an optimal solution, because other upper level protocols (e.g. SDP, SRP, etc.) or userspace verbs would not be affected by this changes. Nevertheless, how can an improved "scaling" or "SMP" version of IPoIB look like. How could it be implemented? > Heiko> In other papers and slides (see [1]) you can see similar > Heiko> approaches. > > Heiko> [1]: Speeding up Networking, Van Jacobson and Bob > Heiko> Felderman, > Heiko> http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf > I think you've misunderstood this paper. It's about maximizing CPU > locality and pushing processing directly into the consumer. In the > context of slide 9, what you've done is sort of like adding another > control loop inside the kernel, since you dispatch from interrupt > handler to driver thread to final consumer. 
So I would argue that > your approach is exactly the opposite of what VJ is advocating. Sorry, my idea was not to use the *.pdf file how it should be implemented. I only wanted to show that other people are also thinking about how TCP/IP performance could be increased and where the bottlenecks (e.g. SOFTIRQs) are. :) Regards, Heiko From hch at lst.de Tue May 9 06:15:47 2006 From: hch at lst.de (Christoph Hellwig) Date: Tue, 9 May 2006 15:15:47 +0200 Subject: [openib-general] [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: <4450A196.2050901@de.ibm.com> References: <4450A196.2050901@de.ibm.com> Message-ID: <20060509131547.GA8449@lst.de> > +#include > +#include > +#include Please don't use directly ever. Always include From mst at mellanox.co.il Tue May 9 06:17:13 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 16:17:13 +0300 Subject: [openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters In-Reply-To: <20060509120613.GA3294@voltaire.com> References: <20060509120613.GA3294@voltaire.com> Message-ID: <20060509131713.GD21036@mellanox.co.il> I think the capability to reset counters is useful. Some comments on the patch: Quoting r. Leonid Arsh : > Subject: [openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters > > > Hello, > > we need a possibility to reset the port counters in /sys/class/infiniband/mthca0/ports/1/counters/. > > The attached patch implements the ability to reset the counters by writing to the sysfs counter files. > The patch uses the same process_mad() mechanism as in the counter reading, but with IB_MGMT_METHOD_SET. > > The patch was checked on IBED-1.0-rc3 code, with MTHCA adaptor. > > I checked also possibility to set specific counter values, but always got the counter reset instead. That's what the spec says. IB spec: C16-7: When initially powered-up or reset, the value of all counters on all ports of a node shall be set to zero. 
During operation, instead of overflowing, they shall stop at all ones. At any time, writing (Set) zero into a counter shall cause the counter to be reset to zero. Note that writing (Set) anything other than zero into a counter results in undefined behavior. > The questions are: > Is it a protocol or a firmware limitation, that I couldn't set specific values? > If there is a way to set specific values, should we implement it? I don't think it would be useful even if some devices supported it. > Should we implement an ability to reset (or set) specific counters, > just like like I did in this patch? Generally I think what you do is fine. However you must validate the input and reject any attempt to set the value to anything except 0, to avoid triggering undefined behaviour in hardware. > (this can cause inconsistency between counter values, > for example between port_xmit_data and port_xmit_packets) You can't read the values atomically anyway, so why do you care about atomic reset? > Should we create an additional sysfs entry for the counter reset purpose > (like /sys/class/infiniband/mthca0/ports/1/reset_counters) instead? I wouldn't think so. > > Regards, > Leonid Some comments: > Signed-off-by: Leonid Arsh > > --- linux-kernel/infiniband/core/sysfs.c.orig 2006-05-07 23:07:10.000000000 +0300 > +++ linux-kernel/infiniband/core/sysfs.c 2006-05-09 17:55:55.000000000 +0300 > @@ -88,8 +88,24 @@ > return port_attr->show(p, port_attr, buf); > } > > +static ssize_t port_attr_store(struct kobject *kobj, > + struct attribute *attr, const char *buf, size_t len) Move more parameters to the first line. 
Documentation/CodingStyle: "Descendants are always substantially shorter than the parent" > +{ > + struct port_attribute *port_attr = > + container_of(attr, struct port_attribute, attr); > + struct ib_port *p = container_of(kobj, struct ib_port, kobj); > + > + if (!port_attr->store) > + return -EIO; > + if (!ibdev_is_alive(p->ibdev)) > + return -ENODEV; > + > + return port_attr->store(p, port_attr, buf, len); > +} > + > static struct sysfs_ops port_sysfs_ops = { > - .show = port_attr_show > + .show = port_attr_show, > + .store = port_attr_store > }; > > static ssize_t state_show(struct ib_port *p, struct port_attribute *unused, > @@ -292,10 +308,65 @@ > > #define PORT_PMA_ATTR(_name, _counter, _width, _offset) \ > struct port_table_attribute port_pma_attr_##_name = { \ > - .attr = __ATTR(_name, S_IRUGO, show_pma_counter, NULL), \ > + .attr = __ATTR(_name, S_IRUGO | S_IWUSR, \ > + show_pma_counter, store_pma_counter), \ Please move show_pma_counter back to the first line and align descendants at least to the right from the parent brace: Documentation/CodingStyle: "Descendants are always substantially shorter than the parent and are placed substantially to the right." > .index = (_offset) | ((_width) << 16) | ((_counter) << 24) \ > } > > +static ssize_t store_pma_counter(struct ib_port *p, struct port_attribute *attr, > + const char *buf, size_t count) You seem to be ignoring the value user is writing. Could this be why you always get the value set to 0? Writing 0x1 into sysfs and getting 0 is surprizing. I think you must check that the value is 0, since: C16-7: When initially powered-up or reset, the value of all counters on all ports of a node shall be set to zero. During operation, instead of overflowing, they shall stop at all ones. At any time, writing (Set) zero into a counter shall cause the counter to be reset to zero. Note that writing (Set) anything other than zero into a counter results in undefined behavior. Any other value should return -EINVAL. 
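The zero-only check requested above could look roughly like the following. This is a minimal sketch written as userspace C so that it compiles on its own; the helper name parse_counter_reset() is hypothetical, and an in-kernel store routine would use the kernel's own string-parsing helpers instead of strtoul().

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): accept a sysfs-style
 * write buffer only if it holds the value 0, optionally followed by
 * whitespace such as the newline that "echo 0 > file" appends.
 * Returns 0 if the write is a valid reset request, -EINVAL otherwise.
 */
static int parse_counter_reset(const char *buf, size_t len)
{
	char tmp[32];
	char *end;
	unsigned long val;

	if (len == 0 || len >= sizeof(tmp))
		return -EINVAL;

	/* Do not rely on the buffer being NUL-terminated at len. */
	memcpy(tmp, buf, len);
	tmp[len] = '\0';

	errno = 0;
	val = strtoul(tmp, &end, 0);
	if (errno || end == tmp)
		return -EINVAL;		/* overflow, or no digits at all */

	/* Only whitespace may follow the number. */
	while (*end && isspace((unsigned char)*end))
		end++;
	if (*end)
		return -EINVAL;

	/* Per C16-7, only a Set of zero has defined behaviour. */
	return val == 0 ? 0 : -EINVAL;
}
```

With a check of this kind, store_pma_counter() would return -EINVAL before building any MAD when the user writes something other than 0.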
> +{ > + struct port_table_attribute *tab_attr = > + container_of(attr, struct port_table_attribute, attr); > + int counter = (tab_attr->index >> 24) & 0xff; index is 32 bit, so index >> 24 can't have high bits set, so 0xff is unnecessary. Also, make counter u8 for clarity. > + struct ib_mad *in_mad = NULL; > + struct ib_mad *out_mad = NULL; = NULL shouldn't be necessary. > + ssize_t ret; No need for this variable - you can just assign to count. > + > + if (!p->ibdev->process_mad) This is really bad: user can't be expected to parse var log messages to figure out why does echo fail. Can we avoid creating the sysfs file if process_mad is NULL? > + { { must be on same line with if. > + printk("store_pma_counter() process_mad() == NULL"); printk left over from debug? > + ret = -EINVAL; > + goto out; return -EINVAL here and don't waste cycles initializing ib_mad/out_mad. > + } > + > + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); > + out_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); > + if (!in_mad || !out_mad) { > + printk("store_pma_counter() NOMEM"); > + ret = -ENOMEM; > + goto out; > + } > + > + memset( in_mad, 0, sizeof *in_mad ); memset after kzalloc? Also space inside (). > + in_mad->mad_hdr.base_version = 1; > + in_mad->mad_hdr.mgmt_class = IB_MGMT_CLASS_PERF_MGMT; > + in_mad->mad_hdr.class_version = 1; > + in_mad->mad_hdr.method = IB_MGMT_METHOD_SET; > + in_mad->mad_hdr.attr_id = cpu_to_be16(0x12); /* PortCounters */ > + > + *(__be16 *)(in_mad->data+42) = cpu_to_be16( ((__u16)1) << counter ); /* CounterSelect field */ Speces are required around +. Don't use __u16 in source - its mainly for headers. And the cast is not needed here anyway. > + > + in_mad->data[41] = p->port_num; /* PortSelect field */ A huge number of magic constants here. Please use named constants instead. Instead of casts, I think it would be much better to create a proper structure describing the MAD format that you use. Then you just cast in_mad->data to that type and fill it in. 
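One way to read the suggestion above is a small structure that names the two fields the patch pokes by raw offset. The following is only a sketch, compiled as userspace C: struct pma_portcounters_hdr and fill_counter_reset() are hypothetical names, and the offsets 41 and 42 are simply the ones used in the quoted patch, not values checked against the spec here.

```c
#include <arpa/inet.h>	/* htons() for the big-endian 16-bit field */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout of the start of the MAD data area for a
 * PortCounters Set. Only the two fields the patch writes are named.
 */
struct pma_portcounters_hdr {
	uint8_t  reserved[41];		/* bytes 0..40: left zero */
	uint8_t  port_select;		/* byte 41: PortSelect */
	uint16_t counter_select;	/* bytes 42..43: CounterSelect, big-endian */
} __attribute__((packed));

/* Catch layout drift at compile time instead of via magic numbers. */
_Static_assert(offsetof(struct pma_portcounters_hdr, port_select) == 41,
	       "PortSelect offset");
_Static_assert(offsetof(struct pma_portcounters_hdr, counter_select) == 42,
	       "CounterSelect offset");

/*
 * Fill the first bytes of a MAD data buffer so that a single counter
 * (bit index 0..15, as in the patch's "1 << counter") is selected.
 * The caller must supply at least sizeof(struct pma_portcounters_hdr)
 * bytes.
 */
static void fill_counter_reset(uint8_t *data, uint8_t port,
			       unsigned int counter)
{
	struct pma_portcounters_hdr pc;

	memset(&pc, 0, sizeof(pc));
	pc.port_select = port;
	pc.counter_select = htons((uint16_t)(1u << (counter & 0xf)));

	/* Copy rather than cast, to stay clear of aliasing problems. */
	memcpy(data, &pc, sizeof(pc));
}
```

In the kernel the same idea would use __be16 and cpu_to_be16(), and the structure would be overlaid on in_mad->data instead of copied.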
> + > + if ((p->ibdev->process_mad(p->ibdev, IB_MAD_IGNORE_MKEY, > + p->port_num, NULL, NULL, in_mad, out_mad) & > + (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) != This line is too long. How about a temporary variable here? != is weaker than -> and function call, you don't need the extra (). > + (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) { > + printk("store_pma_counter() EINVAL"); Leftovers from debug? > + ret = -EINVAL; > + goto out; > + } > + > + ret = count; > +out: > + kfree(in_mad); > + kfree(out_mad); > + > + return ret; > +} > + > static ssize_t show_pma_counter(struct ib_port *p, struct port_attribute *attr, > char *buf) > { > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > -- MST From dotanb at mellanox.co.il Tue May 9 06:25:49 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Tue, 9 May 2006 16:25:49 +0300 Subject: [openib-general] compilation warning in libibcm Message-ID: <200605091625.49464.dotanb@mellanox.co.il> make[2]: Entering directory `/tmp/openib_gen2/last_stable/src/userspace/libibcm' if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -g -O2 -MT cm.lo -MD -MP -MF ".deps/cm.Tpo" -c -o cm.lo `test -f 'src/cm.c' || echo './'`src/cm.c; \ then mv -f ".deps/cm.Tpo" ".deps/cm.Plo"; else rm -f ".deps/cm.Tpo"; exit 1; fi mkdir .libs gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -g -O2 -MT cm.lo -MD -MP -MF .deps/cm.Tpo -c src /cm.c -fPIC -DPIC -o .libs/cm.o src/cm.c: In function `ib_cm_init_qp_attr': src/cm.c:406: warning: `ib_copy_qp_attr_from_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:71) src/cm.c: In function `ib_cm_send_req': src/cm.c:466: warning: `ib_copy_path_rec_to_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:77) src/cm.c:475: warning: `ib_copy_path_rec_to_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:77) src/cm.c: In function `ib_cm_send_lap': src/cm.c:695: warning: `ib_copy_path_rec_to_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:77) src/cm.c: In function `ib_cm_send_sidr_req': src/cm.c:735: warning: `ib_copy_path_rec_to_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:77) src/cm.c: In function `cm_event_req_get': src/cm.c:803: warning: `ib_copy_path_rec_from_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:74) src/cm.c:805: warning: `ib_copy_path_rec_from_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:74) src/cm.c: In function `ib_cm_get_event': src/cm.c:950: warning: `ib_copy_path_rec_from_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:74) /bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -L../libibverbs/src -libverbs -o src/libibcm.la -rpath /usr/local/ /lib64 -avoid-version cm.lo mkdir src/.libs From dotanb at mellanox.co.il Tue May 9 06:30:12 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Tue, 9 May 2006 16:30:12 +0300 Subject: [openib-general] compilation warnings in libibcm Message-ID: <200605091630.12039.dotanb@mellanox.co.il> Hi, please ignore the previous email. Compilation of the gen2 driver generate compilation warnings when compiling the libibcm. 
here are the machine + driver info: Host Architecture : x86_64 Linux Distribution: Red Hat Enterprise Linux AS release 4 (Nahant Update 3) Kernel Version : 2.6.16.9 Driver Version : openib_gen2-20060509-0800 (REV=7002) Here are the compilation warnings: make[2]: Entering directory `/tmp/openib_gen2/last_stable/src/userspace/libibcm' if /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -g -O2 -MT cm.lo -MD -MP -MF ".deps/cm.Tpo" -c -o cm.lo `test -f 'src/cm.c' || echo './'`src/cm.c; \ then mv -f ".deps/cm.Tpo" ".deps/cm.Plo"; else rm -f ".deps/cm.Tpo"; exit 1; fi mkdir .libs gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -g -O2 -MT cm.lo -MD -MP -MF .deps/cm.Tpo -c src /cm.c -fPIC -DPIC -o .libs/cm.o src/cm.c: In function `ib_cm_init_qp_attr': src/cm.c:406: warning: `ib_copy_qp_attr_from_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:71) src/cm.c: In function `ib_cm_send_req': src/cm.c:466: warning: `ib_copy_path_rec_to_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:77) src/cm.c:475: warning: `ib_copy_path_rec_to_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:77) src/cm.c: In function `ib_cm_send_lap': src/cm.c:695: warning: `ib_copy_path_rec_to_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:77) src/cm.c: In function `ib_cm_send_sidr_req': src/cm.c:735: warning: `ib_copy_path_rec_to_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:77) src/cm.c: In function `cm_event_req_get': src/cm.c:803: warning: `ib_copy_path_rec_from_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:74) src/cm.c:805: warning: `ib_copy_path_rec_from_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:74) src/cm.c: In function `ib_cm_get_event': src/cm.c:950: warning: 
`ib_copy_path_rec_from_kern' is deprecated (declared at ../libibverbs/include/infiniband/marshall.h:74) /bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -L../libibverbs/src -libverbs -o src/libibcm.la -rpath /usr/local/ /lib64 -avoid-version cm.lo mkdir src/.libs thanks Dotan From k_mahesh85 at yahoo.co.in Tue May 9 07:25:52 2006 From: k_mahesh85 at yahoo.co.in (keshetti mahesh) Date: Tue, 9 May 2006 15:25:52 +0100 (BST) Subject: [openib-general] problem with applications like telnet over SDP Message-ID: <20060509142552.27150.qmail@web8321.mail.in.yahoo.com> hello can anybody tell me ..do i need to configure anything to make applications like TELNET,FTP,...work over SDP The normal socket applications are working fine over SDP with the setup i am using. i have configured /etc/libsdp.conf like below >match listen *:* >match destination *:* >match program * and exported LD_PRELOAD,LIBSDP_CONFIG_FILE variables It is failing in connection setup when i run telnet. what can i do to avoid it? thanks n regards K.Mahesh __________________________________________________________ Yahoo! India Answers: Share what you know. Learn something new. http://in.answers.yahoo.com From bos at pathscale.com Tue May 9 07:35:34 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 09 May 2006 07:35:34 -0700 Subject: [openib-general] Re: Need OpenIB bugzilla component for RDS In-Reply-To: <96f8e60e0605071007u41ba9789g6f52b08e5be0fcd8@mail.gmail.com> References: <1146893261.1045.18.camel@localhost.localdomain> <96f8e60e0605071007u41ba9789g6f52b08e5be0fcd8@mail.gmail.com> Message-ID: <1147185334.16126.4.camel@chalcedony.pathscale.com> On Sun, 2006-05-07 at 10:07 -0700, Ranjit Pandit wrote: > Please mark me as the default owner of RDS bugs. As it turns out, I "fired and forgot", and didn't check the result of creating the component. It failed because you don't have a Bugzilla account. Please create one. 
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB33@mtlexch01.mtl.com> Message-ID: <20060509144804.GE28740@sashak.voltaire.com> Hi Eitan, On 14:20 Tue 09 May , Eitan Zahavi wrote: > > It is great that you work on QoS implementation. In general I see this > simple extension of the SM capabilities as very useful one. But I think > it would have been better if you first send out the RFC for the proposed > functionality and only later implement it (as was done on the partition > manager case). I agree with you, those patches are RFC (I've just forgot to sign this in the subject). > I have extracted the "description section" from the patch > and here are my comments (prefixed [EZ]) to it: > > osm/doc/qos-config.txt: > Trivial low level QoS configuration proposition. > =============================================== > > Basically we have set of QoS related low-level configuration parameters. > [EZ] I expected QoS parameters to be stored in a QoS Policy file as was > done in > the partition case. The main reason for that is that I believe > a simple set of > parameters is going to be an over simplification of the > required functionality. The main goal of this implementation is to provide low-level primitives to setup QoS related attributes and simple but useful way to configure it. In this case just "raw" configuration parameters seems as most suitable for me. So unlike to partition case we don't really have any "policy" yet. I think kind of this will be the next stage of QoS functionality. Some things we may do right now - for instance we may associate service level value with particular partition (and define it in partition "policy" configuration). > All those parameter names are prefixed by "qos_" string. 
There is full > list of such parameters: > > qos_max_vls - The number of maximum VLs will be on the Subnet > qos_high_limit - The limit of High Priority component of VL > Arbitration > table (IBA 7.6.9) > qos_vlarb_low - Low priority VL Arbitration table (IBA 7.6.9) > template. > qos_vlarb_high - High priority VL Arbitration table (IBA 7.6.9) > template. > Both VL arbitration templates are pairs of VL and > weight. > qos_sl2vl - SL2VL Mapping table (IBA 7.6.6) template. It is a > list > of VLs corresponding to SLs 0-15. (Note the VL15 used > here means drop this SL). > > Typical default values (hard-coded in OpenSM initialization) are: > > qos_max_vls=15 > qos_high_limit=0 > qos_vlarb_low=0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0 > qos_vlarb_high=0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4 > qos_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > > The syntax is compatible with rest of OpenSM configuration options and > values may be stored in OpenSM config file (cached options file). > > [EZ] The above set of parameters is fine for "default" QoS support. > I better understand the scope of this proposal now. > [EZ] Please note that algorithm to validate the applicability of the > above on the > particular fabric is still required as not all devices support > the 16 VLs VL numbers are translated according to port's capabilities and configured OperVLs (the numbers are MODed). > and not all > devices must support VLArb of 8 entries. In such cases we > should at least provide > an error describing why the provided setting is un-realizable. In the case of "short" VLArb table the template will be truncated (silently) to meet port's capabilities. This is not a error, right? > [EZ] The default SL2VL map and VLArb tables are not consistent: The > VLArb tables do > not provide any entry for VL > 7 so the SL >= 8 are not usable. This is reasonable note, we may extend default set. > In addition to above we may to define separate QoS configuration > parameters sets for various target types.
As targets we currently > support > HCA, routers, switch external ports and switch's enhanced port 0. The > names of such specialized parameters are prefixed by "qos__" > string. There is full list of currently supported sets: > > qos_hca_ - QoS configuration parameters set for HCAs. > qos_rtr_ - parameters set for routers. > qos_sw0_ - parameters set for switches' port 0. > qos_swe_ - parameters set for switches' external ports. > > [EZ] I do not see how the above could be used. Instead I do see groups > of nodes as being > assigned different QoS levels. As we defined "groups of nodes" > in the partition > policy I would propose using the partitions as the means to > define node groups. > [EZ] So I propose to keep the "trivial" implementation without this > level of control. At least it does not hurt, and somebody tell me that this will be useful. So I would prefer to start with this feature. > Instead I would prefer having QoS Policy file defined such that > these groups can be > referred to. I think that finally we will end with something like this, but for this we don't have yet some important things (like QoS level), so I would prefer to start with simple model. > Examples: > > qos_sw0_max_vls=2 > qos_hca_sl2vl=0,1,2,3,5,5,5,12,12,0, > qos_swe_high_limit=0 > > [EZ] Another concept that is not represented in this proposal is the > support of selecting QoS level for particular PathRecord queries. There is no QoS level, just SL, this is not the same. > (or > how does the ULP or Application obtain the SL). But I guess this falls > under the second phase of the QoS support. We may do do some elements of this soon - specifically SL value per partition (optional), when SL value may be returned in SA queries. Obviously low-level QoS implementation is needed for this. Thanks for comments. Sasha. From mst at mellanox.co.il Tue May 9 07:58:28 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 9 May 2006 17:58:28 +0300 Subject: [openib-general] Re: problem with applications like telnet over SDP In-Reply-To: <20060509142552.27150.qmail@web8321.mail.in.yahoo.com> References: <20060509142552.27150.qmail@web8321.mail.in.yahoo.com> Message-ID: <20060509145828.GH21036@mellanox.co.il> Quoting r. keshetti mahesh : > Subject: problem with applications like telnet over SDP > > hello > > can anybody tell me ..do i need to configure anything > to make applications like TELNET,FTP,...work over SDP > > The normal socket applications are working fine over > SDP with the setup i am using. > > i have configured /etc/libsdp.conf like below > > >match listen *:* > >match destination *:* > >match program * > > and exported LD_PRELOAD,LIBSDP_CONFIG_FILE variables > > It is failing in connection setup when i run telnet. > what can i do to avoid it? > > thanks n regards > K.Mahesh You did run the server with libsdp as well, did you not? -- MST From eitan at mellanox.co.il Tue May 9 08:04:41 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 9 May 2006 18:04:41 +0300 Subject: [openib-general] RE: [PATCH 0/2] opensm: low-level QoS implementation Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB36@mtlexch01.mtl.com> Hi Sasha, I have cut and paste only the sections I would like to further comment on. Eitan > > [EZ] Please note that algorithm to validate the applicability of the > > above on the > > particular fabric is still required as not all devices support > > the 16 VLs > > VL numbers are translated according to port's capabilities and > configured OperVLs (the numbers are MODed). [EZ] I am not following what you mean here. Can you elaborate? > > > and not all > > devices must support VLArb of 8 entries. In such cases we > > should at least provide > > an error describing why the provided setting is un-realizable. > > In the case of "short" VLArb table the template will be truncated > (silently) to meet port's capabilities. 
This is not a error, right? [EZ] If you do not have an entry for a VL in the VLArb (both high and low) tables it means this VL will never be scheduled for transmission. So anybody using this VL will be "blocked". An algorithm to do SL2VL in such cases can be used to avoid these problems. > > > [EZ] I do not see how the above could be used. Instead I do see groups > > of nodes as being > > assigned different QoS levels. As we defined "groups of nodes" > > in the partition > > policy I would propose using the partitions as the means to > > define node groups. > > [EZ] So I propose to keep the "trivial" implementation without this > > level of control. > > At least it does not hurt, and somebody tell me that this will be > useful. So I would prefer to start with this feature. [EZ] OK. But in the future the policy will override these default parameters. > > > Instead I would prefer having QoS Policy file defined such that From Don.Albert at Bull.com Tue May 9 08:01:08 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Tue, 9 May 2006 08:01:08 -0700 Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4 In-Reply-To: <20060509053201.GD17814@mellanox.co.il> Message-ID: Michael, "Michael S. Tsirkin" wrote on 05/08/2006 10:32:01 PM: > Quoting r. Don.Albert at Bull.com : > > When I had the problem previously, Roland Drier suggested trying > to load the ib_mthca module with "fw_cmd_doorbell=0", which did > avoid the error then, and in fact does on this new problem. But > the question is why? Updating the firmware on the old board > seemed to have solved the problem before, but now it has occurred > again on a fairly new card with recent firmware. Has anyone else > seen this problem? > > Which FW revision do you have? 
> The "ibstat" command shows: CA type: MT25204 Number of ports: 1 Firmware version: 1.0.800 Hardware version: a0 Node GUID: 0x0002c90200216dc4 System image GUID: 0x0002c90200216dc7 -Don Albert- From mst at mellanox.co.il Tue May 9 08:04:26 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 18:04:26 +0300 Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4 In-Reply-To: References: <20060509053201.GD17814@mellanox.co.il> Message-ID: <20060509150426.GI21036@mellanox.co.il> Quoting r. Don.Albert at bull.com : > Subject: Re: NOP problem in ib_mthca on OFED RC4 > > > Michael, > > "Michael S. Tsirkin" wrote on 05/08/2006 10:32:01 PM: > > > Quoting r. Don.Albert at Bull.com : > > > When I had the problem previously, Roland Drier suggested trying > > to load the ib_mthca module with "fw_cmd_doorbell=0", which did > > avoid the error then, and in fact does on this new problem. But > > the question is why? Updating the firmware on the old board > > seemed to have solved the problem before, but now it has occurred > > again on a fairly new card with recent firmware. Has anyone else > > seen this problem? > > > > Which FW revision do you have? > > > The "ibstat" command shows: > > CA type: MT25204 > Number of ports: 1 > Firmware version: 1.0.800 > Hardware version: a0 > Node GUID: 0x0002c90200216dc4 > System image GUID: 0x0002c90200216dc7 > > -Don Albert- > Yes, that's the latest revision. Hmm.
-- MST From sashak at voltaire.com Tue May 9 09:05:58 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 9 May 2006 19:05:58 +0300 Subject: [openib-general] Re: [PATCH 0/2] opensm: low-level QoS implementation In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB36@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB36@mtlexch01.mtl.com> Message-ID: <20060509160558.GI28740@sashak.voltaire.com> On 18:04 Tue 09 May , Eitan Zahavi wrote: > > > > [EZ] Please note that algorithm to validate the applicability of the > > > above on the > > > particular fabric is still required as not all devices > support > > > the 16 VLs > > > > VL numbers are translated according to port's capabilities and > > configured OperVLs (the numbers are MODed). > [EZ] I am not following what you mean here. Can you elaborate? For example for ports with VLCap and OperVLs VL0-7 such SL2VL table template 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 will be translated to such 0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7 SL2VL table. > > > and not all > > > devices must support VLArb of 8 entries. In such cases we > > > should at least provide > > > an error describing why the provided setting is > un-realizable. > > > > In the case of "short" VLArb table the template will be truncated > > (silently) to meet port's capabilities. This is not a error, right? > [EZ] If you do not have an entry for a VL in the VLArb (both high and > low) tables it means this VL will never be scheduled for transmission. > So anybody using this VL will be "blocked". But size of low table cannot be less than number of data VLs supported by the port. So the case you described is possible only when it is specially configured (like this: qos_vlarb_low=1:1,1:1,1:1,1:1,1:1...), and then I guess that it is what was desired by admin. > An algorithm to do SL2VL in > such cases can be used to avoid these problems. > > > > > [EZ] I do not see how the above could be used. 
Instead I do see > groups > > > of nodes as being > > > assigned different QoS levels. As we defined "groups of > nodes" > > > in the partition > > > policy I would propose using the partitions as the means to > > > define node groups. > > > [EZ] So I propose to keep the "trivial" implementation without this > > > level of control. > > > > At least it does not hurt, and somebody tell me that this will be > > useful. So I would prefer to start with this feature. > > [EZ] OK. But in the future the policy will override these default > parameters. Yes, it may override it in the future. We will clean it up as obsolete code then. Sasha. From rdreier at cisco.com Tue May 9 09:23:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 09 May 2006 09:23:38 -0700 Subject: [openib-general] [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: <44608C90.30909@de.ibm.com> (Heiko J. Schick's message of "Tue, 09 May 2006 14:35:28 +0200") References: <4450A196.2050901@de.ibm.com> <445B4DA9.9040601@de.ibm.com> <44608C90.30909@de.ibm.com> Message-ID: Heiko> Yes, I agree. It would not be an optimal solution, because Heiko> other upper level protocols (e.g. SDP, SRP, etc.) or Heiko> userspace verbs would not be affected by this Heiko> changes. Nevertheless, how can an improved "scaling" or Heiko> "SMP" version of IPoIB look like. How could it be Heiko> implemented? The trivial way to do it would be to use the same idea as the current ehca driver: just create a thread for receive CQ events and a thread for send CQ events, and defer CQ polling into those two threads. Something even better may be possible by specializing to IPoIB of course. - R. 
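A minimal userspace sketch of the two-thread idea in the message above, assuming POSIX threads stand in for kernel threads and a counter stands in for a completion queue. This is not ehca or IPoIB code: struct fake_cq and every function name here are hypothetical and exist only for illustration. The "event" path just records work and wakes the thread; the polling is deferred to one thread per CQ (one for receive, one for send).

```c
#include <pthread.h>

/* Hypothetical stand-in for a completion queue (not a real verbs object). */
struct fake_cq {
    pthread_mutex_t lock;
    pthread_cond_t  event;    /* signalled when a CQ event arrives */
    int             pending;  /* completions waiting to be polled */
    int             shutdown; /* set when no further events will arrive */
    int             polled;   /* completions processed by the thread */
};

static void fake_cq_init(struct fake_cq *cq)
{
    pthread_mutex_init(&cq->lock, NULL);
    pthread_cond_init(&cq->event, NULL);
    cq->pending = 0;
    cq->shutdown = 0;
    cq->polled = 0;
}

/* Event path: record the completions and wake the poller, nothing more. */
static void fake_cq_event(struct fake_cq *cq, int completions)
{
    pthread_mutex_lock(&cq->lock);
    cq->pending += completions;
    pthread_cond_signal(&cq->event);
    pthread_mutex_unlock(&cq->lock);
}

/* Ask the poller to exit once it has drained whatever is still pending. */
static void fake_cq_stop(struct fake_cq *cq)
{
    pthread_mutex_lock(&cq->lock);
    cq->shutdown = 1;
    pthread_cond_signal(&cq->event);
    pthread_mutex_unlock(&cq->lock);
}

/* Thread body: sleep until an event arrives, then drain the queue. */
static void *cq_poll_thread(void *arg)
{
    struct fake_cq *cq = arg;

    pthread_mutex_lock(&cq->lock);
    for (;;) {
        while (cq->pending == 0 && !cq->shutdown)
            pthread_cond_wait(&cq->event, &cq->lock);
        if (cq->pending == 0)
            break; /* shutdown requested and nothing left to drain */
        cq->polled += cq->pending; /* "poll" everything that is queued */
        cq->pending = 0;
    }
    pthread_mutex_unlock(&cq->lock);
    return NULL;
}

/* One thread for receive CQ events, one for send CQ events. */
int run_two_thread_demo(int rx_events, int tx_events,
                        int *rx_polled, int *tx_polled)
{
    struct fake_cq rx_cq, tx_cq;
    pthread_t rx_thread, tx_thread;
    int i;

    fake_cq_init(&rx_cq);
    fake_cq_init(&tx_cq);

    if (pthread_create(&rx_thread, NULL, cq_poll_thread, &rx_cq))
        return -1;
    if (pthread_create(&tx_thread, NULL, cq_poll_thread, &tx_cq)) {
        fake_cq_stop(&rx_cq);
        pthread_join(rx_thread, NULL);
        return -1;
    }

    for (i = 0; i < rx_events; i++)
        fake_cq_event(&rx_cq, 1);
    for (i = 0; i < tx_events; i++)
        fake_cq_event(&tx_cq, 1);

    fake_cq_stop(&rx_cq);
    fake_cq_stop(&tx_cq);
    pthread_join(rx_thread, NULL);
    pthread_join(tx_thread, NULL);

    *rx_polled = rx_cq.polled;
    *tx_polled = tx_cq.polled;

    pthread_mutex_destroy(&rx_cq.lock);
    pthread_cond_destroy(&rx_cq.event);
    pthread_mutex_destroy(&tx_cq.lock);
    pthread_cond_destroy(&tx_cq.event);
    return 0;
}
```

Calling run_two_thread_demo(5, 3, &rx, &tx) should leave rx and tx equal to the number of events posted to each queue, since each thread drains its queue before it exits. A real driver would poll the hardware CQ in the thread rather than adjust a counter.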
From mshefty at ichips.intel.com Tue May 9 09:25:21 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 09 May 2006 09:25:21 -0700 Subject: [openib-general] [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: <200605091223.39564.jackm@mellanox.co.il> References: <200605071157.55936.jackm@mellanox.co.il> <445FD84A.2050102@ichips.intel.com> <200605091223.39564.jackm@mellanox.co.il> Message-ID: <4460C271.6060304@ichips.intel.com> Jack Morgenstein wrote: > Use of local_sa in the rdma_cm kernel module is already patched out for OFED. > local_sa is used ONLY in kernel cma.c ( static function > cma_resolve_ib_route(), which calls ib_get_path_rec()). > Icall to ib_get_path_rec() is eliminated, and we call cma_query_ib_route() > directly. This is fine for route lookup, but not rdma_get_option. > Sounds like I should take the current rdma_cm from the trunk for OFED (rather > than leave rdma_cm at trunk revision number 6860 in OFED) -- and also take > the new userlevel sources as well. > > What do you think -- is there anything else (besides local_sa) that might > interfere with doing this? I don't believe that anything prevents you from updating the rdma_cm to a later revision. Without the local SA, the rdma_get_option call won't actually do anything unless I make it asynchronous. - Sean From mshefty at ichips.intel.com Tue May 9 09:25:24 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 09 May 2006 09:25:24 -0700 Subject: [openib-general] Re: CMA: compliancy issue? 
In-Reply-To: <20060509082756.GY21036@mellanox.co.il> References: <445F4AA4.2020802@voltaire.com> <20060508134900.GD21036@mellanox.co.il> <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> <445F848C.2000502@ichips.intel.com> <20060508194705.GA25527@mellanox.co.il> <445FA246.5090106@ichips.intel.com> <20060508202238.GC25527@mellanox.co.il> <445FAAE4.4050600@ichips.intel.com> <15ddcffd0605082156r4b9f6e94raf51d31d6664848e@mail.gmail.com> <20060509082756.GY21036@mellanox.co.il> Message-ID: <4460C274.8070004@ichips.intel.com> Michael S. Tsirkin wrote: >>From iSER point of view, this approach is fine, and it would allow for >>some future flexibility to reject the REP. We prefer to implement it >>only for 2.6.19, that is when 2.6.18-rc1 is out. > > > Let us start by implementing this in SVN trunk. Sean, if you agree too, can you > do this? I'm not sure that always exposing CONNECT_RESPONSE makes sense. This is slowly turning the RDMA CM into the IB CM. CONNECT_RESPONSE is really there to support userspace, and is IB protocol specific. - Sean From eitan at mellanox.co.il Tue May 9 09:32:05 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 9 May 2006 19:32:05 +0300 Subject: [openib-general] RE: [PATCH 0/2] opensm: low-level QoS implementation Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB3E@mtlexch01.mtl.com> Hi Sasha Thanks for clearing the issues. I'm OK with the RFC. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O.
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Tuesday, May 09, 2006 7:06 PM > To: Eitan Zahavi > Cc: Hal Rosenstock; openib-general at openib.org; Yael Kalka; Ofer Gigi; Eli Dorfman > Subject: Re: [PATCH 0/2] opensm: low-level QoS implementation > > On 18:04 Tue 09 May , Eitan Zahavi wrote: > > > > > > [EZ] Please note that algorithm to validate the applicability of the > > > > above on the > > > > particular fabric is still required as not all devices > > support > > > > the 16 VLs > > > > > > VL numbers are translated according to port's capabilities and > > > configured OperVLs (the numbers are MODed). > > [EZ] I am not following what you mean here. Can you elaborate? > > For example for ports with VLCap and OperVLs VL0-7 such SL2VL table > template > > 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 > > will be translated to such > > 0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7 > > SL2VL table. > > > > > and not all > > > > devices must support VLArb of 8 entries. In such cases we > > > > should at least provide > > > > an error describing why the provided setting is > > un-realizable. > > > > > > In the case of "short" VLArb table the template will be truncated > > > (silently) to meet port's capabilities. This is not a error, right? > > [EZ] If you do not have an entry for a VL in the VLArb (both high and > > low) tables it means this VL will never be scheduled for transmission. > > So anybody using this VL will be "blocked". > > But size of low table cannot be less than number of data VLs supported > by the port. So the case you described is possible only when it is > specially configured (like this: > qos_vlarb_low=1:1,1:1,1:1,1:1,1:1...), and then I guess that it is what > was desired by admin. > > > An algorithm to do SL2VL in > > such cases can be used to avoid these problems. > > > > > > > > [EZ] I do not see how the above could be used. 
Instead I do see > > groups > > > > of nodes as being > > > > assigned different QoS levels. As we defined "groups of > > nodes" > > > > in the partition > > > > policy I would propose using the partitions as the means to > > > > define node groups. > > > > [EZ] So I propose to keep the "trivial" implementation without this > > > > level of control. > > > > > > At least it does not hurt, and somebody tell me that this will be > > > useful. So I would prefer to start with this feature. > > > > [EZ] OK. But in the future the policy will override these default > > parameters. > > Yes, it may override it in the future. We will clean it up as obsolete > code then. > > Sasha. From rdreier at cisco.com Tue May 9 09:29:35 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 09 May 2006 09:29:35 -0700 Subject: [openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters In-Reply-To: <20060509120613.GA3294@voltaire.com> (Leonid Arsh's message of "Tue, 9 May 2006 15:06:13 +0300") References: <20060509120613.GA3294@voltaire.com> Message-ID: Leonid> we need a possibility to reset the port counters in Leonid> /sys/class/infiniband/mthca0/ports/1/counters/. Why do you need this possibility? Having counters reset locally is going to confuse any performance manager running on the fabric that might be reading the counters remotely. Is it really a must for this to be done by writing to sysfs? Couldn't a trivial userspace app send a MAD locally to reset them? - R. From mst at mellanox.co.il Tue May 9 09:44:34 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 19:44:34 +0300 Subject: [openib-general] Re: CMA: compliancy issue? 
In-Reply-To: <4460C274.8070004@ichips.intel.com> References: <445F54E1.8000305@voltaire.com> <20060508150534.GF21036@mellanox.co.il> <445F848C.2000502@ichips.intel.com> <20060508194705.GA25527@mellanox.co.il> <445FA246.5090106@ichips.intel.com> <20060508202238.GC25527@mellanox.co.il> <445FAAE4.4050600@ichips.intel.com> <15ddcffd0605082156r4b9f6e94raf51d31d6664848e@mail.gmail.com> <20060509082756.GY21036@mellanox.co.il> <4460C274.8070004@ichips.intel.com> Message-ID: <20060509164433.GB5063@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: CMA: compliancy issue? > > Michael S. Tsirkin wrote: > >>From iSER point of view, this approach is fine, and it would allow for > >>some future flexibility to reject the REP. We prefer to implement it > >>only for 2.6.19, that is when 2.6.18-rc1 is out. > > > > > >Let us start by implementing this in SVN trunk. Sean, if you agree too, > >can you > >do this? > > I'm not sure that always exposing CONNECT_RESPONSE makes sense. How about going back to my proposal then: continue exposing ESTABLISHED, change only the order of sending RTU - send it after calling the handler. > This is > slowly turning the RDMA CM into the IB CM. CONNECT_RESPONSE is really > there to support userspace, and is IB protocol specific. Is it really IB specific? What about TCP syn-ack? -- MST From halr at voltaire.com Tue May 9 09:40:09 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 May 2006 12:40:09 -0400 Subject: [openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters In-Reply-To: References: <20060509120613.GA3294@voltaire.com> Message-ID: <1147192809.4485.22254.camel@hal.voltaire.com> On Tue, 2006-05-09 at 12:29, Roland Dreier wrote: > Leonid> we need a possibility to reset the port counters in > Leonid> /sys/class/infiniband/mthca0/ports/1/counters/. > > Why do you need this possibility?
Having counters reset locally is > going to confuse any performance manager running on the fabric that > might be reading the counters remotely. > > Is it really a must for this to be done by writing to sysfs? Couldn't > a trivial userspace app send a MAD locally to reset them? In fact, there already is one which can do this (perfquery). -- Hal > > - R. From mst at mellanox.co.il Tue May 9 09:49:19 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 19:49:19 +0300 Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: References: <4450A196.2050901@de.ibm.com> <445B4DA9.9040601@de.ibm.com> <44608C90.30909@de.ibm.com> Message-ID: <20060509164919.GC5063@mellanox.co.il> Quoting r. Roland Dreier : > The trivial way to do it would be to use the same idea as the current > ehca driver: just create a thread for receive CQ events and a thread > for send CQ events, and defer CQ polling into those two threads. For RX, isn't this basically what NAPI is doing? Only NAPI seems better, avoiding interrupts completely and avoiding latency hit by only getting triggered on high load ...
-- MST From halr at voltaire.com Tue May 9 10:08:57 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 May 2006 13:08:57 -0400 Subject: [openib-general] [PATCH] OpenSM: Add SA client API for MultiPathRecord queries Message-ID: <1147194536.4485.22684.camel@hal.voltaire.com> OpenSM: Add SA client API for MultiPathRecord queries Signed-off-by: Hal Rosenstock Index: include/vendor/osm_vendor_sa_api.h =================================================================== --- include/vendor/osm_vendor_sa_api.h (revision 7007) +++ include/vendor/osm_vendor_sa_api.h (working copy) @@ -116,6 +116,8 @@ typedef enum _osmv_query_type OSMV_QUERY_UD_MULTICAST_SET, OSMV_QUERY_UD_MULTICAST_DELETE, + OSMV_QUERY_MULTIPATH_REC, + } osmv_query_type_t; /* * VALUES @@ -318,6 +320,37 @@ typedef struct _osmv_guid_pair *****/ +/****s* OpenSM Vendor SA Client/osmv_multipath_req_t +* NAME +* osmv_multipath_req_t +* +* DESCRIPTION +* Fields from which to generate a MultiPathRecord request. +* +* SYNOPSIS +*/ +typedef struct _osmv_multipath_req_t +{ + ib_net64_t comp_mask; + uint16_t pkey; + boolean_t reversible; + uint8_t num_path; + uint8_t sl; + uint8_t independence; + uint8_t sgid_count; + uint8_t dgid_count; + ib_gid_t gids[IB_MULTIPATH_MAX_GIDS]; +} osmv_multipath_req_t; +/* +* FIELDS +* +* NOTES +* This structure is used to describe a multipath request.
+* +* SEE ALSO +*****/ + + /****s* OpenSM Vendor SA Client/osmv_query_res_t * NAME * osmv_query_res_t Index: libvendor/osm_vendor_ibumad_sa.c =================================================================== --- libvendor/osm_vendor_ibumad_sa.c (revision 7007) +++ libvendor/osm_vendor_ibumad_sa.c (working copy) @@ -545,6 +545,10 @@ __osmv_send_sa_req( p_sa_mad->sm_key = p_query_req->sm_key; p_sa_mad->attr_offset = 0; p_sa_mad->comp_mask = p_sa_mad_data->comp_mask; +#ifdef DUAL_SIDED_RMPP + if( p_sa_mad->method == IB_MAD_METHOD_GETMULTI ) + p_sa_mad->rmpp_flags = IB_RMPP_FLAG_ACTIVE; +#endif if( p_sa_mad->comp_mask ) { cl_memcpy( p_sa_mad->data, p_sa_mad_data->p_attr, @@ -616,6 +620,11 @@ osmv_query_sa( ib_node_record_t node_rec; ib_portinfo_record_t port_info; ib_path_rec_t path_rec; +#ifdef DUAL_SIDED_RMPP + ib_multipath_rec_t multipath_rec; + osmv_multipath_req_t *p_mpr_req; + int i, j; +#endif ib_class_port_info_t class_port_info; osm_log_t *p_log = p_bind->p_log; ib_api_status_t status; @@ -823,6 +832,47 @@ osmv_query_sa( sa_mad_data.p_attr = p_user_query->p_attr; break; +#ifdef DUAL_SIDED_RMPP + case OSMV_QUERY_MULTIPATH_REC: + osm_log( p_log, OSM_LOG_DEBUG, + "osmv_query_sa DBG:001 %s","MULTIPATH_REC\n" ); + /* Validate sgid/dgid counts against SA client limit */ + p_mpr_req = ( osmv_multipath_req_t * ) p_query_req->p_query_input; + if ( p_mpr_req->sgid_count + p_mpr_req->dgid_count > IB_MULTIPATH_MAX_GIDS ) + { + osm_log( p_log, OSM_LOG_ERROR, + "osmv_query_sa DBG:001 MULTIPATH_REC " + "SGID count %d DGID count %d max count %d\n", + p_mpr_req->sgid_count, p_mpr_req->dgid_count, + IB_MULTIPATH_MAX_GIDS ); + CL_ASSERT( 0 ); + return IB_ERROR; + } + cl_memclr(&multipath_rec, sizeof(ib_multipath_rec_t )); + sa_mad_data.method = IB_MAD_METHOD_GETMULTI; + sa_mad_data.attr_id = IB_MAD_ATTR_MULTIPATH_RECORD; + sa_mad_data.attr_offset = + ib_get_attr_offset( sizeof( ib_multipath_rec_t ) ); + sa_mad_data.p_attr = &multipath_rec; + sa_mad_data.comp_mask =
p_mpr_req->comp_mask; + multipath_rec.num_path = p_mpr_req->num_path; + if ( p_mpr_req->reversible ) + multipath_rec.num_path |= 0x80; + else + multipath_rec.num_path &= ~0x80; + multipath_rec.pkey = p_mpr_req->pkey; + multipath_rec.sl = p_mpr_req->sl; + multipath_rec.independence = p_mpr_req->independence; + multipath_rec.sgid_count = p_mpr_req->sgid_count; + multipath_rec.dgid_count = p_mpr_req->dgid_count; + j = 0; + for (i = 0; i < p_mpr_req->sgid_count; i++, j++) + multipath_rec.gids[j] = p_mpr_req->gids[j]; + for (i = 0; i < p_mpr_req->dgid_count; i++, j++) + multipath_rec.gids[j] = p_mpr_req->gids[j]; + break; +#endif + default: osm_log( p_log, OSM_LOG_ERROR, "osmv_query_sa DBG:001 %s", "UNKNOWN\n" ); From sean.hefty at intel.com Tue May 9 10:19:08 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 9 May 2006 10:19:08 -0700 Subject: [openib-general] RE: [PATCH] cm refcount race fix In-Reply-To: <20060508054529.GC19660@mellanox.co.il> Message-ID: Here's a patch that should fix both the IB CM and RDMA CM using completions rather than spinlock / wait objects. Michael, can you test that this version works for you? 
Signed-off-by: Sean Hefty --- Index: cm.c =================================================================== --- cm.c (revision 6884) +++ cm.c (working copy) @@ -34,6 +34,8 @@ * * $Id$ */ + +#include #include #include #include @@ -122,7 +124,7 @@ struct cm_id_private { struct rb_node service_node; struct rb_node sidr_id_node; spinlock_t lock; /* Do not acquire inside cm.lock */ - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; struct ib_mad_send_buf *msg; @@ -160,7 +162,7 @@ static void cm_work_handler(void *data); static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { if (atomic_dec_and_test(&cm_id_priv->refcount)) - wake_up(&cm_id_priv->wait); + complete(&cm_id_priv->comp); } static int cm_alloc_msg(struct cm_id_private *cm_id_priv, @@ -611,7 +613,7 @@ struct ib_cm_id *ib_create_cm_id(struct goto error; spin_lock_init(&cm_id_priv->lock); - init_waitqueue_head(&cm_id_priv->wait); + init_completion(&cm_id_priv->comp); INIT_LIST_HEAD(&cm_id_priv->work_list); atomic_set(&cm_id_priv->work_count, -1); atomic_set(&cm_id_priv->refcount, 1); @@ -776,8 +778,8 @@ retest: } cm_free_id(cm_id->local_id); - atomic_dec(&cm_id_priv->refcount); - wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); + cm_deref_id(cm_id_priv); + wait_for_completion(&cm_id_priv->comp); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); kfree(cm_id_priv->compare_data); Index: cma.c =================================================================== --- cma.c (revision 6948) +++ cma.c (working copy) @@ -29,6 +29,7 @@ * */ +#include #include #include #include @@ -70,7 +71,7 @@ struct cma_device { struct list_head list; struct ib_device *device; __be64 node_guid; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; struct list_head id_list; }; @@ -111,7 +112,7 @@ struct rdma_id_private { enum cma_state state; spinlock_t lock; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; wait_queue_head_t 
wait_remove; atomic_t dev_remove; @@ -244,11 +245,16 @@ static void cma_attach_to_dev(struct rdm list_add_tail(&id_priv->list, &cma_dev->id_list); } +static inline void cma_deref_dev(struct cma_device *cma_dev) +{ + if (atomic_dec_and_test(&cma_dev->refcount)) + complete(&cma_dev->comp); +} + static void cma_detach_from_dev(struct rdma_id_private *id_priv) { list_del(&id_priv->list); - if (atomic_dec_and_test(&id_priv->cma_dev->refcount)) - wake_up(&id_priv->cma_dev->wait); + cma_deref_dev(id_priv->cma_dev); id_priv->cma_dev = NULL; } @@ -288,7 +294,7 @@ static int cma_acquire_dev(struct rdma_i static void cma_deref_id(struct rdma_id_private *id_priv) { if (atomic_dec_and_test(&id_priv->refcount)) - wake_up(&id_priv->wait); + complete(&id_priv->comp); } static void cma_release_remove(struct rdma_id_private *id_priv) @@ -311,7 +317,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c id_priv->id.event_handler = event_handler; id_priv->id.ps = ps; spin_lock_init(&id_priv->lock); - init_waitqueue_head(&id_priv->wait); + init_completion(&id_priv->comp); atomic_set(&id_priv->refcount, 1); init_waitqueue_head(&id_priv->wait_remove); atomic_set(&id_priv->dev_remove, 0); @@ -618,8 +624,8 @@ static void cma_destroy_listen(struct rd } list_del(&id_priv->listen_list); - atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, !atomic_read(&id_priv->refcount)); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv); } @@ -699,8 +705,8 @@ void rdma_destroy_id(struct rdma_cm_id * } cma_release_port(id_priv); - atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, !atomic_read(&id_priv->refcount)); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv->id.route.path_rec); kfree(id_priv); @@ -1778,7 +1784,7 @@ static void cma_add_one(struct ib_device if (!cma_dev->node_guid) goto err; - init_waitqueue_head(&cma_dev->wait); + init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); INIT_LIST_HEAD(&cma_dev->id_list); 
ib_set_client_data(device, &cma_client, cma_dev); @@ -1845,8 +1851,8 @@ static void cma_process_remove(struct cm } mutex_unlock(&lock); - atomic_dec(&cma_dev->refcount); - wait_event(cma_dev->wait, !atomic_read(&cma_dev->refcount)); + cma_deref_dev(cma_dev); + wait_for_completion(&cma_dev->comp); } static void cma_remove_one(struct ib_device *device) From mst at mellanox.co.il Tue May 9 10:27:03 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 20:27:03 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060508054529.GC19660@mellanox.co.il> Message-ID: <20060509172703.GA22825@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [PATCH] cm refcount race fix > > Here's a patch that should fix both the IB CM and RDMA CM using > completions rather than spinlock / wait objects. > > Michael, can you test that this version works for you? Looks sane, I'll test tomorrow. Other modules that seem to have the same issue: mad_rmpp.c mad.c ucm.c ucma.c multicast.c -- MST From halr at voltaire.com Tue May 9 10:22:33 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 09 May 2006 13:22:33 -0400 Subject: [openib-general] [PATCH] osmtest: Add rudimentary SA MultiPathRecord tests Message-ID: <1147195352.4485.22894.camel@hal.voltaire.com> osmtest: Add rudimentary SA MultiPathRecord tests Signed-off-by: Hal Rosenstock Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 7007) +++ osmtest/osmtest.c (working copy) @@ -1,4 +1,5 @@ /* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -1009,6 +1010,75 @@ osmtest_get_path_rec_by_guid_pair( IN os return ( status ); } +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) +/********************************************************************** + **********************************************************************/ +static ib_api_status_t +osmtest_get_multipath_rec( IN osmtest_t * const p_osmt, + IN osmv_multipath_req_t *p_request, + IN osmtest_req_context_t *p_context) +{ + cl_status_t status = IB_SUCCESS; + osmv_query_req_t req; + + OSM_LOG_ENTER( &p_osmt->log, osmtest_get_multipath_rec ); + + /* + * Do a blocking query for this record in the subnet. + * The result is returned in the result field of the caller's + * context structure. + * + * The query structures are locals. + */ + cl_memclr( &req, sizeof( req ) ); + + p_context->p_osmt = p_osmt; + req.timeout_ms = p_osmt->opt.transaction_timeout; + req.retry_cnt = p_osmt->opt.retry_count; + req.flags = OSM_SA_FLAGS_SYNC; + req.query_context = p_context; + req.pfn_query_cb = osmtest_query_res_cb; + + req.query_type = OSMV_QUERY_MULTIPATH_REC; + + req.p_query_input = p_request; + req.sm_key = 0; + + status = osmv_query_sa( p_osmt->h_bind, &req ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: ERR 0068: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); + goto Exit; + } + + status = p_context->result.status; + + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: ERR 0069: " + "ib_query failed (%s)\n", ib_get_err_str( status ) ); + + if( status == IB_REMOTE_ERROR ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "Remote error = %s\n", + ib_get_mad_status_str( osm_madw_get_mad_ptr + ( p_context->result. 
+ p_result_madw ) ) ); + } + goto Exit; + } + + Exit: + OSM_LOG_EXIT( &p_osmt->log ); + return ( status ); +} +#endif + /********************************************************************** **********************************************************************/ ib_api_status_t @@ -3896,6 +3966,8 @@ osmtest_validate_path_rec( IN osmtest_t } #ifdef VENDOR_RMPP_SUPPORT +ib_net64_t portguid = 0; + /********************************************************************** **********************************************************************/ static ib_api_status_t @@ -3953,6 +4025,8 @@ osmtest_validate_all_node_recs( IN osmte ib_get_err_str( status ) ); goto Exit; } + if (!portguid) + portguid = p_rec->node_info.port_guid; } status = osmtest_check_missing_nodes( p_osmt ); @@ -4647,6 +4721,10 @@ static ib_api_status_t osmtest_validate_against_db( IN osmtest_t * const p_osmt ) { ib_api_status_t status = IB_SUCCESS; +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + osmtest_req_context_t context; + osmv_multipath_req_t request; +#endif OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_against_db ); @@ -4660,6 +4738,98 @@ osmtest_validate_against_db( IN osmtest_ if( status != IB_SUCCESS ) goto Exit; +#if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) + cl_memclr( &context, sizeof( context ) ); + cl_memclr( &request, sizeof( request ) ); + request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT | IB_MPR_COMPMASK_DGIDCOUNT; + request.sgid_count = 1; + request.dgid_count = 1; + ib_gid_set_default( &request.gids[0], portguid ); + ib_gid_set_default( &request.gids[1], portguid ); + status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status != IB_SUCCESS ) + goto Exit; + + cl_memclr( &context, sizeof( context ) ); + cl_memclr( &request, sizeof( request ) ); + status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status == IB_SUCCESS ) + goto Exit; + else + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " 
+ "IS EXPECTED ERROR ^^^^\n"); + } + + cl_memclr( &context, sizeof( context ) ); + cl_memclr( &request, sizeof( request ) ); + request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT; + request.sgid_count = 1; + ib_gid_set_default( &request.gids[0], portguid ); + status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status == IB_SUCCESS ) + goto Exit; + else + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "IS EXPECTED ERROR ^^^^\n"); + } + + cl_memclr( &context, sizeof( context ) ); + cl_memclr( &request, sizeof( request ) ); + request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT | IB_MPR_COMPMASK_DGIDCOUNT; + request.sgid_count = 1; + request.dgid_count = 1; + ib_gid_set_default( &request.gids[0], portguid ); + /* Set IPoIB broadcast MGID */ + request.gids[1].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL); + request.gids[1].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL); + status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status == IB_SUCCESS ) + goto Exit; + else + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec: " + "IS EXPECTED ERROR ^^^^\n"); + } + + cl_memclr( &context, sizeof( context ) ); + request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT | IB_MPR_COMPMASK_DGIDCOUNT; + request.sgid_count = 1; + request.dgid_count = 1; + /* Set IPoIB broadcast MGID */ + request.gids[0].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL); + request.gids[0].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL); + ib_gid_set_default( &request.gids[1], portguid ); + status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status == IB_SUCCESS ) + goto Exit; + else + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_get_multipath_rec_gid_ipoib_bcast: " + "IS EXPECTED ERROR ^^^^\n"); + } + + cl_memclr( &context, sizeof( context ) ); + cl_memclr( &request, sizeof( request ) ); + request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT | IB_MPR_COMPMASK_DGIDCOUNT | + IB_MPR_COMPMASK_NUMBPATH; 
+ request.sgid_count = 2; + request.dgid_count = 2; + request.num_path = 2; + ib_gid_set_default( &request.gids[0], portguid ); + ib_gid_set_default( &request.gids[1], portguid ); + ib_gid_set_default( &request.gids[2], portguid ); + ib_gid_set_default( &request.gids[3], portguid ); + status = osmtest_get_multipath_rec( p_osmt, &request, &context ); + if( status != IB_SUCCESS ) + goto Exit; +#endif + #ifdef VENDOR_RMPP_SUPPORT if (! p_osmt->opt.ignore_path_records) { From rdreier at cisco.com Tue May 9 11:00:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 09 May 2006 11:00:07 -0700 Subject: [openib-general] [GIT PULL] InfiniBand updates for 2.6.17-rc3 Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus The changes and patch are: Ralph Campbell: IB: Fix display of 4-bit port counters in sysfs Roland Dreier: IB/srp: Fix tracking of pending requests during error handling IB/mthca: Fix race in reference counting drivers/infiniband/core/sysfs.c | 2 drivers/infiniband/hw/mthca/mthca_cq.c | 41 +++-- drivers/infiniband/hw/mthca/mthca_dev.h | 2 drivers/infiniband/hw/mthca/mthca_provider.h | 22 ++- drivers/infiniband/hw/mthca/mthca_qp.c | 31 +++- drivers/infiniband/hw/mthca/mthca_srq.c | 23 ++- drivers/infiniband/ulp/srp/ib_srp.c | 195 +++++++++++++++----------- drivers/infiniband/ulp/srp/ib_srp.h | 4 - 8 files changed, 190 insertions(+), 130 deletions(-) diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c index 15121cb..21f9282 100644 --- a/drivers/infiniband/core/sysfs.c +++ b/drivers/infiniband/core/sysfs.c @@ -336,7 +336,7 @@ static ssize_t show_pma_counter(struct i switch (width) { case 4: ret = sprintf(buf, "%u\n", (out_mad->data[40 + offset / 8] >> - (offset % 4)) & 0xf); + (4 - (offset % 8))) & 0xf); break; case 8: ret = sprintf(buf, 
"%u\n", out_mad->data[40 + offset / 8]); diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index 312cf90..205854e 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -238,9 +238,9 @@ void mthca_cq_event(struct mthca_dev *de spin_lock(&dev->cq_table.lock); cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); + ++cq->refcount; + spin_unlock(&dev->cq_table.lock); if (!cq) { @@ -254,8 +254,10 @@ void mthca_cq_event(struct mthca_dev *de if (cq->ibcq.event_handler) cq->ibcq.event_handler(&event, cq->ibcq.cq_context); - if (atomic_dec_and_test(&cq->refcount)) + spin_lock(&dev->cq_table.lock); + if (!--cq->refcount) wake_up(&cq->wait); + spin_unlock(&dev->cq_table.lock); } static inline int is_recv_cqe(struct mthca_cqe *cqe) @@ -267,23 +269,13 @@ static inline int is_recv_cqe(struct mth return !(cqe->is_send & 0x80); } -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq) { - struct mthca_cq *cq; struct mthca_cqe *cqe; u32 prod_index; int nfreed = 0; - spin_lock_irq(&dev->cq_table.lock); - cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); - spin_unlock_irq(&dev->cq_table.lock); - - if (!cq) - return; - spin_lock_irq(&cq->lock); /* @@ -301,7 +293,7 @@ void mthca_cq_clean(struct mthca_dev *de if (0) mthca_dbg(dev, "Cleaning QPN %06x from CQN %06x; ci %d, pi %d\n", - qpn, cqn, cq->cons_index, prod_index); + qpn, cq->cqn, cq->cons_index, prod_index); /* * Now sweep backwards through the CQ, removing CQ entries @@ -325,8 +317,6 @@ void mthca_cq_clean(struct mthca_dev *de } spin_unlock_irq(&cq->lock); - if (atomic_dec_and_test(&cq->refcount)) - wake_up(&cq->wait); } void mthca_cq_resize_copy_cqes(struct mthca_cq *cq) @@ -821,7 +811,7 @@ int mthca_init_cq(struct 
mthca_dev *dev, } spin_lock_init(&cq->lock); - atomic_set(&cq->refcount, 1); + cq->refcount = 1; init_waitqueue_head(&cq->wait); memset(cq_context, 0, sizeof *cq_context); @@ -896,6 +886,17 @@ err_out: return err; } +static inline int get_cq_refcount(struct mthca_dev *dev, struct mthca_cq *cq) +{ + int c; + + spin_lock_irq(&dev->cq_table.lock); + c = cq->refcount; + spin_unlock_irq(&dev->cq_table.lock); + + return c; +} + void mthca_free_cq(struct mthca_dev *dev, struct mthca_cq *cq) { @@ -929,6 +930,7 @@ void mthca_free_cq(struct mthca_dev *dev spin_lock_irq(&dev->cq_table.lock); mthca_array_clear(&dev->cq_table.cq, cq->cqn & (dev->limits.num_cqs - 1)); + --cq->refcount; spin_unlock_irq(&dev->cq_table.lock); if (dev->mthca_flags & MTHCA_FLAG_MSI_X) @@ -936,8 +938,7 @@ void mthca_free_cq(struct mthca_dev *dev else synchronize_irq(dev->pdev->irq); - atomic_dec(&cq->refcount); - wait_event(cq->wait, !atomic_read(&cq->refcount)); + wait_event(cq->wait, !get_cq_refcount(dev, cq)); if (cq->is_kernel) { mthca_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index 4c1dcb4..f8160b8 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -496,7 +496,7 @@ void mthca_free_cq(struct mthca_dev *dev void mthca_cq_completion(struct mthca_dev *dev, u32 cqn); void mthca_cq_event(struct mthca_dev *dev, u32 cqn, enum ib_event_type event_type); -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq); void mthca_cq_resize_copy_cqes(struct mthca_cq *cq); int mthca_alloc_cq_buf(struct mthca_dev *dev, struct mthca_cq_buf *buf, int nent); diff --git a/drivers/infiniband/hw/mthca/mthca_provider.h b/drivers/infiniband/hw/mthca/mthca_provider.h index 6676a78..179a8f6 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.h +++ 
b/drivers/infiniband/hw/mthca/mthca_provider.h @@ -139,11 +139,12 @@ struct mthca_ah { * a qp may be locked, with the send cq locked first. No other * nesting should be done. * - * Each struct mthca_cq/qp also has an atomic_t ref count. The - * pointer from the cq/qp_table to the struct counts as one reference. - * This reference also is good for access through the consumer API, so - * modifying the CQ/QP etc doesn't need to take another reference. - * Access because of a completion being polled does need a reference. + * Each struct mthca_cq/qp also has a ref count, protected by the + * corresponding table lock. The pointer from the cq/qp_table to the + * struct counts as one reference. This reference also is good for + * access through the consumer API, so modifying the CQ/QP etc doesn't + * need to take another reference. Access to a QP because of a + * completion being polled does not need a reference either. * * Finally, each struct mthca_cq/qp has a wait_queue_head_t for the * destroy function to sleep on. 
@@ -159,8 +160,9 @@ struct mthca_ah { * - decrement ref count; if zero, wake up waiters * * To destroy a CQ/QP, we can do the following: - * - lock cq/qp_table, remove pointer, unlock cq/qp_table lock - * - decrement ref count + * - lock cq/qp_table + * - remove pointer and decrement ref count + * - unlock cq/qp_table lock * - wait_event until ref count is zero * * It is the consumer's responsibilty to make sure that no QP @@ -197,7 +199,7 @@ struct mthca_cq_resize { struct mthca_cq { struct ib_cq ibcq; spinlock_t lock; - atomic_t refcount; + int refcount; int cqn; u32 cons_index; struct mthca_cq_buf buf; @@ -217,7 +219,7 @@ struct mthca_cq { struct mthca_srq { struct ib_srq ibsrq; spinlock_t lock; - atomic_t refcount; + int refcount; int srqn; int max; int max_gs; @@ -254,7 +256,7 @@ struct mthca_wq { struct mthca_qp { struct ib_qp ibqp; - atomic_t refcount; + int refcount; u32 qpn; int is_direct; u8 port; /* for SQP and memfree use only */ diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index f37b0e3..19765f6 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -240,7 +240,7 @@ void mthca_qp_event(struct mthca_dev *de spin_lock(&dev->qp_table.lock); qp = mthca_array_get(&dev->qp_table.qp, qpn & (dev->limits.num_qps - 1)); if (qp) - atomic_inc(&qp->refcount); + ++qp->refcount; spin_unlock(&dev->qp_table.lock); if (!qp) { @@ -257,8 +257,10 @@ void mthca_qp_event(struct mthca_dev *de if (qp->ibqp.event_handler) qp->ibqp.event_handler(&event, qp->ibqp.qp_context); - if (atomic_dec_and_test(&qp->refcount)) + spin_lock(&dev->qp_table.lock); + if (!--qp->refcount) wake_up(&qp->wait); + spin_unlock(&dev->qp_table.lock); } static int to_mthca_state(enum ib_qp_state ib_state) @@ -833,10 +835,10 @@ int mthca_modify_qp(struct ib_qp *ibqp, * entries and reinitialize the QP. 
*/ if (new_state == IB_QPS_RESET && !qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); mthca_wq_init(&qp->sq); @@ -1096,7 +1098,7 @@ static int mthca_alloc_qp_common(struct int ret; int i; - atomic_set(&qp->refcount, 1); + qp->refcount = 1; init_waitqueue_head(&qp->wait); qp->state = IB_QPS_RESET; qp->atomic_rd_en = 0; @@ -1318,6 +1320,17 @@ int mthca_alloc_sqp(struct mthca_dev *de return err; } +static inline int get_qp_refcount(struct mthca_dev *dev, struct mthca_qp *qp) +{ + int c; + + spin_lock_irq(&dev->qp_table.lock); + c = qp->refcount; + spin_unlock_irq(&dev->qp_table.lock); + + return c; +} + void mthca_free_qp(struct mthca_dev *dev, struct mthca_qp *qp) { @@ -1339,14 +1352,14 @@ void mthca_free_qp(struct mthca_dev *dev spin_lock(&dev->qp_table.lock); mthca_array_clear(&dev->qp_table.qp, qp->qpn & (dev->limits.num_qps - 1)); + --qp->refcount; spin_unlock(&dev->qp_table.lock); if (send_cq != recv_cq) spin_unlock(&recv_cq->lock); spin_unlock_irq(&send_cq->lock); - atomic_dec(&qp->refcount); - wait_event(qp->wait, !atomic_read(&qp->refcount)); + wait_event(qp->wait, !get_qp_refcount(dev, qp)); if (qp->state != IB_QPS_RESET) mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, @@ -1358,10 +1371,10 @@ void mthca_free_qp(struct mthca_dev *dev * unref the mem-free tables and free the QPN in our table. */ if (!qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? 
to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); mthca_free_memfree(dev, qp); diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c index adcaf85..1ea4332 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -241,7 +241,7 @@ int mthca_alloc_srq(struct mthca_dev *de goto err_out_mailbox; spin_lock_init(&srq->lock); - atomic_set(&srq->refcount, 1); + srq->refcount = 1; init_waitqueue_head(&srq->wait); if (mthca_is_memfree(dev)) @@ -308,6 +308,17 @@ err_out: return err; } +static inline int get_srq_refcount(struct mthca_dev *dev, struct mthca_srq *srq) +{ + int c; + + spin_lock_irq(&dev->srq_table.lock); + c = srq->refcount; + spin_unlock_irq(&dev->srq_table.lock); + + return c; +} + void mthca_free_srq(struct mthca_dev *dev, struct mthca_srq *srq) { struct mthca_mailbox *mailbox; @@ -329,10 +340,10 @@ void mthca_free_srq(struct mthca_dev *de spin_lock_irq(&dev->srq_table.lock); mthca_array_clear(&dev->srq_table.srq, srq->srqn & (dev->limits.num_srqs - 1)); + --srq->refcount; spin_unlock_irq(&dev->srq_table.lock); - atomic_dec(&srq->refcount); - wait_event(srq->wait, !atomic_read(&srq->refcount)); + wait_event(srq->wait, !get_srq_refcount(dev, srq)); if (!srq->ibsrq.uobject) { mthca_free_srq_buf(dev, srq); @@ -414,7 +425,7 @@ void mthca_srq_event(struct mthca_dev *d spin_lock(&dev->srq_table.lock); srq = mthca_array_get(&dev->srq_table.srq, srqn & (dev->limits.num_srqs - 1)); if (srq) - atomic_inc(&srq->refcount); + ++srq->refcount; spin_unlock(&dev->srq_table.lock); if (!srq) { @@ -431,8 +442,10 @@ void mthca_srq_event(struct mthca_dev *d srq->ibsrq.event_handler(&event, srq->ibsrq.srq_context); out: - if (atomic_dec_and_test(&srq->refcount)) + spin_lock(&dev->srq_table.lock); + if 
(!--srq->refcount) wake_up(&srq->wait); + spin_unlock(&dev->srq_table.lock); } /* diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 5bb5574..c32ce43 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -409,6 +409,34 @@ static int srp_connect_target(struct srp } } +static void srp_unmap_data(struct scsi_cmnd *scmnd, + struct srp_target_port *target, + struct srp_request *req) +{ + struct scatterlist *scat; + int nents; + + if (!scmnd->request_buffer || + (scmnd->sc_data_direction != DMA_TO_DEVICE && + scmnd->sc_data_direction != DMA_FROM_DEVICE)) + return; + + /* + * This handling of non-SG commands can be killed when the + * SCSI midlayer no longer generates non-SG commands. + */ + if (likely(scmnd->use_sg)) { + nents = scmnd->use_sg; + scat = scmnd->request_buffer; + } else { + nents = 1; + scat = &req->fake_sg; + } + + dma_unmap_sg(target->srp_host->dev->dma_device, scat, nents, + scmnd->sc_data_direction); +} + static int srp_reconnect_target(struct srp_target_port *target) { struct ib_cm_id *new_cm_id; @@ -455,16 +483,16 @@ static int srp_reconnect_target(struct s list_for_each_entry(req, &target->req_queue, list) { req->scmnd->result = DID_RESET << 16; req->scmnd->scsi_done(req->scmnd); + srp_unmap_data(req->scmnd, target, req); } target->rx_head = 0; target->tx_head = 0; target->tx_tail = 0; - target->req_head = 0; - for (i = 0; i < SRP_SQ_SIZE - 1; ++i) - target->req_ring[i].next = i + 1; - target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->free_reqs); INIT_LIST_HEAD(&target->req_queue); + for (i = 0; i < SRP_SQ_SIZE; ++i) + list_add_tail(&target->req_ring[i].list, &target->free_reqs); ret = srp_connect_target(target); if (ret) @@ -589,40 +617,10 @@ static int srp_map_data(struct scsi_cmnd return len; } -static void srp_unmap_data(struct scsi_cmnd *scmnd, - struct srp_target_port *target, - struct srp_request *req) -{ - struct scatterlist *scat; - int 
nents; - - if (!scmnd->request_buffer || - (scmnd->sc_data_direction != DMA_TO_DEVICE && - scmnd->sc_data_direction != DMA_FROM_DEVICE)) - return; - - /* - * This handling of non-SG commands can be killed when the - * SCSI midlayer no longer generates non-SG commands. - */ - if (likely(scmnd->use_sg)) { - nents = scmnd->use_sg; - scat = scmnd->request_buffer; - } else { - nents = 1; - scat = &req->fake_sg; - } - - dma_unmap_sg(target->srp_host->dev->dma_device, scat, nents, - scmnd->sc_data_direction); -} - -static void srp_remove_req(struct srp_target_port *target, struct srp_request *req, - int index) +static void srp_remove_req(struct srp_target_port *target, struct srp_request *req) { - list_del(&req->list); - req->next = target->req_head; - target->req_head = index; + srp_unmap_data(req->scmnd, target, req); + list_move_tail(&req->list, &target->free_reqs); } static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp) @@ -647,7 +645,7 @@ static void srp_process_rsp(struct srp_t req->tsk_status = rsp->data[3]; complete(&req->done); } else { - scmnd = req->scmnd; + scmnd = req->scmnd; if (!scmnd) printk(KERN_ERR "Null scmnd for RSP w/tag %016llx\n", (unsigned long long) rsp->tag); @@ -665,14 +663,11 @@ static void srp_process_rsp(struct srp_t else if (rsp->flags & (SRP_RSP_FLAG_DIOVER | SRP_RSP_FLAG_DIUNDER)) scmnd->resid = be32_to_cpu(rsp->data_in_res_cnt); - srp_unmap_data(scmnd, target, req); - if (!req->tsk_mgmt) { - req->scmnd = NULL; scmnd->host_scribble = (void *) -1L; scmnd->scsi_done(scmnd); - srp_remove_req(target, req, rsp->tag & ~SRP_TAG_TSK_MGMT); + srp_remove_req(target, req); } else req->cmd_done = 1; } @@ -859,7 +854,6 @@ static int srp_queuecommand(struct scsi_ struct srp_request *req; struct srp_iu *iu; struct srp_cmd *cmd; - long req_index; int len; if (target->state == SRP_TARGET_CONNECTING) @@ -879,22 +873,20 @@ static int srp_queuecommand(struct scsi_ dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, 
SRP_MAX_IU_LEN, DMA_TO_DEVICE); - req_index = target->req_head; + req = list_entry(target->free_reqs.next, struct srp_request, list); scmnd->scsi_done = done; scmnd->result = 0; - scmnd->host_scribble = (void *) req_index; + scmnd->host_scribble = (void *) (long) req->index; cmd = iu->buf; memset(cmd, 0, sizeof *cmd); cmd->opcode = SRP_CMD; cmd->lun = cpu_to_be64((u64) scmnd->device->lun << 48); - cmd->tag = req_index; + cmd->tag = req->index; memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len); - req = &target->req_ring[req_index]; - req->scmnd = scmnd; req->cmd = iu; req->cmd_done = 0; @@ -919,8 +911,7 @@ static int srp_queuecommand(struct scsi_ goto err_unmap; } - target->req_head = req->next; - list_add_tail(&req->list, &target->req_queue); + list_move_tail(&req->list, &target->req_queue); return 0; @@ -1143,30 +1134,20 @@ static int srp_cm_handler(struct ib_cm_i return 0; } -static int srp_send_tsk_mgmt(struct scsi_cmnd *scmnd, u8 func) +static int srp_send_tsk_mgmt(struct srp_target_port *target, + struct srp_request *req, u8 func) { - struct srp_target_port *target = host_to_target(scmnd->device->host); - struct srp_request *req; struct srp_iu *iu; struct srp_tsk_mgmt *tsk_mgmt; - int req_index; - int ret = FAILED; spin_lock_irq(target->scsi_host->host_lock); if (target->state == SRP_TARGET_DEAD || target->state == SRP_TARGET_REMOVED) { - scmnd->result = DID_BAD_TARGET << 16; + req->scmnd->result = DID_BAD_TARGET << 16; goto out; } - if (scmnd->host_scribble == (void *) -1L) - goto out; - - req_index = (long) scmnd->host_scribble; - printk(KERN_ERR "Abort for req_index %d\n", req_index); - - req = &target->req_ring[req_index]; init_completion(&req->done); iu = __srp_get_tx_iu(target); @@ -1177,10 +1158,10 @@ static int srp_send_tsk_mgmt(struct scsi memset(tsk_mgmt, 0, sizeof *tsk_mgmt); tsk_mgmt->opcode = SRP_TSK_MGMT; - tsk_mgmt->lun = cpu_to_be64((u64) scmnd->device->lun << 48); - tsk_mgmt->tag = req_index | SRP_TAG_TSK_MGMT; + tsk_mgmt->lun = cpu_to_be64((u64) 
req->scmnd->device->lun << 48); + tsk_mgmt->tag = req->index | SRP_TAG_TSK_MGMT; tsk_mgmt->tsk_mgmt_func = func; - tsk_mgmt->task_tag = req_index; + tsk_mgmt->task_tag = req->index; if (__srp_post_send(target, iu, sizeof *tsk_mgmt)) goto out; @@ -1188,37 +1169,85 @@ static int srp_send_tsk_mgmt(struct scsi req->tsk_mgmt = iu; spin_unlock_irq(target->scsi_host->host_lock); + if (!wait_for_completion_timeout(&req->done, msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS))) - return FAILED; - spin_lock_irq(target->scsi_host->host_lock); + return -1; - if (req->cmd_done) { - srp_remove_req(target, req, req_index); - scmnd->scsi_done(scmnd); - } else if (!req->tsk_status) { - srp_remove_req(target, req, req_index); - scmnd->result = DID_ABORT << 16; - ret = SUCCESS; - } + return 0; out: spin_unlock_irq(target->scsi_host->host_lock); - return ret; + return -1; +} + +static int srp_find_req(struct srp_target_port *target, + struct scsi_cmnd *scmnd, + struct srp_request **req) +{ + if (scmnd->host_scribble == (void *) -1L) + return -1; + + *req = &target->req_ring[(long) scmnd->host_scribble]; + + return 0; } static int srp_abort(struct scsi_cmnd *scmnd) { + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req; + int ret = SUCCESS; + printk(KERN_ERR "SRP abort called\n"); - return srp_send_tsk_mgmt(scmnd, SRP_TSK_ABORT_TASK); + if (srp_find_req(target, scmnd, &req)) + return FAILED; + if (srp_send_tsk_mgmt(target, req, SRP_TSK_ABORT_TASK)) + return FAILED; + + spin_lock_irq(target->scsi_host->host_lock); + + if (req->cmd_done) { + srp_remove_req(target, req); + scmnd->scsi_done(scmnd); + } else if (!req->tsk_status) { + srp_remove_req(target, req); + scmnd->result = DID_ABORT << 16; + } else + ret = FAILED; + + spin_unlock_irq(target->scsi_host->host_lock); + + return ret; } static int srp_reset_device(struct scsi_cmnd *scmnd) { + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req, *tmp; + 
printk(KERN_ERR "SRP reset_device called\n"); - return srp_send_tsk_mgmt(scmnd, SRP_TSK_LUN_RESET); + if (srp_find_req(target, scmnd, &req)) + return FAILED; + if (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET)) + return FAILED; + if (req->tsk_status) + return FAILED; + + spin_lock_irq(target->scsi_host->host_lock); + + list_for_each_entry_safe(req, tmp, &target->req_queue, list) + if (req->scmnd->device == scmnd->device) { + req->scmnd->result = DID_RESET << 16; + scmnd->scsi_done(scmnd); + srp_remove_req(target, req); + } + + spin_unlock_irq(target->scsi_host->host_lock); + + return SUCCESS; } static int srp_reset_host(struct scsi_cmnd *scmnd) @@ -1518,10 +1547,12 @@ static ssize_t srp_create_target(struct INIT_WORK(&target->work, srp_reconnect_work, target); - for (i = 0; i < SRP_SQ_SIZE - 1; ++i) - target->req_ring[i].next = i + 1; - target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->free_reqs); INIT_LIST_HEAD(&target->req_queue); + for (i = 0; i < SRP_SQ_SIZE; ++i) { + target->req_ring[i].index = i; + list_add_tail(&target->req_ring[i].list, &target->free_reqs); + } ret = srp_parse_options(buf, target); if (ret) diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index bd7f7c3..c5cd43a 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -101,7 +101,7 @@ struct srp_request { */ struct scatterlist fake_sg; struct completion done; - short next; + short index; u8 cmd_done; u8 tsk_status; }; @@ -133,7 +133,7 @@ struct srp_target_port { unsigned tx_tail; struct srp_iu *tx_ring[SRP_SQ_SIZE + 1]; - int req_head; + struct list_head free_reqs; struct list_head req_queue; struct srp_request req_ring[SRP_SQ_SIZE]; From sashak at voltaire.com Tue May 9 11:00:59 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 09 May 2006 21:00:59 +0300 Subject: [openib-general] [PATCH 0/3] Series short description Message-ID: 
<20060509180059.14584.31483.stgit@sashak.voltaire.com> Hello, Here is "take 2" of the support for low-level Quality of Service (QoS) parameter configuration and setup in OpenSM. The changes against the previous version are: - consistent default values for QoS configuration parameters (as suggested by Eitan Zahavi) - global '--no_qos' option which disables QoS setup entirely Please comment. Thanks. Sasha. From sashak at voltaire.com Tue May 9 11:15:48 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 09 May 2006 21:15:48 +0300 Subject: [openib-general] [PATCH 1/3] opensm: low-level QoS configuration In-Reply-To: <20060509180059.14584.31483.stgit@sashak.voltaire.com> References: <20060509180059.14584.31483.stgit@sashak.voltaire.com> Message-ID: <20060509181548.14584.39036.stgit@sashak.voltaire.com> Trivial low-level QoS configuration parameters description, definition and processing. Signed-off-by: Sasha Khapyorsky --- osm/doc/qos-config.txt | 44 +++++++++++++ osm/include/opensm/osm_subnet.h | 81 ++++++++++++++++++++++++ osm/opensm/osm_subnet.c | 133 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 258 insertions(+), 0 deletions(-) diff --git a/osm/doc/qos-config.txt b/osm/doc/qos-config.txt new file mode 100644 index 0000000..2f9373c --- /dev/null +++ b/osm/doc/qos-config.txt @@ -0,0 +1,44 @@ +Trivial low level QoS configuration proposition. +=============================================== + +Basically we have a set of QoS related low-level configuration parameters. +All those parameter names are prefixed by the "qos_" string. Here is the full +list of such parameters: + + qos_max_vls - The maximum number of VLs on the Subnet + qos_high_limit - The limit of High Priority component of VL Arbitration + table (IBA 7.6.9) + qos_vlarb_low - Low priority VL Arbitration table (IBA 7.6.9) template. + qos_vlarb_high - High priority VL Arbitration table (IBA 7.6.9) template. + Both VL arbitration templates are pairs of VL and weight. 
+ qos_sl2vl - SL2VL Mapping table (IBA 7.6.6) template. It is a list + of VLs corresponding to SLs 0-15. (Note the VL15 used + here means drop this SL). + +Typical default values (hard-coded in OpenSM initialization) are: + + qos_max_vls=15 + qos_high_limit=0 + qos_vlarb_low=0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 + qos_vlarb_high=0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 + qos_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + +The syntax is compatible with the rest of the OpenSM configuration options and +values may be stored in the OpenSM config file (cached options file). + +In addition to the above we may define separate QoS configuration +parameter sets for various target types. As targets we currently support +HCA, routers, switch external ports and switch's enhanced port 0. The +names of such specialized parameters are prefixed by the "qos_<type>_" +string. Here is the full list of currently supported sets: + + qos_hca_ - QoS configuration parameters set for HCAs. + qos_rtr_ - parameters set for routers. + qos_sw0_ - parameters set for switches' port 0. + qos_swe_ - parameters set for switches' external ports. + +Examples: + + qos_sw0_max_vls=2 + qos_hca_sl2vl=0,1,2,3,5,5,5,12,12,0, + qos_swe_high_limit=0 diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h index 767e598..0da3f0c 100644 --- a/osm/include/opensm/osm_subnet.h +++ b/osm/include/opensm/osm_subnet.h @@ -180,6 +180,44 @@ typedef enum _osm_testability_modes } osm_testability_modes_t; /***********/ +/****s* OpenSM: Subnet/osm_qos_options_t +* NAME +* osm_qos_options_t +* +* DESCRIPTION +* Subnet QoS options structure. This structure contains the various +* QoS specific configuration parameters for the subnet. 
+* +* SYNOPSIS +*/ +typedef struct _osm_qos_options_t { + unsigned max_vls; + unsigned high_limit; + char *vlarb_high; + char *vlarb_low; + char *sl2vl; +} osm_qos_options_t; +/* +* FIELDS +* +* max_vls +* The maximum number of VLs on the Subnet +* +* high_limit +* The limit of High Priority component of VL Arbitration +* table (IBA 7.6.9) +* +* vlarb_high +* High priority VL Arbitration table template. +* +* vlarb_low +* Low priority VL Arbitration table template. +* +* sl2vl +* SL2VL Mapping table (IBA 7.6.6) template. +* +*********/ + /****s* OpenSM: Subnet/osm_subn_opt_t * NAME * osm_subn_opt_t @@ -242,6 +280,10 @@ typedef struct _osm_subn_opt char * updn_guid_file; boolean_t exit_on_fatal; boolean_t honor_guid2lid_file; + osm_qos_options_t qos_options; + osm_qos_options_t qos_hca_options; + osm_qos_options_t qos_sw0_options; + osm_qos_options_t qos_swe_options; } osm_subn_opt_t; /* * FIELDS @@ -394,6 +436,18 @@ typedef struct _osm_subn_opt * the file will be honored when SM is coming out of STANDBY. * By default this is FALSE. * +* qos_options +* Default set of QoS options +* +* qos_hca_options +* QoS options for HCA ports +* +* qos_sw0_options +* QoS options for switches' port 0 +* +* qos_swe_options +* QoS options for switches' external ports +* * SEE ALSO * Subnet object *********/ @@ -1016,6 +1070,33 @@ osm_subn_parse_conf_file( * osm_subn_is_inited *********/ +/****f* OpenSM: Subnet/osm_subn_rescan_conf_file +* NAME +* osm_subn_rescan_conf_file +* +* DESCRIPTION +* The osm_subn_rescan_conf_file function parses the configuration +* file and updates selected subnet options +* +* SYNOPSIS +*/ +void +osm_subn_rescan_conf_file( + IN osm_subn_opt_t* const p_opts ); +/* +* PARAMETERS +* +* p_opts +* [in] Pointer to the subnet options structure. 
+* +* RETURN VALUES +* None +* +* NOTES +* This uses the same file as osm_subn_parse_conf_file() +* +*********/ + /****f* OpenSM: Subnet/osm_subn_write_conf_file * NAME * osm_subn_write_conf_file diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c index ef64e05..4580ed1 100644 --- a/osm/opensm/osm_subnet.c +++ b/osm/opensm/osm_subnet.c @@ -398,6 +398,19 @@ osm_get_port_by_guid( /********************************************************************** **********************************************************************/ +static void +subn_set_default_qos_options( + IN osm_qos_options_t *opt) +{ + opt->max_vls = 15; + opt->high_limit = 0; + opt->vlarb_high = "0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0"; + opt->vlarb_low = "0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4"; + opt->sl2vl = "0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7"; +} + +/********************************************************************** + **********************************************************************/ void osm_subn_set_default_opt( IN osm_subn_opt_t* const p_opt ) @@ -457,6 +470,10 @@ osm_subn_set_default_opt( p_opt->updn_activate = FALSE; p_opt->updn_guid_file = NULL; p_opt->exit_on_fatal = TRUE; + subn_set_default_qos_options(&p_opt->qos_options); + subn_set_default_qos_options(&p_opt->qos_hca_options); + subn_set_default_qos_options(&p_opt->qos_sw0_options); + subn_set_default_qos_options(&p_opt->qos_swe_options); } /********************************************************************** @@ -619,6 +636,95 @@ __osm_subn_opts_unpack_charp( /********************************************************************** **********************************************************************/ +static void +subn_parse_qos_options( + IN const char *prefix, + IN char *p_key, + IN char *p_val_str, + IN osm_qos_options_t *opt) +{ + char name[256]; + snprintf(name, sizeof(name), "%s_max_vls", prefix); + __osm_subn_opts_unpack_uint32(name, p_key, p_val_str, 
&opt->max_vls); + snprintf(name, sizeof(name), "%s_high_limit", prefix); + __osm_subn_opts_unpack_uint32(name, p_key, p_val_str, &opt->high_limit); + snprintf(name, sizeof(name), "%s_vlarb_high", prefix); + __osm_subn_opts_unpack_charp(name, p_key, p_val_str, &opt->vlarb_high); + snprintf(name, sizeof(name), "%s_vlarb_low", prefix); + __osm_subn_opts_unpack_charp(name, p_key, p_val_str, &opt->vlarb_low); + snprintf(name, sizeof(name), "%s_sl2vl", prefix); + __osm_subn_opts_unpack_charp(name, p_key, p_val_str, &opt->sl2vl); +} + +static int +subn_dump_qos_options( + FILE *file, + const char *set_name, + const char *prefix, + osm_qos_options_t *opt) +{ + return fprintf(file, "# %s\n" + "%s_max_vls %u\n" + "%s_high_limit %u\n" + "%s_vlarb_high %s\n" + "%s_vlarb_low %s\n" + "%s_sl2vl %s\n", + set_name, + prefix, opt->max_vls, + prefix, opt->high_limit, + prefix, opt->vlarb_high, + prefix, opt->vlarb_low, + prefix, opt->sl2vl); +} + +/********************************************************************** + **********************************************************************/ +void +osm_subn_rescan_conf_file( + IN osm_subn_opt_t* const p_opts ) +{ + char *p_cache_dir = getenv("OSM_CACHE_DIR"); + char file_name[256]; + FILE *opts_file; + char line[1024]; + char *p_key, *p_val ,*p_last; + + /* try to open the options file from the cache dir */ + if (! 
p_cache_dir) p_cache_dir = OSM_DEFAULT_CACHE_DIR; + + strcpy(file_name, p_cache_dir); + strcat(file_name,"opensm.opts"); + + opts_file = fopen(file_name, "r"); + if (!opts_file) return; + + while (fgets(line, 1023, opts_file) != NULL) + { + /* get the first token */ + p_key = strtok_r(line, " \t\n", &p_last); + if (p_key) + { + p_val = strtok_r(NULL, " \t\n", &p_last); + + subn_parse_qos_options("qos", + p_key, p_val, &p_opts->qos_options); + + subn_parse_qos_options("qos_hca", + p_key, p_val, &p_opts->qos_hca_options); + + subn_parse_qos_options("qos_sw0", + p_key, p_val, &p_opts->qos_sw0_options); + + subn_parse_qos_options("qos_swe", + p_key, p_val, &p_opts->qos_swe_options); + + } + } + fclose(opts_file); +} + +/********************************************************************** + **********************************************************************/ void osm_subn_parse_conf_file( IN osm_subn_opt_t* const p_opts ) @@ -802,6 +908,18 @@ osm_subn_parse_conf_file( "honor_guid2lid_file", p_key, p_val, &p_opts->honor_guid2lid_file); + subn_parse_qos_options("qos", + p_key, p_val, &p_opts->qos_options); + + subn_parse_qos_options("qos_hca", + p_key, p_val, &p_opts->qos_hca_options); + + subn_parse_qos_options("qos_sw0", + p_key, p_val, &p_opts->qos_sw0_options); + + subn_parse_qos_options("qos_swe", + p_key, p_val, &p_opts->qos_swe_options); + } } fclose(opts_file); @@ -997,6 +1115,21 @@ osm_subn_write_conf_file( p_opts->exit_on_fatal ? 
"TRUE" : "FALSE" ); + fprintf( + opts_file, + "#\n# QoS OPTIONS\n#\n\n"); + subn_dump_qos_options(opts_file, + "QoS default options", "qos", &p_opts->qos_options); + fprintf(opts_file, "\n"); + subn_dump_qos_options(opts_file, + "QoS HCA options", "qos_hca", &p_opts->qos_hca_options); + fprintf(opts_file, "\n"); + subn_dump_qos_options(opts_file, + "QoS Switch Port 0 options", "qos_sw0", &p_opts->qos_sw0_options); + fprintf(opts_file, "\n"); + subn_dump_qos_options(opts_file, + "QoS Switch external ports options", "qos_swe", &p_opts->qos_swe_options); + /* optional string attributes ... */ fclose(opts_file); From sashak at voltaire.com Tue May 9 11:15:50 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 09 May 2006 21:15:50 +0300 Subject: [openib-general] [PATCH 2/3] opensm: basic QoS implementation In-Reply-To: <20060509180059.14584.31483.stgit@sashak.voltaire.com> References: <20060509180059.14584.31483.stgit@sashak.voltaire.com> Message-ID: <20060509181550.14584.19176.stgit@sashak.voltaire.com> Basic low-level QoS implementation. The main procedure (osm_qos_setup()) will be called from resweeper (after configuration refreshing). And then this will setup low level QoS related ports' attributes (PortInfo:VLHighLimit, VL*Arbitration and SL2VLMapping tables). Different port categories (HCA, switch external ports and switch port 0) will be updated according to provided configurations. 
Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_madw.h | 1 osm/opensm/Makefile.am | 2 osm/opensm/osm_qos.c | 439 +++++++++++++++++++++++++++++++++++++++++ osm/opensm/osm_state_mgr.c | 11 + 4 files changed, 452 insertions(+), 1 deletions(-) diff --git a/osm/include/opensm/osm_madw.h b/osm/include/opensm/osm_madw.h index 5b4ddab..4d92db4 100644 --- a/osm/include/opensm/osm_madw.h +++ b/osm/include/opensm/osm_madw.h @@ -352,6 +352,7 @@ typedef union _osm_madw_context osm_smi_context_t smi_context; osm_slvl_context_t slvl_context; osm_pkey_context_t pkey_context; + osm_vla_context_t vla_context; #ifndef OSM_VENDOR_INTF_OPENIB osm_arbitrary_context_t arb_context; #endif diff --git a/osm/opensm/Makefile.am b/osm/opensm/Makefile.am index eab0f5b..43fe8c1 100644 --- a/osm/opensm/Makefile.am +++ b/osm/opensm/Makefile.am @@ -82,7 +82,7 @@ opensm_SOURCES = main.c osm_console.c os osm_state_mgr_ctrl.c osm_subnet.c \ osm_sweep_fail_ctrl.c osm_sw_info_rcv.c \ osm_sw_info_rcv_ctrl.c osm_switch.c \ - osm_prtn.c osm_prtn_config.c \ + osm_prtn.c osm_prtn_config.c osm_qos.c \ osm_trap_rcv.c osm_trap_rcv_ctrl.c \ osm_ucast_mgr.c osm_ucast_updn.c \ osm_vl15intf.c osm_vl_arb_rcv.c \ diff --git a/osm/opensm/osm_qos.c b/osm/opensm/osm_qos.c new file mode 100644 index 0000000..be27b40 --- /dev/null +++ b/osm/opensm/osm_qos.c @@ -0,0 +1,439 @@ +/* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id$ + */ + +/* + * Abstract: + * Implementation of OpenSM QoS infrastructure primitives + * + * Environment: + * Linux User Mode + * + * $Revision$ + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include + +#include +#include +#include +#include +#include + +struct qos_config { + uint8_t max_vls; + uint8_t vl_high_limit; + ib_vl_arb_table_t vlarb_high[2]; + ib_vl_arb_table_t vlarb_low[2]; + ib_slvl_table_t sl2vl; +}; + +static void qos_build_config(struct qos_config *cfg, + osm_qos_options_t * opt, osm_qos_options_t * dflt); + +/* + * QoS primitives + * + */ + +static ib_api_status_t vlarb_update_table_block(osm_req_t * p_req, + osm_physp_t * p, + unsigned port_num, + const ib_vl_arb_table_t *table_block, + unsigned block_length, + unsigned block_num) +{ + ib_vl_arb_table_t block; + osm_madw_context_t context; + uint32_t attr_mod; + ib_port_info_t *p_pi; + unsigned vl_mask; + int i; + + if (!(p_pi = osm_physp_get_port_info_ptr(p))) + return IB_ERROR; + + vl_mask = (1 << (ib_port_info_get_op_vls(p_pi) - 1)) - 1; + + cl_memset(&block, 0, sizeof(block)); + cl_memcpy(&block, table_block, + block_length * sizeof(block.vl_entry[0])); + for (i = 0; i < block_length; i++) + block.vl_entry[i].vl &= vl_mask; + + if (!cl_memcmp(&p->vl_arb[block_num], &block, + block_length * sizeof(block.vl_entry[0]))) + return IB_SUCCESS; + + context.vla_context.node_guid = + osm_node_get_node_guid(osm_physp_get_node_ptr(p)); + context.vla_context.port_guid = osm_physp_get_port_guid(p); + context.vla_context.set_method = TRUE; + attr_mod = ((block_num + 1) << 16) | port_num; + + return osm_req_set(p_req, osm_physp_get_dr_path_ptr(p), + (uint8_t *) & block, sizeof(block), + IB_MAD_ATTR_VL_ARBITRATION, + cl_hton32(attr_mod), CL_DISP_MSGID_NONE, &context); +} + +static ib_api_status_t vlarb_update(osm_req_t * p_req, + osm_physp_t * p, unsigned port_num, + const struct qos_config *qcfg) +{ + ib_api_status_t status = IB_SUCCESS; + ib_port_info_t *p_pi; + unsigned 
len; + + if (!(p_pi = osm_physp_get_port_info_ptr(p))) + return IB_ERROR; + + if (p_pi->vl_arb_low_cap > 0) { + len = p_pi->vl_arb_low_cap < IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK ? + p_pi->vl_arb_low_cap : IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; + if ((status = vlarb_update_table_block(p_req, p, port_num, + &qcfg->vlarb_low[0], + len, 0)) != IB_SUCCESS) + return status; + } + if (p_pi->vl_arb_low_cap > IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK) { + len = p_pi->vl_arb_low_cap % IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; + if ((status = vlarb_update_table_block(p_req, p, port_num, + &qcfg->vlarb_low[1], + len, 1)) != IB_SUCCESS) + return status; + } + if (p_pi->vl_arb_high_cap > 0) { + len = p_pi->vl_arb_high_cap < IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK ? + p_pi->vl_arb_high_cap : IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; + if ((status = vlarb_update_table_block(p_req, p, port_num, + &qcfg->vlarb_high[0], + len, 2)) != IB_SUCCESS) + return status; + } + if (p_pi->vl_arb_high_cap > IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK) { + len = p_pi->vl_arb_high_cap % IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; + if ((status = vlarb_update_table_block(p_req, p, port_num, + &qcfg->vlarb_high[1], + len, 3)) != IB_SUCCESS) + return status; + } + + return status; +} + +static ib_api_status_t sl2vl_update_table(osm_req_t * p_req, + osm_physp_t * p, uint8_t in_port, + uint8_t out_port, + const ib_slvl_table_t * sl2vl_table) +{ + osm_madw_context_t context; + ib_slvl_table_t tbl, *p_tbl; + osm_node_t *p_node = osm_physp_get_node_ptr(p); + uint32_t attr_mod; + ib_port_info_t *p_pi; + unsigned vl_mask; + uint8_t vl1, vl2; + int i; + + if (!(p_pi = osm_physp_get_port_info_ptr(p))) + return IB_ERROR; + + vl_mask = (1 << (ib_port_info_get_op_vls(p_pi) - 1)) - 1; + + for (i = 0; i < IB_MAX_NUM_VLS / 2; i++) { + vl1 = sl2vl_table->raw_vl_by_sl[i] >> 4; + vl2 = sl2vl_table->raw_vl_by_sl[i] & 0xf; + if (vl1 != 15) + vl1 &= vl_mask; + if (vl2 != 15) + vl2 &= vl_mask; + tbl.raw_vl_by_sl[i] = (vl1 << 4 ) | vl2 ; + } + + p_tbl = osm_physp_get_slvl_tbl(p, in_port); + 
if (p_tbl && !cl_memcmp(p_tbl, &tbl, sizeof(tbl))) + return IB_SUCCESS; + + context.slvl_context.node_guid = osm_node_get_node_guid(p_node); + context.slvl_context.port_guid = osm_physp_get_port_guid(p); + context.slvl_context.set_method = TRUE; + attr_mod = in_port << 8 | out_port; + return osm_req_set(p_req, osm_physp_get_dr_path_ptr(p), + (uint8_t *) & tbl, sizeof(tbl), + IB_MAD_ATTR_SLVL_TABLE, + cl_hton32(attr_mod), CL_DISP_MSGID_NONE, &context); +} + +static ib_api_status_t sl2vl_update(osm_req_t * p_req, + osm_physp_t * p, unsigned port_num, + const struct qos_config *qcfg) +{ + ib_api_status_t status; + unsigned i, num_ports; + ib_port_info_t *p_pi = osm_physp_get_port_info_ptr(p); + + if (p_pi && !(p_pi->capability_mask & IB_PORT_CAP_HAS_SL_MAP)) + return IB_SUCCESS; + + if (osm_node_get_type(osm_physp_get_node_ptr(p)) == IB_NODE_TYPE_SWITCH) + num_ports = osm_node_get_num_physp(osm_physp_get_node_ptr(p)); + else + num_ports = 1; + + for (i = 0; i < num_ports; i++) { + status = + sl2vl_update_table(p_req, p, i, port_num, &qcfg->sl2vl); + if (status != IB_SUCCESS) + return status; + } + + return IB_SUCCESS; +} + +static ib_api_status_t vl_high_limit_update(osm_req_t * p_req, + osm_physp_t * p, + const struct qos_config *qcfg) +{ + uint8_t payload[IB_SMP_DATA_SIZE]; + osm_madw_context_t context; + ib_port_info_t *p_pi; + + if (!(p_pi = osm_physp_get_port_info_ptr(p))) + return IB_ERROR; + + if (p_pi->vl_high_limit == qcfg->vl_high_limit) + return IB_SUCCESS; + + cl_memclr(payload, IB_SMP_DATA_SIZE); + cl_memcpy(payload, p_pi, sizeof(ib_port_info_t)); + + p_pi = (ib_port_info_t *) payload; + p_pi->state_info2 = 0; + ib_port_info_set_port_state(p_pi, IB_LINK_NO_CHANGE); + + p_pi->vl_high_limit = qcfg->vl_high_limit; + + context.pi_context.node_guid = + osm_node_get_node_guid(osm_physp_get_node_ptr(p)); + context.pi_context.port_guid = osm_physp_get_port_guid(p); + context.pi_context.set_method = TRUE; + context.pi_context.update_master_sm_base_lid = FALSE; + 
context.pi_context.ignore_errors = FALSE; + context.pi_context.light_sweep = FALSE; + + return osm_req_set(p_req, osm_physp_get_dr_path_ptr(p), + payload, sizeof(payload), IB_MAD_ATTR_PORT_INFO, + cl_hton32(osm_physp_get_port_num(p)), + CL_DISP_MSGID_NONE, &context); +} + +static ib_api_status_t qos_physp_setup(osm_log_t * p_log, osm_req_t * p_req, + osm_physp_t * p, unsigned port_num, + const struct qos_config *qcfg) +{ + ib_api_status_t status; + + /* OpVLs should be ok at this moment - just use it */ + + /* setup vl high limit */ + status = vl_high_limit_update(p_req, p, qcfg); + if (status != IB_SUCCESS) { + osm_log(p_log, OSM_LOG_ERROR, "qos_physp_setup: " + "failed to update VLHighLimit " + "for port %" PRIx64 " #%d\n", + cl_ntoh64(p->port_guid), port_num); + return status; + } + + /* setup VLArbitration */ + status = vlarb_update(p_req, p, port_num, qcfg); + if (status != IB_SUCCESS) { + osm_log(p_log, OSM_LOG_ERROR, "qos_physp_setup: " + "failed to update VLArbitration tables " + "for port %" PRIx64 " #%d\n", + cl_ntoh64(p->port_guid), port_num); + return status; + } + + /* setup Sl2VL tables */ + status = sl2vl_update(p_req, p, port_num, qcfg); + if (status != IB_SUCCESS) { + osm_log(p_log, OSM_LOG_ERROR, "qos_physp_setup: " + "failed to update SL2VLMapping tables " + "for port %" PRIx64 " #%d\n", + cl_ntoh64(p->port_guid), port_num); + return status; + } + + return IB_SUCCESS; +} + +osm_signal_t osm_qos_setup(osm_opensm_t * p_osm) +{ + struct qos_config hca_config, sw0_config, swe_config; + struct qos_config *cfg; + osm_switch_t *p_sw; + ib_switch_info_t *p_si; + cl_qmap_t *p_tbl; + cl_map_item_t *p_next; + osm_port_t *p_port; + uint32_t num_physp; + osm_physp_t *p_physp; + uint8_t node_type; + ib_api_status_t status; + uint32_t i; + + OSM_LOG_ENTER(&p_osm->log, osm_qos_setup); + + qos_build_config(&hca_config, &p_osm->subn.opt.qos_hca_options, + &p_osm->subn.opt.qos_options); + qos_build_config(&sw0_config, &p_osm->subn.opt.qos_sw0_options, + 
&p_osm->subn.opt.qos_options); + qos_build_config(&swe_config, &p_osm->subn.opt.qos_swe_options, + &p_osm->subn.opt.qos_options); + + cl_plock_excl_acquire(&p_osm->lock); + + p_tbl = &p_osm->subn.port_guid_tbl; + p_next = cl_qmap_head(p_tbl); + while (p_next != cl_qmap_end(p_tbl)) { + p_port = (osm_port_t *) p_next; + p_next = cl_qmap_next(p_next); + + node_type = osm_node_get_type(osm_port_get_parent_node(p_port)); + if (node_type == IB_NODE_TYPE_SWITCH) { + num_physp = osm_port_get_num_physp(p_port); + for (i = 1; i < num_physp; i++) { + p_physp = osm_port_get_phys_ptr(p_port, i); + if (!p_physp || !osm_physp_is_valid(p_physp)) + continue; + status = + qos_physp_setup(&p_osm->log, &p_osm->sm.req, + p_physp, i, &swe_config); + } + /* skip base port 0 */ + p_sw = osm_get_switch_by_guid(&p_osm->subn, + osm_port_get_guid(p_port)); + if (!p_sw || !(p_si = osm_switch_get_si_ptr(p_sw)) || + !ib_switch_info_is_enhanced_port_0(p_si)) + continue; + + cfg = &sw0_config; + } + else + cfg = &hca_config; + + p_physp = osm_port_get_default_phys_ptr(p_port); + if (!osm_physp_is_valid(p_physp)) + continue; + + status = qos_physp_setup(&p_osm->log, &p_osm->sm.req, + p_physp, 0, cfg); + } + + cl_plock_release(&p_osm->lock); + OSM_LOG_EXIT(&p_osm->log); + + return OSM_SIGNAL_DONE; +} + +/* + * QoS config stuff + * + */ + +static int parse_one_unsigned(char *str, char delim, unsigned *val) +{ + char *end; + *val = strtoul(str, &end, 0); + if (*end) + end++; + return end - str; +} + +static int parse_vlarb_entry(char *str, ib_vl_arb_element_t * e) +{ + unsigned val; + char *p = str; + p += parse_one_unsigned(p, ':', &val); + e->vl = val % 15; + p += parse_one_unsigned(p, ',', &val); + e->weight = val; + return p - str; +} + +static int parse_sl2vl_entry(char *str, uint8_t * raw) +{ + unsigned val1, val2; + char *p = str; + p += parse_one_unsigned(p, ',', &val1); + p += parse_one_unsigned(p, ',', &val2); + *raw = (val1 << 4) | (val2 & 0xf); + return p - str; +} + +static void 
qos_build_config(struct qos_config *cfg, + osm_qos_options_t * opt, osm_qos_options_t * dflt) +{ + int i; + char *p; + + memset(cfg, 0, sizeof(*cfg)); + + cfg->max_vls = opt->max_vls > 0 ? opt->max_vls : dflt->max_vls; + cfg->vl_high_limit = opt->high_limit; + + p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high; + for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { + p += parse_vlarb_entry(p, + &cfg->vlarb_high[i/IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]. + vl_entry[i%IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]); + } + + p = opt->vlarb_low ? opt->vlarb_low : dflt->vlarb_low; + for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) { + p += parse_vlarb_entry(p, + &cfg->vlarb_low[i/IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]. + vl_entry[i%IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]); + } + + p = opt->sl2vl ? opt->sl2vl : dflt->sl2vl; + for (i = 0; i < IB_MAX_NUM_VLS / 2; i++) + p += parse_sl2vl_entry(p, &cfg->sl2vl.raw_vl_by_sl[i]); + +} diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index 1aefc0b..ca46937 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -71,6 +71,10 @@ #include /********************************************************************** **********************************************************************/ +osm_signal_t osm_qos_setup(IN osm_opensm_t * p_osm); + +/********************************************************************** + **********************************************************************/ void osm_state_mgr_construct( IN osm_state_mgr_t * const p_mgr ) @@ -1971,6 +1975,9 @@ osm_state_mgr_process( * need to unset it. 
*/ p_mgr->p_subn->subnet_initialization_error = FALSE; + /* rescan configuration updates */ + osm_subn_rescan_conf_file(&p_mgr->p_subn->opt); + status = __osm_state_mgr_sweep_hop_0( p_mgr ); if( status == IB_SUCCESS ) { @@ -2234,6 +2241,10 @@ osm_state_mgr_process( /* the returned signal might be DONE or DONE_PENDING */ signal = osm_pkey_mgr_process( p_mgr->p_subn->p_osm ); + + /* the returned signal is always DONE */ + signal = osm_qos_setup(p_mgr->p_subn->p_osm); + break; default: From sashak at voltaire.com Tue May 9 11:15:52 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 09 May 2006 21:15:52 +0300 Subject: [openib-general] [PATCH 3/3] opensm: no_qos global option In-Reply-To: <20060509180059.14584.31483.stgit@sashak.voltaire.com> References: <20060509180059.14584.31483.stgit@sashak.voltaire.com> Message-ID: <20060509181552.14584.14666.stgit@sashak.voltaire.com> This new option '--no_qos' (or '-Q') will disable QoS setup globally in OpenSM. Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_subnet.h | 1 + osm/opensm/main.c | 10 +++++++++- osm/opensm/osm_qos.c | 3 +++ osm/opensm/osm_subnet.c | 11 ++++++++++- 4 files changed, 23 insertions(+), 2 deletions(-) diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h index 0da3f0c..373030d 100644 --- a/osm/include/opensm/osm_subnet.h +++ b/osm/include/opensm/osm_subnet.h @@ -263,6 +263,7 @@ typedef struct _osm_subn_opt char * log_file; char * partition_config_file; boolean_t no_partition_enforcement; + boolean_t no_qos; boolean_t accum_log_file; boolean_t console; cl_map_t port_prof_ignore_guids; diff --git a/osm/opensm/main.c b/osm/opensm/main.c index aa4bccb..45ea8ca 100644 --- a/osm/opensm/main.c +++ b/osm/opensm/main.c @@ -234,6 +234,9 @@ show_usage(void) "--Pconfig\n" " This option defines the optional partition configuration file.\n" " The default name is \'" OSM_DEFAULT_PARTITION_CONFIG_FILE "\'.\n\n"); + printf( "-Q\n" + "--no_qos\n" + " This option disables QoS 
setup.\n\n"); printf( "-N\n" "--no_part_enforce\n" " This option disables partition enforcement on switch external ports.\n\n"); @@ -523,7 +526,7 @@ #endif boolean_t cache_options = FALSE; char *ignore_guids_file_name = NULL; uint32_t val; - const char * const short_option = "i:f:ed:g:l:s:t:a:P:NuvVhorcyx"; + const char * const short_option = "i:f:ed:g:l:s:t:a:P:NQuvVhorcyx"; /* In the array below, the 2nd parameter specified the number @@ -546,6 +549,7 @@ #endif { "erase_log_file",0, NULL, 'e'}, { "Pconfig", 1, NULL, 'P'}, { "no_part_enforce",0,NULL, 'N'}, + { "no_qos", 0, NULL, 'Q'}, { "maxsmps", 1, NULL, 'n'}, { "console", 0, NULL, 'q'}, { "V", 0, NULL, 'V'}, @@ -738,6 +742,10 @@ #endif opt.no_partition_enforcement = TRUE; break; + case 'Q': + opt.no_qos = TRUE; + break; + case 'y': opt.exit_on_fatal = FALSE; printf(" Staying on fatal initialization errors\n"); diff --git a/osm/opensm/osm_qos.c b/osm/opensm/osm_qos.c index be27b40..20cfaad 100644 --- a/osm/opensm/osm_qos.c +++ b/osm/opensm/osm_qos.c @@ -318,6 +318,9 @@ osm_signal_t osm_qos_setup(osm_opensm_t ib_api_status_t status; uint32_t i; + if (p_osm->subn.opt.no_qos) + return OSM_SIGNAL_DONE; + OSM_LOG_ENTER(&p_osm->log, osm_qos_setup); qos_build_config(&hca_config, &p_osm->subn.opt.qos_hca_options, diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c index 4580ed1..036a6ba 100644 --- a/osm/opensm/osm_subnet.c +++ b/osm/opensm/osm_subnet.c @@ -456,6 +456,7 @@ osm_subn_set_default_opt( p_opt->log_file = OSM_DEFAULT_LOG_FILE; p_opt->partition_config_file = OSM_DEFAULT_PARTITION_CONFIG_FILE; p_opt->no_partition_enforcement = FALSE; + p_opt->no_qos = FALSE; p_opt->accum_log_file = TRUE; p_opt->port_profile_switch_nodes = FALSE; p_opt->max_port_profile = 0xffffffff; @@ -889,6 +890,10 @@ osm_subn_parse_conf_file( p_key, p_val, &p_opts->no_partition_enforcement); __osm_subn_opts_unpack_boolean( + "no_qos", + p_key, p_val, &p_opts->no_qos); + + __osm_subn_opts_unpack_boolean( "accum_log_file", p_key, 
p_val, &p_opts->accum_log_file); @@ -1117,7 +1122,11 @@ osm_subn_write_conf_file( fprintf( opts_file, - "#\n# QoS OPTIONS\n#\n\n"); + "#\n# QoS OPTIONS\n#\n" + "# Disable QoS setup\n" + "no_qos %s\n\n", + p_opts->no_qos ? "TRUE" : "FALSE"); + subn_dump_qos_options(opts_file, "QoS default options", "qos", &p_opts->qos_options); fprintf(opts_file, "\n"); From sean.hefty at intel.com Tue May 9 11:13:58 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 9 May 2006 11:13:58 -0700 Subject: [openib-general] Re: [PATCH] update uDAPL openib_cma provider to work with new uCMA event channels In-Reply-To: Message-ID: >> Sync up with Sean on commits. > >I'm watching for Sean's commit. Did I miss it? It looks like my commit failed for some reason, and I missed it. I've just re-committed the changes, which should be in revision 7019. - Sean From mshefty at ichips.intel.com Tue May 9 11:20:59 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 09 May 2006 11:20:59 -0700 Subject: [openib-general] compilation warnings in libibcm In-Reply-To: <200605091630.12039.dotanb@mellanox.co.il> References: <200605091630.12039.dotanb@mellanox.co.il> Message-ID: <4460DD8B.908@ichips.intel.com> Dotan Barak wrote: > Compilation of the gen2 driver generate compilation warnings when compiling the libibcm. I've committed changes that should eliminate these warnings. I've also updated librdmacm to replace deprecated function/structure names with the latest names. Thanks. 
- Sean

From sashak at voltaire.com Tue May 9 11:25:20 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 9 May 2006 21:25:20 +0300
Subject: [openib-general] [PATCH 0/3] opensm: low-level QoS implementation (take 2) [was: Series short description]
In-Reply-To: <20060509180059.14584.31483.stgit@sashak.voltaire.com>
References: <20060509180059.14584.31483.stgit@sashak.voltaire.com>
Message-ID: <20060509182520.GK28740@sashak.voltaire.com>

Sorry, the subject should be "opensm: low-level QoS implementation (take 2)"

On 21:00 Tue 09 May , Sasha Khapyorsky wrote:
> Hello,
>
> There is "take 2" of support for low level Quality of Service (QoS)
> parameters configuration and setup in OpenSM.
>
> The changes against previous version are:
>
> - consistent default values for QoS configuration parameters (as
>   suggested by Eitan Zahavi)
> - global '--no_qos' option which disables QoS setup at all
>
> Please comment. Thanks.
>
> Sasha.

From xma at us.ibm.com Tue May 9 11:27:29 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 9 May 2006 11:27:29 -0700
Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To: <20060509164919.GC5063@mellanox.co.il>
Message-ID:

openib-general-bounces at openib.org wrote on 05/09/2006 09:49:19 AM:

> Quoting r. Roland Dreier :
> > The trivial way to do it would be to use the same idea as the current
> > ehca driver: just create a thread for receive CQ events and a thread
> > for send CQ events, and defer CQ polling into those two threads.

I have done a patch like that on top of splitting CQ. The problem I found is that the hardware interrupt favors one CPU. Most of the time these two threads are running on the same CPU according to my debug output.
You can easily find this out with cat /proc/interrupts and /proc/irq/XXX/smp_affinity. ehca has distributed interrupts evenly on SMP, so it gets the benefits of two threads, and gets much better throughput. The interesting thing is that the UP results are much better than the SMP results with this approach on mthca.

> For RX, isn't this basically what NAPI is doing?
> Only NAPI seems better, avoiding interrupts completely and avoiding
> latency hit by only getting triggered on high load ...
>
> --
> MST

According to some results from different sources, NAPI only gives 3%-10% performance improvement on single CQ. I am trying a simple NAPI patch on splitting CQ now to see how much performance it gives.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From iod00d at hp.com Tue May 9 11:29:57 2006
From: iod00d at hp.com (Grant Grundler)
Date: Tue, 9 May 2006 11:29:57 -0700
Subject: [openib-general] ip over ib throughtput
In-Reply-To: <445FCBA1.1030903@ncsa.uiuc.edu>
References: <20041229134351.GA3486@mellanox.co.il> <445FCBA1.1030903@ncsa.uiuc.edu>
Message-ID: <20060509182957.GA29261@esmail.cup.hp.com>

On Mon, May 08, 2006 at 05:52:17PM -0500, Hassan M. Jafri wrote:
> I can't crank out more than 150 MB/sec with my 2.0 GHz Xeons. Verbs-level
> benchmarks, however, give decent numbers for bandwidth. With netperf, the
> server side CPU usage is 99% which is much higher than other posted
> bandwidth results on this thread. Any suggestions?
>
> Here is the complete configuration for my bandwidth tests
>
> Kernel-2.6.15.4
> netperf-2.3-3

FWIW, 2.4.1 is the latest netperf version. See http://www.netperf.org/svn/netperf2

> OpenIB rev 6552
> MTLP23108-CF128
> Firmware 3.4.0

I was using 3.3.3...looks like I should update.
> MSI-X is enabled for the HCA
>
> ------------------------------
> Here is the netperf output
>
> TCP STREAM TEST to 192.168.2.2
> Recv   Send   Send                         Utilization     Service Demand
> Socket Socket Message Elapsed              Send    Recv    Send    Recv
> Size   Size   Size    Time    Throughput   local   remote  local   remote
> bytes  bytes  bytes   secs.   MBytes/s     % T     % T     us/KB   us/KB
>
> 262142 262142 32768   10.01   151.32       59.66   99.84   7.700   12.886

I think Michael is probably right - NETFILTER or something else is burning additional CPU cycles. The "Send local" service demand is nearly 2X what I was seeing a few months back. You need to use perfmon or oprofile to figure out where the time is being spent.

You could also test SDP (see openib.org for notes on LD_PRELOAD=libsdp.so) to see how that performs - it doesn't use NETFILTER.

You might also try the netperf "-T" option to bind the netserver process to a different CPU. If two CPUs are available, IPoIB throughput is better with the interrupts handled on a different CPU than the one handling the data.

grant

From xma at us.ibm.com Tue May 9 11:36:45 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 9 May 2006 11:36:45 -0700
Subject: [openib-general] ip over ib throughtput
In-Reply-To: <20060509182957.GA29261@esmail.cup.hp.com>
Message-ID:

Grant,

> You might also try the netperf "-T" option to bind the netserver
> process to a different CPU. If two CPUs are available, IPoIB throughput
> is better with the interrupts handled on a different CPU than the one
> handling the data.
>
> grant

What throughput did you get on two CPUs?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
From Don.Albert at Bull.com Tue May 9 11:34:08 2006
From: Don.Albert at Bull.com (Don.Albert at Bull.com)
Date: Tue, 9 May 2006 11:34:08 -0700
Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4
In-Reply-To: <20060509150426.GI21036@mellanox.co.il>
Message-ID:

Michael,

> > > Which FW revision do you have?
> > >
> > The "ibstat" command shows:
> >
> > CA type: MT25204
> > Number of ports: 1
> > Firmware version: 1.0.800
> > Hardware version: a0
> > Node GUID: 0x0002c90200216dc4
> > System image GUID: 0x0002c90200216dc7
> >
> > -Don Albert-
> >
>
> Yes, that's the latest revision. Hmm.
>

What about the other thing I mentioned in my first message: the "lspci" command complains about the board slot that the HCA is plugged into:

pcilib: Resource 2 in /sys/bus/pci/devices/0000:03:00.0/resource has a 64-bit address, ignoring
....
03:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)

I also found out that on this machine the HCA is plugged into a 16X PCI-e slot, which is different from the other machine, which is working, where the slot is 8X.

Bear in mind, however, that both machines were previously working with the 2.6.9-34 kernel with the backport patches and the OpenIB svn 6500 code. Did something happen in 2.6.16, or am I missing a patch?

-Don Albert-

From rdreier at cisco.com Tue May 9 11:36:07 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 09 May 2006 11:36:07 -0700
Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To: (Shirley Ma's message of "Tue, 9 May 2006 11:27:29 -0700")
References:
Message-ID:

    Shirley> I have done a patch like that on top of splitting
    Shirley> CQ. The problem I found is that the hardware interrupt favors
    Shirley> one CPU. Most of the time these two threads are running
    Shirley> on the same CPU according to my debug output. You can
    Shirley> easily find this out with cat /proc/interrupts and
    Shirley> /proc/irq/XXX/smp_affinity. ehca has distributed
    Shirley> interrupts evenly on SMP, so it gets the benefits of two
    Shirley> threads, and gets much better throughput.

Yes, an interrupt will likely be delivered to one CPU.

But there's no reason why the two threads can't be pinned to different CPUs or given exclusive CPU masks, exactly the same way that ehca implements it.

 - R.

From mst at mellanox.co.il Tue May 9 11:41:24 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 9 May 2006 21:41:24 +0300
Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4
In-Reply-To:
References: <20060509150426.GI21036@mellanox.co.il>
Message-ID: <20060509184124.GE22825@mellanox.co.il>

Quoting r. Don.Albert at bull.com :
> Subject: Re: NOP problem in ib_mthca on OFED RC4
>
>
> Michael,
>
> > > > Which FW revision do you have?
> > > >
> > > The "ibstat" command shows:
> > >
> > > CA type: MT25204
> > > Number of ports: 1
> > > Firmware version: 1.0.800
> > > Hardware version: a0
> > > Node GUID: 0x0002c90200216dc4
> > > System image GUID: 0x0002c90200216dc7
> > >
> > > -Don Albert-
> > >
> >
> > Yes, that's the latest revision. Hmm.
> >
>
> What about the other thing I mentioned in my first message: the "lspci" command complains about the board slot that the HCA is plugged into:
>
> pcilib: Resource 2 in /sys/bus/pci/devices/0000:03:00.0/resource has a 64-bit address, ignoring
> ....
> 03:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)

No, that's just a bug in pciutils.

> I also found out that on this machine the HCA is plugged into a 16X PCI-e slot, which is different than the other machine which is working, where the slot is 8X.

Shouldn't matter. Hmm.

> Bear in mind, however, that both machines were previously working with the 2.6.9-34 kernel with the backport patches and the OpenIB svn 6500 code. Did something happen in 2.6.16, or am I missing a patch?
I think fw_cmd_db was added and the default was changed from 0 to 1.

--
MST

From xma at us.ibm.com Tue May 9 11:44:49 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 9 May 2006 11:44:49 -0700
Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To:
Message-ID:

Roland Dreier wrote on 05/09/2006 11:36:07 AM:

> But there's no reason why the two threads can't be pinned to different
> CPUs or given exclusive CPU masks, exactly the same way that ehca
> implements it.
>
> - R.

I could try this. Let's see how much the latency increases.

Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From mst at mellanox.co.il Tue May 9 11:44:52 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 9 May 2006 21:44:52 +0300
Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To:
References: <20060509164919.GC5063@mellanox.co.il>
Message-ID: <20060509184451.GF22825@mellanox.co.il>

Quoting r. Shirley Ma :
> According to some results from different sources, NAPI only gives 3%-10% performance improvement on single CQ.

When you say performance you mean bandwidth. But I think it should improve the CPU utilization on the RX side significantly. If it does, that's an important metric as well.

> I am trying a simple NAPI patch on splitting CQ now to see how much performance it gives.

What are you using for a benchmark?

--
MST

From xma at us.ibm.com Tue May 9 11:51:58 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 9 May 2006 11:51:58 -0700
Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To: <20060509184451.GF22825@mellanox.co.il>
Message-ID:

"Michael S. Tsirkin" wrote on 05/09/2006 11:44:52 AM:

> Quoting r. Shirley Ma :
> > According to some results from different sources, NAPI only gives
> > 3%-10% performance improvement on single CQ.
>
> When you say performance you mean bandwidth.
> But I think it should improve the CPU utilization on the RX side significantly.
> If it does, that's an important metric as well.

No, CPU utilization wasn't reduced. When you use a single CQ, NAPI polls on both RX and TX.

> > I am trying a simple NAPI patch on splitting CQ now to see how
> > much performance it gives.
>
> What are you using for a benchmark?
>
> --
> MST

netperf, iperf, mpstat, netpipe, oprofile; what's your suggestion?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From mst at mellanox.co.il Tue May 9 11:55:33 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 9 May 2006 21:55:33 +0300
Subject: [openib-general] Re: Re: [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To:
References: <20060509184451.GF22825@mellanox.co.il>
Message-ID: <20060509185533.GG22825@mellanox.co.il>

Quoting r. Shirley Ma :
> No, CPU utilization wasn't reduced. When you use a single CQ, NAPI polls on both RX and TX.

I think NAPI's point is to reduce the interrupt rate. Wouldn't this reduce CPU load?

> netperf, iperf, mpstat, netpipe, oprofile; what's your suggestion?

netperf has -C which gives CPU load, which is handy. Running vmstat in another window also works reasonably well.

--
MST

From info at schihei.de Tue May 9 11:57:01 2006
From: info at schihei.de (Heiko J Schick)
Date: Tue, 9 May 2006 20:57:01 +0200
Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To: <20060509164919.GC5063@mellanox.co.il>
References: <4450A196.2050901@de.ibm.com> <445B4DA9.9040601@de.ibm.com> <44608C90.30909@de.ibm.com> <20060509164919.GC5063@mellanox.co.il>
Message-ID: <40FCD6B6-9135-43C1-8974-E9070475DB78@schihei.de>

On 09.05.2006, at 18:49, Michael S. Tsirkin wrote:
>> The trivial way to do it would be to use the same idea as the current
>> ehca driver: just create a thread for receive CQ events and a thread
>> for send CQ events, and defer CQ polling into those two threads.
>
> For RX, isn't this basically what NAPI is doing?
> Only NAPI seems better, avoiding interrupts completely and avoiding
> latency hit by only getting triggered on high load ...

Does NAPI schedule CQ callbacks to different CPUs, or does the callback (handling of data, etc.) stay on the same CPU where the interrupt came in?

Regards,
Heiko

From tziporet at mellanox.co.il Tue May 9 12:03:48 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Tue, 9 May 2006 22:03:48 +0300
Subject: [openib-general][patch review] srp: fmr implementation,
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com>

Roland,
What is the status of the FMR patch?
When do you expect it to be on the trunk?

Thanks
Tziporet

-----Original Message-----
From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Vu Pham
Sent: Monday, May 08, 2006 10:12 PM
To: Roland Dreier
Cc: openib-general at openib.org
Subject: Re: [openib-general][patch review] srp: fmr implementation,

Roland Dreier wrote:
> Vu> This fmr patch does not work for ia64 system because this
> Vu> fmr_page_mask is defined as unsigned int.
>
> Great catch!
>
> Vu> We should type cast it to u64 or define it as unsigned long
>
> Casting it won't help because it will just get zero-extended. I think
> we need the following in ib_srp.h:
>
>     unsigned long fmr_page_mask;
>
> and then in ib_srp.c:
>
>     srp_dev->fmr_page_mask = ~((unsigned long) srp_dev->fmr_page_size - 1);
>
> does this work for you?
>

Yes. Please commit the final fmr patch.
Thanks,
Vu

From iod00d at hp.com Tue May 9 12:05:38 2006
From: iod00d at hp.com (Grant Grundler)
Date: Tue, 9 May 2006 12:05:38 -0700
Subject: [openib-general] ip over ib throughtput
In-Reply-To:
References: <20060509182957.GA29261@esmail.cup.hp.com>
Message-ID: <20060509190538.GC29261@esmail.cup.hp.com>

On Tue, May 09, 2006 at 11:36:45AM -0700, Shirley Ma wrote:
> What throughput did you get on two CPUs?

With one CPU, I get ~2.5-2.8 Gb/s. With two CPUs: 3.5-3.6 Gb/s.
The last SVN version I tested was 2.6.15 + r4929 (several months ago).

To be clear, "one CPU" means the netperf process is bound (with taskset) to the same CPU as the one handling mthca interrupts. Ditto for netserver on the other system. "Two CPU" means the netperf/netserver processes are bound to a different CPU that is NOT handling the mthca interrupt.

Also note that the "Service Demand" (CPU us/KB) goes up by 10-20%. So we really only want to do this when the CPU handling interrupts is saturated and we know the other CPU is available.

grant

From rdreier at cisco.com Tue May 9 12:09:39 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 09 May 2006 12:09:39 -0700
Subject: [openib-general][patch review] srp: fmr implementation,
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> (Tziporet Koren's message of "Tue, 9 May 2006 22:03:48 +0300")
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com>
Message-ID:

    Tziporet> Roland, What is the status of the FMR patch? When do
    Tziporet> you expect it to be on the trunk?

It seems good. I should check it into svn today. I still need to figure out how I want to put it on the for-2.6.18 git branch, since it depends on some earlier fixes that Linus hasn't pulled yet.

 - R.
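[Note on the fmr_page_mask fix discussed in this thread.] The following is only a minimal user-space sketch, not the ib_srp code. The helper names mask_from_uint and mask_from_ulong are hypothetical, and uint32_t/uint64_t stand in for unsigned int and the 64-bit unsigned long found on ia64. It illustrates why a mask kept in a 32-bit variable is zero-extended when it meets a 64-bit address, and why widening before taking the complement avoids that:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the mask is computed and stored at 32-bit width,
 * as when fmr_page_mask is an unsigned int. Widening it afterwards
 * zero-extends, so bits 32..63 of the result are 0.
 */
static uint64_t mask_from_uint(uint32_t page_size)
{
	uint32_t mask = ~(page_size - 1);

	/* page sizes are assumed to be non-zero powers of two */
	assert(page_size && !(page_size & (page_size - 1)));
	return (uint64_t) mask;
}

/*
 * Hypothetical helper: widen first, then complement, mirroring
 * ~((unsigned long) fmr_page_size - 1) from the thread. The upper
 * 32 bits of the mask are all 1.
 */
static uint64_t mask_from_ulong(uint32_t page_size)
{
	assert(page_size && !(page_size & (page_size - 1)));
	return ~((uint64_t) page_size - 1);
}
```

With a 4096-byte page, the first helper yields 0x00000000fffff000, so masking the address 0x123456789 drops everything above 4 GB and gives 0x23456000. The second yields 0xfffffffffffff000 and gives 0x123456000.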
From rdreier at cisco.com Tue May 9 12:12:27 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 09 May 2006 12:12:27 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> (Tziporet Koren's message of "Tue, 9 May 2006 22:03:48 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> Message-ID: BTW, does Mellanox (or anyone else) have any numbers showing that using FMRs makes any difference in performance on a semi-realistic benchmark? - R. From sean.hefty at intel.com Tue May 9 12:14:02 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 9 May 2006 12:14:02 -0700 Subject: [openib-general] RE: [PATCH] cm refcount race fix In-Reply-To: <20060509172703.GA22825@mellanox.co.il> Message-ID: Here's a patch for all of the files that you listed. I did do some basic testing and didn't see any issues. Signed-off-by: Sean Hefty --- Index: mad_rmpp.c =================================================================== --- mad_rmpp.c (revision 6884) +++ mad_rmpp.c (working copy) @@ -49,7 +49,7 @@ struct mad_rmpp_recv { struct list_head list; struct work_struct timeout_work; struct work_struct cleanup_work; - wait_queue_head_t wait; + struct completion comp; enum rmpp_state state; spinlock_t lock; atomic_t refcount; @@ -69,10 +69,16 @@ struct mad_rmpp_recv { u8 method; }; +static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) +{ + if (atomic_dec_and_test(&rmpp_recv->refcount)) + complete(&rmpp_recv->comp); +} + static void destroy_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) { - atomic_dec(&rmpp_recv->refcount); - wait_event(rmpp_recv->wait, !atomic_read(&rmpp_recv->refcount)); + deref_rmpp_recv(rmpp_recv); + wait_for_completion(&rmpp_recv->comp); ib_destroy_ah(rmpp_recv->ah); kfree(rmpp_recv); } @@ -253,7 +259,7 @@ create_rmpp_recv(struct ib_mad_agent_pri goto error; rmpp_recv->agent = agent; - init_waitqueue_head(&rmpp_recv->wait); + 
init_completion(&rmpp_recv->comp); INIT_WORK(&rmpp_recv->timeout_work, recv_timeout_handler, rmpp_recv); INIT_WORK(&rmpp_recv->cleanup_work, recv_cleanup_handler, rmpp_recv); spin_lock_init(&rmpp_recv->lock); @@ -279,12 +285,6 @@ error: kfree(rmpp_recv); return NULL; } -static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) -{ - if (atomic_dec_and_test(&rmpp_recv->refcount)) - wake_up(&rmpp_recv->wait); -} - static struct mad_rmpp_recv * find_rmpp_recv(struct ib_mad_agent_private *agent, struct ib_mad_recv_wc *mad_recv_wc) Index: cm.c =================================================================== --- cm.c (revision 6884) +++ cm.c (working copy) @@ -34,6 +34,8 @@ * * $Id$ */ + +#include #include #include #include @@ -122,7 +124,7 @@ struct cm_id_private { struct rb_node service_node; struct rb_node sidr_id_node; spinlock_t lock; /* Do not acquire inside cm.lock */ - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; struct ib_mad_send_buf *msg; @@ -160,7 +162,7 @@ static void cm_work_handler(void *data); static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { if (atomic_dec_and_test(&cm_id_priv->refcount)) - wake_up(&cm_id_priv->wait); + complete(&cm_id_priv->comp); } static int cm_alloc_msg(struct cm_id_private *cm_id_priv, @@ -611,7 +613,7 @@ struct ib_cm_id *ib_create_cm_id(struct goto error; spin_lock_init(&cm_id_priv->lock); - init_waitqueue_head(&cm_id_priv->wait); + init_completion(&cm_id_priv->comp); INIT_LIST_HEAD(&cm_id_priv->work_list); atomic_set(&cm_id_priv->work_count, -1); atomic_set(&cm_id_priv->refcount, 1); @@ -776,8 +778,8 @@ retest: } cm_free_id(cm_id->local_id); - atomic_dec(&cm_id_priv->refcount); - wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); + cm_deref_id(cm_id_priv); + wait_for_completion(&cm_id_priv->comp); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); kfree(cm_id_priv->compare_data); Index: multicast.c 
=================================================================== --- multicast.c (revision 6884) +++ multicast.c (working copy) @@ -30,6 +30,7 @@ * SOFTWARE. */ +#include #include #include #include @@ -69,7 +70,7 @@ struct mcast_port { spinlock_t lock; struct rb_root table; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; u8 port_num; }; @@ -110,7 +111,7 @@ struct mcast_member { struct list_head list; enum mcast_state state; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; static void join_handler(int status, struct ib_sa_mcmember_rec *rec, @@ -168,7 +169,7 @@ static struct mcast_group *mcast_insert( static void deref_port(struct mcast_port *port) { if (atomic_dec_and_test(&port->refcount)) - wake_up(&port->wait); + complete(&port->comp); } static void release_group(struct mcast_group *group) @@ -189,7 +190,7 @@ static void release_group(struct mcast_g static void deref_member(struct mcast_member *member) { if (atomic_dec_and_test(&member->refcount)) - wake_up(&member->wait); + complete(&member->comp); } static void queue_join(struct mcast_member *member) @@ -512,7 +513,7 @@ struct ib_multicast *ib_join_multicast(s member->multicast.comp_mask = comp_mask; member->multicast.callback = callback; member->multicast.context = context; - init_waitqueue_head(&member->wait); + init_completion(&member->comp); atomic_set(&member->refcount, 1); member->state = MCAST_JOINING; @@ -569,8 +570,8 @@ void ib_free_multicast(struct ib_multica release_group(group); } - atomic_dec(&member->refcount); - wait_event(member->wait, !atomic_read(&member->refcount)); + deref_member(member); + wait_for_completion(&member->comp); kfree(member); } EXPORT_SYMBOL(ib_free_multicast); @@ -602,7 +603,7 @@ static void mcast_add_one(struct ib_devi port->port_num = dev->start_port + i; spin_lock_init(&port->lock); port->table = RB_ROOT; - init_waitqueue_head(&port->wait); + init_completion(&port->comp); atomic_set(&port->refcount, 1); } @@ -644,8 +645,8 
@@ static void mcast_remove_one(struct ib_d for (i = 0; i < dev->end_port - dev->start_port; i++) { port = &dev->port[i]; leave_groups(port); - atomic_dec(&port->refcount); - wait_event(port->wait, !atomic_read(&port->refcount)); + deref_port(port); + wait_for_completion(&port->comp); } kfree(dev); Index: cma.c =================================================================== --- cma.c (revision 6948) +++ cma.c (working copy) @@ -29,6 +29,7 @@ * */ +#include #include #include #include @@ -70,7 +71,7 @@ struct cma_device { struct list_head list; struct ib_device *device; __be64 node_guid; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; struct list_head id_list; }; @@ -111,7 +112,7 @@ struct rdma_id_private { enum cma_state state; spinlock_t lock; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; wait_queue_head_t wait_remove; atomic_t dev_remove; @@ -244,11 +245,16 @@ static void cma_attach_to_dev(struct rdm list_add_tail(&id_priv->list, &cma_dev->id_list); } +static inline void cma_deref_dev(struct cma_device *cma_dev) +{ + if (atomic_dec_and_test(&cma_dev->refcount)) + complete(&cma_dev->comp); +} + static void cma_detach_from_dev(struct rdma_id_private *id_priv) { list_del(&id_priv->list); - if (atomic_dec_and_test(&id_priv->cma_dev->refcount)) - wake_up(&id_priv->cma_dev->wait); + cma_deref_dev(id_priv->cma_dev); id_priv->cma_dev = NULL; } @@ -288,7 +294,7 @@ static int cma_acquire_dev(struct rdma_i static void cma_deref_id(struct rdma_id_private *id_priv) { if (atomic_dec_and_test(&id_priv->refcount)) - wake_up(&id_priv->wait); + complete(&id_priv->comp); } static void cma_release_remove(struct rdma_id_private *id_priv) @@ -311,7 +317,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c id_priv->id.event_handler = event_handler; id_priv->id.ps = ps; spin_lock_init(&id_priv->lock); - init_waitqueue_head(&id_priv->wait); + init_completion(&id_priv->comp); atomic_set(&id_priv->refcount, 1); 
init_waitqueue_head(&id_priv->wait_remove); atomic_set(&id_priv->dev_remove, 0); @@ -618,8 +624,8 @@ static void cma_destroy_listen(struct rd } list_del(&id_priv->listen_list); - atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, !atomic_read(&id_priv->refcount)); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv); } @@ -699,8 +705,8 @@ void rdma_destroy_id(struct rdma_cm_id * } cma_release_port(id_priv); - atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, !atomic_read(&id_priv->refcount)); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv->id.route.path_rec); kfree(id_priv); @@ -1778,7 +1784,7 @@ static void cma_add_one(struct ib_device if (!cma_dev->node_guid) goto err; - init_waitqueue_head(&cma_dev->wait); + init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); INIT_LIST_HEAD(&cma_dev->id_list); ib_set_client_data(device, &cma_client, cma_dev); @@ -1845,8 +1851,8 @@ static void cma_process_remove(struct cm } mutex_unlock(&lock); - atomic_dec(&cma_dev->refcount); - wait_event(cma_dev->wait, !atomic_read(&cma_dev->refcount)); + cma_deref_dev(cma_dev); + wait_for_completion(&cma_dev->comp); } static void cma_remove_one(struct ib_device *device) Index: mad.c =================================================================== --- mad.c (revision 6886) +++ mad.c (working copy) @@ -353,7 +353,7 @@ struct ib_mad_agent *ib_register_mad_age INIT_WORK(&mad_agent_priv->local_work, local_completions, mad_agent_priv); atomic_set(&mad_agent_priv->refcount, 1); - init_waitqueue_head(&mad_agent_priv->wait); + init_completion(&mad_agent_priv->comp); return &mad_agent_priv->agent; @@ -468,7 +468,7 @@ struct ib_mad_agent *ib_register_mad_sno mad_snoop_priv->agent.qp = port_priv->qp_info[qpn].qp; mad_snoop_priv->agent.port_num = port_num; mad_snoop_priv->mad_snoop_flags = mad_snoop_flags; - init_waitqueue_head(&mad_snoop_priv->wait); + init_completion(&mad_snoop_priv->comp); 
mad_snoop_priv->snoop_index = register_snoop_agent( &port_priv->qp_info[qpn], mad_snoop_priv); @@ -487,6 +487,18 @@ error1: } EXPORT_SYMBOL(ib_register_mad_snoop); +static inline void deref_mad_agent(struct ib_mad_agent_private *mad_agent_priv) +{ + if (atomic_dec_and_test(&mad_agent_priv->refcount)) + complete(&mad_agent_priv->comp); +} + +static inline void deref_snoop_agent(struct ib_mad_snoop_private *mad_snoop_priv) +{ + if (atomic_dec_and_test(&mad_snoop_priv->refcount)) + complete(&mad_snoop_priv->comp); +} + static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv) { struct ib_mad_port_private *port_priv; @@ -510,9 +522,8 @@ static void unregister_mad_agent(struct flush_workqueue(port_priv->wq); ib_cancel_rmpp_recvs(mad_agent_priv); - atomic_dec(&mad_agent_priv->refcount); - wait_event(mad_agent_priv->wait, - !atomic_read(&mad_agent_priv->refcount)); + deref_mad_agent(mad_agent_priv); + wait_for_completion(&mad_agent_priv->comp); kfree(mad_agent_priv->reg_req); ib_dereg_mr(mad_agent_priv->agent.mr); @@ -530,9 +541,8 @@ static void unregister_mad_snoop(struct atomic_dec(&qp_info->snoop_count); spin_unlock_irqrestore(&qp_info->snoop_lock, flags); - atomic_dec(&mad_snoop_priv->refcount); - wait_event(mad_snoop_priv->wait, - !atomic_read(&mad_snoop_priv->refcount)); + deref_snoop_agent(mad_snoop_priv); + wait_for_completion(&mad_snoop_priv->comp); kfree(mad_snoop_priv); } @@ -601,8 +611,7 @@ static void snoop_send(struct ib_mad_qp_ spin_unlock_irqrestore(&qp_info->snoop_lock, flags); mad_snoop_priv->agent.snoop_handler(&mad_snoop_priv->agent, send_buf, mad_send_wc); - if (atomic_dec_and_test(&mad_snoop_priv->refcount)) - wake_up(&mad_snoop_priv->wait); + deref_snoop_agent(mad_snoop_priv); spin_lock_irqsave(&qp_info->snoop_lock, flags); } spin_unlock_irqrestore(&qp_info->snoop_lock, flags); @@ -627,8 +636,7 @@ static void snoop_recv(struct ib_mad_qp_ spin_unlock_irqrestore(&qp_info->snoop_lock, flags); 
mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, mad_recv_wc); - if (atomic_dec_and_test(&mad_snoop_priv->refcount)) - wake_up(&mad_snoop_priv->wait); + deref_snoop_agent(mad_snoop_priv); spin_lock_irqsave(&qp_info->snoop_lock, flags); } spin_unlock_irqrestore(&qp_info->snoop_lock, flags); @@ -969,8 +977,7 @@ void ib_free_send_mad(struct ib_mad_send free_send_rmpp_list(mad_send_wr); kfree(send_buf->mad); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); } EXPORT_SYMBOL(ib_free_send_mad); @@ -1789,8 +1796,7 @@ static void ib_mad_complete_recv(struct mad_recv_wc = ib_process_rmpp_recv_wc(mad_agent_priv, mad_recv_wc); if (!mad_recv_wc) { - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; } } @@ -1802,8 +1808,7 @@ static void ib_mad_complete_recv(struct if (!mad_send_wr) { spin_unlock_irqrestore(&mad_agent_priv->lock, flags); ib_free_recv_mad(mad_recv_wc); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; } ib_mark_mad_done(mad_send_wr); @@ -1822,8 +1827,7 @@ static void ib_mad_complete_recv(struct } else { mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, mad_recv_wc); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); } } @@ -2053,8 +2057,7 @@ void ib_mad_complete_send_wr(struct ib_m mad_send_wc); /* Release reference on agent taken when sending */ - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; done: spin_unlock_irqrestore(&mad_agent_priv->lock, flags); Index: mad_priv.h =================================================================== --- mad_priv.h (revision 6884) +++ mad_priv.h (working copy) @@ -37,6 +37,7 @@ #ifndef __IB_MAD_PRIV_H__ #define __IB_MAD_PRIV_H__ 
+#include #include #include #include @@ -108,7 +109,7 @@ struct ib_mad_agent_private { struct list_head rmpp_list; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; struct ib_mad_snoop_private { @@ -117,7 +118,7 @@ struct ib_mad_snoop_private { int snoop_index; int mad_snoop_flags; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; struct ib_mad_send_wr_private { Index: ucm.c =================================================================== --- ucm.c (revision 6884) +++ ucm.c (working copy) @@ -32,6 +32,8 @@ * * $Id$ */ + +#include #include #include #include @@ -73,7 +75,7 @@ struct ib_ucm_file { struct ib_ucm_context { int id; - wait_queue_head_t wait; + struct completion comp; atomic_t ref; int events_reported; @@ -139,7 +141,7 @@ static struct ib_ucm_context *ib_ucm_ctx static void ib_ucm_ctx_put(struct ib_ucm_context *ctx) { if (atomic_dec_and_test(&ctx->ref)) - wake_up(&ctx->wait); + complete(&ctx->comp); } static inline int ib_ucm_new_cm_id(int event) @@ -179,7 +181,7 @@ static struct ib_ucm_context *ib_ucm_ctx return NULL; atomic_set(&ctx->ref, 1); - init_waitqueue_head(&ctx->wait); + init_completion(&ctx->comp); ctx->file = file; INIT_LIST_HEAD(&ctx->events); @@ -559,8 +561,8 @@ static ssize_t ib_ucm_destroy_id(struct if (IS_ERR(ctx)) return PTR_ERR(ctx); - atomic_dec(&ctx->ref); - wait_event(ctx->wait, !atomic_read(&ctx->ref)); + ib_ucm_ctx_put(ctx); + wait_for_completion(&ctx->comp); /* No new events will be generated after destroying the cm_id. */ ib_destroy_cm_id(ctx->cm_id); Index: ucma.c =================================================================== --- ucma.c (revision 6949) +++ ucma.c (working copy) @@ -30,6 +30,7 @@ * SOFTWARE. 
*/ +#include #include #include #include @@ -61,7 +62,7 @@ struct ucma_file { struct ucma_context { int id; - wait_queue_head_t wait; + struct completion comp; atomic_t ref; int events_reported; int backlog; @@ -105,7 +106,7 @@ static struct ucma_context* ucma_get_ctx static void ucma_put_ctx(struct ucma_context *ctx) { if (atomic_dec_and_test(&ctx->ref)) - wake_up(&ctx->wait); + complete(&ctx->comp); } static void ucma_cleanup_events(struct ucma_context *ctx) @@ -140,7 +141,7 @@ static struct ucma_context* ucma_alloc_c return NULL; atomic_set(&ctx->ref, 1); - init_waitqueue_head(&ctx->wait); + init_completion(&ctx->comp); ctx->file = file; INIT_LIST_HEAD(&ctx->events); @@ -341,8 +342,8 @@ static ssize_t ucma_destroy_id(struct uc if (IS_ERR(ctx)) return PTR_ERR(ctx); - atomic_dec(&ctx->ref); - wait_event(ctx->wait, !atomic_read(&ctx->ref)); + ucma_put_ctx(ctx); + wait_for_completion(&ctx->comp); /* No new events will be generated after destroying the id. */ rdma_destroy_id(ctx->cm_id); From rdreier at cisco.com Tue May 9 12:24:51 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 09 May 2006 12:24:51 -0700 Subject: [openib-general][PATCH] srp: tuned parameters, In-Reply-To: <443E8325.2000502@mellanox.com> (Vu Pham's message of "Thu, 13 Apr 2006 09:58:13 -0700") References: <443E8325.2000502@mellanox.com> Message-ID: I finally looked this over. First, this should be two patches: making srp_sg_tablesize tunable should be a separate change from making it possible to specify max_cmd_per_lun for a target. The srp_sg_tablesize change makes the default number of SG entries quite a bit larger than it is now, which makes the default max IU length much bigger. Is this justified? What workload creates such huge SG lists? For the cmd_per_lun change, shouldn't the line > + target->scsi_host->cmd_per_lun = token; be something like target->scsi_host->cmd_per_lun = min(token, SRP_SQ_SIZE); otherwise it's too easy to overflow a send queue by mistake. - R. 
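
The clamp Roland suggests for cmd_per_lun can be sketched in user-space C as
follows. This is only an illustration of the idea: the function name is
hypothetical, and the queue depth of 64 is an assumed placeholder (the real
SRP_SQ_SIZE is whatever ib_srp.h defines). The kernel would use its min()
macro; a plain conditional is used here so the sketch builds on its own.

```c
/* Assumed send-queue depth, for illustration only; not taken from ib_srp.h. */
#define SRP_SQ_SIZE_ASSUMED 64

/*
 * Limit a user-supplied max_cmd_per_lun value to the send-queue depth,
 * so a large setting cannot queue more commands than the send queue holds.
 */
static int clamp_cmd_per_lun(int token)
{
    return token < SRP_SQ_SIZE_ASSUMED ? token : SRP_SQ_SIZE_ASSUMED;
}
```

With the assumed depth, a request of 10 is passed through unchanged, while a
request of 1000 is reduced to 64.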
From mshefty at ichips.intel.com Tue May 9 12:25:07 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 09 May 2006 12:25:07 -0700
Subject: [openib-general] CMA: port 2 loopback problems
In-Reply-To: <20060508202904.GD25527@mellanox.co.il>
References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il>
Message-ID: <4460EC93.4090307@ichips.intel.com>

Michael S. Tsirkin wrote:
> I thought about this too. People actually do expect loopback to work when link
> is down. I guess we could create "loopback" path record, with parameters such
> as SL editable from sysfs.

Until the underlying IB stack supports loopback connections on a non-active
port, my thinking is to have the RDMA CM select the first active port when
connecting in loopback.

- Sean

From xma at us.ibm.com Tue May 9 12:41:30 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 9 May 2006 12:41:30 -0700
Subject: [openib-general] ip over ib throughtput
In-Reply-To: <20060509190538.GC29261@esmail.cup.hp.com>
Message-ID:

Grant Grundler wrote on 05/09/2006 12:05:38 PM:

> On Tue, May 09, 2006 at 11:36:45AM -0700, Shirley Ma wrote:
> > What throughput did you get on two CPUs?
>
> With one CPU, I get ~2.5-2.8 Gb/s. With two CPUs: 3.5-3.6 Gb/s.
> The last SVN version I tested was 2.6.15 + r4929 (several months ago).

I got 3.5-3.66 Gb/s on UP by splitting the CQ and adding thread support for
send/recv CQ polling, and didn't get any better on SMP.

> To be clear, "one CPU" means the netperf process is bound to the same
> CPU as the one handling mthca interrupts with taskset. Ditto for netserver
> on the other system. "Two CPU" means bind netperf/netserver processes
> to a different CPU that is NOT handling the mthca interrupt.

I tried setting interrupt affinity for mthca by changing
/proc/irq/XXX/smp_affinity, but it didn't work. The interrupt distribution
kept bouncing between the two CPUs.
What did you do to handle the mthca interrupt on a fixed CPU?

> Also note that the "Service Demand" (CPU us/KB) goes up by
> 10-20% also. So we really only want to do this when the CPU
> handling interrupts is saturated and we know the other CPU
> is available.

> grant

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From xma at us.ibm.com Tue May 9 12:46:43 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 9 May 2006 12:46:43 -0700
Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To: <40FCD6B6-9135-43C1-8974-E9070475DB78@schihei.de>
Message-ID:

openib-general-bounces at openib.org wrote on 05/09/2006 11:57:01 AM:

> On 09.05.2006, at 18:49, Michael S. Tsirkin wrote:
>
> >> The trivial way to do it would be to use the same idea as the current
> >> ehca driver: just create a thread for receive CQ events and a thread
> >> for send CQ events, and defer CQ polling into those two threads.
> >
> > For RX, isn't this basically what NAPI is doing?
> > Only NAPI seems better, avoiding interrupts completely and avoiding
> > latency hit by only getting triggered on high load ...
>
> Does NAPI schedule CQ callbacks to different CPUs, or does the callback
> (handling of data, etc.) stay on the same CPU where the interrupt came in?
>
> Regards,
> Heiko

My understanding is that NAPI handles the interrupt's CQ callbacks on the
same CPU. You could implement it differently, but then it would no longer
follow the native NAPI implementation.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From mst at mellanox.co.il Tue May 9 13:19:30 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 9 May 2006 23:19:30 +0300 Subject: [openib-general] CMA: port 2 loopback problems In-Reply-To: <4460EC93.4090307@ichips.intel.com> References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <4460EC93.4090307@ichips.intel.com> Message-ID: <20060509201930.GA24713@mellanox.co.il> Quoting r. Sean Hefty : > Until the underlying IB stack supports loopback connections on a non-active > port How do you mean? You can already create loopback connections as per IB spec - it works already. -- MST From mst at mellanox.co.il Tue May 9 13:20:41 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 23:20:41 +0300 Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: References: <40FCD6B6-9135-43C1-8974-E9070475DB78@schihei.de> Message-ID: <20060509202041.GB24713@mellanox.co.il> Quoting r. Shirley Ma : > My understanding is NAPI handle interrutps CQ callbacks on the same CPU. My understanding is NAPI disables interrupts under high RX load. No? -- MST From mst at mellanox.co.il Tue May 9 13:25:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 9 May 2006 23:25:25 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060509172703.GA22825@mellanox.co.il> Message-ID: <20060509202525.GC24713@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [PATCH] cm refcount race fix > > Here's a patch for all of the files that you listed. > > I did do some basic testing and didn't see any issues. Looks fine, changes look trivial which is good. I'll test a bit tomorrow. -- MST From mst at mellanox.co.il Tue May 9 13:31:10 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 9 May 2006 23:31:10 +0300 Subject: [openib-general] Re: CMA: port 2 loopback problems In-Reply-To: <4460EC93.4090307@ichips.intel.com> References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <4460EC93.4090307@ichips.intel.com> Message-ID: <20060509203109.GD24713@mellanox.co.il> Quoting r. Sean Hefty : > my thinking is to have the RDMA CM select the first active port when > connecting in loopback. OK. The way to do this I guess is to scan all devices along the lines of what xxx_ip_dev_find does, and check which one is up. -- MST From iod00d at hp.com Tue May 9 13:51:26 2006 From: iod00d at hp.com (Grant Grundler) Date: Tue, 9 May 2006 13:51:26 -0700 Subject: [openib-general] ip over ib throughtput In-Reply-To: References: <20060509190538.GC29261@esmail.cup.hp.com> Message-ID: <20060509205126.GC30011@esmail.cup.hp.com> On Tue, May 09, 2006 at 12:41:30PM -0700, Shirley Ma wrote: > Grant Grundler wrote on 05/09/2006 12:05:38 PM: > > > On Tue, May 09, 2006 at 11:36:45AM -0700, Shirley Ma wrote: > > > What throughput did you get on two CPUs? > > > > With one CPU, I get ~2.5-2.8 Gb/s. With two CPUs: 3.5-3.6 Gb/s. > > The last SVN version I tested was 2.6.15 + r4929 (several monthes ago). > > I got 3.5-3.66Gb/s on UP with splitting CQ + threads supports on send/recv > CQ polling, and didn't get any better on SMP. I was testing with a 3 year old 1.5 Ghz IA64 box. That might explain the difference between our results. > > To be clear, "one CPU" means the netperf process is bound to the same > > CPU as the one handling mthca interrupts with taskset. Ditto for > netserver > > on the other system. "Two CPU" means bind netperf/netserver processes > > to a different CPU that is NOT handling the mthca interrupt. > > I tried interrupts affinity on mthca, it didn't work by changing > /proc/irq/XXX/smp_affinity. 
The interrupts distruction kept bouncing > between there two CPUs. That sounds like a bug in the Local xAPIC and/or arch specific code. Setting the smp_affinity mask should bind the IRQ to a specific CPU. > What did you do to handle mthca interrupt on a fixed cpu? Used an HP IA64 ZX1 machine. HP IA64 chipsets do not implement "automatic" (read HW level) interrupt redirection. IIRC, it was the "XPR" functionality in the Local xAPIC that HP didn't implement (by design). Should be some way in SW to avoid that too. I don't know offhand what it is though. grant From Thomas.Talpey at netapp.com Tue May 9 14:14:00 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Tue, 09 May 2006 17:14:00 -0400 Subject: [openib-general] ip over ib throughtput In-Reply-To: References: <445FCBA1.1030903@ncsa.uiuc.edu> Message-ID: <7.0.1.0.2.20060509165536.04336f80@netapp.com> Shirley, Hassan - I am *very* interested in these results, and I want to at least mention that I'm doing similar NFS/RDMA testing, and getting some contrasting results. > 699040 699040 16384 60.00 3668.07 (458MB/s) > cpu utilization was around 95%. On my dual-2.4GHz Xeon, with the relatively untuned NFS/RDMA client on 2.6.16.6, I am able to pull about 450MB/sec of read throughput at 35% total CPU. This is using 16 threads of NFS direct i/o (O_DIRECT) to a midrange NetApp server, I did achieve a similar result with the Linux NFS/RDMA server (but only after hotwiring the ext2 interface because I don't have the spindles). I am using a dedicated filesystem test to generate the load, and also iozone. These NFS/RDMA direct reads use RDMA writes from the server to the client. Also, this was with client hyperthreading disabled and a dual-processor Dell, I could reboot with a single CPU to get more comparable results. But, the throughput was limited by server CPU (100%), the client was actually loafing a little bit. 
I thought it was interesting that a filesystem achieves the same throughput at better overhead than a dedicated network test. :-) And I haven't played with interrupt affinity at all. Tom. At 07:23 PM 5/8/2006, Shirley Ma wrote: >I am testing most of my patches. Under > >1.Intel(R) Xeon(TM) CPU 2.80GHz, one cpu, >2. fw-23108-3_4_000-MHXL-CF128-T.bin >3. pci-x without msi_x enabled >4. kernel 2.6.16 >5. netperf-2.4.0 >6. SVN 68XX+several IPoIB patches > >The best result I got so far: > >Testing with the following command line: >netperf -l 60 -H 10.1.1.100 -t TCP_STREAM -i 10,2 -I 95,5 -- -m 16384 -s 349520 -S 349520 > >TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.1.100 (10.1.1.100) port 0 AF_INET : +/-2.5% @ 95% conf. >Recv Send Send >Socket Socket Message Elapsed >Size Size Size Time Throughput >bytes bytes bytes secs. 10^6bits/sec > >699040 699040 16384 60.00 3668.07 (458MB/s) > >cpu utilization was around 95%. > >Thanks >Shirley Ma >IBM Linux Technology Center >15300 SW Koll Parkway >Beaverton, OR 97006-6063 >Phone(Fax): (503) 578-7638 > > > > >"Hassan M. Jafri" >Sent by: openib-general-bounces at openib.org > >05/08/2006 03:52 PM >To >openib-general at openib.org >cc >Subject >Re: [openib-general] ip over ib throughtput > > > > >I cant crank out more than 150 MB/sec with my 2.0 GHz xeons. verbs level >benchmarks, however give decent numbers for bandwidth. With netperf, the >server side CPU usage is 99% which is much higher than other posted >bandwidth results on this thread. Any suggestions? > >Here is the complete configuration for my bandwidth tests > >Kernel-2.6.15.4 >netperf-2.3-3 >OpenIB rev 6552 >MTLP23108-CF128 >Firmware 3.4.0 >MSI-X is enabled for the HCA > > >------------------------------ >Here is the netperf output > > > >TCP STREAM TEST to 192.168.2.2 >Recv Send Send Utilization Service >Demand >Socket Socket Message Elapsed Send Recv Send Recv >Size Size Size Time Throughput local remote local >remote >bytes bytes bytes secs. 
MBytes /s % T % T us/KB us/KB > >262142 262142 32768 10.01 151.32 59.66 99.84 7.700 >12.886 >------------------------------- > >Here is ib0 config for one of the nodes > >ib0 Link encap:UNSPEC HWaddr >00-02-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 > inet addr:192.168.2.1 Bcast:192.168.2.255 Mask:255.255.255.0 > inet6 addr: fe80::202:c902:0:3ce9/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:1724527 errors:0 dropped:0 overruns:0 frame:0 > TX packets:9685456 errors:0 dropped:2 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:89830114 (85.6 MiB) TX bytes:2213308646 (2.0 GiB) > > > > > > > > >Michael S. Tsirkin wrote: >> Hi! >> What kind of performance do people see with ip over ib on gen2? >> I see about 100Mbyte/sec at 99% CPU utilisation on send, >> on an express card, Xeon 2.8GHz, SSE doorbells enabled. >> >> MST >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From xma at us.ibm.com Tue May 9 14:28:26 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 9 May 2006 14:28:26 -0700 Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: <20060509202041.GB24713@mellanox.co.il> Message-ID: "Michael S. Tsirkin" wrote on 05/09/2006 01:20:41 PM: > Quoting r. 
Shirley Ma :
> > My understanding is that NAPI handles the interrupt's CQ callbacks on the same CPU.
>
> My understanding is NAPI disables interrupts under high RX load. No?
>
> --
> MST

Yes, NAPI disables the interrupts based on the weight. In the IPoIB case, it
doesn't send out the next completion notification under heavy load. Similar
CQ polling is still done in NAPI on the same CPU, but it's not a callback
anymore.

What I find is that send and recv completions are not that fast, which means
the RX load is not that heavy in IPoIB. That might be why NAPI does not do as
well as the multiple-thread implementation.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From mshefty at ichips.intel.com Tue May 9 14:36:10 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 09 May 2006 14:36:10 -0700
Subject: [openib-general] CMA: port 2 loopback problems
In-Reply-To: <20060509201930.GA24713@mellanox.co.il>
References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <4460EC93.4090307@ichips.intel.com> <20060509201930.GA24713@mellanox.co.il>
Message-ID: <44610B4A.7010703@ichips.intel.com>

Michael S. Tsirkin wrote:
>>Until the underlying IB stack supports loopback connections on a non-active
>>port
>
> How do you mean? You can already create loopback connections as per IB spec - it
> works already.

It works if the port is active. I don't believe that there's any code to
support connecting on a port that's never been active. I'm not even sure that
it would work if the port were down.
- Sean

From xma at us.ibm.com Tue May 9 14:47:05 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 9 May 2006 14:47:05 -0700
Subject: [openib-general] ip over ib throughtput
In-Reply-To: <7.0.1.0.2.20060509165536.04336f80@netapp.com>
Message-ID:

Thanks for sharing these test results.

The netperf/netserver IPoIB over UD mode test spent most of its time copying data from user to kernel plus checksumming (csum_partial_copy_generic), and it can send no more than mtu=2044 bytes per ib_post_send(), per the wiki, which definitely limits its performance compared to RDMA read/write. I would expect NFS/RDMA throughput to be much better than IPoIB over UD.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From sean.hefty at intel.com Tue May 9 14:47:47 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 9 May 2006 14:47:47 -0700
Subject: [openib-general] [PATCH] RE: compliancy issue?
In-Reply-To: <20060508085301.GD20207@mellanox.co.il>
Message-ID:

>CA4-24.2.3: The connecting peer shall terminate the connection attempt
>if ExtMaxAdverts of the HAH is set to zero.
>
>This means that SDP must examine the HAH before RTU is sent.
>But, CMA currently sends RTU from cma_rep_recv, before notifying
>the user.

Can you try this simple patch and see if it fixes your problem? You will need to call rdma_accept() or rdma_reject() after receiving a CONNECT_RESPONSE event. The conn_param to rdma_accept() should be NULL.
Signed-off-by: Sean Hefty
---
Index: cma.c
===================================================================
--- cma.c (revision 6948)
+++ cma.c (working copy)
@@ -778,7 +778,7 @@ static int cma_ib_handler(struct ib_cm_i
 		status = cma_verify_rep(id_priv, ib_event->private_data);
 		if (status)
 			event = RDMA_CM_EVENT_CONNECT_ERROR;
-		else if (id_priv->id.qp) {
+		else if (id_priv->id.qp && id_priv->id.ps != RDMA_PS_SDP) {
 			status = cma_rep_recv(id_priv);
 			event = status ? RDMA_CM_EVENT_CONNECT_ERROR :
 					 RDMA_CM_EVENT_ESTABLISHED;

From weiny2 at llnl.gov Tue May 9 15:25:07 2006
From: weiny2 at llnl.gov (Ira Weiny)
Date: Tue, 09 May 2006 15:25:07 -0700
Subject: [openib-general] svn 6829 version issue with rdma_ucm and userspace?
Message-ID: <20060509152507.7d777e24.weiny2@llnl.gov>

I have been struggling with getting svn6829 to work and this is one of the latest issues.

# odev1 /root > simple_rdma -S
librdmacm: kernel ABI version 0 doesn't match library version 1.
Failed to create rdma_cm_id

This is a little rdma app I wrote which uses the rdma_cm userspace lib.

The code was pulled from svn6829 (both kernel and user level). I had to make a couple of minor tweaks (beyond Woody's backport) to get it to compile on our 2.6.9 based kernel but nothing which would cause the above AFAICT.

Am I not supposed to use the kernel code in svn6829?

Thanks,
Ira Weiny

From mshefty at ichips.intel.com Tue May 9 15:39:39 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 09 May 2006 15:39:39 -0700
Subject: [openib-general] svn 6829 version issue with rdma_ucm and userspace?
In-Reply-To: <20060509152507.7d777e24.weiny2@llnl.gov>
References: <20060509152507.7d777e24.weiny2@llnl.gov>
Message-ID: <44611A2B.4080703@ichips.intel.com>

Ira Weiny wrote:
> I have been struggling with getting svn6829 to work and this is one of the
> latest issues.
>
> # odev1 /root > simple_rdma -S
> librdmacm: kernel ABI version 0 doesn't match library version 1.
> Failed to create rdma_cm_id
>
> This is a little rdma app I wrote which uses the rdma_cm userspace lib.
>
> The code was pulled from svn6829 (both kernel and user level). I had to make a
> couple of minor tweaks (beyond Woody's backport) to get it to compile on our
> 2.6.9 based kernel but nothing which would cause the above AFAICT.
>
> Am I not supposed to use the kernel code in svn6829?

This looks like a backport issue. The library may be trying to read the abi_version from the wrong location for the backport.

Normally, the abi_version file is found in /sys/class/misc/rdma_cm/abi_version. I believe that the backport patch moved this under something like /sys/class/infiniband, which would require the library to be updated to read from the new location. Woody may have more information.

- Sean

From weiny2 at llnl.gov Tue May 9 15:59:46 2006
From: weiny2 at llnl.gov (Ira Weiny)
Date: Tue, 09 May 2006 15:59:46 -0700
Subject: [openib-general] svn 6829 version issue with rdma_ucm and userspace?
In-Reply-To: <44611A2B.4080703@ichips.intel.com>
References: <20060509152507.7d777e24.weiny2@llnl.gov> <44611A2B.4080703@ichips.intel.com>
Message-ID: <20060509155946.6a3ae802.weiny2@llnl.gov>

Sean,

Thanks, that helped. Changing /sys/class/misc/rdma_cm/abi_version to /sys/class/infiniband_ucma/abi_version in the user lib fixed the problem.

I wonder though if this is the correct solution?

Ira

On Tue, 09 May 2006 15:39:39 -0700
Sean Hefty wrote:

> Ira Weiny wrote:
> > I have been struggling with getting svn6829 to work and this is one
> > of the latest issues.
> >
> > # odev1 /root > simple_rdma -S
> > librdmacm: kernel ABI version 0 doesn't match library version 1.
> > Failed to create rdma_cm_id
> >
> > This is a little rdma app I wrote which uses the rdma_cm userspace
> > lib.
> >
> > The code was pulled from svn6829 (both kernel and user level).
> > I had to make a couple of minor tweaks (beyond Woody's backport) to
> > get it to compile on our 2.6.9 based kernel but nothing which would
> > cause the above AFAICT.
> >
> > Am I not supposed to use the kernel code in svn6829?
>
> This looks like a backport issue. The library may be trying to read
> the abi_version from the wrong location for the backport.
>
> Normally, the abi_version file is found
> in /sys/class/misc/rdma_cm/abi_version. I believe that the backport
> patch moved this under something like /sys/class/infiniband, which
> would require the library to be updated to read from the new
> location. Woody may have more information.
>
> - Sean
>

From robert.j.woodruff at intel.com Tue May 9 16:00:06 2006
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Tue, 9 May 2006 16:00:06 -0700
Subject: [openib-general] svn 6829 version issue with rdma_ucm and userspace?
In-Reply-To: <20060509152507.7d777e24.weiny2@llnl.gov>
Message-ID: <000001c673bc$515b6d70$010fa8c0@amr.corp.intel.com>

Ira wrote,
># odev1 /root > simple_rdma -S
>librdmacm: kernel ABI version 0 doesn't match library version 1.
>Failed to create rdma_cm_id

Sean is correct, the backport puts the abi_version in a different place to be consistent with how the other kernel modules do it. Try applying this patch to your userspace code and rebuilding the librdmacm usermode library. I forgot to put this patch into SVN but will do so.
diff -Naurp userspace/librdmacm/src/cma.c userspace-fixups/librdmacm/src/cma.c
--- userspace/librdmacm/src/cma.c 2006-04-27 09:34:16.000000000 -0700
+++ userspace-fixups/librdmacm/src/cma.c 2006-05-01 14:46:34.000000000 -0700
@@ -151,7 +151,7 @@ static int check_abi_version(void)
 		return -ENODEV;
 	}
 
-	strncat(path, "/class/misc/rdma_cm/abi_version", sizeof path);
+	strncat(path, "/class/infiniband_ucma/abi_version", sizeof path);
 	if (!sysfs_read_attribute_value(path, val, sizeof val))
 		abi_ver = strtol(val, NULL, 10);

From segher at kernel.crashing.org Tue May 9 16:35:57 2006
From: segher at kernel.crashing.org (Segher Boessenkool)
Date: Wed, 10 May 2006 01:35:57 +0200
Subject: [openib-general] [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To:
References: <4450A196.2050901@de.ibm.com> <445B4DA9.9040601@de.ibm.com> <44608C90.30909@de.ibm.com>
Message-ID: <75CCC04D-06EF-48B6-BE76-8BFAA541A764@kernel.crashing.org>

> Heiko> Yes, I agree. It would not be an optimal solution, because
> Heiko> other upper level protocols (e.g. SDP, SRP, etc.) or
> Heiko> userspace verbs would not be affected by this
> Heiko> changes. Nevertheless, how can an improved "scaling" or
> Heiko> "SMP" version of IPoIB look like. How could it be
> Heiko> implemented?
>
> The trivial way to do it would be to use the same idea as the current
> ehca driver: just create a thread for receive CQ events and a thread
> for send CQ events, and defer CQ polling into those two threads.
>
> Something even better may be possible by specializing to IPoIB of
> course.

The hardware IRQ should go to some CPU close to the hardware itself. The softirq (or whatever else) should go to the same CPU that is handling the user-level task for that message. Or a CPU close to it, at least.
Segher

From bugzilla-daemon at openib.org Tue May 9 17:44:26 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 9 May 2006 17:44:26 -0700 (PDT)
Subject: [openib-general] [Bug 78] New: OFED 1.0 RC 4 iser install fails if patches already applied
Message-ID: <20060510004426.9755B2283E8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=78

Summary: OFED 1.0 RC 4 iser install fails if patches already applied
Product: OpenFabrics Linux
Version: gen2
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: iSER
AssignedTo: bugzilla at openib.org
ReportedBy: kball at pathscale.com

The install script does not correctly handle the case where the iser patches have already been applied on a system. This either needs to be fixed in the install script or the uninstall script needs to remove the patches when it uninstalls, because I cannot use the install script to install iSER on a system where I tested rc3. It tries to install the patch and fails out when it finds 'patch' failing. 'patch' is failing saying "these patches are already installed!"

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From ralphc at pathscale.com Tue May 9 18:08:31 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Tue, 09 May 2006 18:08:31 -0700
Subject: [openib-general] Pathscale HCA can benefit using 2k MTU
Message-ID: <1147223311.1062.42.camel@brick.internal.keyresearch.com>

There is no need to limit the MTU to 1K on Pathscale HCAs. This patch fixes the problem.
Signed-off-by: Ralph Campbell

Index: src/userspace/mpi/mvapich-gen2/mpid/ch_gen2/ibverbs_const.h
===================================================================
--- src/userspace/mpi/mvapich-gen2/mpid/ch_gen2/ibverbs_const.h (revision 7001)
+++ src/userspace/mpi/mvapich-gen2/mpid/ch_gen2/ibverbs_const.h (working copy)
@@ -66,7 +66,7 @@
 
 #define HOSTNAME_LEN (255)
 
-#if (defined(_MLX_PCI_EX_DDR_) || defined(_IBM_EHCA_))
+#if (defined(_MLX_PCI_EX_DDR_) || defined(_IBM_EHCA_) || defined(_PATH_HT_))
 #define VIADEV_DEFAULT_MTU (IBV_MTU_2048)

--
Ralph Campbell

From Thomas.Talpey at netapp.com Tue May 9 18:58:09 2006
From: Thomas.Talpey at netapp.com (Talpey, Thomas)
Date: Tue, 09 May 2006 21:58:09 -0400
Subject: [openib-general] ip over ib throughtput
In-Reply-To:
References: <7.0.1.0.2.20060509165536.04336f80@netapp.com>
Message-ID: <7.0.1.0.2.20060509213436.07b2c928@netapp.com>

At 05:47 PM 5/9/2006, Shirley Ma wrote:
>Thanks for sharing these test results.
>
>The netperf/netserver IPoIB over UD mode test spent most of time on copying data from user to kernel + checksum(csum_partial_copy_generic), and it only can send no more than mtu=2044 ib_post_send() per wiki, which definitely limits its performance compared to RDMA read/write. I would expect NFS/RDMA throughput much better than IPoIB over UD.

Actually, I got excellent results in regular cached mode too, which results in one data copy from the file page cache to user space. (In NFS O_DIRECT, the RDMA is targeted at the user pages, bypassing the cache and yielding zero-copy zero-touch even though the I/O is kernel mediated by the NFS stack.)

Throughput remains as high as in the direct case (because it's still not CPU limited), and utilization rises to a number less than you might expect - 65%. Specifically, the cached i/o test used 79us/32KB, and the direct i/o used 56us/32KB.

Of course, the NFS/RDMA copies do not need to compute the checksum, so they are more efficient than the socket atop IPoIB.
But I am not sure that the payload per WQE is important. We are nowhere near the op rate of the adapter. I think the more important factor is the interrupt rate. NFS/RDMA allows the client to take a single interrupt (the server reply) after all RDMA has occurred. Also, the client uses unsignalled completion on as many sends as possible. I believe I measured 0.7 interrupts per NFS op in my tests.

Well, I have been very pleased with the results so far! We'll have more detail as we go.

Tom.

From bugzilla-daemon at openib.org Tue May 9 19:54:43 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 9 May 2006 19:54:43 -0700 (PDT)
Subject: [openib-general] [Bug 68] OFED 1.0 rc4: kernel build failed in IB core on SUSE10
Message-ID: <20060510025443.B500A2283D5@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=68

kball at pathscale.com changed:

What     |Removed |Added
----------------------------------------------------------------------------
Severity |normal  |critical

------- Additional Comments From kball at pathscale.com 2006-05-09 19:54 -------
Since this is blocking my testing on one of the claimed to be supported distros, I'm raising the severity on this.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From xma at us.ibm.com Tue May 9 20:13:29 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 9 May 2006 20:13:29 -0700
Subject: [openib-general] ip over ib throughtput
In-Reply-To: <7.0.1.0.2.20060509213436.07b2c928@netapp.com>
Message-ID:

"Talpey, Thomas" wrote on 05/09/2006 06:58:09 PM:

> Of course, the NFS/RDMA copies do not need to compute the checksum,
> so they are more efficient than the socket atop IPoIB. But I am not
> sure that the payload per WQE is important. We are nowhere near the
> op rate of the adapter. I think the more important factor is the interrupt
> rate.
> NFS/RDMA allows the client to take a single interrupt (the server
> reply) after all RDMA has occurred. Also, the client uses unsignalled
> completion on as many sends as possible. I believe I measured 0.7
> interrupts per NFS op in my tests.
>
> Well, I have been very pleased with the results so far! We'll have more
> detail as we go.
>
> Tom.

Computing the checksum is expensive on the receiver; on the sender it's free. Yes, a high interrupt rate is another performance killer. But I think payload is important: with large MTU support in IPoIB RC mode, the performance would be much better.

Have you tried to send a payload smaller than 2044? Any difference?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From tom at opengridcomputing.com Tue May 9 20:18:00 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Tue, 09 May 2006 22:18:00 -0500
Subject: [openib-general] rdma_cm.h: comment nits.
In-Reply-To: <20060508135855.GE21036@mellanox.co.il>
References: <20060508135855.GE21036@mellanox.co.il>
Message-ID: <1147231080.5093.13.camel@trinity.ogc.int>

On Mon, 2006-05-08 at 16:58 +0300, Michael S. Tsirkin wrote:
> Two nits wrt rdma_cm.h:
>
> /**
> * * rdma_reject - Called on the passive side to reject a connection request.
> */
>
>
> Its OK to call rdma_reject on active side as well, isn't it?

You'll get -EINVAL on iWARP if you do this....

> /**
> * rdma_cm_event_handler - Callback used to report user events.
> *
> * Notes: Users may not call rdma_destroy_id from this callback to destroy
> * the passed in id, or a corresponding listen id. Returning a
> * non-zero value from the callback will destroy the corresponding id.
> */
>
> CMA will actually always destroy the passed in id, not the "corresponding id".

From mst at mellanox.co.il Tue May 9 22:29:37 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Wed, 10 May 2006 08:29:37 +0300
Subject: [openib-general] Re: [PATCH] RE: compliancy issue?
In-Reply-To:
References: <20060508085301.GD20207@mellanox.co.il>
Message-ID: <20060510052937.GH22825@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: [PATCH] RE: compliancy issue?
>
> >CA4-24.2.3: The connecting peer shall terminate the connection attempt
> >if ExtMaxAdverts of the HAH is set to zero.
> >
> >This means that SDP must examine the HAH before RTU is sent.
> >But, CMA currently sends RTU from cma_rep_recv, before notifying
> >the user.
>
> Can you try this simple patch and see if it fixes your problem? You will
> need to call rdma_accept() or rdma_reject() after receiving a CONNECT_RESPONSE
> event. The conn_param to rdma_accept() should be NULL.
>
> Signed-off-by: Sean Hefty
> ---
> Index: cma.c
> ===================================================================
> --- cma.c (revision 6948)
> +++ cma.c (working copy)
> @@ -778,7 +778,7 @@ static int cma_ib_handler(struct ib_cm_i
> 		status = cma_verify_rep(id_priv, ib_event->private_data);
> 		if (status)
> 			event = RDMA_CM_EVENT_CONNECT_ERROR;
> -		else if (id_priv->id.qp) {
> +		else if (id_priv->id.qp && id_priv->id.ps != RDMA_PS_SDP) {
> 			status = cma_rep_recv(id_priv);
> 			event = status ? RDMA_CM_EVENT_CONNECT_ERROR :
> 					 RDMA_CM_EVENT_ESTABLISHED;
>

Would not cma_rep_recv be required to modify the QP?
Anyway, I'll try.

--
MST

From mst at mellanox.co.il Tue May 9 22:33:41 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 10 May 2006 08:33:41 +0300
Subject: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
In-Reply-To:
References: <20060509202041.GB24713@mellanox.co.il>
Message-ID: <20060510053341.GI22825@mellanox.co.il>

Quoting r. Shirley Ma :
> Subject: Re: [openib-general] Re: [PATCH 07/16] ehca: interrupt handling routines
>
>
> "Michael S. Tsirkin" wrote on 05/09/2006 01:20:41 PM:
>
> > Quoting r.
Shirley Ma :
> > > My understanding is NAPI handle interrupts CQ callbacks on the same CPU.
> >
> > My understanding is NAPI disables interrupts under high RX load. No?
>
> Yes, NAPI disables the interrupts based on the weight. In IPoIB case, it doesn't
> send out the next completion notification under heavy loading.
> The similar CQ polling is still in NAPI on same CPU, but it's not a callback
> anymore.

Sorry, same CPU as what?

--
MST

From mst at mellanox.co.il Tue May 9 22:43:13 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 10 May 2006 08:43:13 +0300
Subject: [openib-general] Re: [PATCH] RE: compliancy issue?
In-Reply-To: <20060510052937.GH22825@mellanox.co.il>
References: <20060508085301.GD20207@mellanox.co.il> <20060510052937.GH22825@mellanox.co.il>
Message-ID: <20060510054313.GJ22825@mellanox.co.il>

Quoting r. Michael S. Tsirkin :
> Subject: Re: [PATCH] RE: compliancy issue?
>
> Quoting r. Sean Hefty :
> > Subject: [PATCH] RE: compliancy issue?
> >
> > >CA4-24.2.3: The connecting peer shall terminate the connection attempt
> > >if ExtMaxAdverts of the HAH is set to zero.
> > >
> > >This means that SDP must examine the HAH before RTU is sent.
> > >But, CMA currently sends RTU from cma_rep_recv, before notifying
> > >the user.
> >
> > Can you try this simple patch and see if it fixes your problem? You will
> > need to call rdma_accept() or rdma_reject() after receiving a CONNECT_RESPONSE
> > event. The conn_param to rdma_accept() should be NULL.
> >
> > Signed-off-by: Sean Hefty
> > ---
> > Index: cma.c
> > ===================================================================
> > --- cma.c (revision 6948)
> > +++ cma.c (working copy)
> > @@ -778,7 +778,7 @@ static int cma_ib_handler(struct ib_cm_i
> > 		status = cma_verify_rep(id_priv, ib_event->private_data);
> > 		if (status)
> > 			event = RDMA_CM_EVENT_CONNECT_ERROR;
> > -		else if (id_priv->id.qp) {
> > +		else if (id_priv->id.qp && id_priv->id.ps != RDMA_PS_SDP) {
> > 			status = cma_rep_recv(id_priv);
> > 			event = status ? RDMA_CM_EVENT_CONNECT_ERROR :
> > 					 RDMA_CM_EVENT_ESTABLISHED;
> >
>
> Would not cma_rep_recv be required to modify the QP?
> Anyway, I'll try.

Hmm, it seems that QP will get modified to RTS on RTU only. That's bad - I want to move it to RTS on REP, since receive completion might cross RTU and I want to be able to respond to each send immediately.

I'll do some more tests.

--
MST

From jackm at mellanox.co.il Tue May 9 23:56:30 2006
From: jackm at mellanox.co.il (Jack Morgenstein)
Date: Wed, 10 May 2006 09:56:30 +0300
Subject: [openib-general] [PATCH 3/3] librdmacm: add ability to get/set transport specific options
In-Reply-To: <4460C271.6060304@ichips.intel.com>
References: <200605091223.39564.jackm@mellanox.co.il> <4460C271.6060304@ichips.intel.com>
Message-ID: <200605100956.30583.jackm@mellanox.co.il>

On Tuesday 09 May 2006 19:25, Sean Hefty wrote:
>
> This is fine for route lookup, but not rdma_get_option.
>

I missed adding rdma_ib.c to my tag file, so missed the rdma_get_option usage of local_sa.c -- sorry about that.

My preference is to take the most recent rdma_cm, since there are several important bug fixes since svn 6860; also, we avoid the ABI issue.
Sounds like I should add a dummy "ib_local_sa.h" file to OFED kernel build, with static-inlines for all functions exported by local_sa.c -- and have these functions return -ENOSYS (as you suggested), and instead modify cma_resolve_ib_route() as follows:

Index: cma.c
===================================================================
--- cma.c (revision 6980)
+++ cma.c (working copy)
@@ -1172,7 +1172,7 @@ static int cma_resolve_ib_route(struct r
 		route->num_paths = 1;
 		queue_work(cma_wq, &work->work);
 	} else {
-		if (ret == -ENODATA)
+		if (ret == -ENODATA || ret == -ENOSYS)
 			ret = cma_query_ib_route(id_priv, timeout_ms, work);
 		if (ret)
 			goto err2;

- Jack

P.S., (maybe it would be a good idea to have this change in any event on the main trunk, and not just for OFED?)

From mst at mellanox.co.il Wed May 10 00:17:19 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 10 May 2006 10:17:19 +0300
Subject: [openib-general] Re: [PATCH 3/3] librdmacm: add ability to get/set transport specific options
In-Reply-To: <200605100956.30583.jackm@mellanox.co.il>
References: <200605091223.39564.jackm@mellanox.co.il> <4460C271.6060304@ichips.intel.com> <200605100956.30583.jackm@mellanox.co.il>
Message-ID: <20060510071719.GN21036@mellanox.co.il>

Quoting r. Jack Morgenstein :
> Subject: Re: [PATCH 3/3] librdmacm: add ability to get/set transport specific options
>
> On Tuesday 09 May 2006 19:25, Sean Hefty wrote:
> >
> > This is fine for route lookup, but not rdma_get_option.
> >
> I missed adding rdma_ib.c to my tag file, so missed the rdma_get_option usage
> of local_sa.c -- sorry about that.
>
> My preference is to take the most recent rdma_cm, since there are several
> important bug fixes since svn 6860; also, we avoid the ABI issue.
>
> Sounds like I should add a dummy "ib_local_sa.h" file to OFED kernel build,
> with static-inlines for all functions exported by local_sa.c -- and have
> these functions return -ENOSYS (as you suggested), and instead modify
> cma_resolve_ib_route() as follows:
>
> Index: cma.c
> ===================================================================
> --- cma.c (revision 6980)
> +++ cma.c (working copy)
> @@ -1172,7 +1172,7 @@ static int cma_resolve_ib_route(struct r
> 		route->num_paths = 1;
> 		queue_work(cma_wq, &work->work);
> 	} else {
> -		if (ret == -ENODATA)
> +		if (ret == -ENODATA || ret == -ENOSYS)
> 			ret = cma_query_ib_route(id_priv, timeout_ms, work);
> 		if (ret)
> 			goto err2;
>
> - Jack
>
> P.S.,
> (maybe it would be a good idea to have this change in any event on the main
> trunk, and not just for OFED?)

Maybe just return -ENODATA? Then you don't need to modify any code ...

--
MST

From jackm at mellanox.co.il Wed May 10 00:38:31 2006
From: jackm at mellanox.co.il (Jack Morgenstein)
Date: Wed, 10 May 2006 10:38:31 +0300
Subject: [openib-general] Re: [PATCH 3/3] librdmacm: add ability to get/set transport specific options
In-Reply-To: <20060510071719.GN21036@mellanox.co.il>
References: <200605100956.30583.jackm@mellanox.co.il> <20060510071719.GN21036@mellanox.co.il>
Message-ID: <200605101038.31541.jackm@mellanox.co.il>

On Wednesday 10 May 2006 10:17, Michael S. Tsirkin wrote:
> Maybe just return -ENODATA? Then you don't need to modify any code ...

Userspace rdma_get_option() will then also get -ENODATA. OK.

We can, therefore, do the following: the dummy procedures in the dummy ib_local_sa.h file will return -ENODATA for all get operations and for ib_create_path_cursor(), and -ENOSYS for all set operations. Then, no changes will be needed (except for adding the dummy file ib_local_sa.h).

Is this acceptable?
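
The stub scheme described just above can be sketched in a few lines of plain C. This is only an illustration under stated assumptions, not the actual OFED header: ib_create_path_cursor() is the one function named in the thread, its real signature is not shown there, and the argument-free stubs, the two *_option_stub names, and the *_sketch helper are all hypothetical.

```c
#include <errno.h>

/*
 * Illustrative stubs only. The real ib_local_sa.h prototypes are not
 * shown in the thread, so every name and signature here is an assumption.
 */
static inline int ib_create_path_cursor_stub(void)
{
	return -ENODATA;	/* behaves like an empty local cache */
}

static inline int local_sa_get_option_stub(void)
{
	return -ENODATA;	/* get operations report "no data" */
}

static inline int local_sa_set_option_stub(void)
{
	return -ENOSYS;		/* set operations report "not implemented" */
}

/* Stand-in for cma_query_ib_route(); pretends the SA query was issued. */
static int query_ib_route_stub(void)
{
	return 0;
}

/*
 * Mirrors the unmodified test in cma_resolve_ib_route(): only -ENODATA
 * from the local cache falls back to a regular SA path query.
 */
static int resolve_ib_route_sketch(void)
{
	int ret = ib_create_path_cursor_stub();

	if (ret == -ENODATA)
		ret = query_ib_route_stub();
	return ret;
}
```

Because the cursor stub returns -ENODATA rather than -ENOSYS, the existing fallback branch is taken without patching cma.c, which appears to be the point of the -ENODATA suggestion.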
- Jack

From ogerlitz at voltaire.com Wed May 10 00:48:25 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 10 May 2006 10:48:25 +0300
Subject: [openib-general] Re: [PATCH] RE: compliancy issue?
In-Reply-To: <20060510054313.GJ22825@mellanox.co.il>
References: <20060508085301.GD20207@mellanox.co.il> <20060510052937.GH22825@mellanox.co.il> <20060510054313.GJ22825@mellanox.co.il>
Message-ID: <44619AC9.2040402@voltaire.com>

Michael S. Tsirkin wrote:
>>> Can you try this simple patch and see if it fixes your problem? You will
>>> need to call rdma_accept() or rdma_reject() after receiving a CONNECT_RESPONSE
>>> event. The conn_param to rdma_accept() should be NULL.
>>>
>>> Signed-off-by: Sean Hefty
>>> ---
>>> Index: cma.c
>>> ===================================================================
>>> --- cma.c (revision 6948)
>>> +++ cma.c (working copy)
>>> @@ -778,7 +778,7 @@ static int cma_ib_handler(struct ib_cm_i
>>> 		status = cma_verify_rep(id_priv, ib_event->private_data);
>>> 		if (status)
>>> 			event = RDMA_CM_EVENT_CONNECT_ERROR;
>>> -		else if (id_priv->id.qp) {
>>> +		else if (id_priv->id.qp && id_priv->id.ps != RDMA_PS_SDP) {
>>> 			status = cma_rep_recv(id_priv);
>>> 			event = status ? RDMA_CM_EVENT_CONNECT_ERROR :
>>> 					 RDMA_CM_EVENT_ESTABLISHED;
>>>
>> Would not cma_rep_recv be required to modify the QP?
>> Anyway, I'll try.
>
> Hmm, it seems that QP will get modified to RTS on RTU only.
> That's bad - I want to move it to RTS on REP, since receive completion
> might cross RTU and I want to be able to respond to each send immediately.

No, looking in the code shows that qp will be changed to rtr and then rts ***before*** sending the RTU since you will call rdma_accept which in turn will call cma_rep_recv

Or.

From mst at mellanox.co.il Wed May 10 01:05:33 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 10 May 2006 11:05:33 +0300
Subject: [openib-general] Re: [PATCH] RE: compliancy issue?
In-Reply-To: <44619AC9.2040402@voltaire.com>
References: <20060508085301.GD20207@mellanox.co.il> <20060510052937.GH22825@mellanox.co.il> <20060510054313.GJ22825@mellanox.co.il> <44619AC9.2040402@voltaire.com>
Message-ID: <20060510080532.GO21036@mellanox.co.il>

Quoting r. Or Gerlitz :
> >Hmm, it seems that QP will get modified to RTS on RTU only.
> >That's bad - I want to move it to RTS on REP, since receive completion
> >might cross RTU and I want to be able to respond to each send immediately.
>
> No, looking in the code shows that qp will be changed to rtr and then
> rts ***before*** sending the RTU since you will call rdma_accept which
> in turn will call cma_rep_recv

Right, missed that, thanks!
I was wondering why it was not behaving the way I expected it to :)

--
MST

From mst at mellanox.co.il Wed May 10 01:32:56 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 10 May 2006 11:32:56 +0300
Subject: [openib-general] [PATCH] kill dead code in mthca_eq.c
Message-ID: <20060510083256.GP21036@mellanox.co.il>

Kill some dead code in mthca_eq.c

Signed-off-by: Michael S.
Tsirkin

Index: linux-2.6.16/drivers/infiniband/hw/mthca/mthca_eq.c
===================================================================
--- linux-2.6.16.orig/drivers/infiniband/hw/mthca/mthca_eq.c 2006-04-06 17:49:46.000000000 +0200
+++ linux-2.6.16/drivers/infiniband/hw/mthca/mthca_eq.c 2006-05-10 13:18:41.000000000 +0300
@@ -695,10 +695,6 @@ static void mthca_unmap_reg(struct mthca
 
 static int __devinit mthca_map_eq_regs(struct mthca_dev *dev)
 {
-	unsigned long mthca_base;
-
-	mthca_base = pci_resource_start(dev->pdev, 0);
-
 	if (mthca_is_memfree(dev)) {
 		/*
 		 * We assume that the EQ arm and EQ set CI registers

--
MST

From leonida at voltaire.com Wed May 10 01:32:34 2006
From: leonida at voltaire.com (Leonid Arsh)
Date: Wed, 10 May 2006 11:32:34 +0300
Subject: [openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters
In-Reply-To:
References: <20060509120613.GA3294@voltaire.com>
Message-ID: <10e223bf0605100132h40d88edfv9b756eceb519047c@mail.gmail.com>

On 5/9/06, Roland Dreier wrote:
> Why do you need this possibility? Having counters reset locally is
> going to confuse any performance manager running on the fabric that
> might be reading the counters remotely.
>
> Is it really a must for this to be done by writing to sysfs? Couldn't
> a trivial userspace app send a MAD locally to reset them?

We need it for diagnostic purposes. The capability to reset counters locally on hosts helped us much in large network/cluster diagnostics. Often, in case of fabric problems, we can only access the counters locally.

A user space application is an option too, although I think it's nice to have a 'built in' kernel feature.
From tziporet at mellanox.co.il Wed May 10 02:10:38 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 10 May 2006 12:10:38 +0300
Subject: [openib-general][patch review] srp: fmr implementation,
In-Reply-To:
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com>
Message-ID: <4461AE0E.8070205@mellanox.co.il>

Roland Dreier wrote:
> It seems good. I should check it into svn today. I still need to
> figure out how I want to put it on the for-2.6.18 git branch, since it
> depends on some earlier fixes that Linus hasn't pulled yet.
>
> - R.
>
>

Thanks, frankly I do not care about 2.6.18 git for now since OFED is built on 2.6.17 + patches from trunk.

Tziporet

From zhushisongzhu at yahoo.com Wed May 10 02:51:22 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Wed, 10 May 2006 02:51:22 -0700 (PDT)
Subject: [openib-general] Re: sdp can't support many connections (>2000)
In-Reply-To: <20060509121300.GC21036@mellanox.co.il>
Message-ID: <20060510095122.18845.qmail@web36904.mail.mud.yahoo.com>

I can't get the latest source from "svn co https://openfabrics.org/svn/gen2" in one whole day, it's so slow. Do you think the latest source solves the problem? Or can you test sdp for > 2000 concurrent connections?
tks
zhu

--- "Michael S. Tsirkin" wrote:

> Quoting r. zhu shi song :
> > Subject: Re: sdp can't support many connections (>2000)
> >
> > ab send the request to squid cache server running on
> > Machine B. Then squid send the real request to google
> > website.
> > So how can I upgrade my version to solve the
> > problem?
> >
> > zhu
>
> Try getting latest stack snapshot from svn.
>
> --
> MST
>

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo!
Mail has the best spam protection around
http://mail.yahoo.com

From Thomas.Talpey at netapp.com Wed May 10 03:49:28 2006
From: Thomas.Talpey at netapp.com (Talpey, Thomas)
Date: Wed, 10 May 2006 06:49:28 -0400
Subject: [openib-general][patch review] srp: fmr implementation,
In-Reply-To:
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com>
Message-ID: <7.0.1.0.2.20060510063730.04336f80@netapp.com>

At 03:12 PM 5/9/2006, Roland Dreier wrote:
>BTW, does Mellanox (or anyone else) have any numbers showing that
>using FMRs makes any difference in performance on a semi-realistic benchmark?

Not me. Using the current FMRs to register/deregister windows for each NFS/RDMA operation yields only a slight performance improvement over ib_reg_phys_mr(), and I suspect this is mainly from the fact that FMRs are page-rounded. Additionally, I find that the queuepair (or perhaps the completion queue) seems to hang unpredictably, new events get stuck, only to flush after the upper layer times out and closes the connection.

What I really don't like about the current FMRs is that they seem to be optimized only for lazy-deregistration, the fmr pools attempt to defer the deregistration somewhat indefinitely. This is an enormous security hole, and pretty much defeats the point of dynamic registration. The NFS/RDMA client has full-physical mode for users that want speed in well-protected environments. And it's a LOT faster.

I am planning to test this some more in the next few weeks, but what I'd really like to see is an IBTA 1.2-compliant implementation, and one that operated on work queue entries (not synchronous verbs). Is that being worked on?

Tom.
From halr at voltaire.com Wed May 10 03:48:35 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 May 2006 06:48:35 -0400 Subject: [openib-general] [PATCH] OpenSM: Add SA client API for MultiPathRecord queries In-Reply-To: <1147194536.4485.22684.camel@hal.voltaire.com> References: <1147194536.4485.22684.camel@hal.voltaire.com> Message-ID: <1147258114.4485.40207.camel@hal.voltaire.com> On Tue, 2006-05-09 at 13:08, Hal Rosenstock wrote: > OpenSM: Add SA client API for MultiPathRecord queries > > Signed-off-by: Hal Rosenstock Applied to trunk only. -- Hal From Thomas.Talpey at netapp.com Wed May 10 03:53:04 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 10 May 2006 06:53:04 -0400 Subject: [openib-general] ip over ib throughtput In-Reply-To: References: <7.0.1.0.2.20060509213436.07b2c928@netapp.com> Message-ID: <7.0.1.0.2.20060510065006.07b2c928@netapp.com> At 11:13 PM 5/9/2006, Shirley Ma wrote: >Have you tried to send payload smaller than 2044? Any difference? You mean MTU or ULP payload? The default NFS reads and writes are 32KB, and in the addressing mode used in these tests they were broken into 8 page-sized RDMA ops. So, there were 9 ops from the server, per NFS read. I used the default MTU so these were probably 19 messages on the wire. I don't expect much difference with smaller MTU, but smaller NFS ops would be noticeable. Tom. From halr at voltaire.com Wed May 10 04:08:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 May 2006 07:08:07 -0400 Subject: [openib-general] [PATCH] osmtest: Add rudimentary SA MultiPathRecord tests In-Reply-To: <1147195352.4485.22894.camel@hal.voltaire.com> References: <1147195352.4485.22894.camel@hal.voltaire.com> Message-ID: <1147258116.4485.40209.camel@hal.voltaire.com> On Tue, 2006-05-09 at 13:22, Hal Rosenstock wrote: > osmtest: Add rudimentary SA MultiPathRecord tests > > Signed-off-by: Hal Rosenstock Applied to trunk only. 
--
Hal

From mst at mellanox.co.il Wed May 10 04:13:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 10 May 2006 14:13:55 +0300
Subject: [openib-general] Re: sdp can't support many connections (>2000)
In-Reply-To: <20060510095122.18845.qmail@web36904.mail.mud.yahoo.com>
References: <20060509121300.GC21036@mellanox.co.il> <20060510095122.18845.qmail@web36904.mail.mud.yahoo.com>
Message-ID: <20060510111355.GR21036@mellanox.co.il>

Quoting r. zhu shi song :
> Subject: Re: sdp can't support many connections (>2000)
>
> I can't get the latest source from "
> svn co https://openfabrics.org/svn/gen2" in one whole
> day, it's so slow.

I use openib.org/svn/gen2, but I expect it's just a redirection. Hmm. We'll be putting up a tarball about Monday, I think.

> Do you think the latest source solve the problem?

It should.

> Or
> can you test sdp for > 2000 concurrent connections?
> tks
> zhu

I'll try to test it sometime next week; busy now.

> --- "Michael S. Tsirkin" wrote:
>
> > Quoting r. zhu shi song :
> > > Subject: Re: sdp can't support many connections (>2000)
> > >
> > > ab send the request to squid cache server running on
> > > Machine B. Then squid send the real request to google
> > > website.
> > > So how can I upgrade my version to solve the
> > > problem?
> > >
> > > zhu
> >
> > Try getting latest stack snapshot from svn.
> >
> > --
> > MST

--
MST

From halr at voltaire.com Wed May 10 04:11:23 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 May 2006 07:11:23 -0400
Subject: [openib-general] Re: [PATCH 1/3] opensm: low-level QoS configuration
In-Reply-To: <20060509181548.14584.39036.stgit@sashak.voltaire.com>
References: <20060509180059.14584.31483.stgit@sashak.voltaire.com> <20060509181548.14584.39036.stgit@sashak.voltaire.com>
Message-ID: <1147259285.4485.40546.camel@hal.voltaire.com>

On Tue, 2006-05-09 at 14:15, Sasha Khapyorsky wrote:
> Trivial low-level QoS configuration parameters description, definition
> and processing.
>
> Signed-off-by: Sasha Khapyorsky

Thanks. Applied to trunk only.

--
Hal

From mst at mellanox.co.il Wed May 10 04:23:03 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 10 May 2006 14:23:03 +0300
Subject: [openib-general] [PATCH] librdmacm abi version
Message-ID: <20060510112303.GS21036@mellanox.co.il>

Sean, could you take a look please?

I think it makes sense to make the same librdmacm work on as many kernels as possible, so that it's seamless for people to switch kernels. And since backport users install kernel modules or patches anyway, I think it's reasonable to ask them to always install the latest version.

---

On kernels 2.6.9 and back, we didn't find a way to add sysfs attributes to misc devices. If the abi version file does not exist, assume latest ABI to make it possible to use librdmacm on such systems.

Signed-off-by: Jack Morgenstein
Signed-off-by: Michael S.
Tsirkin Index: latest/src/userspace/librdmacm/src/cma.c =================================================================== --- latest.orig/src/userspace/librdmacm/src/cma.c (revision 7031) +++ latest/src/userspace/librdmacm/src/cma.c (working copy) @@ -119,7 +119,7 @@ static struct ibv_device **dev_list; static struct dlist *cma_dev_list; static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER; static int ucma_initialized; -static int abi_ver; +static int abi_ver = RDMA_USER_CM_MAX_ABI_VERSION; #define container_of(ptr, type, field) \ ((type *) ((void *)ptr - offsetof(type, field))) ----- End forwarded message ----- -- MST From halr at voltaire.com Wed May 10 04:28:06 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 May 2006 07:28:06 -0400 Subject: [openib-general] Re: [PATCH 2/3] opensm: basic QoS implementation In-Reply-To: <20060509181550.14584.19176.stgit@sashak.voltaire.com> References: <20060509180059.14584.31483.stgit@sashak.voltaire.com> <20060509181550.14584.19176.stgit@sashak.voltaire.com> Message-ID: <1147259288.4485.40548.camel@hal.voltaire.com> On Tue, 2006-05-09 at 14:15, Sasha Khapyorsky wrote: > Basic low-level QoS implementation. The main procedure (osm_qos_setup()) > will be called from resweeper (after configuration refreshing). And > then this will setup low level QoS related ports' attributes > (PortInfo:VLHighLimit, VL*Arbitration and SL2VLMapping tables). > Different port categories (HCA, switch external ports and switch port 0) > will be updated according to provided configurations. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. -- Hal From dotanb at mellanox.co.il Wed May 10 05:22:42 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 10 May 2006 15:22:42 +0300 Subject: [openib-general] there is a compilation warning in librdmacm Message-ID: <200605101522.42890.dotanb@mellanox.co.il> There is a compilation warning in the file: src/userspace/librdmacm/src/cma.c. 
Here is the warning:

gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I../libibverbs/include -g -Wall -D_GNU_SOURCE -g -O2 -MT cma.lo -MD -MP -MF .deps/cma.Tpo -c src/cma.c -fPIC -DPIC -o .libs/cma.o
src/cma.c: In function `rdma_destroy_event_channel':
src/cma.c:259: warning: control reaches end of non-void function

and here is the problematic code:

int rdma_destroy_event_channel(struct rdma_event_channel *channel)
{
	close(channel->fd);
	free(channel);
}

(I didn't send a patch to fix this because I don't know whether you want to return 0 or change the return type of the function to void.)

thanks
Dotan

From dotanb at mellanox.co.il Wed May 10 05:31:04 2006
From: dotanb at mellanox.co.il (Dotan Barak)
Date: Wed, 10 May 2006 15:31:04 +0300
Subject: [openib-general] there is a compilation warning in the diags tools
Message-ID: <200605101531.04954.dotanb@mellanox.co.il>

In the driver: openib_gen2-20060510-1209 (REV=7038), there is a compilation warning in the file: src/userspace/management/diags/src/grouping.c

src/grouping.c:171: warning: 'is_chassis_switch' defined but not used
/bin/sh ./libtool --tag=CC --mode=link gcc -g -O2 -L../libibcommon -libcommon -L../libibumad -libumad -L../osm/opensm/.libs -lopensm -L../osm/libvendor/.libs -losmvendor -L../osm/complib/.libs -losmcomp -o src/ibnetdiscover src_ibnetdiscover-ibnetdiscover.o src_ibnetdiscover-grouping.o ../libibcommon/libibcommon.la ../libibumad/libibumad.la ../libibmad/libibmad.la

thanks
Dotan

From halr at voltaire.com Wed May 10 05:33:18 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 May 2006 08:33:18 -0400
Subject: [openib-general] Re: [PATCH 3/3] opensm: no_qos global option
In-Reply-To: <20060509181552.14584.14666.stgit@sashak.voltaire.com>
References: <20060509180059.14584.31483.stgit@sashak.voltaire.com> <20060509181552.14584.14666.stgit@sashak.voltaire.com>
Message-ID: <1147264397.4485.42104.camel@hal.voltaire.com>

On Tue, 2006-05-09 at 14:15, Sasha Khapyorsky wrote:
> This new option '--no_qos' (or
'-O') ^^ -Q > will disable QoS setup globally in > OpenSM. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk. -- Hal From halr at voltaire.com Wed May 10 05:45:16 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 May 2006 08:45:16 -0400 Subject: [openib-general] [PATCH] OpenSM/osmtest: Add rudimentary SA GuidInfoRecord test Message-ID: <1147265116.4485.42334.camel@hal.voltaire.com> OpenSM/osmtest: Add rudimentary SA GuidInfoRecord test Signed-off-by: Hal Rosenstock Index: osmtest/osmtest.c =================================================================== --- osmtest/osmtest.c (revision 7043) +++ osmtest/osmtest.c (working copy) @@ -4055,6 +4055,59 @@ osmtest_validate_all_node_recs( IN osmte #endif #ifdef VENDOR_RMPP_SUPPORT +static ib_api_status_t +osmtest_validate_all_guidinfo_recs( IN osmtest_t * const p_osmt ) +{ + osmtest_req_context_t context; + const ib_guidinfo_record_t *p_rec; + cl_status_t status; + size_t num_recs; + + OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_all_guidinfo_recs ); + + cl_memclr( &context, sizeof( context ) ); + + /* + * Do a blocking query for all GuidInfoRecords in the subnet. + */ + status = osmtest_get_all_recs( p_osmt, IB_MAD_ATTR_GUIDINFO_RECORD, + sizeof( *p_rec ), &context ); + + + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_validate_all_guidinfo_recs: ERR 0099: " + "osmtest_get_all_recs failed (%s)\n", + ib_get_err_str( status ) ); + goto Exit; + } + + num_recs = context.result.result_cnt; + + if( osm_log_is_active( &p_osmt->log, OSM_LOG_VERBOSE ) ) + { + osm_log( &p_osmt->log, OSM_LOG_VERBOSE, + "osmtest_validate_all_guidinfo_recs: " + "Received %u records\n", num_recs ); + } + + /* No validation as yet */ + + Exit: + /* + * Return the IB query MAD to the pool as necessary. 
+ */ + if( context.result.p_result_madw != NULL ) + { + osm_mad_pool_put( &p_osmt->mad_pool, context.result.p_result_madw ); + context.result.p_result_madw = NULL; + } + + OSM_LOG_EXIT( &p_osmt->log ); + return ( status ); +} + /********************************************************************** **********************************************************************/ static ib_api_status_t @@ -4738,6 +4791,12 @@ osmtest_validate_against_db( IN osmtest_ if( status != IB_SUCCESS ) goto Exit; +#ifdef VENDOR_RMPP_SUPPORT + status = osmtest_validate_all_guidinfo_recs( p_osmt ); + if( status != IB_SUCCESS ) + goto Exit; +#endif + #if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) cl_memclr( &context, sizeof( context ) ); cl_memclr( &request, sizeof( request ) ); From halr at voltaire.com Wed May 10 05:49:24 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 May 2006 08:49:24 -0400 Subject: [openib-general] Re: there is a compilation warning in the diags tools In-Reply-To: <200605101531.04954.dotanb@mellanox.co.il> References: <200605101531.04954.dotanb@mellanox.co.il> Message-ID: <1147265363.4485.42420.camel@hal.voltaire.com> On Wed, 2006-05-10 at 08:31, Dotan Barak wrote: > In the driver: openib_gen2-20060510-1209 (REV=7038), there is a compilation warning in the file: src/userspace/management/diags/src/grouping.c > > src/grouping.c:171: warning: 'is_chassis_switch' defined but not used > /bin/sh ./libtool --tag=CC --mode=link gcc -g -O2 -L../libibcommon -libcommon -L../libibumad -libumad -L../osm/opensm/.libs -lopensm -L../ > osm/libvendor/.libs -losmvendor -L../osm/complib/.libs -losmcomp -o src/ibnetdiscover src_ibnetdiscover-ibnetdiscover.o src_ibnetdiscover > -grouping.o ../libibcommon/libibcommon.la ../libibumad/libibumad.la ../libibmad/libibmad.la Thanks. Fixed in r7056. 
-- Hal > thanks > Dotan From eitan at mellanox.co.il Wed May 10 06:09:01 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 10 May 2006 16:09:01 +0300 Subject: [openib-general] RE: [PATCH] OpenSM/osmtest: Add rudimentary SA GuidInfoRecord test Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB47@mtlexch01.mtl.com> Cool. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, May 10, 2006 3:45 PM > To: openib-general at openib.org > Cc: Eitan Zahavi; Yael Kalka; Ofer Gigi > Subject: [PATCH] OpenSM/osmtest: Add rudimentary SA GuidInfoRecord test > > OpenSM/osmtest: Add rudimentary SA GuidInfoRecord test > > Signed-off-by: Hal Rosenstock > > Index: osmtest/osmtest.c > =================================================================== > --- osmtest/osmtest.c (revision 7043) > +++ osmtest/osmtest.c (working copy) > @@ -4055,6 +4055,59 @@ osmtest_validate_all_node_recs( IN osmte > #endif > > #ifdef VENDOR_RMPP_SUPPORT > +static ib_api_status_t > +osmtest_validate_all_guidinfo_recs( IN osmtest_t * const p_osmt ) > +{ > + osmtest_req_context_t context; > + const ib_guidinfo_record_t *p_rec; > + cl_status_t status; > + size_t num_recs; > + > + OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_all_guidinfo_recs ); > + > + cl_memclr( &context, sizeof( context ) ); > + > + /* > + * Do a blocking query for all GuidInfoRecords in the subnet. 
> + */ > + status = osmtest_get_all_recs( p_osmt, IB_MAD_ATTR_GUIDINFO_RECORD, > + sizeof( *p_rec ), &context ); > + > + > + if( status != IB_SUCCESS ) > + { > + osm_log( &p_osmt->log, OSM_LOG_ERROR, > + "osmtest_validate_all_guidinfo_recs: ERR 0099: " > + "osmtest_get_all_recs failed (%s)\n", > + ib_get_err_str( status ) ); > + goto Exit; > + } > + > + num_recs = context.result.result_cnt; > + > + if( osm_log_is_active( &p_osmt->log, OSM_LOG_VERBOSE ) ) > + { > + osm_log( &p_osmt->log, OSM_LOG_VERBOSE, > + "osmtest_validate_all_guidinfo_recs: " > + "Received %u records\n", num_recs ); > + } > + > + /* No validation as yet */ > + > + Exit: > + /* > + * Return the IB query MAD to the pool as necessary. > + */ > + if( context.result.p_result_madw != NULL ) > + { > + osm_mad_pool_put( &p_osmt->mad_pool, context.result.p_result_madw ); > + context.result.p_result_madw = NULL; > + } > + > + OSM_LOG_EXIT( &p_osmt->log ); > + return ( status ); > +} > + > /********************************************************************** > **********************************************************************/ > static ib_api_status_t > @@ -4738,6 +4791,12 @@ osmtest_validate_against_db( IN osmtest_ > if( status != IB_SUCCESS ) > goto Exit; > > +#ifdef VENDOR_RMPP_SUPPORT > + status = osmtest_validate_all_guidinfo_recs( p_osmt ); > + if( status != IB_SUCCESS ) > + goto Exit; > +#endif > + > #if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) > cl_memclr( &context, sizeof( context ) ); > cl_memclr( &request, sizeof( request ) ); > From ogerlitz at voltaire.com Wed May 10 06:20:30 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 10 May 2006 16:20:30 +0300 (IDT) Subject: [openib-general] [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator Message-ID: Roland, The patch series that follows contains the iSER code which we want to submit upstream for 2.6.18, fixed with the comments which we got in the previous post. 
LKML reviewers, please remember to CC openib-general at openib.org on your responses.

Below is a log and diffstat of the changes since the previous post, which is archived at http://openib.org/pipermail/openib-general/2006-April/020616.html

To compile this code you need the iscsi updates for 2.6.18 in your source tree, that is, pull/sync include/scsi and drivers/scsi from the scsi-misc-2.6 git tree. One patch is not yet merged there, and without it iser fails to compile. The patch is named "iscsi: add transport end point callbacks" and I will send it to you offlist.

+ use direct BUG_ON & BUG calls instead of the iser_bug macro
+ removed usage of SVN keywords such as $LastChangedDate and $Rev
+ a few fixes related to the management of the ib conn list
+ two fixes for checks done in the ib conn state machine flow
+ changed iser ib conn state management to use an int variable holding the
  state, plus a lock. When a related race is possible, the lock is used to
  check (comp) or change (comp_exch) the state. When no race can happen, the
  state is just examined or changed.
+ always call rdma_disconnect in iser_conn_terminate so that the CMA moves
  the QP state to ERROR and we get the FLUSHES on all the pending RX/TX WRs
+ make iser_free_device_ib_res void, change the out goto label name of
  iser_device_find_by_ib_device
+ some whitespace cleanups

 Makefile         |    4 -
 iscsi_iser.c     |   18 ++----
 iscsi_iser.h     |   21 +++----
 iser_initiator.c |   24 ++++-----
 iser_memory.c    |   12 +---
 iser_verbs.c     |  145 +++++++++++++++++++++++++++++++------------------------
 6 files changed, 120 insertions(+), 104 deletions(-)

Or.
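
[Editor's sketch, not part of the posted series] The cover letter above describes connection-state handling as an int state plus a lock, with a "comp" check and a "comp_exch" change taken under the lock. A minimal userspace approximation of that idea might look like the following. The state names come from the iscsi_iser.h patch in this series; everything else (struct conn_sketch, the conn_sketch_* and conn_state_* helpers, and the use of a pthread mutex in place of a kernel spinlock) is an assumption made for illustration and may differ from what iser_verbs.c actually does.

```c
#include <pthread.h>

/* Connection states as listed in the iscsi_iser.h patch of this series. */
enum iser_ib_conn_state {
	ISER_CONN_INIT,
	ISER_CONN_PENDING,
	ISER_CONN_UP,
	ISER_CONN_TERMINATING,
	ISER_CONN_DOWN,
	ISER_CONN_STATES_NUM
};

/* Hypothetical stand-in for struct iser_conn: only the state and its
 * lock are kept (a pthread mutex here, not a kernel spinlock). */
struct conn_sketch {
	enum iser_ib_conn_state state;
	pthread_mutex_t lock;
};

void conn_sketch_init(struct conn_sketch *c)
{
	c->state = ISER_CONN_INIT;
	pthread_mutex_init(&c->lock, NULL);
}

/* "comp": return 1 if the current state equals comp, 0 otherwise. */
int conn_state_comp(struct conn_sketch *c, enum iser_ib_conn_state comp)
{
	int ret;

	pthread_mutex_lock(&c->lock);
	ret = (c->state == comp);
	pthread_mutex_unlock(&c->lock);
	return ret;
}

/* "comp_exch": if the current state equals comp, move to exch and
 * return 1; otherwise leave the state untouched and return 0. */
int conn_state_comp_exch(struct conn_sketch *c,
			 enum iser_ib_conn_state comp,
			 enum iser_ib_conn_state exch)
{
	int ret;

	pthread_mutex_lock(&c->lock);
	ret = (c->state == comp);
	if (ret)
		c->state = exch;
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

With helpers of this shape, a teardown path could call conn_state_comp_exch(c, ISER_CONN_UP, ISER_CONN_TERMINATING) and proceed only when it returns 1, so that two racing callers cannot both start termination.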
From ogerlitz at voltaire.com Wed May 10 06:20:56 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 10 May 2006 16:20:56 +0300 (IDT) Subject: [openib-general] [PATCH 1/6] iscsi_iser header file In-Reply-To: Message-ID: --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iscsi_iser.h 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iscsi_iser.h 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,354 @@ +/* + * iSER transport for the Open iSCSI Initiator & iSER transport internals + * + * Copyright (C) 2004 Dmitry Yusupov + * Copyright (C) 2004 Alex Aizman + * Copyright (C) 2005 Mike Christie + * based on code maintained by open-iscsi at googlegroups.com + * + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: iscsi_iser.h 7051 2006-05-10 12:29:11Z ogerlitz $ + */ +#ifndef __ISCSI_ISER_H__ +#define __ISCSI_ISER_H__ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include + +#define DRV_NAME "iser" +#define PFX DRV_NAME ": " +#define DRV_VER "0.1" +#define DRV_DATE "May 7th, 2006" + +#define iser_dbg(fmt, arg...) \ + do { \ + if (iser_debug_level > 0) \ + printk(KERN_DEBUG PFX "%s:" fmt,\ + __func__ , ## arg); \ + } while (0) + +#define iser_err(fmt, arg...) \ + do { \ + printk(KERN_ERR PFX "%s:" fmt, \ + __func__ , ## arg); \ + } while (0) + + /* support upto 512KB in one RDMA */ +#define ISCSI_ISER_SG_TABLESIZE (0x80000 >> PAGE_SHIFT) +#define ISCSI_ISER_MAX_LUN 256 +#define ISCSI_ISER_MAX_CMD_LEN 16 + +/* QP settings */ +/* Maximal bounds on received asynchronous PDUs */ +#define ISER_MAX_RX_MISC_PDUS 4 /* NOOP_IN(2) , ASYNC_EVENT(2) */ + +#define ISER_MAX_TX_MISC_PDUS 6 /* NOOP_OUT(2), TEXT(1), * + * SCSI_TMFUNC(2), LOGOUT(1) */ + +#define ISER_QP_MAX_RECV_DTOS (ISCSI_XMIT_CMDS_MAX + \ + ISER_MAX_RX_MISC_PDUS + \ + ISER_MAX_TX_MISC_PDUS) + +/* the max TX (send) WR supported by the iSER QP is defined by * + * max_send_wr = T * (1 + D) + C ; D is how many inflight dataouts we expect * + * to have at max for SCSI command. The tx posting & completion handling code * + * supports -EAGAIN scheme where tx is suspended till the QP has room for more * + * send WR. 
D=8 comes from 64K/8K */ + +#define ISER_INFLIGHT_DATAOUTS 8 + +#define ISER_QP_MAX_REQ_DTOS (ISCSI_XMIT_CMDS_MAX * \ + (1 + ISER_INFLIGHT_DATAOUTS) + \ + ISER_MAX_TX_MISC_PDUS + \ + ISER_MAX_RX_MISC_PDUS) + +#define ISER_VER 0x10 +#define ISER_WSV 0x08 +#define ISER_RSV 0x04 + +struct iser_hdr { + u8 flags; + u8 rsvd[3]; + __be32 write_stag; /* write rkey */ + __be64 write_va; + __be32 read_stag; /* read rkey */ + __be64 read_va; +} __attribute__((packed)); + + +/* Length of an object name string */ +#define ISER_OBJECT_NAME_SIZE 64 + +enum iser_ib_conn_state { + ISER_CONN_INIT, /* descriptor allocd, no conn */ + ISER_CONN_PENDING, /* in the process of being established */ + ISER_CONN_UP, /* up and running */ + ISER_CONN_TERMINATING, /* in the process of being terminated */ + ISER_CONN_DOWN, /* shut down */ + ISER_CONN_STATES_NUM +}; + +enum iser_task_status { + ISER_TASK_STATUS_INIT = 0, + ISER_TASK_STATUS_STARTED, + ISER_TASK_STATUS_COMPLETED +}; + +enum iser_data_dir { + ISER_DIR_IN = 0, /* to initiator */ + ISER_DIR_OUT, /* from initiator */ + ISER_DIRS_NUM +}; + +struct iser_data_buf { + void *buf; /* pointer to the sg list */ + unsigned int size; /* num entries of this sg */ + unsigned long data_len; /* total data len */ + unsigned int dma_nents; /* returned by dma_map_sg */ + char *copy_buf; /* allocated copy buf for SGs unaligned * + * for rdma which are copied */ + struct scatterlist sg_single; /* SG-ified clone of a non SG SC or * + * unaligned SG */ + }; + +/* fwd declarations */ +struct iser_device; +struct iscsi_iser_conn; +struct iscsi_iser_cmd_task; + +struct iser_mem_reg { + u32 lkey; + u32 rkey; + u64 va; + u64 len; + void *mem_h; +}; + +struct iser_regd_buf { + struct iser_mem_reg reg; /* memory registration info */ + void *virt_addr; + struct iser_device *device; /* device->device for dma_unmap */ + dma_addr_t dma_addr; /* if non zero, addr for dma_unmap */ + enum dma_data_direction direction; /* direction for dma_unmap */ + unsigned int 
data_size; + atomic_t ref_count; /* refcount, freed when dec to 0 */ +}; + +#define MAX_REGD_BUF_VECTOR_LEN 2 + +struct iser_dto { + struct iscsi_iser_cmd_task *ctask; + struct iscsi_iser_conn *conn; + int notify_enable; + + /* vector of registered buffers */ + unsigned int regd_vector_len; + struct iser_regd_buf *regd[MAX_REGD_BUF_VECTOR_LEN]; + + /* offset into the registered buffer may be specified */ + unsigned int offset[MAX_REGD_BUF_VECTOR_LEN]; + + /* a smaller size may be specified, if 0, then full size is used */ + unsigned int used_sz[MAX_REGD_BUF_VECTOR_LEN]; +}; + +enum iser_desc_type { + ISCSI_RX, + ISCSI_TX_CONTROL , + ISCSI_TX_SCSI_COMMAND, + ISCSI_TX_DATAOUT +}; + +struct iser_desc { + struct iser_hdr iser_header; + struct iscsi_hdr iscsi_header; + struct iser_regd_buf hdr_regd_buf; + void *data; /* used by RX & TX_CONTROL */ + struct iser_regd_buf data_regd_buf; /* used by RX & TX_CONTROL */ + enum iser_desc_type type; + struct iser_dto dto; +}; + +struct iser_device { + struct ib_device *ib_device; + struct ib_pd *pd; + struct ib_cq *cq; + struct ib_mr *mr; + struct tasklet_struct cq_tasklet; + struct list_head ig_list; /* entry in ig devices list */ + int refcount; +}; + +struct iser_conn { + struct iscsi_iser_conn *iser_conn; /* iser conn for upcalls */ + enum iser_ib_conn_state state; /* rdma connection state */ + spinlock_t lock; /* used for state changes */ + struct iser_device *device; /* device context */ + struct rdma_cm_id *cma_id; /* CMA ID */ + struct ib_qp *qp; /* QP */ + struct ib_fmr_pool *fmr_pool; /* pool of IB FMRs */ + int disc_evt_flag; /* disconn event delivered */ + wait_queue_head_t wait; /* waitq for conn/disconn */ + atomic_t post_recv_buf_count; /* posted rx count */ + atomic_t post_send_buf_count; /* posted tx count */ + struct work_struct comperror_work; /* conn term sleepable ctx*/ + char name[ISER_OBJECT_NAME_SIZE]; + struct iser_page_vec *page_vec; /* represents SG to fmr maps* + * maps serialized as tx is*/ + struct 
list_head conn_list; /* entry in ig conn list */ +}; + +struct iscsi_iser_conn { + struct iscsi_conn *iscsi_conn;/* ptr to iscsi conn */ + struct iser_conn *ib_conn; /* iSER IB conn */ + + rwlock_t lock; +}; + +struct iscsi_iser_cmd_task { + struct iser_desc desc; + struct iscsi_iser_conn *iser_conn; + int rdma_data_count;/* RDMA bytes */ + enum iser_task_status status; + int command_sent; /* set if command sent */ + int dir[ISER_DIRS_NUM]; /* set if dir use*/ + struct iser_regd_buf rdma_regd[ISER_DIRS_NUM];/* regd rdma buf */ + struct iser_data_buf data[ISER_DIRS_NUM]; /* orig. data des*/ + struct iser_data_buf data_copy[ISER_DIRS_NUM];/* contig. copy */ +}; + +struct iser_page_vec { + u64 *pages; + int length; + int offset; + int data_size; +}; + +struct iser_global { + struct mutex device_list_mutex;/* */ + struct list_head device_list; /* all iSER devices */ + struct mutex connlist_mutex; + struct list_head connlist; /* all iSER IB connections */ + + kmem_cache_t *desc_cache; +}; + +extern struct iser_global ig; +extern int iser_debug_level; + +/* allocate connection resources needed for rdma functionality */ +int iser_conn_set_full_featured_mode(struct iscsi_conn *conn); + +int iser_send_control(struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask); + +int iser_send_command(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask); + +int iser_send_data_out(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask, + struct iscsi_data *hdr); + +void iscsi_iser_recv(struct iscsi_conn *conn, + struct iscsi_hdr *hdr, + char *rx_data, + int rx_data_len); + +int iser_conn_init(struct iser_conn **ib_conn); + +void iser_conn_terminate(struct iser_conn *ib_conn); + +void iser_conn_release(struct iser_conn *ib_conn); + +void iser_rcv_completion(struct iser_desc *desc, + unsigned long dto_xfer_len); + +void iser_snd_completion(struct iser_desc *desc); + +void iser_ctask_rdma_init(struct iscsi_iser_cmd_task *ctask); + +void iser_ctask_rdma_finalize(struct 
iscsi_iser_cmd_task *ctask); + +void iser_dto_buffs_release(struct iser_dto *dto); + +int iser_regd_buff_release(struct iser_regd_buf *regd_buf); + +void iser_reg_single(struct iser_device *device, + struct iser_regd_buf *regd_buf, + enum dma_data_direction direction); + +int iser_start_rdma_unaligned_sg(struct iscsi_iser_cmd_task *ctask, + enum iser_data_dir cmd_dir); + +void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_cmd_task *ctask, + enum iser_data_dir cmd_dir); + +int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *ctask, + enum iser_data_dir cmd_dir); + +int iser_connect(struct iser_conn *ib_conn, + struct sockaddr_in *src_addr, + struct sockaddr_in *dst_addr, + int non_blocking); + +int iser_reg_page_vec(struct iser_conn *ib_conn, + struct iser_page_vec *page_vec, + struct iser_mem_reg *mem_reg); + +void iser_unreg_mem(struct iser_mem_reg *mem_reg); + +int iser_post_recv(struct iser_desc *rx_desc); +int iser_post_send(struct iser_desc *tx_desc); + +int iser_conn_state_comp(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp); +#endif From pradeep at us.ibm.com Wed May 10 06:23:49 2006 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Wed, 10 May 2006 06:23:49 -0700 Subject: [openib-general] [PATCH 07/16] ehca: interrupt handling routines In-Reply-To: <75CCC04D-06EF-48B6-BE76-8BFAA541A764@kernel.crashing.org> Message-ID: openib-general-bounces at openib.org wrote on 05/09/2006 04:35:57 PM: > > Heiko> Yes, I agree. It would not be an optimal solution, because > > Heiko> other upper level protocols (e.g. SDP, SRP, etc.) or > > Heiko> userspace verbs would not be affected by this > > Heiko> changes. Nevertheless, how can an improved "scaling" or > > Heiko> "SMP" version of IPoIB look like. How could it be > > Heiko> implemented? 
> > > > The trivial way to do it would be to use the same idea as the current > > ehca driver: just create a thread for receive CQ events and a thread > > for send CQ events, and defer CQ polling into those two threads. > > > > Something even better may be possible by specializing to IPoIB of > > course. > > The hardware IRQ should go to some CPU close to the hardware itself. > The > softirq (or whatever else) should go to the same CPU that is handling > the > user-level task for that message. Or a CPU close to it, at least. > I believe softirqs have a strong CPU affinity and will execute on the same CPU that handled the hard irq. Pradeep pradeep at us.ibm.com > > Segher > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From ogerlitz at voltaire.com Wed May 10 06:21:21 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 10 May 2006 16:21:21 +0300 (IDT) Subject: [openib-general] [PATCH 2/6] open iscsi iser transport provider code In-Reply-To: Message-ID: --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iscsi_iser.c 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iscsi_iser.c 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,794 @@ +/* + * iSCSI Initiator over iSER Data-Path + * + * Copyright (C) 2004 Dmitry Yusupov + * Copyright (C) 2004 Alex Aizman + * Copyright (C) 2005 Mike Christie + * Copyright (c) 2005, 2006 Voltaire, Inc. All rights reserved. + * maintained by openib-general at openib.org + * + * This software is available to you under a choice of one of two + * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * Credits: + * Christoph Hellwig + * FUJITA Tomonori + * Arne Redlich + * Zhenyu Wang + * Modified by: + * Erez Zilber + * + * + * $Id: iscsi_iser.c 6965 2006-05-07 11:36:20Z ogerlitz $ + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +static unsigned int iscsi_max_lun = 512; +module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO); + +int iser_debug_level = 0; + +MODULE_DESCRIPTION("iSER (iSCSI Extensions for RDMA) Datamover " + "v" DRV_VER " (" DRV_DATE ")"); +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_AUTHOR("Alex Nezhinsky, Dan Bar Dov, Or Gerlitz"); + +module_param_named(debug_level, iser_debug_level, int, 0644); +MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)"); + +struct iser_global ig; + +void +iscsi_iser_recv(struct iscsi_conn *conn, + struct iscsi_hdr *hdr, char *rx_data, int rx_data_len) +{ + int rc = 0; + uint32_t ret_itt; + int datalen; + int ahslen; + + /* verify PDU length */ + datalen = ntoh24(hdr->dlength); + if (datalen != rx_data_len) { + printk(KERN_ERR "iscsi_iser: datalen %d (hdr) != %d (IB) \n", + datalen, rx_data_len); + rc = ISCSI_ERR_DATALEN; + goto error; + } + + /* read AHS */ + ahslen = hdr->hlength * 4; + + /* verify itt (itt encoding: age+cid+itt) */ + rc = iscsi_verify_itt(conn, hdr, &ret_itt); + + if (!rc) + rc = iscsi_complete_pdu(conn, hdr, rx_data, rx_data_len); + + if (rc && rc != ISCSI_ERR_NO_SCSI_CMD) + goto error; + + return; +error: + iscsi_conn_failure(conn, rc); +} + + +/** + * iscsi_iser_cmd_init - Initialize iSCSI SCSI_READ or SCSI_WRITE commands + * + **/ +static void +iscsi_iser_cmd_init(struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_conn *iser_conn = ctask->conn->dd_data; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct scsi_cmnd *sc = 
ctask->sc; + + iser_ctask->command_sent = 0; + iser_ctask->iser_conn = iser_conn; + + if (sc->sc_data_direction == DMA_TO_DEVICE) { + BUG_ON(ctask->total_length == 0); + /* bytes to be sent via RDMA operations */ + iser_ctask->rdma_data_count = ctask->total_length - + ctask->imm_count - + ctask->unsol_count; + + debug_scsi("cmd [itt %x total %d imm %d imm_data %d " + "rdma_data %d]\n", + ctask->itt, ctask->total_length, ctask->imm_count, + ctask->unsol_count, ctask->rdma_data_count); + } else + /* bytes to be sent via RDMA operations */ + iser_ctask->rdma_data_count = ctask->total_length; + + iser_ctask_rdma_init(iser_ctask); +} + +/** + * iscsi_mtask_xmit - xmit management(immediate) task + * @conn: iscsi connection + * @mtask: task management task + * + * Notes: + * The function can return -EAGAIN in which case caller must + * call it again later, or recover. '0' return code means successful + * xmit. + * + **/ +static int +iscsi_iser_mtask_xmit(struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask) +{ + int error = 0; + + debug_scsi("mtask deq [cid %d itt 0x%x]\n", conn->id, mtask->itt); + + error = iser_send_control(conn, mtask); + + /* since iser xmits control with zero copy, mtasks can not be recycled + * right after sending them. 
+ * The recycling scheme is based on whether a response is expected + * - if yes, the mtask is recycled at iscsi_complete_pdu + * - if no, the mtask is recycled at iser_snd_completion + */ + if (error && error != -EAGAIN) + iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); + + return error; +} + +static int +iscsi_iser_ctask_xmit_unsol_data(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask) +{ + struct iscsi_data hdr; + int error = 0; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + + /* Send data-out PDUs while there's still unsolicited data to send */ + while (ctask->unsol_count > 0) { + iscsi_prep_unsolicit_data_pdu(ctask, &hdr, + iser_ctask->rdma_data_count); + + debug_scsi("Sending data-out: itt 0x%x, data count %d\n", + hdr.itt, ctask->data_count); + + /* the buffer description has been passed with the command */ + /* Send the command */ + error = iser_send_data_out(conn, ctask, &hdr); + if (error) { + ctask->unsol_datasn--; + goto iscsi_iser_ctask_xmit_unsol_data_exit; + } + ctask->unsol_count -= ctask->data_count; + debug_scsi("Need to send %d more as data-out PDUs\n", + ctask->unsol_count); + } + +iscsi_iser_ctask_xmit_unsol_data_exit: + return error; +} + +static int +iscsi_iser_ctask_xmit(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + int error = 0; + + debug_scsi("ctask deq [cid %d itt 0x%x]\n", + conn->id, ctask->itt); + + /* + * serialize with TMF AbortTask + */ + if (ctask->mtask) + return error; + + /* Send the cmd PDU */ + if (!iser_ctask->command_sent) { + error = iser_send_command(conn, ctask); + if (error) + goto iscsi_iser_ctask_xmit_exit; + iser_ctask->command_sent = 1; + } + + /* Send unsolicited data-out PDU(s) if necessary */ + if (ctask->unsol_count) + error = iscsi_iser_ctask_xmit_unsol_data(conn, ctask); + + iscsi_iser_ctask_xmit_exit: + if (error && error != -EAGAIN) + iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); + return error; +} + +static 
void +iscsi_iser_cleanup_ctask(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + + if (iser_ctask->status == ISER_TASK_STATUS_STARTED) { + iser_ctask->status = ISER_TASK_STATUS_COMPLETED; + iser_ctask_rdma_finalize(iser_ctask); + } +} + +static struct iser_conn * +iscsi_iser_ib_conn_lookup(__u64 ep_handle) +{ + struct iser_conn *ib_conn; + struct iser_conn *uib_conn = (struct iser_conn *)(unsigned long)ep_handle; + + mutex_lock(&ig.connlist_mutex); + list_for_each_entry(ib_conn, &ig.connlist, conn_list) { + if (ib_conn == uib_conn) { + mutex_unlock(&ig.connlist_mutex); + return ib_conn; + } + } + mutex_unlock(&ig.connlist_mutex); + iser_err("no conn exists for eph %llx\n",(unsigned long long)ep_handle); + return NULL; +} + +static struct iscsi_cls_conn * +iscsi_iser_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx) +{ + struct iscsi_conn *conn; + struct iscsi_cls_conn *cls_conn; + struct iscsi_iser_conn *iser_conn; + + cls_conn = iscsi_conn_setup(cls_session, conn_idx); + if (!cls_conn) + return NULL; + conn = cls_conn->dd_data; + + /* + * due to issues with the login code re iser semantics + * this is not set in iscsi_conn_setup - FIXME + */ + conn->max_recv_dlength = 128; + + iser_conn = kzalloc(sizeof(*iser_conn), GFP_KERNEL); + if (!iser_conn) + goto conn_alloc_fail; + + /* currently this is the only field which needs to be initialized */ + rwlock_init(&iser_conn->lock); + + conn->recv_lock = &iser_conn->lock; + + conn->dd_data = iser_conn; + iser_conn->iscsi_conn = conn; + + return cls_conn; + +conn_alloc_fail: + iscsi_conn_teardown(cls_conn); + return NULL; +} + +static void +iscsi_iser_conn_destroy(struct iscsi_cls_conn *cls_conn) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_iser_conn *iser_conn = conn->dd_data; + + iscsi_conn_teardown(cls_conn); + kfree(iser_conn); +} + +static int +iscsi_iser_conn_bind(struct iscsi_cls_session *cls_session, + struct
iscsi_cls_conn *cls_conn, uint64_t transport_eph, + int is_leading) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_iser_conn *iser_conn; + struct iser_conn *ib_conn; + int error; + + error = iscsi_conn_bind(cls_session, cls_conn, is_leading); + if (error) + return error; + + if (conn->stop_stage != STOP_CONN_SUSPEND) { + /* the transport ep handle comes from user space so it must be + * verified against the global ib connections list */ + ib_conn = iscsi_iser_ib_conn_lookup(transport_eph); + if (!ib_conn) { + iser_err("can't bind eph %llx\n", + (unsigned long long)transport_eph); + return -EINVAL; + } + /* binds the iSER connection retrieved from the previously + * connected ep_handle to the iSCSI layer connection. exchanges + * connection pointers */ + iser_err("binding iscsi conn %p to iser_conn %p\n",conn,ib_conn); + iser_conn = conn->dd_data; + ib_conn->iser_conn = iser_conn; + iser_conn->ib_conn = ib_conn; + } + + return 0; +} + +static int +iscsi_iser_conn_start(struct iscsi_cls_conn *cls_conn) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + int err; + + err = iscsi_conn_start(cls_conn); + if (err) + return err; + + return iser_conn_set_full_featured_mode(conn); +} + +static void +iscsi_iser_conn_terminate(struct iscsi_conn *conn) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iser_conn *ib_conn = iser_conn->ib_conn; + + BUG_ON(!ib_conn); + /* starts conn teardown process, waits until all previously * + * posted buffers get flushed, deallocates all conn resources */ + iser_conn_terminate(ib_conn); + iser_conn->ib_conn = NULL; + conn->recv_lock = NULL; +} + + +static struct iscsi_transport iscsi_iser_transport; + +static struct iscsi_cls_session * +iscsi_iser_session_create(struct iscsi_transport *iscsit, + struct scsi_transport_template *scsit, + uint32_t initial_cmdsn, uint32_t *hostno) +{ + struct iscsi_cls_session *cls_session; + struct iscsi_session *session; + int i; + uint32_t hn; + struct iscsi_cmd_task *ctask; + 
struct iscsi_mgmt_task *mtask; + struct iscsi_iser_cmd_task *iser_ctask; + struct iser_desc *desc; + + cls_session = iscsi_session_setup(iscsit, scsit, + sizeof(struct iscsi_iser_cmd_task), + sizeof(struct iser_desc), + initial_cmdsn, &hn); + if (!cls_session) + return NULL; + + *hostno = hn; + session = class_to_transport_session(cls_session); + + /* libiscsi setup itts, data and pool so just set desc fields */ + for (i = 0; i < session->cmds_max; i++) { + ctask = session->cmds[i]; + iser_ctask = ctask->dd_data; + ctask->hdr = (struct iscsi_cmd *)&iser_ctask->desc.iscsi_header; + } + + for (i = 0; i < session->mgmtpool_max; i++) { + mtask = session->mgmt_cmds[i]; + desc = mtask->dd_data; + mtask->hdr = &desc->iscsi_header; + desc->data = mtask->data; + } + + return cls_session; +} + +static int +iscsi_iser_conn_set_param(struct iscsi_cls_conn *cls_conn, + enum iscsi_param param, uint32_t value) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_session *session = conn->session; + + spin_lock_bh(&session->lock); + if (conn->c_stage != ISCSI_CONN_INITIAL_STAGE && + conn->stop_stage != STOP_CONN_RECOVER) { + printk(KERN_ERR "iscsi_iser: can not change parameter [%d]\n", + param); + spin_unlock_bh(&session->lock); + return 0; + } + spin_unlock_bh(&session->lock); + + switch (param) { + case ISCSI_PARAM_MAX_RECV_DLENGTH: + /* TBD */ + break; + case ISCSI_PARAM_MAX_XMIT_DLENGTH: + conn->max_xmit_dlength = value; + break; + case ISCSI_PARAM_HDRDGST_EN: + if (value) { + printk(KERN_ERR "HeaderDigest wasn't negotiated to None"); + return -EPROTO; + } + break; + case ISCSI_PARAM_DATADGST_EN: + if (value) { + printk(KERN_ERR "DataDigest wasn't negotiated to None"); + return -EPROTO; + } + break; + case ISCSI_PARAM_INITIAL_R2T_EN: + session->initial_r2t_en = value; + break; + case ISCSI_PARAM_IMM_DATA_EN: + session->imm_data_en = value; + break; + case ISCSI_PARAM_FIRST_BURST: + session->first_burst = value; + break; + case ISCSI_PARAM_MAX_BURST: +
session->max_burst = value; + break; + case ISCSI_PARAM_PDU_INORDER_EN: + session->pdu_inorder_en = value; + break; + case ISCSI_PARAM_DATASEQ_INORDER_EN: + session->dataseq_inorder_en = value; + break; + case ISCSI_PARAM_ERL: + session->erl = value; + break; + case ISCSI_PARAM_IFMARKER_EN: + if (value) { + printk(KERN_ERR "IFMarker wasn't negotiated to No"); + return -EPROTO; + } + break; + case ISCSI_PARAM_OFMARKER_EN: + if (value) { + printk(KERN_ERR "OFMarker wasn't negotiated to No"); + return -EPROTO; + } + break; + default: + break; + } + + return 0; +} + +static int +iscsi_iser_session_get_param(struct iscsi_cls_session *cls_session, + enum iscsi_param param, uint32_t *value) +{ + struct Scsi_Host *shost = iscsi_session_to_shost(cls_session); + struct iscsi_session *session = iscsi_hostdata(shost->hostdata); + + switch (param) { + case ISCSI_PARAM_INITIAL_R2T_EN: + *value = session->initial_r2t_en; + break; + case ISCSI_PARAM_MAX_R2T: + *value = session->max_r2t; + break; + case ISCSI_PARAM_IMM_DATA_EN: + *value = session->imm_data_en; + break; + case ISCSI_PARAM_FIRST_BURST: + *value = session->first_burst; + break; + case ISCSI_PARAM_MAX_BURST: + *value = session->max_burst; + break; + case ISCSI_PARAM_PDU_INORDER_EN: + *value = session->pdu_inorder_en; + break; + case ISCSI_PARAM_DATASEQ_INORDER_EN: + *value = session->dataseq_inorder_en; + break; + case ISCSI_PARAM_ERL: + *value = session->erl; + break; + case ISCSI_PARAM_IFMARKER_EN: + *value = 0; + break; + case ISCSI_PARAM_OFMARKER_EN: + *value = 0; + break; + default: + return ISCSI_ERR_PARAM_NOT_FOUND; + } + + return 0; +} + +static int +iscsi_iser_conn_get_param(struct iscsi_cls_conn *cls_conn, + enum iscsi_param param, uint32_t *value) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + + switch(param) { + case ISCSI_PARAM_MAX_RECV_DLENGTH: + *value = conn->max_recv_dlength; + break; + case ISCSI_PARAM_MAX_XMIT_DLENGTH: + *value = conn->max_xmit_dlength; + break; + case ISCSI_PARAM_HDRDGST_EN: + 
*value = 0; + break; + case ISCSI_PARAM_DATADGST_EN: + *value = 0; + break; + /*case ISCSI_PARAM_TARGET_RECV_DLENGTH: + *value = conn->target_recv_dlength; + break; + case ISCSI_PARAM_INITIATOR_RECV_DLENGTH: + *value = conn->initiator_recv_dlength; + break;*/ + default: + return ISCSI_ERR_PARAM_NOT_FOUND; + } + + return 0; +} + + +static void +iscsi_iser_conn_get_stats(struct iscsi_cls_conn *cls_conn, struct iscsi_stats *stats) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + + stats->txdata_octets = conn->txdata_octets; + stats->rxdata_octets = conn->rxdata_octets; + stats->scsicmd_pdus = conn->scsicmd_pdus_cnt; + stats->dataout_pdus = conn->dataout_pdus_cnt; + stats->scsirsp_pdus = conn->scsirsp_pdus_cnt; + stats->datain_pdus = conn->datain_pdus_cnt; /* always 0 */ + stats->r2t_pdus = conn->r2t_pdus_cnt; /* always 0 */ + stats->tmfcmd_pdus = conn->tmfcmd_pdus_cnt; + stats->tmfrsp_pdus = conn->tmfrsp_pdus_cnt; + stats->custom_length = 3; + strcpy(stats->custom[0].desc, "qp_tx_queue_full"); + stats->custom[0].value = 0; /* TB iser_conn->qp_tx_queue_full; */ + strcpy(stats->custom[1].desc, "fmr_map_not_avail"); + stats->custom[1].value = 0; /* TB iser_conn->fmr_map_not_avail */; + strcpy(stats->custom[2].desc, "eh_abort_cnt"); + stats->custom[2].value = conn->eh_abort_cnt; +} + +static int +iscsi_iser_ep_connect(struct sockaddr *dst_addr, int non_blocking, + __u64 *ep_handle) +{ + int err; + struct iser_conn *ib_conn; + + err = iser_conn_init(&ib_conn); + if (err) + goto out; + + err = iser_connect(ib_conn, NULL, (struct sockaddr_in *)dst_addr, non_blocking); + if (!err) + *ep_handle = (__u64)(unsigned long)ib_conn; + +out: + return err; +} + +static int +iscsi_iser_ep_poll(__u64 ep_handle, int timeout_ms) +{ + struct iser_conn *ib_conn = iscsi_iser_ib_conn_lookup(ep_handle); + int rc; + + if (!ib_conn) + return -EINVAL; + + rc = wait_event_interruptible_timeout(ib_conn->wait, + ib_conn->state == ISER_CONN_UP, + msecs_to_jiffies(timeout_ms)); + + /* if conn 
establishment failed, return error code to iscsi */ + if (!rc && + (ib_conn->state == ISER_CONN_TERMINATING || + ib_conn->state == ISER_CONN_DOWN)) + rc = -1; + + iser_err("ib conn %p rc = %d\n", ib_conn, rc); + + if (rc > 0) + return 1; /* success, this is the equivalent of POLLOUT */ + else if (!rc) + return 0; /* timeout */ + else + return rc; /* signal */ +} + +static void +iscsi_iser_ep_disconnect(__u64 ep_handle) +{ + struct iser_conn *ib_conn = iscsi_iser_ib_conn_lookup(ep_handle); + + if (!ib_conn) + return; + + iser_err("ib conn %p state %d\n",ib_conn, ib_conn->state); + + iser_conn_terminate(ib_conn); +} + +static struct scsi_host_template iscsi_iser_sht = { + .name = "iSCSI Initiator over iSER, v." + ISCSI_VERSION_STR, + .queuecommand = iscsi_queuecommand, + .can_queue = ISCSI_XMIT_CMDS_MAX - 1, + .sg_tablesize = ISCSI_ISER_SG_TABLESIZE, + .cmd_per_lun = ISCSI_MAX_CMD_PER_LUN, + .eh_abort_handler = iscsi_eh_abort, + .eh_host_reset_handler = iscsi_eh_host_reset, + .use_clustering = DISABLE_CLUSTERING, + .proc_name = "iscsi_iser", + .this_id = -1, +}; + +static struct iscsi_transport iscsi_iser_transport = { + .owner = THIS_MODULE, + .name = "iser", + .caps = CAP_RECOVERY_L0 | CAP_MULTI_R2T, + .param_mask = ISCSI_MAX_RECV_DLENGTH | + ISCSI_MAX_XMIT_DLENGTH | + ISCSI_HDRDGST_EN | + ISCSI_DATADGST_EN | + ISCSI_INITIAL_R2T_EN | + ISCSI_MAX_R2T | + ISCSI_IMM_DATA_EN | + ISCSI_FIRST_BURST | + ISCSI_MAX_BURST | + ISCSI_PDU_INORDER_EN | + ISCSI_DATASEQ_INORDER_EN, + .host_template = &iscsi_iser_sht, + .conndata_size = sizeof(struct iscsi_conn), + .max_lun = ISCSI_ISER_MAX_LUN, + .max_cmd_len = ISCSI_ISER_MAX_CMD_LEN, + /* session management */ + .create_session = iscsi_iser_session_create, + .destroy_session = iscsi_session_teardown, + /* connection management */ + .create_conn = iscsi_iser_conn_create, + .bind_conn = iscsi_iser_conn_bind, + .destroy_conn = iscsi_iser_conn_destroy, + .set_param = iscsi_iser_conn_set_param, + .get_conn_param = 
iscsi_iser_conn_get_param, + .get_session_param = iscsi_iser_session_get_param, + .start_conn = iscsi_iser_conn_start, + .stop_conn = iscsi_conn_stop, + /* these are called as part of conn recovery */ + .suspend_conn_recv = NULL, /* FIXME is/how this relevant to iser? */ + .terminate_conn = iscsi_iser_conn_terminate, + /* IO */ + .send_pdu = iscsi_conn_send_pdu, + .get_stats = iscsi_iser_conn_get_stats, + .init_cmd_task = iscsi_iser_cmd_init, + .xmit_cmd_task = iscsi_iser_ctask_xmit, + .xmit_mgmt_task = iscsi_iser_mtask_xmit, + .cleanup_cmd_task = iscsi_iser_cleanup_ctask, + /* recovery */ + .session_recovery_timedout = iscsi_session_recovery_timedout, + + .ep_connect = iscsi_iser_ep_connect, + .ep_poll = iscsi_iser_ep_poll, + .ep_disconnect = iscsi_iser_ep_disconnect +}; + +static int __init iser_init(void) +{ + int err; + + iser_dbg("Starting iSER datamover...\n"); + + if (iscsi_max_lun < 1) { + printk(KERN_ERR "Invalid max_lun value of %u\n", iscsi_max_lun); + return -EINVAL; + } + + iscsi_iser_transport.max_lun = iscsi_max_lun; + + memset(&ig, 0, sizeof(struct iser_global)); + + ig.desc_cache = kmem_cache_create("iser_descriptors", + sizeof (struct iser_desc), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (ig.desc_cache == NULL) + return -ENOMEM; + + /* device init is called only after the first addr resolution */ + mutex_init(&ig.device_list_mutex); + INIT_LIST_HEAD(&ig.device_list); + mutex_init(&ig.connlist_mutex); + INIT_LIST_HEAD(&ig.connlist); + + if (!iscsi_register_transport(&iscsi_iser_transport)) { + iser_err("iscsi_register_transport failed\n"); + err = -EINVAL; + goto register_transport_failure; + } + + return 0; + +register_transport_failure: + kmem_cache_destroy(ig.desc_cache); + + return err; +} + +static void __exit iser_exit(void) +{ + iser_dbg("Removing iSER datamover...\n"); + iscsi_unregister_transport(&iscsi_iser_transport); + kmem_cache_destroy(ig.desc_cache); +} + +module_init(iser_init); +module_exit(iser_exit); From ogerlitz at
voltaire.com Wed May 10 06:21:45 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 10 May 2006 16:21:45 +0300 (IDT) Subject: [openib-general] [PATCH 3/6] iser initiator In-Reply-To: Message-ID: --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iser_initiator.c 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iser_initiator.c 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,734 @@ +/* + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: iser_initiator.c 6964 2006-05-07 11:11:43Z ogerlitz $ + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +/* Constant PDU lengths calculations */ +#define ISER_TOTAL_HEADERS_LEN (sizeof (struct iser_hdr) + \ + sizeof (struct iscsi_hdr)) + +/* iser_dto_add_regd_buff - increments the reference count for * + * the registered buffer & adds it to the DTO object */ +static void iser_dto_add_regd_buff(struct iser_dto *dto, + struct iser_regd_buf *regd_buf, + unsigned long use_offset, + unsigned long use_size) +{ + int add_idx; + + atomic_inc(&regd_buf->ref_count); + + add_idx = dto->regd_vector_len; + dto->regd[add_idx] = regd_buf; + dto->used_sz[add_idx] = use_size; + dto->offset[add_idx] = use_offset; + + dto->regd_vector_len++; +} + +static int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask, + struct iser_data_buf *data, + enum iser_data_dir iser_dir, + enum dma_data_direction dma_dir) +{ + struct device *dma_device; + + iser_ctask->dir[iser_dir] = 1; + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir); + if (data->dma_nents == 0) { + iser_err("dma_map_sg failed!!!\n"); + return -EINVAL; + } + return 0; +} + +static void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask) +{ + struct device *dma_device; + struct iser_data_buf *data; + + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + if (iser_ctask->dir[ISER_DIR_IN]) { + data = &iser_ctask->data[ISER_DIR_IN]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE); + } + + if (iser_ctask->dir[ISER_DIR_OUT]) { + data = &iser_ctask->data[ISER_DIR_OUT]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE); + } +} + +/* Register user buffer memory and initialize passive rdma + * dto descriptor.
Total data size is stored in + * iser_ctask->data[ISER_DIR_IN].data_len + */ +static int iser_prepare_read_cmd(struct iscsi_cmd_task *ctask, + unsigned int edtl) + +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_regd_buf *regd_buf; + int err; + struct iser_hdr *hdr = &iser_ctask->desc.iser_header; + struct iser_data_buf *buf_in = &iser_ctask->data[ISER_DIR_IN]; + + err = iser_dma_map_task_data(iser_ctask, + buf_in, + ISER_DIR_IN, + DMA_FROM_DEVICE); + if (err) + return err; + + if (edtl > iser_ctask->data[ISER_DIR_IN].data_len) { + iser_err("Total data length: %ld, less than EDTL: " + "%d, in READ cmd BHS itt: %d, conn: 0x%p\n", + iser_ctask->data[ISER_DIR_IN].data_len, edtl, + ctask->itt, iser_ctask->iser_conn); + return -EINVAL; + } + + err = iser_reg_rdma_mem(iser_ctask,ISER_DIR_IN); + if (err) { + iser_err("Failed to set up Data-IN RDMA\n"); + return err; + } + regd_buf = &iser_ctask->rdma_regd[ISER_DIR_IN]; + + hdr->flags |= ISER_RSV; + hdr->read_stag = cpu_to_be32(regd_buf->reg.rkey); + hdr->read_va = cpu_to_be64(regd_buf->reg.va); + + iser_dbg("Cmd itt:%d READ tags RKEY:%#.4X VA:%#llX\n", + ctask->itt, regd_buf->reg.rkey, + (unsigned long long)regd_buf->reg.va); + + return 0; +} + +/* Register user buffer memory and initialize passive rdma + * dto descriptor. 
Total data size is stored in + * ctask->data[ISER_DIR_OUT].data_len + */ +static int +iser_prepare_write_cmd(struct iscsi_cmd_task *ctask, + unsigned int imm_sz, + unsigned int unsol_sz, + unsigned int edtl) +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_regd_buf *regd_buf; + int err; + struct iser_dto *send_dto = &iser_ctask->desc.dto; + struct iser_hdr *hdr = &iser_ctask->desc.iser_header; + struct iser_data_buf *buf_out = &iser_ctask->data[ISER_DIR_OUT]; + + err = iser_dma_map_task_data(iser_ctask, + buf_out, + ISER_DIR_OUT, + DMA_TO_DEVICE); + if (err) + return err; + + if (edtl > iser_ctask->data[ISER_DIR_OUT].data_len) { + iser_err("Total data length: %ld, less than EDTL: %d, " + "in WRITE cmd BHS itt: %d, conn: 0x%p\n", + iser_ctask->data[ISER_DIR_OUT].data_len, + edtl, ctask->itt, ctask->conn); + return -EINVAL; + } + + err = iser_reg_rdma_mem(iser_ctask,ISER_DIR_OUT); + if (err != 0) { + iser_err("Failed to register write cmd RDMA mem\n"); + return err; + } + + regd_buf = &iser_ctask->rdma_regd[ISER_DIR_OUT]; + + if (unsol_sz < edtl) { + hdr->flags |= ISER_WSV; + hdr->write_stag = cpu_to_be32(regd_buf->reg.rkey); + hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz); + + iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X " + "VA:%#llX + unsol:%d\n", + ctask->itt, regd_buf->reg.rkey, + (unsigned long long)regd_buf->reg.va, unsol_sz); + } + + if (imm_sz > 0) { + iser_dbg("Cmd itt:%d, WRITE, adding imm.data sz: %d\n", + ctask->itt, imm_sz); + iser_dto_add_regd_buff(send_dto, + regd_buf, + 0, + imm_sz); + } + + return 0; +} + +/** + * iser_post_receive_control - allocates, initializes and posts receive DTO. 
+ */ +static int iser_post_receive_control(struct iscsi_conn *conn) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iser_desc *rx_desc; + struct iser_regd_buf *regd_hdr; + struct iser_regd_buf *regd_data; + struct iser_dto *recv_dto = NULL; + struct iser_device *device = iser_conn->ib_conn->device; + int rx_data_size, err = 0; + + rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL); + if (rx_desc == NULL) { + iser_err("Failed to alloc desc for post recv\n"); + return -ENOMEM; + } + rx_desc->type = ISCSI_RX; + + /* for the login sequence we must support rx of upto 8K */ + if (conn->c_stage == ISCSI_CONN_INITIAL_STAGE) + rx_data_size = DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH; + else /* FIXME till user space sets conn->max_recv_dlength correctly */ + rx_data_size = 128; + + rx_desc->data = kmalloc(rx_data_size, GFP_KERNEL); + if (rx_desc->data == NULL) { + iser_err("Failed to alloc data buf for post recv\n"); + err = -ENOMEM; + goto post_rx_kmalloc_failure; + } + + recv_dto = &rx_desc->dto; + recv_dto->conn = iser_conn; + recv_dto->regd_vector_len = 0; + + regd_hdr = &rx_desc->hdr_regd_buf; + memset(regd_hdr, 0, sizeof(struct iser_regd_buf)); + regd_hdr->device = device; + regd_hdr->virt_addr = rx_desc; /* == &rx_desc->iser_header */ + regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; + + iser_reg_single(device, regd_hdr, DMA_FROM_DEVICE); + + iser_dto_add_regd_buff(recv_dto, regd_hdr, 0, 0); + + regd_data = &rx_desc->data_regd_buf; + memset(regd_data, 0, sizeof(struct iser_regd_buf)); + regd_data->device = device; + regd_data->virt_addr = rx_desc->data; + regd_data->data_size = rx_data_size; + + iser_reg_single(device, regd_data, DMA_FROM_DEVICE); + + iser_dto_add_regd_buff(recv_dto, regd_data, 0, 0); + + err = iser_post_recv(rx_desc); + if (!err) + return 0; + + /* iser_post_recv failed */ + iser_dto_buffs_release(recv_dto); + kfree(rx_desc->data); +post_rx_kmalloc_failure: + kmem_cache_free(ig.desc_cache, rx_desc); + return err; +} + +/* creates a new 
tx descriptor and adds header regd buffer */ +static void iser_create_send_desc(struct iscsi_iser_conn *iser_conn, + struct iser_desc *tx_desc) +{ + struct iser_regd_buf *regd_hdr = &tx_desc->hdr_regd_buf; + struct iser_dto *send_dto = &tx_desc->dto; + + memset(regd_hdr, 0, sizeof(struct iser_regd_buf)); + regd_hdr->device = iser_conn->ib_conn->device; + regd_hdr->virt_addr = tx_desc; /* == &tx_desc->iser_header */ + regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; + + send_dto->conn = iser_conn; + send_dto->notify_enable = 1; + send_dto->regd_vector_len = 0; + + memset(&tx_desc->iser_header, 0, sizeof(struct iser_hdr)); + tx_desc->iser_header.flags = ISER_VER; + + iser_dto_add_regd_buff(send_dto, regd_hdr, 0, 0); +} + +/** + * iser_conn_set_full_featured_mode - (iSER API) + */ +int iser_conn_set_full_featured_mode(struct iscsi_conn *conn) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + + int i; + /* no need to keep it in a var, we are after login so if this should + * be negotiated, by now the result should be available here */ + int initial_post_recv_bufs_num = ISER_MAX_RX_MISC_PDUS; + + iser_dbg("Initially post: %d\n", initial_post_recv_bufs_num); + + /* Check that there is no posted recv or send buffers left - */ + /* they must be consumed during the login phase */ + BUG_ON(atomic_read(&iser_conn->ib_conn->post_recv_buf_count) != 0); + BUG_ON(atomic_read(&iser_conn->ib_conn->post_send_buf_count) != 0); + + /* Initial post receive buffers */ + for (i = 0; i < initial_post_recv_bufs_num; i++) { + if (iser_post_receive_control(conn) != 0) { + iser_err("Failed to post recv bufs at:%d conn:0x%p\n", + i, conn); + return -ENOMEM; + } + } + iser_dbg("Posted %d post recv bufs, conn:0x%p\n", i, conn); + return 0; +} + +static int +iser_check_xmit(struct iscsi_conn *conn, void *task) +{ + int rc = 0; + struct iscsi_iser_conn *iser_conn = conn->dd_data; + + write_lock_bh(conn->recv_lock); + if (atomic_read(&iser_conn->ib_conn->post_send_buf_count) == + 
ISER_QP_MAX_REQ_DTOS) { + iser_dbg("%ld can't xmit task %p, suspending tx\n",jiffies,task); + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + rc = -EAGAIN; + } + write_unlock_bh(conn->recv_lock); + return rc; +} + + +/** + * iser_send_command - send command PDU + */ +int iser_send_command(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_dto *send_dto = NULL; + unsigned long edtl; + int err = 0; + struct iser_data_buf *data_buf; + + struct iscsi_cmd *hdr = ctask->hdr; + struct scsi_cmnd *sc = ctask->sc; + + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { + iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); + return -EPERM; + } + if (iser_check_xmit(conn, ctask)) + return -EAGAIN; + + edtl = ntohl(hdr->data_length); + + /* build the tx desc regd header and add it to the tx desc dto */ + iser_ctask->desc.type = ISCSI_TX_SCSI_COMMAND; + send_dto = &iser_ctask->desc.dto; + send_dto->ctask = iser_ctask; + iser_create_send_desc(iser_conn, &iser_ctask->desc); + + if (hdr->flags & ISCSI_FLAG_CMD_READ) + data_buf = &iser_ctask->data[ISER_DIR_IN]; + else + data_buf = &iser_ctask->data[ISER_DIR_OUT]; + + if (sc->use_sg) { /* using a scatter list */ + data_buf->buf = sc->request_buffer; + data_buf->size = sc->use_sg; + } else { /* using a single buffer - convert it into one entry SG */ + sg_init_one(&data_buf->sg_single, + sc->request_buffer, sc->request_bufflen); + data_buf->buf = &data_buf->sg_single; + data_buf->size = 1; + } + + data_buf->data_len = sc->request_bufflen; + + if (hdr->flags & ISCSI_FLAG_CMD_READ) { + err = iser_prepare_read_cmd(ctask, edtl); + if (err) + goto send_command_error; + } + if (hdr->flags & ISCSI_FLAG_CMD_WRITE) { + err = iser_prepare_write_cmd(ctask, + ctask->imm_count, + ctask->imm_count + + ctask->unsol_count, + edtl); + if (err) + goto send_command_error; + } + + 
iser_reg_single(iser_conn->ib_conn->device, + send_dto->regd[0], DMA_TO_DEVICE); + + if (iser_post_receive_control(conn) != 0) { + iser_err("post_recv failed!\n"); + err = -ENOMEM; + goto send_command_error; + } + + iser_ctask->status = ISER_TASK_STATUS_STARTED; + + err = iser_post_send(&iser_ctask->desc); + if (!err) + return 0; + +send_command_error: + iser_dto_buffs_release(send_dto); + iser_err("conn %p failed ctask->itt %d err %d\n",conn, ctask->itt, err); + return err; +} + +/** + * iser_send_data_out - send data out PDU + */ +int iser_send_data_out(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask, + struct iscsi_data *hdr) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_desc *tx_desc = NULL; + struct iser_dto *send_dto = NULL; + unsigned long buf_offset; + unsigned long data_seg_len; + unsigned int itt; + int err = 0; + + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { + iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); + return -EPERM; + } + + if (iser_check_xmit(conn, ctask)) + return -EAGAIN; + + itt = ntohl(hdr->itt); + data_seg_len = ntoh24(hdr->dlength); + buf_offset = ntohl(hdr->offset); + + iser_dbg("%s itt %d dseg_len %d offset %d\n", + __func__,(int)itt,(int)data_seg_len,(int)buf_offset); + + tx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL); + if (tx_desc == NULL) { + iser_err("Failed to alloc desc for post dataout\n"); + return -ENOMEM; + } + + tx_desc->type = ISCSI_TX_DATAOUT; + memcpy(&tx_desc->iscsi_header, hdr, sizeof(struct iscsi_hdr)); + + /* build the tx desc regd header and add it to the tx desc dto */ + send_dto = &tx_desc->dto; + send_dto->ctask = iser_ctask; + iser_create_send_desc(iser_conn, tx_desc); + + iser_reg_single(iser_conn->ib_conn->device, + send_dto->regd[0], DMA_TO_DEVICE); + + /* all data was registered for RDMA, we can use the lkey */ + iser_dto_add_regd_buff(send_dto, + 
&iser_ctask->rdma_regd[ISER_DIR_OUT], + buf_offset, + data_seg_len); + + if (buf_offset + data_seg_len > iser_ctask->data[ISER_DIR_OUT].data_len) { + iser_err("Offset:%ld & DSL:%ld in Data-Out " + "inconsistent with total len:%ld, itt:%d\n", + buf_offset, data_seg_len, + iser_ctask->data[ISER_DIR_OUT].data_len, itt); + err = -EINVAL; + goto send_data_out_error; + } + iser_dbg("data-out itt: %d, offset: %ld, sz: %ld\n", + itt, buf_offset, data_seg_len); + + + err = iser_post_send(tx_desc); + if (!err) + return 0; + +send_data_out_error: + iser_dto_buffs_release(send_dto); + kmem_cache_free(ig.desc_cache, tx_desc); + iser_err("conn %p failed err %d\n",conn, err); + return err; +} + +int iser_send_control(struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iser_desc *mdesc = mtask->dd_data; + struct iser_dto *send_dto = NULL; + unsigned int itt; + unsigned long data_seg_len; + int err = 0; + unsigned char opcode; + struct iser_regd_buf *regd_buf; + struct iser_device *device; + + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { + iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); + return -EPERM; + } + + if (iser_check_xmit(conn,mtask)) + return -EAGAIN; + + /* build the tx desc regd header and add it to the tx desc dto */ + mdesc->type = ISCSI_TX_CONTROL; + send_dto = &mdesc->dto; + send_dto->ctask = NULL; + iser_create_send_desc(iser_conn, mdesc); + + device = iser_conn->ib_conn->device; + + iser_reg_single(device, send_dto->regd[0], DMA_TO_DEVICE); + + itt = ntohl(mtask->hdr->itt); + opcode = mtask->hdr->opcode & ISCSI_OPCODE_MASK; + data_seg_len = ntoh24(mtask->hdr->dlength); + + if (data_seg_len > 0) { + regd_buf = &mdesc->data_regd_buf; + memset(regd_buf, 0, sizeof(struct iser_regd_buf)); + regd_buf->device = device; + regd_buf->virt_addr = mtask->data; + regd_buf->data_size = mtask->data_count; + iser_reg_single(device, regd_buf, + DMA_TO_DEVICE); + 
iser_dto_add_regd_buff(send_dto, regd_buf, + 0, + data_seg_len); + } + + if (iser_post_receive_control(conn) != 0) { + iser_err("post_rcv_buff failed!\n"); + err = -ENOMEM; + goto send_control_error; + } + + err = iser_post_send(mdesc); + if (!err) + return 0; + +send_control_error: + iser_dto_buffs_release(send_dto); + iser_err("conn %p failed err %d\n",conn, err); + return err; +} + +/** + * iser_rcv_dto_completion - recv DTO completion + */ +void iser_rcv_completion(struct iser_desc *rx_desc, + unsigned long dto_xfer_len) +{ + struct iser_dto *dto = &rx_desc->dto; + struct iscsi_iser_conn *conn = dto->conn; + struct iscsi_session *session = conn->iscsi_conn->session; + struct iscsi_cmd_task *ctask; + struct iscsi_iser_cmd_task *iser_ctask; + struct iscsi_hdr *hdr; + char *rx_data = NULL; + int rx_data_len = 0; + unsigned int itt; + unsigned char opcode; + + hdr = &rx_desc->iscsi_header; + + iser_dbg("op 0x%x itt 0x%x\n", hdr->opcode,hdr->itt); + + if (dto_xfer_len > ISER_TOTAL_HEADERS_LEN) { /* we have data */ + rx_data_len = dto_xfer_len - ISER_TOTAL_HEADERS_LEN; + rx_data = dto->regd[1]->virt_addr; + rx_data += dto->offset[1]; + } + + opcode = hdr->opcode & ISCSI_OPCODE_MASK; + + if (opcode == ISCSI_OP_SCSI_CMD_RSP) { + itt = hdr->itt & ISCSI_ITT_MASK; /* mask out cid and age bits */ + if (!(itt < session->cmds_max)) + iser_err("itt can't be matched to task!!!" 
+ "conn %p opcode %d cmds_max %d itt %d\n", + conn->iscsi_conn,opcode,session->cmds_max,itt); + /* use the mapping given with the cmds array indexed by itt */ + ctask = (struct iscsi_cmd_task *)session->cmds[itt]; + iser_ctask = ctask->dd_data; + iser_dbg("itt %d ctask %p\n",itt,ctask); + iser_ctask->status = ISER_TASK_STATUS_COMPLETED; + iser_ctask_rdma_finalize(iser_ctask); + } + + iser_dto_buffs_release(dto); + + iscsi_iser_recv(conn->iscsi_conn, hdr, rx_data, rx_data_len); + + kfree(rx_desc->data); + kmem_cache_free(ig.desc_cache, rx_desc); + + /* decrementing conn->post_recv_buf_count only --after-- freeing the * + * task eliminates the need to worry on tasks which are completed in * + * parallel to the execution of iser_conn_term. So the code that waits * + * for the posted rx bufs refcount to become zero handles everything */ + atomic_dec(&conn->ib_conn->post_recv_buf_count); +} + +void iser_snd_completion(struct iser_desc *tx_desc) +{ + struct iser_dto *dto = &tx_desc->dto; + struct iscsi_iser_conn *iser_conn = dto->conn; + struct iscsi_conn *conn = iser_conn->iscsi_conn; + struct iscsi_mgmt_task *mtask; + + iser_dbg("Initiator, Data sent dto=0x%p\n", dto); + + iser_dto_buffs_release(dto); + + if (tx_desc->type == ISCSI_TX_DATAOUT) + kmem_cache_free(ig.desc_cache, tx_desc); + + atomic_dec(&iser_conn->ib_conn->post_send_buf_count); + + write_lock(conn->recv_lock); + if (conn->suspend_tx) { + iser_dbg("%ld resuming tx\n",jiffies); + clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + scsi_queue_work(conn->session->host, &conn->xmitwork); + } + write_unlock(conn->recv_lock); + + if (tx_desc->type == ISCSI_TX_CONTROL) { + /* this arithmetic is legal by libiscsi dd_data allocation */ + mtask = (void *) ((long)(void *)tx_desc - + sizeof(struct iscsi_mgmt_task)); + if (mtask->hdr->itt == cpu_to_be32(ISCSI_RESERVED_TAG)) { + struct iscsi_session *session = conn->session; + + spin_lock(&conn->session->lock); + list_del(&mtask->running); + 
__kfifo_put(session->mgmtpool.queue, (void*)&mtask, + sizeof(void*)); + spin_unlock(&session->lock); + } + } +} + +void iser_ctask_rdma_init(struct iscsi_iser_cmd_task *iser_ctask) + +{ + iser_ctask->status = ISER_TASK_STATUS_INIT; + + iser_ctask->dir[ISER_DIR_IN] = 0; + iser_ctask->dir[ISER_DIR_OUT] = 0; + + iser_ctask->data[ISER_DIR_IN].data_len = 0; + iser_ctask->data[ISER_DIR_OUT].data_len = 0; + + memset(&iser_ctask->rdma_regd[ISER_DIR_IN], 0, + sizeof(struct iser_regd_buf)); + memset(&iser_ctask->rdma_regd[ISER_DIR_OUT], 0, + sizeof(struct iser_regd_buf)); +} + +void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask) +{ + int deferred; + + /* if we were reading, copy back to unaligned sglist, + * anyway dma_unmap and free the copy + */ + if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL) + iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_IN); + if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL) + iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_OUT); + + if (iser_ctask->dir[ISER_DIR_IN]) { + deferred = iser_regd_buff_release + (&iser_ctask->rdma_regd[ISER_DIR_IN]); + if (deferred) { + iser_err("References remain for BUF-IN rdma reg\n"); + BUG(); + } + } + + if (iser_ctask->dir[ISER_DIR_OUT]) { + deferred = iser_regd_buff_release + (&iser_ctask->rdma_regd[ISER_DIR_OUT]); + if (deferred) { + iser_err("References remain for BUF-OUT rdma reg\n"); + BUG(); + } + } + + iser_dma_unmap_task_data(iser_ctask); +} + +void iser_dto_buffs_release(struct iser_dto *dto) +{ + int i; + + for (i = 0; i < dto->regd_vector_len; i++) + iser_regd_buff_release(dto->regd[i]); +} + From ogerlitz at voltaire.com Wed May 10 06:22:09 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 10 May 2006 16:22:09 +0300 (IDT) Subject: [openib-general] [PATCH 4/6] iser RDMA CM (CMA) and IB verbs interaction In-Reply-To: Message-ID: --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iser_verbs.c 1970-01-01 02:00:00.000000000 +0200 +++ 
/usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iser_verbs.c 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,827 @@ +/* + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: iser_verbs.c 7051 2006-05-10 12:29:11Z ogerlitz $ + */ +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +#define ISCSI_ISER_MAX_CONN 8 +#define ISER_MAX_CQ_LEN ((ISER_QP_MAX_RECV_DTOS + \ + ISER_QP_MAX_REQ_DTOS) * \ + ISCSI_ISER_MAX_CONN) + +static void iser_cq_tasklet_fn(unsigned long data); +static void iser_cq_callback(struct ib_cq *cq, void *cq_context); +static void iser_comp_error_worker(void *data); + +static void iser_cq_event_callback(struct ib_event *cause, void *context) +{ + iser_err("got cq event %d \n", cause->event); +} + +static void iser_qp_event_callback(struct ib_event *cause, void *context) +{ + iser_err("got qp event %d\n",cause->event); +} + +/** + * iser_create_device_ib_res - creates Protection Domain (PD), Completion + * Queue (CQ), DMA Memory Region (DMA MR) with the device associated with + * the adapator. + * + * returns 0 on success, -1 on failure + */ +static int iser_create_device_ib_res(struct iser_device *device) +{ + device->pd = ib_alloc_pd(device->ib_device); + if (IS_ERR(device->pd)) + goto pd_err; + + device->cq = ib_create_cq(device->ib_device, + iser_cq_callback, + iser_cq_event_callback, + (void *)device, + ISER_MAX_CQ_LEN); + if (IS_ERR(device->cq)) + goto cq_err; + + if (ib_req_notify_cq(device->cq, IB_CQ_NEXT_COMP)) + goto cq_arm_err; + + tasklet_init(&device->cq_tasklet, + iser_cq_tasklet_fn, + (unsigned long)device); + + device->mr = ib_get_dma_mr(device->pd, + IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(device->mr)) + goto dma_mr_err; + + return 0; + +dma_mr_err: + tasklet_kill(&device->cq_tasklet); +cq_arm_err: + ib_destroy_cq(device->cq); +cq_err: + ib_dealloc_pd(device->pd); +pd_err: + iser_err("failed to allocate an IB resource\n"); + return -1; +} + +/** + * iser_free_device_ib_res - destory/dealloc/dereg the DMA MR, + * CQ and PD created with the device associated with the adapator. 
+ */ +static void iser_free_device_ib_res(struct iser_device *device) +{ + BUG_ON(device->mr == NULL); + + tasklet_kill(&device->cq_tasklet); + + (void)ib_dereg_mr(device->mr); + (void)ib_destroy_cq(device->cq); + (void)ib_dealloc_pd(device->pd); + + device->mr = NULL; + device->cq = NULL; + device->pd = NULL; +} + +/** + * iser_create_ib_conn_res - Creates FMR pool and Queue-Pair (QP) + * + * returns 0 on success, -1 on failure + */ +static int iser_create_ib_conn_res(struct iser_conn *ib_conn) +{ + struct iser_device *device; + struct ib_qp_init_attr init_attr; + int ret; + struct ib_fmr_pool_param params; + + BUG_ON(ib_conn->device == NULL); + + device = ib_conn->device; + + ib_conn->page_vec = kmalloc(sizeof(struct iser_page_vec) + + (sizeof(u64) * (ISCSI_ISER_SG_TABLESIZE +1)), + GFP_KERNEL); + if (!ib_conn->page_vec) { + ret = -ENOMEM; + goto alloc_err; + } + ib_conn->page_vec->pages = (u64 *) (ib_conn->page_vec + 1); + + params.page_shift = PAGE_SHIFT; + /* when the first/last SG element are not start/end * + * page aligned, the map whould be of N+1 pages */ + params.max_pages_per_fmr = ISCSI_ISER_SG_TABLESIZE + 1; + /* make the pool size twice the max number of SCSI commands * + * the ML is expected to queue, watermark for unmap at 50% */ + params.pool_size = ISCSI_XMIT_CMDS_MAX * 2; + params.dirty_watermark = ISCSI_XMIT_CMDS_MAX; + params.cache = 0; + params.flush_function = NULL; + params.access = (IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE | + IB_ACCESS_REMOTE_READ); + + ib_conn->fmr_pool = ib_create_fmr_pool(device->pd, &params); + if (IS_ERR(ib_conn->fmr_pool)) { + ret = PTR_ERR(ib_conn->fmr_pool); + goto fmr_pool_err; + } + + memset(&init_attr, 0, sizeof init_attr); + + init_attr.event_handler = iser_qp_event_callback; + init_attr.qp_context = (void *)ib_conn; + init_attr.send_cq = device->cq; + init_attr.recv_cq = device->cq; + init_attr.cap.max_send_wr = ISER_QP_MAX_REQ_DTOS; + init_attr.cap.max_recv_wr = ISER_QP_MAX_RECV_DTOS; + 
init_attr.cap.max_send_sge = MAX_REGD_BUF_VECTOR_LEN; + init_attr.cap.max_recv_sge = 2; + init_attr.sq_sig_type = IB_SIGNAL_REQ_WR; + init_attr.qp_type = IB_QPT_RC; + + ret = rdma_create_qp(ib_conn->cma_id, device->pd, &init_attr); + if (ret) + goto qp_err; + + ib_conn->qp = ib_conn->cma_id->qp; + iser_err("setting conn %p cma_id %p: fmr_pool %p qp %p\n", + ib_conn, ib_conn->cma_id, + ib_conn->fmr_pool, ib_conn->cma_id->qp); + return ret; + +qp_err: + (void)ib_destroy_fmr_pool(ib_conn->fmr_pool); +fmr_pool_err: + kfree(ib_conn->page_vec); +alloc_err: + iser_err("unable to alloc mem or create resource, err %d\n", ret); + return ret; +} + +/** + * releases the FMR pool, QP and CMA ID objects, returns 0 on success, + * -1 on failure + */ +static int iser_free_ib_conn_res(struct iser_conn *ib_conn) +{ + BUG_ON(ib_conn == NULL); + + iser_err("freeing conn %p cma_id %p fmr pool %p qp %p\n", + ib_conn, ib_conn->cma_id, + ib_conn->fmr_pool, ib_conn->qp); + + /* qp is created only once both addr & route are resolved */ + if (ib_conn->fmr_pool != NULL) + ib_destroy_fmr_pool(ib_conn->fmr_pool); + + if (ib_conn->qp != NULL) + rdma_destroy_qp(ib_conn->cma_id); + + if (ib_conn->cma_id != NULL) + rdma_destroy_id(ib_conn->cma_id); + + ib_conn->fmr_pool = NULL; + ib_conn->qp = NULL; + ib_conn->cma_id = NULL; + kfree(ib_conn->page_vec); + + return 0; +} + +/** + * based on the resolved device node GUID see if there already allocated + * device for this device. If there's no such, create one. 
+ */ +static +struct iser_device *iser_device_find_by_ib_device(struct rdma_cm_id *cma_id) +{ + struct list_head *p_list; + struct iser_device *device = NULL; + + mutex_lock(&ig.device_list_mutex); + + p_list = ig.device_list.next; + while (p_list != &ig.device_list) { + device = list_entry(p_list, struct iser_device, ig_list); + /* find if there's a match using the node GUID */ + if (device->ib_device->node_guid == cma_id->device->node_guid) + break; + } + + if (device == NULL) { + device = kzalloc(sizeof *device, GFP_KERNEL); + if (device == NULL) + goto out; + /* assign this device to the device */ + device->ib_device = cma_id->device; + /* init the device and link it into ig device list */ + if (iser_create_device_ib_res(device)) { + kfree(device); + device = NULL; + goto out; + } + list_add(&device->ig_list, &ig.device_list); + } +out: + BUG_ON(device == NULL); + device->refcount++; + mutex_unlock(&ig.device_list_mutex); + return device; +} + +/* if there's no demand for this device, release it */ +static void iser_device_try_release(struct iser_device *device) +{ + mutex_lock(&ig.device_list_mutex); + device->refcount--; + iser_err("device %p refcount %d\n",device,device->refcount); + if (!device->refcount) { + iser_free_device_ib_res(device); + list_del(&device->ig_list); + kfree(device); + } + mutex_unlock(&ig.device_list_mutex); +} + +int iser_conn_state_comp(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp) +{ + int ret; + + spin_lock_bh(&ib_conn->lock); + ret = (ib_conn->state == comp); + spin_unlock_bh(&ib_conn->lock); + return ret; +} + +static int iser_conn_state_comp_exch(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp, + enum iser_ib_conn_state exch) +{ + int ret; + + spin_lock_bh(&ib_conn->lock); + if ((ret = (ib_conn->state == comp))) + ib_conn->state = exch; + spin_unlock_bh(&ib_conn->lock); + return ret; +} + +/** + * triggers start of the disconnect procedures and wait for them to be done + */ +void iser_conn_terminate(struct 
iser_conn *ib_conn) +{ + int err = 0; + + /* change the ib conn state only if the conn is UP, however always call + * rdma_disconnect since this is the only way to cause the CMA to change + * the QP state to ERROR + */ + + iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, ISER_CONN_TERMINATING); + err = rdma_disconnect(ib_conn->cma_id); + if (err) + iser_err("Failed to disconnect, conn: 0x%p err %d\n", + ib_conn,err); + + wait_event_interruptible(ib_conn->wait, + ib_conn->state == ISER_CONN_DOWN); + + iser_conn_release(ib_conn); +} + +static void iser_connect_error(struct rdma_cm_id *cma_id) +{ + struct iser_conn *ib_conn; + ib_conn = (struct iser_conn *)cma_id->context; + + ib_conn->state = ISER_CONN_DOWN; + wake_up_interruptible(&ib_conn->wait); +} + +static void iser_addr_handler(struct rdma_cm_id *cma_id) +{ + struct iser_device *device; + struct iser_conn *ib_conn; + int ret; + + device = iser_device_find_by_ib_device(cma_id); + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->device = device; + + ret = rdma_resolve_route(cma_id, 1000); + if (ret) { + iser_err("resolve route failed: %d\n", ret); + iser_connect_error(cma_id); + } + return; +} + +static void iser_route_handler(struct rdma_cm_id *cma_id) +{ + struct rdma_conn_param conn_param; + int ret; + + ret = iser_create_ib_conn_res((struct iser_conn *)cma_id->context); + if (ret) + goto failure; + + iser_dbg("path.mtu is %d setting it to %d\n", + cma_id->route.path_rec->mtu, IB_MTU_1024); + + /* we must set the MTU to 1024 as this is what the target is assuming */ + if (cma_id->route.path_rec->mtu > IB_MTU_1024) + cma_id->route.path_rec->mtu = IB_MTU_1024; + + memset(&conn_param, 0, sizeof conn_param); + conn_param.responder_resources = 4; + conn_param.initiator_depth = 1; + conn_param.retry_count = 7; + conn_param.rnr_retry_count = 6; + + ret = rdma_connect(cma_id, &conn_param); + if (ret) { + iser_err("failure connecting: %d\n", ret); + goto failure; + } + + return; +failure: + 
iser_connect_error(cma_id); +} + +static void iser_connected_handler(struct rdma_cm_id *cma_id) +{ + struct iser_conn *ib_conn; + + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->state = ISER_CONN_UP; + wake_up_interruptible(&ib_conn->wait); +} + +static void iser_disconnected_handler(struct rdma_cm_id *cma_id) +{ + struct iser_conn *ib_conn; + + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->disc_evt_flag = 1; + + /* getting here when the state is UP means that the conn is being * + * terminated asynchronously from the iSCSI layer's perspective. */ + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) + iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, + ISCSI_ERR_CONN_FAILED); + + /* Complete the termination process if no posts are pending */ + if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) && + (atomic_read(&ib_conn->post_send_buf_count) == 0)) { + ib_conn->state = ISER_CONN_DOWN; + wake_up_interruptible(&ib_conn->wait); + } +} + +static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) +{ + int ret = 0; + + iser_err("event %d conn %p id %p\n",event->event,cma_id->context,cma_id); + + switch (event->event) { + case RDMA_CM_EVENT_ADDR_RESOLVED: + iser_addr_handler(cma_id); + break; + case RDMA_CM_EVENT_ROUTE_RESOLVED: + iser_route_handler(cma_id); + break; + case RDMA_CM_EVENT_ESTABLISHED: + iser_connected_handler(cma_id); + break; + case RDMA_CM_EVENT_ADDR_ERROR: + case RDMA_CM_EVENT_ROUTE_ERROR: + case RDMA_CM_EVENT_CONNECT_ERROR: + case RDMA_CM_EVENT_UNREACHABLE: + case RDMA_CM_EVENT_REJECTED: + iser_err("event: %d, error: %d\n", event->event, event->status); + iser_connect_error(cma_id); + break; + case RDMA_CM_EVENT_DISCONNECTED: + iser_disconnected_handler(cma_id); + break; + case RDMA_CM_EVENT_DEVICE_REMOVAL: + BUG(); + break; + case RDMA_CM_EVENT_CONNECT_RESPONSE: + BUG(); + break; + case RDMA_CM_EVENT_CONNECT_REQUEST: + default: + break; + } + return ret; +} + +int 
iser_conn_init(struct iser_conn **ibconn) +{ + struct iser_conn *ib_conn; + + ib_conn = kzalloc(sizeof *ib_conn, GFP_KERNEL); + if (!ib_conn) { + iser_err("can't alloc memory for struct iser_conn\n"); + return -ENOMEM; + } + ib_conn->state = ISER_CONN_INIT; + init_waitqueue_head(&ib_conn->wait); + atomic_set(&ib_conn->post_recv_buf_count, 0); + atomic_set(&ib_conn->post_send_buf_count, 0); + INIT_WORK(&ib_conn->comperror_work, iser_comp_error_worker, + ib_conn); + INIT_LIST_HEAD(&ib_conn->conn_list); + spin_lock_init(&ib_conn->lock); + + *ibconn = ib_conn; + return 0; +} + + /** + * starts the process of connecting to the target + * sleeps untill the connection is established or rejected + */ +int iser_connect(struct iser_conn *ib_conn, + struct sockaddr_in *src_addr, + struct sockaddr_in *dst_addr, + int non_blocking) +{ + struct sockaddr *src, *dst; + int err = 0; + + sprintf(ib_conn->name,"%d.%d.%d.%d:%d", + NIPQUAD(dst_addr->sin_addr.s_addr), dst_addr->sin_port); + + /* the device is known only --after-- address resolution */ + ib_conn->device = NULL; + + iser_err("connecting to: %d.%d.%d.%d, port 0x%x\n", + NIPQUAD(dst_addr->sin_addr), dst_addr->sin_port); + + ib_conn->state = ISER_CONN_PENDING; + + ib_conn->cma_id = rdma_create_id(iser_cma_handler, + (void *)ib_conn, + RDMA_PS_TCP); + if (IS_ERR(ib_conn->cma_id)) { + err = PTR_ERR(ib_conn->cma_id); + iser_err("rdma_create_id failed: %d\n", err); + goto id_failure; + } + + src = (struct sockaddr *)src_addr; + dst = (struct sockaddr *)dst_addr; + err = rdma_resolve_addr(ib_conn->cma_id, src, dst, 1000); + if (err) { + iser_err("rdma_resolve_addr failed: %d\n", err); + goto addr_failure; + } + + if (!non_blocking) { + wait_event_interruptible(ib_conn->wait, + (ib_conn->state != ISER_CONN_PENDING)); + + if (ib_conn->state != ISER_CONN_UP) { + err = -EIO; + goto connect_failure; + } + } + + mutex_lock(&ig.connlist_mutex); + list_add(&ib_conn->conn_list, &ig.connlist); + mutex_unlock(&ig.connlist_mutex); + return 
0; + +id_failure: + ib_conn->cma_id = NULL; +addr_failure: + ib_conn->state = ISER_CONN_DOWN; +connect_failure: + iser_conn_release(ib_conn); + return err; +} + +/** + * Frees all conn objects and deallocs conn descriptor + */ +void iser_conn_release(struct iser_conn *ib_conn) +{ + struct iser_device *device = ib_conn->device; + + BUG_ON(ib_conn->state != ISER_CONN_DOWN); + + mutex_lock(&ig.connlist_mutex); + list_del(&ib_conn->conn_list); + mutex_unlock(&ig.connlist_mutex); + + iser_free_ib_conn_res(ib_conn); + ib_conn->device = NULL; + /* on EVENT_ADDR_ERROR there's no device yet for this conn */ + if (device != NULL) + iser_device_try_release(device); + kfree(ib_conn); +} + + +/** + * iser_reg_page_vec - Register physical memory + * + * returns: 0 on success, errno code on failure + */ +int iser_reg_page_vec(struct iser_conn *ib_conn, + struct iser_page_vec *page_vec, + struct iser_mem_reg *mem_reg) +{ + struct ib_pool_fmr *mem; + u64 io_addr; + u64 *page_list; + int status; + + page_list = page_vec->pages; + io_addr = page_list[0]; + + mem = ib_fmr_pool_map_phys(ib_conn->fmr_pool, + page_list, + page_vec->length, + &io_addr); + + if (IS_ERR(mem)) { + status = (int)PTR_ERR(mem); + iser_err("ib_fmr_pool_map_phys failed: %d\n", status); + return status; + } + + mem_reg->lkey = mem->fmr->lkey; + mem_reg->rkey = mem->fmr->rkey; + mem_reg->len = page_vec->length * PAGE_SIZE; + mem_reg->va = io_addr; + mem_reg->mem_h = (void *)mem; + + mem_reg->va += page_vec->offset; + mem_reg->len = page_vec->data_size; + + iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, " + "entry[0]: (0x%08lx,%ld)] -> " + "[lkey: 0x%08X mem_h: 0x%p va: 0x%08lX sz: %ld]\n", + page_vec, page_vec->length, + (unsigned long)page_vec->pages[0], + (unsigned long)page_vec->data_size, + (unsigned int)mem_reg->lkey, mem_reg->mem_h, + (unsigned long)mem_reg->va, (unsigned long)mem_reg->len); + return 0; +} + +/** + * Unregister (previosuly registered) memory. 
+ */ +void iser_unreg_mem(struct iser_mem_reg *reg) +{ + int ret; + + iser_dbg("PHYSICAL Mem.Unregister mem_h %p\n",reg->mem_h); + + ret = ib_fmr_pool_unmap((struct ib_pool_fmr *)reg->mem_h); + if (ret) + iser_err("ib_fmr_pool_unmap failed %d\n", ret); + + reg->mem_h = NULL; +} + +/** + * iser_dto_to_iov - builds IOV from a dto descriptor + */ +static void iser_dto_to_iov(struct iser_dto *dto, struct ib_sge *iov, int iov_len) +{ + int i; + struct ib_sge *sge; + struct iser_regd_buf *regd_buf; + + if (dto->regd_vector_len > iov_len) { + iser_err("iov size %d too small for posting dto of len %d\n", + iov_len, dto->regd_vector_len); + BUG(); + } + + for (i = 0; i < dto->regd_vector_len; i++) { + sge = &iov[i]; + regd_buf = dto->regd[i]; + + sge->addr = regd_buf->reg.va; + sge->length = regd_buf->reg.len; + sge->lkey = regd_buf->reg.lkey; + + if (dto->used_sz[i] > 0) /* Adjust size */ + sge->length = dto->used_sz[i]; + + /* offset and length should not exceed the regd buf length */ + if (sge->length + dto->offset[i] > regd_buf->reg.len) { + iser_err("Used len:%ld + offset:%d, exceed reg.buf.len:" + "%ld in dto:0x%p [%d], va:0x%08lX\n", + (unsigned long)sge->length, dto->offset[i], + (unsigned long)regd_buf->reg.len, dto, i, + (unsigned long)sge->addr); + BUG(); + } + + sge->addr += dto->offset[i]; /* Adjust offset */ + } +} + +/** + * iser_post_recv - Posts a receive buffer. 
+ * + * returns 0 on success, -1 on failure + */ +int iser_post_recv(struct iser_desc *rx_desc) +{ + int ib_ret, ret_val = 0; + struct ib_recv_wr recv_wr, *recv_wr_failed; + struct ib_sge iov[2]; + struct iser_conn *ib_conn; + struct iser_dto *recv_dto = &rx_desc->dto; + + /* Retrieve conn */ + ib_conn = recv_dto->conn->ib_conn; + + iser_dto_to_iov(recv_dto, iov, 2); + + recv_wr.next = NULL; + recv_wr.sg_list = iov; + recv_wr.num_sge = recv_dto->regd_vector_len; + recv_wr.wr_id = (unsigned long)rx_desc; + + atomic_inc(&ib_conn->post_recv_buf_count); + ib_ret = ib_post_recv(ib_conn->qp, &recv_wr, &recv_wr_failed); + if (ib_ret) { + iser_err("ib_post_recv failed ret=%d\n", ib_ret); + atomic_dec(&ib_conn->post_recv_buf_count); + ret_val = -1; + } + + return ret_val; +} + +/** + * iser_start_send - Initiate a Send DTO operation + * + * returns 0 on success, -1 on failure + */ +int iser_post_send(struct iser_desc *tx_desc) +{ + int ib_ret, ret_val = 0; + struct ib_send_wr send_wr, *send_wr_failed; + struct ib_sge iov[MAX_REGD_BUF_VECTOR_LEN]; + struct iser_conn *ib_conn; + struct iser_dto *dto = &tx_desc->dto; + + ib_conn = dto->conn->ib_conn; + + iser_dto_to_iov(dto, iov, MAX_REGD_BUF_VECTOR_LEN); + + send_wr.next = NULL; + send_wr.wr_id = (unsigned long)tx_desc; + send_wr.sg_list = iov; + send_wr.num_sge = dto->regd_vector_len; + send_wr.opcode = IB_WR_SEND; + send_wr.send_flags = dto->notify_enable ? 
IB_SEND_SIGNALED : 0; + + atomic_inc(&ib_conn->post_send_buf_count); + + ib_ret = ib_post_send(ib_conn->qp, &send_wr, &send_wr_failed); + if (ib_ret) { + iser_err("Failed to start SEND DTO, dto: 0x%p, IOV len: %d\n", + dto, dto->regd_vector_len); + iser_err("ib_post_send failed, ret:%d\n", ib_ret); + atomic_dec(&ib_conn->post_send_buf_count); + ret_val = -1; + } + + return ret_val; +} + +static void iser_comp_error_worker(void *data) +{ + struct iser_conn *ib_conn = data; + + /* getting here when the state is UP means that the conn is being * + * terminated asynchronously from the iSCSI layer's perspective. */ + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) + iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, + ISCSI_ERR_CONN_FAILED); + + /* complete the termination process if disconnect event was delivered * + * note there are no more non completed posts to the QP */ + if (ib_conn->disc_evt_flag) { + ib_conn->state = ISER_CONN_DOWN; + wake_up_interruptible(&ib_conn->wait); + } +} + +static void iser_handle_comp_error(struct iser_desc *desc) +{ + struct iser_dto *dto = &desc->dto; + struct iser_conn *ib_conn = dto->conn->ib_conn; + + iser_dto_buffs_release(dto); + + if (desc->type == ISCSI_RX) { + kfree(desc->data); + kmem_cache_free(ig.desc_cache, desc); + atomic_dec(&ib_conn->post_recv_buf_count); + } else { /* type is TX control/command/dataout */ + if (desc->type == ISCSI_TX_DATAOUT) + kmem_cache_free(ig.desc_cache, desc); + atomic_dec(&ib_conn->post_send_buf_count); + } + + if (atomic_read(&ib_conn->post_recv_buf_count) == 0 && + atomic_read(&ib_conn->post_send_buf_count) == 0) + schedule_work(&ib_conn->comperror_work); +} + +static void iser_cq_tasklet_fn(unsigned long data) +{ + struct iser_device *device = (struct iser_device *)data; + struct ib_cq *cq = device->cq; + struct ib_wc wc; + struct iser_desc *desc; + unsigned long xfer_len; + + while (ib_poll_cq(cq, 1, &wc) == 1) { + desc = (struct iser_desc *) (unsigned long) 
wc.wr_id; + BUG_ON(desc == NULL); + + if (wc.status == IB_WC_SUCCESS) { + if (desc->type == ISCSI_RX) { + xfer_len = (unsigned long)wc.byte_len; + iser_rcv_completion(desc, xfer_len); + } else /* type == ISCSI_TX_CONTROL/SCSI_CMD/DOUT */ + iser_snd_completion(desc); + } else { + iser_err("comp w. error op %d status %d\n",desc->type,wc.status); + iser_handle_comp_error(desc); + } + } + /* #warning "it is assumed here that arming CQ only once its empty" * + * " would not cause interrupts to be missed" */ + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); +} + +static void iser_cq_callback(struct ib_cq *cq, void *cq_context) +{ + struct iser_device *device = (struct iser_device *)cq_context; + + tasklet_schedule(&device->cq_tasklet); +} From ogerlitz at voltaire.com Wed May 10 06:22:50 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 10 May 2006 16:22:50 +0300 (IDT) Subject: [openib-general] [PATCH 5/6] iser handling of memory for RDMA In-Reply-To: Message-ID: --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iser_memory.c 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iser_memory.c 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,401 @@ +/* + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: iser_memory.c 6964 2006-05-07 11:11:43Z ogerlitz $ + */ +#include +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +#define ISER_KMALLOC_THRESHOLD 0x20000 /* 128K - kmalloc limit */ +/** + * Decrements the reference count for the + * registered buffer & releases it + * + * returns 0 if released, 1 if deferred + */ +int iser_regd_buff_release(struct iser_regd_buf *regd_buf) +{ + struct device *dma_device; + + if ((atomic_read(&regd_buf->ref_count) == 0) || + atomic_dec_and_test(&regd_buf->ref_count)) { + /* if we used the dma mr, unreg is just NOP */ + if (regd_buf->reg.rkey != 0) + iser_unreg_mem(&regd_buf->reg); + + if (regd_buf->dma_addr) { + dma_device = regd_buf->device->ib_device->dma_device; + dma_unmap_single(dma_device, + regd_buf->dma_addr, + regd_buf->data_size, + regd_buf->direction); + } + /* else this regd buf is associated with task which we */ + /* dma_unmap_single/sg later */ + return 0; + } else { + iser_dbg("Release deferred, regd.buff: 0x%p\n", regd_buf); + return 1; + } +} + +/** + * iser_reg_single - fills registered buffer descriptor with + * registration information + */ +void iser_reg_single(struct iser_device *device, + struct iser_regd_buf *regd_buf, + enum dma_data_direction direction) 
+{ + dma_addr_t dma_addr; + + dma_addr = dma_map_single(device->ib_device->dma_device, + regd_buf->virt_addr, + regd_buf->data_size, direction); + BUG_ON(dma_mapping_error(dma_addr)); + + regd_buf->reg.lkey = device->mr->lkey; + regd_buf->reg.rkey = 0; /* indicate there's no need to unreg */ + regd_buf->reg.len = regd_buf->data_size; + regd_buf->reg.va = dma_addr; + + regd_buf->dma_addr = dma_addr; + regd_buf->direction = direction; +} + +/** + * iser_start_rdma_unaligned_sg + */ +int iser_start_rdma_unaligned_sg(struct iscsi_iser_cmd_task *iser_ctask, + enum iser_data_dir cmd_dir) +{ + int dma_nents; + struct device *dma_device; + char *mem = NULL; + struct iser_data_buf *data = &iser_ctask->data[cmd_dir]; + unsigned long cmd_data_len = data->data_len; + + if (cmd_data_len > ISER_KMALLOC_THRESHOLD) + mem = (void *)__get_free_pages(GFP_KERNEL, + long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT); + else + mem = kmalloc(cmd_data_len, GFP_KERNEL); + + if (mem == NULL) { + iser_err("Failed to allocate mem size %d %d for copying sglist\n", + data->size,(int)cmd_data_len); + return -ENOMEM; + } + + if (cmd_dir == ISER_DIR_OUT) { + /* copy the unaligned sg the buffer which is used for RDMA */ + struct scatterlist *sg = (struct scatterlist *)data->buf; + int i; + char *p, *from; + + for (p = mem, i = 0; i < data->size; i++) { + from = kmap_atomic(sg[i].page, KM_USER0); + memcpy(p, + from + sg[i].offset, + sg[i].length); + kunmap_atomic(from, KM_USER0); + p += sg[i].length; + } + } + + sg_init_one(&iser_ctask->data_copy[cmd_dir].sg_single, mem, cmd_data_len); + iser_ctask->data_copy[cmd_dir].buf = + &iser_ctask->data_copy[cmd_dir].sg_single; + iser_ctask->data_copy[cmd_dir].size = 1; + + iser_ctask->data_copy[cmd_dir].copy_buf = mem; + + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + if (cmd_dir == ISER_DIR_OUT) + dma_nents = dma_map_sg(dma_device, + &iser_ctask->data_copy[cmd_dir].sg_single, + 1, DMA_TO_DEVICE); + else + dma_nents = 
dma_map_sg(dma_device, + &iser_ctask->data_copy[cmd_dir].sg_single, + 1, DMA_FROM_DEVICE); + + BUG_ON(dma_nents == 0); + + iser_ctask->data_copy[cmd_dir].dma_nents = dma_nents; + return 0; +} + +/** + * iser_finalize_rdma_unaligned_sg + */ +void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_cmd_task *iser_ctask, + enum iser_data_dir cmd_dir) +{ + struct device *dma_device; + struct iser_data_buf *mem_copy; + unsigned long cmd_data_len; + + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + mem_copy = &iser_ctask->data_copy[cmd_dir]; + + if (cmd_dir == ISER_DIR_OUT) + dma_unmap_sg(dma_device, &mem_copy->sg_single, 1, + DMA_TO_DEVICE); + else + dma_unmap_sg(dma_device, &mem_copy->sg_single, 1, + DMA_FROM_DEVICE); + + if (cmd_dir == ISER_DIR_IN) { + char *mem; + struct scatterlist *sg; + unsigned char *p, *to; + unsigned int sg_size; + int i; + + /* copy back read RDMA to unaligned sg */ + mem = mem_copy->copy_buf; + + sg = (struct scatterlist *)iser_ctask->data[ISER_DIR_IN].buf; + sg_size = iser_ctask->data[ISER_DIR_IN].size; + + for (p = mem, i = 0; i < sg_size; i++){ + to = kmap_atomic(sg[i].page, KM_SOFTIRQ0); + memcpy(to + sg[i].offset, + p, + sg[i].length); + kunmap_atomic(to, KM_SOFTIRQ0); + p += sg[i].length; + } + } + + cmd_data_len = iser_ctask->data[cmd_dir].data_len; + + if (cmd_data_len > ISER_KMALLOC_THRESHOLD) + free_pages((unsigned long)mem_copy->copy_buf, + long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT); + else + kfree(mem_copy->copy_buf); + + mem_copy->copy_buf = NULL; +} + +/** + * iser_sg_to_page_vec - Translates scatterlist entries to physical addresses + * and returns the length of resulting physical address array (may be less than + * the original due to possible compaction). + * + * we build a "page vec" under the assumption that the SG meets the RDMA + * alignment requirements. 
Other then the first and last SG elements, all + * the "internal" elements can be compacted into a list whose elements are + * dma addresses of physical pages. The code supports also the weird case + * where --few fragments of the same page-- are present in the SG as + * consecutive elements. Also, it handles one entry SG. + */ +static int iser_sg_to_page_vec(struct iser_data_buf *data, + struct iser_page_vec *page_vec) +{ + struct scatterlist *sg = (struct scatterlist *)data->buf; + dma_addr_t first_addr, last_addr, page; + int start_aligned, end_aligned; + unsigned int cur_page = 0; + unsigned long total_sz = 0; + int i; + + /* compute the offset of first element */ + page_vec->offset = (u64) sg[0].offset; + + for (i = 0; i < data->dma_nents; i++) { + total_sz += sg_dma_len(&sg[i]); + + first_addr = sg_dma_address(&sg[i]); + last_addr = first_addr + sg_dma_len(&sg[i]); + + start_aligned = !(first_addr & ~PAGE_MASK); + end_aligned = !(last_addr & ~PAGE_MASK); + + /* continue to collect page fragments till aligned or SG ends */ + while (!end_aligned && (i + 1 < data->dma_nents)) { + i++; + total_sz += sg_dma_len(&sg[i]); + last_addr = sg_dma_address(&sg[i]) + sg_dma_len(&sg[i]); + end_aligned = !(last_addr & ~PAGE_MASK); + } + + first_addr = first_addr & PAGE_MASK; + + for (page = first_addr; page < last_addr; page += PAGE_SIZE) + page_vec->pages[cur_page++] = page; + + } + page_vec->data_size = total_sz; + iser_dbg("page_vec->data_size:%d cur_page %d\n", page_vec->data_size,cur_page); + return cur_page; +} + +#define MASK_4K ((1UL << 12) - 1) /* 0xFFF */ +#define IS_4K_ALIGNED(addr) ((((unsigned long)addr) & MASK_4K) == 0) + +/** + * iser_data_buf_aligned_len - Tries to determine the maximal correctly aligned + * for RDMA sub-list of a scatter-gather list of memory buffers, and returns + * the number of entries which are aligned correctly. Supports the case where + * consecutive SG elements are actually fragments of the same physcial page. 
+ */ +static unsigned int iser_data_buf_aligned_len(struct iser_data_buf *data) +{ + struct scatterlist *sg; + dma_addr_t end_addr, next_addr; + int i, cnt; + unsigned int ret_len = 0; + + sg = (struct scatterlist *)data->buf; + + for (cnt = 0, i = 0; i < data->dma_nents; i++, cnt++) { + /* iser_dbg("Checking sg iobuf [%d]: phys=0x%08lX " + "offset: %ld sz: %ld\n", i, + (unsigned long)page_to_phys(sg[i].page), + (unsigned long)sg[i].offset, + (unsigned long)sg[i].length); */ + end_addr = sg_dma_address(&sg[i]) + + sg_dma_len(&sg[i]); + /* iser_dbg("Checking sg iobuf end address " + "0x%08lX\n", end_addr); */ + if (i + 1 < data->dma_nents) { + next_addr = sg_dma_address(&sg[i+1]); + /* are i, i+1 fragments of the same page? */ + if (end_addr == next_addr) + continue; + else if (!IS_4K_ALIGNED(end_addr)) { + ret_len = cnt + 1; + break; + } + } + } + if (i == data->dma_nents) + ret_len = cnt; /* loop ended */ + iser_dbg("Found %d aligned entries out of %d in sg:0x%p\n", + ret_len, data->dma_nents, data); + return ret_len; +} + +static void iser_data_buf_dump(struct iser_data_buf *data) +{ + struct scatterlist *sg = (struct scatterlist *)data->buf; + int i; + + for (i = 0; i < data->size; i++) + iser_err("sg[%d] dma_addr:0x%lX page:0x%p " + "off:%d sz:%d dma_len:%d\n", + i, (unsigned long)sg_dma_address(&sg[i]), + sg[i].page, sg[i].offset, + sg[i].length,sg_dma_len(&sg[i])); +} + +static void iser_dump_page_vec(struct iser_page_vec *page_vec) +{ + int i; + + iser_err("page vec length %d data size %d\n", + page_vec->length, page_vec->data_size); + for (i = 0; i < page_vec->length; i++) + iser_err("%d %lx\n",i,(unsigned long)page_vec->pages[i]); +} + +static void iser_page_vec_build(struct iser_data_buf *data, + struct iser_page_vec *page_vec) +{ + int page_vec_len = 0; + + page_vec->length = 0; + page_vec->offset = 0; + + iser_dbg("Translating sg sz: %d\n", data->dma_nents); + page_vec_len = iser_sg_to_page_vec(data,page_vec); + iser_dbg("sg len %d page_vec_len %d\n", 
data->dma_nents,page_vec_len); + + page_vec->length = page_vec_len; + + if (page_vec_len * PAGE_SIZE < page_vec->data_size) { + iser_err("page_vec too short to hold this SG\n"); + iser_data_buf_dump(data); + iser_dump_page_vec(page_vec); + BUG(); + } +} + +/** + * iser_reg_rdma_mem - Registers memory intended for RDMA, + * obtaining rkey and va + * + * returns 0 on success, errno code on failure + */ +int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, + enum iser_data_dir cmd_dir) +{ + struct iser_conn *ib_conn = iser_ctask->iser_conn->ib_conn; + struct iser_data_buf *mem = &iser_ctask->data[cmd_dir]; + struct iser_regd_buf *regd_buf; + int aligned_len; + int err; + + regd_buf = &iser_ctask->rdma_regd[cmd_dir]; + + aligned_len = iser_data_buf_aligned_len(mem); + if (aligned_len != mem->size) { + iser_err("rdma alignment violation %d/%d aligned\n", + aligned_len, mem->size); + iser_data_buf_dump(mem); + /* allocate copy buf, if we are writing, copy the */ + /* unaligned scatterlist, dma map the copy */ + if (iser_start_rdma_unaligned_sg(iser_ctask, cmd_dir) != 0) + return -ENOMEM; + mem = &iser_ctask->data_copy[cmd_dir]; + } + + iser_page_vec_build(mem, ib_conn->page_vec); + err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg); + if (err) + return err; + + /* take a reference on this regd buf such that it will not be released * + * (eg in send dto completion) before we get the scsi response */ + atomic_inc(&regd_buf->ref_count); + return 0; +} From ogerlitz at voltaire.com Wed May 10 06:23:12 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 10 May 2006 16:23:12 +0300 (IDT) Subject: [openib-general] [PATCH 6/6] iSER's Kconfig and Makefile In-Reply-To: Message-ID: --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Kconfig 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/Kconfig 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,12 @@ +config INFINIBAND_ISER + tristate "ISCSI RDMA
Protocol" + depends on INFINIBAND && SCSI + select SCSI_ISCSI_ATTRS + ---help--- + + Support for the ISCSI RDMA Protocol over InfiniBand. This + allows you to access storage devices that speak ISER/ISCSI + over InfiniBand. + + The ISER protocol is defined by IETF. + See . --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Makefile 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/Makefile 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,8 @@ +EXTRA_CFLAGS += -Idrivers/infiniband/include + +obj-$(CONFIG_INFINIBAND_ISER) += ib_iser.o + +ib_iser-y := iser_verbs.o \ + iser_initiator.o \ + iser_memory.o \ + iscsi_iser.o From jackm at mellanox.co.il Wed May 10 06:29:33 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Wed, 10 May 2006 16:29:33 +0300 Subject: [openib-general] Re: [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: <200605101038.31541.jackm@mellanox.co.il> References: <20060510071719.GN21036@mellanox.co.il> <200605101038.31541.jackm@mellanox.co.il> Message-ID: <200605101629.33870.jackm@mellanox.co.il> On Wednesday 10 May 2006 10:38, Jack Morgenstein wrote: > On Wednesday 10 May 2006 10:17, Michael S. Tsirkin wrote: > > Maybe just return -ENODATA? Then you don't need to modify any code ... > > Userspace rdma_get_option() will then also get -ENODATA. OK. > > We can, therefore, do the following: > the dummy procedures in the dummy ib_local_sa.h file will > return -ENODATA for all get operations and for ib_create_path_cursor(), > and -ENOSYS for all set operations. > Then, no changes will be needed (except for adding the dummy file > ib_local_sa.h). > > Is this acceptable? > > - Jack > Oops, there are not set operations here (I evidently did not pay attention that I was looking at ucma_ib.c). I'll just return -ENODATA for ib_get_path_rec() and ib_create_path_cursor(). 
- Jack From dotanb at mellanox.co.il Wed May 10 06:52:26 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 10 May 2006 16:52:26 +0300 Subject: [openib-general] [DAPL] latest DAPL cannot be compiled with the latest librdmacm Message-ID: <200605101652.26604.dotanb@mellanox.co.il> Hi. The latest DAPL cannot be compiled with the latest librdmacm after an API change in the librdmacm. Driver: openib_gen2-20060510-1209 (REV=7038) kernel: 2.6.9-22.ELsmp Arch: x86_64 gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include/infiniband -I../librdmacm/include -I../libibverbs/include -Wall -g -D_GNU_SOURCE -D REDHAT_EL4 -DOPENIB -DCQ_WAIT_OBJECT -I./dat/include/ -I./dapl/include/ -I./dapl/common -I./dapl/udapl/linux -I./dapl/openib_cma -g -O2 -MT dapl_udapl_libdaplcma_la-dapl_ib_util.lo -MD -MP -MF .deps/dapl_udapl_libdaplcma_la-dapl_ib_util.Tpo -c dapl/openib_cma/dapl_ib_util.c -fPIC -DPIC -o .libs/dapl_udapl_libdaplcma_la-dapl_ib_util.o dapl/openib_cma/dapl_ib_util.c: In function `dapls_ib_open_hca': dapl/openib_cma/dapl_ib_util.c:229: warning: passing arg 1 of `rdma_create_id' from incompatible pointer type dapl/openib_cma/dapl_ib_util.c:229: error: too few arguments to function `rdma_create_id' dapl/openib_cma/dapl_ib_util.c: In function `dapli_ib_thread_init': dapl/openib_cma/dapl_ib_util.c:554: warning: implicit declaration of function `rdma_get_fd' Not only can DAPL itself not be compiled; the examples that come with DAPL and use librdmacm need to be updated too. thanks Dotan From xma at us.ibm.com Wed May 10 07:05:03 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 10 May 2006 07:05:03 -0700 Subject: [openib-general] ip over ib throughtput In-Reply-To: <7.0.1.0.2.20060510065006.07b2c928@netapp.com> Message-ID: "Talpey, Thomas" wrote on 05/10/2006 03:53:04 AM: > At 11:13 PM 5/9/2006, Shirley Ma wrote: > >Have you tried to send payload smaller than 2044? Any difference? > > > You mean MTU or ULP payload?
The default NFS reads and writes are > 32KB, and in the addressing mode used in these tests they were > broken into 8 page-sized RDMA ops. So, there were 9 ops from the > server, per NFS read. I used the default MTU so these were probably > 19 messages on the wire. I don't expect much difference with smaller > MTU, but smaller NFS ops would be noticeable. > > Tom. > I meant a payload less than or equal to 2044, not the IB MTU. IPoIB can only send a payload of <=2044 per ib_post_send(). NFS/RDMA in this case sends 32KB per ib_post_send(). It would be nice to know the performance difference under the same payload for IPoIB over UD and NFS/RDMA. Is that possible? Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From bugzilla-daemon at openib.org Wed May 10 07:55:48 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 10 May 2006 07:55:48 -0700 (PDT) Subject: [openib-general] [Bug 78] OFED 1.0 RC 4 iser install fails if patches already applied Message-ID: <20060510145548.4C2BA2283E5@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=78 ------- Additional Comments From ogerlitz at voltaire.com 2006-05-10 07:55 ------- I am not sure I follow what you are referring to by install script... Looking in the spec file, which you can extract by running rpm2cpio oiscsi-2.6-16.src.rpm | cpio -id on the iscsi src rpm located under https://openib.org/svn/gen2/branches/1.0/ofed/extras/ I see that the preun target patches out the 3 iscsi patches. So, did you uninstall rc3 before installing rc4? Also, if you want to clean your system so that you can install rc4, you can get the patches from https://openib.org/svn/gen2/branches/backport/sles10/ and patch them out ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From mst at mellanox.co.il Wed May 10 07:58:41 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 10 May 2006 17:58:41 +0300 Subject: [openib-general] [PATCH] mthca: ioremap fix (was: Problem with our SB and your IB Card) Message-ID: <20060510145841.GE10669@mellanox.co.il> Addresses for ioremap must be calculated off of pci_resource_start - we can't use the bus address as seen by the HCA for that. Based on patch by Klaus Smolin. Signed-off-by: Michael S. Tsirkin Index: openib/drivers/infiniband/hw/mthca/mthca_mr.c =================================================================== --- openib/drivers/infiniband/hw/mthca/mthca_mr.c (revision 7018) +++ openib/drivers/infiniband/hw/mthca/mthca_mr.c (working copy) @@ -761,6 +761,7 @@ void mthca_arbel_fmr_unmap(struct mthca_ int __devinit mthca_init_mr_table(struct mthca_dev *dev) { + unsigned long addr; int err, i; err = mthca_alloc_init(&dev->mr_table.mpt_alloc, @@ -796,9 +797,12 @@ int __devinit mthca_init_mr_table(struct goto err_fmr_mpt; } + addr = pci_resource_start(dev->pdev, 4) + + ((pci_resource_len(dev->pdev, 4) - 1) & + dev->mr_table.mpt_base); + dev->mr_table.tavor_fmr.mpt_base = - ioremap(dev->mr_table.mpt_base, - (1 << i) * sizeof (struct mthca_mpt_entry)); + ioremap(addr, (1 << i) * sizeof(struct mthca_mpt_entry)); if (!dev->mr_table.tavor_fmr.mpt_base) { mthca_warn(dev, "MPT ioremap for FMR failed.\n"); @@ -806,9 +810,12 @@ int __devinit mthca_init_mr_table(struct goto err_fmr_mpt; } + addr = pci_resource_start(dev->pdev, 4) + + ((pci_resource_len(dev->pdev, 4) - 1) & + dev->mr_table.mtt_base); + dev->mr_table.tavor_fmr.mtt_base = - ioremap(dev->mr_table.mtt_base, - (1 << i) * MTHCA_MTT_SEG_SIZE); + ioremap(addr, (1 << i) * MTHCA_MTT_SEG_SIZE); if (!dev->mr_table.tavor_fmr.mtt_base) { mthca_warn(dev, "MTT ioremap for FMR failed.\n"); err = -ENOMEM; -- MST From vuhuong at mellanox.com Wed May 10 08:36:41 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 10 May 2006 08:36:41 -0700 
Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> Message-ID: <44620889.3010702@mellanox.com> Roland Dreier wrote: > BTW, does Mellanox (or anyone else) have any numbers showing that > using FMRs makes any difference in performance on a semi-realistic benchmark? > I'm using xdd to test the performance www.ioperformance.com/products.htm The target is the Mellanox srp target reference implementation with 14 SATA spindles. I can get ~780 MB/s max without FMRs and ~920 MB/s with FMRs (using 256 KB sequential read direct IO requests) Vu From vuhuong at mellanox.com Wed May 10 08:42:39 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 10 May 2006 08:42:39 -0700 Subject: [openib-general][PATCH] srp: tuned parameters, In-Reply-To: References: <443E8325.2000502@mellanox.com> Message-ID: <446209EF.7040207@mellanox.com> Roland Dreier wrote: > I finally looked this over. > > First, this should be two patches: making srp_sg_tablesize tunable > should be a separate change from making it possible to specify > max_cmd_per_lun for a target. > OK, I'll break it into two patches > The srp_sg_tablesize change makes the default number of SG entries > quite a bit larger than it is now, which makes the default max IU > length much bigger. Is this justified? What workload creates such > huge SG lists? With semi-realistic benchmarks (xdd, orion) I'm seeing better numbers with this default value. I think that we can reduce the default value for srp_sg_tablesize and leave it up to users to bump it up by overriding it when loading the module. > > For the cmd_per_lun change, shouldn't the line > > >>+ target->scsi_host->cmd_per_lun = token; > > > be something like > > target->scsi_host->cmd_per_lun = min(token, SRP_SQ_SIZE); > > otherwise it's too easy to overflow a send queue by mistake. > Yes.
I'll fix it Vu From rdreier at cisco.com Wed May 10 08:53:47 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 08:53:47 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <7.0.1.0.2.20060510063730.04336f80@netapp.com> (Thomas Talpey's message of "Wed, 10 May 2006 06:49:28 -0400") References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> <7.0.1.0.2.20060510063730.04336f80@netapp.com> Message-ID: Thomas> I am planning to test this some more in the next few Thomas> weeks, but what I'd really like to see is an IBTA Thomas> 1.2-compliant implementation, and one that operated on Thomas> work queue entries (not synchronous verbs). Is that being Thomas> worked on? No current hardware supports that as far as I know. (Well, ipath could fake it since they already implement all the verbs in software) - R. From rdreier at cisco.com Wed May 10 08:55:22 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 08:55:22 -0700 Subject: [openib-general][RFC][PATCH] core/sysfs.c: ability to reset port counters In-Reply-To: <10e223bf0605100132h40d88edfv9b756eceb519047c@mail.gmail.com> (Leonid Arsh's message of "Wed, 10 May 2006 11:32:34 +0300") References: <20060509120613.GA3294@voltaire.com> <10e223bf0605100132h40d88edfv9b756eceb519047c@mail.gmail.com> Message-ID: Leonid> A user space application is an option too, although I Leonid> think it's nice to have a 'built in' kernel feature. As Hal pointed out, there already is an app to do this. So I don't see much need to put it into the kernel. - R. 
From halr at voltaire.com Wed May 10 09:21:47 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 May 2006 12:21:47 -0400 Subject: [openib-general] [PATCH] OpenSM/osmtest: Add rudimentary SA GuidInfoRecord test In-Reply-To: <1147265116.4485.42334.camel@hal.voltaire.com> References: <1147265116.4485.42334.camel@hal.voltaire.com> Message-ID: <1147278106.4485.46636.camel@hal.voltaire.com> On Wed, 2006-05-10 at 08:45, Hal Rosenstock wrote: > OpenSM/osmtest: Add rudimentary SA GuidInfoRecord test > > Signed-off-by: Hal Rosenstock Applied to both trunk and 1.0 branch. -- Hal From mshefty at ichips.intel.com Wed May 10 09:26:49 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 10 May 2006 09:26:49 -0700 Subject: [openib-general] rdma_cm.h: comment nits. In-Reply-To: <1147231080.5093.13.camel@trinity.ogc.int> References: <20060508135855.GE21036@mellanox.co.il> <1147231080.5093.13.camel@trinity.ogc.int> Message-ID: <44621449.9080507@ichips.intel.com> Tom Tucker wrote: >>Its OK to call rdma_reject on active side as well, isn't it? > > You'll get -EINVAL on iWARP if you do this.... For IB, rdma_reject can be called on the active side if the user is managing their own QP states, or is SDP. How does iWarp support userspace QPs? - Sean From mshefty at ichips.intel.com Wed May 10 09:28:18 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 10 May 2006 09:28:18 -0700 Subject: [openib-general] Re: [PATCH] RE: compliancy issue? In-Reply-To: <20060510080532.GO21036@mellanox.co.il> References: <20060508085301.GD20207@mellanox.co.il> <20060510052937.GH22825@mellanox.co.il> <20060510054313.GJ22825@mellanox.co.il> <44619AC9.2040402@voltaire.com> <20060510080532.GO21036@mellanox.co.il> Message-ID: <446214A2.30002@ichips.intel.com> Michael S. 
Tsirkin wrote: >>No, looking in the code shows that qp will be changed to rtr and then >>rts ***before*** sending the RTU since you will call rdma_accept which >>in turn will call cma_rep_recv > > > Right, missed that, thanks! > I was wondering why it was not behaving the way I expected it to :) If this is working for you, I will commit the change. Just let me know. (It looked like Or answered your questions.) - Sean From halr at voltaire.com Wed May 10 09:25:46 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 10 May 2006 12:25:46 -0400 Subject: [openib-general] IBDM Changes Coordination Message-ID: <1147278345.4485.46718.camel@hal.voltaire.com> Hi Eitan, Yesterday, you checked in some changes to ibdm for OFED on the 1.0 branch. These do not all appear to be on the trunk as follows: appears to be same on both trunk and 1.0/ofed: U ofed/ibutils/ibdm/datamodel/Fabric.h appear to need merging to trunk U ofed/ibutils/ibdm/src/osm_check.cpp U ofed/ibutils/ibdm/datamodel/SubnMgt.cpp U ofed/ibutils/ibdm/datamodel/LinkCover.cpp Should these be merged to the trunk ? I thought the OFED policy was trunk first and then 1.0 branch...
-- Hal From robert.j.woodruff at intel.com Wed May 10 09:31:29 2006 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Wed, 10 May 2006 09:31:29 -0700 Subject: [openib-general] [TRIVIAL][PATCH] Add rds to Kconfig and Makefile Message-ID: <000001c6744f$3175f410$010fa8c0@amr.corp.intel.com> Not sure who maintains these, but the following patch adds RDS to the Kconfig and Makefile in drivers/infiniband woody diff -Naurp linux-2.6.9/drivers/infiniband/Kconfig linux-2.6.9-fixups/drivers/infiniband/Kconfig --- linux-2.6.9/drivers/infiniband/Kconfig 2006-05-10 08:41:42.000000000 -0700 +++ linux-2.6.9-fixups/drivers/infiniband/Kconfig 2006-05-10 08:42:28.000000000 -0700 @@ -47,4 +47,6 @@ source "drivers/infiniband/ulp/srp/Kconf source "drivers/infiniband/ulp/iser/Kconfig" +source "drivers/infiniband/ulp/rds/Kconfig" + endmenu diff -Naurp linux-2.6.9/drivers/infiniband/Makefile linux-2.6.9-fixups/drivers/infiniband/Makefile --- linux-2.6.9/drivers/infiniband/Makefile 2006-05-10 08:41:42.000000000 -0700 +++ linux-2.6.9-fixups/drivers/infiniband/Makefile 2006-05-10 08:42:50.000000000 -0700 @@ -7,3 +7,4 @@ obj-$(CONFIG_INFINIBAND_SDP) += ulp/sdp obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/ obj-$(CONFIG_INFINIBAND_EHCA) += hw/ehca/ +obj-$(CONFIG_INFINIBAND_RDS) += ulp/rds/ From mst at mellanox.co.il Wed May 10 09:35:23 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 10 May 2006 19:35:23 +0300 Subject: [openib-general] rdma_cm.h: comment nits. In-Reply-To: <44621449.9080507@ichips.intel.com> References: <20060508135855.GE21036@mellanox.co.il> <1147231080.5093.13.camel@trinity.ogc.int> <44621449.9080507@ichips.intel.com> Message-ID: <20060510163523.GK22825@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] rdma_cm.h: comment nits. > > Tom Tucker wrote: > >>Its OK to call rdma_reject on active side as well, isn't it? > > > >You'll get -EINVAL on iWARP if you do this.... 
> > For IB, rdma_reject can be called on the active side if the user is > managing their own QP states, or is SDP. How does iWarp support userspace > QPs? BTW, Sean, could you please explain why is RESPONSE event IB-specific? Does not it match Syn/Ack in the TCP 3-way handshake? What I am trying to say, why are you returning ESTABLISHED on the active side at all? Maybe we should always pass RESPONSE on active side and only pass ESTABLISHED on passive side. TCP certainly seems to make a distinction between these. -- MST From mshefty at ichips.intel.com Wed May 10 09:34:35 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 10 May 2006 09:34:35 -0700 Subject: [openib-general] Re: [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: <200605101038.31541.jackm@mellanox.co.il> References: <200605100956.30583.jackm@mellanox.co.il> <20060510071719.GN21036@mellanox.co.il> <200605101038.31541.jackm@mellanox.co.il> Message-ID: <4462161B.1060009@ichips.intel.com> Jack Morgenstein wrote: > Userspace rdma_get_option() will then also get -ENODATA. OK. > > We can, therefore, do the following: > the dummy procedures in the dummy ib_local_sa.h file will > return -ENODATA for all get operations and for ib_create_path_cursor(), > and -ENOSYS for all set operations. > Then, no changes will be needed (except for adding the dummy file > ib_local_sa.h). > > Is this acceptable? You could also just include the local SA cache. If you want it enabled by default, you can change cache_timeout or paths_per_dest to 0 where they are declared. - Sean From mst at mellanox.co.il Wed May 10 09:39:45 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 10 May 2006 19:39:45 +0300 Subject: [openib-general] Re: [PATCH] RE: compliancy issue? 
In-Reply-To: <446214A2.30002@ichips.intel.com> References: <20060508085301.GD20207@mellanox.co.il> <20060510052937.GH22825@mellanox.co.il> <20060510054313.GJ22825@mellanox.co.il> <44619AC9.2040402@voltaire.com> <20060510080532.GO21036@mellanox.co.il> <446214A2.30002@ichips.intel.com> Message-ID: <20060510163945.GM22825@mellanox.co.il> Quoting r. Sean Hefty : > If this is working for you, I will commit the change. Just let me know. Thanks! It looks OK but I didn't test it yet. > (It looked like Or answered your questions.) Yes, he did. -- MST From jackm at mellanox.co.il Wed May 10 09:50:57 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Wed, 10 May 2006 19:50:57 +0300 Subject: [openib-general] Re: [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: <4462161B.1060009@ichips.intel.com> References: <200605101038.31541.jackm@mellanox.co.il> <4462161B.1060009@ichips.intel.com> Message-ID: <200605101950.57334.jackm@mellanox.co.il> On Wednesday 10 May 2006 19:34, Sean Hefty wrote: > > You could also just include the local SA cache. If you want it enabled by > default, you can change cache_timeout or paths_per_dest to 0 where they are > declared. > I assume you mean "disabled". Looks like setting cache_timeout to zero as the default is good enough: in sa_db_init(), cache_timeout remains 0 when translated to jiffies, resulting in paths_per_dest being set to zero as well. Therefore, in the patch, I can replace line: static unsigned long cache_timeout = 15 * 60 * 1000; /* 15 min */ with static unsigned long cache_timeout; Is this correct? 
- Jack From mshefty at ichips.intel.com Wed May 10 09:50:48 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 10 May 2006 09:50:48 -0700 Subject: [openib-general] there is a compilation warning in librdmacm In-Reply-To: <200605101522.42890.dotanb@mellanox.co.il> References: <200605101522.42890.dotanb@mellanox.co.il> Message-ID: <446219E8.4050606@ichips.intel.com> Dotan Barak wrote: > There is a compilation warning in the file: src/userspace/librdmacm/src/cma.c. Thanks - I committed a fix for this. It should have been a void. - Sean From mshefty at ichips.intel.com Wed May 10 09:54:59 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 10 May 2006 09:54:59 -0700 Subject: [openib-general] Re: [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: <200605101950.57334.jackm@mellanox.co.il> References: <200605101038.31541.jackm@mellanox.co.il> <4462161B.1060009@ichips.intel.com> <200605101950.57334.jackm@mellanox.co.il> Message-ID: <44621AE3.4030509@ichips.intel.com> Jack Morgenstein wrote: > I assume you mean "disabled". Uhm.. yes - that's what I meant. > Looks like setting cache_timeout to zero as the default is good enough: > in sa_db_init(), cache_timeout remains 0 when translated to jiffies, resulting > in paths_per_dest being set to zero as well. > > Therefore, in the patch, I can replace line: > static unsigned long cache_timeout = 15 * 60 * 1000; /* 15 min */ > with > static unsigned long cache_timeout; > > Is this correct? Correct. - Sean From mshefty at ichips.intel.com Wed May 10 09:55:32 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 10 May 2006 09:55:32 -0700 Subject: [openib-general] [PATCH] librdmacm abi version In-Reply-To: <20060510112303.GS21036@mellanox.co.il> References: <20060510112303.GS21036@mellanox.co.il> Message-ID: <44621B04.2020501@ichips.intel.com> Thanks - committed. 
- Sean From jackm at mellanox.co.il Wed May 10 10:05:58 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Wed, 10 May 2006 20:05:58 +0300 Subject: [openib-general] Re: [PATCH 3/3] librdmacm: add ability to get/set transport specific options In-Reply-To: <44621AE3.4030509@ichips.intel.com> References: <200605101950.57334.jackm@mellanox.co.il> <44621AE3.4030509@ichips.intel.com> Message-ID: <200605102005.58006.jackm@mellanox.co.il> On Wednesday 10 May 2006 19:54, Sean Hefty wrote: >> > Correct. > Good. I'll add local_sa to OFED tomorrow morning, and patch local_sa.c as indicated so that it is disabled by default. - Jack From sashak at voltaire.com Wed May 10 10:18:20 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 10 May 2006 20:18:20 +0300 Subject: [openib-general] [PATCH] opensm: complib: cleanup unused cl_obj files Message-ID: <20060510171820.31351.20465.stgit@sashak.voltaire.com> Cleanup unused cl_obj source and header files from complib. Signed-off-by: Sasha Khapyorsky --- osm/complib/Makefile.am | 3 osm/complib/cl_obj.c | 736 ----------------------------------- osm/complib/libosmcomp.map | 12 - osm/include/Makefile.am | 1 osm/include/complib/cl_obj.h | 871 ------------------------------------------ 5 files changed, 1 insertions(+), 1622 deletions(-) diff --git a/osm/complib/Makefile.am b/osm/complib/Makefile.am index 91aa35a..40e978f 100644 --- a/osm/complib/Makefile.am +++ b/osm/complib/Makefile.am @@ -21,7 +21,7 @@ endif libosmcomp_la_SOURCES = cl_async_proc.c cl_complib.c \ cl_dispatcher.c cl_event.c cl_event_wheel.c \ cl_list.c cl_log.c cl_map.c cl_memory.c \ - cl_memory_osd.c cl_obj.c cl_perf.c cl_pool.c \ + cl_memory_osd.c cl_perf.c cl_pool.c \ cl_ptr_vector.c cl_reqmgr.c \ cl_spinlock.c cl_statustext.c \ cl_thread.c cl_threadpool.c \ @@ -53,7 +53,6 @@ libosmcompinclude_HEADERS = $(srcdir)/.. 
$(srcdir)/../include/complib/cl_memory.h \ $(srcdir)/../include/complib/cl_memory_osd.h \ $(srcdir)/../include/complib/cl_memtrack.h \ - $(srcdir)/../include/complib/cl_obj.h \ $(srcdir)/../include/complib/cl_packoff.h \ $(srcdir)/../include/complib/cl_packon.h \ $(srcdir)/../include/complib/cl_passivelock.h \ diff --git a/osm/complib/cl_obj.c b/osm/complib/cl_obj.c deleted file mode 100644 index 5a3d790..0000000 --- a/osm/complib/cl_obj.c +++ /dev/null @@ -1,736 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include -#include - - -/* Number of relation objects to add to the global pool when growing. */ -#define CL_REL_POOL_SIZE ( 4096 / sizeof( cl_obj_rel_t ) ) - - - -/* The global object manager. */ -cl_obj_mgr_t *gp_obj_mgr = NULL; - - - -/******************************************************************** - * Global Object Manager - *******************************************************************/ - -cl_status_t -cl_obj_mgr_create( void ) -{ - cl_status_t status; - - /* See if the object manager has already been created. */ - if( gp_obj_mgr ) - return CL_SUCCESS; - - /* Allocate the object manager. */ - gp_obj_mgr = cl_zalloc( sizeof( cl_obj_mgr_t ) ); - if( !gp_obj_mgr ) - return CL_INSUFFICIENT_MEMORY; - - /* Construct the object manager. */ - cl_qlist_init( &gp_obj_mgr->obj_list ); - cl_spinlock_construct( &gp_obj_mgr->lock ); - cl_async_proc_construct( &gp_obj_mgr->async_proc_mgr ); - cl_qpool_construct( &gp_obj_mgr->rel_pool ); - - /* Initialize the spinlock. */ - status = cl_spinlock_init( &gp_obj_mgr->lock ); - if( status != CL_SUCCESS ) - { - cl_obj_mgr_destroy(); - return status; - } - - /* Initialize the asynchronous processing manager. */ - status = cl_async_proc_init( &gp_obj_mgr->async_proc_mgr, 1, "obj_mgr" ); - if( status != CL_SUCCESS ) - { - cl_obj_mgr_destroy(); - return status; - } - - /* Initialize the relationship pool. 
*/ - status = cl_qpool_init( &gp_obj_mgr->rel_pool, 0, 0, CL_REL_POOL_SIZE, - sizeof( cl_obj_rel_t ), NULL, NULL, gp_obj_mgr ); - if( status != CL_SUCCESS ) - { - cl_obj_mgr_destroy(); - return status; - } - - return CL_SUCCESS; -} - - - -void -cl_obj_mgr_destroy( void ) -{ - cl_list_item_t *p_list_item; - cl_obj_t *p_obj; - - /* See if the object manager had been created. */ - if( !gp_obj_mgr ) - return; - - /* Verify that all object's have been destroyed. */ - for( p_list_item = cl_qlist_head( &gp_obj_mgr->obj_list ); - p_list_item != cl_qlist_end( &gp_obj_mgr->obj_list ); - p_list_item = cl_qlist_next( p_list_item ) ) - { - p_obj = PARENT_STRUCT( p_list_item, cl_obj_t, pool_item ); -#if defined( _DEBUG_ ) - cl_dbg_out( "object not destroyed %p(%i), ref_cnt: %d\n", - p_obj, p_obj->type, p_obj->ref_cnt ); -#endif - } - - /* Destroy all object manager resources. */ - cl_spinlock_destroy( &gp_obj_mgr->lock ); - cl_async_proc_destroy( &gp_obj_mgr->async_proc_mgr ); - cl_qpool_destroy( &gp_obj_mgr->rel_pool ); - - /* Free the object manager and clear the global pointer. */ - cl_free( gp_obj_mgr ); - gp_obj_mgr = NULL; -} - - - -/* - * Get an item to track object relationships. - */ -cl_obj_rel_t* -cl_rel_alloc( void ) -{ - cl_obj_rel_t *p_rel; - - CL_ASSERT( gp_obj_mgr ); - - cl_spinlock_acquire( &gp_obj_mgr->lock ); - p_rel = (cl_obj_rel_t*)cl_qpool_get( &gp_obj_mgr->rel_pool ); - cl_spinlock_release( &gp_obj_mgr->lock ); - - return p_rel; -} - - - -/* - * Return an item used to track relationships back to the pool. - */ -void -cl_rel_free( - IN cl_obj_rel_t * const p_rel ) -{ - CL_ASSERT( gp_obj_mgr && p_rel ); - - cl_spinlock_acquire( &gp_obj_mgr->lock ); - cl_qpool_put( &gp_obj_mgr->rel_pool, &p_rel->pool_item ); - cl_spinlock_release( &gp_obj_mgr->lock ); -} - - - -/* - * Insert an object into the global object manager's list. 
- */ -static void -__track_obj( - IN cl_obj_t *p_obj ) -{ - CL_ASSERT( gp_obj_mgr && p_obj ); - - cl_spinlock_acquire( &gp_obj_mgr->lock ); - cl_qlist_insert_tail( &gp_obj_mgr->obj_list, - (cl_list_item_t*)&p_obj->pool_item ); - cl_spinlock_release( &gp_obj_mgr->lock ); -} - - - -/* - * Remove an object from the global object manager's list. - */ -static void -__remove_obj( - IN cl_obj_t *p_obj ) -{ - CL_ASSERT( gp_obj_mgr && p_obj ); - - cl_spinlock_acquire( &gp_obj_mgr->lock ); - cl_qlist_remove_item( &gp_obj_mgr->obj_list, - (cl_list_item_t*)&p_obj->pool_item ); - cl_spinlock_release( &gp_obj_mgr->lock ); -} - - - -/******************************************************************** - * Generic Object Class - *******************************************************************/ - -/* Function prototypes. */ -static void -__destroy_obj( - IN cl_obj_t *p_obj ); - -static void -__destroy_cb( - IN cl_async_proc_item_t *p_item ); - -/* Sets the state of an object and returns the old state. */ -static cl_state_t -__obj_set_state( - IN cl_obj_t * const p_obj, - IN const cl_state_t new_state ); - - - - -void -cl_obj_construct( - IN cl_obj_t * const p_obj, - IN const uint32_t obj_type ) -{ - CL_ASSERT( p_obj ); - cl_memclr( p_obj, sizeof( cl_obj_t ) ); - - cl_spinlock_construct( &p_obj->lock ); - p_obj->state = CL_UNINITIALIZED; - p_obj->type = obj_type; - cl_event_construct( &p_obj->event ); - - cl_qlist_init( &p_obj->parent_list ); - cl_qlist_init( &p_obj->child_list ); - - /* Insert the object into the global tracking list. */ - __track_obj( p_obj ); -} - - - -cl_status_t -cl_obj_init( - IN cl_obj_t * const p_obj, - IN cl_destroy_type_t destroy_type, - IN const cl_pfn_obj_call_t pfn_destroying OPTIONAL, - IN const cl_pfn_obj_call_t pfn_cleanup OPTIONAL, - IN const cl_pfn_obj_call_t pfn_free ) -{ - cl_status_t status; - - CL_ASSERT( p_obj && pfn_free ); - CL_ASSERT( p_obj->state == CL_UNINITIALIZED ); - - /* The object references itself until it is destroyed. 
*/ - p_obj->ref_cnt = 1; - - /* Record destruction callbacks. */ - p_obj->pfn_destroying = pfn_destroying; - p_obj->pfn_cleanup = pfn_cleanup; - p_obj->pfn_free = pfn_free; - - /* Set the destroy function pointer based on the destruction type. */ - p_obj->destroy_type = destroy_type; - p_obj->async_item.pfn_callback = __destroy_cb; - - /* Initialize the spinlock. */ - status = cl_spinlock_init( &p_obj->lock ); - if( status != CL_SUCCESS ) - return status; - - /* Initialize the synchronous cleanup event. */ - status = cl_event_init( &p_obj->event, FALSE ); - if( status != CL_SUCCESS ) - return status; - - p_obj->state = CL_INITIALIZED; - - return CL_SUCCESS; -} - - - -void -cl_obj_destroy( - IN cl_obj_t * p_obj ) -{ - cl_state_t old_state; - - CL_ASSERT( p_obj ); - - /* Mark that we're destroying the object. */ - old_state = __obj_set_state( p_obj, CL_DESTROYING ); - - /* The user cannot destroy an object and its parent at the same time. */ - CL_ASSERT( old_state != CL_DESTROYING ); - - /* Destroy the object. */ - __destroy_obj( p_obj ); -} - - - -void -cl_obj_reset( - IN cl_obj_t * const p_obj ) -{ - CL_ASSERT( p_obj ); - CL_ASSERT( p_obj->ref_cnt == 0 ); - CL_ASSERT( p_obj->state == CL_DESTROYING ); - - p_obj->ref_cnt = 1; - p_obj->state = CL_INITIALIZED; - - cl_qlist_remove_all( &p_obj->parent_list ); - cl_qlist_remove_all( &p_obj->child_list ); -} - - - -static cl_state_t -__obj_set_state( - IN cl_obj_t * const p_obj, - IN const cl_state_t new_state ) -{ - cl_state_t old_state; - - cl_spinlock_acquire( &p_obj->lock ); - old_state = p_obj->state; - p_obj->state = new_state; - cl_spinlock_release( &p_obj->lock ); - - return old_state; -} - - - -/* - * Add a dependent relationship between two objects. - */ -void -cl_obj_insert_rel( - IN cl_obj_rel_t * const p_rel, - IN cl_obj_t * const p_parent_obj, - IN cl_obj_t * const p_child_obj ) -{ - CL_ASSERT( p_rel && p_parent_obj && p_child_obj ); - - /* The child object needs to maintain a reference on the parent. 
*/ - cl_obj_ref( p_parent_obj ); - cl_obj_ref( p_child_obj ); - - /* Save the relationship details. */ - p_rel->p_child_obj = p_child_obj; - p_rel->p_parent_obj = p_parent_obj; - - /* - * Track the object - hold both locks to ensure that the relationship is - * viewable in the child and parent lists at the same time. - */ - cl_spinlock_acquire( &p_child_obj->lock ); - cl_spinlock_acquire( &p_parent_obj->lock ); - - cl_qlist_insert_tail( &p_child_obj->parent_list, &p_rel->list_item ); - cl_qlist_insert_tail( &p_parent_obj->child_list, - (cl_list_item_t*)&p_rel->pool_item ); - - cl_spinlock_release( &p_parent_obj->lock ); - cl_spinlock_release( &p_child_obj->lock ); -} - - - -/* - * Remove an existing relationship. - */ -void -cl_obj_remove_rel( - IN cl_obj_rel_t * const p_rel ) -{ - cl_obj_t *p_child_obj; - cl_obj_t *p_parent_obj; - - CL_ASSERT( p_rel ); - CL_ASSERT( p_rel->p_child_obj && p_rel->p_parent_obj ); - - p_child_obj = p_rel->p_child_obj; - p_parent_obj = p_rel->p_parent_obj; - - /* - * Release the objects - hold both locks to ensure that the relationship is - * removed from the child and parent lists at the same time. - */ - cl_spinlock_acquire( &p_child_obj->lock ); - cl_spinlock_acquire( &p_parent_obj->lock ); - - cl_qlist_remove_item( &p_child_obj->parent_list, &p_rel->list_item ); - cl_qlist_remove_item( &p_parent_obj->child_list, - (cl_list_item_t*)&p_rel->pool_item ); - - cl_spinlock_release( &p_parent_obj->lock ); - cl_spinlock_release( &p_child_obj->lock ); - - /* Dereference the objects. */ - cl_obj_deref( p_parent_obj ); - cl_obj_deref( p_child_obj ); - - p_rel->p_child_obj = NULL; - p_rel->p_parent_obj = NULL; -} - - - -/* - * Increment a reference count on an object. - */ -int32_t -cl_obj_ref( - IN cl_obj_t * const p_obj ) -{ - CL_ASSERT( p_obj ); - - /* - * We need to allow referencing the object during destruction in order - * to properly synchronize destruction between parent and child objects. 
- */ - CL_ASSERT( p_obj->state == CL_INITIALIZED || - p_obj->state == CL_DESTROYING ); - - return cl_atomic_inc( &p_obj->ref_cnt ); -} - - - -/* - * Decrement the reference count on an AL object. Destroy the object if - * it is no longer referenced. This object should not be an object's parent. - */ -int32_t -cl_obj_deref( - IN cl_obj_t * const p_obj ) -{ - int32_t ref_cnt; - - CL_ASSERT( p_obj ); - CL_ASSERT( p_obj->state == CL_INITIALIZED || - p_obj->state == CL_DESTROYING ); - - cl_spinlock_acquire( &p_obj->lock ); - ref_cnt = cl_atomic_dec( &p_obj->ref_cnt ); - cl_spinlock_release( &p_obj->lock ); - - /* If the reference count went to 0, the object should be destroyed. */ - if( ref_cnt == 0 ) - { - if( p_obj->destroy_type == CL_DESTROY_ASYNC ) - { - /* Queue the object for asynchronous destruction. */ - CL_ASSERT( gp_obj_mgr ); - cl_async_proc_queue( &gp_obj_mgr->async_proc_mgr, - &p_obj->async_item ); - } - else - { - /* Signal an event for synchronous destruction. */ - cl_event_signal( &p_obj->event ); - } - } - - return ref_cnt; -} - - - -/* - * Called to cleanup all resources allocated by an object. - */ -void -cl_obj_free( - IN cl_obj_t * const p_obj ) -{ - CL_ASSERT( p_obj ); - CL_ASSERT( p_obj->state == CL_UNINITIALIZED || - p_obj->state == CL_DESTROYING ); -#if defined( _DEBUG_ ) - { - cl_list_item_t *p_list_item; - cl_obj_rel_t *p_rel; - - /* - * Check that we didn't leave any list items in the parent list - * that came from the global pool. Ignore list items allocated by - * the user to simplify their usage model. 
- */ - for( p_list_item = cl_qlist_head( &p_obj->parent_list ); - p_list_item != cl_qlist_end( &p_obj->parent_list ); - p_list_item = cl_qlist_next( p_list_item ) ) - { - p_rel = (cl_obj_rel_t*)PARENT_STRUCT( p_list_item, - cl_obj_rel_t, list_item ); - CL_ASSERT( p_rel->pool_item.p_pool != - &gp_obj_mgr->rel_pool.qcpool ); - } - } -#endif - CL_ASSERT( cl_is_qlist_empty( &p_obj->child_list ) ); - - /* Remove the object from the global tracking list. */ - __remove_obj( p_obj ); - - cl_event_destroy( &p_obj->event ); - cl_spinlock_destroy( &p_obj->lock ); - - /* Mark the object as destroyed for debugging purposes. */ - p_obj->state = CL_DESTROYED; -} - - - -/* - * Remove the given object from its relationships with all its parents. - * This call requires synchronization to the given object. - */ -static void -__remove_parent_rel( - IN cl_obj_t * const p_obj ) -{ - cl_list_item_t *p_list_item; - cl_obj_rel_t *p_rel; - - /* - * Hold the object's lock to prevent a deadlock condition destroying a - * parent and a child object at the same time. The thread destroying - * the parent may be running at a higher priority. We need to let the - * child detach from its parents to prevent the parent from waiting - * forever on the child. - */ - cl_spinlock_acquire( &p_obj->lock ); - - /* Remove this child object from all its parents. */ - for( p_list_item = cl_qlist_tail( &p_obj->parent_list ); - p_list_item != cl_qlist_end( &p_obj->parent_list ); - p_list_item = cl_qlist_prev( p_list_item ) ) - { - p_rel = (cl_obj_rel_t*)PARENT_STRUCT( p_list_item, - cl_obj_rel_t, list_item ); - - /* - * Remove the child from the parent's list, but do not dereference - * the parent. This lets the user access the parent in the callback - * routines, but allows destruction to proceed. - */ - cl_spinlock_acquire( &p_rel->p_parent_obj->lock ); - cl_qlist_remove_item( &p_rel->p_parent_obj->child_list, - (cl_list_item_t*)&p_rel->pool_item ); - - /* - * Remove the relationship's reference to the child. 
Use an atomic - * decrement rather than cl_obj_deref, since we're already holding the - * child object's lock. - */ - cl_atomic_dec( &p_obj->ref_cnt ); - CL_ASSERT( p_obj->ref_cnt > 0 ); - - cl_spinlock_release( &p_rel->p_parent_obj->lock ); - - /* - * Mark that the child is no longer related to the parent. We still - * hold a reference on the parent object, so we don't clear the parent - * pointer until that reference is released. - */ - p_rel->p_child_obj = NULL; - } - cl_spinlock_release( &p_obj->lock ); -} - - - -static void -__destroy_child_obj( - IN cl_obj_t * p_obj ) -{ - cl_list_item_t *p_list_item; - cl_obj_rel_t *p_rel; - cl_obj_t *p_child_obj; - cl_state_t old_state; - - /* Destroy all child objects. */ - cl_spinlock_acquire( &p_obj->lock ); - for( p_list_item = cl_qlist_tail( &p_obj->child_list ); - p_list_item != cl_qlist_end( &p_obj->child_list ); - p_list_item = cl_qlist_tail( &p_obj->child_list ) ) - { - p_rel = (cl_obj_rel_t*)PARENT_STRUCT( p_list_item, - cl_obj_rel_t, pool_item ); - - /* - * Take a reference on the child to protect against another parent - * of the object destroying it while we are trying to access it. - * If the child object is being destroyed, it will try to remove - * this relationship from this parent. - */ - p_child_obj = p_rel->p_child_obj; - cl_obj_ref( p_child_obj ); - - /* - * We cannot hold the parent lock when acquiring the child's lock, or - * a deadlock can occur if the child is in the process of destroying - * itself and its parent relationships. - */ - cl_spinlock_release( &p_obj->lock ); - - /* - * Mark that we wish to destroy the object. If the old state indicates - * that we should destroy the object, continue with the destruction. - * Note that there is a reference held on the child object from its - * creation. We no longer need the prior reference taken above. 
- */ - old_state = __obj_set_state( p_child_obj, CL_DESTROYING ); - cl_obj_deref( p_child_obj ); - - if( old_state != CL_DESTROYING ) - __destroy_obj( p_child_obj ); - - /* Continue processing the relationship list. */ - cl_spinlock_acquire( &p_obj->lock ); - } - cl_spinlock_release( &p_obj->lock ); -} - - - -/* - * Destroys an object. This call returns TRUE if the destruction process - * should proceed, or FALSE if destruction is already in progress. - */ -static void -__destroy_obj( - IN cl_obj_t *p_obj ) -{ - CL_ASSERT( p_obj ); - CL_ASSERT( p_obj->state == CL_DESTROYING ); - - /* Remove this child object from all its parents. */ - __remove_parent_rel( p_obj ); - - /* Notify the user that the object is being destroyed. */ - if( p_obj->pfn_destroying ) - p_obj->pfn_destroying( p_obj ); - - /* Destroy all child objects. */ - __destroy_child_obj( p_obj ); - - /* Dereference this object as it is being destroyed. */ - cl_obj_deref( p_obj ); - - if( p_obj->destroy_type == CL_DESTROY_SYNC ) - { - /* Wait for all other references to go away. */ - cl_event_wait_on( &p_obj->event, 10000000, FALSE ); - __destroy_cb( &p_obj->async_item ); - } -} - - - -/* - * Dereference all parents the object was related to. - */ -static void -__deref_parents( - IN cl_obj_t * const p_obj ) -{ - cl_list_item_t *p_list_item; - cl_obj_rel_t *p_rel; - - /* Destruction of the object is already serialized - no need to lock. */ - - /* - * Dereference all parents. Keep the relationship items in the child's - * list, so that they can be returned to the user through the free callback. 
- */ - for( p_list_item = cl_qlist_head( &p_obj->parent_list ); - p_list_item != cl_qlist_end( &p_obj->parent_list ); - p_list_item = cl_qlist_next( p_list_item ) ) - { - p_rel = (cl_obj_rel_t*)PARENT_STRUCT( p_list_item, - cl_obj_rel_t, list_item ); - - CL_ASSERT( !p_rel->p_child_obj ); - cl_obj_deref( p_rel->p_parent_obj ); - p_rel->p_parent_obj = NULL; - } -} - - - -static void -__destroy_cb( - IN cl_async_proc_item_t *p_item ) -{ - cl_obj_t *p_obj; - - CL_ASSERT( p_item ); - - p_obj = PARENT_STRUCT( p_item, cl_obj_t, async_item ); - CL_ASSERT( !p_obj->ref_cnt ); - CL_ASSERT( p_obj->state == CL_DESTROYING ); - - /* Cleanup any hardware related resources. */ - if( p_obj->pfn_cleanup ) - p_obj->pfn_cleanup( p_obj ); - - /* We can now safely dereference all parents. */ - __deref_parents( p_obj ); - - /* Free the resources associated with the object. */ - CL_ASSERT( p_obj->pfn_free ); - p_obj->pfn_free( p_obj ); -} diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map index e61ee57..42705cb 100644 --- a/osm/complib/libosmcomp.map +++ b/osm/complib/libosmcomp.map @@ -94,20 +94,8 @@ OSMCOMP_1.0 { cl_memset; cl_memcpy; cl_memcmp; - cl_obj_mgr_create; - cl_obj_mgr_destroy; cl_rel_alloc; cl_rel_free; - cl_obj_construct; - cl_obj_init; - cl_obj_destroy; - cl_obj_reset; - __obj_set_state; - cl_obj_insert_rel; - cl_obj_remove_rel; - cl_obj_ref; - cl_obj_deref; - cl_obj_free; __cl_perf_run_calibration; __cl_perf_construct; __cl_perf_init; diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am index a3f7c17..72b64c8 100644 --- a/osm/include/Makefile.am +++ b/osm/include/Makefile.am @@ -164,7 +164,6 @@ EXTRA_DIST = \ $(srcdir)/complib/cl_types.h \ $(srcdir)/complib/cl_fleximap.h \ $(srcdir)/complib/cl_qcomppool.h \ - $(srcdir)/complib/cl_obj.h \ $(srcdir)/iba/ib_types.h \ $(srcdir)/vendor/osm_vendor_mlx_transport_anafa.h \ $(srcdir)/vendor/osm_vendor_mlx.h \ diff --git a/osm/include/complib/cl_obj.h b/osm/include/complib/cl_obj.h deleted file mode 100644 
index 25b4357..0000000 --- a/osm/include/complib/cl_obj.h +++ /dev/null @@ -1,871 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - -/* - * Abstract: - * Declaration of basic objects and relationships. 
- * - * Environment: - * All - * - * $Revision: 1.6 $ - */ - - -#if !defined(__CL_OBJ_H__) -#define __CL_OBJ_H__ - -#include -#include -#include -#include -#include -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/****h* Component Library/Object -* NAME -* Object -* -* DESCRIPTION -* Object describes a basic class that can be used to track accesses to an -* object and provides automatic cleanup of an object that is dependent -* on another object. -* -* Dependencies between objects are described using a relationship. A -* child object is considered dependent on a parent object. Destruction of -* a parent object automatically results in the destruction of any child -* objects associated with the parent. -* -* The relationship between parent and child objects is many to many. -* Parents can have multiple child objects, and a child can be dependent on -* multiple parent objects. In the latter case, destruction of any parent -* object results in the destruction of the child object. -* -* Other relationships between objects are described using references. An -* object that takes a reference on a second object prevents the second object -* from being deallocated as long as the reference is held. -* -* SEE ALSO -* Types -* cl_destroy_type_t -* -* Structures: -* cl_obj_t, cl_obj_rel_t -* -* Callbacks: -* cl_pfn_obj_call_t -* -* Initialization/Destruction: -* cl_obj_mgr_create, cl_obj_mgr_destroy, -* cl_obj_construct, cl_obj_init, cl_obj_destroy, cl_obj_free -* -* Object Relationships: -* cl_obj_ref, cl_obj_deref, -* cl_rel_alloc, cl_rel_free, cl_obj_insert_rel, cl_obj_remove_rel -* -* Object Manipulation: -* cl_obj_reset -*********/ - - - -/* Forward declaration. 
*/ -typedef struct _cl_obj *__p_cl_obj_t; - - - -/****s* Component Library: Object/cl_obj_mgr_t -* NAME -* cl_obj_mgr_t -* -* DESCRIPTION -* The global object manager. -* -* The manager must be created before constructing any other objects, and all -* objects must be destroyed before the object manager is destroyed. -* -* The manager is used to maintain the list of all objects currently active -* in the system. It provides a pool of relationship items used to -* describe parent-child, or dependent, relationships between two objects. -* The manager contains an asynchronous processing thread that is used to -* support asynchronous object destruction. -* -* SYNOPSIS -*/ -typedef struct _cl_obj_mgr -{ - cl_qlist_t obj_list; - cl_spinlock_t lock; - - cl_async_proc_t async_proc_mgr; - - cl_qpool_t rel_pool; - -} cl_obj_mgr_t; -/* -* FIELDS -* obj_list -* List of all object's in the system. Object's are inserted into this -* list when constructed and removed when freed. -* -* lock -* A lock used by the object manager for synchronization to the obj_list. -* -* async_proc_mgr -* An asynchronous processing manager used to process asynchronous -* destruction requests. Users wishing to synchronize the execution of -* specific routines with object destruction may queue work requests to -* this processing manager. -* -* rel_pool -* Pool of items used to describe dependent relationships. Users may -* obtain relationship objects from this pool when forming relationships, -* but are not required to do so. -* -* SEE ALSO -* Object, cl_obj_mgr_create, cl_obj_mgr_destroy, -* cl_obj_construct, cl_obj_free, -* cl_qlist_t, cl_spinlock_t, cl_async_proc_t, cl_qpool_t -*********/ - - - -/****f* Component Library: Object/cl_obj_mgr_create -* NAME -* cl_obj_mgr_create -* -* DESCRIPTION -* This routine creates an object manager used to track all objects by -* the user. The object manager assists with debugging efforts by identifying -* objects that are not destroyed properly. 
-* -* SYNOPSIS -*/ -cl_status_t -cl_obj_mgr_create(void); -/* -* PARAMETERS -* None. -* -* RETURN VALUE -* CL_SUCCESS -* The object manager was succesfully created. -* -* CL_INSUFFICIENT_MEMORY -* The object manager could not be allocated. -* -* NOTES -* This call must succeed before invoking any other object-related function. -* -* SEE ALSO -* Object, cl_obj_mgr_destroy -*********/ - - - -/****f* Component Library: Object/cl_obj_mgr_destroy -* NAME -* cl_obj_mgr_destroy -* -* DESCRIPTION -* This routine destroys the object manager created through cl_obj_mgr_create. -* -* SYNOPSIS -*/ -void -cl_obj_mgr_destroy(void); -/* -* PARAMETERS -* None. -* -* RETURN VALUE -* None. -* -* NOTES -* When the object manager is destroyed, it will display information about all -* objects that have not yet been destroyed. -* -* SEE ALSO -* Object, cl_obj_mgr_create -*********/ - - - -/****d* Component Library: Object/cl_pfn_obj_call_t -* NAME -* cl_pfn_obj_call_t -* -* DESCRIPTION -* The cl_pfn_obj_call_t function type defines the prototype for functions -* used to return objects to the user. -* -* SYNOPSIS -*/ -typedef void -(*cl_pfn_obj_call_t)( - IN struct _cl_obj *p_obj ); -/* -* PARAMETERS -* p_obj -* [in] Pointer to a cl_obj_t. This is the object being returned to -* the user. -* -* RETURN VALUES -* None. -* -* NOTES -* This function type is provided as a prototype for functions provided -* by users as parameters to the cl_obj_init function. -* -* SEE ALSO -* Object, cl_obj_init, cl_obj_t -*********/ - - - -/****d* Component Library: Object/cl_destroy_type_t -* NAME -* cl_destroy_type_t -* -* DESCRIPTION -* Indicates the type of destruction to perform on an object. -* -* SYNOPSIS -*/ -typedef enum _cl_destroy_type -{ - CL_DESTROY_ASYNC, - CL_DESTROY_SYNC - -} cl_destroy_type_t; -/* -* VALUES -* CL_DESTROY_ASYNC -* Indicates that the object should be destroyed asynchronously. 
Objects -* destroyed asynchronously complete initial destruction processing, then -* return the calling thread. Once their reference count goes to zero, -* they are queue onto an asynchronous thread to complete destruction -* processing. -* -* CL_DESTROY_SYNC -* Indicates that the object should be destroyed synchronously. Objects -* destroyed synchronously wait (block) until their reference count goes -* to zero. Once their reference count goes to zero, destruction -* processing is completed by the calling thread. -* -* SEE ALSO -* Object, cl_obj_init, cl_obj_destroy, cl_obj_free, cl_obj_t -*********/ - - - -/****s* Component Library: Object/cl_obj_t -* NAME -* cl_obj_t -* -* DESCRIPTION -* Object structure. -* -* SYNOPSIS -*/ -typedef struct _cl_obj -{ - cl_pool_item_t pool_item; /* Must be first. */ - uint32_t type; - cl_state_t state; - cl_destroy_type_t destroy_type; - - cl_async_proc_item_t async_item; - cl_event_t event; - - cl_pfn_obj_call_t pfn_destroying; - cl_pfn_obj_call_t pfn_cleanup; - cl_pfn_obj_call_t pfn_free; - - cl_spinlock_t lock; - - cl_qlist_t parent_list; - cl_qlist_t child_list; - - atomic32_t ref_cnt; - -} cl_obj_t; -/* -* FIELDS -* pool_item -* Used to track the object with the global object manager. We use -* a pool item, rather than a list item, to let users store the object -* in a pool. -* -* type -* Stores a user-specified object type. -* -* state -* Records the current state of the object, such as initialized, -* destroying, etc. -* -* destroy_type -* Specifies the type of destruction, synchronous or asynchronous, to -* perform on this object. -* -* async_item -* Asynchronous item used when destroying the object asynchronously. -* This item is queued to an asynchronous thread to complete destruction -* processing. -* -* event -* Event used when destroying the object synchronously. A call to destroy -* the object will wait on this event until the destruction has completed. 
-* -* pfn_destroying -* User-specified callback invoked to notify a user that an object has -* been marked for destruction. This callback is invoked directly from -* the thread destroying the object and is used to notify a user that -* a parent object has invoked a child object's destructor. -* -* pfn_cleanup -* User-specified callback invoked as an object is undergoing destruction. -* For object's destroyed asynchronously, this callback is invoked from -* the context of the asynchronous destruction thread. Users may block -* in the context of this thread; however, further destruction processing -* will not continue until this callback returns. -* -* pfn_free -* User-specified callback invoked to notify a user that an object has -* been destroyed and is ready for deallocation. Users should either -* call cl_obj_free or cl_obj_reset from within this callback. -* -* lock -* A lock provided by the object. -* -* parent_list -* A list of relationships to parent objects that an object is dependent -* on. -* -* child_list -* A list of all child objects that are dependent on this object. -* Destroying this object will result in all related objects maintained -* in the child list also being destroyed. -* -* ref_cnt -* A count of the number of objects still referencing this object. -* -* SEE ALSO -* Object, cl_obj_construct, cl_obj_init, cl_obj_destroy, -* cl_obj_free, cl_pfn_obj_call_t, cl_destroy_type_t, -* cl_pool_item_t, cl_state_t, cl_async_proc_item_t, -* cl_event_t, cl_spinlock_t, cl_qlist_t, atomic32_t -*********/ - - - -/****f* Component Library: Object/cl_obj_construct -* NAME -* cl_obj_construct -* -* DESCRIPTION -* This routine prepares an object for use. The object must be successfully -* initialized before being used. -* -* SYNOPSIS -*/ -void -cl_obj_construct( - IN cl_obj_t * const p_obj, - IN const uint32_t obj_type ); -/* -* PARAMETERS -* p_obj -* [in] A pointer to the object to construct. 
-* -* obj_type -* [in] A user-specified type associated with the object. This type -* is recorded by the object for debugging purposes and may be accessed -* by the user. -* -* RETURN VALUE -* None. -* -* NOTES -* This call must succeed before invoking any other function on an object. -* -* SEE ALSO -* Object, cl_obj_init, cl_obj_destroy, cl_obj_free. -*********/ - - -/****f* Component Library: Object/cl_obj_init -* NAME -* cl_obj_init -* -* DESCRIPTION -* This routine initializes an object for use. Upon the successful completion -* of this call, the object is ready for use. -* -* SYNOPSIS -*/ -cl_status_t -cl_obj_init( - IN cl_obj_t * const p_obj, - IN cl_destroy_type_t destroy_type, - IN const cl_pfn_obj_call_t pfn_destroying OPTIONAL, - IN const cl_pfn_obj_call_t pfn_cleanup OPTIONAL, - IN const cl_pfn_obj_call_t pfn_free ); -/* -* PARAMETERS -* p_obj -* [in] A pointer to the object to initialize. -* -* destroy_type -* [in] Specifies the destruction model used by this object. -* -* pfn_destroying -* [in] User-specified callback invoked to notify a user that an object has -* been marked for destruction. This callback is invoked directly from -* the thread destroying the object and is used to notify a user that -* a parent object has invoked a child object's destructor. -* -* pfn_cleanup -* [in] User-specified callback invoked to an object is undering -* destruction. For object's destroyed asynchronously, this callback -* is invoked from the context of the asynchronous destruction thread. -* Users may block in the context of this thread; however, further -* destruction processing will not continue until this callback returns. -* -* pfn_free -* [in] User-specified callback invoked to notify a user that an object has -* been destroyed and is ready for deallocation. Users should either -* call cl_obj_free or cl_obj_reset from within this callback. -* -* RETURN VALUE -* CL_SUCCESS -* The object was successfully initialized. 
-* -* CL_INSUFFICIENT_MEMORY -* The object could not allocate the necessary memory resources to -* complete initialization. -* -* NOTES -* The three destruction callbacks are used to notify the user of the progress -* of the destruction, permitting the user to perform an additional processing. -* Pfn_destroying is used to notify the user that the object is being -* destroyed. It is called after an object has removed itself from -* relationships with its parents, but before it destroys any child objects -* that it might have. -* -* Pfn_cleanup is invoked after all child objects have been destroyed, and -* there are no more references on the object itself. For objects destroyed -* asynchronously, pfn_cleanup is invoked from an asynchronous destruction -* thread. -* -* Pfn_free is called to notify the user that the destruction of the object has -* completed. All relationships have been removed, and all child objects have -* been destroyed. Relationship items (cl_obj_rel_t) that were used to -* identify parent objects are returned to the user through the p_parent_list -* field of the cl_obj_t structure. -* -* SEE ALSO -* Object, cl_obj_construct, cl_obj_destroy, cl_obj_free, -* cl_obj_t, cl_destroy_type_t, cl_pfn_obj_call_t, -*********/ - - -/****f* Component Library: Object/cl_obj_destroy -* NAME -* cl_obj_destroy -* -* DESCRIPTION -* This routine destroys the specified object. -* -* SYNOPSIS -*/ -void -cl_obj_destroy( - IN cl_obj_t * p_obj ); -/* -* PARAMETERS -* p_obj -* [in] A pointer to the object to destroy. -* -* RETURN VALUE -* None. -* -* NOTES -* This routine starts the destruction process for the specified object. For -* additional information regarding destruction callbacks, see the following -* fields in cl_obj_t and parameters in cl_obj_init: pfn_destroying, -* pfn_cleanup, and pfn_free. -* -* In most cases, after calling this routine, users should call cl_obj_free -* from within their pfn_free callback routine. 
-* -* SEE ALSO -* Object, cl_obj_construct, cl_obj_init, cl_obj_free, -* cl_obj_t, cl_destroy_type_t, cl_pfn_obj_call_t -*********/ - - - -/****f* Component Library: Object/cl_obj_free -* NAME -* cl_obj_free -* -* DESCRIPTION -* Release all resources allocated by an object. This routine should -* typically be called from a user's pfn_free routine. -* -* SYNOPSIS -*/ -void -cl_obj_free( - IN cl_obj_t * const p_obj ); -/* -* PARAMETERS -* p_obj -* [in] A pointer to the object to free. -* -* RETURN VALUE -* None. -* -* NOTES -* This call must be invoked to release the object from the global object -* manager. -* -* SEE ALSO -* Object, cl_obj_construct, cl_obj_init, cl_obj_destroy, cl_obj_t -*********/ - - - -/****f* Component Library: Object/cl_obj_reset -* NAME -* cl_obj_reset -* -* DESCRIPTION -* Reset an object's state. This is called after cl_obj_destroy has -* been called on a object, but before cl_obj_free has been invoked. -* After an object has been reset, it is ready for re-use. -* -* SYNOPSIS -*/ -void -cl_obj_reset( - IN cl_obj_t * const p_obj ); -/* -* PARAMETERS -* p_obj -* [in] A pointer to the object to reset. -* -* RETURN VALUE -* None. -* -* NOTES -* This routine allows an object to be initialized once, then destroyed -* and re-used multiple times. This permits the user to allocate and -* maintain a pool of objects. The objects may be reset and returned to -* the pool, rather than freed, after being destroyed. The objects would -* not be freed until the pool itself was destroyed. -* -* SEE ALSO -* Object, cl_obj_destroy, cl_obj_free, cl_obj_t -*********/ - - - -/****f* Component Library: Object/cl_obj_ref -* NAME -* cl_obj_ref -* -* DESCRIPTION -* Increments the reference count on an object and returns the updated count. -* This routine is thread safe, but does not result in locking the object. -* -* SYNOPSIS -*/ -int32_t -cl_obj_ref( - IN cl_obj_t * const p_obj ); -/* -* PARAMETERS -* p_obj -* [in] A pointer to the object to reference. 
-* -* RETURN VALUE -* The updated reference count. -* -* SEE ALSO -* Object, cl_obj_t, cl_obj_deref -*********/ - - - -/****f* Component Library: Object/cl_obj_deref -* NAME -* cl_obj_deref -* -* DESCRIPTION -* Decrements the reference count on an object and returns the updated count. -* This routine is thread safe, but results in locking the object. -* -* SYNOPSIS -*/ -int32_t -cl_obj_deref( - IN cl_obj_t * const p_obj ); -/* -* PARAMETERS -* p_obj -* [in] A pointer to the object to dereference. -* -* RETURN VALUE -* The updated reference count. -* -* SEE ALSO -* Object, cl_obj_t, cl_obj_ref -*********/ - - -/****s* Component Library: Object/cl_obj_rel_t -* NAME -* cl_obj_rel_t -* -* DESCRIPTION -* Identifies a dependent relationship between two objects. -* -* SYNOPSIS -*/ -typedef struct _cl_obj_rel -{ - cl_pool_item_t pool_item; /* Must be first. */ - struct _cl_obj *p_parent_obj; - - cl_list_item_t list_item; - struct _cl_obj *p_child_obj; - -} cl_obj_rel_t; -/* -* FIELDS -* pool_item -* An item used to store the relationship in a free pool maintained -* by the object manager. This field is also used by the parent object -* to store the relationship in its child_list. -* -* p_parent_obj -* A reference to the parent object for the relationship. -* -* list_item -* This field is used by the child object to store the relationship in -* its parent_list. -* -* p_child_obj -* A reference to the child object for the relationship. -* -* NOTES -* This structure is used to define all dependent relationships. Dependent -* relationships are those where the destruction of a parent object result in -* the destruction of child objects. For other types of relationships, simple -* references between objects may be used. -* -* Relationship items are stored in lists maintained by both the parent -* and child objects. References to both objects exist while the -* relationship is maintained. 
Typically, relationships are defined by -* the user by calling cl_obj_insert_rel, but are destroyed automatically -* via an object's destruction process. -* -* SEE ALSO -* Object, cl_rel_alloc, cl_rel_free, cl_obj_insert_rel, cl_obj_remove_rel, -* cl_obj_destroy -*********/ - - - -/****f* Component Library: Object/cl_rel_alloc -* NAME -* cl_rel_alloc -* -* DESCRIPTION -* Retrieves an object relationship item from the object manager. -* -* SYNOPSIS -*/ -cl_obj_rel_t* -cl_rel_alloc(void); -/* -* PARAMETERS -* None. -* -* RETURN VALUE -* A reference to an allocated relationship object, or NULL if no relationship -* object could be allocated. -* -* NOTES -* This routine retrieves a cl_obj_rel_t structure from a pool maintained -* by the object manager. The pool automatically grows as needed. -* -* Relationship items are used to describe a dependent relationship between -* a parent and child object. In cases where a child has a fixed number of -* relationships, the user may be able to allocate and manage the cl_obj_rel_t -* structures more efficiently than obtaining the structures through this call. -* -* SEE ALSO -* Object, cl_rel_free, cl_obj_insert_rel, cl_obj_remove_rel, cl_obj_destroy -*********/ - - - -/****f* Component Library: Object/cl_rel_free -* NAME -* cl_rel_free -* -* DESCRIPTION -* Return a relationship object to the global object manager. -* -* SYNOPSIS -*/ -void -cl_rel_free( - IN cl_obj_rel_t * const p_rel ); -/* -* PARAMETERS -* p_rel -* [in] A reference to the relationship item to free. -* -* RETURN VALUE -* None. -* -* NOTES -* Relationship items must not be freed until both the parent and child -* object have removed their references to one another. Relationship items -* may be freed after calling cl_obj_remove_rel or after the associated -* child object's free callback has been invoked. In the latter case, the -* invalid relationship items are referenced by the child object's parent_list. 
-* -* SEE ALSO -* Object, cl_rel_alloc, cl_obj_insert_rel, cl_obj_remove_rel, cl_obj_destroy -*********/ - - - -/****f* Component Library: Object/cl_obj_insert_rel -* NAME -* cl_obj_insert_rel -* -* DESCRIPTION -* Forms a relationship between two objects, with the existence of the child -* object dependent on the parent. -* -* SYNOPSIS -*/ -void -cl_obj_insert_rel( - IN cl_obj_rel_t * const p_rel, - IN cl_obj_t * const p_parent_obj, - IN cl_obj_t * const p_child_obj ); -/* -* PARAMETERS -* p_rel -* [in] A reference to an unused relationship item. -* -* p_parent_obj -* [in] A reference to the parent object. -* -* p_child_obj -* [in] A reference to the child object. -* -* RETURN VALUE -* None. -* -* NOTES -* This call inserts a relationship between the parent and child object. -* The relationship allows for the automatic destruction of the child object -* if the parent is destroyed. -* -* A given object can have multiple parent and child objects, but the -* relationships must form into an object tree. That is, there cannot be any -* cycles formed through the parent-child relationships. (For example, an -* object cannot be both the parent and a child of a second object.) -* -* SEE ALSO -* Object, cl_rel_alloc, cl_rel_free, cl_obj_remove_rel, cl_obj_destroy -*********/ - - - -/****f* Component Library: Object/cl_obj_remove_rel -* NAME -* cl_obj_remove_rel -* -* DESCRIPTION -* Manually removes a relationship between two objects. -* -* SYNOPSIS -*/ -void -cl_obj_remove_rel( - IN cl_obj_rel_t * const p_rel ); -/* -* PARAMETERS -* p_rel -* [in] A reference to the relationship to remove. -* -* RETURN VALUE -* None. -* -* NOTES -* This routine permits a user to manually remove a dependent relationship -* between two objects. When removing a relationship using this call, the -* user must ensure that objects referenced by the relationship are not -* destroyed, either directly or indirectly via a parent. 
-* -* SEE ALSO -* Object, cl_rel_alloc, cl_rel_free, cl_obj_insert_rel, cl_obj_destroy -*********/ - - -END_C_DECLS - -#endif /* __CL_OBJ_H__ */ From sashak at voltaire.com Wed May 10 10:19:25 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 10 May 2006 20:19:25 +0300 Subject: [openib-general] [PATCH] opensm: complib: cleanup unsused cl_reqmgr files Message-ID: <20060510171925.31364.89893.stgit@sashak.voltaire.com> Cleanup unused cl_reqmgr source and header files from complib. Signed-off-by: Sasha Khapyorsky --- osm/complib/Makefile.am | 3 osm/complib/cl_reqmgr.c | 297 ------------------------ osm/complib/libosmcomp.map | 5 osm/include/Makefile.am | 1 osm/include/complib/cl_reqmgr.h | 488 --------------------------------------- 5 files changed, 1 insertions(+), 793 deletions(-) diff --git a/osm/complib/Makefile.am b/osm/complib/Makefile.am index 40e978f..ecbd8e2 100644 --- a/osm/complib/Makefile.am +++ b/osm/complib/Makefile.am @@ -22,7 +22,7 @@ libosmcomp_la_SOURCES = cl_async_proc.c cl_dispatcher.c cl_event.c cl_event_wheel.c \ cl_list.c cl_log.c cl_map.c cl_memory.c \ cl_memory_osd.c cl_perf.c cl_pool.c \ - cl_ptr_vector.c cl_reqmgr.c \ + cl_ptr_vector.c \ cl_spinlock.c cl_statustext.c \ cl_thread.c cl_threadpool.c \ cl_timer.c cl_vector.c \ @@ -64,7 +64,6 @@ libosmcompinclude_HEADERS = $(srcdir)/.. $(srcdir)/../include/complib/cl_qlockpool.h \ $(srcdir)/../include/complib/cl_qmap.h \ $(srcdir)/../include/complib/cl_qpool.h \ - $(srcdir)/../include/complib/cl_reqmgr.h \ $(srcdir)/../include/complib/cl_spinlock.h \ $(srcdir)/../include/complib/cl_spinlock_osd.h \ $(srcdir)/../include/complib/cl_thread.h \ diff --git a/osm/complib/cl_reqmgr.c b/osm/complib/cl_reqmgr.c deleted file mode 100644 index e2492a4..0000000 --- a/osm/complib/cl_reqmgr.c +++ /dev/null @@ -1,297 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. 
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - - -/* - * Abstract: - * Implementation of asynchronous request manager. - * - * Environment: - * All - * - * $Revision: 1.3 $ - */ - - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include - - -/* minimum number of objects to allocate */ -#define REQ_MGR_START_SIZE 10 -/* minimum number of objects to grow */ -#define REQ_MGR_GROW_SIZE 10 - - -/****i* Component Library: Request Manager/cl_request_object_t -* NAME -* cl_request_object_t -* -* DESCRIPTION -* Request manager structure. 
-* -* The cl_request_object_t structure should be treated as opaque and should be -* manipulated only through the provided functions. -* -* SYNOPSIS -*/ -typedef struct _cl_request_object -{ - cl_pool_item_t pool_item; - size_t count; - boolean_t partial_ok; - cl_pfn_req_cb_t pfn_callback; - const void *context1; - const void *context2; - -} cl_request_object_t; -/* -* FIELDS -* pool_item -* Pool item to store request in a pool or list. -* -* count -* Number of items requested. -* -* partial_ok -* Is it okay to return some of the items. -* -* pfn_callback -* Notification routine when completed. -* -* context1 -* Callback context information. -* -* context2 -* Callback context information. -* -* SEE ALSO -* Overview -*********/ - - -void -cl_req_mgr_construct( - IN cl_req_mgr_t* const p_req_mgr ) -{ - CL_ASSERT( p_req_mgr ); - - /* Clear the structure. */ - cl_memclr( p_req_mgr, sizeof(cl_req_mgr_t) ); - - /* Initialize the state of the free request stack. */ - cl_qpool_construct( &p_req_mgr->request_pool ); -} - - -cl_status_t -cl_req_mgr_init( - IN cl_req_mgr_t* const p_req_mgr, - IN cl_pfn_reqmgr_get_count_t pfn_get_count, - IN const void* const get_context ) -{ - cl_status_t status; - - CL_ASSERT( p_req_mgr ); - CL_ASSERT( pfn_get_count ); - - cl_qlist_init( &p_req_mgr->request_queue ); - - status = cl_qpool_init( &p_req_mgr->request_pool, REQ_MGR_START_SIZE, 0, - REQ_MGR_GROW_SIZE, sizeof(cl_request_object_t), NULL, NULL, NULL ); - - if( status != CL_SUCCESS ) - return( status ); - - /* Store callback information for the count function. */ - p_req_mgr->pfn_get_count = pfn_get_count; - p_req_mgr->get_context = get_context; - - return( CL_SUCCESS ); -} - - -void -cl_req_mgr_destroy( - IN cl_req_mgr_t* const p_req_mgr ) -{ - CL_ASSERT( p_req_mgr ); - - /* Return all requests to the grow pool. 
*/ - if( cl_is_qpool_inited( &p_req_mgr->request_pool ) ) - { - cl_qpool_put_list( &p_req_mgr->request_pool, - &p_req_mgr->request_queue ); - } - - cl_qpool_destroy( &p_req_mgr->request_pool ); -} - - -cl_status_t -cl_req_mgr_get( - IN cl_req_mgr_t* const p_req_mgr, - IN OUT size_t* const p_count, - IN const cl_req_type_t req_type, - IN cl_pfn_req_cb_t pfn_callback, - IN const void* const context1, - IN const void* const context2 ) -{ - size_t available_count; - size_t count; - cl_request_object_t *p_request; - - CL_ASSERT( p_req_mgr ); - CL_ASSERT( cl_is_qpool_inited( &p_req_mgr->request_pool ) ); - CL_ASSERT( p_count ); - CL_ASSERT( *p_count ); - - /* Get the number of available objects in the grow pool. */ - available_count = - p_req_mgr->pfn_get_count( (void*)p_req_mgr->get_context ); - - /* - * Check to see if there is nothing on the queue, and there are - * enough items to satisfy the whole request. - */ - if( cl_is_qlist_empty( &p_req_mgr->request_queue ) && - *p_count <= available_count ) - { - return( CL_SUCCESS ); - } - - if( req_type == REQ_GET_SYNC ) - return( CL_INSUFFICIENT_RESOURCES ); - - /* We need a request object to place on the request queue. */ - p_request = (cl_request_object_t*) - cl_qpool_get( &p_req_mgr->request_pool ); - - if( !p_request ) - return( CL_INSUFFICIENT_MEMORY ); - - /* - * We can return the available number of objects but we still need - * to queue a request for the remainder. - */ - if( req_type == REQ_GET_PARTIAL_OK && - cl_is_qlist_empty( &p_req_mgr->request_queue ) ) - { - count = *p_count - available_count; - *p_count = available_count; - p_request->partial_ok = TRUE; - } - else - { - /* - * We cannot return any objects. We queue a request for - * all of them. - */ - count = *p_count; - *p_count = 0; - p_request->partial_ok = FALSE; - } - - /* Set the request fields and enqueue it. 
*/ - p_request->pfn_callback = pfn_callback; - p_request->context1 = context1; - p_request->context2 = context2; - p_request->count = count; - - cl_qlist_insert_tail( &p_req_mgr->request_queue, - &p_request->pool_item.list_item ); - - return( CL_PENDING ); -} - - -cl_status_t -cl_req_mgr_resume( - IN cl_req_mgr_t* const p_req_mgr, - OUT uint32_t* const p_count, - OUT cl_pfn_req_cb_t* const ppfn_callback, - OUT const void** const p_context1, - OUT const void** const p_context2 ) -{ - size_t available_count; - cl_request_object_t *p_queued_request; - - CL_ASSERT( p_req_mgr ); - CL_ASSERT( cl_is_qpool_inited( &p_req_mgr->request_pool ) ); - - /* If no requests are pending, there's nothing to return. */ - if( cl_is_qlist_empty( &p_req_mgr->request_queue ) ) - return( CL_NOT_DONE ); - - /* - * Get the item at the head of the request queue, - * but do not remove it yet. - */ - p_queued_request = (cl_request_object_t*) - cl_qlist_head( &p_req_mgr->request_queue ); - - *ppfn_callback = p_queued_request->pfn_callback; - *p_context1 = p_queued_request->context1; - *p_context2 = p_queued_request->context2; - - available_count = - p_req_mgr->pfn_get_count( (void*)p_req_mgr->get_context ); - - /* See if the request can be fulfilled. */ - if( p_queued_request->count > available_count ) - { - if( !p_queued_request->partial_ok ) - return( CL_INSUFFICIENT_RESOURCES ); - - p_queued_request->count -= available_count; - *p_count = (uint32_t)available_count; - return( CL_PENDING ); - } - *p_count = (uint32_t) p_queued_request->count; - - /* The entire request can be met. Remove it from the request queue. */ - cl_qlist_remove_head( &p_req_mgr->request_queue ); - - /* Return the internal request object to the free stack. 
*/ - cl_qpool_put( &p_req_mgr->request_pool, - &p_queued_request->pool_item ); - return( CL_SUCCESS ); -} diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map index 42705cb..72ae2a4 100644 --- a/osm/complib/libosmcomp.map +++ b/osm/complib/libosmcomp.map @@ -132,11 +132,6 @@ OSMCOMP_1.0 { cl_qlock_pool_init; cl_qlock_pool_get; cl_qlock_pool_put; - cl_req_mgr_construct; - cl_req_mgr_init; - cl_req_mgr_destroy; - cl_req_mgr_get; - cl_req_mgr_resume; cl_spinlock_construct; cl_spinlock_init; cl_spinlock_destroy; diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am index 72b64c8..c7054ad 100644 --- a/osm/include/Makefile.am +++ b/osm/include/Makefile.am @@ -136,7 +136,6 @@ EXTRA_DIST = \ $(srcdir)/complib/cl_math.h \ $(srcdir)/complib/cl_qpool.h \ $(srcdir)/complib/cl_qlist.h \ - $(srcdir)/complib/cl_reqmgr.h \ $(srcdir)/complib/cl_vector.h \ $(srcdir)/complib/cl_byteswap_osd.h \ $(srcdir)/complib/cl_qlockpool.h \ diff --git a/osm/include/complib/cl_reqmgr.h b/osm/include/complib/cl_reqmgr.h deleted file mode 100644 index a9cc61c..0000000 --- a/osm/include/complib/cl_reqmgr.h +++ /dev/null @@ -1,488 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. 
- * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - - -/* - * Abstract: - * Declaration of asynchronous request manager. The request manager does - * not return resources, only notifies the user when resources are available. - * - * Environment: - * All - * - * $Revision: 1.3 $ - */ - - -#ifndef _CL_REQ_MGR_H_ -#define _CL_REQ_MGR_H_ - - -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/****h* Component Library/Request Manager -* NAME -* Request Manager -* -* DESCRIPTION -* The Request Manager manages synchronous as well as asynchronous -* requests for objects. -* -* Request manager does not supply the objects, but merely returns whether -* objects are available to satisfy requests. This allows users to use -* various sources for objects. -* -* While the request manager manages synchronous and asynchronous requests -* for objects, it does not itself operate asynchronously. Instead, the -* cl_req_mgr_resume function returns information for resuming asynchronous -* requests. If a call to cl_req_mgr_resume returns CL_SUCCESS, additional -* requests may be able to resume. 
It is recommended that users flush -* pending requests by calling cl_req_mgr_resume while CL_SUCCESS is returned. -* -* The request manager functions operates on a cl_req_mgr_t structure which -* should be treated as opaque and should be manipulated only through the -* provided functions. -* -* SEE ALSO -* Types: -* cl_req_type_t -* -* Structures: -* cl_req_mgr_t -* -* Callbacks: -* cl_pfn_req_cb_t, cl_pfn_reqmgr_get_count_t -* -* Initialization/Destruction: -* cl_req_mgr_construct, cl_req_mgr_init, cl_req_mgr_destroy -* -* Manipulation: -* cl_req_mgr_get, cl_req_mgr_resume -* -* Attributes: -* cl_is_req_mgr_inited, cl_req_mgr_count -*********/ - - -/****d* Component Library: Request Manager/cl_pfn_req_cb_t -* NAME -* cl_pfn_req_cb_t -* -* DESCRIPTION -* The cl_pfn_req_cb_t function type defines the prototype for functions -* used to store a function pointer to a user defined function. -* -* SYNOPSIS -*/ -typedef void -(*cl_pfn_req_cb_t)( void ); -/* -* PARAMETERS -* This function does not take parameters. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* Function pointers specified by this parameter do not have to match the -* defined syntax, as these callbacks are never invoked directly by the -* request manager. When specifying a function with a different prototype, -* cast the function pointer to this type. -* -* SEE ALSO -* Request Manager, cl_req_mgr_get, cl_req_mgr_resume -*********/ - - -/****d* Component Library: Request Manager/cl_req_type_t -* NAME -* cl_req_type_t -* -* DESCRIPTION -* The cl_req_type_t enumerated type describes the type of request. -* -* SYNOPSIS -*/ -typedef enum _cl_req_type -{ - REQ_GET_SYNC, - REQ_GET_ASYNC, - REQ_GET_PARTIAL_OK - -} cl_req_type_t; -/* -* VALUES -* REQ_GET_SYNC -* Synchronous request. -* -* REQ_GET_ASYNC -* Asynchronous requests for which all objects are required at once. -* -* REQ_GET_PARTIAL_OK -* Asynchronous requests that may be broken into multiple smaller requests. 
-* -* SEE ALSO -* Request Manager, cl_req_mgr_get -*********/ - - -/****d* Component Library: Request Manager/cl_pfn_reqmgr_get_count_t -* NAME -* cl_pfn_reqmgr_get_count_t -* -* DESCRIPTION -* The cl_pfn_reqmgr_get_count_t function type defines the prototype for -* functions used to retrieve the number of available objects in a pool. -* -* SYNOPSIS -*/ -typedef size_t -(*cl_pfn_reqmgr_get_count_t)( - IN void* context ); -/* -* PARAMETERS -* Context -* [in] Context provided in a call to cl_req_mgr_init by -* the get_context parameter. -* -* RETURN VALUE -* Returns the number of objects available in an object pool for which -* requests are managed by a request manager. -* -* NOTES -* This function type is provided as function prototype reference for the -* function passed into cl_req_mgr_init. This function is invoked by the -* request manager when trying to fulfill requests for resources, either -* through a call to cl_req_mgr_get or cl_req_mgr_resume. -* -* SEE ALSO -* Request Manager, cl_req_mgr_init, cl_req_mgr_get, cl_req_mgr_resume -*********/ - - -/****s* Component Library: Request Manager/cl_req_mgr_t -* NAME -* cl_req_mgr_t -* -* DESCRIPTION -* Quick composite pool structure. -* -* The cl_req_mgr_t structure should be treated as opaque and should be -* manipulated only through the provided functions. -* -* SYNOPSIS -*/ -typedef struct _cl_req_mgr -{ - cl_pfn_reqmgr_get_count_t pfn_get_count; - const void *get_context; - cl_qlist_t request_queue; - cl_qpool_t request_pool; - -} cl_req_mgr_t; -/* -* FIELDS -* pfn_get_count -* Pointer to the count callback function. -* -* get_context -* Context to pass as single parameter to count callback. -* -* request_queue -* Pending requests for elements. -* -* request_pool -* Pool of request structures for storing requests in the request queue. 
-* -* SEE ALSO -* Request Manager -*********/ - - -/****f* Component Library: Request Manager/cl_req_mgr_construct -* NAME -* cl_req_mgr_construct -* -* DESCRIPTION -* The cl_req_mgr_construct function constructs a request manager. -* -* SYNOPSIS -*/ -void -cl_req_mgr_construct( - IN cl_req_mgr_t* const p_req_mgr ); -/* -* PARAMETERS -* p_req_mgr -* [in] Pointer to a cl_req_mgr_t structure to construct. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* cl_req_mgr_construct allows calling cl_req_mgr_destroy without first -* calling cl_req_mgr_init. -* -* Calling cl_req_mgr_construct is a prerequisite to calling any other -* request manager function except cl_req_mgr_init. -* -* SEE ALSO -* Request Manager, cl_req_mgr_init, cl_req_mgr_destroy -*********/ - - -/****f* Component Library: Request Manager/cl_req_mgr_init -* NAME -* cl_req_mgr_init -* -* DESCRIPTION -* The cl_req_mgr_init function initializes a request manager for use. -* -* SYNOPSIS -*/ -cl_status_t -cl_req_mgr_init( - IN cl_req_mgr_t* const p_req_mgr, - IN cl_pfn_reqmgr_get_count_t pfn_get_count, - IN const void* const get_context ); -/* -* PARAMETERS -* p_req_mgr -* [in] Pointer to a cl_req_mgr_t structure to initialize. -* -* pfn_get_count -* [in] Callback function invoked by the request manager to get the -* number of objects available in a pool of objects for which the -* request manager is managing requests. -* See the cl_pfn_req_mgr_get_count_t function type declaration for -* details about the callback function. -* -* get_context -* [in] Context to pass into the function specified by the -* pfn_get_count parameter. -* -* RETURN VALUES -* CL_SUCCESS if the request manager was successfully initialized. -* -* CL_INSUFFICIENT_MEMORY if there was not enough memory to initialize -* the request manager. 
-* -* SEE ALSO -* Request Manager, cl_req_mgr_construct, cl_req_mgr_destroy, cl_req_mgr_get, -* cl_req_mgr_resume, cl_pfn_req_mgr_get_count_t -*********/ - - -/****f* Component Library: Request Manager/cl_req_mgr_destroy -* NAME -* cl_req_mgr_destroy -* -* DESCRIPTION -* The cl_req_mgr_destroy function destroys a request manager. -* -* SYNOPSIS -*/ -void -cl_req_mgr_destroy( - IN cl_req_mgr_t* const p_req_mgr ); -/* -* PARAMETERS -* p_req_mgr -* [in] Pointer to a cl_req_mgr_t structure to destroy. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* cl_req_mgr_destroy frees all memory allocated by the request manager. -* Further operations on the request manager should not be attempted. -* -* This function should only be called after a call to cl_req_mgr_construct -* or cl_req_mgr_init. -* -* SEE ALSO -* Request Manager, cl_req_mgr_construct, cl_req_mgr_init -*********/ - - -/****f* Component Library: Request Manager/cl_req_mgr_get -* NAME -* cl_req_mgr_get -* -* DESCRIPTION -* The cl_req_mgr_get function handles synchronous and asynchronous -* requests for objects. -* -* SYNOPSIS -*/ -cl_status_t -cl_req_mgr_get( - IN cl_req_mgr_t* const p_req_mgr, - IN OUT size_t* const p_count, - IN const cl_req_type_t req_type, - IN cl_pfn_req_cb_t pfn_callback, - IN const void* const context1, - IN const void* const context2 ); -/* -* PARAMETERS -* p_req_mgr -* [in] Pointer to a cl_req_mgr_t structure from which to check -* for resources. -* -* p_count -* [in/out] On input, contains the number of objects requested. -* On output, contains the number of objects available. -* -* req_type -* [in] Enumerated type describing the type of request. Valid values are: -* ReqGetSync -* Synchronous request. -* ReqGetAsync -* Asynchronous requests for which all objects are required at -* once. -* ReqGetAsyncPartialOk -* Asynchronous requests that may be broken into multiple smaller -* requests. 
-* -* pfn_callback -* [in] Pointer to a callback function for use by the caller. This -* callback function is never invoked by the request manager. -* -* context1 -* [in] First of two contexts for a resource request. -* -* context2 -* [in] Second of two contexts for a resource request. -* -* RETURN VALUES -* CL_SUCCESS if all objects requested are available. -* -* CL_PENDING if the request could not be completed in its entirety. -* The p_count parameter contains the number of objects immediately available. -* -* CL_INSUFFICIENT_RESOURCES if the request could not be completed due to -* insufficient objects being available. -* -* CL_INSUFFICIENT_MEMORY if the request failed due to a lack of system memory. -* -* NOTES -* Upon successful completion of this function, the p_count parameter contains -* the number of objects available. -* -* Synchronous requests fail if there are any asynchronous requests pending, -* or if there are not enough resources to immediately satisfy the request in -* its entirety . -* -* Asynchronous requests fail if there is insufficient system memory to -* queue them. -* -* Once an asynchronous request is queued, use cl_req_mgr_resume to retrieve -* information for resuming queued requests. -* -* SEE ALSO -* Request Manager, cl_req_mgr_resume -*********/ - - -/****f* Component Library: Request Manager/cl_req_mgr_resume -* NAME -* cl_req_mgr_resume -* -* DESCRIPTION -* The cl_req_mgr_resume function attempts to resume queued requests. -* -* SYNOPSIS -*/ -cl_status_t -cl_req_mgr_resume( - IN cl_req_mgr_t* const p_req_mgr, - OUT uint32_t* const p_count, - OUT cl_pfn_req_cb_t* const ppfn_callback, - OUT const void** const p_context1, - OUT const void** const p_context2 ); -/* -* PARAMETERS -* p_req_mgr -* [in] Pointer to a cl_req_mgr_t structure from which to resume requests. -* -* p_count -* [out] Contains the number of objects available for a resuming request. 
-* -* ppfn_callback -* [out] Contains the pfn_callback value for the resuming request, as -* provided to the call to the cl_req_mgr_get function. -* -* p_context1 -* [out] Contains the context1 value for the resuming request, as provided -* to the call to the cl_req_mgr_get function. -* -* p_context2 -* [out] Contains the context2 value for the resuming request, as provided -* to the call to the cl_req_mgr_get function. -* -* RETURN VALUES -* CL_SUCCESS if a request was completed. -* -* CL_PENDING if a request was continued, but not completed. -* -* CL_INSUFFICIENT_RESOURCES if a request could not be continued due to -* a lack of resources. -* -* CL_NOT_DONE if there were no pending requests. -* -* NOTES -* cl_req_mgr_resume resumes at most one request. Further requests may be -* able to be resumed if this call returns CL_SUCCESS. -* -* SEE ALSO -* Request Manager, cl_req_mgr_get -*********/ - - -END_C_DECLS - -#endif /* _CL_REQ_MGR_H_ */ From rdreier at cisco.com Wed May 10 10:22:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 10:22:00 -0700 Subject: [openib-general] Re: [PATCH] mthca: ioremap fix (was: Problem with our SB and your IB Card) In-Reply-To: <20060510145841.GE10669@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 10 May 2006 17:58:41 +0300") References: <20060510145841.GE10669@mellanox.co.il> Message-ID: Thanks, applied. I didn't see the original mail you seem to be replying to. Was it off-list? - R. 
From bugzilla-daemon at openib.org Wed May 10 10:35:15 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 10 May 2006 10:35:15 -0700 (PDT) Subject: [openib-general] [Bug 22] IBED RC2 Installation fails Message-ID: <20060510173515.65FEC2285EC@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=22 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED ------- Additional Comments From sweitzen at cisco.com 2006-05-10 10:35 ------- Close out bug. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From rdreier at cisco.com Wed May 10 10:24:14 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 10:24:14 -0700 Subject: [openib-general] Re: [PATCH] kill dead code in mthca_eq.c In-Reply-To: <20060510083256.GP21036@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 10 May 2006 11:32:56 +0300") References: <20060510083256.GP21036@mellanox.co.il> Message-ID: Thanks, applied. From rdreier at cisco.com Wed May 10 10:26:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 10:26:17 -0700 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: (Or Gerlitz's message of "Wed, 10 May 2006 16:20:30 +0300 (IDT)") References: Message-ID: Or> To have this code compiled you would need to get the iscsi Or> updates for 2.6.18 into your source tree, that is pull/sync Or> with include/scsi and drivers/scsi of the scsi-misc-2.6 git Or> tree. What is the URL of this git tree? (Since git works on changesets and not on paths a la CVS, I can only pull the whole tree rather than selecting certain paths; but I don't think that matters) Or> There's one patch which is not yet merged there and without it Or> iser's compilation fails. 
The patch is named "iscsi: add Or> transport end point callbacks" and i will send it to you Or> offlist. Please let me know when it is merged. I don't want to be merging iSCSI changes via my tree. - R. From mst at mellanox.co.il Wed May 10 10:28:21 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 10 May 2006 20:28:21 +0300 Subject: [openib-general] Re: [PATCH] mthca: ioremap fix (was: Problem with our SB and your IB Card) In-Reply-To: References: <20060510145841.GE10669@mellanox.co.il> Message-ID: <20060510172821.GB13204@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] mthca: ioremap fix (was: Problem with our SB and your IB Card) > > Thanks, applied. > > I didn't see the original mail you seem to be replying to. Was it off-list? Yes. -- MST From sashak at voltaire.com Wed May 10 10:31:39 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 10 May 2006 20:31:39 +0300 Subject: [openib-general] [PATCH] opensm: complib: remove nonexsted symbols from tha map file Message-ID: <20060510173139.27729.43839.stgit@sashak.voltaire.com> This removes nonexistent symbols from the complib map file.
Signed-off-by: Sasha Khapyorsky --- osm/complib/libosmcomp.map | 13 ------------- 1 files changed, 0 insertions(+), 13 deletions(-) diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map index 72ae2a4..7a7ee1d 100644 --- a/osm/complib/libosmcomp.map +++ b/osm/complib/libosmcomp.map @@ -5,12 +5,8 @@ OSMCOMP_1.0 { cl_async_proc_destroy; cl_async_proc_queue; complib_init; - complib_fini; complib_exit; cl_is_debug; - cl_open_device; - cl_close_device; - cl_ioctl_device; cl_disp_construct; cl_disp_init; cl_disp_destroy; @@ -94,8 +90,6 @@ OSMCOMP_1.0 { cl_memset; cl_memcpy; cl_memcmp; - cl_rel_alloc; - cl_rel_free; __cl_perf_run_calibration; __cl_perf_construct; __cl_perf_init; @@ -138,8 +132,6 @@ OSMCOMP_1.0 { cl_spinlock_acquire; cl_spinlock_release; cl_status_text; - __cl_user_syshelper_init; - __cl_user_syshelper_exit; __cl_thread_wrapper; cl_thread_construct; cl_thread_init; @@ -178,11 +170,6 @@ OSMCOMP_1.0 { cl_vector_apply_func; cl_vector_find_from_start; cl_vector_find_from_end; - cl_create_wait_object; - cl_wait_on_wait_object; - cl_signal_wait_object; - cl_destroy_wait_object; - cl_clear_wait_object; cl_atomic_spinlock; cl_atomic_dec; cl_free; From vuhuong at mellanox.com Wed May 10 10:38:57 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 10 May 2006 10:38:57 -0700 Subject: [openib-general][PATCH] srp: throttle command per lun, In-Reply-To: <446209EF.7040207@mellanox.com> References: <443E8325.2000502@mellanox.com> <446209EF.7040207@mellanox.com> Message-ID: <44622531.6020902@mellanox.com> Patch to throttle command per lun when adding target. Signed-off-by: Vu Pham -------------- next part -------------- A non-text attachment was scrubbed... 
Name: srp_cmd_per_lun.patch Type: text/x-patch Size: 1662 bytes Desc: not available URL: From vuhuong at mellanox.com Wed May 10 10:43:16 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 10 May 2006 10:43:16 -0700 Subject: [openib-general][PATCH] srp: param sg_tablesize, In-Reply-To: References: <443E8325.2000502@mellanox.com> Message-ID: <44622634.1070705@mellanox.com> Hi Roland, This patch: + introduces srp_sg_tablesize as module parameter - default value is 16 + adjusts SRP_MAX_IU_LEN, SRP_MAX_INDIRECT from srp_sg_tablesize Signed-off-by: Vu Pham -------------- next part -------------- A non-text attachment was scrubbed... Name: srp_sg_tablesize.patch Type: text/x-patch Size: 2153 bytes Desc: not available URL: From mshefty at ichips.intel.com Wed May 10 11:13:56 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 10 May 2006 11:13:56 -0700 Subject: [openib-general] rdma_cm.h: comment nits.
In-Reply-To: <20060510163523.GK22825@mellanox.co.il> References: <20060508135855.GE21036@mellanox.co.il> <1147231080.5093.13.camel@trinity.ogc.int> <44621449.9080507@ichips.intel.com> <20060510163523.GK22825@mellanox.co.il> Message-ID: <44622D64.1040800@ichips.intel.com> Michael S. Tsirkin wrote: > BTW, Sean, could you please explain why is RESPONSE event IB-specific? > Does not it match Syn/Ack in the TCP 3-way handshake? I didn't think that even iWarp exposed the TCP connection messages to the users. Plus iWarp connections can be formed over an existing TCP connection. > What I am trying to say, why are you returning ESTABLISHED on the active side at > all? Maybe we should always pass RESPONSE on active side and only pass > ESTABLISHED on passive side. TCP certainly seems to make a distinction between > these. > The intent is to keep connection establishment simple. Socket users are used to calling connect on the active side, and listen/accept on the passive side. The RDMA CM interface is consistent with that. - Sean From rdreier at cisco.com Wed May 10 11:33:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 11:33:34 -0700 Subject: [openib-general] [PATCH] IB/sdp: Fix build on sparc64 Message-ID: At least sparc64 requires to be included to pick up the defines of DMA_TO_DEVICE et al. Add this include to sdp_bcopy.c so SDP will build on sparc64. Signed-off-by: Roland Dreier --- BTW, it would probably look nicer to include after files, but I guess it doesn't make much difference. 
--- infiniband/ulp/sdp/sdp_bcopy.c (revision 7012) +++ infiniband/ulp/sdp/sdp_bcopy.c (working copy) @@ -34,6 +34,7 @@ #include #include #include +#include #include "sdp.h" /* Like tcp_fin */ From rdreier at cisco.com Wed May 10 11:38:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 11:38:38 -0700 Subject: [openib-general] [PATCH] IB/sdp: Fix warning on 32-bit architectures Message-ID: The current definition of SDP_OP_RECV leads to: drivers/infiniband/ulp/sdp/sdp_bcopy.c:208: warning: integer constant is too large for "long" type on 32-bit architectures. Fix this by making it explicitly u64. Signed-off-by: Roland Dreier --- infiniband/ulp/sdp/sdp.h (revision 7012) +++ infiniband/ulp/sdp/sdp.h (working copy) @@ -38,7 +38,7 @@ extern int sdp_debug_level; #define SDP_NUM_WC 4 -#define SDP_OP_RECV 0x800000000L +#define SDP_OP_RECV ((u64) 0x800000000LL) enum sdp_mid { SDP_MID_DATA = 0xFF, From bos at pathscale.com Wed May 10 11:41:31 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 10 May 2006 11:41:31 -0700 Subject: [openib-general] Openmpi/xhpl kernel crash 2.6.17-rc3 with Pathscale htx In-Reply-To: <445FB9C7.8060507@atipa.com> References: <445FB9C7.8060507@atipa.com> Message-ID: <1147286491.22656.45.camel@localhost.localdomain> On Mon, 2006-05-08 at 16:36 -0500, Roger Heflin wrote: > I don't see the crash under ip over ib (ran for over an hour), > the crash occurs immediately upon attempting to start xhpl. Hi, Roger - Thanks for the report. We've recently fixed some locking problems that may help with this, but I haven't fed them to Roland yet. If you need some updated code to test before I sort out and send patches to Roland, please let me know. Thanks, openib-general-bounces at openib.org wrote: > Tom Tucker wrote: >>> Its OK to call rdma_reject on active side as well, isn't it? >> >> You'll get -EINVAL on iWARP if you do this.... 
> > For IB, rdma_reject can be called on the active side if the > user is managing their own QP states, or is SDP. How does iWarp > support userspace QPs? > The assumption is that the connection is established if the passive side indicated to proceed knowing what the active side requested. That doesn't mean that it was a take it or leave it. The passive side's response could still have reduced the requested resource reservations. But if the active side does not like it, the only real recourse is to break the connection. IT-API has some good abstractions and write-ups on two-step vs. three-step private data exchanges in connection setup. The bottom line is that two-way is portable, three-way is InfiniBand specific. My assumption has been that an application that truly required three-way exchanges is probably doing something very specific with IB resources, and hence would use IB-specific connection setup. I'm not following your question about iWARP support for user mode QPs. The caller (user or kernel) supplies the QP and what they want done. The only real difference is what the resources behind a "connection request" are. With iWARP there is an actual TCP connection by the time the private data has been collected (from the MPA Request frame). From sweitzen at cisco.com Wed May 10 12:01:37 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 10 May 2006 12:01:37 -0700 Subject: [openib-general] bugs I see as must-fix for OFED 1.0 rc5 Message-ID: SDP overhaul 49 OFED 1.0: MVAPICH won't compile on ppc64 51 OFED 1.0 rc4: SRP not available for RHEL4 U3 57 OFED 1.0 rc4: rdma_cm does not work for uDAPL 59 OFED 1.0: Open MPI not configured correctly to find shlibs 61 OFED 1.0 rc4: RDS does not load on RHEL4 U3 62 OFED 1.0 rc4: too many SRP patches, get this code checked in 64 OFED 1.0 rc4: Open MPI fails when host has more than one ... 
74 OFED 1.0 rc4: Open MPI Pallas test hangs Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Wed May 10 12:24:20 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 10 May 2006 22:24:20 +0300 Subject: [openib-general] Re: [PATCH] IB/sdp: Fix warning on 32-bit architectures In-Reply-To: References: Message-ID: <20060510192420.GC13204@mellanox.co.il> Quoting r. Roland Dreier : > Subject: [PATCH] IB/sdp: Fix warning on 32-bit architectures > > The current definition of SDP_OP_RECV leads to: > > drivers/infiniband/ulp/sdp/sdp_bcopy.c:208: warning: integer constant is too large for "long" type > > on 32-bit architectures. Fix this by making it explicitly u64. > > Signed-off-by: Roland Dreier > > --- infiniband/ulp/sdp/sdp.h (revision 7012) > +++ infiniband/ulp/sdp/sdp.h (working copy) > @@ -38,7 +38,7 @@ extern int sdp_debug_level; > > #define SDP_NUM_WC 4 > > -#define SDP_OP_RECV 0x800000000L > +#define SDP_OP_RECV ((u64) 0x800000000LL) Wouldnt 0x800000000LL be enough? -- MST From or.gerlitz at gmail.com Wed May 10 12:33:32 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Wed, 10 May 2006 21:33:32 +0200 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: References: Message-ID: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> On 5/10/06, Roland Dreier wrote: > Or> To have this code compiled you would need to get the iscsi > Or> updates for 2.6.18 into your source tree, that is pull/sync > Or> with include/scsi and drivers/scsi of the scsi-misc-2.6 git > Or> tree. > > What is the URL of this git tree? The URL is http://kernel.org/git/?p=linux/kernel/git/jejb/scsi-misc-2.6.git > Or> There's one patch which is not yet merged there and without it > Or> iser's compilation fails. 
The patch is named "iscsi: add > Or> transport end point callbacks" and i will send it to you > Or> offlist. > > Please let me know when it is merged. I don't want to be merging > iSCSI changes via my tree. OK, I see now that as of a few hours ago the second iscsi update for 2.6.18 was committed there, which means iser should compile with it; you can go ahead and pull it! Let me know if you have any issue compiling/linking iser with the combined infiniband/scsi-misc configuration. Cheers (: Or. From eitan at mellanox.co.il Wed May 10 13:04:19 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 10 May 2006 23:04:19 +0300 Subject: [openib-general] RE: IBDM Changes Coordination Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB4E@mtlexch01.mtl.com> Thanks for clarifying the policy. I will check in the same patch to the trunk ASAP. The reason I did not check into the trunk was that I intended to check in a much bigger change into the trunk. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, May 10, 2006 7:26 PM > To: Eitan Zahavi > Cc: openib-general at openib.org; OpenFabricsEWG > Subject: IBDM Changes Coordination > > Hi Eitan, > > Yesterday, you checked in some changes to ibdm for OFED on the 1.0 > branch.
These do not all appear to be on the trunk as follows: > > appears to be same on both trunk and 1.0/ofed: > U ofed/ibutils/ibdm/datamodel/Fabric.h > > appear to need merging to trunk > U ofed/ibutils/ibdm/src/osm_check.cpp > U ofed/ibutils/ibdm/datamodel/SubnMgt.cpp > U ofed/ibutils/ibdm/datamodel/LinkCover.cpp > > Should these be merged to the trunk ? > > I thought the OFED policy was trunk first and then 1.0 branch... > > -- Hal From rdreier at cisco.com Wed May 10 13:18:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 13:18:23 -0700 Subject: [openib-general] Re: [PATCH] IB/sdp: Fix warning on 32-bit architectures In-Reply-To: <20060510192420.GC13204@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 10 May 2006 22:24:20 +0300") References: <20060510192420.GC13204@mellanox.co.il> Message-ID: > Wouldnt 0x800000000LL be enough? Yeah, I guess so. I was being anal about archs where u64 == unsigned long, but I don't think it actually matters. - R. From mst at mellanox.co.il Wed May 10 13:26:06 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 10 May 2006 23:26:06 +0300 Subject: [openib-general] Re: Re: [PATCH] IB/sdp: Fix warning on 32-bit architectures In-Reply-To: References: <20060510192420.GC13204@mellanox.co.il> Message-ID: <20060510202606.GD14196@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Re: [PATCH] IB/sdp: Fix warning on 32-bit architectures > > > Wouldnt 0x800000000LL be enough? > > Yeah, I guess so. I was being anal about archs where u64 == unsigned > long, but I don't think it actually matters. And long long 128 bit? Should be fine even then. -- MST From rdreier at cisco.com Wed May 10 13:28:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 13:28:02 -0700 Subject: [openib-general] Re: [PATCH] IB/sdp: Fix warning on 32-bit architectures In-Reply-To: <20060510202606.GD14196@mellanox.co.il> (Michael S. 
Tsirkin's message of "Wed, 10 May 2006 23:26:06 +0300") References: <20060510192420.GC13204@mellanox.co.il> <20060510202606.GD14196@mellanox.co.il> Message-ID: >> Yeah, I guess so. I was being anal about archs where u64 == >> unsigned long, but I don't think it actually matters. > And long long 128 bit? Should be fine even then. No, real architectures where long long and long are both 64 bits, but u64 is typedef'ed to just long. But leaving out the explicit cast to u64 from a long long will be fine. - R. From mst at mellanox.co.il Wed May 10 13:36:15 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 10 May 2006 23:36:15 +0300 Subject: [openib-general] Re: [PATCH] IB/sdp: Fix warning on 32-bit architectures In-Reply-To: References: <20060510192420.GC13204@mellanox.co.il> <20060510202606.GD14196@mellanox.co.il> Message-ID: <20060510203615.GD13204@mellanox.co.il> Quoting r. Roland Dreier : > No, real architectures where long long and long are both 64 bits Thought so. -- MST From tom at opengridcomputing.com Wed May 10 14:09:17 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 10 May 2006 16:09:17 -0500 Subject: [openib-general] rdma_cm.h: comment nits. In-Reply-To: <44621449.9080507@ichips.intel.com> References: <20060508135855.GE21036@mellanox.co.il> <1147231080.5093.13.camel@trinity.ogc.int> <44621449.9080507@ichips.intel.com> Message-ID: <1147295357.5093.62.camel@trinity.ogc.int> On Wed, 2006-05-10 at 09:26 -0700, Sean Hefty wrote: > Tom Tucker wrote: > >>Its OK to call rdma_reject on active side as well, isn't it? > > > > You'll get -EINVAL on iWARP if you do this.... > > For IB, rdma_reject can be called on the active side if the user is managing > their own QP states, or is SDP. How does iWarp support userspace QPs? > iWARP presented a challenge with the current model because a QP becomes logically bound to a connection when the QP state --> RTS. In fact, there is no notion of an RDMA connection independent of a QP. 
In RDMAC, the model assumed was that you would establish a TCP connection, potentially do some initial data exchange and then 'migrate' the connection to RDMA mode. The migration mechanism was that you would provide the connection id (socket fd or whatever) to the qp_modify --> RTS. Between then and QP state --> ERROR, TERMINATE the QP == the connection.

Sorry for the long diatribe, but I'm trying to set the stage for the approach...

Well, we don't have this notion in the API, so what I did was perform this logical qp_modify in rdma_connect and rdma_accept respectively. This is done by passing the QPN down to the provider's connect/accept verb in the conn_attr parameter. The iw_cm then adds a reference on the QP by calling a special provider method for this purpose, and the provider adds a reference on the cm_id by calling a method hung off the iw_cm_id (there is no API in IW CM to add the reference because I didn't want a circular module dependency).

I know this sounds complicated, but since the QP can be destroyed before the CM_ID and vice versa (e.g. a kill -9 to the process and clean up in fd order), I had to take these references and make them explicit. This method, btw, replaces the state-implies-reference count approach of a previous implementation.

In summary:

- For IB, there is no explicit association between the CM ID and the QP for user-mode apps, while for iWARP there is.
- To make QP management consistent between kernel and user-mode, an iWARP provider service maps a QPN to a struct ib_qp *.
- Explicit reference APIs were added to allow the provider to reference the cm_id and the IW CM to reference the qp.
- The provider/RNIC can change the QP state without direct action by the app.

So... all that said, I could in fact support rdma_reject on an active side connection. But this would effectively reduce to a QP --> ERROR and I doubt this matches the semantics you're looking for.
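The explicit cross-references described above could be sketched roughly as follows. This is only an illustration of the idea, not the iw_cm or provider code: `ref_obj`, `obj_get`, `obj_put`, `conn_bind` and `conn_unbind` are hypothetical names, the `freed` flag stands in for actually releasing the object, and a real driver would protect the counts with a lock or atomics.

```c
#include <stddef.h>

/* Hypothetical stand-in for a QP or a cm_id; NOT the real kernel structures. */
struct ref_obj {
	int refcount;	/* starts at 1: the owner's (consumer's) reference */
	int freed;	/* set when the last reference goes away (stands in for kfree) */
};

void obj_get(struct ref_obj *o)
{
	o->refcount++;
}

void obj_put(struct ref_obj *o)
{
	if (--o->refcount == 0)
		o->freed = 1;
}

/* Hypothetical connection record tying a QP and a cm_id together. */
struct conn {
	struct ref_obj *qp;
	struct ref_obj *cm_id;
};

/* At connect/accept time each side pins the other with an explicit reference. */
void conn_bind(struct conn *c, struct ref_obj *qp, struct ref_obj *cm_id)
{
	obj_get(qp);	/* the CM's reference on the QP */
	obj_get(cm_id);	/* the provider's reference on the cm_id */
	c->qp = qp;
	c->cm_id = cm_id;
}

/* At teardown (for example when the QP moves to ERROR) both references are dropped. */
void conn_unbind(struct conn *c)
{
	obj_put(c->qp);
	obj_put(c->cm_id);
	c->qp = NULL;
	c->cm_id = NULL;
}
```

With this arrangement the owner may drop its QP reference before or after its cm_id reference; neither object is released until the connection has also let go of it, which is the destroy-in-either-order property the explicit references are meant to provide.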
> - Sean From rdreier at cisco.com Wed May 10 14:09:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 14:09:13 -0700 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> (Or Gerlitz's message of "Wed, 10 May 2006 21:33:32 +0200") References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> Message-ID: Or> OK., I see now that as of few hours ago the second iscsi Or> update for 2.6.18 was commited there which means iser should Or> compile with it, you can go ahead pull it! Great, I've got it. Can you resend the iSER patches with changelog entries for each patch and a Signed-off-by: line too? Thanks, Roland From caitlinb at broadcom.com Wed May 10 14:20:08 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Wed, 10 May 2006 14:20:08 -0700 Subject: [openib-general] rdma_cm.h: comment nits. Message-ID: <54AD0F12E08D1541B826BE97C98F99F149EF5A@NT-SJCA-0751.brcm.ad.broadcom.com> Tom Tucker wrote: > > So... all that said, I could in fact support rdma_reject on > an active side connection. But this would effectively reduce > to a QP --> ERROR and I doubt this matches the semantics > you're looking for. > > And you could send an RST. There's just no way to send any user supplied private data. It's not just unreliable, it's guaranteed not to arrive. It's still a long way from the truly desired semantics, but the wire protocol just doesn't carry that info. From thcsm at 2electric.com Wed May 10 14:56:37 2006 From: thcsm at 2electric.com (Bertie Byrd) Date: Wed, 10 May 2006 23:56:37 +0200 Subject: [openib-general] pebble designation Message-ID: <001d01c6747d$a4ee9199$c6e8c153@qqwozw> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: fray.gif Type: image/gif Size: 29892 bytes Desc: not available URL: From Thomas.Talpey at netapp.com Wed May 10 15:10:57 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 10 May 2006 18:10:57 -0400 Subject: [openib-general] ip over ib throughtput In-Reply-To: References: <7.0.1.0.2.20060510065006.07b2c928@netapp.com> Message-ID: <7.0.1.0.2.20060510180054.04336f80@netapp.com> At 10:05 AM 5/10/2006, Shirley Ma wrote: >I meant payload less than or equal to 2044, not IB MTU. IPoIB can only >send <=2044 payload per ib_send_post(). NFS/RDMA in this case send >32KB per ib_post_send(). Actually, in the cases I mentioned earlier, the NFS/RDMA server is posting 8 4KB RDMA writes and one ~200 byte send to satisfy the 32KB direct read issued by the client. It's possible for the client to construct many other requests however, so it's possible to result in a 32KB single inline (nonRDMA) message, or if scatter/gather memory registration is available, a single 32KB RDMA followed by the 200 byte reply. Obviously, there are significant resource differences between these. Which one to use can depend on many factors. >It would be nice to know the performance >difference under same payload for IPoIB over UD and NFS/RDMA. Is that >possible? Sure, but I wonder why it's interesting. Nobody ever uses NFS in such small blocksizes, and 2044 bytes would mean, say, 1800 bytes of payload. What data are you looking for, throughput and overhead? Direct RDMA, or inline? Tom.
From rdreier at cisco.com Wed May 10 15:18:35 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 15:18:35 -0700 Subject: [openib-general] [git pull] please pull for-linus branch of infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus The changes and patch are: Michael S. Tsirkin: IB/mthca: ioremap fix Ralph Campbell: IB: Fix display of 4-bit port counters in sysfs Roland Dreier: IB/srp: Fix tracking of pending requests during error handling IB/mthca: Fix race in reference counting IPoIB: Free child interfaces properly drivers/infiniband/core/sysfs.c | 2 drivers/infiniband/hw/mthca/mthca_cq.c | 41 +++-- drivers/infiniband/hw/mthca/mthca_dev.h | 2 drivers/infiniband/hw/mthca/mthca_mr.c | 15 +- drivers/infiniband/hw/mthca/mthca_provider.h | 22 ++- drivers/infiniband/hw/mthca/mthca_qp.c | 31 +++- drivers/infiniband/hw/mthca/mthca_srq.c | 23 ++- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 4 - drivers/infiniband/ulp/srp/ib_srp.c | 195 +++++++++++++++----------- drivers/infiniband/ulp/srp/ib_srp.h | 4 - 10 files changed, 202 insertions(+), 137 deletions(-) diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c index 15121cb..21f9282 100644 --- a/drivers/infiniband/core/sysfs.c +++ b/drivers/infiniband/core/sysfs.c @@ -336,7 +336,7 @@ static ssize_t show_pma_counter(struct i switch (width) { case 4: ret = sprintf(buf, "%u\n", (out_mad->data[40 +
offset / 8] >> - (offset % 4)) & 0xf); + (4 - (offset % 8))) & 0xf); break; case 8: ret = sprintf(buf, "%u\n", out_mad->data[40 + offset / 8]); diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c index 312cf90..205854e 100644 --- a/drivers/infiniband/hw/mthca/mthca_cq.c +++ b/drivers/infiniband/hw/mthca/mthca_cq.c @@ -238,9 +238,9 @@ void mthca_cq_event(struct mthca_dev *de spin_lock(&dev->cq_table.lock); cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); + ++cq->refcount; + spin_unlock(&dev->cq_table.lock); if (!cq) { @@ -254,8 +254,10 @@ void mthca_cq_event(struct mthca_dev *de if (cq->ibcq.event_handler) cq->ibcq.event_handler(&event, cq->ibcq.cq_context); - if (atomic_dec_and_test(&cq->refcount)) + spin_lock(&dev->cq_table.lock); + if (!--cq->refcount) wake_up(&cq->wait); + spin_unlock(&dev->cq_table.lock); } static inline int is_recv_cqe(struct mthca_cqe *cqe) @@ -267,23 +269,13 @@ static inline int is_recv_cqe(struct mth return !(cqe->is_send & 0x80); } -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq) { - struct mthca_cq *cq; struct mthca_cqe *cqe; u32 prod_index; int nfreed = 0; - spin_lock_irq(&dev->cq_table.lock); - cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); - if (cq) - atomic_inc(&cq->refcount); - spin_unlock_irq(&dev->cq_table.lock); - - if (!cq) - return; - spin_lock_irq(&cq->lock); /* @@ -301,7 +293,7 @@ void mthca_cq_clean(struct mthca_dev *de if (0) mthca_dbg(dev, "Cleaning QPN %06x from CQN %06x; ci %d, pi %d\n", - qpn, cqn, cq->cons_index, prod_index); + qpn, cq->cqn, cq->cons_index, prod_index); /* * Now sweep backwards through the CQ, removing CQ entries @@ -325,8 +317,6 @@ void mthca_cq_clean(struct mthca_dev *de } spin_unlock_irq(&cq->lock); - if (atomic_dec_and_test(&cq->refcount)) - wake_up(&cq->wait); 
} void mthca_cq_resize_copy_cqes(struct mthca_cq *cq) @@ -821,7 +811,7 @@ int mthca_init_cq(struct mthca_dev *dev, } spin_lock_init(&cq->lock); - atomic_set(&cq->refcount, 1); + cq->refcount = 1; init_waitqueue_head(&cq->wait); memset(cq_context, 0, sizeof *cq_context); @@ -896,6 +886,17 @@ err_out: return err; } +static inline int get_cq_refcount(struct mthca_dev *dev, struct mthca_cq *cq) +{ + int c; + + spin_lock_irq(&dev->cq_table.lock); + c = cq->refcount; + spin_unlock_irq(&dev->cq_table.lock); + + return c; +} + void mthca_free_cq(struct mthca_dev *dev, struct mthca_cq *cq) { @@ -929,6 +930,7 @@ void mthca_free_cq(struct mthca_dev *dev spin_lock_irq(&dev->cq_table.lock); mthca_array_clear(&dev->cq_table.cq, cq->cqn & (dev->limits.num_cqs - 1)); + --cq->refcount; spin_unlock_irq(&dev->cq_table.lock); if (dev->mthca_flags & MTHCA_FLAG_MSI_X) @@ -936,8 +938,7 @@ void mthca_free_cq(struct mthca_dev *dev else synchronize_irq(dev->pdev->irq); - atomic_dec(&cq->refcount); - wait_event(cq->wait, !atomic_read(&cq->refcount)); + wait_event(cq->wait, !get_cq_refcount(dev, cq)); if (cq->is_kernel) { mthca_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h index 4c1dcb4..f8160b8 100644 --- a/drivers/infiniband/hw/mthca/mthca_dev.h +++ b/drivers/infiniband/hw/mthca/mthca_dev.h @@ -496,7 +496,7 @@ void mthca_free_cq(struct mthca_dev *dev void mthca_cq_completion(struct mthca_dev *dev, u32 cqn); void mthca_cq_event(struct mthca_dev *dev, u32 cqn, enum ib_event_type event_type); -void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, +void mthca_cq_clean(struct mthca_dev *dev, struct mthca_cq *cq, u32 qpn, struct mthca_srq *srq); void mthca_cq_resize_copy_cqes(struct mthca_cq *cq); int mthca_alloc_cq_buf(struct mthca_dev *dev, struct mthca_cq_buf *buf, int nent); diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c index 25e1c1d..a486dec 100644 
--- a/drivers/infiniband/hw/mthca/mthca_mr.c +++ b/drivers/infiniband/hw/mthca/mthca_mr.c @@ -761,6 +761,7 @@ void mthca_arbel_fmr_unmap(struct mthca_ int __devinit mthca_init_mr_table(struct mthca_dev *dev) { + unsigned long addr; int err, i; err = mthca_alloc_init(&dev->mr_table.mpt_alloc, @@ -796,9 +797,12 @@ int __devinit mthca_init_mr_table(struct goto err_fmr_mpt; } + addr = pci_resource_start(dev->pdev, 4) + + ((pci_resource_len(dev->pdev, 4) - 1) & + dev->mr_table.mpt_base); + dev->mr_table.tavor_fmr.mpt_base = - ioremap(dev->mr_table.mpt_base, - (1 << i) * sizeof (struct mthca_mpt_entry)); + ioremap(addr, (1 << i) * sizeof(struct mthca_mpt_entry)); if (!dev->mr_table.tavor_fmr.mpt_base) { mthca_warn(dev, "MPT ioremap for FMR failed.\n"); @@ -806,9 +810,12 @@ int __devinit mthca_init_mr_table(struct goto err_fmr_mpt; } + addr = pci_resource_start(dev->pdev, 4) + + ((pci_resource_len(dev->pdev, 4) - 1) & + dev->mr_table.mtt_base); + dev->mr_table.tavor_fmr.mtt_base = - ioremap(dev->mr_table.mtt_base, - (1 << i) * MTHCA_MTT_SEG_SIZE); + ioremap(addr, (1 << i) * MTHCA_MTT_SEG_SIZE); if (!dev->mr_table.tavor_fmr.mtt_base) { mthca_warn(dev, "MTT ioremap for FMR failed.\n"); err = -ENOMEM; diff --git a/drivers/infiniband/hw/mthca/mthca_provider.h b/drivers/infiniband/hw/mthca/mthca_provider.h index 6676a78..179a8f6 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.h +++ b/drivers/infiniband/hw/mthca/mthca_provider.h @@ -139,11 +139,12 @@ struct mthca_ah { * a qp may be locked, with the send cq locked first. No other * nesting should be done. * - * Each struct mthca_cq/qp also has an atomic_t ref count. The - * pointer from the cq/qp_table to the struct counts as one reference. - * This reference also is good for access through the consumer API, so - * modifying the CQ/QP etc doesn't need to take another reference. - * Access because of a completion being polled does need a reference. 
+ * Each struct mthca_cq/qp also has an ref count, protected by the + * corresponding table lock. The pointer from the cq/qp_table to the + * struct counts as one reference. This reference also is good for + * access through the consumer API, so modifying the CQ/QP etc doesn't + * need to take another reference. Access to a QP because of a + * completion being polled does not need a reference either. * * Finally, each struct mthca_cq/qp has a wait_queue_head_t for the * destroy function to sleep on. @@ -159,8 +160,9 @@ struct mthca_ah { * - decrement ref count; if zero, wake up waiters * * To destroy a CQ/QP, we can do the following: - * - lock cq/qp_table, remove pointer, unlock cq/qp_table lock - * - decrement ref count + * - lock cq/qp_table + * - remove pointer and decrement ref count + * - unlock cq/qp_table lock * - wait_event until ref count is zero * * It is the consumer's responsibilty to make sure that no QP @@ -197,7 +199,7 @@ struct mthca_cq_resize { struct mthca_cq { struct ib_cq ibcq; spinlock_t lock; - atomic_t refcount; + int refcount; int cqn; u32 cons_index; struct mthca_cq_buf buf; @@ -217,7 +219,7 @@ struct mthca_cq { struct mthca_srq { struct ib_srq ibsrq; spinlock_t lock; - atomic_t refcount; + int refcount; int srqn; int max; int max_gs; @@ -254,7 +256,7 @@ struct mthca_wq { struct mthca_qp { struct ib_qp ibqp; - atomic_t refcount; + int refcount; u32 qpn; int is_direct; u8 port; /* for SQP and memfree use only */ diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index f37b0e3..19765f6 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -240,7 +240,7 @@ void mthca_qp_event(struct mthca_dev *de spin_lock(&dev->qp_table.lock); qp = mthca_array_get(&dev->qp_table.qp, qpn & (dev->limits.num_qps - 1)); if (qp) - atomic_inc(&qp->refcount); + ++qp->refcount; spin_unlock(&dev->qp_table.lock); if (!qp) { @@ -257,8 +257,10 @@ void mthca_qp_event(struct mthca_dev 
*de if (qp->ibqp.event_handler) qp->ibqp.event_handler(&event, qp->ibqp.qp_context); - if (atomic_dec_and_test(&qp->refcount)) + spin_lock(&dev->qp_table.lock); + if (!--qp->refcount) wake_up(&qp->wait); + spin_unlock(&dev->qp_table.lock); } static int to_mthca_state(enum ib_qp_state ib_state) @@ -833,10 +835,10 @@ int mthca_modify_qp(struct ib_qp *ibqp, * entries and reinitialize the QP. */ if (new_state == IB_QPS_RESET && !qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); mthca_wq_init(&qp->sq); @@ -1096,7 +1098,7 @@ static int mthca_alloc_qp_common(struct int ret; int i; - atomic_set(&qp->refcount, 1); + qp->refcount = 1; init_waitqueue_head(&qp->wait); qp->state = IB_QPS_RESET; qp->atomic_rd_en = 0; @@ -1318,6 +1320,17 @@ int mthca_alloc_sqp(struct mthca_dev *de return err; } +static inline int get_qp_refcount(struct mthca_dev *dev, struct mthca_qp *qp) +{ + int c; + + spin_lock_irq(&dev->qp_table.lock); + c = qp->refcount; + spin_unlock_irq(&dev->qp_table.lock); + + return c; +} + void mthca_free_qp(struct mthca_dev *dev, struct mthca_qp *qp) { @@ -1339,14 +1352,14 @@ void mthca_free_qp(struct mthca_dev *dev spin_lock(&dev->qp_table.lock); mthca_array_clear(&dev->qp_table.qp, qp->qpn & (dev->limits.num_qps - 1)); + --qp->refcount; spin_unlock(&dev->qp_table.lock); if (send_cq != recv_cq) spin_unlock(&recv_cq->lock); spin_unlock_irq(&send_cq->lock); - atomic_dec(&qp->refcount); - wait_event(qp->wait, !atomic_read(&qp->refcount)); + wait_event(qp->wait, !get_qp_refcount(dev, qp)); if (qp->state != IB_QPS_RESET) mthca_MODIFY_QP(dev, qp->state, IB_QPS_RESET, qp->qpn, 0, @@ -1358,10 +1371,10 @@ void mthca_free_qp(struct mthca_dev 
*dev * unref the mem-free tables and free the QPN in our table. */ if (!qp->ibqp.uobject) { - mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); if (qp->ibqp.send_cq != qp->ibqp.recv_cq) - mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn, + mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq), qp->qpn, qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL); mthca_free_memfree(dev, qp); diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c index adcaf85..1ea4332 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -241,7 +241,7 @@ int mthca_alloc_srq(struct mthca_dev *de goto err_out_mailbox; spin_lock_init(&srq->lock); - atomic_set(&srq->refcount, 1); + srq->refcount = 1; init_waitqueue_head(&srq->wait); if (mthca_is_memfree(dev)) @@ -308,6 +308,17 @@ err_out: return err; } +static inline int get_srq_refcount(struct mthca_dev *dev, struct mthca_srq *srq) +{ + int c; + + spin_lock_irq(&dev->srq_table.lock); + c = srq->refcount; + spin_unlock_irq(&dev->srq_table.lock); + + return c; +} + void mthca_free_srq(struct mthca_dev *dev, struct mthca_srq *srq) { struct mthca_mailbox *mailbox; @@ -329,10 +340,10 @@ void mthca_free_srq(struct mthca_dev *de spin_lock_irq(&dev->srq_table.lock); mthca_array_clear(&dev->srq_table.srq, srq->srqn & (dev->limits.num_srqs - 1)); + --srq->refcount; spin_unlock_irq(&dev->srq_table.lock); - atomic_dec(&srq->refcount); - wait_event(srq->wait, !atomic_read(&srq->refcount)); + wait_event(srq->wait, !get_srq_refcount(dev, srq)); if (!srq->ibsrq.uobject) { mthca_free_srq_buf(dev, srq); @@ -414,7 +425,7 @@ void mthca_srq_event(struct mthca_dev *d spin_lock(&dev->srq_table.lock); srq = mthca_array_get(&dev->srq_table.srq, srqn & (dev->limits.num_srqs - 1)); if (srq) - atomic_inc(&srq->refcount); + ++srq->refcount; spin_unlock(&dev->srq_table.lock); if 
(!srq) { @@ -431,8 +442,10 @@ void mthca_srq_event(struct mthca_dev *d srq->ibsrq.event_handler(&event, srq->ibsrq.srq_context); out: - if (atomic_dec_and_test(&srq->refcount)) + spin_lock(&dev->srq_table.lock); + if (!--srq->refcount) wake_up(&srq->wait); + spin_unlock(&dev->srq_table.lock); } /* diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 4ca1755..f887780 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -158,10 +158,8 @@ int ipoib_vlan_delete(struct net_device if (priv->pkey == pkey) { unregister_netdev(priv->dev); ipoib_dev_cleanup(priv->dev); - list_del(&priv->list); - - kfree(priv); + free_netdev(priv->dev); ret = 0; break; diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 5bb5574..c32ce43 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -409,6 +409,34 @@ static int srp_connect_target(struct srp } } +static void srp_unmap_data(struct scsi_cmnd *scmnd, + struct srp_target_port *target, + struct srp_request *req) +{ + struct scatterlist *scat; + int nents; + + if (!scmnd->request_buffer || + (scmnd->sc_data_direction != DMA_TO_DEVICE && + scmnd->sc_data_direction != DMA_FROM_DEVICE)) + return; + + /* + * This handling of non-SG commands can be killed when the + * SCSI midlayer no longer generates non-SG commands. 
+ */ + if (likely(scmnd->use_sg)) { + nents = scmnd->use_sg; + scat = scmnd->request_buffer; + } else { + nents = 1; + scat = &req->fake_sg; + } + + dma_unmap_sg(target->srp_host->dev->dma_device, scat, nents, + scmnd->sc_data_direction); +} + static int srp_reconnect_target(struct srp_target_port *target) { struct ib_cm_id *new_cm_id; @@ -455,16 +483,16 @@ static int srp_reconnect_target(struct s list_for_each_entry(req, &target->req_queue, list) { req->scmnd->result = DID_RESET << 16; req->scmnd->scsi_done(req->scmnd); + srp_unmap_data(req->scmnd, target, req); } target->rx_head = 0; target->tx_head = 0; target->tx_tail = 0; - target->req_head = 0; - for (i = 0; i < SRP_SQ_SIZE - 1; ++i) - target->req_ring[i].next = i + 1; - target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->free_reqs); INIT_LIST_HEAD(&target->req_queue); + for (i = 0; i < SRP_SQ_SIZE; ++i) + list_add_tail(&target->req_ring[i].list, &target->free_reqs); ret = srp_connect_target(target); if (ret) @@ -589,40 +617,10 @@ static int srp_map_data(struct scsi_cmnd return len; } -static void srp_unmap_data(struct scsi_cmnd *scmnd, - struct srp_target_port *target, - struct srp_request *req) -{ - struct scatterlist *scat; - int nents; - - if (!scmnd->request_buffer || - (scmnd->sc_data_direction != DMA_TO_DEVICE && - scmnd->sc_data_direction != DMA_FROM_DEVICE)) - return; - - /* - * This handling of non-SG commands can be killed when the - * SCSI midlayer no longer generates non-SG commands. 
- */ - if (likely(scmnd->use_sg)) { - nents = scmnd->use_sg; - scat = scmnd->request_buffer; - } else { - nents = 1; - scat = &req->fake_sg; - } - - dma_unmap_sg(target->srp_host->dev->dma_device, scat, nents, - scmnd->sc_data_direction); -} - -static void srp_remove_req(struct srp_target_port *target, struct srp_request *req, - int index) +static void srp_remove_req(struct srp_target_port *target, struct srp_request *req) { - list_del(&req->list); - req->next = target->req_head; - target->req_head = index; + srp_unmap_data(req->scmnd, target, req); + list_move_tail(&req->list, &target->free_reqs); } static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp) @@ -647,7 +645,7 @@ static void srp_process_rsp(struct srp_t req->tsk_status = rsp->data[3]; complete(&req->done); } else { - scmnd = req->scmnd; + scmnd = req->scmnd; if (!scmnd) printk(KERN_ERR "Null scmnd for RSP w/tag %016llx\n", (unsigned long long) rsp->tag); @@ -665,14 +663,11 @@ static void srp_process_rsp(struct srp_t else if (rsp->flags & (SRP_RSP_FLAG_DIOVER | SRP_RSP_FLAG_DIUNDER)) scmnd->resid = be32_to_cpu(rsp->data_in_res_cnt); - srp_unmap_data(scmnd, target, req); - if (!req->tsk_mgmt) { - req->scmnd = NULL; scmnd->host_scribble = (void *) -1L; scmnd->scsi_done(scmnd); - srp_remove_req(target, req, rsp->tag & ~SRP_TAG_TSK_MGMT); + srp_remove_req(target, req); } else req->cmd_done = 1; } @@ -859,7 +854,6 @@ static int srp_queuecommand(struct scsi_ struct srp_request *req; struct srp_iu *iu; struct srp_cmd *cmd; - long req_index; int len; if (target->state == SRP_TARGET_CONNECTING) @@ -879,22 +873,20 @@ static int srp_queuecommand(struct scsi_ dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, SRP_MAX_IU_LEN, DMA_TO_DEVICE); - req_index = target->req_head; + req = list_entry(target->free_reqs.next, struct srp_request, list); scmnd->scsi_done = done; scmnd->result = 0; - scmnd->host_scribble = (void *) req_index; + scmnd->host_scribble = (void *) (long) 
req->index; cmd = iu->buf; memset(cmd, 0, sizeof *cmd); cmd->opcode = SRP_CMD; cmd->lun = cpu_to_be64((u64) scmnd->device->lun << 48); - cmd->tag = req_index; + cmd->tag = req->index; memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len); - req = &target->req_ring[req_index]; - req->scmnd = scmnd; req->cmd = iu; req->cmd_done = 0; @@ -919,8 +911,7 @@ static int srp_queuecommand(struct scsi_ goto err_unmap; } - target->req_head = req->next; - list_add_tail(&req->list, &target->req_queue); + list_move_tail(&req->list, &target->req_queue); return 0; @@ -1143,30 +1134,20 @@ static int srp_cm_handler(struct ib_cm_i return 0; } -static int srp_send_tsk_mgmt(struct scsi_cmnd *scmnd, u8 func) +static int srp_send_tsk_mgmt(struct srp_target_port *target, + struct srp_request *req, u8 func) { - struct srp_target_port *target = host_to_target(scmnd->device->host); - struct srp_request *req; struct srp_iu *iu; struct srp_tsk_mgmt *tsk_mgmt; - int req_index; - int ret = FAILED; spin_lock_irq(target->scsi_host->host_lock); if (target->state == SRP_TARGET_DEAD || target->state == SRP_TARGET_REMOVED) { - scmnd->result = DID_BAD_TARGET << 16; + req->scmnd->result = DID_BAD_TARGET << 16; goto out; } - if (scmnd->host_scribble == (void *) -1L) - goto out; - - req_index = (long) scmnd->host_scribble; - printk(KERN_ERR "Abort for req_index %d\n", req_index); - - req = &target->req_ring[req_index]; init_completion(&req->done); iu = __srp_get_tx_iu(target); @@ -1177,10 +1158,10 @@ static int srp_send_tsk_mgmt(struct scsi memset(tsk_mgmt, 0, sizeof *tsk_mgmt); tsk_mgmt->opcode = SRP_TSK_MGMT; - tsk_mgmt->lun = cpu_to_be64((u64) scmnd->device->lun << 48); - tsk_mgmt->tag = req_index | SRP_TAG_TSK_MGMT; + tsk_mgmt->lun = cpu_to_be64((u64) req->scmnd->device->lun << 48); + tsk_mgmt->tag = req->index | SRP_TAG_TSK_MGMT; tsk_mgmt->tsk_mgmt_func = func; - tsk_mgmt->task_tag = req_index; + tsk_mgmt->task_tag = req->index; if (__srp_post_send(target, iu, sizeof *tsk_mgmt)) goto out; @@ -1188,37 +1169,85 
@@ static int srp_send_tsk_mgmt(struct scsi req->tsk_mgmt = iu; spin_unlock_irq(target->scsi_host->host_lock); + if (!wait_for_completion_timeout(&req->done, msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS))) - return FAILED; - spin_lock_irq(target->scsi_host->host_lock); + return -1; - if (req->cmd_done) { - srp_remove_req(target, req, req_index); - scmnd->scsi_done(scmnd); - } else if (!req->tsk_status) { - srp_remove_req(target, req, req_index); - scmnd->result = DID_ABORT << 16; - ret = SUCCESS; - } + return 0; out: spin_unlock_irq(target->scsi_host->host_lock); - return ret; + return -1; +} + +static int srp_find_req(struct srp_target_port *target, + struct scsi_cmnd *scmnd, + struct srp_request **req) +{ + if (scmnd->host_scribble == (void *) -1L) + return -1; + + *req = &target->req_ring[(long) scmnd->host_scribble]; + + return 0; } static int srp_abort(struct scsi_cmnd *scmnd) { + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req; + int ret = SUCCESS; + printk(KERN_ERR "SRP abort called\n"); - return srp_send_tsk_mgmt(scmnd, SRP_TSK_ABORT_TASK); + if (srp_find_req(target, scmnd, &req)) + return FAILED; + if (srp_send_tsk_mgmt(target, req, SRP_TSK_ABORT_TASK)) + return FAILED; + + spin_lock_irq(target->scsi_host->host_lock); + + if (req->cmd_done) { + srp_remove_req(target, req); + scmnd->scsi_done(scmnd); + } else if (!req->tsk_status) { + srp_remove_req(target, req); + scmnd->result = DID_ABORT << 16; + } else + ret = FAILED; + + spin_unlock_irq(target->scsi_host->host_lock); + + return ret; } static int srp_reset_device(struct scsi_cmnd *scmnd) { + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req, *tmp; + printk(KERN_ERR "SRP reset_device called\n"); - return srp_send_tsk_mgmt(scmnd, SRP_TSK_LUN_RESET); + if (srp_find_req(target, scmnd, &req)) + return FAILED; + if (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET)) + return FAILED; + if (req->tsk_status) + return FAILED; + 
+ spin_lock_irq(target->scsi_host->host_lock); + + list_for_each_entry_safe(req, tmp, &target->req_queue, list) + if (req->scmnd->device == scmnd->device) { + req->scmnd->result = DID_RESET << 16; + scmnd->scsi_done(scmnd); + srp_remove_req(target, req); + } + + spin_unlock_irq(target->scsi_host->host_lock); + + return SUCCESS; } static int srp_reset_host(struct scsi_cmnd *scmnd) @@ -1518,10 +1547,12 @@ static ssize_t srp_create_target(struct INIT_WORK(&target->work, srp_reconnect_work, target); - for (i = 0; i < SRP_SQ_SIZE - 1; ++i) - target->req_ring[i].next = i + 1; - target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->free_reqs); INIT_LIST_HEAD(&target->req_queue); + for (i = 0; i < SRP_SQ_SIZE; ++i) { + target->req_ring[i].index = i; + list_add_tail(&target->req_ring[i].list, &target->free_reqs); + } ret = srp_parse_options(buf, target); if (ret) diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h index bd7f7c3..c5cd43a 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.h +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -101,7 +101,7 @@ struct srp_request { */ struct scatterlist fake_sg; struct completion done; - short next; + short index; u8 cmd_done; u8 tsk_status; }; @@ -133,7 +133,7 @@ struct srp_target_port { unsigned tx_tail; struct srp_iu *tx_ring[SRP_SQ_SIZE + 1]; - int req_head; + struct list_head free_reqs; struct list_head req_queue; struct srp_request req_ring[SRP_SQ_SIZE]; From Thomas.Talpey at netapp.com Wed May 10 15:24:59 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 10 May 2006 18:24:59 -0400 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <44620889.3010702@mellanox.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> <44620889.3010702@mellanox.com> Message-ID: <7.0.1.0.2.20060510181242.04592e18@netapp.com> At 11:36 AM 5/10/2006, Vu Pham wrote: >I can get ~780 MB/s max without FMRs and ~920 MB/s with FMRs >(using 256 KB 
sequential read direct IO request) In the "without" case, what memory registration strategy? Also, what is the CPU utilization on the initiator in the two runs (i.e. is the 780MB/s run CPU limited)? Do you have performance results with smaller blocksizes? Thanks, Tom. From rdreier at cisco.com Wed May 10 15:30:16 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 15:30:16 -0700 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <7.0.1.0.2.20060510181242.04592e18@netapp.com> (Thomas Talpey's message of "Wed, 10 May 2006 18:24:59 -0400") References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> <44620889.3010702@mellanox.com> <7.0.1.0.2.20060510181242.04592e18@netapp.com> Message-ID: Thomas> In the "without" case, what memory registration strategy? Thomas> Also, what is the CPU utilization on the initiator in the Thomas> two runs (i.e. is the 780MB/s run CPU limited)? The without case is just using ib_get_dma_mr() for direct access to all of memory. - R. From rdreier at cisco.com Wed May 10 15:38:54 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 15:38:54 -0700 Subject: [openib-general] [PATCH] IB/ipath: Properly terminate PCI ID table Message-ID: The ipath driver's table of PCI IDs needs a { 0, } entry at the end. This makes all of the device aliases visible to userspace so hotplug loads the module for all supported devices. Without the patch, modinfo ipath_core only shows: alias: pci:v00001FC1d0000000Dsv*sd*bc*sc*i* instead of the correct: alias: pci:v00001FC1d00000010sv*sd*bc*sc*i* alias: pci:v00001FC1d0000000Dsv*sd*bc*sc*i* Signed-off-by: Roland Dreier --- Please apply to svn if this looks OK. Also let me know if it's _not_ OK to push this to Linus via my git tree. 
Thanks, Roland drivers/infiniband/hw/ipath/ipath_driver.c | 7 +++---- 1 files changed, 3 insertions(+), 4 deletions(-) 6a0d99e311ed8eca177750b3b5f81f526ce64544 diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 398add4..3697eda 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -116,10 +116,9 @@ #define PCI_DEVICE_ID_INFINIPATH_HT 0xd #define PCI_DEVICE_ID_INFINIPATH_PE800 0x10 static const struct pci_device_id ipath_pci_tbl[] = { - {PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, - PCI_DEVICE_ID_INFINIPATH_HT)}, - {PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, - PCI_DEVICE_ID_INFINIPATH_PE800)}, + { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_HT) }, + { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_PE800) }, + { 0, } }; MODULE_DEVICE_TABLE(pci, ipath_pci_tbl); -- 1.3.1
From bos at pathscale.com Wed May 10 16:13:31 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 10 May 2006 16:13:31 -0700 Subject: [openib-general] Re: [PATCH] IB/ipath: Properly terminate PCI ID table In-Reply-To: References: Message-ID: <1147302811.4602.22.camel@localhost.localdomain> On Wed, 2006-05-10 at 15:38 -0700, Roland Dreier wrote: > Signed-off-by: Roland Dreier Signed-off-by: Bryan O'Sullivan From sean.hefty at intel.com Wed May 10 16:35:54 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 10 May 2006 16:35:54 -0700 Subject: [openib-general] question regarding GRH flag in ib_ah_attr Message-ID: For context, I'm trying to work backwards from send a message on a UD QP to determine what information is needed and how it is obtained. Does anyone know how the user determines if the grh flag should be set in the ib_ah_attr when allocating an ib_ah? Do they do this by examining the GIDs in a path record? - Sean From rdreier at cisco.com Wed May 10 16:41:40 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 16:41:40 -0700 Subject: [openib-general] 2.6.17 and 2.6.18 merge plans Message-ID: As the 2.6.17 release cycle starts to wrap up, I thought it would be a good time to send out a report on my status, and a request for patches. All I have queued up for 2.6.17 that isn't upstream already is the one-liner patch fixing the ipath PCI device ID table. If there is anything else you think should be in 2.6.17, please send me a patch against the for-2.6.17 branch of my git tree [1]. I know pathscale at least is sitting on ipath patches, and the window to get them into 2.6.17 is probably not all that big. For 2.6.18, I have the following major things already queued (in addition to a trivial mthca patch): - CMA. Sean, I think there have been a lot of updates to the cma since the last time you updated me.
Can you send me a patch to resync my for-2.6.18 branch with the latest code for upstream? - SRP FMRs. In addition I am planning on the following for 2.6.18 already: - iSER. Just waiting for Or to send patches with proper changelogs and Signed-off-by: lines. - SRP tunable parameters. I just need to review and merge the patches, but I don't expect any holdup here. If there's something else you would like merged for 2.6.18, please send out patches soon, so that they can be reviewed and ready to go when 2.6.18 opens up. Thanks, Roland [1] My git tree can be cloned from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git From rdreier at cisco.com Wed May 10 16:44:42 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 May 2006 16:44:42 -0700 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: (Sean Hefty's message of "Wed, 10 May 2006 16:35:54 -0700") References: Message-ID: Sean> Does anyone know how the user determines if the grh flag Sean> should be set in the ib_ah_attr when allocating an ib_ah? Sean> Do they do this by examining the GIDs in a path record? Good question. It's always needed for multicast, of course. For unicast, I guess one could look at whether the subnet prefixes of the SGID and DGID are the same, but I'm not sure that's sufficient -- a router could conceivably sit between two subnets with the same subnet prefix. Perhaps some of the Obsidian guys could comment? - R. From sean.hefty at intel.com Wed May 10 16:45:19 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 10 May 2006 16:45:19 -0700 Subject: [openib-general] 2.6.17 and 2.6.18 merge plans In-Reply-To: Message-ID: > - CMA. Sean, I think there have been a lot of updates to the cma > since the last time you updated me. Can you send me a patch to > resync my for-2.6.18 branch with the latest code for upstream? I will start on this by the end of the week. 
- Sean From tom at opengridcomputing.com Wed May 10 17:32:20 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 10 May 2006 19:32:20 -0500 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> <7.0.1.0.2.20060510063730.04336f80@netapp.com> Message-ID: <1147307540.5093.71.camel@trinity.ogc.int> On Wed, 2006-05-10 at 08:53 -0700, Roland Dreier wrote: > Thomas> I am planning to test this some more in the next few > Thomas> weeks, but what I'd really like to see is an IBTA > Thomas> 1.2-compliant implementation, and one that operated on > Thomas> work queue entries (not synchronous verbs). Is that being > Thomas> worked on? > > No current hardware supports that as far as I know.
(Well, ipath > could fake it since they already implement all the verbs in software) > I'm almost certain I'll be shot for saying this, but isn't there a danger of confusion with real FMRs when the HW shows up? If the benefit isn't there -- why do it if the application outcomes are almost certainly all bad? > - R. > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From jgunthorpe at obsidianresearch.com Wed May 10 17:40:49 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 10 May 2006 18:40:49 -0600 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: References: Message-ID: <20060511004049.GD26684@obsidianresearch.com> On Wed, May 10, 2006 at 04:44:42PM -0700, Roland Dreier wrote: > Sean> Does anyone know how the user determines if the grh flag > Sean> should be set in the ib_ah_attr when allocating an ib_ah? > Sean> Do they do this by examining the GIDs in a path record? > > Good question. It's always needed for multicast, of course. For > unicast, I guess one could look at whether the subnet prefixes of the > SGID and DGID are the same, but I'm not sure that's sufficient -- a > router could conceivably sit between two subnets with the same subnet > prefix. > Perhaps some of the Obsidian guys could comment?
Our intention in the absence of standardization is to leverage common practice in IPv6 for numbering - which means that global prefixes need to be globally unique (or at least site unique). A generic N port router cannot connect subnets with the same prefix because it is ambiguous where to send the packets. Logically I think the GRH usage should be selected after the output port is determined based on matching the port's PortInfo.GIDPrefix and the IBA default prefix (the link local prefix FE80:: which is always on-link) against the DGID. If there is a match it is on link, otherwise it is off link, through a router, and a GRH is necessary. Right now IBA only allows two prefixes, FE80:: and PortInfo.GIDPrefix, so the check described above can be reduced to comparing the SGID and DGID prefixes: if they are different and the DGID prefix is not FE80:: then it is off link and needs a GRH. Regards, Jason From tom at opengridcomputing.com Wed May 10 17:43:18 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 10 May 2006 19:43:18 -0500 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <44620889.3010702@mellanox.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> <44620889.3010702@mellanox.com> Message-ID: <1147308198.5093.81.camel@trinity.ogc.int> On Wed, 2006-05-10 at 08:36 -0700, Vu Pham wrote: > Roland Dreier wrote: > > BTW, does Mellanox (or anyone else) have any numbers showing that > > using FMRs makes any difference in performance on a semi-realistic benchmark? > > > > I'm using xdd to test the performance > www.ioperformance.com/products.htm > > The target is Mellanox srp target reference implementation > with 14 SATA spindles > > I can get ~780 MB/s max without FMRs and ~920 MB/s with FMRs > (using 256 KB sequential read direct IO request) Wow, that's awesome Vu! So what's the consensus on the reason for the improvement?
- Fewer WR to send the same amount of data because the memory is virtually contiguous? - Fewer PDU due to larger writes and better packing? - Something else? Are these huge reads, ... or even better ... what are the parameters to xdd? Did you quantify the FMR registration cost per MB of registered memory space? This would be a very good number to have for figuring out where the sweet spot is... Thanks, Tom T. > > Vu > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From tom at opengridcomputing.com Wed May 10 17:48:19 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Wed, 10 May 2006 19:48:19 -0500 Subject: [openib-general] rdma_cm.h: comment nits. In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F149EF5A@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F149EF5A@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <1147308499.5093.85.camel@trinity.ogc.int> On Wed, 2006-05-10 at 14:20 -0700, Caitlin Bestler wrote: > Tom Tucker wrote: > > > > > So... all that said, I could in fact support rdma_reject on > > an active side connection. But this would effectively reduce > > to a QP --> ERROR and I doubt this matches the semantics > > you're looking for. > > > > > > And you could send an RST. Yep, in fact that's what many RNIC's do when you move the QP to ERROR instead of CLOSING. > There's just no way to send any user > supplied private data. It's not just unreliable, it's guaranteed > not to arrive. It's still a long way from the truly desired > semantics, but the wire protocol just doesn't carry that info. > Yeah, I think you're correct -- it would be a bogus "emulation". 
>

From iod00d at hp.com Wed May 10 17:50:33 2006
From: iod00d at hp.com (Grant Grundler)
Date: Wed, 10 May 2006 17:50:33 -0700
Subject: [openib-general] 2.6.17 and 2.6.18 merge plans
In-Reply-To:
References:
Message-ID: <20060511005033.GH6575@esmail.cup.hp.com>

On Wed, May 10, 2006 at 04:41:40PM -0700, Roland Dreier wrote:
...
> For 2.6.18, I have the following major things already queued (in
> addition to a trivial mthca patch):

SDP is still missing. I just think SDP is a substantial piece of the story for IB adoption. Sorry, I won't be able to contribute much here.

thanks,
grant

From tom at opengridcomputing.com Wed May 10 18:08:15 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Wed, 10 May 2006 20:08:15 -0500
Subject: [openib-general] Re: [PATCH][UVERBS][RFC] node type in ibv_context
In-Reply-To:
References: <1145900760.18808.19.camel@trinity.ogc.int> <444D144E.3020506@ichips.intel.com> <1145911267.18808.36.camel@trinity.ogc.int>
Message-ID: <1147309695.5093.90.camel@trinity.ogc.int>

On Mon, 2006-05-01 at 22:13 -0700, Roland Dreier wrote:
> Tom> Here's a patch that puts a node_type in the ibv_context.
>
> Two problems:
> - It breaks the ABI (which is frozen for the libibverbs 1.0 series)
> - Even when we're ready to break ABI, I think node_type should be in
> struct ibv_device since it's not per-context at all.
>

Yeah, I originally had it there, but I waffled because I was worried (no use case btw) that if the type check was ever in the performance path, it would involve one extra pointer dereference.

> - R.
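
The trade-off in the node_type exchange above can be illustrated with a minimal C sketch. The types `demo_device`, `demo_context` and `demo_conn`, and the enum values, are hypothetical stand-ins invented for illustration; they are not the libibverbs definitions. The sketch only assumes that a context holds a pointer to its device. It shows that reading a per-device field through a context costs one extra pointer dereference, and that a caller concerned about that cost could copy the value once at setup time.

```c
#include <stdbool.h>

/* Hypothetical node types; the values are illustrative only. */
enum demo_node_type { DEMO_NODE_CA = 1, DEMO_NODE_RNIC = 2 };

/* Per-device data: the node type does not vary between contexts. */
struct demo_device {
    enum demo_node_type node_type;
};

/* Per-context data: assumed to point back at its device. */
struct demo_context {
    struct demo_device *device;
};

/* Reads the type through the context: two loads
 * (ctx->device, then device->node_type). */
static bool demo_ctx_is_rnic(const struct demo_context *ctx)
{
    return ctx->device->node_type == DEMO_NODE_RNIC;
}

/* Hypothetical per-connection state owned by the application. */
struct demo_conn {
    struct demo_context *ctx;
    bool is_rnic; /* copied once at setup so the fast path skips the extra load */
};

/* Caches the node type when the connection is set up. */
static void demo_conn_init(struct demo_conn *conn, struct demo_context *ctx)
{
    conn->ctx = ctx;
    conn->is_rnic = demo_ctx_is_rnic(ctx);
}
```

With a cached flag like this, a per-operation check reads only the connection structure, so where the library stores the field would matter little to such a caller.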
From halr at voltaire.com Wed May 10 18:26:00 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 May 2006 21:26:00 -0400
Subject: [openib-general] question regarding GRH flag in ib_ah_attr
In-Reply-To:
References:
Message-ID: <1147310565.4485.56947.camel@hal.voltaire.com>

On Wed, 2006-05-10 at 19:44, Roland Dreier wrote:
> Sean> Does anyone know how the user determines if the grh flag
> Sean> should be set in the ib_ah_attr when allocating an ib_ah?
> Sean> Do they do this by examining the GIDs in a path record?
>
> Good question. It's always needed for multicast, of course. For
> unicast, I guess one could look at whether the subnet prefixes of the
> SGID and DGID are the same, but I'm not sure that's sufficient -- a
> router could conceivably sit between two subnets with the same subnet
> prefix.

Huh ?
In this case, aren't the subnet prefixes required to be different ?

-- Hal

> Perhaps some of the Obsidian guys could comment?
>
> - R.

From halr at voltaire.com Wed May 10 18:29:23 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 May 2006 21:29:23 -0400
Subject: [openib-general] question regarding GRH flag in ib_ah_attr
In-Reply-To:
References:
Message-ID: <1147310962.4485.57103.camel@hal.voltaire.com>

On Wed, 2006-05-10 at 19:35, Sean Hefty wrote:
> For context, I'm trying to work backwards from sending a message on a UD QP to
> determine what information is needed and how it is obtained.
>
> Does anyone know how the user determines if the grh flag should be set in the
> ib_ah_attr when allocating an ib_ah? Do they do this by examining the GIDs in a
> path record?

Anytime the send is off the local subnet (as well as multicast), a GRH is required.

Also, there is a management response rule that requires a GRH in the response when the request contained a GRH (13.5.4.4 p. 769).
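
The rule discussed in this thread can be sketched in C roughly as follows. This is only an illustration under the assumption that subnet prefixes are unique per subnet, so the same-prefix router case raised by Roland is out of scope. `demo_gid` and the helper names are hypothetical and not an existing API. A GID is treated as 16 raw bytes whose first 8 bytes are the prefix, multicast GIDs are recognized by a leading 0xFF byte, and FE80:: is the link-local prefix.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit GID: bytes 0..7 are the prefix, 8..15 the GUID. */
struct demo_gid {
    uint8_t raw[16];
};

/* Multicast GIDs start with 0xFF. */
static bool demo_gid_is_multicast(const struct demo_gid *gid)
{
    return gid->raw[0] == 0xff;
}

/* True when the 64-bit prefix is the link-local prefix FE80::. */
static bool demo_gid_is_link_local(const struct demo_gid *gid)
{
    static const uint8_t link_local[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };
    return memcmp(gid->raw, link_local, sizeof(link_local)) == 0;
}

/*
 * Sketch of the check from the thread: a GRH is needed for multicast,
 * and for unicast when the destination is off link, i.e. the DGID
 * prefix is not FE80:: and differs from the SGID prefix.
 */
static bool demo_grh_required(const struct demo_gid *sgid,
                              const struct demo_gid *dgid)
{
    if (demo_gid_is_multicast(dgid))
        return true;
    if (demo_gid_is_link_local(dgid))
        return false;
    return memcmp(sgid->raw, dgid->raw, 8) != 0;
}
```

The sketch does not cover the management response rule, which depends on whether the request carried a GRH rather than on the GIDs alone.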
-- Hal

>
> - Sean

From halr at voltaire.com Wed May 10 18:43:09 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 May 2006 21:43:09 -0400
Subject: [openib-general] question regarding GRH flag in ib_ah_attr
In-Reply-To: <1147310565.4485.56947.camel@hal.voltaire.com>
References: <1147310565.4485.56947.camel@hal.voltaire.com>
Message-ID: <1147311788.4485.57378.camel@hal.voltaire.com>

On Wed, 2006-05-10 at 21:26, Hal Rosenstock wrote:
> On Wed, 2006-05-10 at 19:44, Roland Dreier wrote:
> > Sean> Does anyone know how the user determines if the grh flag
> > Sean> should be set in the ib_ah_attr when allocating an ib_ah?
> > Sean> Do they do this by examining the GIDs in a path record?
> >
> > Good question. It's always needed for multicast, of course. For
> > unicast, I guess one could look at whether the subnet prefixes of the
> > SGID and DGID are the same, but I'm not sure that's sufficient -- a
> > router could conceivably sit between two subnets with the same subnet
> > prefix.
>
> Huh ? In this case, aren't the subnet prefixes required to be
> different ?

Not just different but globally unique, right ?

What you are describing is similar to a NAT function for IB which would need to be supported in the IB edge router to that private network.

-- Hal

>
> -- Hal
>
> > Perhaps some of the Obsidian guys could comment?
> >
> > - R.
From xma at us.ibm.com Wed May 10 20:11:43 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 10 May 2006 20:11:43 -0700
Subject: [openib-general] ip over ib throughtput
In-Reply-To: <7.0.1.0.2.20060510180054.04336f80@netapp.com>
Message-ID:

"Talpey, Thomas" wrote on 05/10/2006 03:10:57 PM:
> Sure, but I wonder why it's interesting. Nobody ever uses NFS in such
> small blocksizes, and 2044 bytes would mean, say, 1800 bytes of payload.
> What data are you looking for, throughput and overhead? Direct RDMA,
> or inline?
>
> Tom.

Throughput. I am wondering how far IPoIB performance (throughput) can go.
Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rdreier at cisco.com Wed May 10 21:56:58 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 10 May 2006 21:56:58 -0700
Subject: [openib-general] question regarding GRH flag in ib_ah_attr
In-Reply-To: <1147310565.4485.56947.camel@hal.voltaire.com> (Hal Rosenstock's message of "10 May 2006 21:26:00 -0400")
References: <1147310565.4485.56947.camel@hal.voltaire.com>
Message-ID:

Hal> Huh ? In this case, aren't the subnet prefixes required
Hal> to be different ?

It's kind of a crazy thing to do but I don't see anything in the IB spec that forbids two subnets with the same subnet prefix, or any reason why a router couldn't route between them.
The SMs would just have to be smart enough to return the LID of the router for paths to ports on the other subnet, and the routers would have to have explicit routes rather than forwarding based on just GID prefix.

- R.

From rdreier at cisco.com Wed May 10 21:58:00 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 10 May 2006 21:58:00 -0700
Subject: [openib-general] question regarding GRH flag in ib_ah_attr
In-Reply-To: <1147311788.4485.57378.camel@hal.voltaire.com> (Hal Rosenstock's message of "10 May 2006 21:43:09 -0400")
References: <1147310565.4485.56947.camel@hal.voltaire.com> <1147311788.4485.57378.camel@hal.voltaire.com>
Message-ID:

Hal> What you are describing is similar to a NAT function for IB
Hal> which would need to be supported in the IB edge router to
Hal> that private network.

Why does there have to be any NAT? The router would just have to replace the DLID the same as it usually does. I don't see why the GID prefix makes any difference really.

- R.

From rdreier at cisco.com Wed May 10 22:00:00 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 10 May 2006 22:00:00 -0700
Subject: [openib-general] 2.6.17 and 2.6.18 merge plans
In-Reply-To: <20060511005033.GH6575@esmail.cup.hp.com> (Grant Grundler's message of "Wed, 10 May 2006 17:50:33 -0700")
References: <20060511005033.GH6575@esmail.cup.hp.com>
Message-ID:

Grant> SDP is still missing. I just think SDP is a substantial
Grant> piece of the story for IB adoption. Sorry, I won't be able
Grant> to contribute much here.

It doesn't seem very realistic to expect to merge SDP for 2.6.18. The current codebase is very much a work in progress that has never been reviewed by anyone and is not really ready for review, and I would expect SDP to take a while to review even once it's ready for review.

- R.
From rdreier at cisco.com Wed May 10 22:01:13 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 10 May 2006 22:01:13 -0700
Subject: [openib-general] Re: [PATCH][UVERBS][RFC] node type in ibv_context
In-Reply-To: <1147309695.5093.90.camel@trinity.ogc.int> (Tom Tucker's message of "Wed, 10 May 2006 20:08:15 -0500")
References: <1145900760.18808.19.camel@trinity.ogc.int> <444D144E.3020506@ichips.intel.com> <1145911267.18808.36.camel@trinity.ogc.int> <1147309695.5093.90.camel@trinity.ogc.int>
Message-ID:

Tom> Yeah, I originally had it there, but I waffled because I was
Tom> worried (no use case btw) that if the type check was ever in the
Tom> performance path, it would involve one extra pointer
Tom> dereference.

That's a valid point. But that seems like a pathologically stupid app to me, to be honest.

- R.

From or.gerlitz at gmail.com Wed May 10 22:34:41 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Thu, 11 May 2006 08:34:41 +0300
Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
In-Reply-To:
References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com>
Message-ID: <15ddcffd0605102234v79c77df0p85f0eb81af3388e9@mail.gmail.com>

On 5/11/06, Roland Dreier wrote:
> Or> OK, I see now that as of a few hours ago the second iscsi
> Or> update for 2.6.18 was committed there, which means iser should
> Or> compile with it, you can go ahead and pull it!
>
> Great, I've got it. Can you resend the iSER patches with changelog
> entries for each patch and a Signed-off-by: line too?

sure, will do that

Or.

From mst at mellanox.co.il Wed May 10 22:39:39 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 11 May 2006 08:39:39 +0300
Subject: [openib-general] Re: 2.6.17 and 2.6.18 merge plans
In-Reply-To:
References:
Message-ID: <20060511053939.GA30842@mellanox.co.il>

Quoting r.
Roland Dreier :
> Subject: 2.6.17 and 2.6.18 merge plans
>
> As the 2.6.17 release cycle starts to wrap up, I thought it would be a
> good time to send out a report on my status, and a request for patches.
>
> All I have queued up for 2.6.17 that isn't upstream already is the
> one-liner patch fixing the ipath PCI device ID table. If there is
> anything else you think should be in 2.6.17, please send me a patch
> against the for-2.6.17 branch of my git tree [1].

How about module unloading races? Solving them by flushing WQs is not very elegant but no better solution surfaced either. Should I repost the patches?

--
MST

From jgunthorpe at obsidianresearch.com Wed May 10 22:48:03 2006
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 10 May 2006 23:48:03 -0600
Subject: [openib-general] question regarding GRH flag in ib_ah_attr
In-Reply-To:
References: <1147310565.4485.56947.camel@hal.voltaire.com>
Message-ID: <20060511054803.GE26684@obsidianresearch.com>

On
Wed, May 10, 2006 at 09:56:58PM -0700, Roland Dreier wrote:
> Hal> Huh ? In this case, aren't the subnet prefixes required
> Hal> to be different ?
>
> It's kind of a crazy thing to do but I don't see anything in the IB
> spec that forbids two subnets with the same subnet prefix, or any
> reason why a router couldn't route between them. The SMs would just
> have to be smart enough to return the LID of the router for paths to
> ports on the other subnet, and the routers would have to have explicit
> routes rather than forwarding based on just GID prefix.

Hmm, this is an interesting point, you can do this in IP land using host routes.

How about this: the Path record (and related) SA responses include the Hop Limit fields and the spec says:

8.3.6 Hop Limit: [..] Setting this value to 0 or 1 will ensure that the packet will not be forwarded beyond the local subnet.

So, it is within the spec to use HopLmt >= 2 as the GRH required flag. I'd propose that the combination of a non-link-local prefix and a >= 2 Hop Limit should force a GRH. SMs that do not support routers should always fill in 0 for HopLmt.

Jason

From vuhuong at mellanox.com Wed May 10 23:26:26 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Wed, 10 May 2006 23:26:26 -0700
Subject: [openib-general][patch review] srp: fmr implementation,
In-Reply-To: <1147308198.5093.81.camel@trinity.ogc.int>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> <44620889.3010702@mellanox.com> <1147308198.5093.81.camel@trinity.ogc.int>
Message-ID: <4462D912.7000101@mellanox.com>

>>
>>>BTW, does Mellanox (or anyone else) have any numbers showing that
>>>using FMRs makes any difference in performance on a semi-realistic benchmark?
>>>
>>
>>I'm using xdd to test the performance
>>www.ioperformance.com/products.htm
>>
>>The target is Mellanox srp target reference implementation
>>with 14 SATA spindles
>>
>>I can get ~780 MB/s max without FMRs and ~920 MB/s with FMRs
>>(using 256 KB sequential read direct IO request)
>
>
> Wow, that's awesome Vu!
>
> So what's the consensus on the reason for the improvement?
> - Fewer WR to send the same amount of data because the memory is
> virtually contiguous?

Fewer WQEs, fewer interrupts and context switches, and big I/O requests sent directly to the back-end storage/spindles (all of these apply to the target).

For a write command (RDMA read), it helps to utilize the max outstanding RDMA read operations (more data transferred per outstanding RDMA read operation).

> - Fewer PDU due to larger writes and better packing?

This is not much.

> - Something else?
>
> Are these huge reads, ... or even better ... what are the parameters to
> xdd?

-op read -reqsize 128 -block 2048 -dio -mbytes 8192 -targetdir /dev/ -targets 14 sdb sdc sdd ....

>
> Did you quantify the FMR registration cost per MB of registered memory
> space? This would be a very good number to have for figuring out where
> the sweet spot is...

I don't have a number for the FMR registration cost per MB. Dror, do we have any number on how many cycles each FMR registration costs?
On the side note the FMR registration hit rate is ~100% with direct I/O Vu From ogerlitz at voltaire.com Wed May 10 23:59:54 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 11 May 2006 09:59:54 +0300 (IDT) Subject: [openib-general] [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator Message-ID: Roland, I am resending the patch series, this time with changelog description and Signed-off-by line, sorry for forgetting it in the original post. Or. From ogerlitz at voltaire.com Thu May 11 00:00:21 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 11 May 2006 10:00:21 +0300 (IDT) Subject: [openib-general] [PATCH 1/6] iSCSI iSER transport provider header file In-Reply-To: Message-ID: iSER (iSCSI Extensions for RDMA) transport provider driver for the iSCSI initiator, whose other parts (under drivers/scsi) are scsi_transport_iscsi - the transport management module, iscsi_tcp - the TCP transport provider module and libiscsi - a kernel library (module) implementing functionality needed by both TCP and iSER transports. iSER is both a provider of the iSCSI transport api and a SCSI low level driver. This file contains internal data structures and non static service functions. Signed-off-by: Or Gerlitz --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iscsi_iser.h 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iscsi_iser.h 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,354 @@ +/* + * iSER transport for the Open iSCSI Initiator & iSER transport internals + * + * Copyright (C) 2004 Dmitry Yusupov + * Copyright (C) 2004 Alex Aizman + * Copyright (C) 2005 Mike Christie + * based on code maintained by open-iscsi at googlegroups.com + * + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: iscsi_iser.h 7051 2006-05-10 12:29:11Z ogerlitz $ + */ +#ifndef __ISCSI_ISER_H__ +#define __ISCSI_ISER_H__ + +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include + +#define DRV_NAME "iser" +#define PFX DRV_NAME ": " +#define DRV_VER "0.1" +#define DRV_DATE "May 7th, 2006" + +#define iser_dbg(fmt, arg...) \ + do { \ + if (iser_debug_level > 0) \ + printk(KERN_DEBUG PFX "%s:" fmt,\ + __func__ , ## arg); \ + } while (0) + +#define iser_err(fmt, arg...) 
\ + do { \ + printk(KERN_ERR PFX "%s:" fmt, \ + __func__ , ## arg); \ + } while (0) + + /* support upto 512KB in one RDMA */ +#define ISCSI_ISER_SG_TABLESIZE (0x80000 >> PAGE_SHIFT) +#define ISCSI_ISER_MAX_LUN 256 +#define ISCSI_ISER_MAX_CMD_LEN 16 + +/* QP settings */ +/* Maximal bounds on received asynchronous PDUs */ +#define ISER_MAX_RX_MISC_PDUS 4 /* NOOP_IN(2) , ASYNC_EVENT(2) */ + +#define ISER_MAX_TX_MISC_PDUS 6 /* NOOP_OUT(2), TEXT(1), * + * SCSI_TMFUNC(2), LOGOUT(1) */ + +#define ISER_QP_MAX_RECV_DTOS (ISCSI_XMIT_CMDS_MAX + \ + ISER_MAX_RX_MISC_PDUS + \ + ISER_MAX_TX_MISC_PDUS) + +/* the max TX (send) WR supported by the iSER QP is defined by * + * max_send_wr = T * (1 + D) + C ; D is how many inflight dataouts we expect * + * to have at max for SCSI command. The tx posting & completion handling code * + * supports -EAGAIN scheme where tx is suspended till the QP has room for more * + * send WR. D=8 comes from 64K/8K */ + +#define ISER_INFLIGHT_DATAOUTS 8 + +#define ISER_QP_MAX_REQ_DTOS (ISCSI_XMIT_CMDS_MAX * \ + (1 + ISER_INFLIGHT_DATAOUTS) + \ + ISER_MAX_TX_MISC_PDUS + \ + ISER_MAX_RX_MISC_PDUS) + +#define ISER_VER 0x10 +#define ISER_WSV 0x08 +#define ISER_RSV 0x04 + +struct iser_hdr { + u8 flags; + u8 rsvd[3]; + __be32 write_stag; /* write rkey */ + __be64 write_va; + __be32 read_stag; /* read rkey */ + __be64 read_va; +} __attribute__((packed)); + + +/* Length of an object name string */ +#define ISER_OBJECT_NAME_SIZE 64 + +enum iser_ib_conn_state { + ISER_CONN_INIT, /* descriptor allocd, no conn */ + ISER_CONN_PENDING, /* in the process of being established */ + ISER_CONN_UP, /* up and running */ + ISER_CONN_TERMINATING, /* in the process of being terminated */ + ISER_CONN_DOWN, /* shut down */ + ISER_CONN_STATES_NUM +}; + +enum iser_task_status { + ISER_TASK_STATUS_INIT = 0, + ISER_TASK_STATUS_STARTED, + ISER_TASK_STATUS_COMPLETED +}; + +enum iser_data_dir { + ISER_DIR_IN = 0, /* to initiator */ + ISER_DIR_OUT, /* from initiator */ + ISER_DIRS_NUM 
+}; + +struct iser_data_buf { + void *buf; /* pointer to the sg list */ + unsigned int size; /* num entries of this sg */ + unsigned long data_len; /* total data len */ + unsigned int dma_nents; /* returned by dma_map_sg */ + char *copy_buf; /* allocated copy buf for SGs unaligned * + * for rdma which are copied */ + struct scatterlist sg_single; /* SG-ified clone of a non SG SC or * + * unaligned SG */ + }; + +/* fwd declarations */ +struct iser_device; +struct iscsi_iser_conn; +struct iscsi_iser_cmd_task; + +struct iser_mem_reg { + u32 lkey; + u32 rkey; + u64 va; + u64 len; + void *mem_h; +}; + +struct iser_regd_buf { + struct iser_mem_reg reg; /* memory registration info */ + void *virt_addr; + struct iser_device *device; /* device->device for dma_unmap */ + dma_addr_t dma_addr; /* if non zero, addr for dma_unmap */ + enum dma_data_direction direction; /* direction for dma_unmap */ + unsigned int data_size; + atomic_t ref_count; /* refcount, freed when dec to 0 */ +}; + +#define MAX_REGD_BUF_VECTOR_LEN 2 + +struct iser_dto { + struct iscsi_iser_cmd_task *ctask; + struct iscsi_iser_conn *conn; + int notify_enable; + + /* vector of registered buffers */ + unsigned int regd_vector_len; + struct iser_regd_buf *regd[MAX_REGD_BUF_VECTOR_LEN]; + + /* offset into the registered buffer may be specified */ + unsigned int offset[MAX_REGD_BUF_VECTOR_LEN]; + + /* a smaller size may be specified, if 0, then full size is used */ + unsigned int used_sz[MAX_REGD_BUF_VECTOR_LEN]; +}; + +enum iser_desc_type { + ISCSI_RX, + ISCSI_TX_CONTROL , + ISCSI_TX_SCSI_COMMAND, + ISCSI_TX_DATAOUT +}; + +struct iser_desc { + struct iser_hdr iser_header; + struct iscsi_hdr iscsi_header; + struct iser_regd_buf hdr_regd_buf; + void *data; /* used by RX & TX_CONTROL */ + struct iser_regd_buf data_regd_buf; /* used by RX & TX_CONTROL */ + enum iser_desc_type type; + struct iser_dto dto; +}; + +struct iser_device { + struct ib_device *ib_device; + struct ib_pd *pd; + struct ib_cq *cq; + struct ib_mr 
*mr; + struct tasklet_struct cq_tasklet; + struct list_head ig_list; /* entry in ig devices list */ + int refcount; +}; + +struct iser_conn { + struct iscsi_iser_conn *iser_conn; /* iser conn for upcalls */ + enum iser_ib_conn_state state; /* rdma connection state */ + spinlock_t lock; /* used for state changes */ + struct iser_device *device; /* device context */ + struct rdma_cm_id *cma_id; /* CMA ID */ + struct ib_qp *qp; /* QP */ + struct ib_fmr_pool *fmr_pool; /* pool of IB FMRs */ + int disc_evt_flag; /* disconn event delivered */ + wait_queue_head_t wait; /* waitq for conn/disconn */ + atomic_t post_recv_buf_count; /* posted rx count */ + atomic_t post_send_buf_count; /* posted tx count */ + struct work_struct comperror_work; /* conn term sleepable ctx*/ + char name[ISER_OBJECT_NAME_SIZE]; + struct iser_page_vec *page_vec; /* represents SG to fmr maps* + * maps serialized as tx is*/ + struct list_head conn_list; /* entry in ig conn list */ +}; + +struct iscsi_iser_conn { + struct iscsi_conn *iscsi_conn;/* ptr to iscsi conn */ + struct iser_conn *ib_conn; /* iSER IB conn */ + + rwlock_t lock; +}; + +struct iscsi_iser_cmd_task { + struct iser_desc desc; + struct iscsi_iser_conn *iser_conn; + int rdma_data_count;/* RDMA bytes */ + enum iser_task_status status; + int command_sent; /* set if command sent */ + int dir[ISER_DIRS_NUM]; /* set if dir use*/ + struct iser_regd_buf rdma_regd[ISER_DIRS_NUM];/* regd rdma buf */ + struct iser_data_buf data[ISER_DIRS_NUM]; /* orig. data des*/ + struct iser_data_buf data_copy[ISER_DIRS_NUM];/* contig. 
copy */ +}; + +struct iser_page_vec { + u64 *pages; + int length; + int offset; + int data_size; +}; + +struct iser_global { + struct mutex device_list_mutex;/* */ + struct list_head device_list; /* all iSER devices */ + struct mutex connlist_mutex; + struct list_head connlist; /* all iSER IB connections */ + + kmem_cache_t *desc_cache; +}; + +extern struct iser_global ig; +extern int iser_debug_level; + +/* allocate connection resources needed for rdma functionality */ +int iser_conn_set_full_featured_mode(struct iscsi_conn *conn); + +int iser_send_control(struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask); + +int iser_send_command(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask); + +int iser_send_data_out(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask, + struct iscsi_data *hdr); + +void iscsi_iser_recv(struct iscsi_conn *conn, + struct iscsi_hdr *hdr, + char *rx_data, + int rx_data_len); + +int iser_conn_init(struct iser_conn **ib_conn); + +void iser_conn_terminate(struct iser_conn *ib_conn); + +void iser_conn_release(struct iser_conn *ib_conn); + +void iser_rcv_completion(struct iser_desc *desc, + unsigned long dto_xfer_len); + +void iser_snd_completion(struct iser_desc *desc); + +void iser_ctask_rdma_init(struct iscsi_iser_cmd_task *ctask); + +void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *ctask); + +void iser_dto_buffs_release(struct iser_dto *dto); + +int iser_regd_buff_release(struct iser_regd_buf *regd_buf); + +void iser_reg_single(struct iser_device *device, + struct iser_regd_buf *regd_buf, + enum dma_data_direction direction); + +int iser_start_rdma_unaligned_sg(struct iscsi_iser_cmd_task *ctask, + enum iser_data_dir cmd_dir); + +void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_cmd_task *ctask, + enum iser_data_dir cmd_dir); + +int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *ctask, + enum iser_data_dir cmd_dir); + +int iser_connect(struct iser_conn *ib_conn, + struct sockaddr_in *src_addr, + struct sockaddr_in 
*dst_addr, + int non_blocking); + +int iser_reg_page_vec(struct iser_conn *ib_conn, + struct iser_page_vec *page_vec, + struct iser_mem_reg *mem_reg); + +void iser_unreg_mem(struct iser_mem_reg *mem_reg); + +int iser_post_recv(struct iser_desc *rx_desc); +int iser_post_send(struct iser_desc *tx_desc); + +int iser_conn_state_comp(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp); +#endif From ogerlitz at voltaire.com Thu May 11 00:00:44 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 11 May 2006 10:00:44 +0300 (IDT) Subject: [openib-general] [PATCH 2/6] iSCSI iSER transport provider high level code In-Reply-To: Message-ID: This file contains the code that registers with the iSCSI transport manager and with the SCSI mid-layer; many of the functions it provides to iSCSI and SCSI are implemented in libiscsi. Signed-off-by: Or Gerlitz --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iscsi_iser.c 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iscsi_iser.c 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,794 @@ +/* + * iSCSI Initiator over iSER Data-Path + * + * Copyright (C) 2004 Dmitry Yusupov + * Copyright (C) 2004 Alex Aizman + * Copyright (C) 2005 Mike Christie + * Copyright (c) 2005, 2006 Voltaire, Inc. All rights reserved. + * maintained by openib-general at openib.org + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Credits: + * Christoph Hellwig + * FUJITA Tomonori + * Arne Redlich + * Zhenyu Wang + * Modified by: + * Erez Zilber + * + * + * $Id: iscsi_iser.c 6965 2006-05-07 11:36:20Z ogerlitz $ + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +static unsigned int iscsi_max_lun = 512; +module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO); + +int iser_debug_level = 0; + +MODULE_DESCRIPTION("iSER (iSCSI Extensions for RDMA) Datamover " + "v" DRV_VER " (" DRV_DATE ")"); +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_AUTHOR("Alex Nezhinsky, Dan Bar Dov, Or Gerlitz"); + +module_param_named(debug_level, iser_debug_level, int, 0644); +MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)"); + +struct iser_global ig; + +void +iscsi_iser_recv(struct iscsi_conn *conn, + struct iscsi_hdr *hdr, char *rx_data, int rx_data_len) +{ + int rc = 0; + uint32_t ret_itt; + int datalen; + int ahslen; + + /* verify PDU length */ + datalen = ntoh24(hdr->dlength); + if (datalen != rx_data_len) { + printk(KERN_ERR "iscsi_iser: datalen %d (hdr) 
!= %d (IB) \n", + datalen, rx_data_len); + rc = ISCSI_ERR_DATALEN; + goto error; + } + + /* read AHS */ + ahslen = hdr->hlength * 4; + + /* verify itt (itt encoding: age+cid+itt) */ + rc = iscsi_verify_itt(conn, hdr, &ret_itt); + + if (!rc) + rc = iscsi_complete_pdu(conn, hdr, rx_data, rx_data_len); + + if (rc && rc != ISCSI_ERR_NO_SCSI_CMD) + goto error; + + return; +error: + iscsi_conn_failure(conn, rc); +} + + +/** + * iscsi_iser_cmd_init - Initialize iSCSI SCSI_READ or SCSI_WRITE commands + * + **/ +static void +iscsi_iser_cmd_init(struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_conn *iser_conn = ctask->conn->dd_data; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct scsi_cmnd *sc = ctask->sc; + + iser_ctask->command_sent = 0; + iser_ctask->iser_conn = iser_conn; + + if (sc->sc_data_direction == DMA_TO_DEVICE) { + BUG_ON(ctask->total_length == 0); + /* bytes to be sent via RDMA operations */ + iser_ctask->rdma_data_count = ctask->total_length - + ctask->imm_count - + ctask->unsol_count; + + debug_scsi("cmd [itt %x total %d imm %d imm_data %d " + "rdma_data %d]\n", + ctask->itt, ctask->total_length, ctask->imm_count, + ctask->unsol_count, ctask->rdma_data_count); + } else + /* bytes to be sent via RDMA operations */ + iser_ctask->rdma_data_count = ctask->total_length; + + iser_ctask_rdma_init(iser_ctask); +} + +/** + * iscsi_mtask_xmit - xmit management(immediate) task + * @conn: iscsi connection + * @mtask: task management task + * + * Notes: + * The function can return -EAGAIN in which case caller must + * call it again later, or recover. '0' return code means successful + * xmit. + * + **/ +static int +iscsi_iser_mtask_xmit(struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask) +{ + int error = 0; + + debug_scsi("mtask deq [cid %d itt 0x%x]\n", conn->id, mtask->itt); + + error = iser_send_control(conn, mtask); + + /* since iser xmits control with zero copy, mtasks can not be recycled + * right after sending them. 
+ * The recycling scheme is based on whether a response is expected + * - if yes, the mtask is recycled at iscsi_complete_pdu + * - if no, the mtask is recycled at iser_snd_completion + */ + if (error && error != -EAGAIN) + iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); + + return error; +} + +static int +iscsi_iser_ctask_xmit_unsol_data(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask) +{ + struct iscsi_data hdr; + int error = 0; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + + /* Send data-out PDUs while there's still unsolicited data to send */ + while (ctask->unsol_count > 0) { + iscsi_prep_unsolicit_data_pdu(ctask, &hdr, + iser_ctask->rdma_data_count); + + debug_scsi("Sending data-out: itt 0x%x, data count %d\n", + hdr.itt, ctask->data_count); + + /* the buffer description has been passed with the command */ + /* Send the command */ + error = iser_send_data_out(conn, ctask, &hdr); + if (error) { + ctask->unsol_datasn--; + goto iscsi_iser_ctask_xmit_unsol_data_exit; + } + ctask->unsol_count -= ctask->data_count; + debug_scsi("Need to send %d more as data-out PDUs\n", + ctask->unsol_count); + } + +iscsi_iser_ctask_xmit_unsol_data_exit: + return error; +} + +static int +iscsi_iser_ctask_xmit(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + int error = 0; + + debug_scsi("ctask deq [cid %d itt 0x%x]\n", + conn->id, ctask->itt); + + /* + * serialize with TMF AbortTask + */ + if (ctask->mtask) + return error; + + /* Send the cmd PDU */ + if (!iser_ctask->command_sent) { + error = iser_send_command(conn, ctask); + if (error) + goto iscsi_iser_ctask_xmit_exit; + iser_ctask->command_sent = 1; + } + + /* Send unsolicited data-out PDU(s) if necessary */ + if (ctask->unsol_count) + error = iscsi_iser_ctask_xmit_unsol_data(conn, ctask); + + iscsi_iser_ctask_xmit_exit: + if (error && error != -EAGAIN) + iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED); + return error; +} + +static 
void +iscsi_iser_cleanup_ctask(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + + if (iser_ctask->status == ISER_TASK_STATUS_STARTED) { + iser_ctask->status = ISER_TASK_STATUS_COMPLETED; + iser_ctask_rdma_finalize(iser_ctask); + } +} + +static struct iser_conn * +iscsi_iser_ib_conn_lookup(__u64 ep_handle) +{ + struct iser_conn *ib_conn; + struct iser_conn *uib_conn = (struct iser_conn *)(unsigned long)ep_handle; + + mutex_lock(&ig.connlist_mutex); + list_for_each_entry(ib_conn, &ig.connlist, conn_list) { + if (ib_conn == uib_conn) { + mutex_unlock(&ig.connlist_mutex); + return ib_conn; + } + } + mutex_unlock(&ig.connlist_mutex); + iser_err("no conn exists for eph %llx\n",(unsigned long long)ep_handle); + return NULL; +} + +static struct iscsi_cls_conn * +iscsi_iser_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx) +{ + struct iscsi_conn *conn; + struct iscsi_cls_conn *cls_conn; + struct iscsi_iser_conn *iser_conn; + + cls_conn = iscsi_conn_setup(cls_session, conn_idx); + if (!cls_conn) + return NULL; + conn = cls_conn->dd_data; + + /* + * due to issues with the login code re iser sematics + * this not set in iscsi_conn_setup - FIXME + */ + conn->max_recv_dlength = 128; + + iser_conn = kzalloc(sizeof(*iser_conn), GFP_KERNEL); + if (!iser_conn) + goto conn_alloc_fail; + + /* currently this is the only field which need to be initiated */ + rwlock_init(&iser_conn->lock); + + conn->recv_lock = &iser_conn->lock; + + conn->dd_data = iser_conn; + iser_conn->iscsi_conn = conn; + + return cls_conn; + +conn_alloc_fail: + iscsi_conn_teardown(cls_conn); + return NULL; +} + +static void +iscsi_iser_conn_destroy(struct iscsi_cls_conn *cls_conn) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_iser_conn *iser_conn = conn->dd_data; + + iscsi_conn_teardown(cls_conn); + kfree(iser_conn); +} + +static int +iscsi_iser_conn_bind(struct iscsi_cls_session *cls_session, + struct 
iscsi_cls_conn *cls_conn, uint64_t transport_eph, + int is_leading) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_iser_conn *iser_conn; + struct iser_conn *ib_conn; + int error; + + error = iscsi_conn_bind(cls_session, cls_conn, is_leading); + if (error) + return error; + + if (conn->stop_stage != STOP_CONN_SUSPEND) { + /* the transport ep handle comes from user space so it must be + * verified against the global ib connections list */ + ib_conn = iscsi_iser_ib_conn_lookup(transport_eph); + if (!ib_conn) { + iser_err("can't bind eph %llx\n", + (unsigned long long)transport_eph); + return -EINVAL; + } + /* binds the iSER connection retrieved from the previously + * connected ep_handle to the iSCSI layer connection. exchanges + * connection pointers */ + iser_err("binding iscsi conn %p to iser_conn %p\n",conn,ib_conn); + iser_conn = conn->dd_data; + ib_conn->iser_conn = iser_conn; + iser_conn->ib_conn = ib_conn; + } + + return 0; +} + +static int +iscsi_iser_conn_start(struct iscsi_cls_conn *cls_conn) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + int err; + + err = iscsi_conn_start(cls_conn); + if (err) + return err; + + return iser_conn_set_full_featured_mode(conn); +} + +static void +iscsi_iser_conn_terminate(struct iscsi_conn *conn) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iser_conn *ib_conn = iser_conn->ib_conn; + + BUG_ON(!ib_conn); + /* starts conn teardown process, waits until all previously * + * posted buffers get flushed, deallocates all conn resources */ + iser_conn_terminate(ib_conn); + iser_conn->ib_conn = NULL; + conn->recv_lock = NULL; +} + + +static struct iscsi_transport iscsi_iser_transport; + +static struct iscsi_cls_session * +iscsi_iser_session_create(struct iscsi_transport *iscsit, + struct scsi_transport_template *scsit, + uint32_t initial_cmdsn, uint32_t *hostno) +{ + struct iscsi_cls_session *cls_session; + struct iscsi_session *session; + int i; + uint32_t hn; + struct iscsi_cmd_task *ctask; + 
struct iscsi_mgmt_task *mtask; + struct iscsi_iser_cmd_task *iser_ctask; + struct iser_desc *desc; + + cls_session = iscsi_session_setup(iscsit, scsit, + sizeof(struct iscsi_iser_cmd_task), + sizeof(struct iser_desc), + initial_cmdsn, &hn); + if (!cls_session) + return NULL; + + *hostno = hn; + session = class_to_transport_session(cls_session); + + /* libiscsi setup itts, data and pool so just set desc fields */ + for (i = 0; i < session->cmds_max; i++) { + ctask = session->cmds[i]; + iser_ctask = ctask->dd_data; + ctask->hdr = (struct iscsi_cmd *)&iser_ctask->desc.iscsi_header; + } + + for (i = 0; i < session->mgmtpool_max; i++) { + mtask = session->mgmt_cmds[i]; + desc = mtask->dd_data; + mtask->hdr = &desc->iscsi_header; + desc->data = mtask->data; + } + + return cls_session; +} + +static int +iscsi_iser_conn_set_param(struct iscsi_cls_conn *cls_conn, + enum iscsi_param param, uint32_t value) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + struct iscsi_session *session = conn->session; + + spin_lock_bh(&session->lock); + if (conn->c_stage != ISCSI_CONN_INITIAL_STAGE && + conn->stop_stage != STOP_CONN_RECOVER) { + printk(KERN_ERR "iscsi_iser: can not change parameter [%d]\n", + param); + spin_unlock_bh(&session->lock); + return 0; + } + spin_unlock_bh(&session->lock); + + switch (param) { + case ISCSI_PARAM_MAX_RECV_DLENGTH: + /* TBD */ + break; + case ISCSI_PARAM_MAX_XMIT_DLENGTH: + conn->max_xmit_dlength = value; + break; + case ISCSI_PARAM_HDRDGST_EN: + if (value) { + printk(KERN_ERR "HeaderDigest wasn't negotiated to None"); + return -EPROTO; + } + break; + case ISCSI_PARAM_DATADGST_EN: + if (value) { + printk(KERN_ERR "DataDigest wasn't negotiated to None"); + return -EPROTO; + } + break; + case ISCSI_PARAM_INITIAL_R2T_EN: + session->initial_r2t_en = value; + break; + case ISCSI_PARAM_IMM_DATA_EN: + session->imm_data_en = value; + break; + case ISCSI_PARAM_FIRST_BURST: + session->first_burst = value; + break; + case ISCSI_PARAM_MAX_BURST: + 
session->max_burst = value; + break; + case ISCSI_PARAM_PDU_INORDER_EN: + session->pdu_inorder_en = value; + break; + case ISCSI_PARAM_DATASEQ_INORDER_EN: + session->dataseq_inorder_en = value; + break; + case ISCSI_PARAM_ERL: + session->erl = value; + break; + case ISCSI_PARAM_IFMARKER_EN: + if (value) { + printk(KERN_ERR "IFMarker wasn't negotiated to No"); + return -EPROTO; + } + break; + case ISCSI_PARAM_OFMARKER_EN: + if (value) { + printk(KERN_ERR "OFMarker wasn't negotiated to No"); + return -EPROTO; + } + break; + default: + break; + } + + return 0; +} + +static int +iscsi_iser_session_get_param(struct iscsi_cls_session *cls_session, + enum iscsi_param param, uint32_t *value) +{ + struct Scsi_Host *shost = iscsi_session_to_shost(cls_session); + struct iscsi_session *session = iscsi_hostdata(shost->hostdata); + + switch (param) { + case ISCSI_PARAM_INITIAL_R2T_EN: + *value = session->initial_r2t_en; + break; + case ISCSI_PARAM_MAX_R2T: + *value = session->max_r2t; + break; + case ISCSI_PARAM_IMM_DATA_EN: + *value = session->imm_data_en; + break; + case ISCSI_PARAM_FIRST_BURST: + *value = session->first_burst; + break; + case ISCSI_PARAM_MAX_BURST: + *value = session->max_burst; + break; + case ISCSI_PARAM_PDU_INORDER_EN: + *value = session->pdu_inorder_en; + break; + case ISCSI_PARAM_DATASEQ_INORDER_EN: + *value = session->dataseq_inorder_en; + break; + case ISCSI_PARAM_ERL: + *value = session->erl; + break; + case ISCSI_PARAM_IFMARKER_EN: + *value = 0; + break; + case ISCSI_PARAM_OFMARKER_EN: + *value = 0; + break; + default: + return ISCSI_ERR_PARAM_NOT_FOUND; + } + + return 0; +} + +static int +iscsi_iser_conn_get_param(struct iscsi_cls_conn *cls_conn, + enum iscsi_param param, uint32_t *value) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + + switch(param) { + case ISCSI_PARAM_MAX_RECV_DLENGTH: + *value = conn->max_recv_dlength; + break; + case ISCSI_PARAM_MAX_XMIT_DLENGTH: + *value = conn->max_xmit_dlength; + break; + case ISCSI_PARAM_HDRDGST_EN: + 
*value = 0; + break; + case ISCSI_PARAM_DATADGST_EN: + *value = 0; + break; + /*case ISCSI_PARAM_TARGET_RECV_DLENGTH: + *value = conn->target_recv_dlength; + break; + case ISCSI_PARAM_INITIATOR_RECV_DLENGTH: + *value = conn->initiator_recv_dlength; + break;*/ + default: + return ISCSI_ERR_PARAM_NOT_FOUND; + } + + return 0; +} + + +static void +iscsi_iser_conn_get_stats(struct iscsi_cls_conn *cls_conn, struct iscsi_stats *stats) +{ + struct iscsi_conn *conn = cls_conn->dd_data; + + stats->txdata_octets = conn->txdata_octets; + stats->rxdata_octets = conn->rxdata_octets; + stats->scsicmd_pdus = conn->scsicmd_pdus_cnt; + stats->dataout_pdus = conn->dataout_pdus_cnt; + stats->scsirsp_pdus = conn->scsirsp_pdus_cnt; + stats->datain_pdus = conn->datain_pdus_cnt; /* always 0 */ + stats->r2t_pdus = conn->r2t_pdus_cnt; /* always 0 */ + stats->tmfcmd_pdus = conn->tmfcmd_pdus_cnt; + stats->tmfrsp_pdus = conn->tmfrsp_pdus_cnt; + stats->custom_length = 3; + strcpy(stats->custom[0].desc, "qp_tx_queue_full"); + stats->custom[0].value = 0; /* TB iser_conn->qp_tx_queue_full; */ + strcpy(stats->custom[1].desc, "fmr_map_not_avail"); + stats->custom[1].value = 0; /* TB iser_conn->fmr_map_not_avail */; + strcpy(stats->custom[2].desc, "eh_abort_cnt"); + stats->custom[2].value = conn->eh_abort_cnt; +} + +static int +iscsi_iser_ep_connect(struct sockaddr *dst_addr, int non_blocking, + __u64 *ep_handle) +{ + int err; + struct iser_conn *ib_conn; + + err = iser_conn_init(&ib_conn); + if (err) + goto out; + + err = iser_connect(ib_conn, NULL, (struct sockaddr_in *)dst_addr, non_blocking); + if (!err) + *ep_handle = (__u64)(unsigned long)ib_conn; + +out: + return err; +} + +static int +iscsi_iser_ep_poll(__u64 ep_handle, int timeout_ms) +{ + struct iser_conn *ib_conn = iscsi_iser_ib_conn_lookup(ep_handle); + int rc; + + if (!ib_conn) + return -EINVAL; + + rc = wait_event_interruptible_timeout(ib_conn->wait, + ib_conn->state == ISER_CONN_UP, + msecs_to_jiffies(timeout_ms)); + + /* if conn 
establishment failed, return error code to iscsi */ + if (!rc && + (ib_conn->state == ISER_CONN_TERMINATING || + ib_conn->state == ISER_CONN_DOWN)) + rc = -1; + + iser_err("ib conn %p rc = %d\n", ib_conn, rc); + + if (rc > 0) + return 1; /* success, this is the equivalent of POLLOUT */ + else if (!rc) + return 0; /* timeout */ + else + return rc; /* signal */ +} + +static void +iscsi_iser_ep_disconnect(__u64 ep_handle) +{ + struct iser_conn *ib_conn = iscsi_iser_ib_conn_lookup(ep_handle); + + if (!ib_conn) + return; + + iser_err("ib conn %p state %d\n",ib_conn, ib_conn->state); + + iser_conn_terminate(ib_conn); +} + +static struct scsi_host_template iscsi_iser_sht = { + .name = "iSCSI Initiator over iSER, v." + ISCSI_VERSION_STR, + .queuecommand = iscsi_queuecommand, + .can_queue = ISCSI_XMIT_CMDS_MAX - 1, + .sg_tablesize = ISCSI_ISER_SG_TABLESIZE, + .cmd_per_lun = ISCSI_MAX_CMD_PER_LUN, + .eh_abort_handler = iscsi_eh_abort, + .eh_host_reset_handler = iscsi_eh_host_reset, + .use_clustering = DISABLE_CLUSTERING, + .proc_name = "iscsi_iser", + .this_id = -1, +}; + +static struct iscsi_transport iscsi_iser_transport = { + .owner = THIS_MODULE, + .name = "iser", + .caps = CAP_RECOVERY_L0 | CAP_MULTI_R2T, + .param_mask = ISCSI_MAX_RECV_DLENGTH | + ISCSI_MAX_XMIT_DLENGTH | + ISCSI_HDRDGST_EN | + ISCSI_DATADGST_EN | + ISCSI_INITIAL_R2T_EN | + ISCSI_MAX_R2T | + ISCSI_IMM_DATA_EN | + ISCSI_FIRST_BURST | + ISCSI_MAX_BURST | + ISCSI_PDU_INORDER_EN | + ISCSI_DATASEQ_INORDER_EN, + .host_template = &iscsi_iser_sht, + .conndata_size = sizeof(struct iscsi_conn), + .max_lun = ISCSI_ISER_MAX_LUN, + .max_cmd_len = ISCSI_ISER_MAX_CMD_LEN, + /* session management */ + .create_session = iscsi_iser_session_create, + .destroy_session = iscsi_session_teardown, + /* connection management */ + .create_conn = iscsi_iser_conn_create, + .bind_conn = iscsi_iser_conn_bind, + .destroy_conn = iscsi_iser_conn_destroy, + .set_param = iscsi_iser_conn_set_param, + .get_conn_param = 
iscsi_iser_conn_get_param, + .get_session_param = iscsi_iser_session_get_param, + .start_conn = iscsi_iser_conn_start, + .stop_conn = iscsi_conn_stop, + /* these are called as part of conn recovery */ + .suspend_conn_recv = NULL, /* FIXME is/how this relvant to iser? */ + .terminate_conn = iscsi_iser_conn_terminate, + /* IO */ + .send_pdu = iscsi_conn_send_pdu, + .get_stats = iscsi_iser_conn_get_stats, + .init_cmd_task = iscsi_iser_cmd_init, + .xmit_cmd_task = iscsi_iser_ctask_xmit, + .xmit_mgmt_task = iscsi_iser_mtask_xmit, + .cleanup_cmd_task = iscsi_iser_cleanup_ctask, + /* recovery */ + .session_recovery_timedout = iscsi_session_recovery_timedout, + + .ep_connect = iscsi_iser_ep_connect, + .ep_poll = iscsi_iser_ep_poll, + .ep_disconnect = iscsi_iser_ep_disconnect +}; + +static int __init iser_init(void) +{ + int err; + + iser_dbg("Starting iSER datamover...\n"); + + if (iscsi_max_lun < 1) { + printk(KERN_ERR "Invalid max_lun value of %u\n", iscsi_max_lun); + return -EINVAL; + } + + iscsi_iser_transport.max_lun = iscsi_max_lun; + + memset(&ig, 0, sizeof(struct iser_global)); + + ig.desc_cache = kmem_cache_create("iser_descriptors", + sizeof (struct iser_desc), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (ig.desc_cache == NULL) + return -ENOMEM; + + /* device init is called only after the first addr resolution */ + mutex_init(&ig.device_list_mutex); + INIT_LIST_HEAD(&ig.device_list); + mutex_init(&ig.connlist_mutex); + INIT_LIST_HEAD(&ig.connlist); + + if (!iscsi_register_transport(&iscsi_iser_transport)) { + iser_err("iscsi_register_transport failed\n"); + err = -EINVAL; + goto register_transport_failure; + } + + return 0; + +register_transport_failure: + kmem_cache_destroy(ig.desc_cache); + + return err; +} + +static void __exit iser_exit(void) +{ + iser_dbg("Removing iSER datamover...\n"); + iscsi_unregister_transport(&iscsi_iser_transport); + kmem_cache_destroy(ig.desc_cache); +} + +module_init(iser_init); +module_exit(iser_exit); From ogerlitz at 
voltaire.com Thu May 11 00:02:19 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 11 May 2006 10:02:19 +0300 (IDT) Subject: [openib-general] [PATCH 3/6] iSER initiator iSCSI PDU and TX/RX completions processing In-Reply-To: Message-ID: This file contains the iSER initiator processing of iSCSI PDUs (controls, commands and data-outs) along with the processing of TX and RX completions. It interacts with the lower-level iser code that performs the memory registration and the CMA and verbs calls. Signed-off-by: Or Gerlitz --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iser_initiator.c 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iser_initiator.c 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,734 @@ +/* + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: iser_initiator.c 6964 2006-05-07 11:11:43Z ogerlitz $ + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +/* Constant PDU lengths calculations */ +#define ISER_TOTAL_HEADERS_LEN (sizeof (struct iser_hdr) + \ + sizeof (struct iscsi_hdr)) + +/* iser_dto_add_regd_buff - increments the reference count for * + * the registered buffer & adds it to the DTO object */ +static void iser_dto_add_regd_buff(struct iser_dto *dto, + struct iser_regd_buf *regd_buf, + unsigned long use_offset, + unsigned long use_size) +{ + int add_idx; + + atomic_inc(&regd_buf->ref_count); + + add_idx = dto->regd_vector_len; + dto->regd[add_idx] = regd_buf; + dto->used_sz[add_idx] = use_size; + dto->offset[add_idx] = use_offset; + + dto->regd_vector_len++; +} + +static int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask, + struct iser_data_buf *data, + enum iser_data_dir iser_dir, + enum dma_data_direction dma_dir) +{ + struct device *dma_device; + + iser_ctask->dir[iser_dir] = 1; + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir); + if (data->dma_nents == 0) { + iser_err("dma_map_sg failed!!!\n"); + return -EINVAL; + } + return 0; +} + +static void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask) +{ + struct device *dma_device; + struct iser_data_buf *data; + + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + if (iser_ctask->dir[ISER_DIR_IN]) { + data = &iser_ctask->data[ISER_DIR_IN]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE); + } + + if 
(iser_ctask->dir[ISER_DIR_OUT]) { + data = &iser_ctask->data[ISER_DIR_OUT]; + dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE); + } +} + +/* Register user buffer memory and initialize passive rdma + * dto descriptor. Total data size is stored in + * iser_ctask->data[ISER_DIR_IN].data_len + */ +static int iser_prepare_read_cmd(struct iscsi_cmd_task *ctask, + unsigned int edtl) + +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_regd_buf *regd_buf; + int err; + struct iser_hdr *hdr = &iser_ctask->desc.iser_header; + struct iser_data_buf *buf_in = &iser_ctask->data[ISER_DIR_IN]; + + err = iser_dma_map_task_data(iser_ctask, + buf_in, + ISER_DIR_IN, + DMA_FROM_DEVICE); + if (err) + return err; + + if (edtl > iser_ctask->data[ISER_DIR_IN].data_len) { + iser_err("Total data length: %ld, less than EDTL: " + "%d, in READ cmd BHS itt: %d, conn: 0x%p\n", + iser_ctask->data[ISER_DIR_IN].data_len, edtl, + ctask->itt, iser_ctask->iser_conn); + return -EINVAL; + } + + err = iser_reg_rdma_mem(iser_ctask,ISER_DIR_IN); + if (err) { + iser_err("Failed to set up Data-IN RDMA\n"); + return err; + } + regd_buf = &iser_ctask->rdma_regd[ISER_DIR_IN]; + + hdr->flags |= ISER_RSV; + hdr->read_stag = cpu_to_be32(regd_buf->reg.rkey); + hdr->read_va = cpu_to_be64(regd_buf->reg.va); + + iser_dbg("Cmd itt:%d READ tags RKEY:%#.4X VA:%#llX\n", + ctask->itt, regd_buf->reg.rkey, + (unsigned long long)regd_buf->reg.va); + + return 0; +} + +/* Register user buffer memory and initialize passive rdma + * dto descriptor. 
Total data size is stored in + * ctask->data[ISER_DIR_OUT].data_len + */ +static int +iser_prepare_write_cmd(struct iscsi_cmd_task *ctask, + unsigned int imm_sz, + unsigned int unsol_sz, + unsigned int edtl) +{ + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_regd_buf *regd_buf; + int err; + struct iser_dto *send_dto = &iser_ctask->desc.dto; + struct iser_hdr *hdr = &iser_ctask->desc.iser_header; + struct iser_data_buf *buf_out = &iser_ctask->data[ISER_DIR_OUT]; + + err = iser_dma_map_task_data(iser_ctask, + buf_out, + ISER_DIR_OUT, + DMA_TO_DEVICE); + if (err) + return err; + + if (edtl > iser_ctask->data[ISER_DIR_OUT].data_len) { + iser_err("Total data length: %ld, less than EDTL: %d, " + "in WRITE cmd BHS itt: %d, conn: 0x%p\n", + iser_ctask->data[ISER_DIR_OUT].data_len, + edtl, ctask->itt, ctask->conn); + return -EINVAL; + } + + err = iser_reg_rdma_mem(iser_ctask,ISER_DIR_OUT); + if (err != 0) { + iser_err("Failed to register write cmd RDMA mem\n"); + return err; + } + + regd_buf = &iser_ctask->rdma_regd[ISER_DIR_OUT]; + + if (unsol_sz < edtl) { + hdr->flags |= ISER_WSV; + hdr->write_stag = cpu_to_be32(regd_buf->reg.rkey); + hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz); + + iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X " + "VA:%#llX + unsol:%d\n", + ctask->itt, regd_buf->reg.rkey, + (unsigned long long)regd_buf->reg.va, unsol_sz); + } + + if (imm_sz > 0) { + iser_dbg("Cmd itt:%d, WRITE, adding imm.data sz: %d\n", + ctask->itt, imm_sz); + iser_dto_add_regd_buff(send_dto, + regd_buf, + 0, + imm_sz); + } + + return 0; +} + +/** + * iser_post_receive_control - allocates, initializes and posts receive DTO. 
+ */ +static int iser_post_receive_control(struct iscsi_conn *conn) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iser_desc *rx_desc; + struct iser_regd_buf *regd_hdr; + struct iser_regd_buf *regd_data; + struct iser_dto *recv_dto = NULL; + struct iser_device *device = iser_conn->ib_conn->device; + int rx_data_size, err = 0; + + rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL); + if (rx_desc == NULL) { + iser_err("Failed to alloc desc for post recv\n"); + return -ENOMEM; + } + rx_desc->type = ISCSI_RX; + + /* for the login sequence we must support rx of upto 8K */ + if (conn->c_stage == ISCSI_CONN_INITIAL_STAGE) + rx_data_size = DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH; + else /* FIXME till user space sets conn->max_recv_dlength correctly */ + rx_data_size = 128; + + rx_desc->data = kmalloc(rx_data_size, GFP_KERNEL); + if (rx_desc->data == NULL) { + iser_err("Failed to alloc data buf for post recv\n"); + err = -ENOMEM; + goto post_rx_kmalloc_failure; + } + + recv_dto = &rx_desc->dto; + recv_dto->conn = iser_conn; + recv_dto->regd_vector_len = 0; + + regd_hdr = &rx_desc->hdr_regd_buf; + memset(regd_hdr, 0, sizeof(struct iser_regd_buf)); + regd_hdr->device = device; + regd_hdr->virt_addr = rx_desc; /* == &rx_desc->iser_header */ + regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; + + iser_reg_single(device, regd_hdr, DMA_FROM_DEVICE); + + iser_dto_add_regd_buff(recv_dto, regd_hdr, 0, 0); + + regd_data = &rx_desc->data_regd_buf; + memset(regd_data, 0, sizeof(struct iser_regd_buf)); + regd_data->device = device; + regd_data->virt_addr = rx_desc->data; + regd_data->data_size = rx_data_size; + + iser_reg_single(device, regd_data, DMA_FROM_DEVICE); + + iser_dto_add_regd_buff(recv_dto, regd_data, 0, 0); + + err = iser_post_recv(rx_desc); + if (!err) + return 0; + + /* iser_post_recv failed */ + iser_dto_buffs_release(recv_dto); + kfree(rx_desc->data); +post_rx_kmalloc_failure: + kmem_cache_free(ig.desc_cache, rx_desc); + return err; +} + +/* creates a new 
tx descriptor and adds header regd buffer */ +static void iser_create_send_desc(struct iscsi_iser_conn *iser_conn, + struct iser_desc *tx_desc) +{ + struct iser_regd_buf *regd_hdr = &tx_desc->hdr_regd_buf; + struct iser_dto *send_dto = &tx_desc->dto; + + memset(regd_hdr, 0, sizeof(struct iser_regd_buf)); + regd_hdr->device = iser_conn->ib_conn->device; + regd_hdr->virt_addr = tx_desc; /* == &tx_desc->iser_header */ + regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; + + send_dto->conn = iser_conn; + send_dto->notify_enable = 1; + send_dto->regd_vector_len = 0; + + memset(&tx_desc->iser_header, 0, sizeof(struct iser_hdr)); + tx_desc->iser_header.flags = ISER_VER; + + iser_dto_add_regd_buff(send_dto, regd_hdr, 0, 0); +} + +/** + * iser_conn_set_full_featured_mode - (iSER API) + */ +int iser_conn_set_full_featured_mode(struct iscsi_conn *conn) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + + int i; + /* no need to keep it in a var, we are after login so if this should + * be negotiated, by now the result should be available here */ + int initial_post_recv_bufs_num = ISER_MAX_RX_MISC_PDUS; + + iser_dbg("Initially post: %d\n", initial_post_recv_bufs_num); + + /* Check that there is no posted recv or send buffers left - */ + /* they must be consumed during the login phase */ + BUG_ON(atomic_read(&iser_conn->ib_conn->post_recv_buf_count) != 0); + BUG_ON(atomic_read(&iser_conn->ib_conn->post_send_buf_count) != 0); + + /* Initial post receive buffers */ + for (i = 0; i < initial_post_recv_bufs_num; i++) { + if (iser_post_receive_control(conn) != 0) { + iser_err("Failed to post recv bufs at:%d conn:0x%p\n", + i, conn); + return -ENOMEM; + } + } + iser_dbg("Posted %d post recv bufs, conn:0x%p\n", i, conn); + return 0; +} + +static int +iser_check_xmit(struct iscsi_conn *conn, void *task) +{ + int rc = 0; + struct iscsi_iser_conn *iser_conn = conn->dd_data; + + write_lock_bh(conn->recv_lock); + if (atomic_read(&iser_conn->ib_conn->post_send_buf_count) == + 
ISER_QP_MAX_REQ_DTOS) { + iser_dbg("%ld can't xmit task %p, suspending tx\n",jiffies,task); + set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + rc = -EAGAIN; + } + write_unlock_bh(conn->recv_lock); + return rc; +} + + +/** + * iser_send_command - send command PDU + */ +int iser_send_command(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_dto *send_dto = NULL; + unsigned long edtl; + int err = 0; + struct iser_data_buf *data_buf; + + struct iscsi_cmd *hdr = ctask->hdr; + struct scsi_cmnd *sc = ctask->sc; + + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { + iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); + return -EPERM; + } + if (iser_check_xmit(conn, ctask)) + return -EAGAIN; + + edtl = ntohl(hdr->data_length); + + /* build the tx desc regd header and add it to the tx desc dto */ + iser_ctask->desc.type = ISCSI_TX_SCSI_COMMAND; + send_dto = &iser_ctask->desc.dto; + send_dto->ctask = iser_ctask; + iser_create_send_desc(iser_conn, &iser_ctask->desc); + + if (hdr->flags & ISCSI_FLAG_CMD_READ) + data_buf = &iser_ctask->data[ISER_DIR_IN]; + else + data_buf = &iser_ctask->data[ISER_DIR_OUT]; + + if (sc->use_sg) { /* using a scatter list */ + data_buf->buf = sc->request_buffer; + data_buf->size = sc->use_sg; + } else { /* using a single buffer - convert it into one entry SG */ + sg_init_one(&data_buf->sg_single, + sc->request_buffer, sc->request_bufflen); + data_buf->buf = &data_buf->sg_single; + data_buf->size = 1; + } + + data_buf->data_len = sc->request_bufflen; + + if (hdr->flags & ISCSI_FLAG_CMD_READ) { + err = iser_prepare_read_cmd(ctask, edtl); + if (err) + goto send_command_error; + } + if (hdr->flags & ISCSI_FLAG_CMD_WRITE) { + err = iser_prepare_write_cmd(ctask, + ctask->imm_count, + ctask->imm_count + + ctask->unsol_count, + edtl); + if (err) + goto send_command_error; + } + + 
iser_reg_single(iser_conn->ib_conn->device, + send_dto->regd[0], DMA_TO_DEVICE); + + if (iser_post_receive_control(conn) != 0) { + iser_err("post_recv failed!\n"); + err = -ENOMEM; + goto send_command_error; + } + + iser_ctask->status = ISER_TASK_STATUS_STARTED; + + err = iser_post_send(&iser_ctask->desc); + if (!err) + return 0; + +send_command_error: + iser_dto_buffs_release(send_dto); + iser_err("conn %p failed ctask->itt %d err %d\n",conn, ctask->itt, err); + return err; +} + +/** + * iser_send_data_out - send data out PDU + */ +int iser_send_data_out(struct iscsi_conn *conn, + struct iscsi_cmd_task *ctask, + struct iscsi_data *hdr) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iscsi_iser_cmd_task *iser_ctask = ctask->dd_data; + struct iser_desc *tx_desc = NULL; + struct iser_dto *send_dto = NULL; + unsigned long buf_offset; + unsigned long data_seg_len; + unsigned int itt; + int err = 0; + + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { + iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); + return -EPERM; + } + + if (iser_check_xmit(conn, ctask)) + return -EAGAIN; + + itt = ntohl(hdr->itt); + data_seg_len = ntoh24(hdr->dlength); + buf_offset = ntohl(hdr->offset); + + iser_dbg("%s itt %d dseg_len %d offset %d\n", + __func__,(int)itt,(int)data_seg_len,(int)buf_offset); + + tx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL); + if (tx_desc == NULL) { + iser_err("Failed to alloc desc for post dataout\n"); + return -ENOMEM; + } + + tx_desc->type = ISCSI_TX_DATAOUT; + memcpy(&tx_desc->iscsi_header, hdr, sizeof(struct iscsi_hdr)); + + /* build the tx desc regd header and add it to the tx desc dto */ + send_dto = &tx_desc->dto; + send_dto->ctask = iser_ctask; + iser_create_send_desc(iser_conn, tx_desc); + + iser_reg_single(iser_conn->ib_conn->device, + send_dto->regd[0], DMA_TO_DEVICE); + + /* all data was registered for RDMA, we can use the lkey */ + iser_dto_add_regd_buff(send_dto, + 
&iser_ctask->rdma_regd[ISER_DIR_OUT], + buf_offset, + data_seg_len); + + if (buf_offset + data_seg_len > iser_ctask->data[ISER_DIR_OUT].data_len) { + iser_err("Offset:%ld & DSL:%ld in Data-Out " + "inconsistent with total len:%ld, itt:%d\n", + buf_offset, data_seg_len, + iser_ctask->data[ISER_DIR_OUT].data_len, itt); + err = -EINVAL; + goto send_data_out_error; + } + iser_dbg("data-out itt: %d, offset: %ld, sz: %ld\n", + itt, buf_offset, data_seg_len); + + + err = iser_post_send(tx_desc); + if (!err) + return 0; + +send_data_out_error: + iser_dto_buffs_release(send_dto); + kmem_cache_free(ig.desc_cache, tx_desc); + iser_err("conn %p failed err %d\n",conn, err); + return err; +} + +int iser_send_control(struct iscsi_conn *conn, + struct iscsi_mgmt_task *mtask) +{ + struct iscsi_iser_conn *iser_conn = conn->dd_data; + struct iser_desc *mdesc = mtask->dd_data; + struct iser_dto *send_dto = NULL; + unsigned int itt; + unsigned long data_seg_len; + int err = 0; + unsigned char opcode; + struct iser_regd_buf *regd_buf; + struct iser_device *device; + + if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { + iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); + return -EPERM; + } + + if (iser_check_xmit(conn,mtask)) + return -EAGAIN; + + /* build the tx desc regd header and add it to the tx desc dto */ + mdesc->type = ISCSI_TX_CONTROL; + send_dto = &mdesc->dto; + send_dto->ctask = NULL; + iser_create_send_desc(iser_conn, mdesc); + + device = iser_conn->ib_conn->device; + + iser_reg_single(device, send_dto->regd[0], DMA_TO_DEVICE); + + itt = ntohl(mtask->hdr->itt); + opcode = mtask->hdr->opcode & ISCSI_OPCODE_MASK; + data_seg_len = ntoh24(mtask->hdr->dlength); + + if (data_seg_len > 0) { + regd_buf = &mdesc->data_regd_buf; + memset(regd_buf, 0, sizeof(struct iser_regd_buf)); + regd_buf->device = device; + regd_buf->virt_addr = mtask->data; + regd_buf->data_size = mtask->data_count; + iser_reg_single(device, regd_buf, + DMA_TO_DEVICE); + 
iser_dto_add_regd_buff(send_dto, regd_buf, + 0, + data_seg_len); + } + + if (iser_post_receive_control(conn) != 0) { + iser_err("post_rcv_buff failed!\n"); + err = -ENOMEM; + goto send_control_error; + } + + err = iser_post_send(mdesc); + if (!err) + return 0; + +send_control_error: + iser_dto_buffs_release(send_dto); + iser_err("conn %p failed err %d\n",conn, err); + return err; +} + +/** + * iser_rcv_dto_completion - recv DTO completion + */ +void iser_rcv_completion(struct iser_desc *rx_desc, + unsigned long dto_xfer_len) +{ + struct iser_dto *dto = &rx_desc->dto; + struct iscsi_iser_conn *conn = dto->conn; + struct iscsi_session *session = conn->iscsi_conn->session; + struct iscsi_cmd_task *ctask; + struct iscsi_iser_cmd_task *iser_ctask; + struct iscsi_hdr *hdr; + char *rx_data = NULL; + int rx_data_len = 0; + unsigned int itt; + unsigned char opcode; + + hdr = &rx_desc->iscsi_header; + + iser_dbg("op 0x%x itt 0x%x\n", hdr->opcode,hdr->itt); + + if (dto_xfer_len > ISER_TOTAL_HEADERS_LEN) { /* we have data */ + rx_data_len = dto_xfer_len - ISER_TOTAL_HEADERS_LEN; + rx_data = dto->regd[1]->virt_addr; + rx_data += dto->offset[1]; + } + + opcode = hdr->opcode & ISCSI_OPCODE_MASK; + + if (opcode == ISCSI_OP_SCSI_CMD_RSP) { + itt = hdr->itt & ISCSI_ITT_MASK; /* mask out cid and age bits */ + if (!(itt < session->cmds_max)) + iser_err("itt can't be matched to task!!!" 
+ "conn %p opcode %d cmds_max %d itt %d\n", + conn->iscsi_conn,opcode,session->cmds_max,itt); + /* use the mapping given with the cmds array indexed by itt */ + ctask = (struct iscsi_cmd_task *)session->cmds[itt]; + iser_ctask = ctask->dd_data; + iser_dbg("itt %d ctask %p\n",itt,ctask); + iser_ctask->status = ISER_TASK_STATUS_COMPLETED; + iser_ctask_rdma_finalize(iser_ctask); + } + + iser_dto_buffs_release(dto); + + iscsi_iser_recv(conn->iscsi_conn, hdr, rx_data, rx_data_len); + + kfree(rx_desc->data); + kmem_cache_free(ig.desc_cache, rx_desc); + + /* decrementing conn->post_recv_buf_count only --after-- freeing the * + * task eliminates the need to worry on tasks which are completed in * + * parallel to the execution of iser_conn_term. So the code that waits * + * for the posted rx bufs refcount to become zero handles everything */ + atomic_dec(&conn->ib_conn->post_recv_buf_count); +} + +void iser_snd_completion(struct iser_desc *tx_desc) +{ + struct iser_dto *dto = &tx_desc->dto; + struct iscsi_iser_conn *iser_conn = dto->conn; + struct iscsi_conn *conn = iser_conn->iscsi_conn; + struct iscsi_mgmt_task *mtask; + + iser_dbg("Initiator, Data sent dto=0x%p\n", dto); + + iser_dto_buffs_release(dto); + + if (tx_desc->type == ISCSI_TX_DATAOUT) + kmem_cache_free(ig.desc_cache, tx_desc); + + atomic_dec(&iser_conn->ib_conn->post_send_buf_count); + + write_lock(conn->recv_lock); + if (conn->suspend_tx) { + iser_dbg("%ld resuming tx\n",jiffies); + clear_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx); + scsi_queue_work(conn->session->host, &conn->xmitwork); + } + write_unlock(conn->recv_lock); + + if (tx_desc->type == ISCSI_TX_CONTROL) { + /* this arithmetic is legal by libiscsi dd_data allocation */ + mtask = (void *) ((long)(void *)tx_desc - + sizeof(struct iscsi_mgmt_task)); + if (mtask->hdr->itt == cpu_to_be32(ISCSI_RESERVED_TAG)) { + struct iscsi_session *session = conn->session; + + spin_lock(&conn->session->lock); + list_del(&mtask->running); + 
__kfifo_put(session->mgmtpool.queue, (void*)&mtask, + sizeof(void*)); + spin_unlock(&session->lock); + } + } +} + +void iser_ctask_rdma_init(struct iscsi_iser_cmd_task *iser_ctask) + +{ + iser_ctask->status = ISER_TASK_STATUS_INIT; + + iser_ctask->dir[ISER_DIR_IN] = 0; + iser_ctask->dir[ISER_DIR_OUT] = 0; + + iser_ctask->data[ISER_DIR_IN].data_len = 0; + iser_ctask->data[ISER_DIR_OUT].data_len = 0; + + memset(&iser_ctask->rdma_regd[ISER_DIR_IN], 0, + sizeof(struct iser_regd_buf)); + memset(&iser_ctask->rdma_regd[ISER_DIR_OUT], 0, + sizeof(struct iser_regd_buf)); +} + +void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask) +{ + int deferred; + + /* if we were reading, copy back to unaligned sglist, + * anyway dma_unmap and free the copy + */ + if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL) + iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_IN); + if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL) + iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_OUT); + + if (iser_ctask->dir[ISER_DIR_IN]) { + deferred = iser_regd_buff_release + (&iser_ctask->rdma_regd[ISER_DIR_IN]); + if (deferred) { + iser_err("References remain for BUF-IN rdma reg\n"); + BUG(); + } + } + + if (iser_ctask->dir[ISER_DIR_OUT]) { + deferred = iser_regd_buff_release + (&iser_ctask->rdma_regd[ISER_DIR_OUT]); + if (deferred) { + iser_err("References remain for BUF-OUT rdma reg\n"); + BUG(); + } + } + + iser_dma_unmap_task_data(iser_ctask); +} + +void iser_dto_buffs_release(struct iser_dto *dto) +{ + int i; + + for (i = 0; i < dto->regd_vector_len; i++) + iser_regd_buff_release(dto->regd[i]); +} + From ogerlitz at voltaire.com Thu May 11 00:02:46 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 11 May 2006 10:02:46 +0300 (IDT) Subject: [openib-general] [PATCH 4/6] iSER RDMA CM (CMA) and IB verbs interaction In-Reply-To: Message-ID: This file contains the low level interaction with the RDMA CM and the IB verbs, where iSER is consumer of both. 
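For readers following the flow, here is a rough user-space sketch of how the connection state reacts to the CM events this file handles. It is only an illustration under stated assumptions: every sketch_* and SKETCH_* name is hypothetical and invented here. The file itself uses the RDMA_CM_EVENT_* events and the ISER_CONN_* states, takes a lock around state changes, and makes real CMA calls where the sketch merely sets a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RDMA_CM_EVENT_* events in the patch. */
enum sketch_event {
	SKETCH_EV_ADDR_RESOLVED,
	SKETCH_EV_ROUTE_RESOLVED,
	SKETCH_EV_ESTABLISHED,
	SKETCH_EV_CONNECT_ERROR,
	SKETCH_EV_DISCONNECTED
};

/* Hypothetical stand-ins for the ISER_CONN_* states in the patch. */
enum sketch_state {
	SKETCH_PENDING,
	SKETCH_UP,
	SKETCH_TERMINATING,
	SKETCH_DOWN
};

struct sketch_conn {
	enum sketch_state state;
	int posted_bufs;       /* outstanding send + receive buffers */
	int route_requested;   /* stands in for the route resolution call */
	int connect_requested; /* stands in for QP creation + connect call */
};

/* Advance the connection according to one CM event. */
static void sketch_handle_event(struct sketch_conn *c, enum sketch_event ev)
{
	assert(c != NULL);

	switch (ev) {
	case SKETCH_EV_ADDR_RESOLVED:
		/* the device is known only now; next step is the route */
		c->route_requested = 1;
		break;
	case SKETCH_EV_ROUTE_RESOLVED:
		/* per-connection resources are created, then we connect */
		c->connect_requested = 1;
		break;
	case SKETCH_EV_ESTABLISHED:
		c->state = SKETCH_UP;
		break;
	case SKETCH_EV_CONNECT_ERROR:
		c->state = SKETCH_DOWN;
		break;
	case SKETCH_EV_DISCONNECTED:
		/* an UP connection starts terminating ... */
		if (c->state == SKETCH_UP)
			c->state = SKETCH_TERMINATING;
		/* ... and is DOWN once no posted buffers remain */
		if (c->posted_bufs == 0)
			c->state = SKETCH_DOWN;
		break;
	}
}
```

Feeding the three success events in order to a connection that starts in SKETCH_PENDING leaves it in SKETCH_UP. A later disconnect leaves it in SKETCH_TERMINATING for as long as posted buffers are outstanding, which is why the completion paths matter for teardown.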
Signed-off-by: Or Gerlitz --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iser_verbs.c 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iser_verbs.c 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,827 @@ +/* + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: iser_verbs.c 7051 2006-05-10 12:29:11Z ogerlitz $ + */ +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +#define ISCSI_ISER_MAX_CONN 8 +#define ISER_MAX_CQ_LEN ((ISER_QP_MAX_RECV_DTOS + \ + ISER_QP_MAX_REQ_DTOS) * \ + ISCSI_ISER_MAX_CONN) + +static void iser_cq_tasklet_fn(unsigned long data); +static void iser_cq_callback(struct ib_cq *cq, void *cq_context); +static void iser_comp_error_worker(void *data); + +static void iser_cq_event_callback(struct ib_event *cause, void *context) +{ + iser_err("got cq event %d \n", cause->event); +} + +static void iser_qp_event_callback(struct ib_event *cause, void *context) +{ + iser_err("got qp event %d\n",cause->event); +} + +/** + * iser_create_device_ib_res - creates Protection Domain (PD), Completion + * Queue (CQ), DMA Memory Region (DMA MR) with the device associated with + * the adaptor. + * + * returns 0 on success, -1 on failure + */ +static int iser_create_device_ib_res(struct iser_device *device) +{ + device->pd = ib_alloc_pd(device->ib_device); + if (IS_ERR(device->pd)) + goto pd_err; + + device->cq = ib_create_cq(device->ib_device, + iser_cq_callback, + iser_cq_event_callback, + (void *)device, + ISER_MAX_CQ_LEN); + if (IS_ERR(device->cq)) + goto cq_err; + + if (ib_req_notify_cq(device->cq, IB_CQ_NEXT_COMP)) + goto cq_arm_err; + + tasklet_init(&device->cq_tasklet, + iser_cq_tasklet_fn, + (unsigned long)device); + + device->mr = ib_get_dma_mr(device->pd, + IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(device->mr)) + goto dma_mr_err; + + return 0; + +dma_mr_err: + tasklet_kill(&device->cq_tasklet); +cq_arm_err: + ib_destroy_cq(device->cq); +cq_err: + ib_dealloc_pd(device->pd); +pd_err: + iser_err("failed to allocate an IB resource\n"); + return -1; +} + +/** + * iser_free_device_ib_res - destroy/dealloc/dereg the DMA MR, + * CQ and PD created with the device associated with the adaptor.
+ */ +static void iser_free_device_ib_res(struct iser_device *device) +{ + BUG_ON(device->mr == NULL); + + tasklet_kill(&device->cq_tasklet); + + (void)ib_dereg_mr(device->mr); + (void)ib_destroy_cq(device->cq); + (void)ib_dealloc_pd(device->pd); + + device->mr = NULL; + device->cq = NULL; + device->pd = NULL; +} + +/** + * iser_create_ib_conn_res - Creates FMR pool and Queue-Pair (QP) + * + * returns 0 on success, -1 on failure + */ +static int iser_create_ib_conn_res(struct iser_conn *ib_conn) +{ + struct iser_device *device; + struct ib_qp_init_attr init_attr; + int ret; + struct ib_fmr_pool_param params; + + BUG_ON(ib_conn->device == NULL); + + device = ib_conn->device; + + ib_conn->page_vec = kmalloc(sizeof(struct iser_page_vec) + + (sizeof(u64) * (ISCSI_ISER_SG_TABLESIZE +1)), + GFP_KERNEL); + if (!ib_conn->page_vec) { + ret = -ENOMEM; + goto alloc_err; + } + ib_conn->page_vec->pages = (u64 *) (ib_conn->page_vec + 1); + + params.page_shift = PAGE_SHIFT; + /* when the first/last SG element are not start/end * + * page aligned, the map would be of N+1 pages */ + params.max_pages_per_fmr = ISCSI_ISER_SG_TABLESIZE + 1; + /* make the pool size twice the max number of SCSI commands * + * the ML is expected to queue, watermark for unmap at 50% */ + params.pool_size = ISCSI_XMIT_CMDS_MAX * 2; + params.dirty_watermark = ISCSI_XMIT_CMDS_MAX; + params.cache = 0; + params.flush_function = NULL; + params.access = (IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE | + IB_ACCESS_REMOTE_READ); + + ib_conn->fmr_pool = ib_create_fmr_pool(device->pd, &params); + if (IS_ERR(ib_conn->fmr_pool)) { + ret = PTR_ERR(ib_conn->fmr_pool); + goto fmr_pool_err; + } + + memset(&init_attr, 0, sizeof init_attr); + + init_attr.event_handler = iser_qp_event_callback; + init_attr.qp_context = (void *)ib_conn; + init_attr.send_cq = device->cq; + init_attr.recv_cq = device->cq; + init_attr.cap.max_send_wr = ISER_QP_MAX_REQ_DTOS; + init_attr.cap.max_recv_wr = ISER_QP_MAX_RECV_DTOS; +
init_attr.cap.max_send_sge = MAX_REGD_BUF_VECTOR_LEN; + init_attr.cap.max_recv_sge = 2; + init_attr.sq_sig_type = IB_SIGNAL_REQ_WR; + init_attr.qp_type = IB_QPT_RC; + + ret = rdma_create_qp(ib_conn->cma_id, device->pd, &init_attr); + if (ret) + goto qp_err; + + ib_conn->qp = ib_conn->cma_id->qp; + iser_err("setting conn %p cma_id %p: fmr_pool %p qp %p\n", + ib_conn, ib_conn->cma_id, + ib_conn->fmr_pool, ib_conn->cma_id->qp); + return ret; + +qp_err: + (void)ib_destroy_fmr_pool(ib_conn->fmr_pool); +fmr_pool_err: + kfree(ib_conn->page_vec); +alloc_err: + iser_err("unable to alloc mem or create resource, err %d\n", ret); + return ret; +} + +/** + * releases the FMR pool, QP and CMA ID objects, returns 0 on success, + * -1 on failure + */ +static int iser_free_ib_conn_res(struct iser_conn *ib_conn) +{ + BUG_ON(ib_conn == NULL); + + iser_err("freeing conn %p cma_id %p fmr pool %p qp %p\n", + ib_conn, ib_conn->cma_id, + ib_conn->fmr_pool, ib_conn->qp); + + /* qp is created only once both addr & route are resolved */ + if (ib_conn->fmr_pool != NULL) + ib_destroy_fmr_pool(ib_conn->fmr_pool); + + if (ib_conn->qp != NULL) + rdma_destroy_qp(ib_conn->cma_id); + + if (ib_conn->cma_id != NULL) + rdma_destroy_id(ib_conn->cma_id); + + ib_conn->fmr_pool = NULL; + ib_conn->qp = NULL; + ib_conn->cma_id = NULL; + kfree(ib_conn->page_vec); + + return 0; +} + +/** + * based on the resolved device node GUID see if there already allocated + * device for this device. If there's no such, create one. 
+ */ +static +struct iser_device *iser_device_find_by_ib_device(struct rdma_cm_id *cma_id) +{ + struct list_head *p_list; + struct iser_device *device = NULL; + + mutex_lock(&ig.device_list_mutex); + + p_list = ig.device_list.next; + while (p_list != &ig.device_list) { + device = list_entry(p_list, struct iser_device, ig_list); + /* find if there's a match using the node GUID */ + if (device->ib_device->node_guid == cma_id->device->node_guid) + break; + } + + if (device == NULL) { + device = kzalloc(sizeof *device, GFP_KERNEL); + if (device == NULL) + goto out; + /* assign this device to the device */ + device->ib_device = cma_id->device; + /* init the device and link it into ig device list */ + if (iser_create_device_ib_res(device)) { + kfree(device); + device = NULL; + goto out; + } + list_add(&device->ig_list, &ig.device_list); + } +out: + BUG_ON(device == NULL); + device->refcount++; + mutex_unlock(&ig.device_list_mutex); + return device; +} + +/* if there's no demand for this device, release it */ +static void iser_device_try_release(struct iser_device *device) +{ + mutex_lock(&ig.device_list_mutex); + device->refcount--; + iser_err("device %p refcount %d\n",device,device->refcount); + if (!device->refcount) { + iser_free_device_ib_res(device); + list_del(&device->ig_list); + kfree(device); + } + mutex_unlock(&ig.device_list_mutex); +} + +int iser_conn_state_comp(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp) +{ + int ret; + + spin_lock_bh(&ib_conn->lock); + ret = (ib_conn->state == comp); + spin_unlock_bh(&ib_conn->lock); + return ret; +} + +static int iser_conn_state_comp_exch(struct iser_conn *ib_conn, + enum iser_ib_conn_state comp, + enum iser_ib_conn_state exch) +{ + int ret; + + spin_lock_bh(&ib_conn->lock); + if ((ret = (ib_conn->state == comp))) + ib_conn->state = exch; + spin_unlock_bh(&ib_conn->lock); + return ret; +} + +/** + * triggers start of the disconnect procedures and wait for them to be done + */ +void iser_conn_terminate(struct 
iser_conn *ib_conn) +{ + int err = 0; + + /* change the ib conn state only if the conn is UP, however always call + * rdma_disconnect since this is the only way to cause the CMA to change + * the QP state to ERROR + */ + + iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, ISER_CONN_TERMINATING); + err = rdma_disconnect(ib_conn->cma_id); + if (err) + iser_err("Failed to disconnect, conn: 0x%p err %d\n", + ib_conn,err); + + wait_event_interruptible(ib_conn->wait, + ib_conn->state == ISER_CONN_DOWN); + + iser_conn_release(ib_conn); +} + +static void iser_connect_error(struct rdma_cm_id *cma_id) +{ + struct iser_conn *ib_conn; + ib_conn = (struct iser_conn *)cma_id->context; + + ib_conn->state = ISER_CONN_DOWN; + wake_up_interruptible(&ib_conn->wait); +} + +static void iser_addr_handler(struct rdma_cm_id *cma_id) +{ + struct iser_device *device; + struct iser_conn *ib_conn; + int ret; + + device = iser_device_find_by_ib_device(cma_id); + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->device = device; + + ret = rdma_resolve_route(cma_id, 1000); + if (ret) { + iser_err("resolve route failed: %d\n", ret); + iser_connect_error(cma_id); + } + return; +} + +static void iser_route_handler(struct rdma_cm_id *cma_id) +{ + struct rdma_conn_param conn_param; + int ret; + + ret = iser_create_ib_conn_res((struct iser_conn *)cma_id->context); + if (ret) + goto failure; + + iser_dbg("path.mtu is %d setting it to %d\n", + cma_id->route.path_rec->mtu, IB_MTU_1024); + + /* we must set the MTU to 1024 as this is what the target is assuming */ + if (cma_id->route.path_rec->mtu > IB_MTU_1024) + cma_id->route.path_rec->mtu = IB_MTU_1024; + + memset(&conn_param, 0, sizeof conn_param); + conn_param.responder_resources = 4; + conn_param.initiator_depth = 1; + conn_param.retry_count = 7; + conn_param.rnr_retry_count = 6; + + ret = rdma_connect(cma_id, &conn_param); + if (ret) { + iser_err("failure connecting: %d\n", ret); + goto failure; + } + + return; +failure: + 
iser_connect_error(cma_id); +} + +static void iser_connected_handler(struct rdma_cm_id *cma_id) +{ + struct iser_conn *ib_conn; + + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->state = ISER_CONN_UP; + wake_up_interruptible(&ib_conn->wait); +} + +static void iser_disconnected_handler(struct rdma_cm_id *cma_id) +{ + struct iser_conn *ib_conn; + + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->disc_evt_flag = 1; + + /* getting here when the state is UP means that the conn is being * + * terminated asynchronously from the iSCSI layer's perspective. */ + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) + iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, + ISCSI_ERR_CONN_FAILED); + + /* Complete the termination process if no posts are pending */ + if ((atomic_read(&ib_conn->post_recv_buf_count) == 0) && + (atomic_read(&ib_conn->post_send_buf_count) == 0)) { + ib_conn->state = ISER_CONN_DOWN; + wake_up_interruptible(&ib_conn->wait); + } +} + +static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) +{ + int ret = 0; + + iser_err("event %d conn %p id %p\n",event->event,cma_id->context,cma_id); + + switch (event->event) { + case RDMA_CM_EVENT_ADDR_RESOLVED: + iser_addr_handler(cma_id); + break; + case RDMA_CM_EVENT_ROUTE_RESOLVED: + iser_route_handler(cma_id); + break; + case RDMA_CM_EVENT_ESTABLISHED: + iser_connected_handler(cma_id); + break; + case RDMA_CM_EVENT_ADDR_ERROR: + case RDMA_CM_EVENT_ROUTE_ERROR: + case RDMA_CM_EVENT_CONNECT_ERROR: + case RDMA_CM_EVENT_UNREACHABLE: + case RDMA_CM_EVENT_REJECTED: + iser_err("event: %d, error: %d\n", event->event, event->status); + iser_connect_error(cma_id); + break; + case RDMA_CM_EVENT_DISCONNECTED: + iser_disconnected_handler(cma_id); + break; + case RDMA_CM_EVENT_DEVICE_REMOVAL: + BUG(); + break; + case RDMA_CM_EVENT_CONNECT_RESPONSE: + BUG(); + break; + case RDMA_CM_EVENT_CONNECT_REQUEST: + default: + break; + } + return ret; +} + +int 
iser_conn_init(struct iser_conn **ibconn) +{ + struct iser_conn *ib_conn; + + ib_conn = kzalloc(sizeof *ib_conn, GFP_KERNEL); + if (!ib_conn) { + iser_err("can't alloc memory for struct iser_conn\n"); + return -ENOMEM; + } + ib_conn->state = ISER_CONN_INIT; + init_waitqueue_head(&ib_conn->wait); + atomic_set(&ib_conn->post_recv_buf_count, 0); + atomic_set(&ib_conn->post_send_buf_count, 0); + INIT_WORK(&ib_conn->comperror_work, iser_comp_error_worker, + ib_conn); + INIT_LIST_HEAD(&ib_conn->conn_list); + spin_lock_init(&ib_conn->lock); + + *ibconn = ib_conn; + return 0; +} + + /** + * starts the process of connecting to the target + * sleeps untill the connection is established or rejected + */ +int iser_connect(struct iser_conn *ib_conn, + struct sockaddr_in *src_addr, + struct sockaddr_in *dst_addr, + int non_blocking) +{ + struct sockaddr *src, *dst; + int err = 0; + + sprintf(ib_conn->name,"%d.%d.%d.%d:%d", + NIPQUAD(dst_addr->sin_addr.s_addr), dst_addr->sin_port); + + /* the device is known only --after-- address resolution */ + ib_conn->device = NULL; + + iser_err("connecting to: %d.%d.%d.%d, port 0x%x\n", + NIPQUAD(dst_addr->sin_addr), dst_addr->sin_port); + + ib_conn->state = ISER_CONN_PENDING; + + ib_conn->cma_id = rdma_create_id(iser_cma_handler, + (void *)ib_conn, + RDMA_PS_TCP); + if (IS_ERR(ib_conn->cma_id)) { + err = PTR_ERR(ib_conn->cma_id); + iser_err("rdma_create_id failed: %d\n", err); + goto id_failure; + } + + src = (struct sockaddr *)src_addr; + dst = (struct sockaddr *)dst_addr; + err = rdma_resolve_addr(ib_conn->cma_id, src, dst, 1000); + if (err) { + iser_err("rdma_resolve_addr failed: %d\n", err); + goto addr_failure; + } + + if (!non_blocking) { + wait_event_interruptible(ib_conn->wait, + (ib_conn->state != ISER_CONN_PENDING)); + + if (ib_conn->state != ISER_CONN_UP) { + err = -EIO; + goto connect_failure; + } + } + + mutex_lock(&ig.connlist_mutex); + list_add(&ib_conn->conn_list, &ig.connlist); + mutex_unlock(&ig.connlist_mutex); + return 
0; + +id_failure: + ib_conn->cma_id = NULL; +addr_failure: + ib_conn->state = ISER_CONN_DOWN; +connect_failure: + iser_conn_release(ib_conn); + return err; +} + +/** + * Frees all conn objects and deallocs conn descriptor + */ +void iser_conn_release(struct iser_conn *ib_conn) +{ + struct iser_device *device = ib_conn->device; + + BUG_ON(ib_conn->state != ISER_CONN_DOWN); + + mutex_lock(&ig.connlist_mutex); + list_del(&ib_conn->conn_list); + mutex_unlock(&ig.connlist_mutex); + + iser_free_ib_conn_res(ib_conn); + ib_conn->device = NULL; + /* on EVENT_ADDR_ERROR there's no device yet for this conn */ + if (device != NULL) + iser_device_try_release(device); + kfree(ib_conn); +} + + +/** + * iser_reg_page_vec - Register physical memory + * + * returns: 0 on success, errno code on failure + */ +int iser_reg_page_vec(struct iser_conn *ib_conn, + struct iser_page_vec *page_vec, + struct iser_mem_reg *mem_reg) +{ + struct ib_pool_fmr *mem; + u64 io_addr; + u64 *page_list; + int status; + + page_list = page_vec->pages; + io_addr = page_list[0]; + + mem = ib_fmr_pool_map_phys(ib_conn->fmr_pool, + page_list, + page_vec->length, + &io_addr); + + if (IS_ERR(mem)) { + status = (int)PTR_ERR(mem); + iser_err("ib_fmr_pool_map_phys failed: %d\n", status); + return status; + } + + mem_reg->lkey = mem->fmr->lkey; + mem_reg->rkey = mem->fmr->rkey; + mem_reg->len = page_vec->length * PAGE_SIZE; + mem_reg->va = io_addr; + mem_reg->mem_h = (void *)mem; + + mem_reg->va += page_vec->offset; + mem_reg->len = page_vec->data_size; + + iser_dbg("PHYSICAL Mem.register, [PHYS p_array: 0x%p, sz: %d, " + "entry[0]: (0x%08lx,%ld)] -> " + "[lkey: 0x%08X mem_h: 0x%p va: 0x%08lX sz: %ld]\n", + page_vec, page_vec->length, + (unsigned long)page_vec->pages[0], + (unsigned long)page_vec->data_size, + (unsigned int)mem_reg->lkey, mem_reg->mem_h, + (unsigned long)mem_reg->va, (unsigned long)mem_reg->len); + return 0; +} + +/** + * Unregister (previosuly registered) memory. 
+ */ +void iser_unreg_mem(struct iser_mem_reg *reg) +{ + int ret; + + iser_dbg("PHYSICAL Mem.Unregister mem_h %p\n",reg->mem_h); + + ret = ib_fmr_pool_unmap((struct ib_pool_fmr *)reg->mem_h); + if (ret) + iser_err("ib_fmr_pool_unmap failed %d\n", ret); + + reg->mem_h = NULL; +} + +/** + * iser_dto_to_iov - builds IOV from a dto descriptor + */ +static void iser_dto_to_iov(struct iser_dto *dto, struct ib_sge *iov, int iov_len) +{ + int i; + struct ib_sge *sge; + struct iser_regd_buf *regd_buf; + + if (dto->regd_vector_len > iov_len) { + iser_err("iov size %d too small for posting dto of len %d\n", + iov_len, dto->regd_vector_len); + BUG(); + } + + for (i = 0; i < dto->regd_vector_len; i++) { + sge = &iov[i]; + regd_buf = dto->regd[i]; + + sge->addr = regd_buf->reg.va; + sge->length = regd_buf->reg.len; + sge->lkey = regd_buf->reg.lkey; + + if (dto->used_sz[i] > 0) /* Adjust size */ + sge->length = dto->used_sz[i]; + + /* offset and length should not exceed the regd buf length */ + if (sge->length + dto->offset[i] > regd_buf->reg.len) { + iser_err("Used len:%ld + offset:%d, exceed reg.buf.len:" + "%ld in dto:0x%p [%d], va:0x%08lX\n", + (unsigned long)sge->length, dto->offset[i], + (unsigned long)regd_buf->reg.len, dto, i, + (unsigned long)sge->addr); + BUG(); + } + + sge->addr += dto->offset[i]; /* Adjust offset */ + } +} + +/** + * iser_post_recv - Posts a receive buffer. 
+ * + * returns 0 on success, -1 on failure + */ +int iser_post_recv(struct iser_desc *rx_desc) +{ + int ib_ret, ret_val = 0; + struct ib_recv_wr recv_wr, *recv_wr_failed; + struct ib_sge iov[2]; + struct iser_conn *ib_conn; + struct iser_dto *recv_dto = &rx_desc->dto; + + /* Retrieve conn */ + ib_conn = recv_dto->conn->ib_conn; + + iser_dto_to_iov(recv_dto, iov, 2); + + recv_wr.next = NULL; + recv_wr.sg_list = iov; + recv_wr.num_sge = recv_dto->regd_vector_len; + recv_wr.wr_id = (unsigned long)rx_desc; + + atomic_inc(&ib_conn->post_recv_buf_count); + ib_ret = ib_post_recv(ib_conn->qp, &recv_wr, &recv_wr_failed); + if (ib_ret) { + iser_err("ib_post_recv failed ret=%d\n", ib_ret); + atomic_dec(&ib_conn->post_recv_buf_count); + ret_val = -1; + } + + return ret_val; +} + +/** + * iser_start_send - Initiate a Send DTO operation + * + * returns 0 on success, -1 on failure + */ +int iser_post_send(struct iser_desc *tx_desc) +{ + int ib_ret, ret_val = 0; + struct ib_send_wr send_wr, *send_wr_failed; + struct ib_sge iov[MAX_REGD_BUF_VECTOR_LEN]; + struct iser_conn *ib_conn; + struct iser_dto *dto = &tx_desc->dto; + + ib_conn = dto->conn->ib_conn; + + iser_dto_to_iov(dto, iov, MAX_REGD_BUF_VECTOR_LEN); + + send_wr.next = NULL; + send_wr.wr_id = (unsigned long)tx_desc; + send_wr.sg_list = iov; + send_wr.num_sge = dto->regd_vector_len; + send_wr.opcode = IB_WR_SEND; + send_wr.send_flags = dto->notify_enable ? 
IB_SEND_SIGNALED : 0; + + atomic_inc(&ib_conn->post_send_buf_count); + + ib_ret = ib_post_send(ib_conn->qp, &send_wr, &send_wr_failed); + if (ib_ret) { + iser_err("Failed to start SEND DTO, dto: 0x%p, IOV len: %d\n", + dto, dto->regd_vector_len); + iser_err("ib_post_send failed, ret:%d\n", ib_ret); + atomic_dec(&ib_conn->post_send_buf_count); + ret_val = -1; + } + + return ret_val; +} + +static void iser_comp_error_worker(void *data) +{ + struct iser_conn *ib_conn = data; + + /* getting here when the state is UP means that the conn is being * + * terminated asynchronously from the iSCSI layer's perspective. */ + if (iser_conn_state_comp_exch(ib_conn, ISER_CONN_UP, + ISER_CONN_TERMINATING)) + iscsi_conn_failure(ib_conn->iser_conn->iscsi_conn, + ISCSI_ERR_CONN_FAILED); + + /* complete the termination process if disconnect event was delivered * + * note there are no more non completed posts to the QP */ + if (ib_conn->disc_evt_flag) { + ib_conn->state = ISER_CONN_DOWN; + wake_up_interruptible(&ib_conn->wait); + } +} + +static void iser_handle_comp_error(struct iser_desc *desc) +{ + struct iser_dto *dto = &desc->dto; + struct iser_conn *ib_conn = dto->conn->ib_conn; + + iser_dto_buffs_release(dto); + + if (desc->type == ISCSI_RX) { + kfree(desc->data); + kmem_cache_free(ig.desc_cache, desc); + atomic_dec(&ib_conn->post_recv_buf_count); + } else { /* type is TX control/command/dataout */ + if (desc->type == ISCSI_TX_DATAOUT) + kmem_cache_free(ig.desc_cache, desc); + atomic_dec(&ib_conn->post_send_buf_count); + } + + if (atomic_read(&ib_conn->post_recv_buf_count) == 0 && + atomic_read(&ib_conn->post_send_buf_count) == 0) + schedule_work(&ib_conn->comperror_work); +} + +static void iser_cq_tasklet_fn(unsigned long data) +{ + struct iser_device *device = (struct iser_device *)data; + struct ib_cq *cq = device->cq; + struct ib_wc wc; + struct iser_desc *desc; + unsigned long xfer_len; + + while (ib_poll_cq(cq, 1, &wc) == 1) { + desc = (struct iser_desc *) (unsigned long) 
wc.wr_id; + BUG_ON(desc == NULL); + + if (wc.status == IB_WC_SUCCESS) { + if (desc->type == ISCSI_RX) { + xfer_len = (unsigned long)wc.byte_len; + iser_rcv_completion(desc, xfer_len); + } else /* type == ISCSI_TX_CONTROL/SCSI_CMD/DOUT */ + iser_snd_completion(desc); + } else { + iser_err("comp w. error op %d status %d\n",desc->type,wc.status); + iser_handle_comp_error(desc); + } + } + /* #warning "it is assumed here that arming CQ only once its empty" * + * " would not cause interrupts to be missed" */ + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); +} + +static void iser_cq_callback(struct ib_cq *cq, void *cq_context) +{ + struct iser_device *device = (struct iser_device *)cq_context; + + tasklet_schedule(&device->cq_tasklet); +} From ogerlitz at voltaire.com Thu May 11 00:03:08 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 11 May 2006 10:03:08 +0300 (IDT) Subject: [openib-general] [PATCH 5/6] iSER handling of memory for RDMA In-Reply-To: Message-ID: This file contains the processing carried over an SG list associated with a SCSI command such that it can be registered with the IB verbs. The registration produces a network virtual address (VA) and a remote access key (RKEY or STAG in iSER spec notation) which are used by the target for its RDMA operation. Signed-off-by: Or Gerlitz --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/iser_memory.c 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/iser_memory.c 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,401 @@ +/* + * Copyright (c) 2004, 2005, 2006 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: iser_memory.c 6964 2006-05-07 11:11:43Z ogerlitz $ + */ +#include +#include +#include +#include +#include +#include +#include + +#include "iscsi_iser.h" + +#define ISER_KMALLOC_THRESHOLD 0x20000 /* 128K - kmalloc limit */ +/** + * Decrements the reference count for the + * registered buffer & releases it + * + * returns 0 if released, 1 if deferred + */ +int iser_regd_buff_release(struct iser_regd_buf *regd_buf) +{ + struct device *dma_device; + + if ((atomic_read(&regd_buf->ref_count) == 0) || + atomic_dec_and_test(&regd_buf->ref_count)) { + /* if we used the dma mr, unreg is just NOP */ + if (regd_buf->reg.rkey != 0) + iser_unreg_mem(&regd_buf->reg); + + if (regd_buf->dma_addr) { + dma_device = regd_buf->device->ib_device->dma_device; + dma_unmap_single(dma_device, + regd_buf->dma_addr, + regd_buf->data_size, + regd_buf->direction); + } + /* else this regd buf is associated with task which we */ + /* dma_unmap_single/sg later */ + return 0; + } else { + iser_dbg("Release deferred, regd.buff: 0x%p\n", regd_buf); + return 1; + } +} + +/** + * iser_reg_single - fills registered buffer descriptor with + * registration information + */ +void iser_reg_single(struct iser_device *device, + struct iser_regd_buf *regd_buf, + enum dma_data_direction direction) +{ + dma_addr_t dma_addr; + + dma_addr = dma_map_single(device->ib_device->dma_device, + regd_buf->virt_addr, + regd_buf->data_size, direction); + BUG_ON(dma_mapping_error(dma_addr)); + + regd_buf->reg.lkey = device->mr->lkey; + regd_buf->reg.rkey = 0; /* indicate there's no need to unreg */ + regd_buf->reg.len = regd_buf->data_size; + regd_buf->reg.va = dma_addr; + + regd_buf->dma_addr = dma_addr; + regd_buf->direction = direction; +} + +/** + * iser_start_rdma_unaligned_sg + */ +int iser_start_rdma_unaligned_sg(struct iscsi_iser_cmd_task *iser_ctask, + enum iser_data_dir cmd_dir) +{ + int dma_nents; + struct device *dma_device; + char *mem = NULL; + struct iser_data_buf *data = &iser_ctask->data[cmd_dir]; +
unsigned long cmd_data_len = data->data_len; + + if (cmd_data_len > ISER_KMALLOC_THRESHOLD) + mem = (void *)__get_free_pages(GFP_KERNEL, + long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT); + else + mem = kmalloc(cmd_data_len, GFP_KERNEL); + + if (mem == NULL) { + iser_err("Failed to allocate mem size %d %d for copying sglist\n", + data->size,(int)cmd_data_len); + return -ENOMEM; + } + + if (cmd_dir == ISER_DIR_OUT) { + /* copy the unaligned sg the buffer which is used for RDMA */ + struct scatterlist *sg = (struct scatterlist *)data->buf; + int i; + char *p, *from; + + for (p = mem, i = 0; i < data->size; i++) { + from = kmap_atomic(sg[i].page, KM_USER0); + memcpy(p, + from + sg[i].offset, + sg[i].length); + kunmap_atomic(from, KM_USER0); + p += sg[i].length; + } + } + + sg_init_one(&iser_ctask->data_copy[cmd_dir].sg_single, mem, cmd_data_len); + iser_ctask->data_copy[cmd_dir].buf = + &iser_ctask->data_copy[cmd_dir].sg_single; + iser_ctask->data_copy[cmd_dir].size = 1; + + iser_ctask->data_copy[cmd_dir].copy_buf = mem; + + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + + if (cmd_dir == ISER_DIR_OUT) + dma_nents = dma_map_sg(dma_device, + &iser_ctask->data_copy[cmd_dir].sg_single, + 1, DMA_TO_DEVICE); + else + dma_nents = dma_map_sg(dma_device, + &iser_ctask->data_copy[cmd_dir].sg_single, + 1, DMA_FROM_DEVICE); + + BUG_ON(dma_nents == 0); + + iser_ctask->data_copy[cmd_dir].dma_nents = dma_nents; + return 0; +} + +/** + * iser_finalize_rdma_unaligned_sg + */ +void iser_finalize_rdma_unaligned_sg(struct iscsi_iser_cmd_task *iser_ctask, + enum iser_data_dir cmd_dir) +{ + struct device *dma_device; + struct iser_data_buf *mem_copy; + unsigned long cmd_data_len; + + dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device; + mem_copy = &iser_ctask->data_copy[cmd_dir]; + + if (cmd_dir == ISER_DIR_OUT) + dma_unmap_sg(dma_device, &mem_copy->sg_single, 1, + DMA_TO_DEVICE); + else + dma_unmap_sg(dma_device, 
&mem_copy->sg_single, 1, + DMA_FROM_DEVICE); + + if (cmd_dir == ISER_DIR_IN) { + char *mem; + struct scatterlist *sg; + unsigned char *p, *to; + unsigned int sg_size; + int i; + + /* copy back read RDMA to unaligned sg */ + mem = mem_copy->copy_buf; + + sg = (struct scatterlist *)iser_ctask->data[ISER_DIR_IN].buf; + sg_size = iser_ctask->data[ISER_DIR_IN].size; + + for (p = mem, i = 0; i < sg_size; i++){ + to = kmap_atomic(sg[i].page, KM_SOFTIRQ0); + memcpy(to + sg[i].offset, + p, + sg[i].length); + kunmap_atomic(to, KM_SOFTIRQ0); + p += sg[i].length; + } + } + + cmd_data_len = iser_ctask->data[cmd_dir].data_len; + + if (cmd_data_len > ISER_KMALLOC_THRESHOLD) + free_pages((unsigned long)mem_copy->copy_buf, + long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT); + else + kfree(mem_copy->copy_buf); + + mem_copy->copy_buf = NULL; +} + +/** + * iser_sg_to_page_vec - Translates scatterlist entries to physical addresses + * and returns the length of resulting physical address array (may be less than + * the original due to possible compaction). + * + * we build a "page vec" under the assumption that the SG meets the RDMA + * alignment requirements. Other then the first and last SG elements, all + * the "internal" elements can be compacted into a list whose elements are + * dma addresses of physical pages. The code supports also the weird case + * where --few fragments of the same page-- are present in the SG as + * consecutive elements. Also, it handles one entry SG. 
+ */ +static int iser_sg_to_page_vec(struct iser_data_buf *data, + struct iser_page_vec *page_vec) +{ + struct scatterlist *sg = (struct scatterlist *)data->buf; + dma_addr_t first_addr, last_addr, page; + int start_aligned, end_aligned; + unsigned int cur_page = 0; + unsigned long total_sz = 0; + int i; + + /* compute the offset of first element */ + page_vec->offset = (u64) sg[0].offset; + + for (i = 0; i < data->dma_nents; i++) { + total_sz += sg_dma_len(&sg[i]); + + first_addr = sg_dma_address(&sg[i]); + last_addr = first_addr + sg_dma_len(&sg[i]); + + start_aligned = !(first_addr & ~PAGE_MASK); + end_aligned = !(last_addr & ~PAGE_MASK); + + /* continue to collect page fragments till aligned or SG ends */ + while (!end_aligned && (i + 1 < data->dma_nents)) { + i++; + total_sz += sg_dma_len(&sg[i]); + last_addr = sg_dma_address(&sg[i]) + sg_dma_len(&sg[i]); + end_aligned = !(last_addr & ~PAGE_MASK); + } + + first_addr = first_addr & PAGE_MASK; + + for (page = first_addr; page < last_addr; page += PAGE_SIZE) + page_vec->pages[cur_page++] = page; + + } + page_vec->data_size = total_sz; + iser_dbg("page_vec->data_size:%d cur_page %d\n", page_vec->data_size,cur_page); + return cur_page; +} + +#define MASK_4K ((1UL << 12) - 1) /* 0xFFF */ +#define IS_4K_ALIGNED(addr) ((((unsigned long)addr) & MASK_4K) == 0) + +/** + * iser_data_buf_aligned_len - Tries to determine the maximal correctly aligned + * for RDMA sub-list of a scatter-gather list of memory buffers, and returns + * the number of entries which are aligned correctly. Supports the case where + * consecutive SG elements are actually fragments of the same physcial page. 
+ */ +static unsigned int iser_data_buf_aligned_len(struct iser_data_buf *data) +{ + struct scatterlist *sg; + dma_addr_t end_addr, next_addr; + int i, cnt; + unsigned int ret_len = 0; + + sg = (struct scatterlist *)data->buf; + + for (cnt = 0, i = 0; i < data->dma_nents; i++, cnt++) { + /* iser_dbg("Checking sg iobuf [%d]: phys=0x%08lX " + "offset: %ld sz: %ld\n", i, + (unsigned long)page_to_phys(sg[i].page), + (unsigned long)sg[i].offset, + (unsigned long)sg[i].length); */ + end_addr = sg_dma_address(&sg[i]) + + sg_dma_len(&sg[i]); + /* iser_dbg("Checking sg iobuf end address " + "0x%08lX\n", end_addr); */ + if (i + 1 < data->dma_nents) { + next_addr = sg_dma_address(&sg[i+1]); + /* are i, i+1 fragments of the same page? */ + if (end_addr == next_addr) + continue; + else if (!IS_4K_ALIGNED(end_addr)) { + ret_len = cnt + 1; + break; + } + } + } + if (i == data->dma_nents) + ret_len = cnt; /* loop ended */ + iser_dbg("Found %d aligned entries out of %d in sg:0x%p\n", + ret_len, data->dma_nents, data); + return ret_len; +} + +static void iser_data_buf_dump(struct iser_data_buf *data) +{ + struct scatterlist *sg = (struct scatterlist *)data->buf; + int i; + + for (i = 0; i < data->size; i++) + iser_err("sg[%d] dma_addr:0x%lX page:0x%p " + "off:%d sz:%d dma_len:%d\n", + i, (unsigned long)sg_dma_address(&sg[i]), + sg[i].page, sg[i].offset, + sg[i].length,sg_dma_len(&sg[i])); +} + +static void iser_dump_page_vec(struct iser_page_vec *page_vec) +{ + int i; + + iser_err("page vec length %d data size %d\n", + page_vec->length, page_vec->data_size); + for (i = 0; i < page_vec->length; i++) + iser_err("%d %lx\n",i,(unsigned long)page_vec->pages[i]); +} + +static void iser_page_vec_build(struct iser_data_buf *data, + struct iser_page_vec *page_vec) +{ + int page_vec_len = 0; + + page_vec->length = 0; + page_vec->offset = 0; + + iser_dbg("Translating sg sz: %d\n", data->dma_nents); + page_vec_len = iser_sg_to_page_vec(data,page_vec); + iser_dbg("sg len %d page_vec_len %d\n", 
data->dma_nents,page_vec_len); + + page_vec->length = page_vec_len; + + if (page_vec_len * PAGE_SIZE < page_vec->data_size) { + iser_err("page_vec too short to hold this SG\n"); + iser_data_buf_dump(data); + iser_dump_page_vec(page_vec); + BUG(); + } +} + +/** + * iser_reg_rdma_mem - Registers memory intended for RDMA, + * obtaining rkey and va + * + * returns 0 on success, errno code on failure + */ +int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, + enum iser_data_dir cmd_dir) +{ + struct iser_conn *ib_conn = iser_ctask->iser_conn->ib_conn; + struct iser_data_buf *mem = &iser_ctask->data[cmd_dir]; + struct iser_regd_buf *regd_buf; + int aligned_len; + int err; + + regd_buf = &iser_ctask->rdma_regd[cmd_dir]; + + aligned_len = iser_data_buf_aligned_len(mem); + if (aligned_len != mem->size) { + iser_err("rdma alignment violation %d/%d aligned\n", + aligned_len, mem->size); + iser_data_buf_dump(mem); + /* allocate copy buf, if we are writing, copy the */ + /* unaligned scatterlist, dma map the copy */ + if (iser_start_rdma_unaligned_sg(iser_ctask, cmd_dir) != 0) + return -ENOMEM; + mem = &iser_ctask->data_copy[cmd_dir]; + } + + iser_page_vec_build(mem, ib_conn->page_vec); + err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg); + if (err) + return err; + + /* take a reference on this regd buf such that it will not be released * + * (eg in send dto completion) before we get the scsi response */ + atomic_inc(&regd_buf->ref_count); + return 0; +} From ogerlitz at voltaire.com Thu May 11 00:03:30 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 11 May 2006 10:03:30 +0300 (IDT) Subject: [openib-general] [PATCH 6/6] iSER Kconfig and Makefile In-Reply-To: Message-ID: Kconfig and Makefile for iSER.
Signed-off-by: Or Gerlitz --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Kconfig 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/Kconfig 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,12 @@ +config INFINIBAND_ISER + tristate "ISCSI RDMA Protocol" + depends on INFINIBAND && SCSI + select SCSI_ISCSI_ATTRS + ---help--- + + Support for the ISCSI RDMA Protocol over InfiniBand. This + allows you to access storage devices that speak ISER/ISCSI + over InfiniBand. + + The ISER protocol is defined by IETF. + See . --- /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Makefile 1970-01-01 02:00:00.000000000 +0200 +++ /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser/Makefile 2006-05-10 15:32:01.000000000 +0300 @@ -0,0 +1,8 @@ +EXTRA_CFLAGS += -Idrivers/infiniband/include + +obj-$(CONFIG_INFINIBAND_ISER) += ib_iser.o + +ib_iser-y := iser_verbs.o \ + iser_initiator.o \ + iser_memory.o \ + iscsi_iser.o From or.gerlitz at gmail.com Thu May 11 00:33:09 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 11 May 2006 10:33:09 +0300 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> Message-ID: <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> On 5/11/06, Roland Dreier wrote: > Or> OK., I see now that as of few hours ago the second iscsi > Or> update for 2.6.18 was commited there which means iser should > Or> compile with it, you can go ahead pull it!
> > Great, I've got it. Can you resend the iSER patches with changelog > entries for each patch and a Signed-off-by: line too? I don't see either of the two iscsi updates for 2.6.18 (both sent by Mike Christie) in your git tree; I was looking for them all over (in the for-2.6.18, for-mm, master, for-linus branches ...). Am I missing anything, or were you waiting for my repost of the patches to pull the iscsi updates? Anyway, just to make sure, the IB updates for 2.6.18 will be present in the for-2.6.18 branch of your git tree, correct? Or.
From mst at mellanox.co.il Thu May 11 00:44:36 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 10:44:36 +0300 Subject: [openib-general] Re: rdma_cm.h: comment nits. In-Reply-To: <1147295357.5093.62.camel@trinity.ogc.int> References: <20060508135855.GE21036@mellanox.co.il> <1147231080.5093.13.camel@trinity.ogc.int> <44621449.9080507@ichips.intel.com> <1147295357.5093.62.camel@trinity.ogc.int> Message-ID: <20060511074435.GJ10669@mellanox.co.il> Quoting r. Tom Tucker : > So... all that said, I could in fact support rdma_reject on an active > side connection. But this would effectively reduce to a QP --> ERROR and > I doubt this matches the semantics you're looking for. Why not? Sounds good. -- MST From mst at mellanox.co.il Thu May 11 00:54:32 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 10:54:32 +0300 Subject: [openib-general] Re: rdma_cm.h: comment nits. In-Reply-To: <44622D64.1040800@ichips.intel.com> References: <20060508135855.GE21036@mellanox.co.il> <1147231080.5093.13.camel@trinity.ogc.int> <44621449.9080507@ichips.intel.com> <20060510163523.GK22825@mellanox.co.il> <44622D64.1040800@ichips.intel.com> Message-ID: <20060511075432.GK10669@mellanox.co.il> Quoting r. Sean Hefty : > The intent is to keep connection establishment simple. Socket users are > used to calling connect on the active side, and listen/accept on the > passive side. The RDMA CM interface is consistent with that. Sure. But you'll notice ESTABLISHED is something we have invented to map better to an asynchronous API. It's purely by choice that the same event reports both active and passive connections. All I am saying is that two different events for the active and passive sides would be better. It could be RESPONSE/ESTABLISHED or ACTIVE_ESTABLISHED/PASSIVE_ESTABLISHED, or whatever. What SDP does (looking at the response private data and taking reject/accept decisions based on it) is, I think, both portable and needed by other ULPs.
Note we don't need to pass any private data in the reject. -- MST From mst at mellanox.co.il Thu May 11 00:56:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 10:56:08 +0300 Subject: [openib-general] Re: rdma_cm.h: comment nits. In-Reply-To: <1147308499.5093.85.camel@trinity.ogc.int> References: <54AD0F12E08D1541B826BE97C98F99F149EF5A@NT-SJCA-0751.brcm.ad.broadcom.com> <1147308499.5093.85.camel@trinity.ogc.int> Message-ID: <20060511075608.GL10669@mellanox.co.il> Quoting r. Tom Tucker : > Subject: RE: rdma_cm.h: comment nits. > > On Wed, 2006-05-10 at 14:20 -0700, Caitlin Bestler wrote: > > Tom Tucker wrote: > > > > > > > > So... all that said, I could in fact support rdma_reject on > > > an active side connection. But this would effectively reduce > > > to a QP --> ERROR and I doubt this matches the semantics > > > you're looking for. > > > > > > > > > > And you could send an RST. > > Yep, in fact that's what many RNIC's do when you move the QP to ERROR > instead of CLOSING. > > > There's just no way to send any user > > supplied private data. It's not just unreliable, it's guaranteed > > not to arrive. It's still a long way from the truly desired > > semantics, but the wire protocol just doesn't carry that info. > > > > Yeah, I think you're correct -- it would be a bogus "emulation". I don't think any real ULP passes private data inside the Reject. Private data in response (SYN/ACK) is clearly portable, is it not? 
-- MST From bugzilla-daemon at openib.org Thu May 11 02:03:45 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 11 May 2006 02:03:45 -0700 (PDT) Subject: [openib-general] [Bug 78] OFED 1.0 RC 4 iser install fails if patches already applied Message-ID: <20060511090345.447362283EB@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=78 danb at voltaire.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From halr at voltaire.com Thu May 11 03:57:32 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 May 2006 06:57:32 -0400 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: References: <1147310565.4485.56947.camel@hal.voltaire.com> Message-ID: <1147345049.4485.68071.camel@hal.voltaire.com> On Thu, 2006-05-11 at 00:56, Roland Dreier wrote: > Hal> Huh ? In this case, aren't the subnet prefixes are required > Hal> to be different ? > > It's kind of a crazy thing to do but I don't see anything in the IB > spec that forbids two subnets with the same subnet prefix, There's errata against the current confusion in the IBA spec in terms of GID v. subnet prefix. The bottom line on this is: Each subnet is uniquely identified with a subnet ID known as the Subnet Prefix. > or any reason why a router couldn't route between them. The SMs would just > have to be smart enough to return the LID of the router for paths to > ports on the other subnet, and the routers would have to have explicit > routes rather than forwarding based on just GID prefix. Assuming the above is ignored (and the subnet prefixes are not unique), the routers along any particular path would just have explicit routes for one of these duplicate subnets, right ? -- Hal > - R. 
From halr at voltaire.com Thu May 11 04:20:19 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 May 2006 07:20:19 -0400 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <20060511054803.GE26684@obsidianresearch.com> References: <1147310565.4485.56947.camel@hal.voltaire.com> <20060511054803.GE26684@obsidianresearch.com> Message-ID: <1147346418.4485.68543.camel@hal.voltaire.com> On Thu, 2006-05-11 at 01:48, Jason Gunthorpe wrote: > On Wed, May 10, 2006 at 09:56:58PM -0700, Roland Dreier wrote: > > Hal> Huh ? In this case, aren't the subnet prefixes are required > > Hal> to be different ? > > > > It's kind of a crazy thing to do but I don't see anything in the IB > > spec that forbids two subnets with the same subnet prefix, or any > > reason why a router couldn't route between them. The SMs would just > > have to be smart enough to return the LID of the router for paths to > > ports on the other subnet, and the routers would have to have explicit > > routes rather than forwarding based on just GID prefix. > > Hmm, this is an interesting point, you can do this in IP land using > host routes. > > How about this - the Path record (and related) SA responses include > the Hop Limit fields and the spec says: > > 8.3.6 Hop Limit: [..] Setting this value to 0 or 1 will ensure that > the packet will not be forwarded beyond the local subnet. > > So, it is within the spec to use HopLmt >= 2 as the GRH required flag. That would be a simpler check but HopLimit is not a required component of PathRecord but I think this may not be sufficient as just because a HopLimit >= 2 doesn't mean that a packet would be forwarded off subnet. > I'd propose that the combination of a non-link-local prefix and a >= 2 > Hop Limit should force a GRH. SM's that do not support routers should > always fill in 0 for HopLmt. Why is a request with just a non link local prefix (with HopLimit wildcarded) not sufficient ? 
-- Hal > Jason From eitan at mellanox.co.il Thu May 11 04:58:58 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 11 May 2006 14:58:58 +0300 Subject: [openib-general] question regarding GRH flag in ib_ah_attr Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB57@mtlexch01.mtl.com> I agree with Hal. If you look for Path Record to ANOTHER subnet you should provide the GRH in the sent packet address ... Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Hal Rosenstock > Sent: Thursday, May 11, 2006 2:20 PM > To: Jason Gunthorpe > Cc: Roland Dreier; openib-general at openib.org > Subject: Re: [openib-general] question regarding GRH flag in ib_ah_attr > > On Thu, 2006-05-11 at 01:48, Jason Gunthorpe wrote: > > On Wed, May 10, 2006 at 09:56:58PM -0700, Roland Dreier wrote: > > > Hal> Huh ? In this case, aren't the subnet prefixes are required > > > Hal> to be different ? > > > > > > It's kind of a crazy thing to do but I don't see anything in the IB > > > spec that forbids two subnets with the same subnet prefix, or any > > > reason why a router couldn't route between them. The SMs would just > > > have to be smart enough to return the LID of the router for paths to > > > ports on the other subnet, and the routers would have to have explicit > > > routes rather than forwarding based on just GID prefix. > > > > Hmm, this is an interesting point, you can do this in IP land using > > host routes. > > > > How about this - the Path record (and related) SA responses include > > the Hop Limit fields and the spec says: > > > > 8.3.6 Hop Limit: [..] Setting this value to 0 or 1 will ensure that > > the packet will not be forwarded beyond the local subnet. > > > > So, it is within the spec to use HopLmt >= 2 as the GRH required flag.
> > That would be a simpler check but HopLimit is not a required component > of PathRecord but I think this may not be sufficient as just because a > HopLimit >= 2 doesn't mean that a packet would be forwarded off subnet. > > > I'd propose that the combination of a non-link-local prefix and a >= 2 > > Hop Limit should force a GRH. SM's that do not support routers should > > always fill in 0 for HopLmt. > > Why is a request with just a non link local prefix (with HopLimit > wildcarded) not sufficient ? > > -- Hal > > > Jason From Thomas.Talpey at netapp.com Thu May 11 05:44:13 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Thu, 11 May 2006 08:44:13 -0400 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <1147307540.5093.71.camel@trinity.ogc.int> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> <7.0.1.0.2.20060510063730.04336f80@netapp.com> <1147307540.5093.71.camel@trinity.ogc.int> Message-ID: <7.0.1.0.2.20060511084302.04592e18@netapp.com> I certainly won't shoot you - I agree. The other risk of the current FMRs is that people will think the "F" means "Fast". Tom. At 08:32 PM 5/10/2006, Tom Tucker wrote: >On Wed, 2006-05-10 at 08:53 -0700, Roland Dreier wrote: >> Thomas> I am planning to test this some more in the next few >> Thomas> weeks, but what I'd really like to see is an IBTA >> Thomas> 1.2-compliant implementation, and one that operated on >> Thomas> work queue entries (not synchronous verbs). Is that being >> Thomas> worked on? >> >> No current hardware supports that as far as I know.
(Well, ipath >> could fake it since they already implement all the verbs in software) >> > >I'm almost certain I'll be shot for saying this, but isn't there a >danger of confusion with real FMRs when the HW shows up? If the benefit >isn't there -- why do it if the application outcomes are almost >certainly all bad? From jlentini at netapp.com Thu May 11 06:11:16 2006 From: jlentini at netapp.com (James Lentini) Date: Thu, 11 May 2006 09:11:16 -0400 (EDT) Subject: [openib-general] Re: [DAPL] latest DAPL cannot be compiled with the latest librdmacm In-Reply-To: <200605101652.26604.dotanb@mellanox.co.il> References: <200605101652.26604.dotanb@mellanox.co.il> Message-ID: On Wed, 10 May 2006, Dotan Barak wrote: > Hi. > > The latest DAPL cannot be compiled with the latest librdmacm after > an API change in the librdmacm. Arlin sent a patch for this. I'll test it out and update DAPL by tomorrow at the latest (I'm out of the office today, I might have some time tonight). james From halr at voltaire.com Thu May 11 06:09:57 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 May 2006 09:09:57 -0400 Subject: [openib-general] Re: [PATCH] opensm: complib: cleanup unused cl_obj files In-Reply-To: <20060510171820.31351.20465.stgit@sashak.voltaire.com> References: <20060510171820.31351.20465.stgit@sashak.voltaire.com> Message-ID: <1147352994.4485.70526.camel@hal.voltaire.com> On Wed, 2006-05-10 at 13:18, Sasha Khapyorsky wrote: > Cleanup unused cl_obj source and header files from complib. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. 
-- Hal From halr at voltaire.com Thu May 11 06:18:35 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 May 2006 09:18:35 -0400 Subject: [openib-general] Re: [PATCH] opensm: complib: cleanup unsused cl_reqmgr files In-Reply-To: <20060510171925.31364.89893.stgit@sashak.voltaire.com> References: <20060510171925.31364.89893.stgit@sashak.voltaire.com> Message-ID: <1147353509.4485.70696.camel@hal.voltaire.com> On Wed, 2006-05-10 at 13:19, Sasha Khapyorsky wrote: > Cleanup unused cl_reqmgr source and header files from complib. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. -- Hal From mst at mellanox.co.il Thu May 11 06:29:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 16:29:54 +0300 Subject: [openib-general][patch review] srp: fmr implementation, In-Reply-To: <1147307540.5093.71.camel@trinity.ogc.int> References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F17@mtlexch01.mtl.com> <7.0.1.0.2.20060510063730.04336f80@netapp.com> <1147307540.5093.71.camel@trinity.ogc.int> Message-ID: <20060511132954.GD555@mellanox.co.il> Quoting r. Tom Tucker : > If the benefit isn't there -- why do it if the application outcomes are almost > certainly all bad? I think I saw Vu's earlier post with numbers showing FMR performance benefits. I don't know what causes QP hangs and event hangs for Thomas - I don't think I ever saw such behaviour e.g. in SDP with or without FMRs, I've no idea what is NFS/RDMA doing differently. 
-- MST From halr at voltaire.com Thu May 11 06:24:26 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 May 2006 09:24:26 -0400 Subject: [openib-general] Re: [PATCH] opensm: complib: remove nonexsted symbols from tha map file In-Reply-To: <20060510173139.27729.43839.stgit@sashak.voltaire.com> References: <20060510173139.27729.43839.stgit@sashak.voltaire.com> Message-ID: <1147353865.4485.70819.camel@hal.voltaire.com> On Wed, 2006-05-10 at 13:31, Sasha Khapyorsky wrote: > This removes nonexistent symbols from the complib map file. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. -- Hal From glebn at voltaire.com Thu May 11 06:42:17 2006 From: glebn at voltaire.com (Gleb Natapov) Date: Thu, 11 May 2006 16:42:17 +0300 Subject: [openib-general] [resend][RFC][PATCH] adding call to madvise Message-ID: <20060511134217.GW5319@minantech.com> Hello, I sent this mail a week ago and got no response. I hope this is not because of a lack of interest in user space support :) Anyway, I am reposting it to get feedback and to find a way to include this patch in openib ASAP, rather than waiting until the madvise defines propagate to libc. ----- Forwarded message from Gleb Natapov ----- Hello Roland, The included patch adds calls to madvise(MADV_DO[NT]FORK) to libibverbs and libmthca. In libibverbs it uses memory.c to do reference counting on overlapping user registrations, and in libmthca it marks all internal qp/cq memory. The MADV_DOFORK/MADV_DONTFORK defines have not yet propagated to libc, so I added them in local header files just to be able to compile. I think the proper way to handle this is in configure. Suggestions are welcome. Note that this patch also changes the ABI, since struct ibv_mr is bigger now.
Index: libibverbs/include/infiniband/verbs.h =================================================================== --- libibverbs/include/infiniband/verbs.h (revision 7112) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -289,6 +289,8 @@ uint32_t handle; uint32_t lkey; uint32_t rkey; + void *addr; + size_t length; }; struct ibv_global_route { Index: libibverbs/src/verbs.c =================================================================== --- libibverbs/src/verbs.c (revision 7112) +++ libibverbs/src/verbs.c (working copy) @@ -154,10 +154,15 @@ { struct ibv_mr *mr; + ibv_dontfork_range(addr, length); mr = pd->context->ops.reg_mr(pd, addr, length, access); if (mr) { mr->context = pd->context; mr->pd = pd; + mr->addr = addr; + mr->length = length; + } else { + ibv_dofork_range(addr, length); } return mr; @@ -165,7 +170,12 @@ int ibv_dereg_mr(struct ibv_mr *mr) { - return mr->context->ops.dereg_mr(mr); + int rc = mr->context->ops.dereg_mr(mr); + + if (!rc) + ibv_dofork_range(mr->addr, mr->length); + + return rc; } static struct ibv_comp_channel *ibv_create_comp_channel_v2(struct ibv_context *context) Index: libibverbs/src/ibverbs.h =================================================================== --- libibverbs/src/ibverbs.h (revision 7112) +++ libibverbs/src/ibverbs.h (working copy) @@ -61,8 +61,8 @@ extern HIDDEN int ibverbs_init(struct ibv_device ***list); extern HIDDEN int ibv_init_mem_map(void); -extern HIDDEN int ibv_lock_range(void *base, size_t size); -extern HIDDEN int ibv_unlock_range(void *base, size_t size); +extern HIDDEN int ibv_dontfork_range(void *base, size_t size); +extern HIDDEN int ibv_dofork_range(void *base, size_t size); #define IBV_INIT_CMD(cmd, size, opcode) \ do { \ @@ -85,4 +85,11 @@ (cmd)->response = (uintptr_t) (out); \ } while (0) +#ifndef MADV_DONTFORK +#define MADV_DONTFORK 10 +#endif +#ifndef MADV_DOFORK +#define MADV_DOFORK 11 +#endif + #endif /* IB_VERBS_H */ Index: libibverbs/src/memory.c 
=================================================================== --- libibverbs/src/memory.c (revision 7112) +++ libibverbs/src/memory.c (working copy) @@ -136,7 +136,7 @@ node->next->prev = node->prev; } -int ibv_lock_range(void *base, size_t size) +int ibv_dontfork_range(void *base, size_t size) { uintptr_t start, end; struct ibv_mem_node *node, *tmp; @@ -187,8 +187,8 @@ if (node->refcnt++ == 0) { - ret = mlock((void *) node->start, - node->end - node->start + 1); + ret = madvise((void *) node->start, + node->end - node->start + 1, MADV_DONTFORK); if (ret) goto out; } @@ -202,7 +202,7 @@ return ret; } -int ibv_unlock_range(void *base, size_t size) +int ibv_dofork_range(void *base, size_t size) { uintptr_t start, end; struct ibv_mem_node *node, *tmp; @@ -226,8 +226,8 @@ while (node && node->end <= end) { if (--node->refcnt == 0) { - ret = munlock((void *) node->start, - node->end - node->start + 1); + ret = madvise((void *) node->start, + node->end - node->start + 1, MADV_DOFORK); } if (__mm_prev(node) && node->refcnt == __mm_prev(node)->refcnt) { Index: libmthca/src/mthca.h =================================================================== --- libmthca/src/mthca.h (revision 7112) +++ libmthca/src/mthca.h (working copy) @@ -341,4 +341,10 @@ int mthca_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); int mthca_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); +#ifndef MADV_DONTFORK +#define MADV_DONTFORK 10 +#endif +#ifndef MADV_DOFORK +#define MADV_DOFORK 11 +#endif #endif /* MTHCA_H */ Index: libmthca/src/verbs.c =================================================================== --- libmthca/src/verbs.c (revision 7112) +++ libmthca/src/verbs.c (working copy) @@ -134,6 +134,9 @@ return NULL; } + mr->addr = addr; + mr->length = length; + return mr; } @@ -188,6 +191,7 @@ if (!cq->buf) goto err; + madvise(cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DONTFORK); cq->mr = __mthca_reg_mr(to_mctx(context)->pd, cq->buf, cqe * 
MTHCA_CQ_ENTRY_SIZE, 0, IBV_ACCESS_LOCAL_WRITE); @@ -247,6 +251,7 @@ mthca_dereg_mr(cq->mr); err_buf: + madvise(cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DOFORK); free(cq->buf); err: @@ -278,6 +283,7 @@ goto out; } + madvise(buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DONTFORK); mr = __mthca_reg_mr(to_mctx(ibcq->context)->pd, buf, cqe * MTHCA_CQ_ENTRY_SIZE, 0, IBV_ACCESS_LOCAL_WRITE); @@ -302,12 +308,14 @@ mthca_cq_resize_copy_cqes(cq, buf, old_cqe); mthca_dereg_mr(cq->mr); + madvise(cq->mr->addr, cq->mr->length, MADV_DOFORK); free(cq->buf); cq->buf = buf; cq->mr = mr; out: + madvise(buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DOFORK); pthread_spin_unlock(&cq->lock); return ret; } @@ -328,6 +336,7 @@ } mthca_dereg_mr(to_mcq(cq)->mr); + madvise(to_mcq(cq)->mr->addr, to_mcq(cq)->mr->length, MADV_DOFORK); free(to_mcq(cq)->buf); free(to_mcq(cq)); @@ -381,6 +390,7 @@ if (mthca_alloc_srq_buf(pd, &attr->attr, srq)) goto err; + madvise(srq->buf, srq->buf_size, MADV_DONTFORK); srq->mr = __mthca_reg_mr(pd, srq->buf, srq->buf_size, 0, 0); if (!srq->mr) goto err_free; @@ -421,6 +431,7 @@ mthca_dereg_mr(srq->mr); err_free: + madvise(srq->buf, srq->buf_size, MADV_DOFORK); free(srq->wrid); free(srq->buf); @@ -460,6 +471,7 @@ to_msrq(srq)->db_index); mthca_dereg_mr(to_msrq(srq)->mr); + madvise(to_msrq(srq)->mr->addr, to_msrq(srq)->mr->length, MADV_DOFORK); free(to_msrq(srq)->buf); free(to_msrq(srq)->wrid); @@ -499,6 +511,7 @@ pthread_spin_init(&qp->rq.lock, PTHREAD_PROCESS_PRIVATE)) goto err_free; + madvise(qp->buf, qp->buf_size, MADV_DONTFORK); qp->mr = __mthca_reg_mr(pd, qp->buf, qp->buf_size, 0, 0); if (!qp->mr) goto err_free; @@ -565,6 +578,7 @@ mthca_dereg_mr(qp->mr); err_free: + madvise(qp->buf, qp->buf_size, MADV_DOFORK); free(qp->wrid); free(qp->buf); @@ -647,6 +661,7 @@ } mthca_dereg_mr(to_mqp(qp)->mr); + madvise(to_mqp(qp)->mr->addr, to_mqp(qp)->mr->length, MADV_DOFORK); free(to_mqp(qp)->buf); free(to_mqp(qp)->wrid); Index: libmthca/src/ah.c 
=================================================================== --- libmthca/src/ah.c (revision 7112) +++ libmthca/src/ah.c (working copy) @@ -64,8 +64,10 @@ return NULL; } + madvise(page->buf, page_size, MADV_DONTFORK); page->mr = mthca_reg_mr(&pd->ibv_pd, page->buf, page_size, 0); if (!page->mr) { + madvise(page->buf, page_size, MADV_DOFORK); free(page->buf); free(page); return NULL; @@ -183,6 +185,7 @@ page->next->prev = page->prev; mthca_dereg_mr(page->mr); + madvise(page->mr->addr, page->mr->length, MADV_DOFORK); free(page->buf); free(page); } ----- End forwarded message ----- -- Gleb. From bugzilla-daemon at openib.org Thu May 11 07:34:10 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 11 May 2006 07:34:10 -0700 (PDT) Subject: [openib-general] [Bug 68] OFED 1.0 rc4: kernel build failed in IB core on SUSE10 Message-ID: <20060511143410.C899622865B@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=68 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |vlad at mellanox.co.il ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Thu May 11 07:37:52 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Thu, 11 May 2006 07:37:52 -0700 (PDT) Subject: [openib-general] [Bug 72] OFED 1.0: Make IPoIB default configurations sane Message-ID: <20060511143752.E73B122865B@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=72 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |vlad at mellanox.co.il ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mst at mellanox.co.il Thu May 11 08:12:35 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 18:12:35 +0300 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: References: <20060509172703.GA22825@mellanox.co.il> Message-ID: <20060511151235.GA1610@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: [PATCH] cm refcount race fix > > Here's a patch for all of the files that you listed. > > I did do some basic testing and didn't see any issues. > > Signed-off-by: Sean Hefty This has run for a couple of nights here without issues. Please commit. I also think we shall push this patch for 2.6.17 - it is clean and simple enough. Agree? -- MST From xma at us.ibm.com Thu May 11 09:02:19 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 11 May 2006 09:02:19 -0700 Subject: [openib-general] 2.6.17 and 2.6.18 merge plans In-Reply-To: Message-ID: Roland, From all the data I have collected so far, I think it's not a good idea to run a while loop around poll_cq() in IB hardware interrupt context. poll_cq() is very expensive, and it increases other hardware's interrupt latency. If we move this out of hardware interrupt context, latency would be increased anyway. I have done lots of tests on the splitting CQ + work queue on recv/send + remove tx_ring patches over mthca. Both SMP and UP unidirectional throughput improved by 20% - 75% w o/i tuning. The latency has increased by 4-10% on mthca. The interesting result is that UP performance is good. I used a hyperthreaded CPU to run all these tests; I don't know whether that's the reason. If you think there is enough time to review these patches and they have a better chance of being merged into 2.6.17/18, I will clean up and submit these patches ASAP, and test on ehca if none multi-threads ehca is available. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638
From rdreier at cisco.com Thu May 11 09:01:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 09:01:05 -0700 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> (Or Gerlitz's message of "Thu, 11 May 2006 10:33:09 +0300") References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> Message-ID: Or> I don't see either of the two iscsi updates for 2.6.18 Or> (both sent by Mike Christie) in your git tree, I was looking Or> for them all over (in the for-2.6.18, for-mm, master, for-linus Or> branches ...). Am I missing anything, or were you waiting for Or> my repost of the patches to pull the iscsi updates? Yeah, I haven't pushed it out yet. I will be putting iSER into an iser branch of my tree, which I'll ask Linus to pull once the SCSI changes are in his tree. - R. From rdreier at cisco.com Thu May 11 09:04:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 09:04:34 -0700 Subject: [openib-general] 2.6.17 and 2.6.18 merge plans In-Reply-To: (Shirley Ma's message of "Thu, 11 May 2006 09:02:19 -0700") References: Message-ID: Shirley> If you think there is enough time to review these Shirley> patches and they have a better chance of being merged into Shirley> 2.6.17/18, I will clean up and submit these patches ASAP, Shirley> and test on ehca if none multi-threads ehca is available. 2.6.17 is closed for this sort of thing already. But 2.6.18 is a possibility. - R. From rdreier at cisco.com Thu May 11 09:05:30 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 09:05:30 -0700 Subject: [openib-general] Re: 2.6.17 and 2.6.18 merge plans In-Reply-To: <20060511053939.GA30842@mellanox.co.il> (Michael S.
Tsirkin's message of "Thu, 11 May 2006 08:39:39 +0300") References: <20060511053939.GA30842@mellanox.co.il> Message-ID: Michael> How about module unloading races? Solving them by Michael> flushing WQs is not very elegant but no better solution Michael> surfaced either. Should I repost the patches? I think we already rejected that approach, so I don't think there's much point in that. From rdreier at cisco.com Thu May 11 09:19:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 09:19:13 -0700 Subject: [openib-general] [resend][RFC][PATCH] adding call to madvise In-Reply-To: <20060511134217.GW5319@minantech.com> (Gleb Natapov's message of "Thu, 11 May 2006 16:42:17 +0300") References: <20060511134217.GW5319@minantech.com> Message-ID: Gleb> Hello I've sent this mail a week ago and had no response. I Gleb> hope this is not because of lack of interest in user space Gleb> support :) Anyway I repost it one more time to get the Gleb> feedback and to find the way to include this patch to openib Gleb> ASAP and not wait till madvise defines will propogate to Gleb> libc. Sorry, I missed it the first time around. In general this seems good, but I have a few quick comments: > +#ifndef MADV_DONTFORK > +#define MADV_DONTFORK 10 > +#endif > +#ifndef MADV_DOFORK > +#define MADV_DOFORK 11 > +#endif This should probably be in the only file that uses it, memory.c. And I think it's cleanest to use autoconf to check if MADV_DONTFORK and MADV_DOFORK are available. > --- libibverbs/include/infiniband/verbs.h (revision 7112) > +++ libibverbs/include/infiniband/verbs.h (working copy) > @@ -289,6 +289,8 @@ > uint32_t handle; > uint32_t lkey; > uint32_t rkey; > + void *addr; > + size_t length; > }; This breaks ABI, right? > --- libmthca/src/verbs.c (revision 7112) > +++ libmthca/src/verbs.c (working copy) > @@ -134,6 +134,9 @@ > return NULL; > } > > + mr->addr = addr; > + mr->length = length; > + > return mr; > } What's the reason to set addr and length here? 
Doesn't libibverbs already do it? > if (!cq->buf) > goto err; > > + madvise(cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DONTFORK); > cq->mr = __mthca_reg_mr(to_mctx(context)->pd, cq->buf, > cqe * MTHCA_CQ_ENTRY_SIZE, > 0, IBV_ACCESS_LOCAL_WRITE); > @@ -247,6 +251,7 @@ > mthca_dereg_mr(cq->mr); > > err_buf: > + madvise(cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DOFORK); > free(cq->buf); It seems it would be better to put the DONTFORK call into mthca_alloc_cq_buf(), and the DOFORK into a new mthca_free_cq_buf() call. Actually, to handle the QP and SRQ cases too it's probably better to have wrappers for posix_memalign() and free() to keep this encapsulated in one place. - R. From sweitzen at cisco.com Thu May 11 09:29:02 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 11 May 2006 09:29:02 -0700 Subject: [openib-general] OFED 1.0 rc4 won't compile on orig FC5 kernel Message-ID: Is this a useful kernel to try, or should get latest FC5 kernel or 2.6.16 from kernel.org?
gcc -Wp,-MD,/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/.sysfs.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.1.0/include -D__KERNEL__ -I/var/tmp/OFED/tmp/openib/openib/include -I/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/include -Iinclude -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -Os -fomit-frame-pointer -g -march=k8 -mtune=nocona -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement -Wno-pointer-sign -I/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/include -I/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/ulp/ipoib -I/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/ulp/kdapl -I/var/tmp/OFED/tmp/openib/openib/drivers/infiniband/debug -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(sysfs)" -D"KBUILD_MODNAME=KBUILD_STR(ib_core)" -c -o /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/.tmp_sysfs.o /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/sysfs.c
In file included from include/asm/pci.h:9,
from include/linux/pci.h:648,
from /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/include/rdma/ib_mad.h:42,
from /var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/sysfs.c:42:
include/linux/mm.h: In function 'kernel_map_pages':
include/linux/mm.h:1055: warning: implicit declaration of function 'mutex_debug_check_no_locks_freed'
/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/sysfs.c: In function 'ib_device_uevent':
/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/sysfs.c:443: warning: implicit declaration of function 'add_hotplug_env_var'
/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/sysfs.c: At top level:
/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/sysfs.c:674: error: unknown field 'hotplug' specified in initializer
/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/sysfs.c:674: warning: initialization from incompatible pointer type
make[3]: *** [/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core/sysfs.o] Error 1
make[2]: *** [/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband/core] Error 2
make[1]: *** [_module_/var/tmp/OFED/tmp/openib/openib/src/linux-kernel/infiniband] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.15-1.2054_FC5-x86_64'
make: *** [kernel] Error 2
ERROR: Failed to execute: make kernel
Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From mshefty at ichips.intel.com Thu May 11 09:30:54 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 11 May 2006 09:30:54 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <20060511151235.GA1610@mellanox.co.il> References: <20060509172703.GA22825@mellanox.co.il> <20060511151235.GA1610@mellanox.co.il> Message-ID: <446366BE.7020103@ichips.intel.com> Michael S. Tsirkin wrote: > This has run for a couple of nights here without issues. Please commit. > I also think we shall push this patch for 2.6.17 - it is clean and simple > enough. Agree? I've committed these changes to svn. Roland, can you queue this patch for 2.6.17, or at least 2.6.18? - Sean From rdreier at cisco.com Thu May 11 09:25:48 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 09:25:48 -0700 Subject: [openib-general] [PATCH] IB/mthca: Convert FW commands to use wait_for_completion_timeout() Message-ID: The kernel has had wait_for_completion_timeout() for a long time now. mthca should use it to handle FW commands timing out, instead of implementing the same thing in a much more complicated way by using wait_for_completion() along with a timer that does complete().
Signed-off-by: Roland Dreier --- Anyone see anything wrong with this cleanup? diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 1985b5d..7f78c94 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -174,7 +174,6 @@ enum { struct mthca_cmd_context { struct completion done; - struct timer_list timer; int result; int next; u64 out_param; @@ -362,15 +361,6 @@ void mthca_cmd_event(struct mthca_dev *d complete(&context->done); } -static void event_timeout(unsigned long context_ptr) -{ - struct mthca_cmd_context *context = - (struct mthca_cmd_context *) context_ptr; - - context->result = -EBUSY; - complete(&context->done); -} - static int mthca_cmd_wait(struct mthca_dev *dev, u64 in_param, u64 *out_param, @@ -401,11 +391,10 @@ static int mthca_cmd_wait(struct mthca_d if (err) goto out; - context->timer.expires = jiffies + timeout; - add_timer(&context->timer); - - wait_for_completion(&context->done); - del_timer_sync(&context->timer); + if (!wait_for_completion_timeout(&context->done, timeout)) { + err = -EBUSY; + goto out; + } err = context->result; if (err) @@ -535,10 +524,6 @@ int mthca_cmd_use_events(struct mthca_de for (i = 0; i < dev->cmd.max_cmds; ++i) { dev->cmd.context[i].token = i; dev->cmd.context[i].next = i + 1; - init_timer(&dev->cmd.context[i].timer); - dev->cmd.context[i].timer.data = - (unsigned long) &dev->cmd.context[i]; - dev->cmd.context[i].timer.function = event_timeout; } dev->cmd.context[dev->cmd.max_cmds - 1].next = -1; From mst at mellanox.co.il Thu May 11 09:36:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 19:36:43 +0300 Subject: [openib-general] Re: OFED 1.0 rc4 won't compile on orig FC5 kernel In-Reply-To: References: Message-ID: <20060511163643.GC1072@mellanox.co.il> Quoting r. 
Scott Weitzenkamp (sweitzen) : > Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel > > Is this a useful kernel to try, or should get latest FC5 kernel or 2.6.16 from kernel.org? I think you should go to latest update. -- MST From mshefty at ichips.intel.com Thu May 11 09:50:33 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 11 May 2006 09:50:33 -0700 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: References: Message-ID: <44636B59.9040808@ichips.intel.com> Roland, I just wanted to make sure that this patch wasn't dropped. Can we queue the multicast module for 2.6.18? - Sean From rdreier at cisco.com Thu May 11 09:51:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 09:51:15 -0700 Subject: [openib-general] Re: [PATCH] cm refcount race fix In-Reply-To: <446366BE.7020103@ichips.intel.com> (Sean Hefty's message of "Thu, 11 May 2006 09:30:54 -0700") References: <20060509172703.GA22825@mellanox.co.il> <20060511151235.GA1610@mellanox.co.il> <446366BE.7020103@ichips.intel.com> Message-ID: Sean> can you queue this patch for 2.6.17, or at least 2.6.18? Yes, what should I queue -- can you send a patch or at least point me at specific svn revs? - R. From rdreier at cisco.com Thu May 11 09:53:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 09:53:02 -0700 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: <44636B59.9040808@ichips.intel.com> (Sean Hefty's message of "Thu, 11 May 2006 09:50:33 -0700") References: <44636B59.9040808@ichips.intel.com> Message-ID: Sean> I just wanted to make sure that this patch wasn't dropped. Sorry, forgot all about it. I will read this over and commit it to svn, and hopefully people can stress test it to see how it compares to what we have. Sean> Can we queue the multicast module for 2.6.18? I guess so. What's the motivation? Do we have any users of it other than IPoIB? - R. 
From mshefty at ichips.intel.com Thu May 11 10:01:52 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 11 May 2006 10:01:52 -0700 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: References: <44636B59.9040808@ichips.intel.com> Message-ID: <44636E00.9000508@ichips.intel.com> Roland Dreier wrote: > Sean> Can we queue the multicast module for 2.6.18? > > I guess so. What's the motivation? Do we have any users of it other > than IPoIB? I'm working on adding multicast support for userspace (MPI), which will also need this. - Sean From tom at opengridcomputing.com Thu May 11 10:04:15 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Thu, 11 May 2006 12:04:15 -0500 Subject: [openib-general] Re: rdma_cm.h: comment nits. In-Reply-To: <20060511075608.GL10669@mellanox.co.il> References: <54AD0F12E08D1541B826BE97C98F99F149EF5A@NT-SJCA-0751.brcm.ad.broadcom.com> <1147308499.5093.85.camel@trinity.ogc.int> <20060511075608.GL10669@mellanox.co.il> Message-ID: <1147367055.9881.23.camel@trinity.ogc.int> On Thu, 2006-05-11 at 10:56 +0300, Michael S. Tsirkin wrote: > Quoting r. Tom Tucker : > > Subject: RE: rdma_cm.h: comment nits. > > > > On Wed, 2006-05-10 at 14:20 -0700, Caitlin Bestler wrote: > > > Tom Tucker wrote: > > > > > > > > > > > So... all that said, I could in fact support rdma_reject on > > > > an active side connection. But this would effectively reduce > > > > to a QP --> ERROR and I doubt this matches the semantics > > > > you're looking for. > > > > > > > > > > > > > > And you could send an RST. > > > > Yep, in fact that's what many RNIC's do when you move the QP to ERROR > > instead of CLOSING. > > > > > There's just no way to send any user > > > supplied private data. It's not just unreliable, it's guaranteed > > > not to arrive. It's still a long way from the truly desired > > > semantics, but the wire protocol just doesn't carry that info. 
> > > > > > > Yeah, I think you're correct -- it would be a bogus "emulation". > > I don't think any real ULP passes private data inside the Reject. > > Private data in response (SYN/ACK) is clearly portable, is it not? For iWARP the data is actually exchanged after TCP connection establishment as part of MPA negotiation. But yes, private data exchange is supported during connection establishment. It can be provided on the active side (rdma_connect) and on the passive side (rdma_accept, rdma_reject). What is not currently supported is calling rdma_connect and then rdma_reject (presumably to cancel the connect request after receiving the remote peers private data). The supported behavior for iWARP on the active side would be to call rdma_disconnect if you didn't like the private data provided. > From mshefty at ichips.intel.com Thu May 11 10:02:21 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 11 May 2006 10:02:21 -0700 Subject: [openib-general] [PATCH] refcount race fixes Message-ID: <44636E1D.70608@ichips.intel.com> Roland, this is the patch that I was referring to. --- Fix race condition during destruction calls to avoid possibility of accessing object after it has been freed. 
Signed-off-by: Sean Hefty --- Index: mad_rmpp.c =================================================================== --- mad_rmpp.c (revision 6884) +++ mad_rmpp.c (working copy) @@ -49,7 +49,7 @@ struct mad_rmpp_recv { struct list_head list; struct work_struct timeout_work; struct work_struct cleanup_work; - wait_queue_head_t wait; + struct completion comp; enum rmpp_state state; spinlock_t lock; atomic_t refcount; @@ -69,10 +69,16 @@ struct mad_rmpp_recv { u8 method; }; +static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) +{ + if (atomic_dec_and_test(&rmpp_recv->refcount)) + complete(&rmpp_recv->comp); +} + static void destroy_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) { - atomic_dec(&rmpp_recv->refcount); - wait_event(rmpp_recv->wait, !atomic_read(&rmpp_recv->refcount)); + deref_rmpp_recv(rmpp_recv); + wait_for_completion(&rmpp_recv->comp); ib_destroy_ah(rmpp_recv->ah); kfree(rmpp_recv); } @@ -253,7 +259,7 @@ create_rmpp_recv(struct ib_mad_agent_pri goto error; rmpp_recv->agent = agent; - init_waitqueue_head(&rmpp_recv->wait); + init_completion(&rmpp_recv->comp); INIT_WORK(&rmpp_recv->timeout_work, recv_timeout_handler, rmpp_recv); INIT_WORK(&rmpp_recv->cleanup_work, recv_cleanup_handler, rmpp_recv); spin_lock_init(&rmpp_recv->lock); @@ -279,12 +285,6 @@ error: kfree(rmpp_recv); return NULL; } -static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) -{ - if (atomic_dec_and_test(&rmpp_recv->refcount)) - wake_up(&rmpp_recv->wait); -} - static struct mad_rmpp_recv * find_rmpp_recv(struct ib_mad_agent_private *agent, struct ib_mad_recv_wc *mad_recv_wc) Index: cm.c =================================================================== --- cm.c (revision 6884) +++ cm.c (working copy) @@ -34,6 +34,8 @@ * * $Id$ */ + +#include #include #include #include @@ -122,7 +124,7 @@ struct cm_id_private { struct rb_node service_node; struct rb_node sidr_id_node; spinlock_t lock; /* Do not acquire inside cm.lock */ - wait_queue_head_t wait; + struct 
completion comp; atomic_t refcount; struct ib_mad_send_buf *msg; @@ -160,7 +162,7 @@ static void cm_work_handler(void *data); static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { if (atomic_dec_and_test(&cm_id_priv->refcount)) - wake_up(&cm_id_priv->wait); + complete(&cm_id_priv->comp); } static int cm_alloc_msg(struct cm_id_private *cm_id_priv, @@ -611,7 +613,7 @@ struct ib_cm_id *ib_create_cm_id(struct goto error; spin_lock_init(&cm_id_priv->lock); - init_waitqueue_head(&cm_id_priv->wait); + init_completion(&cm_id_priv->comp); INIT_LIST_HEAD(&cm_id_priv->work_list); atomic_set(&cm_id_priv->work_count, -1); atomic_set(&cm_id_priv->refcount, 1); @@ -776,8 +778,8 @@ retest: } cm_free_id(cm_id->local_id); - atomic_dec(&cm_id_priv->refcount); - wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); + cm_deref_id(cm_id_priv); + wait_for_completion(&cm_id_priv->comp); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); kfree(cm_id_priv->compare_data); Index: multicast.c =================================================================== --- multicast.c (revision 6884) +++ multicast.c (working copy) @@ -30,6 +30,7 @@ * SOFTWARE. 
*/ +#include #include #include #include @@ -69,7 +70,7 @@ struct mcast_port { spinlock_t lock; struct rb_root table; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; u8 port_num; }; @@ -110,7 +111,7 @@ struct mcast_member { struct list_head list; enum mcast_state state; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; static void join_handler(int status, struct ib_sa_mcmember_rec *rec, @@ -168,7 +169,7 @@ static struct mcast_group *mcast_insert( static void deref_port(struct mcast_port *port) { if (atomic_dec_and_test(&port->refcount)) - wake_up(&port->wait); + complete(&port->comp); } static void release_group(struct mcast_group *group) @@ -189,7 +190,7 @@ static void release_group(struct mcast_g static void deref_member(struct mcast_member *member) { if (atomic_dec_and_test(&member->refcount)) - wake_up(&member->wait); + complete(&member->comp); } static void queue_join(struct mcast_member *member) @@ -512,7 +513,7 @@ struct ib_multicast *ib_join_multicast(s member->multicast.comp_mask = comp_mask; member->multicast.callback = callback; member->multicast.context = context; - init_waitqueue_head(&member->wait); + init_completion(&member->comp); atomic_set(&member->refcount, 1); member->state = MCAST_JOINING; @@ -569,8 +570,8 @@ void ib_free_multicast(struct ib_multica release_group(group); } - atomic_dec(&member->refcount); - wait_event(member->wait, !atomic_read(&member->refcount)); + deref_member(member); + wait_for_completion(&member->comp); kfree(member); } EXPORT_SYMBOL(ib_free_multicast); @@ -602,7 +603,7 @@ static void mcast_add_one(struct ib_devi port->port_num = dev->start_port + i; spin_lock_init(&port->lock); port->table = RB_ROOT; - init_waitqueue_head(&port->wait); + init_completion(&port->comp); atomic_set(&port->refcount, 1); } @@ -644,8 +645,8 @@ static void mcast_remove_one(struct ib_d for (i = 0; i < dev->end_port - dev->start_port; i++) { port = &dev->port[i]; leave_groups(port); - 
atomic_dec(&port->refcount); - wait_event(port->wait, !atomic_read(&port->refcount)); + deref_port(port); + wait_for_completion(&port->comp); } kfree(dev); Index: cma.c =================================================================== --- cma.c (revision 6948) +++ cma.c (working copy) @@ -29,6 +29,7 @@ * */ +#include #include #include #include @@ -70,7 +71,7 @@ struct cma_device { struct list_head list; struct ib_device *device; __be64 node_guid; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; struct list_head id_list; }; @@ -111,7 +112,7 @@ struct rdma_id_private { enum cma_state state; spinlock_t lock; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; wait_queue_head_t wait_remove; atomic_t dev_remove; @@ -244,11 +245,16 @@ static void cma_attach_to_dev(struct rdm list_add_tail(&id_priv->list, &cma_dev->id_list); } +static inline void cma_deref_dev(struct cma_device *cma_dev) +{ + if (atomic_dec_and_test(&cma_dev->refcount)) + complete(&cma_dev->comp); +} + static void cma_detach_from_dev(struct rdma_id_private *id_priv) { list_del(&id_priv->list); - if (atomic_dec_and_test(&id_priv->cma_dev->refcount)) - wake_up(&id_priv->cma_dev->wait); + cma_deref_dev(id_priv->cma_dev); id_priv->cma_dev = NULL; } @@ -288,7 +294,7 @@ static int cma_acquire_dev(struct rdma_i static void cma_deref_id(struct rdma_id_private *id_priv) { if (atomic_dec_and_test(&id_priv->refcount)) - wake_up(&id_priv->wait); + complete(&id_priv->comp); } static void cma_release_remove(struct rdma_id_private *id_priv) @@ -311,7 +317,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c id_priv->id.event_handler = event_handler; id_priv->id.ps = ps; spin_lock_init(&id_priv->lock); - init_waitqueue_head(&id_priv->wait); + init_completion(&id_priv->comp); atomic_set(&id_priv->refcount, 1); init_waitqueue_head(&id_priv->wait_remove); atomic_set(&id_priv->dev_remove, 0); @@ -618,8 +624,8 @@ static void cma_destroy_listen(struct rd } list_del(&id_priv->listen_list); - 
atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, !atomic_read(&id_priv->refcount)); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv); } @@ -699,8 +705,8 @@ void rdma_destroy_id(struct rdma_cm_id * } cma_release_port(id_priv); - atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, !atomic_read(&id_priv->refcount)); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv->id.route.path_rec); kfree(id_priv); @@ -1778,7 +1784,7 @@ static void cma_add_one(struct ib_device if (!cma_dev->node_guid) goto err; - init_waitqueue_head(&cma_dev->wait); + init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); INIT_LIST_HEAD(&cma_dev->id_list); ib_set_client_data(device, &cma_client, cma_dev); @@ -1845,8 +1851,8 @@ static void cma_process_remove(struct cm } mutex_unlock(&lock); - atomic_dec(&cma_dev->refcount); - wait_event(cma_dev->wait, !atomic_read(&cma_dev->refcount)); + cma_deref_dev(cma_dev); + wait_for_completion(&cma_dev->comp); } static void cma_remove_one(struct ib_device *device) Index: mad.c =================================================================== --- mad.c (revision 6886) +++ mad.c (working copy) @@ -353,7 +353,7 @@ struct ib_mad_agent *ib_register_mad_age INIT_WORK(&mad_agent_priv->local_work, local_completions, mad_agent_priv); atomic_set(&mad_agent_priv->refcount, 1); - init_waitqueue_head(&mad_agent_priv->wait); + init_completion(&mad_agent_priv->comp); return &mad_agent_priv->agent; @@ -468,7 +468,7 @@ struct ib_mad_agent *ib_register_mad_sno mad_snoop_priv->agent.qp = port_priv->qp_info[qpn].qp; mad_snoop_priv->agent.port_num = port_num; mad_snoop_priv->mad_snoop_flags = mad_snoop_flags; - init_waitqueue_head(&mad_snoop_priv->wait); + init_completion(&mad_snoop_priv->comp); mad_snoop_priv->snoop_index = register_snoop_agent( &port_priv->qp_info[qpn], mad_snoop_priv); @@ -487,6 +487,18 @@ error1: } EXPORT_SYMBOL(ib_register_mad_snoop); +static inline void 
deref_mad_agent(struct ib_mad_agent_private *mad_agent_priv) +{ + if (atomic_dec_and_test(&mad_agent_priv->refcount)) + complete(&mad_agent_priv->comp); +} + +static inline void deref_snoop_agent(struct ib_mad_snoop_private *mad_snoop_priv) +{ + if (atomic_dec_and_test(&mad_snoop_priv->refcount)) + complete(&mad_snoop_priv->comp); +} + static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv) { struct ib_mad_port_private *port_priv; @@ -510,9 +522,8 @@ static void unregister_mad_agent(struct flush_workqueue(port_priv->wq); ib_cancel_rmpp_recvs(mad_agent_priv); - atomic_dec(&mad_agent_priv->refcount); - wait_event(mad_agent_priv->wait, - !atomic_read(&mad_agent_priv->refcount)); + deref_mad_agent(mad_agent_priv); + wait_for_completion(&mad_agent_priv->comp); kfree(mad_agent_priv->reg_req); ib_dereg_mr(mad_agent_priv->agent.mr); @@ -530,9 +541,8 @@ static void unregister_mad_snoop(struct atomic_dec(&qp_info->snoop_count); spin_unlock_irqrestore(&qp_info->snoop_lock, flags); - atomic_dec(&mad_snoop_priv->refcount); - wait_event(mad_snoop_priv->wait, - !atomic_read(&mad_snoop_priv->refcount)); + deref_snoop_agent(mad_snoop_priv); + wait_for_completion(&mad_snoop_priv->comp); kfree(mad_snoop_priv); } @@ -601,8 +611,7 @@ static void snoop_send(struct ib_mad_qp_ spin_unlock_irqrestore(&qp_info->snoop_lock, flags); mad_snoop_priv->agent.snoop_handler(&mad_snoop_priv->agent, send_buf, mad_send_wc); - if (atomic_dec_and_test(&mad_snoop_priv->refcount)) - wake_up(&mad_snoop_priv->wait); + deref_snoop_agent(mad_snoop_priv); spin_lock_irqsave(&qp_info->snoop_lock, flags); } spin_unlock_irqrestore(&qp_info->snoop_lock, flags); @@ -627,8 +636,7 @@ static void snoop_recv(struct ib_mad_qp_ spin_unlock_irqrestore(&qp_info->snoop_lock, flags); mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, mad_recv_wc); - if (atomic_dec_and_test(&mad_snoop_priv->refcount)) - wake_up(&mad_snoop_priv->wait); + deref_snoop_agent(mad_snoop_priv); 
spin_lock_irqsave(&qp_info->snoop_lock, flags); } spin_unlock_irqrestore(&qp_info->snoop_lock, flags); @@ -969,8 +977,7 @@ void ib_free_send_mad(struct ib_mad_send free_send_rmpp_list(mad_send_wr); kfree(send_buf->mad); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); } EXPORT_SYMBOL(ib_free_send_mad); @@ -1789,8 +1796,7 @@ static void ib_mad_complete_recv(struct mad_recv_wc = ib_process_rmpp_recv_wc(mad_agent_priv, mad_recv_wc); if (!mad_recv_wc) { - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; } } @@ -1802,8 +1808,7 @@ static void ib_mad_complete_recv(struct if (!mad_send_wr) { spin_unlock_irqrestore(&mad_agent_priv->lock, flags); ib_free_recv_mad(mad_recv_wc); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; } ib_mark_mad_done(mad_send_wr); @@ -1822,8 +1827,7 @@ static void ib_mad_complete_recv(struct } else { mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, mad_recv_wc); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); } } @@ -2053,8 +2057,7 @@ void ib_mad_complete_send_wr(struct ib_m mad_send_wc); /* Release reference on agent taken when sending */ - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; done: spin_unlock_irqrestore(&mad_agent_priv->lock, flags); Index: mad_priv.h =================================================================== --- mad_priv.h (revision 6884) +++ mad_priv.h (working copy) @@ -37,6 +37,7 @@ #ifndef __IB_MAD_PRIV_H__ #define __IB_MAD_PRIV_H__ +#include #include #include #include @@ -108,7 +109,7 @@ struct ib_mad_agent_private { struct list_head rmpp_list; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; struct 
ib_mad_snoop_private { @@ -117,7 +118,7 @@ struct ib_mad_snoop_private { int snoop_index; int mad_snoop_flags; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; struct ib_mad_send_wr_private { Index: ucm.c =================================================================== --- ucm.c (revision 6884) +++ ucm.c (working copy) @@ -32,6 +32,8 @@ * * $Id$ */ + +#include #include #include #include @@ -73,7 +75,7 @@ struct ib_ucm_file { struct ib_ucm_context { int id; - wait_queue_head_t wait; + struct completion comp; atomic_t ref; int events_reported; @@ -139,7 +141,7 @@ static struct ib_ucm_context *ib_ucm_ctx static void ib_ucm_ctx_put(struct ib_ucm_context *ctx) { if (atomic_dec_and_test(&ctx->ref)) - wake_up(&ctx->wait); + complete(&ctx->comp); } static inline int ib_ucm_new_cm_id(int event) @@ -179,7 +181,7 @@ static struct ib_ucm_context *ib_ucm_ctx return NULL; atomic_set(&ctx->ref, 1); - init_waitqueue_head(&ctx->wait); + init_completion(&ctx->comp); ctx->file = file; INIT_LIST_HEAD(&ctx->events); @@ -559,8 +561,8 @@ static ssize_t ib_ucm_destroy_id(struct if (IS_ERR(ctx)) return PTR_ERR(ctx); - atomic_dec(&ctx->ref); - wait_event(ctx->wait, !atomic_read(&ctx->ref)); + ib_ucm_ctx_put(ctx); + wait_for_completion(&ctx->comp); /* No new events will be generated after destroying the cm_id. */ ib_destroy_cm_id(ctx->cm_id); Index: ucma.c =================================================================== --- ucma.c (revision 6949) +++ ucma.c (working copy) @@ -30,6 +30,7 @@ * SOFTWARE. 
*/ +#include #include #include #include @@ -61,7 +62,7 @@ struct ucma_file { struct ucma_context { int id; - wait_queue_head_t wait; + struct completion comp; atomic_t ref; int events_reported; int backlog; @@ -105,7 +106,7 @@ static struct ucma_context* ucma_get_ctx static void ucma_put_ctx(struct ucma_context *ctx) { if (atomic_dec_and_test(&ctx->ref)) - wake_up(&ctx->wait); + complete(&ctx->comp); } static void ucma_cleanup_events(struct ucma_context *ctx) @@ -140,7 +141,7 @@ static struct ucma_context* ucma_alloc_c return NULL; atomic_set(&ctx->ref, 1); - init_waitqueue_head(&ctx->wait); + init_completion(&ctx->comp); ctx->file = file; INIT_LIST_HEAD(&ctx->events); @@ -341,8 +342,8 @@ static ssize_t ucma_destroy_id(struct uc if (IS_ERR(ctx)) return PTR_ERR(ctx); - atomic_dec(&ctx->ref); - wait_event(ctx->wait, !atomic_read(&ctx->ref)); + ucma_put_ctx(ctx); + wait_for_completion(&ctx->comp); /* No new events will be generated after destroying the id. */ rdma_destroy_id(ctx->cm_id); _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Thu May 11 10:04:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 10:04:38 -0700 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: <44636E00.9000508@ichips.intel.com> (Sean Hefty's message of "Thu, 11 May 2006 10:01:52 -0700") References: <44636B59.9040808@ichips.intel.com> <44636E00.9000508@ichips.intel.com> Message-ID: Sean> I'm working on adding multicast support for userspace (MPI), Sean> which will also need this. Right, but that's not going to be ready for 2.6.18, is it? - R. From mst at mellanox.co.il Thu May 11 10:06:25 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 11 May 2006 20:06:25 +0300 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: References: <44636B59.9040808@ichips.intel.com> Message-ID: <20060511170625.GA2595@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface > > Sean> I just wanted to make sure that this patch wasn't dropped. > > Sorry, forgot all about it. I will read this over and commit it to > svn, and hopefully people can stress test it to see how it compares to > what we have. > > Sean> Can we queue the multicast module for 2.6.18? > > I guess so. What's the motivation? Do we have any users of it other > than IPoIB? I also think it might make sense to use the new SA query retry support. I think currently ipoib retries SA queries by itself. -- MST From mst at mellanox.co.il Thu May 11 10:07:20 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 20:07:20 +0300 Subject: [openib-general] Fwd: RE: [PATCH] cm refcount race fix Message-ID: <20060511170720.GB2595@mellanox.co.il> Here's the patch that I tested. ----- Forwarded message from Sean Hefty ----- From: "Sean Hefty" Subject: RE: [PATCH] cm refcount race fix Date: Tue, 9 May 2006 12:14:02 -0700 In-Reply-To: <20060509172703.GA22825 at mellanox.co.il> X-Spam: exempt Here's a patch for all of the files that you listed. I did do some basic testing and didn't see any issues. 
Signed-off-by: Sean Hefty --- Index: mad_rmpp.c =================================================================== --- mad_rmpp.c (revision 6884) +++ mad_rmpp.c (working copy) @@ -49,7 +49,7 @@ struct mad_rmpp_recv { struct list_head list; struct work_struct timeout_work; struct work_struct cleanup_work; - wait_queue_head_t wait; + struct completion comp; enum rmpp_state state; spinlock_t lock; atomic_t refcount; @@ -69,10 +69,16 @@ struct mad_rmpp_recv { u8 method; }; +static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) +{ + if (atomic_dec_and_test(&rmpp_recv->refcount)) + complete(&rmpp_recv->comp); +} + static void destroy_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) { - atomic_dec(&rmpp_recv->refcount); - wait_event(rmpp_recv->wait, !atomic_read(&rmpp_recv->refcount)); + deref_rmpp_recv(rmpp_recv); + wait_for_completion(&rmpp_recv->comp); ib_destroy_ah(rmpp_recv->ah); kfree(rmpp_recv); } @@ -253,7 +259,7 @@ create_rmpp_recv(struct ib_mad_agent_pri goto error; rmpp_recv->agent = agent; - init_waitqueue_head(&rmpp_recv->wait); + init_completion(&rmpp_recv->comp); INIT_WORK(&rmpp_recv->timeout_work, recv_timeout_handler, rmpp_recv); INIT_WORK(&rmpp_recv->cleanup_work, recv_cleanup_handler, rmpp_recv); spin_lock_init(&rmpp_recv->lock); @@ -279,12 +285,6 @@ error: kfree(rmpp_recv); return NULL; } -static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) -{ - if (atomic_dec_and_test(&rmpp_recv->refcount)) - wake_up(&rmpp_recv->wait); -} - static struct mad_rmpp_recv * find_rmpp_recv(struct ib_mad_agent_private *agent, struct ib_mad_recv_wc *mad_recv_wc) Index: cm.c =================================================================== --- cm.c (revision 6884) +++ cm.c (working copy) @@ -34,6 +34,8 @@ * * $Id$ */ + +#include #include #include #include @@ -122,7 +124,7 @@ struct cm_id_private { struct rb_node service_node; struct rb_node sidr_id_node; spinlock_t lock; /* Do not acquire inside cm.lock */ - wait_queue_head_t wait; + struct 
completion comp; atomic_t refcount; struct ib_mad_send_buf *msg; @@ -160,7 +162,7 @@ static void cm_work_handler(void *data); static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { if (atomic_dec_and_test(&cm_id_priv->refcount)) - wake_up(&cm_id_priv->wait); + complete(&cm_id_priv->comp); } static int cm_alloc_msg(struct cm_id_private *cm_id_priv, @@ -611,7 +613,7 @@ struct ib_cm_id *ib_create_cm_id(struct goto error; spin_lock_init(&cm_id_priv->lock); - init_waitqueue_head(&cm_id_priv->wait); + init_completion(&cm_id_priv->comp); INIT_LIST_HEAD(&cm_id_priv->work_list); atomic_set(&cm_id_priv->work_count, -1); atomic_set(&cm_id_priv->refcount, 1); @@ -776,8 +778,8 @@ retest: } cm_free_id(cm_id->local_id); - atomic_dec(&cm_id_priv->refcount); - wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); + cm_deref_id(cm_id_priv); + wait_for_completion(&cm_id_priv->comp); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); kfree(cm_id_priv->compare_data); Index: multicast.c =================================================================== --- multicast.c (revision 6884) +++ multicast.c (working copy) @@ -30,6 +30,7 @@ * SOFTWARE. 
*/ +#include #include #include #include @@ -69,7 +70,7 @@ struct mcast_port { spinlock_t lock; struct rb_root table; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; u8 port_num; }; @@ -110,7 +111,7 @@ struct mcast_member { struct list_head list; enum mcast_state state; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; static void join_handler(int status, struct ib_sa_mcmember_rec *rec, @@ -168,7 +169,7 @@ static struct mcast_group *mcast_insert( static void deref_port(struct mcast_port *port) { if (atomic_dec_and_test(&port->refcount)) - wake_up(&port->wait); + complete(&port->comp); } static void release_group(struct mcast_group *group) @@ -189,7 +190,7 @@ static void release_group(struct mcast_g static void deref_member(struct mcast_member *member) { if (atomic_dec_and_test(&member->refcount)) - wake_up(&member->wait); + complete(&member->comp); } static void queue_join(struct mcast_member *member) @@ -512,7 +513,7 @@ struct ib_multicast *ib_join_multicast(s member->multicast.comp_mask = comp_mask; member->multicast.callback = callback; member->multicast.context = context; - init_waitqueue_head(&member->wait); + init_completion(&member->comp); atomic_set(&member->refcount, 1); member->state = MCAST_JOINING; @@ -569,8 +570,8 @@ void ib_free_multicast(struct ib_multica release_group(group); } - atomic_dec(&member->refcount); - wait_event(member->wait, !atomic_read(&member->refcount)); + deref_member(member); + wait_for_completion(&member->comp); kfree(member); } EXPORT_SYMBOL(ib_free_multicast); @@ -602,7 +603,7 @@ static void mcast_add_one(struct ib_devi port->port_num = dev->start_port + i; spin_lock_init(&port->lock); port->table = RB_ROOT; - init_waitqueue_head(&port->wait); + init_completion(&port->comp); atomic_set(&port->refcount, 1); } @@ -644,8 +645,8 @@ static void mcast_remove_one(struct ib_d for (i = 0; i < dev->end_port - dev->start_port; i++) { port = &dev->port[i]; leave_groups(port); - 
atomic_dec(&port->refcount); - wait_event(port->wait, !atomic_read(&port->refcount)); + deref_port(port); + wait_for_completion(&port->comp); } kfree(dev); Index: cma.c =================================================================== --- cma.c (revision 6948) +++ cma.c (working copy) @@ -29,6 +29,7 @@ * */ +#include #include #include #include @@ -70,7 +71,7 @@ struct cma_device { struct list_head list; struct ib_device *device; __be64 node_guid; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; struct list_head id_list; }; @@ -111,7 +112,7 @@ struct rdma_id_private { enum cma_state state; spinlock_t lock; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; wait_queue_head_t wait_remove; atomic_t dev_remove; @@ -244,11 +245,16 @@ static void cma_attach_to_dev(struct rdm list_add_tail(&id_priv->list, &cma_dev->id_list); } +static inline void cma_deref_dev(struct cma_device *cma_dev) +{ + if (atomic_dec_and_test(&cma_dev->refcount)) + complete(&cma_dev->comp); +} + static void cma_detach_from_dev(struct rdma_id_private *id_priv) { list_del(&id_priv->list); - if (atomic_dec_and_test(&id_priv->cma_dev->refcount)) - wake_up(&id_priv->cma_dev->wait); + cma_deref_dev(id_priv->cma_dev); id_priv->cma_dev = NULL; } @@ -288,7 +294,7 @@ static int cma_acquire_dev(struct rdma_i static void cma_deref_id(struct rdma_id_private *id_priv) { if (atomic_dec_and_test(&id_priv->refcount)) - wake_up(&id_priv->wait); + complete(&id_priv->comp); } static void cma_release_remove(struct rdma_id_private *id_priv) @@ -311,7 +317,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c id_priv->id.event_handler = event_handler; id_priv->id.ps = ps; spin_lock_init(&id_priv->lock); - init_waitqueue_head(&id_priv->wait); + init_completion(&id_priv->comp); atomic_set(&id_priv->refcount, 1); init_waitqueue_head(&id_priv->wait_remove); atomic_set(&id_priv->dev_remove, 0); @@ -618,8 +624,8 @@ static void cma_destroy_listen(struct rd } list_del(&id_priv->listen_list); - 
atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, !atomic_read(&id_priv->refcount)); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv); } @@ -699,8 +705,8 @@ void rdma_destroy_id(struct rdma_cm_id * } cma_release_port(id_priv); - atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, !atomic_read(&id_priv->refcount)); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv->id.route.path_rec); kfree(id_priv); @@ -1778,7 +1784,7 @@ static void cma_add_one(struct ib_device if (!cma_dev->node_guid) goto err; - init_waitqueue_head(&cma_dev->wait); + init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); INIT_LIST_HEAD(&cma_dev->id_list); ib_set_client_data(device, &cma_client, cma_dev); @@ -1845,8 +1851,8 @@ static void cma_process_remove(struct cm } mutex_unlock(&lock); - atomic_dec(&cma_dev->refcount); - wait_event(cma_dev->wait, !atomic_read(&cma_dev->refcount)); + cma_deref_dev(cma_dev); + wait_for_completion(&cma_dev->comp); } static void cma_remove_one(struct ib_device *device) Index: mad.c =================================================================== --- mad.c (revision 6886) +++ mad.c (working copy) @@ -353,7 +353,7 @@ struct ib_mad_agent *ib_register_mad_age INIT_WORK(&mad_agent_priv->local_work, local_completions, mad_agent_priv); atomic_set(&mad_agent_priv->refcount, 1); - init_waitqueue_head(&mad_agent_priv->wait); + init_completion(&mad_agent_priv->comp); return &mad_agent_priv->agent; @@ -468,7 +468,7 @@ struct ib_mad_agent *ib_register_mad_sno mad_snoop_priv->agent.qp = port_priv->qp_info[qpn].qp; mad_snoop_priv->agent.port_num = port_num; mad_snoop_priv->mad_snoop_flags = mad_snoop_flags; - init_waitqueue_head(&mad_snoop_priv->wait); + init_completion(&mad_snoop_priv->comp); mad_snoop_priv->snoop_index = register_snoop_agent( &port_priv->qp_info[qpn], mad_snoop_priv); @@ -487,6 +487,18 @@ error1: } EXPORT_SYMBOL(ib_register_mad_snoop); +static inline void 
deref_mad_agent(struct ib_mad_agent_private *mad_agent_priv) +{ + if (atomic_dec_and_test(&mad_agent_priv->refcount)) + complete(&mad_agent_priv->comp); +} + +static inline void deref_snoop_agent(struct ib_mad_snoop_private *mad_snoop_priv) +{ + if (atomic_dec_and_test(&mad_snoop_priv->refcount)) + complete(&mad_snoop_priv->comp); +} + static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv) { struct ib_mad_port_private *port_priv; @@ -510,9 +522,8 @@ static void unregister_mad_agent(struct flush_workqueue(port_priv->wq); ib_cancel_rmpp_recvs(mad_agent_priv); - atomic_dec(&mad_agent_priv->refcount); - wait_event(mad_agent_priv->wait, - !atomic_read(&mad_agent_priv->refcount)); + deref_mad_agent(mad_agent_priv); + wait_for_completion(&mad_agent_priv->comp); kfree(mad_agent_priv->reg_req); ib_dereg_mr(mad_agent_priv->agent.mr); @@ -530,9 +541,8 @@ static void unregister_mad_snoop(struct atomic_dec(&qp_info->snoop_count); spin_unlock_irqrestore(&qp_info->snoop_lock, flags); - atomic_dec(&mad_snoop_priv->refcount); - wait_event(mad_snoop_priv->wait, - !atomic_read(&mad_snoop_priv->refcount)); + deref_snoop_agent(mad_snoop_priv); + wait_for_completion(&mad_snoop_priv->comp); kfree(mad_snoop_priv); } @@ -601,8 +611,7 @@ static void snoop_send(struct ib_mad_qp_ spin_unlock_irqrestore(&qp_info->snoop_lock, flags); mad_snoop_priv->agent.snoop_handler(&mad_snoop_priv->agent, send_buf, mad_send_wc); - if (atomic_dec_and_test(&mad_snoop_priv->refcount)) - wake_up(&mad_snoop_priv->wait); + deref_snoop_agent(mad_snoop_priv); spin_lock_irqsave(&qp_info->snoop_lock, flags); } spin_unlock_irqrestore(&qp_info->snoop_lock, flags); @@ -627,8 +636,7 @@ static void snoop_recv(struct ib_mad_qp_ spin_unlock_irqrestore(&qp_info->snoop_lock, flags); mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, mad_recv_wc); - if (atomic_dec_and_test(&mad_snoop_priv->refcount)) - wake_up(&mad_snoop_priv->wait); + deref_snoop_agent(mad_snoop_priv); 
spin_lock_irqsave(&qp_info->snoop_lock, flags); } spin_unlock_irqrestore(&qp_info->snoop_lock, flags); @@ -969,8 +977,7 @@ void ib_free_send_mad(struct ib_mad_send free_send_rmpp_list(mad_send_wr); kfree(send_buf->mad); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); } EXPORT_SYMBOL(ib_free_send_mad); @@ -1789,8 +1796,7 @@ static void ib_mad_complete_recv(struct mad_recv_wc = ib_process_rmpp_recv_wc(mad_agent_priv, mad_recv_wc); if (!mad_recv_wc) { - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; } } @@ -1802,8 +1808,7 @@ static void ib_mad_complete_recv(struct if (!mad_send_wr) { spin_unlock_irqrestore(&mad_agent_priv->lock, flags); ib_free_recv_mad(mad_recv_wc); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; } ib_mark_mad_done(mad_send_wr); @@ -1822,8 +1827,7 @@ static void ib_mad_complete_recv(struct } else { mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, mad_recv_wc); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); } } @@ -2053,8 +2057,7 @@ void ib_mad_complete_send_wr(struct ib_m mad_send_wc); /* Release reference on agent taken when sending */ - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; done: spin_unlock_irqrestore(&mad_agent_priv->lock, flags); Index: mad_priv.h =================================================================== --- mad_priv.h (revision 6884) +++ mad_priv.h (working copy) @@ -37,6 +37,7 @@ #ifndef __IB_MAD_PRIV_H__ #define __IB_MAD_PRIV_H__ +#include #include #include #include @@ -108,7 +109,7 @@ struct ib_mad_agent_private { struct list_head rmpp_list; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; struct 
ib_mad_snoop_private { @@ -117,7 +118,7 @@ struct ib_mad_snoop_private { int snoop_index; int mad_snoop_flags; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; struct ib_mad_send_wr_private { Index: ucm.c =================================================================== --- ucm.c (revision 6884) +++ ucm.c (working copy) @@ -32,6 +32,8 @@ * * $Id$ */ + +#include #include #include #include @@ -73,7 +75,7 @@ struct ib_ucm_file { struct ib_ucm_context { int id; - wait_queue_head_t wait; + struct completion comp; atomic_t ref; int events_reported; @@ -139,7 +141,7 @@ static struct ib_ucm_context *ib_ucm_ctx static void ib_ucm_ctx_put(struct ib_ucm_context *ctx) { if (atomic_dec_and_test(&ctx->ref)) - wake_up(&ctx->wait); + complete(&ctx->comp); } static inline int ib_ucm_new_cm_id(int event) @@ -179,7 +181,7 @@ static struct ib_ucm_context *ib_ucm_ctx return NULL; atomic_set(&ctx->ref, 1); - init_waitqueue_head(&ctx->wait); + init_completion(&ctx->comp); ctx->file = file; INIT_LIST_HEAD(&ctx->events); @@ -559,8 +561,8 @@ static ssize_t ib_ucm_destroy_id(struct if (IS_ERR(ctx)) return PTR_ERR(ctx); - atomic_dec(&ctx->ref); - wait_event(ctx->wait, !atomic_read(&ctx->ref)); + ib_ucm_ctx_put(ctx); + wait_for_completion(&ctx->comp); /* No new events will be generated after destroying the cm_id. */ ib_destroy_cm_id(ctx->cm_id); Index: ucma.c =================================================================== --- ucma.c (revision 6949) +++ ucma.c (working copy) @@ -30,6 +30,7 @@ * SOFTWARE. 
*/ +#include #include #include #include @@ -61,7 +62,7 @@ struct ucma_file { struct ucma_context { int id; - wait_queue_head_t wait; + struct completion comp; atomic_t ref; int events_reported; int backlog; @@ -105,7 +106,7 @@ static struct ucma_context* ucma_get_ctx static void ucma_put_ctx(struct ucma_context *ctx) { if (atomic_dec_and_test(&ctx->ref)) - wake_up(&ctx->wait); + complete(&ctx->comp); } static void ucma_cleanup_events(struct ucma_context *ctx) @@ -140,7 +141,7 @@ static struct ucma_context* ucma_alloc_c return NULL; atomic_set(&ctx->ref, 1); - init_waitqueue_head(&ctx->wait); + init_completion(&ctx->comp); ctx->file = file; INIT_LIST_HEAD(&ctx->events); @@ -341,8 +342,8 @@ static ssize_t ucma_destroy_id(struct uc if (IS_ERR(ctx)) return PTR_ERR(ctx); - atomic_dec(&ctx->ref); - wait_event(ctx->wait, !atomic_read(&ctx->ref)); + ucma_put_ctx(ctx); + wait_for_completion(&ctx->comp); /* No new events will be generated after destroying the id. */ rdma_destroy_id(ctx->cm_id); ----- End forwarded message ----- -- MST From jgunthorpe at obsidianresearch.com Thu May 11 10:12:10 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Thu, 11 May 2006 11:12:10 -0600 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <1147346418.4485.68543.camel@hal.voltaire.com> References: <1147310565.4485.56947.camel@hal.voltaire.com> <20060511054803.GE26684@obsidianresearch.com> <1147346418.4485.68543.camel@hal.voltaire.com> Message-ID: <20060511171210.GH26684@obsidianresearch.com> On Thu, May 11, 2006 at 07:20:19AM -0400, Hal Rosenstock wrote: > That would be a simpler check but HopLimit is not a required component > of PathRecord but I think this may not be sufficient as just because a > HopLimit >= 2 doesn't mean that a packet would be forwarded off subnet. I was thinking of the other direction: How does the requestor/client know if a Path requires a GRH. 
To allow what Roland is talking about you need an unambiguous mechanism where the SA can signal to the client that the path needs a GRH. The only field I can see that could be used for that is HopLimit. Think of it the other way, HopLimit < 2 means it _can't_ be forwarded off subnet, so that result from the SA should _always_ cause the requesting client to not use a GRH for that path. Any test beyond HopLimit could be done in the SA prior to returning the path records to the client. If further tests are put in the client they only limit the routing configurations that are possible. Note: Although 8.3.6 specifies that 0 and 1 don't let the packet off the subnet, table 60 says that CAs should set the HopLimit to 0 and the 'first' router should fill it in. Hmm.. > Why is a request with just a non link local prefix (with HopLimit > wildcarded) not sufficient ? I think it would be best if the SA had full control over what headers the CAs put on their packets on a path by path basis. That allows for the most flexibility down the road. Jason From mshefty at ichips.intel.com Thu May 11 10:14:10 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 11 May 2006 10:14:10 -0700 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: References: <44636B59.9040808@ichips.intel.com> <44636E00.9000508@ichips.intel.com> Message-ID: <446370E2.6000105@ichips.intel.com> Roland Dreier wrote: > Sean> I'm working on adding multicast support for userspace (MPI), > Sean> which will also need this. > > Right, but that's not going to be ready for 2.6.18, is it? It won't be ready for merging, no. I was hoping to limit the number of out of tree modules that it required, so there's not a strong reason for merging it. How close to 2.6.18 are we?
- Sean From caitlinb at broadcom.com Thu May 11 10:14:23 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Thu, 11 May 2006 10:14:23 -0700 Subject: [openib-general] RE: rdma_cm.h: comment nits. Message-ID: <54AD0F12E08D1541B826BE97C98F99F149F031@NT-SJCA-0751.brcm.ad.broadcom.com> Michael S. Tsirkin wrote: > Quoting r. Tom Tucker : >> So... all that said, I could in fact support rdma_reject on an active >> side connection. But this would effectively reduce to a QP --> ERROR >> and I doubt this matches the semantics you're looking for. > > Why not? Sounds good. Three points: 1) On iWARP this is really an operation on the QP. The cm_id is an artifact of the callback interface, and this reject would be at a point where no further callbacks were needed. It seems strange to request that the application keep this around just for this purpose. 2) The private data supplied will not be delivered. It's unreliable over IB CM, but with iWARP it is reliably never delivered. That's something that the user should at least be told somehow. 3) This is slightly askew from DAPL and IT-API two-way semantics, where the destruction of the connection request is implicit. There's nothing inherently wrong with this, but it should be highlighted so that readers do not infer the wrong expectations. With those caveats, what you are suggesting is a transport neutral method of abruptly terminating a connection which when used immediately over IB also delivers private data to the other end. I suppose that with the proper caveats and documentation there really isn't any problem with that. I would prefer that the iWARP implementation not be required to enforce that the reject call be made *immediately*, because that would be an extra step. Instead the rdma_reject call would be accepted anytime after the MPA Response is received and the cm_id is reassigned or released.
From rdreier at cisco.com Thu May 11 10:16:20 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 10:16:20 -0700 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: <446370E2.6000105@ichips.intel.com> (Sean Hefty's message of "Thu, 11 May 2006 10:14:10 -0700") References: <44636B59.9040808@ichips.intel.com> <44636E00.9000508@ichips.intel.com> <446370E2.6000105@ichips.intel.com> Message-ID: Sean> How close to 2.6.18 are we? Unknown really but 2.6.17 will be out probably somewhere between 2 to 4 weeks from now... - R. From mshefty at ichips.intel.com Thu May 11 10:21:08 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 11 May 2006 10:21:08 -0700 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <1147310962.4485.57103.camel@hal.voltaire.com> References: <1147310962.4485.57103.camel@hal.voltaire.com> Message-ID: <44637284.3080004@ichips.intel.com> Hal Rosenstock wrote: > Anytime the send is off the local subnet (as well as multicast), a GRH > is required. Also, there is a management response rule for responding > when the request contained a GRH that require a GRH (13.5.4.4 p. 769). Reading through the responses, I think my problems are worse. Now I'm not even sure how I determine which remote node I'm trying to talk to short of hard-coding the DGID... We currently use ARP to resolve an IP address to a DGID, which I don't believe will work across a router. Does an app even know enough to be able to get a path record? - Sean From caitlinb at broadcom.com Thu May 11 10:23:22 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Thu, 11 May 2006 10:23:22 -0700 Subject: [openib-general] RE: rdma_cm.h: comment nits. Message-ID: <54AD0F12E08D1541B826BE97C98F99F149F036@NT-SJCA-0751.brcm.ad.broadcom.com> Tom Tucker wrote: > On Thu, 2006-05-11 at 10:56 +0300, Michael S. Tsirkin wrote: >> Quoting r. Tom Tucker : >>> Subject: RE: rdma_cm.h: comment nits. 
>>> >>> On Wed, 2006-05-10 at 14:20 -0700, Caitlin Bestler wrote: >>>> Tom Tucker wrote: >>>> >>>>> >>>>> So... all that said, I could in fact support rdma_reject on an >>>>> active side connection. But this would effectively reduce to a >>>>> QP --> ERROR and I doubt this matches the semantics you're >>>>> looking for. >>>>> >>>>> >>>> >>>> And you could send an RST. >>> >>> Yep, in fact that's what many RNIC's do when you move the QP to >>> ERROR instead of CLOSING. >>> >>>> There's just no way to send any user supplied private data. It's >>>> not just unreliable, it's guaranteed not to arrive. It's still a >>>> long way from the truly desired semantics, but the wire protocol >>>> just doesn't carry that info. >>>> >>> >>> Yeah, I think you're correct -- it would be a bogus "emulation". >> >> I don't think any real ULP passes private data inside the Reject. > >> >> Private data in response (SYN/ACK) is clearly portable, is it not? > > For iWARP the data is actually exchanged after TCP connection > establishment as part of MPA negotiation. But yes, private > data exchange is supported during connection establishment. > It can be provided on the active side (rdma_connect) and on > the passive side (rdma_accept, rdma_reject). What is not > currently supported is calling rdma_connect and then > rdma_reject (presumably to cancel the connect request after > receiving the remote peers private data). The supported > behavior for iWARP on the active side would be to call > rdma_disconnect if you didn't like the private data provided. A good summary. The very minimal benefit to a transport neutral applicatiaon here is the ability to cancel a connection before it is established without destroying the QP. This is nice in theory, but I doubt that there are many applications that need to cancel connection requests for a reason other than shutting down -- and none that need to do this so frequently that destroying the QP would be an unacceptable overhead. 
So this really comes down to whether the application should be able to request the unreliable delivery of the private data, when in fact it is *more* than unreliable over iWARP. It's definitely approaching transport neutral syntax with transport specific semantics. From rdreier at cisco.com Thu May 11 10:24:51 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 10:24:51 -0700 Subject: [openib-general] [PATCH 6/6] iSER Kconfig and Makefile In-Reply-To: (Or Gerlitz's message of "Thu, 11 May 2006 10:03:30 +0300 (IDT)") References: Message-ID: I fixed up this patch so that it actually hooks into the build (as below). (BTW, when sending patches in the future, please make them apply with "-p1" -- your patches had paths like /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Kconfig so I had to manually strip off the /usr/src/) Anyway, with a config like CONFIG_SCSI_ISCSI_ATTRS=y # CONFIG_ISCSI_TCP is not set CONFIG_INFINIBAND_ISER=y My build fails with a bunch of errors like: drivers/built-in.o: In function `iser_comp_error_worker':iser_verbs.c:(.text+0x7d7cd): undefined reference to `iscsi_conn_failure' and so on. Is the correct fix for this to add obj-$(CONFIG_INFINIBAND_ISER) += libiscsi.o to drivers/scsi/Makefile? Also, I get the following sparse warning: drivers/infiniband/ulp/iser/iser_initiator.c:610:25: error: incompatible types for operation (&) and the code there does look fishy: itt = hdr->itt & ISCSI_ITT_MASK; /* mask out cid and age bits */ hdr->itt is __be32 but ISCSI_ITT_MASK is just (0xfff), so it seems that there is something wrong, either with the iSCSI endianness annotation or with the code itself. Thanks, Roland --- diff-tree 9120bc6c8b5bdd1f4c85df7a04779ae8faa0c1a5 (from 4161cba09429dae06d249584ee1c7d63c672422c) Author: Or Gerlitz Date: Thu May 11 10:03:30 2006 +0300 IB/iser: iSER Kconfig and Makefile Kconfig and Makefile for iSER. 
Signed-off-by: Or Gerlitz Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index ba2d650..69a53d4 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -41,4 +41,6 @@ source "drivers/infiniband/ulp/ipoib/Kco source "drivers/infiniband/ulp/srp/Kconfig" +source "drivers/infiniband/ulp/iser/Kconfig" + endmenu diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index eea2732..abeaf79 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mt obj-$(CONFIG_IPATH_CORE) += hw/ipath/ obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ +obj-$(CONFIG_INFINIBAND_SRP) += ulp/iser/ diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig new file mode 100644 index 0000000..fead87d --- /dev/null +++ b/drivers/infiniband/ulp/iser/Kconfig @@ -0,0 +1,11 @@ +config INFINIBAND_ISER + tristate "ISCSI RDMA Protocol" + depends on INFINIBAND && SCSI + select SCSI_ISCSI_ATTRS + ---help--- + Support for the ISCSI RDMA Protocol over InfiniBand. This + allows you to access storage devices that speak ISER/ISCSI + over InfiniBand. + + The ISER protocol is defined by IETF. + See . diff --git a/drivers/infiniband/ulp/iser/Makefile b/drivers/infiniband/ulp/iser/Makefile new file mode 100644 index 0000000..fe6cd15 --- /dev/null +++ b/drivers/infiniband/ulp/iser/Makefile @@ -0,0 +1,4 @@ +obj-$(CONFIG_INFINIBAND_ISER) += ib_iser.o + +ib_iser-y := iser_verbs.o iser_initiator.o iser_memory.o \ + iscsi_iser.o From mshefty at ichips.intel.com Thu May 11 10:25:45 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 11 May 2006 10:25:45 -0700 Subject: [openib-general] [PATCH] RE: compliancy issue? In-Reply-To: References: Message-ID: <44637399.6040000@ichips.intel.com> Sean Hefty wrote: > Can you try this simple patch and see if it fixes your problem? 
You will > need to call rdma_accept() or rdma_reject() after receiving a CONNECT_RESPONSE > event. The conn_param to rdma_accept() should be NULL. Michael, Did you ever get a chance to try this patch? If so, I will commit. - Sean From rdreier at cisco.com Thu May 11 10:29:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 10:29:32 -0700 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <44637284.3080004@ichips.intel.com> (Sean Hefty's message of "Thu, 11 May 2006 10:21:08 -0700") References: <1147310962.4485.57103.camel@hal.voltaire.com> <44637284.3080004@ichips.intel.com> Message-ID: Sean> We currently use ARP to resolve an IP address to a DGID, Sean> which I don't believe will work across a router. Does an Sean> app even know enough to be able to get a path record? I think you're fine. The IB router just has to handle forwarding multicasts between two IB subnets for ARP to work. If there's also an IP router in between the two hosts then there's a problem, but I don't think it's that reasonable to expect to make a direct RDMA connection in that case. - R. From rdreier at cisco.com Thu May 11 10:30:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 10:30:38 -0700 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: <20060511170625.GA2595@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 11 May 2006 20:06:25 +0300") References: <44636B59.9040808@ichips.intel.com> <20060511170625.GA2595@mellanox.co.il> Message-ID: Michael> I also think it might make sense to use the new SA query Michael> retry support. I think currently ipoib retries SA Michael> queries by itself. Yes, IPoIB does the retrying itself now. So are you voting in favor of adding the SA retry stuff for 2.6.18? How about the separate multicast module? - R. 
From halr at voltaire.com Thu May 11 10:45:00 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 May 2006 13:45:00 -0400 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: References: <1147310962.4485.57103.camel@hal.voltaire.com> <44637284.3080004@ichips.intel.com> Message-ID: <1147369498.4485.75761.camel@hal.voltaire.com> On Thu, 2006-05-11 at 13:29, Roland Dreier wrote: > Sean> We currently use ARP to resolve an IP address to a DGID, > Sean> which I don't believe will work across a router. Does an > Sean> app even know enough to be able to get a path record? > > I think you're fine. The IB router just has to handle forwarding > multicasts Specifically IPoIB broadcast > between two IB subnets for ARP to work. Yes, because an IPoIB subnet can span multiple IB subnets. > If there's also an IP router in between the two hosts when the hosts are on different IP(oIB) subnets. > then there's a problem, but I don't think it's that reasonable > to expect to make a direct RDMA connection in that case. That's a different case; you don't ARP off your IPoIB subnet; you get the next hop router towards that IPoIB subnet. -- Hal > - R. From jgunthorpe at obsidianresearch.com Thu May 11 11:03:59 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Thu, 11 May 2006 12:03:59 -0600 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <44637284.3080004@ichips.intel.com> References: <1147310962.4485.57103.camel@hal.voltaire.com> <44637284.3080004@ichips.intel.com> Message-ID: <20060511180359.GJ26684@obsidianresearch.com> On Thu, May 11, 2006 at 10:21:08AM -0700, Sean Hefty wrote: > Hal Rosenstock wrote: > >Anytime the send is off the local subnet (as well as multicast), a GRH > >is required. Also, there is a management response rule for responding > >when the request contained a GRH that require a GRH (13.5.4.4 p. 769). > Reading through the responses, I think my problems are worse. 
Now I'm not > even sure how I determine which remote node I'm trying to talk to short of > hard-coding the DGID... > We currently use ARP to resolve an IP address to a DGID, which I don't > believe will work across a router. Does an app even know enough to be able > to get a path record? The only wrinkles I could see you having is how to choose between multiple DGID's when generating the ARP response. I don't think that is a serious issue though since any GID to any GID should be routable on the subnet. I haven't looked at the ARP code, but based on the RFCs the IPv4 ARP process would be more or less: 1) Send ARP datagram to the broadcast multicast group LID w/ GRH. The ARP packet includes the IPv4 address of the sender and the GID/QPN (hardware address) of the sender, asking for the hardware address of the target IPv4. A router must support multicast routing so that the ARP request is forwarded to the remote subnet. It has a GRH of course so this is OK. The SM and router work together to make this happen. 2) ARP responder matches the target IP address, gets the IP of the requestor, and the GID/QPN from the ARP packet's sender fields We are still OK since the GID in the ARP packet's sender fields is global. 3) ARP responder produces a unicast packet to the IPv4 requestor address: - The sender's GID/QPN is converted into a path either from a local cache or via a SA query. The sender's GID combined with any of the target's GID's should be sufficient to ask the SA for a path. [Note: that you must use the _hardware_ address here and you cannot just lookup the IPv4 sender address in the neighbor cache. This is needed to support ARP tricks like zeroconf that use null source IPs] - This query results in a path record for communication with the sender. [Some implementations will learn based on ARP requests and will update the neighbor cache here] - The path record is used to generate the unicast headers, GRH and all - if necessary. 
- The same SGID that was used in the path record query above is returned in the ARP response as the target's address. Since the SA specifies the path to get back to the requestor based only on the GID in the ARP request it can produce a path that crosses the router. 4) The ARP requestor now gets the respondor's GID/QPN from the unicast ARP response and does the same path lookup that the ARP requestor did to get the 'reverse' path. Again, since the SA is now involved the resulting path can cross the router. IPv6 is similar, but the packet format is different and the 'ARP' (NS packet) request is sent to a multicast address chosen by 'hashing' the IPv6 address. Hope this helps, Jason From HNGUYEN at de.ibm.com Thu May 11 11:08:56 2006 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Thu, 11 May 2006 20:08:56 +0200 Subject: [openib-general] patch: ibv_query_qp() causes segmentation fault if init_attr is NULL In-Reply-To: Message-ID: Hi Roland! Just realized in one of our testcases that ibv_query_qp() causes a segmentation fault if the parameter init_attr is NULL. Not sure if I'm wrong, but init_attr appears to be optional, isn't it? 
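The guard being proposed can be illustrated in isolation. The following is a minimal sketch, not the libibverbs code: the demo_* types and demo_fill_query_result() are hypothetical stand-ins, and it assumes only that the attr argument is mandatory while init_attr may legitimately be NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real verbs structures; illustration only. */
struct demo_qp_cap {
	uint32_t max_send_wr;
	uint32_t max_recv_wr;
};

struct demo_qp_attr {
	int qp_state;
};

struct demo_qp_init_attr {
	struct demo_qp_cap cap;
	int sq_sig_all;
};

struct demo_query_resp {
	int qp_state;
	uint32_t max_send_wr;
	uint32_t max_recv_wr;
	int sq_sig_all;
};

/*
 * Copy a query response into the caller's structures.
 * attr is required; init_attr is optional and is written only when
 * the caller actually passed a buffer for it.
 */
static int demo_fill_query_result(const struct demo_query_resp *resp,
				  struct demo_qp_attr *attr,
				  struct demo_qp_init_attr *init_attr)
{
	assert(resp != NULL && attr != NULL);

	attr->qp_state = resp->qp_state;

	if (init_attr) {
		init_attr->cap.max_send_wr = resp->max_send_wr;
		init_attr->cap.max_recv_wr = resp->max_recv_wr;
		init_attr->sq_sig_all = resp->sq_sig_all;
	}

	return 0;
}
```

With this shape a caller that only cares about the current attributes can pass NULL for the third argument, and a caller that wants the creation-time values passes a buffer and gets them filled in.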
Anyway here is my patch for review:

Index: src/userspace/libibverbs/src/cmd.c
===================================================================
--- src/userspace/libibverbs/src/cmd.c (revision 7064)
+++ src/userspace/libibverbs/src/cmd.c (working copy)
@@ -673,17 +673,19 @@
 	attr->alt_ah_attr.is_global = resp.alt_dest.is_global;
 	attr->alt_ah_attr.port_num = resp.alt_dest.port_num;
 
-	init_attr->qp_context = qp->qp_context;
-	init_attr->send_cq = qp->send_cq;
-	init_attr->recv_cq = qp->recv_cq;
-	init_attr->srq = qp->srq;
-	init_attr->qp_type = qp->qp_type;
-	init_attr->cap.max_send_wr = resp.max_send_wr;
-	init_attr->cap.max_recv_wr = resp.max_recv_wr;
-	init_attr->cap.max_send_sge = resp.max_send_sge;
-	init_attr->cap.max_recv_sge = resp.max_recv_sge;
-	init_attr->cap.max_inline_data = resp.max_inline_data;
-	init_attr->sq_sig_all = resp.sq_sig_all;
+	if (init_attr) {
+		init_attr->qp_context = qp->qp_context;
+		init_attr->send_cq = qp->send_cq;
+		init_attr->recv_cq = qp->recv_cq;
+		init_attr->srq = qp->srq;
+		init_attr->qp_type = qp->qp_type;
+		init_attr->cap.max_send_wr = resp.max_send_wr;
+		init_attr->cap.max_recv_wr = resp.max_recv_wr;
+		init_attr->cap.max_send_sge = resp.max_send_sge;
+		init_attr->cap.max_recv_sge = resp.max_recv_sge;
+		init_attr->cap.max_inline_data = resp.max_inline_data;
+		init_attr->sq_sig_all = resp.sq_sig_all;
+	}
 
 	return 0;
 }

Thanks!

Mit freundlichen Gruessen/Kind Regards
Hoang-Nam Nguyen

(See attached file: cmd.c_trunk_7064.diff)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cmd.c_trunk_7064.diff Type: application/octet-stream Size: 1657 bytes Desc: not available URL: From or.gerlitz at gmail.com Thu May 11 11:24:09 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 11 May 2006 20:24:09 +0200 Subject: [openib-general] [PATCH 6/6] iSER Kconfig and Makefile In-Reply-To: References: Message-ID: <15ddcffd0605111124v390e1dbei18508787ec29afac@mail.gmail.com> On 5/11/06, Roland Dreier wrote: > I fixed up this patch so that it actually hooks into the build (as > below). (BTW, when sending patches in the future, please make them > apply with "-p1" -- your patches had paths like > > /usr/src/linux-2.6.17-rc3/drivers/infiniband/ulp/iser-x/Kconfig > > so I had to manually strip off the /usr/src/) OK, will do that. > > Anyway, with a config like > > CONFIG_SCSI_ISCSI_ATTRS=y > # CONFIG_ISCSI_TCP is not set > CONFIG_INFINIBAND_ISER=y > > My build fails with a bunch of errors like: > > drivers/built-in.o: In function `iser_comp_error_worker':iser_verbs.c:(.text+0x7d7cd): undefined reference to `iscsi_conn_failure' > > and so on. Is the correct fix for this to add > > obj-$(CONFIG_INFINIBAND_ISER) += libiscsi.o > > to drivers/scsi/Makefile? Indeed since libiscsi does not have a CONFIG_ of its own, you need to set CONFIG_ISCSI_TCP to have libiscsi being built and then iser links with it. So there are two options here: either to set a CONFIG_LIBISCSI and select it by both CONFIG_ISCSI_TCP and CONFIG_INFINIBAND_ISER or the approach you were suggesting. I am cc-ing Mike Christie and James Bottomley on this email, if you guys have a preference, let me know and i can produce a patch to drivers/scsi/Kconfig and Makefile. 
> Also, I get the following sparse warning: > > drivers/infiniband/ulp/iser/iser_initiator.c:610:25: error: incompatible types for operation (&) > > and the code there does look fishy: > > itt = hdr->itt & ISCSI_ITT_MASK; /* mask out cid and age bits */ > > hdr->itt is __be32 but ISCSI_ITT_MASK is just (0xfff), so it seems > that there is something wrong, either with the iSCSI endianness > annotation or with the code itself. What's slightly wrong here is that hdr->itt is declared __be32 but is never htonl-ed before being placed on the wire, so I can't ntohl it back before AND-ing it with the ITT_MASK. It's only slightly wrong because the target treats the ITT (Initiator Task Tag) as an opaque tag and simply returns, in the command/control response, the itt of the command/control. The bottom line is that it needs to be fixed by converting the itt to network order in libiscsi and then converting it back to host order in iser before doing the bitwise AND, so that the sparse endianness warning goes away. I will be able to fix that on Sunday and send the fixes to linux-scsi and openib. > Thanks, > Roland Thank you for pointing out these issues. Also, in drivers/infiniband/Makefile > +obj-$(CONFIG_INFINIBAND_SRP) += ulp/iser/ should be > +obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/ Or. > diff-tree 9120bc6c8b5bdd1f4c85df7a04779ae8faa0c1a5 (from 4161cba09429dae06d249584ee1c7d63c672422c) > Author: Or Gerlitz > Date: Thu May 11 10:03:30 2006 +0300 > > IB/iser: iSER Kconfig and Makefile > > Kconfig and Makefile for iSER.
> > Signed-off-by: Or Gerlitz > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig > index ba2d650..69a53d4 100644 > --- a/drivers/infiniband/Kconfig > +++ b/drivers/infiniband/Kconfig > @@ -41,4 +41,6 @@ source "drivers/infiniband/ulp/ipoib/Kco > > source "drivers/infiniband/ulp/srp/Kconfig" > > +source "drivers/infiniband/ulp/iser/Kconfig" > + > endmenu > diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile > index eea2732..abeaf79 100644 > --- a/drivers/infiniband/Makefile > +++ b/drivers/infiniband/Makefile > @@ -3,3 +3,4 @@ obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mt > obj-$(CONFIG_IPATH_CORE) += hw/ipath/ > obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ > obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ > +obj-$(CONFIG_INFINIBAND_SRP) += ulp/iser/ > diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig > new file mode 100644 > index 0000000..fead87d > --- /dev/null > +++ b/drivers/infiniband/ulp/iser/Kconfig > @@ -0,0 +1,11 @@ > +config INFINIBAND_ISER > + tristate "ISCSI RDMA Protocol" > + depends on INFINIBAND && SCSI > + select SCSI_ISCSI_ATTRS > + ---help--- > + Support for the ISCSI RDMA Protocol over InfiniBand. This > + allows you to access storage devices that speak ISER/ISCSI > + over InfiniBand. > + > + The ISER protocol is defined by IETF. > + See . 
> diff --git a/drivers/infiniband/ulp/iser/Makefile b/drivers/infiniband/ulp/iser/Makefile > new file mode 100644 > index 0000000..fe6cd15 > --- /dev/null > +++ b/drivers/infiniband/ulp/iser/Makefile > @@ -0,0 +1,4 @@ > +obj-$(CONFIG_INFINIBAND_ISER) += ib_iser.o > + > +ib_iser-y := iser_verbs.o iser_initiator.o iser_memory.o \ > + iscsi_iser.o From rdreier at cisco.com Thu May 11 11:29:06 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 11:29:06 -0700 Subject: [openib-general] [PATCH 6/6] iSER Kconfig and Makefile In-Reply-To: <15ddcffd0605111124v390e1dbei18508787ec29afac@mail.gmail.com> (Or Gerlitz's message of "Thu, 11 May 2006 20:24:09 +0200") References: <15ddcffd0605111124v390e1dbei18508787ec29afac@mail.gmail.com> Message-ID: Or> Indeed since libiscsi does not have a CONFIG_ of its own, you Or> need to set CONFIG_ISCSI_TCP to have libiscsi being built and Or> then iser links with it. Or> So there are two options here: either to set a CONFIG_LIBISCSI Or> and select it by both CONFIG_ISCSI_TCP and Or> CONFIG_INFINIBAND_ISER or the approach you were suggesting. I Or> am cc-ing Mike Christie and James Bottomley on this email, if Or> you guys have a preference, let me know and i can produce a Or> patch to drivers/scsi/Kconfig and Makefile. In the iser branch of my git tree I just added obj-$(CONFIG_INFINIBAND_ISER) += libiscsi.o to drivers/scsi/Makefile. So let me know if I should change that. Or> Thanks to you for pointing these issues, also, in Or> drivers/infiniband/Makefile >> +obj-$(CONFIG_INFINIBAND_SRP) += ulp/iser/ Or> should be >> +obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/ Thanks, I made a copy-and-paste error there. I pushed out a fixed tree. - R. 
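On the sparse warning raised earlier in this thread (hdr->itt annotated __be32, masked with a plain 0xfff): a minimal userspace sketch of the convert-then-mask ordering follows. It is an illustration under assumptions, not the libiscsi or iser code. The DEMO_ITT_MASK name and the demo_* helpers are hypothetical, and htonl()/ntohl() from <arpa/inet.h> stand in for the kernel's cpu_to_be32()/be32_to_cpu().

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical mask; mirrors the (0xfff) value quoted in the thread. */
#define DEMO_ITT_MASK 0xfffu

/* Sender side: store the host-order task tag in wire (big-endian) order. */
static uint32_t demo_itt_to_wire(uint32_t host_itt)
{
	return htonl(host_itt);
}

/*
 * Receiver side: convert the wire value back to host order first,
 * and only then apply the host-order mask.  Masking the big-endian
 * value directly would select different bits on a little-endian host.
 */
static uint32_t demo_itt_from_wire(uint32_t wire_itt)
{
	return ntohl(wire_itt) & DEMO_ITT_MASK;
}
```

Because both conversions are explicit, the low 12 bits recovered on the receive side are the same on big-endian and little-endian hosts, which is the property the endianness annotation is meant to enforce.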
From or.gerlitz at gmail.com Thu May 11 11:29:04 2006 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Thu, 11 May 2006 20:29:04 +0200 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> Message-ID: <15ddcffd0605111129i2039fde1ld65517f900a1b917@mail.gmail.com> On 5/11/06, Roland Dreier wrote: > Or> I don't see the niether of the two iscsi updates for 2.6.18 > Or> (both sent by Mike Christie) in your git tree, i was looking > Or> for it all over (in the for-2.6.18 , for-mm, master, for-linus > Or> branches ...). Do i missing anything or you were waiting for > Or> my repost of the patches to pull the iscsi updates? > > Yeah, I haven't pushed it out yet. > > I will be putting iSER into an iser branch of my tree, which I'll ask > Linus to pull once the SCSI changes are in his tree. OK, thanks. Let me know when you have the branch, so i will be able to test it with this exact code configuration. Or. From rdreier at cisco.com Thu May 11 11:31:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 11:31:28 -0700 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: <15ddcffd0605111129i2039fde1ld65517f900a1b917@mail.gmail.com> (Or Gerlitz's message of "Thu, 11 May 2006 20:29:04 +0200") References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <15ddcffd0605111129i2039fde1ld65517f900a1b917@mail.gmail.com> Message-ID: Or> OK, thanks. Let me know when you have the branch, so i will be Or> able to test it with this exact code configuration. It's there and pushed to master.kernel.org - R. 
From glebn at voltaire.com Thu May 11 11:59:26 2006 From: glebn at voltaire.com (Gleb Natapov) Date: Thu, 11 May 2006 21:59:26 +0300 Subject: [openib-general] [resend][RFC][PATCH] adding call to madvise In-Reply-To: References: <20060511134217.GW5319@minantech.com> Message-ID: <20060511185926.GA1561@minantech.com> On Thu, May 11, 2006 at 09:19:13AM -0700, Roland Dreier wrote: > In general this seems good, but I have a few quick comments: > > > +#ifndef MADV_DONTFORK > > +#define MADV_DONTFORK 10 > > +#endif > > +#ifndef MADV_DOFORK > > +#define MADV_DOFORK 11 > > +#endif > > This should probably be in the only file that uses it, memory.c. And > I think it's cleanest to use autoconf to check if MADV_DONTFORK and > MADV_DOFORK are available. I'll move this to memory.c. What about libmthca? Leave it in the header there? > > > --- libibverbs/include/infiniband/verbs.h (revision 7112) > > +++ libibverbs/include/infiniband/verbs.h (working copy) > > @@ -289,6 +289,8 @@ > > uint32_t handle; > > uint32_t lkey; > > uint32_t rkey; > > + void *addr; > > + size_t length; > > }; > > This breaks ABI, right? > Yes. Absolutely. AFAIK you are going to remove libsysfs dependency and break ABI with this change, I think we can piggyback this one then. > > --- libmthca/src/verbs.c (revision 7112) > > +++ libmthca/src/verbs.c (working copy) > > @@ -134,6 +134,9 @@ > > return NULL; > > } > > > > + mr->addr = addr; > > + mr->length = length; > > + > > return mr; > > } > > What's the reason to set addr and length here? Doesn't libibverbs > already do it? > libmthca uses __mthca_reg_mr() to do internal registrations (qp, cq). If every hw driver will set them we can remove this from libibverbs. 
> > if (!cq->buf) > > goto err; > > > > + madvise(cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DONTFORK); > > cq->mr = __mthca_reg_mr(to_mctx(context)->pd, cq->buf, > > cqe * MTHCA_CQ_ENTRY_SIZE, > > 0, IBV_ACCESS_LOCAL_WRITE); > > @@ -247,6 +251,7 @@ > > mthca_dereg_mr(cq->mr); > > > > err_buf: > > + madvise(cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE, MADV_DOFORK); > > free(cq->buf); > > It seems it would be better to put the DONTFORK call into > mthca_alloc_cq_buf(), and the DOFORK into a new mthca_free_cq_buf() call. > > Actually, to handle the QP and SRQ cases too it's probably better to > have wrappers for posix_memalign() and free() to keep this > encapsulated in one place. > Will do. -- Gleb. From mst at mellanox.co.il Thu May 11 12:02:22 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 22:02:22 +0300 Subject: [openib-general] [PATCH] RE: compliancy issue? In-Reply-To: <44637399.6040000@ichips.intel.com> References: <44637399.6040000@ichips.intel.com> Message-ID: <20060511190222.GA3669@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] [PATCH] RE: compliancy issue? > > Sean Hefty wrote: > >Can you try this simple patch and see if it fixes your problem? You will > >need to call rdma_accept() or rdma_reject() after receiving a > >CONNECT_RESPONSE > >event. The conn_param to rdma_accept() should be NULL. > > Michael, > > Did you ever get a chance to try this patch? If so, I will commit. Not yet, sorry - I didn't yet update SDP to handle the API change. -- MST From mst at mellanox.co.il Thu May 11 12:07:32 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 22:07:32 +0300 Subject: [openib-general] Re: rdma_cm.h: comment nits. 
In-Reply-To: <1147367055.9881.23.camel@trinity.ogc.int> References: <54AD0F12E08D1541B826BE97C98F99F149EF5A@NT-SJCA-0751.brcm.ad.broadcom.com> <1147308499.5093.85.camel@trinity.ogc.int> <20060511075608.GL10669@mellanox.co.il> <1147367055.9881.23.camel@trinity.ogc.int> Message-ID: <20060511190732.GB3669@mellanox.co.il> Quoting r. Tom Tucker : > What is not currently supported is calling rdma_connect > and then rdma_reject (presumably to cancel the connect request after > receiving the remote peers private data). The supported behavior for > iWARP on the active side would be to call rdma_disconnect if you didn't > like the private data provided. In that case, how about we change rdma_disconnect for IB to do reject if connection isn't established yet? -- MST From mshefty at ichips.intel.com Thu May 11 12:22:13 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 11 May 2006 12:22:13 -0700 Subject: [openib-general] Re: rdma_cm.h: comment nits. In-Reply-To: <20060511190732.GB3669@mellanox.co.il> References: <54AD0F12E08D1541B826BE97C98F99F149EF5A@NT-SJCA-0751.brcm.ad.broadcom.com> <1147308499.5093.85.camel@trinity.ogc.int> <20060511075608.GL10669@mellanox.co.il> <1147367055.9881.23.camel@trinity.ogc.int> <20060511190732.GB3669@mellanox.co.il> Message-ID: <44638EE5.1060806@ichips.intel.com> Michael S. Tsirkin wrote: > In that case, how about we change rdma_disconnect for IB to do reject if > connection isn't established yet? Unless I missed something, the only problem that we're trying to solve is that SDP needs to be able to reject a connection based on private data in the REP, but wants the RDMA CM manage its QP states. The patch that I posted should allow this. I don't see the motivation to change the general behavior of the RDMA CM beyond that. My personal take on private data is that it is application data that could be transferred over the connection, with fewer restrictions and greater reliability. 
- Sean From tom at opengridcomputing.com Thu May 11 12:32:28 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Thu, 11 May 2006 14:32:28 -0500 Subject: [openib-general] Re: rdma_cm.h: comment nits. In-Reply-To: <44638EE5.1060806@ichips.intel.com> References: <54AD0F12E08D1541B826BE97C98F99F149EF5A@NT-SJCA-0751.brcm.ad.broadcom.com> <1147308499.5093.85.camel@trinity.ogc.int> <20060511075608.GL10669@mellanox.co.il> <1147367055.9881.23.camel@trinity.ogc.int> <20060511190732.GB3669@mellanox.co.il> <44638EE5.1060806@ichips.intel.com> Message-ID: <1147375948.9881.28.camel@trinity.ogc.int> On Thu, 2006-05-11 at 12:22 -0700, Sean Hefty wrote: > Michael S. Tsirkin wrote: > > In that case, how about we change rdma_disconnect for IB to do reject if > > connection isn't established yet? > > Unless I missed something, the only problem that we're trying to solve is that > SDP needs to be able to reject a connection based on private data in the REP, > but wants the RDMA CM manage its QP states. The patch that I posted should > allow this. > > I don't see the motivation to change the general behavior of the RDMA CM beyond > that. My personal take on private data is that it is application data that > could be transferred over the connection, with fewer restrictions and greater > reliability. > I agree. > - Sean From mst at mellanox.co.il Thu May 11 12:36:19 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 22:36:19 +0300 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: References: <44636B59.9040808@ichips.intel.com> <20060511170625.GA2595@mellanox.co.il> Message-ID: <20060511193619.GC3669@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface > > Michael> I also think it might make sense to use the new SA query > Michael> retry support. I think currently ipoib retries SA > Michael> queries by itself. 
>
> Yes, IPoIB does the retrying itself now.
>
> So are you voting in favor of adding the SA retry stuff for 2.6.18?

Oh, I dunno. I find the idea itself appealing. OTOH, I was assuming that SA retry stuff would be a prerequisite for CMA - I thought that for CMA it makes sense to use SA retry since it does not handle retries itself, but for some reason it seems to pass 0 as retries parameter.

I was just pointing out that *if* the SA retry stuff goes in then it makes sense to use it in IPoIB, since it's quite easy.

> How about the separate multicast module?

I'm nervous about doing big changes in the multicast code in ipoib - it had more than a fair share of subtle races.

I wonder whether we can avoid this by simply preventing userspace from touching IPoIB multicast groups. I have a gut feeling this might also be a good idea, security wise. What do you think?

-- 
MST

From mshefty at ichips.intel.com Thu May 11 12:48:26 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 11 May 2006 12:48:26 -0700
Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface
In-Reply-To: <20060511193619.GC3669@mellanox.co.il>
References: <44636B59.9040808@ichips.intel.com> <20060511170625.GA2595@mellanox.co.il> <20060511193619.GC3669@mellanox.co.il>
Message-ID: <4463950A.6030503@ichips.intel.com>

Michael S. Tsirkin wrote:
> I'm nervous about doing big changes in the multicast code in ipoib - it had more
> than a fair share of subtle races.

The patch simplifies the multicast code in ipoib, and the serialization in the multicast module is simple enough that we should be able to have a fair amount of confidence that races are avoided. (It's not like I'm biased or anything.) The code just needs more testing.

Going forward, it makes more sense to have ipoib use the multicast module. The only question to me is when that conversion takes place. I'm also worried about having to try to regenerate the patch after other changes go into the code.
- Sean From mst at mellanox.co.il Thu May 11 13:04:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 11 May 2006 23:04:17 +0300 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: <4463950A.6030503@ichips.intel.com> References: <44636B59.9040808@ichips.intel.com> <20060511170625.GA2595@mellanox.co.il> <20060511193619.GC3669@mellanox.co.il> <4463950A.6030503@ichips.intel.com> Message-ID: <20060511200417.GA4138@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface > > Michael S. Tsirkin wrote: > >I'm nervous about doing big changes in the multicast code in ipoib - it had > >more than a fair share of subtle races. > > The patch simplifies the multicast code in ipoib, and the serialization in > the multicast module is simple enough that we should be able to have a fair > amount of confidence that races are avoided. (It's not like I'm biased or > anything.) The code just needs more testing. I'm not against this approach on principle, I hope I didn't make it sound like I am. I just don't feel IPoIB should be the first user of *any* module. > Going forward, it makes more sense to have ipoib use the multicast module. > The only question to me is when that conversion takes place. I'm also > worried about having to try to regenerate the patch after other changes go > into the code. I don't think anyone's touching ipoib_multicast.c at the moment, so don't worry about that. -- MST From eeb at bartonsoftware.com Thu May 11 13:21:00 2006 From: eeb at bartonsoftware.com (Eric Barton) Date: Thu, 11 May 2006 21:21:00 +0100 Subject: [openib-general] Comms Errors Message-ID: <200605112021.k4BKL0Hr012573@robert.bartonsoftware.com> Hi, What have I done bad to make a send (the very first one on a newly established RC QP) complete with status 10 (IB_WC_REM_ACCESS_ERR)? 
The remote QP gets an event callback of type 3 (IB_EVENT_QP_ACCESS_ERR), and then all posted receives on both QPs complete with status 5 (IB_WC_WR_FLUSH_ERR), which I guess is because openib moved the QP state to error.

Thanks in advance...

-- 
Cheers,
Eric

---------------------------------------------------
|Eric Barton        Barton Software               |
|9 York Gardens     Tel:    +44 (117) 330 1575    |
|Clifton            Mobile: +44 (7909) 680 356    |
|Bristol BS8 4LL    Fax:    call first            |
|United Kingdom     E-Mail: eeb at bartonsoftware.com|
---------------------------------------------------

From mshefty at ichips.intel.com Thu May 11 13:48:24 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 11 May 2006 13:48:24 -0700
Subject: [openib-general] Comms Errors
In-Reply-To: <200605112021.k4BKL0Hr012573@robert.bartonsoftware.com>
References: <200605112021.k4BKL0Hr012573@robert.bartonsoftware.com>
Message-ID: <4463A318.7050708@ichips.intel.com>

Eric Barton wrote:
> What have I done bad to make a send (the very first one on a newly established
> RC QP) complete with status 10 (IB_WC_REM_ACCESS_ERR)?
>
> The remote QP gets an event callback of type 3 (IB_EVENT_QP_ACCESS_ERR), and
> then all posted receives on both QPs complete with status 5
> (IB_WC_WR_FLUSH_ERR), which I guess is because openib moved the QP state to
> error.

How did you connect the QPs?
From mshefty at ichips.intel.com Thu May 11 16:13:33 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 11 May 2006 16:13:33 -0700 Subject: [openib-general] Comms Errors In-Reply-To: <004901c67548$a4379900$0281a8c0@ebpc> References: <004901c67548$a4379900$0281a8c0@ebpc> Message-ID: <4463C51D.3010305@ichips.intel.com> Eric Barton wrote: > int > kiblnd_post_rx (kib_rx_t *rx, int credit) > { > kib_conn_t *conn = rx->rx_conn; > struct ib_recv_wr *bad_wrq; > int rc; > > LASSERT (!in_interrupt()); > LASSERT (credit == IBLND_POSTRX_NO_CREDIT || > credit == IBLND_POSTRX_PEER_CREDIT || > credit == IBLND_POSTRX_RSRVD_CREDIT); > > rx->rx_sge.length = IBLND_MSG_SIZE; > rx->rx_sge.lkey = kiblnd_data.kib_mr->lkey; > rx->rx_sge.addr = rx->rx_msgaddr; > > rx->rx_wrq.next = NULL; > rx->rx_wrq.sg_list = &rx->rx_sge; > rx->rx_wrq.num_sge = 1; > rx->rx_wrq.wr_id = kiblnd_ptr2wreqid(rx, IBLND_WID_RX); > > LASSERT (conn->ibc_state >= IBLND_CONN_INIT); > LASSERT (rx->rx_nob >= 0); /* not posted */ > > CDEBUG(D_NET, "posting rx [%d %x "LPX64"]\n", > rx->rx_wrq.sg_list->length, > rx->rx_wrq.sg_list->lkey, > rx->rx_wrq.sg_list->addr); > > if (conn->ibc_state > IBLND_CONN_ESTABLISHED) { > /* No more posts for this rx; so lose its ref */ > kiblnd_conn_decref(conn); > return 0; > } > > rx->rx_nob = -1; /* flag posted */ > > rc = ib_post_recv(conn->ibc_cmid->qp, &rx->rx_wrq, &bad_wrq); > ... Based on your errors, my guess is that there's something wrong with the posted receive, but I don't see what from this code. Can you post where you create the mr and map the receive buffers too? 
- Sean From rdreier at cisco.com Thu May 11 16:40:55 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 16:40:55 -0700 Subject: [openib-general] [resend][RFC][PATCH] adding call to madvise In-Reply-To: <20060511185926.GA1561@minantech.com> (Gleb Natapov's message of "Thu, 11 May 2006 21:59:26 +0300") References: <20060511134217.GW5319@minantech.com> <20060511185926.GA1561@minantech.com> Message-ID: Gleb> I'll move this to memory.c. What about libmthca? Leave it in Gleb> the header there? I guess so. Gleb> Yes. Absolutely. AFAIK you are going to remove libsysfs Gleb> dependency and break ABI with this change, I think we can Gleb> piggyback this one then. Right, I should open a libibverbs 1.1 tree for ABI breaking changes soon. Gleb> libmthca uses __mthca_reg_mr() to do internal registrations Gleb> (qp, cq). If every hw driver will set them we can remove Gleb> this from libibverbs. But don't you the madvise by hand for these cases? It's definitely better to have libibverbs set common fields of non-HW-dependent data structures. I guess I still don't follow why these fields are set in two places. - R. From rdreier at cisco.com Thu May 11 16:43:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 11 May 2006 16:43:21 -0700 Subject: [openib-general] Comms Errors In-Reply-To: <200605112021.k4BKL0Hr012573@robert.bartonsoftware.com> (Eric Barton's message of "Thu, 11 May 2006 21:21:00 +0100") References: <200605112021.k4BKL0Hr012573@robert.bartonsoftware.com> Message-ID: Eric> What have I done bad to make a send (the very first one on a Eric> newly established RC QP) complete with status 10 Eric> (IB_WC_REM_ACCESS_ERR)? This would indicate that you did an RDMA operation to memory you're not allowed to touch. Either the R_Key or the address is wrong. - R. 
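
For readers following the numeric codes in this thread, a tiny lookup can make logs easier to read. The sketch below maps only the three values actually named in the messages above (completion status 5 and 10, async event type 3); the function names are hypothetical, and anything else is reported as unknown rather than guessed.

```c
#include <stddef.h>

/* Hypothetical helper: names for the completion status values quoted
 * in this thread. Other values are deliberately left unmapped. */
static const char *wc_status_name(int status)
{
    switch (status) {
    case 5:
        return "IB_WC_WR_FLUSH_ERR";   /* work request flushed after QP error */
    case 10:
        return "IB_WC_REM_ACCESS_ERR"; /* remote side rejected the access */
    default:
        return "unknown";
    }
}

/* Hypothetical helper: name for the async event type quoted in this thread. */
static const char *qp_event_name(int event)
{
    switch (event) {
    case 3:
        return "IB_EVENT_QP_ACCESS_ERR";
    default:
        return "unknown";
    }
}
```

With this, a debug print such as `printf("send failed: %s\n", wc_status_name(status))` reports the symbolic name next to the sequence described by Eric: remote access error on the sender, QP access error event on the receiver, then flushed receives.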
From halr at voltaire.com Thu May 11 16:52:05 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 11 May 2006 19:52:05 -0400 Subject: [openib-general] [PATCH] osmtest: create SA database prior to running SLVL and VLArb tests Message-ID: <1147391493.4485.83158.camel@hal.voltaire.com> In osmtest/osmtest.c, need to create SA database before running SLVL and VLarb tests Signed-off-by: Hal Rosenstock Modified: gen2/trunk/src/userspace/management/osm/osmtest/osmtest.c =================================================================== --- gen2/trunk/src/userspace/management/osm/osmtest/osmtest.c 2006-05-11 23:45:36 UTC (rev 7129) +++ gen2/trunk/src/userspace/management/osm/osmtest/osmtest.c 2006-05-11 23:53:09 UTC (rev 7130) @@ -6414,6 +6414,16 @@ */ if (p_osmt->opt.flow == 7) { + status = osmtest_create_db( p_osmt ); + if( status != IB_SUCCESS ) + { + osm_log( &p_osmt->log, OSM_LOG_ERROR, + "osmtest_run: ERR 014A: " + "Database creation failed (%s)\n", + ib_get_err_str( status ) ); + goto Exit; + } + status = osmt_run_slvl_and_vlarb_records_flow(p_osmt); if( status != IB_SUCCESS ) { From hycsw at ca.sandia.gov Thu May 11 17:43:15 2006 From: hycsw at ca.sandia.gov (Helen Chen) Date: Thu, 11 May 2006 17:43:15 -0700 Subject: [openib-general] Fw: SRP problems Message-ID: <001001c6755d$10ed6700$26f8f692@hycswlaptop> RE: configurationHelp needed! Helen ----- Original Message ----- From: Helen Chen To: vuhuong at mellanox.com Cc: Ellis, Dave ; Korpacz, Joe ; hycsw at ca (Helen Chen) ; Decker, Jeffrey C Sent: Thursday, May 11, 2006 9:16 AM Subject: Fw: SRP problems Vu, Dave Ellis from Engenio suggested that we contact you for the SRP problems we are experiencing. Currently we are running the SRP implementation distributed in the 2.6.16.5 kernel. Your help will be highly appreciated. 
Thanks,
Helen

----- Original Message -----
From: Decker, Jeffrey C
To: Ellis, Dave
Cc: Chen, Helen Y
Sent: Wednesday, April 26, 2006 5:01 PM
Subject: RE: configuration

Hi Dave,

Basically I was doing exactly what these instructions say to do. What I did find is that I need to echo each target separately and also that for whatever reason it shows twice as many entries in /proc/scsi/scsi. Mitch hooked his Mellanox IB Gold up to it and we also found that it registers twice as many entries in /proc/scsi/scsi. Anyway, we have 4 accessible devices in /dev/sd*.

The problem I am having now is getting a bunch of errors from dmesg and the disks are really slow. Do you know how to fix this? The output is the same as I showed you before. And there is not a lun 31 or whatever showing up on the gui.

........
printk: 97 messages suppressed.
ib_srp: Target has req_lim 0
printk: 147 messages suppressed.
ib_srp: Target has req_lim 0
printk: 104 messages suppressed.
ib_srp: Target has req_lim 0
printk: 103 messages suppressed.
ib_srp: Target has req_lim 0
printk: 119 messages suppressed.
ib_srp: Target has req_lim 0
printk: 146 messages suppressed.
ib_srp: Target has req_lim 0
printk: 65 messages suppressed.
ib_srp: Target has req_lim 0
printk: 90 messages suppressed.
ib_srp: Target has req_lim 0
printk: 139 messages suppressed.
ib_srp: Target has req_lim 0
printk: 116 messages suppressed.
ib_srp: Target has req_lim 0
printk: 133 messages suppressed.
ib_srp: Target has req_lim 0
on3 ~ #

Thanks!
Jeff

-----Original Message-----
From: Ellis, Dave [mailto:Dave.Ellis at engenio.com]
Sent: Tue 4/25/2006 7:49 AM
To: Decker, Jeffrey C; Helen Chen
Cc: Korpacz, Joe; Ellis, Dave; Snider, Tim
Subject: RE: configuration

Jeff,

Here are the LUN discovery instructions provided by Vu Phan of Mellanox at the IBTA Plugfest last month. The file is attached.

Please do the following steps:

+ tar zxvf srptools.tgz
+ cd srptools; ./autogen.sh; ./configure; make; make install

Now you have ibsrpdm.
I assume that you have user mode verbs working and /dev/umadX are available.

+ modprobe ib_umad
+ modprobe ib_uverbs
+ modprobe ib_srp
+ ibsrpdm -vc -d /dev/umad0

The output looks like this:

lab105:/usr/src/linux-2.6.15 # ibsrpdm -vc -d /dev/umad0
id_ext=200400A0B811149B,ioc_guid=0002c902004001a0,dgid=fe800000000000000
002c902004001a2,pkey=ffff,service_id=200400a0b811149b

Then:

+ echo
id_ext=200400A0B811149B,ioc_guid=0002c902004001a0,dgid=fe800000000000000
002c90200400,pkey=ffff,service_id=200400a0b811149b >
/sys/class/infiniband_srp/srp-mthca0-1/add_target
+ fdisk -l or lsscsi will show you the new scsi device

One additional note, you need to make sure all four of your LUNs are in the Default Group in the mappings view. Left click on the + to expand the view, then right click on each LUN, adding it to the Default Group.

Dave Ellis
Director HPC Architecture
Engenio Storage Group - Storage Solutions
LSI Logic Corporation
256-895-0517 (Office)
770-330-9486 (Cell)
dellis at lsi.com
www.lsilogic.com/engenio

________________________________

From: Decker, Jeffrey C [mailto:jcdecke at sandia.gov]
Sent: Monday, April 24, 2006 4:55 PM
To: Ellis, Dave
Subject: configuration

- 2.6.12.5 vanilla kernel
- Open IB stack
- Gentoo Linux (distro)
From halr at voltaire.com Fri May 12 05:11:17 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 12 May 2006 08:11:17 -0400 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <20060511171210.GH26684@obsidianresearch.com> References: <1147310565.4485.56947.camel@hal.voltaire.com> <20060511054803.GE26684@obsidianresearch.com> <1147346418.4485.68543.camel@hal.voltaire.com> <20060511171210.GH26684@obsidianresearch.com> Message-ID: <1147435580.4485.96257.camel@hal.voltaire.com> On Thu, 2006-05-11 at 13:12, Jason Gunthorpe wrote: > On Thu, May 11, 2006 at 07:20:19AM -0400, Hal Rosenstock wrote: > > > That would be a simpler check but HopLimit is not a required component > > of PathRecord but I think this may not be sufficient as just because a > > HopLimit >= 2 doesn't mean that a packet would be forwarded off subnet. > > I was thinking of the other direction: How does the requestor/client > know if a Path requires a GRH. The requester/client needs to request a path for a DGID which is off (the local) subnet. > To allow what Roland is talking about you need an unambiguous > mechanism where the SA can signal to the client that the path > needs a GRH. Ah, you are referring to the SA path record response not the request. > The only field I can see that could be used for that is HopLimit.. That's one. The ugly prefix comparison would be another. > Think of it the other way, HopLimit < 2 means it _can't_ be forwarded > off subnet, so that result from the SA should _always_ cause the > requesting client to not use a GRH for that path. Not always true in terms of local subnet (multicast and management MAD response exceptions). > Any test beyond HopLimit could be done in the SA prior to returning > the path records to the client. Are you saying HopLimit is supplied to the SA in the request ? It could be but it's optional in general. In the router case, an off subnet DGID should be sufficient.
I would think the HopLimit (as well as the other GRH fields) would need to be returned by the SA to the client.

> If further tests are put in the client
> they only limit the routing configurations that are possible.

Not sure what further tests you are referring to here. I agree with the goal not to add any unnecessary constraints on routing configurations.

> Note:
> Although 8.3.6 specifies that 0 and 1 don't let the packet off
> the subnet table 60 says that CA's should set the HopLimit
> to 0 and the 'first' router should fill it in.

Hmm.. Interesting. The description in table 60 also says "Alternately set according to application."

> > Why is a request with just a non link local prefix (with HopLimit
> > wildcarded) not sufficient ?
>
> I think it would be best if the SA had full control over what headers
> the CA's put on their packets on a path by path basis. That allows for
> the most flexibility down the road.

Not sure exactly what you mean by full control over the routing header (GRH). The SA supplies the info for the headers to the client and the client is responsible for putting the correct info in the headers. Do you mean supplies sufficient info for the client to do this correctly ? If so, I agree.

-- Hal

> Jason

From steve.apo at googlemail.com Fri May 12 05:46:22 2006
From: steve.apo at googlemail.com (Steven Wooding)
Date: Fri, 12 May 2006 13:46:22 +0100
Subject: [openib-general] Quick RDMA Write with Immediate Data Item question
Message-ID: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com>

Hi,

Just a quick question I can't seem to find the answer to.

With an "RDMA Write with Immediate Data Item" transfer, in the CQE at the destination (the thing that has the Immediate Data in it), does the CQE also contain the memory location where the message just got written to? i.e. does the scatter/gather buffer member of the work completion structure get filled in at all? Or do you just get the ImmdDataItem?

Thanks for your help.

Steve Wooding.
From swise at opengridcomputing.com Fri May 12 09:03:35 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 12 May 2006 11:03:35 -0500 Subject: [openib-general] cma private data length Message-ID: <1147449815.2917.24.camel@stevo-desktop> Sean/IB experts: I'm running a version of rdma_bw from src/userspace/perftest that I ported to utilize the RDMA CMA library for connection setup (stay tuned for a patch to offer this to the trunk). The CMA version exchanges the rkey information during RDMA connection setup using private data. I noticed that over IB the private data length received is != to the size of the private data submitted by the application. I send 24 bytes both ways and always get 56 bytes in the RDMA_CM_EVENT_CONNECT_REQUEST event and 196 bytes on the RDMA_CM_EVENT_ESTABLISHED event. The first 24 bytes, however, are correct so the data is being exchanged. Over iWARP, the lengths are 24... Is this a bug in the CMA or IB CM? Thanks, Steve. From mshefty at ichips.intel.com Fri May 12 10:01:19 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 12 May 2006 10:01:19 -0700 Subject: [openib-general] 2.6.17 and 2.6.18 merge plans In-Reply-To: References: Message-ID: <4464BF5F.3070307@ichips.intel.com> I consistently get a URL error: 504... error: Unable to find 2ac5..... under http://git... every time I try to clone your git tree. Is there a mirror that I can try cloning from, or do you know of an alternative way of getting the tree?
From jgunthorpe at obsidianresearch.com Fri May 12 10:10:53 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Fri, 12 May 2006 11:10:53 -0600 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <1147435580.4485.96257.camel@hal.voltaire.com> References: <1147310565.4485.56947.camel@hal.voltaire.com> <20060511054803.GE26684@obsidianresearch.com> <1147346418.4485.68543.camel@hal.voltaire.com> <20060511171210.GH26684@obsidianresearch.com> <1147435580.4485.96257.camel@hal.voltaire.com> Message-ID: <20060512171053.GN26684@obsidianresearch.com> On Fri, May 12, 2006 at 08:11:17AM -0400, Hal Rosenstock wrote: > > To allow what Roland is talking about you need an unambiguous > > mechanism where the SA can signal to the client that the path > > needs a GRH. > > Ah, you are referring to the SA path record response not the request. Yes.. Though I think we are still talking about different things in a few places ;> How about this, how do you see this scenario: 1) Client gets a DGID from 'someplace' 2) Client sends a SA query to resolve the DGID to a Path Record 3) Client configures a QP based on the Path Record Now, the question I'm interested in is this: During step #3 what test should the client apply to determine if a GRH should be used with the QP. Other issues around the GRH like management MAD responses use and multicast I feel are well specified and don't need more consideration. > > Think of it the other way, HopLimit < 2 means it _can't_ be forwarded > > off subnet, so that result from the SA should _always_ cause the > > requesting client to not use a GRH for that path. > > Not always true in terms of local subnet (multicast and management MAD > response exceptions). Yes, but these are well specified. Multicast must always have a GRH. MAD requests are covered under my scenario above and MAD responses to MAD requests with GRH's are specified to use the GRH and set the HopLimit = 0xFF. 
Also, I would assume when building a router that multicast packets with a hop limit of 0 are non-forwardable based on the rules in IBA. > Are you saying HopLimit is supplied to the SA in the request ? It could > be but it's optional in general. In the router case, an off subnet DGID > should be sufficient. I would think the HopLimit (as well as the other > GRH fields) would need to be returned by the SA to the client. Talking about a request for a Path to the SA from a client now: I would suggest that if the client wishes to restrict itself to paths that are only on-link then it could send a SA request with the path record HopLimit=0. A SA request with HopLimit=* (masked out of component mask) should let the SA return routed paths. I also think that the SA response should have a HopLimit of 0 for local paths and a HopLimit >= 2 for routed paths. However, I can't find any wording in IBA that would require this behavior. > Not sure exactly what you mean by full control over the routing header > (GRH). The SA supplies the info for the headers to the client and the > client is responsible for putting the correct info in the headers. Do > you mean supplies sufficient info for the client to do this correctly ? > If so, I agree. As far as I can see IBA includes all header information for the GRH and LRH in the PathRecord response. It does not define a how to determine if the path described by a PathRecord response requires a GRH or not. 
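
The two client-side tests that come up in this thread (the HopLimit value in the path record, and a comparison of the SGID and DGID prefixes) can be written down as a short sketch. It only illustrates the proposals being debated; it assumes the subnet prefix is the upper 64 bits of a 128-bit GID, the type and function names are hypothetical, and neither rule is claimed here to be required by IBA.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit GID: bytes 0..7 hold the subnet prefix,
 * bytes 8..15 the interface identifier. */
struct gid {
    uint8_t raw[16];
};

/* Proposal 1: a HopLimit below 2 cannot leave the subnet, so no GRH;
 * a HopLimit of 2 or more is treated as a routed path. */
static bool needs_grh_by_hop_limit(uint8_t hop_limit)
{
    return hop_limit >= 2;
}

/* Proposal 2: use a GRH only when the source and destination
 * GID prefixes differ. */
static bool needs_grh_by_prefix(const struct gid *sgid, const struct gid *dgid)
{
    return memcmp(sgid->raw, dgid->raw, 8) != 0;
}
```

The exceptions mentioned in the discussion (multicast, and MAD responses to requests that carried a GRH) are outside what either helper covers.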
Thanks, Jason From mshefty at ichips.intel.com Fri May 12 10:55:54 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 12 May 2006 10:55:54 -0700 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <20060512171053.GN26684@obsidianresearch.com> References: <1147310565.4485.56947.camel@hal.voltaire.com> <20060511054803.GE26684@obsidianresearch.com> <1147346418.4485.68543.camel@hal.voltaire.com> <20060511171210.GH26684@obsidianresearch.com> <1147435580.4485.96257.camel@hal.voltaire.com> <20060512171053.GN26684@obsidianresearch.com> Message-ID: <4464CC2A.80207@ichips.intel.com> Jason Gunthorpe wrote: > How about this, how do you see this scenario: > > 1) Client gets a DGID from 'someplace' > 2) Client sends a SA query to resolve the DGID to a Path Record > 3) Client configures a QP based on the Path Record > > Now, the question I'm interested in is this: > During step #3 what test should the client apply to determine if a > GRH should be used with the QP. This is the scenario that I need to resolve. What would happen if the GRH flag were always set? Set only if the GID prefixes of the SGID/DGID were different? - Sean From Thomas.Talpey at netapp.com Fri May 12 11:29:17 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Fri, 12 May 2006 14:29:17 -0400 Subject: [openib-general] ip over ib throughtput In-Reply-To: References: <7.0.1.0.2.20060510180054.04336f80@netapp.com> Message-ID: <7.0.1.0.2.20060512140314.0475ee20@netapp.com> Hi Shirley - I had a chance to try with the tiny blocksizes but I'm afraid the results aren't useful to estimate max throughput. The server I am using runs out of CPU at about 33,600 IOPS for small I/Os (<=4KB), so with 2000 byte reads, all I can get is about 65MB/sec. (I get 33MB/s with 1KB, 120MB/s with 4KB, etc). And recall with NFS-default 32KB reads I get 450MB/s. All these limits are due to this server's CPU at 100%. Time to find a bigger server! 
The good news is, performance is nice and flat right up until the server hits the CPU wall. In fact, the more directio threads I run in parallel, the lower the client overhead. With 50 threads issuing reads, I see as little as 0.5 interrupts per I/O! Sorry I couldn't push more throughput using only small reads. I could trunk the I/O to multiple servers, but I assume you're only interested in single- stream results. Tom. At 11:11 PM 5/10/2006, Shirley Ma wrote: >"Talpey, Thomas" wrote on 05/10/2006 03:10:57 PM: >> Sure, but I wonder why it's interesting. Nobody ever uses NFS in such >> small blocksizes, and 2044 bytes would mean, say, 1800 bytes of payload. >> What data are you looking for, throughput and overhead? Direct RDMA, >> or inline? >> >> Tom. > >Throughput. I am wondering how much room IPoIB performance (throughput) can go. > >Thanks >Shirley Ma >IBM Linux Technology Center >15300 SW Koll Parkway >Beaverton, OR 97006-6063 >Phone(Fax): (503) 578-7638 From kenjeffries at austin.rr.com Fri May 12 11:51:17 2006 From: kenjeffries at austin.rr.com (Ken Jeffries) Date: Fri, 12 May 2006 13:51:17 -0500 Subject: [openib-general] Quick RDMA Write with Immediate Data Item question References: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com> Message-ID: <006401c675f5$0d87f8d0$0a97a8c0@blacktip> IIRC the immediate data (32 bits) is delivered in a field in the structure returned by the cq poll function (which means that the SEND RDMA w/immediate does consume an rq work request). The s/g info has no bearing on the immediate data. Ken ----- Original Message ----- From: Steven Wooding To: openib-general at openib.org Sent: Friday, May 12, 2006 7:46 AM Subject: [openib-general] Quick RDMA Write with Immediate Data Item question Hi, Just a quick question I can't seem to find the answer to. 
With an "RDMA Write with Immediate Data Item" transfer, in the CQE at the
destination (the thing that has the Immediate Data in it), does the CQE also
contain the memory location where the message just got written to? i.e. does
the scatter/gather buffer member of the work completion structure get filled
in at all? Or do you just get the ImmdDataItem?

Thanks for your help.

Steve Wooding.

------------------------------------------------------------------------------
_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From jlentini at netapp.com Fri May 12 12:39:55 2006
From: jlentini at netapp.com (James Lentini)
Date: Fri, 12 May 2006 15:39:55 -0400 (EDT)
Subject: [openib-general] Re: [PATCH] update uDAPL openib_cma provider to work with new uCMA event channels
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2006, Arlin Davis wrote:

> James,
>
> Update the uDAPL openib_cma provider to work with the new uCMA event
> channel interface. I ran a full set of Intel-MPI test suites with
> these latest changes and it looks fine. Sync up with Sean on
> commits.

Committed in revision 7141.

From jlentini at netapp.com Fri May 12 12:43:13 2006
From: jlentini at netapp.com (James Lentini)
Date: Fri, 12 May 2006 15:43:13 -0400 (EDT)
Subject: [openib-general] Re: [DAPL] latest DAPL cannot be compiled with the latest librdmacm
In-Reply-To: <200605101652.26604.dotanb@mellanox.co.il>
References: <200605101652.26604.dotanb@mellanox.co.il>
Message-ID:

On Wed, 10 May 2006, Dotan Barak wrote:

> Hi.
>
> The latest DAPL cannot be compiled with the latest librdmacm after
> an API change in the librdmacm.

Fixed in revision 7141. Let me know if you have any problems.
From halr at voltaire.com Fri May 12 13:17:13 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 May 2006 16:17:13 -0400
Subject: [openib-general] cma private data length
In-Reply-To: <1147449815.2917.24.camel@stevo-desktop>
References: <1147449815.2917.24.camel@stevo-desktop>
Message-ID: <1147465032.4485.104625.camel@hal.voltaire.com>

On Fri, 2006-05-12 at 12:03, Steve Wise wrote:
> Sean/IB experts:
>
> I'm running a version of rdma_bw from src/userspace/perftest that I
> ported to utilize the RDMA CMA library for connection setup (stay tuned
> for a patch to offer this to the trunk). The CMA version exchanges the
> rkey information during RDMA connection setup using private data. I
> noticed that over IB the private data length received is != to the size
> of the private data submitted by the application. I send 24 bytes both
> ways and always get 56 bytes in the RDMA_CM_EVENT_CONNECT_REQUEST event
> and 196 bytes on the RDMA_CM_EVENT_ESTABLISHED event.

That's what I'd expect, as full MAD-length packets are being exchanged and
the private data length is not encoded in the MAD. It would need to be
an explicit field in the consumer private data.

CM REQ private data is 736 bits, which is 92 bytes; REP private data is
1568 bits (196 bytes). Some of the REQ private data is used for CMA, and
the consumer private data is 56 bytes. See the IP addressing annex.

> The first 24 bytes, however, are correct so the data is being exchanged. Over iWARP,
> the lengths are 24...

> Is this a bug in the CMA or IB CM?

Neither IMO.

-- Hal

> Thanks,
>
> Steve.
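
The point made in this message, that the private data length is not carried
in the MAD and would have to be an explicit field in the consumer private
data, can be illustrated with a minimal sketch. Everything below is an
assumption made for illustration: the helper names pdata_pack and
pdata_unpack are hypothetical and are not part of the CMA or IB CM API, and
the one-byte length prefix is only one possible encoding. The 56-byte figure
is the consumer private data size for a REQ quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Consumer private data size for a CM REQ when the CMA is in use,
 * as quoted in the thread. */
#define PDATA_BUF_LEN 56

/* Hypothetical helper: place `len` payload bytes into a zero-padded
 * buffer, storing the payload length in the first byte.
 * Returns 0 on success, -1 if the payload does not fit. */
int pdata_pack(uint8_t buf[PDATA_BUF_LEN], const void *payload, size_t len)
{
	assert(buf != NULL && (payload != NULL || len == 0));
	if (len > PDATA_BUF_LEN - 1)
		return -1;
	memset(buf, 0, PDATA_BUF_LEN);	/* zero padding after the payload */
	buf[0] = (uint8_t)len;		/* explicit length field */
	if (len)
		memcpy(buf + 1, payload, len);
	return 0;
}

/* Hypothetical helper: recover the payload from a received buffer of
 * `buf_len` bytes, which the transport may have rounded up.
 * Returns the payload length, or -1 if the length field is inconsistent. */
int pdata_unpack(const uint8_t *buf, size_t buf_len, const void **payload)
{
	assert(buf != NULL && payload != NULL);
	if (buf_len < 1 || (size_t)buf[0] > buf_len - 1)
		return -1;
	*payload = buf + 1;
	return buf[0];
}
```

With an encoding like this, the receiver can ignore the event's reported
length (56 or 196 bytes over IB in the case described) and rely on the
length byte instead, so the same application code would behave the same way
over a transport that reports the exact length.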
> > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Fri May 12 13:39:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 12 May 2006 13:39:08 -0700 Subject: [openib-general] Quick RDMA Write with Immediate Data Item question In-Reply-To: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com> (Steven Wooding's message of "Fri, 12 May 2006 13:46:22 +0100") References: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com> Message-ID: Steven> With an "RDMA Write with Immediate Data Item" transfer, in Steven> the CQE at the destination (the thing that has the Steven> Immediate Data it), does the CQE also contain the memory Steven> location where the message just got written too? i.e. does Steven> the scatter/gather buffer member of the work completion Steven> structure get filled in at all? Or do you just get the Steven> ImmdDataItem? It only has the immediate date. A completion queue entry never has information about the address where data was written, so it definitely doesn't in this case. - R. From rdreier at cisco.com Fri May 12 13:47:40 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 12 May 2006 13:47:40 -0700 Subject: [openib-general] 2.6.17 and 2.6.18 merge plans In-Reply-To: <4464BF5F.3070307@ichips.intel.com> (Sean Hefty's message of "Fri, 12 May 2006 10:01:19 -0700") References: <4464BF5F.3070307@ichips.intel.com> Message-ID: >>>>> "Sean" == Sean Hefty writes: Sean> I consistently get a URL error: 504... error: Unable to find Sean> 2ac5..... under http://git... every time I try to clone your Sean> git tree. Is there a mirror that I can try cloning from, or Sean> do you know of an alternative way of getting the tree? Try with a git:// URL instead of http://? 
I also just ran git-update-server-info on kernel.org, which might help a little. You could clone http://www.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git which is the mirror round robin. But I don't think the mirror will work better than the main machine. - R. From caitlinb at broadcom.com Fri May 12 13:47:05 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 12 May 2006 13:47:05 -0700 Subject: [openib-general] cma private data length Message-ID: <54AD0F12E08D1541B826BE97C98F99F149F198@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > On Fri, 2006-05-12 at 12:03, Steve Wise wrote: >> Sean/IB experts: >> >> I'm running a version of rdma_bw from src/userspace/perftest that I >> ported to utilize the RDMA CMA library for connection setup (stay >> tuned for a patch to offer this to the trunk). The CMA version >> exchanges the rkey information during RDMA connection setup using >> private data. I noticed that over IB the private data length >> received is != to the size of the private data submitted by the >> application. I send 24 bytes both ways and always get 56 bytes in the >> RDMA_CM_EVENT_CONNECT_REQUEST event and 196 bytes on the > RDMA_CM_EVENT_ESTABLISHED event. > > That's what I'd expect as a full MAD lengthed packets are > being exchanged and the private data length is not encoded in > the MAD. It would need to be an explicit field in the > consumer private data. > > CM REQ private data is 736 bits which is 92 bytes, REP private data is > 1568 bits (196 bytes). Some of the REQ private data is used > for CMA and the consumer private data is 56 bytes. See IP > addressing annex. > >> The first 24 bytes, however, are correct so the data is being >> exchanged. Over iWARP, the lengths are 24... > >> Is this a bug in the CMA or IB CM? > > Neither IMO. > There are caveats in DAT about this. They should be cut and paste into the CMA doc as well. 
Basically, the transport MAY round up the size of the private data to
transport-specific boundaries. The alternative is to *require* a length
field, which would decrease the length available for data over transports
that did not encode the length already.

So it is clearly better to round up, and let applications that truly
require a variable length private data message encode their own length.

Then the only thing to decide is whether the extra bytes are undefined
or zeroes. Having the sender zero out the full buffer is pretty cheap
compared to establishing a connection, so zero padding makes sense to me.

From rdreier at cisco.com Fri May 12 14:48:29 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 12 May 2006 14:48:29 -0700
Subject: [openib-general] Re: Fwd: RE: [PATCH] cm refcount race fix
In-Reply-To: <20060511170720.GB2595@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 11 May 2006 20:07:20 +0300")
References: <20060511170720.GB2595@mellanox.co.il>
Message-ID:

OK, I queued this for 2.6.17, except for the multicast.c, cma.c and
ucma.c parts (since those files aren't upstream at all).

 - R.

From rdreier at cisco.com Fri May 12 14:57:29 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 12 May 2006 14:57:29 -0700
Subject: [openib-general] 2.6.17 and 2.6.18 merge plans
In-Reply-To: (Roland Dreier's message of "Fri, 12 May 2006 13:47:40 -0700")
References: <4464BF5F.3070307@ichips.intel.com>
Message-ID:

I just tried it and it seems to be working now.

As a side note, if you have Linus's tree sitting around, with newish
git (I'm not sure if this is in git 1.2 but it's definitely in 1.3),
you can do

    git clone --reference http://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git

and have the clone go much faster, because it only fetches objects
that are in my tree but not in Linus's.

 - R.
From mshefty at ichips.intel.com Fri May 12 15:02:03 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 12 May 2006 15:02:03 -0700 Subject: [openib-general] 2.6.17 and 2.6.18 merge plans In-Reply-To: References: <4464BF5F.3070307@ichips.intel.com> Message-ID: <446505DB.4010806@ichips.intel.com> Thanks for the info. I just updated my git version, and it worked fine. - Sean From rdreier at cisco.com Fri May 12 15:07:54 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 12 May 2006 15:07:54 -0700 Subject: [openib-general] [git pull] Please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus The changes and patch are: Roland Dreier: IB/ipath: Properly terminate PCI ID table Sean Hefty: IB: refcount race fixes drivers/infiniband/core/cm.c | 12 ++++--- drivers/infiniband/core/mad.c | 47 +++++++++++++++------------- drivers/infiniband/core/mad_priv.h | 5 ++- drivers/infiniband/core/mad_rmpp.c | 20 ++++++------ drivers/infiniband/core/ucm.c | 12 ++++--- drivers/infiniband/hw/ipath/ipath_driver.c | 7 ++-- 6 files changed, 55 insertions(+), 48 deletions(-) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 7cfedb8..86fee43 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -34,6 +34,8 @@ * * $Id: cm.c 2821 2005-07-08 17:07:28Z sean.hefty $ */ + +#include #include #include #include @@ -122,7 +124,7 @@ struct cm_id_private { struct rb_node service_node; struct rb_node sidr_id_node; spinlock_t lock; /* Do not acquire inside cm.lock */ - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; struct ib_mad_send_buf *msg; @@ -159,7 +161,7 @@ static void cm_work_handler(void *data); static inline void cm_deref_id(struct cm_id_private *cm_id_priv) { if 
(atomic_dec_and_test(&cm_id_priv->refcount)) - wake_up(&cm_id_priv->wait); + complete(&cm_id_priv->comp); } static int cm_alloc_msg(struct cm_id_private *cm_id_priv, @@ -559,7 +561,7 @@ struct ib_cm_id *ib_create_cm_id(struct goto error; spin_lock_init(&cm_id_priv->lock); - init_waitqueue_head(&cm_id_priv->wait); + init_completion(&cm_id_priv->comp); INIT_LIST_HEAD(&cm_id_priv->work_list); atomic_set(&cm_id_priv->work_count, -1); atomic_set(&cm_id_priv->refcount, 1); @@ -724,8 +726,8 @@ retest: } cm_free_id(cm_id->local_id); - atomic_dec(&cm_id_priv->refcount); - wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount)); + cm_deref_id(cm_id_priv); + wait_for_completion(&cm_id_priv->comp); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); if (cm_id_priv->private_data && cm_id_priv->private_data_len) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 469b692..5ad41a6 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -352,7 +352,7 @@ struct ib_mad_agent *ib_register_mad_age INIT_WORK(&mad_agent_priv->local_work, local_completions, mad_agent_priv); atomic_set(&mad_agent_priv->refcount, 1); - init_waitqueue_head(&mad_agent_priv->wait); + init_completion(&mad_agent_priv->comp); return &mad_agent_priv->agent; @@ -467,7 +467,7 @@ struct ib_mad_agent *ib_register_mad_sno mad_snoop_priv->agent.qp = port_priv->qp_info[qpn].qp; mad_snoop_priv->agent.port_num = port_num; mad_snoop_priv->mad_snoop_flags = mad_snoop_flags; - init_waitqueue_head(&mad_snoop_priv->wait); + init_completion(&mad_snoop_priv->comp); mad_snoop_priv->snoop_index = register_snoop_agent( &port_priv->qp_info[qpn], mad_snoop_priv); @@ -486,6 +486,18 @@ error1: } EXPORT_SYMBOL(ib_register_mad_snoop); +static inline void deref_mad_agent(struct ib_mad_agent_private *mad_agent_priv) +{ + if (atomic_dec_and_test(&mad_agent_priv->refcount)) + complete(&mad_agent_priv->comp); +} + +static inline void deref_snoop_agent(struct 
ib_mad_snoop_private *mad_snoop_priv) +{ + if (atomic_dec_and_test(&mad_snoop_priv->refcount)) + complete(&mad_snoop_priv->comp); +} + static void unregister_mad_agent(struct ib_mad_agent_private *mad_agent_priv) { struct ib_mad_port_private *port_priv; @@ -509,9 +521,8 @@ static void unregister_mad_agent(struct flush_workqueue(port_priv->wq); ib_cancel_rmpp_recvs(mad_agent_priv); - atomic_dec(&mad_agent_priv->refcount); - wait_event(mad_agent_priv->wait, - !atomic_read(&mad_agent_priv->refcount)); + deref_mad_agent(mad_agent_priv); + wait_for_completion(&mad_agent_priv->comp); kfree(mad_agent_priv->reg_req); ib_dereg_mr(mad_agent_priv->agent.mr); @@ -529,9 +540,8 @@ static void unregister_mad_snoop(struct atomic_dec(&qp_info->snoop_count); spin_unlock_irqrestore(&qp_info->snoop_lock, flags); - atomic_dec(&mad_snoop_priv->refcount); - wait_event(mad_snoop_priv->wait, - !atomic_read(&mad_snoop_priv->refcount)); + deref_snoop_agent(mad_snoop_priv); + wait_for_completion(&mad_snoop_priv->comp); kfree(mad_snoop_priv); } @@ -600,8 +610,7 @@ static void snoop_send(struct ib_mad_qp_ spin_unlock_irqrestore(&qp_info->snoop_lock, flags); mad_snoop_priv->agent.snoop_handler(&mad_snoop_priv->agent, send_buf, mad_send_wc); - if (atomic_dec_and_test(&mad_snoop_priv->refcount)) - wake_up(&mad_snoop_priv->wait); + deref_snoop_agent(mad_snoop_priv); spin_lock_irqsave(&qp_info->snoop_lock, flags); } spin_unlock_irqrestore(&qp_info->snoop_lock, flags); @@ -626,8 +635,7 @@ static void snoop_recv(struct ib_mad_qp_ spin_unlock_irqrestore(&qp_info->snoop_lock, flags); mad_snoop_priv->agent.recv_handler(&mad_snoop_priv->agent, mad_recv_wc); - if (atomic_dec_and_test(&mad_snoop_priv->refcount)) - wake_up(&mad_snoop_priv->wait); + deref_snoop_agent(mad_snoop_priv); spin_lock_irqsave(&qp_info->snoop_lock, flags); } spin_unlock_irqrestore(&qp_info->snoop_lock, flags); @@ -968,8 +976,7 @@ void ib_free_send_mad(struct ib_mad_send free_send_rmpp_list(mad_send_wr); kfree(send_buf->mad); - if 
(atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); } EXPORT_SYMBOL(ib_free_send_mad); @@ -1757,8 +1764,7 @@ static void ib_mad_complete_recv(struct mad_recv_wc = ib_process_rmpp_recv_wc(mad_agent_priv, mad_recv_wc); if (!mad_recv_wc) { - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; } } @@ -1770,8 +1776,7 @@ static void ib_mad_complete_recv(struct if (!mad_send_wr) { spin_unlock_irqrestore(&mad_agent_priv->lock, flags); ib_free_recv_mad(mad_recv_wc); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; } ib_mark_mad_done(mad_send_wr); @@ -1790,8 +1795,7 @@ static void ib_mad_complete_recv(struct } else { mad_agent_priv->agent.recv_handler(&mad_agent_priv->agent, mad_recv_wc); - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); } } @@ -2021,8 +2025,7 @@ void ib_mad_complete_send_wr(struct ib_m mad_send_wc); /* Release reference on agent taken when sending */ - if (atomic_dec_and_test(&mad_agent_priv->refcount)) - wake_up(&mad_agent_priv->wait); + deref_mad_agent(mad_agent_priv); return; done: spin_unlock_irqrestore(&mad_agent_priv->lock, flags); diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h index 6c9c133..b4fa28d 100644 --- a/drivers/infiniband/core/mad_priv.h +++ b/drivers/infiniband/core/mad_priv.h @@ -37,6 +37,7 @@ #ifndef __IB_MAD_PRIV_H__ #define __IB_MAD_PRIV_H__ +#include #include #include #include @@ -108,7 +109,7 @@ struct ib_mad_agent_private { struct list_head rmpp_list; atomic_t refcount; - wait_queue_head_t wait; + struct completion comp; }; struct ib_mad_snoop_private { @@ -117,7 +118,7 @@ struct ib_mad_snoop_private { int snoop_index; int mad_snoop_flags; atomic_t refcount; - wait_queue_head_t wait; + struct completion 
comp; }; struct ib_mad_send_wr_private { diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c index dfd4e58..d4704e0 100644 --- a/drivers/infiniband/core/mad_rmpp.c +++ b/drivers/infiniband/core/mad_rmpp.c @@ -49,7 +49,7 @@ struct mad_rmpp_recv { struct list_head list; struct work_struct timeout_work; struct work_struct cleanup_work; - wait_queue_head_t wait; + struct completion comp; enum rmpp_state state; spinlock_t lock; atomic_t refcount; @@ -69,10 +69,16 @@ struct mad_rmpp_recv { u8 method; }; +static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) +{ + if (atomic_dec_and_test(&rmpp_recv->refcount)) + complete(&rmpp_recv->comp); +} + static void destroy_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) { - atomic_dec(&rmpp_recv->refcount); - wait_event(rmpp_recv->wait, !atomic_read(&rmpp_recv->refcount)); + deref_rmpp_recv(rmpp_recv); + wait_for_completion(&rmpp_recv->comp); ib_destroy_ah(rmpp_recv->ah); kfree(rmpp_recv); } @@ -253,7 +259,7 @@ create_rmpp_recv(struct ib_mad_agent_pri goto error; rmpp_recv->agent = agent; - init_waitqueue_head(&rmpp_recv->wait); + init_completion(&rmpp_recv->comp); INIT_WORK(&rmpp_recv->timeout_work, recv_timeout_handler, rmpp_recv); INIT_WORK(&rmpp_recv->cleanup_work, recv_cleanup_handler, rmpp_recv); spin_lock_init(&rmpp_recv->lock); @@ -279,12 +285,6 @@ error: kfree(rmpp_recv); return NULL; } -static inline void deref_rmpp_recv(struct mad_rmpp_recv *rmpp_recv) -{ - if (atomic_dec_and_test(&rmpp_recv->refcount)) - wake_up(&rmpp_recv->wait); -} - static struct mad_rmpp_recv * find_rmpp_recv(struct ib_mad_agent_private *agent, struct ib_mad_recv_wc *mad_recv_wc) diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index f6a0596..9164a09 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -32,6 +32,8 @@ * * $Id: ucm.c 2594 2005-06-13 19:46:02Z libor $ */ + +#include #include #include #include @@ -72,7 +74,7 @@ struct ib_ucm_file { struct 
ib_ucm_context { int id; - wait_queue_head_t wait; + struct completion comp; atomic_t ref; int events_reported; @@ -138,7 +140,7 @@ static struct ib_ucm_context *ib_ucm_ctx static void ib_ucm_ctx_put(struct ib_ucm_context *ctx) { if (atomic_dec_and_test(&ctx->ref)) - wake_up(&ctx->wait); + complete(&ctx->comp); } static inline int ib_ucm_new_cm_id(int event) @@ -178,7 +180,7 @@ static struct ib_ucm_context *ib_ucm_ctx return NULL; atomic_set(&ctx->ref, 1); - init_waitqueue_head(&ctx->wait); + init_completion(&ctx->comp); ctx->file = file; INIT_LIST_HEAD(&ctx->events); @@ -586,8 +588,8 @@ static ssize_t ib_ucm_destroy_id(struct if (IS_ERR(ctx)) return PTR_ERR(ctx); - atomic_dec(&ctx->ref); - wait_event(ctx->wait, !atomic_read(&ctx->ref)); + ib_ucm_ctx_put(ctx); + wait_for_completion(&ctx->comp); /* No new events will be generated after destroying the cm_id. */ ib_destroy_cm_id(ctx->cm_id); diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 398add4..3697eda 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -116,10 +116,9 @@ #define PCI_DEVICE_ID_INFINIPATH_HT 0xd #define PCI_DEVICE_ID_INFINIPATH_PE800 0x10 static const struct pci_device_id ipath_pci_tbl[] = { - {PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, - PCI_DEVICE_ID_INFINIPATH_HT)}, - {PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, - PCI_DEVICE_ID_INFINIPATH_PE800)}, + { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_HT) }, + { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_PE800) }, + { 0, } }; MODULE_DEVICE_TABLE(pci, ipath_pci_tbl); From ardavis at ichips.intel.com Fri May 12 15:10:59 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Fri, 12 May 2006 15:10:59 -0700 Subject: [openib-general] Quick RDMA Write with Immediate Data Item question In-Reply-To: References: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com> Message-ID: <446507F3.7090908@ichips.intel.com> Roland 
Dreier wrote:

> Steven> With an "RDMA Write with Immediate Data Item" transfer, in
> Steven> the CQE at the destination (the thing that has the
> Steven> Immediate Data it), does the CQE also contain the memory
> Steven> location where the message just got written too? i.e. does
> Steven> the scatter/gather buffer member of the work completion
> Steven> structure get filled in at all? Or do you just get the
> Steven> ImmdDataItem?
>
>It only has the immediate date. A completion queue entry never has
>information about the address where data was written, so it definitely
>doesn't in this case.
>
>
The work completion will also include the length of the RDMA write.

-arlin

> - R.

From rdreier at cisco.com Fri May 12 15:19:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 12 May 2006 15:19:32 -0700
Subject: [openib-general] Quick RDMA Write with Immediate Data Item question
In-Reply-To: <446507F3.7090908@ichips.intel.com> (Arlin Davis's message of "Fri, 12 May 2006 15:10:59 -0700")
References: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com> <446507F3.7090908@ichips.intel.com>
Message-ID:

Arlin> The work completion will also include the length of the
Arlin> RDMA write.

Yes, that's true.

 - R.
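
Since the completion carries only the 32-bit immediate data and the length
of the write, never the address, the receiver has to derive the location
from something both sides agreed on beforehand. One possible approach,
sketched here under stated assumptions, is to divide the registered region
into fixed-size slots and have the sender put the slot index in the
immediate data. The struct mock_wc type, the slot_from_wc function and the
SLOT_SIZE and NUM_SLOTS values are all hypothetical and stand in for
whatever the real application and verbs library provide; byte-order
handling of the immediate value is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two completion fields discussed in the
 * thread. This is NOT the real work completion structure. */
struct mock_wc {
	uint32_t imm_data;	/* 32-bit immediate data chosen by the sender */
	uint32_t byte_len;	/* length of the RDMA write */
};

/* Hypothetical layout agreed on by both sides out of band. */
#define SLOT_SIZE 4096u
#define NUM_SLOTS 16u

/* Map a completion to the start of the slot that was written.
 * Returns NULL if the slot index or the length falls outside the
 * agreed layout. */
void *slot_from_wc(uint8_t *region, const struct mock_wc *wc)
{
	assert(region != NULL && wc != NULL);
	if (wc->imm_data >= NUM_SLOTS || wc->byte_len > SLOT_SIZE)
		return NULL;
	return region + (size_t)wc->imm_data * SLOT_SIZE;
}
```

Note that the length check here only reports an oversized write after the
fact; it does nothing to stop the data from landing in memory that the
remote key already covers.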
From rdreier at cisco.com Fri May 12 15:20:24 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 12 May 2006 15:20:24 -0700
Subject: [openib-general] Quick RDMA Write with Immediate Data Item question
In-Reply-To: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com> (Steven Wooding's message of "Fri, 12 May 2006 13:46:22 +0100")
References: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com>
Message-ID:

Steven> With an "RDMA Write with Immediate Data Item" transfer, in
Steven> the CQE at the destination (the thing that has the
Steven> Immediate Data it), does the CQE also contain the memory
Steven> location where the message just got written too? i.e. does
Steven> the scatter/gather buffer member of the work completion
Steven> structure get filled in at all? Or do you just get the
Steven> ImmdDataItem?

Looking back at the question here, I think there's a fundamental
misunderstanding somewhere, because a work completion structure has no
gather/scatter buffer. So how could that (non-existent) member ever
get filled in?

 - R.

From steve.apo at googlemail.com Fri May 12 15:24:08 2006
From: steve.apo at googlemail.com (Steven Wooding)
Date: Fri, 12 May 2006 23:24:08 +0100
Subject: [openib-general] Quick RDMA Write with Immediate Data Item question
In-Reply-To: <446507F3.7090908@ichips.intel.com>
References: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com> <446507F3.7090908@ichips.intel.com>
Message-ID: <2cfcf21e0605121524p38579385i75b8682f070e28d6@mail.gmail.com>

>
>
> The work completion will also include the length of the RDMA write.

This leads me to another question I had about memory protection for RDMA
writes. What's the best way to stop the sender accidentally writing a
larger message than they should have, if I didn't want to use a different
rkey for each message (as setting up rkeys is expensive and too inflexible
for my application)? Any thoughts?

Thanks for your answers to my original question. I thought this was the case.
I just couldn't find it written down anywhere. My system is unavailable at
the moment, so I couldn't just do a quick test.

Regards,

Steve.

From bos at pathscale.com Fri May 12 16:42:45 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:42:45 -0700
Subject: [openib-general] [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4
Message-ID:

Hi, Roland -

Here is a series of patches to bring the ipath driver up to date. I
believe you may already have two of them (but I've included them just
in case), but the others should all be new. They apply on top of
Linus's current -git.

Cheers,

Message-ID: <9b9f24aab3505e192ed1.1147477366@eng-12.pathscale.com>

The local loopback path for RC can lock the rkey table lock without
blocking interrupts. The receive interrupt path can then call
ipath_rkey_ok() and deadlock. Since the lock only protects a 64 bit
read, the lock isn't needed.

Signed-off-by: Bryan O'Sullivan

diff -r 89f7c69a68bf -r 9b9f24aab350 drivers/infiniband/hw/ipath/ipath_keys.c
--- a/drivers/infiniband/hw/ipath/ipath_keys.c	Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_keys.c	Fri May 12 15:55:27 2006 -0700
@@ -136,9 +136,7 @@ int ipath_lkey_ok(struct ipath_lkey_tabl
 		ret = 1;
 		goto bail;
 	}
-	spin_lock(&rkt->lock);
 	mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))];
-	spin_unlock(&rkt->lock);
 	if (unlikely(mr == NULL || mr->lkey != sge->lkey)) {
 		ret = 0;
 		goto bail;
@@ -184,8 +182,6 @@ bail:
  * @acc: access flags
  *
  * Return 1 if successful, otherwise 0.
- *
- * The QP r_rq.lock should be held.
*/ int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, u32 len, u64 vaddr, u32 rkey, int acc) @@ -196,9 +192,7 @@ int ipath_rkey_ok(struct ipath_ibdev *de size_t off; int ret; - spin_lock(&rkt->lock); mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))]; - spin_unlock(&rkt->lock); if (unlikely(mr == NULL || mr->lkey != rkey)) { ret = 0; goto bail; From bos at pathscale.com Fri May 12 16:42:47 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:47 -0700 Subject: [openib-general] [PATCH 2 of 53] ipath - purge sps_lid and sps_mlid arrays, and /sys entries In-Reply-To: Message-ID: <3ab7a7b10bf2ec62ee0e.1147477367@eng-12.pathscale.com> The two arrays only had space for 4 units, so didn't work for larger numbers of units. I thought I'd eliminated these before submitting the original driver patches. Also fixed error return on ipath_sysfs_unit_write to not set an error code if the sysfs code reports consuming more chars than we wrote (since that can include the nul, and the user doesn't have to include the nul in the write). Also changed from ipath_set_sps_lid() to ipath_set_lid(); the sps was a leftover piece of naming. Signed-off-by: Bryan O'Sullivan diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_common.h --- a/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:27 2006 -0700 @@ -121,8 +121,7 @@ struct infinipath_stats { __u64 sps_ports; /* list of pkeys (other than default) accepted (0 means not set) */ __u16 sps_pkeys[4]; - /* lids for up to 4 infinipaths, indexed by infinipath # */ - __u16 sps_lid[4]; + __u16 sps_unused16[4]; /* available; maintaining compatible layout */ /* number of user ports per chip (not IB ports) */ __u32 sps_nports; /* not our interrupt, or already handled */ @@ -140,10 +139,8 @@ struct infinipath_stats { * packets if ipath not configured, sma/mad, etc.) 
*/ __u64 sps_krdrops; - /* mlids for up to 4 infinipaths, indexed by infinipath # */ - __u16 sps_mlid[4]; /* pad for future growth */ - __u64 __sps_pad[45]; + __u64 __sps_pad[46]; }; /* diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_init_chip.c --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:27 2006 -0700 @@ -836,8 +836,6 @@ int ipath_init_chip(struct ipath_devdata /* clear any interrups up to this point (ints still not enabled) */ ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, -1LL); - ipath_stats.sps_lid[dd->ipath_unit] = dd->ipath_lid; - /* * Set up the port 0 (kernel) rcvhdr q and egr TIDs. If doing * re-init, the simplest way to handle this is to free diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_layer.c --- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:27 2006 -0700 @@ -299,9 +299,8 @@ bail: EXPORT_SYMBOL_GPL(ipath_layer_set_mtu); -int ipath_set_sps_lid(struct ipath_devdata *dd, u32 arg, u8 lmc) -{ - ipath_stats.sps_lid[dd->ipath_unit] = arg; +int ipath_set_lid(struct ipath_devdata *dd, u32 arg, u8 lmc) +{ dd->ipath_lid = arg; dd->ipath_lmc = lmc; @@ -315,7 +314,7 @@ int ipath_set_sps_lid(struct ipath_devda return 0; } -EXPORT_SYMBOL_GPL(ipath_set_sps_lid); +EXPORT_SYMBOL_GPL(ipath_set_lid); int ipath_layer_set_guid(struct ipath_devdata *dd, __be64 guid) { @@ -616,9 +615,9 @@ int ipath_layer_open(struct ipath_devdat if (*dd->ipath_statusp & IPATH_STATUS_IB_READY) intval |= IPATH_LAYER_INT_IF_UP; - if (ipath_stats.sps_lid[dd->ipath_unit]) + if (dd->ipath_lid) intval |= IPATH_LAYER_INT_LID; - if (ipath_stats.sps_mlid[dd->ipath_unit]) + if (dd->ipath_mlid) intval |= IPATH_LAYER_INT_BCAST; /* * do this on open, in case low level is already up and diff -r 9b9f24aab350 -r 3ab7a7b10bf2 
drivers/infiniband/hw/ipath/ipath_layer.h --- a/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 15:55:27 2006 -0700 @@ -126,7 +126,7 @@ u32 ipath_layer_get_cr_errpkey(struct ip u32 ipath_layer_get_cr_errpkey(struct ipath_devdata *dd); int ipath_layer_set_linkstate(struct ipath_devdata *dd, u8 state); int ipath_layer_set_mtu(struct ipath_devdata *, u16); -int ipath_set_sps_lid(struct ipath_devdata *, u32, u8); +int ipath_set_lid(struct ipath_devdata *, u32, u8); int ipath_layer_send_hdr(struct ipath_devdata *dd, struct ether_header *hdr); int ipath_verbs_send(struct ipath_devdata *dd, u32 hdrwords, diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_mad.c --- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:27 2006 -0700 @@ -341,7 +341,7 @@ static int recv_subn_set_portinfo(struct /* Must be a valid unicast LID address. 
*/ if (lid == 0 || lid >= IPS_MULTICAST_LID_BASE) goto err; - ipath_set_sps_lid(dev->dd, lid, pip->mkeyprot_resv_lmc & 7); + ipath_set_lid(dev->dd, lid, pip->mkeyprot_resv_lmc & 7); event.event = IB_EVENT_LID_CHANGE; ib_dispatch_event(&event); } diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_sysfs.c --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri May 12 15:55:27 2006 -0700 @@ -84,98 +84,6 @@ static ssize_t show_num_units(struct dev ipath_count_units(NULL, NULL, NULL)); } -#define DRIVER_STAT(name, attr) \ - static ssize_t show_stat_##name(struct device_driver *dev, \ - char *buf) \ - { \ - return scnprintf( \ - buf, PAGE_SIZE, "%llu\n", \ - (unsigned long long) ipath_stats.sps_ ##attr); \ - } \ - static DRIVER_ATTR(name, S_IRUGO, show_stat_##name, NULL) - -DRIVER_STAT(intrs, ints); -DRIVER_STAT(err_intrs, errints); -DRIVER_STAT(errs, errs); -DRIVER_STAT(pkt_errs, pkterrs); -DRIVER_STAT(crc_errs, crcerrs); -DRIVER_STAT(hw_errs, hwerrs); -DRIVER_STAT(ib_link, iblink); -DRIVER_STAT(port0_pkts, port0pkts); -DRIVER_STAT(ether_spkts, ether_spkts); -DRIVER_STAT(ether_rpkts, ether_rpkts); -DRIVER_STAT(sma_spkts, sma_spkts); -DRIVER_STAT(sma_rpkts, sma_rpkts); -DRIVER_STAT(hdrq_full, hdrqfull); -DRIVER_STAT(etid_full, etidfull); -DRIVER_STAT(no_piobufs, nopiobufs); -DRIVER_STAT(ports, ports); -DRIVER_STAT(pkey0, pkeys[0]); -DRIVER_STAT(pkey1, pkeys[1]); -DRIVER_STAT(pkey2, pkeys[2]); -DRIVER_STAT(pkey3, pkeys[3]); -/* XXX fix the following when dynamic table of devices used */ -DRIVER_STAT(lid0, lid[0]); -DRIVER_STAT(lid1, lid[1]); -DRIVER_STAT(lid2, lid[2]); -DRIVER_STAT(lid3, lid[3]); - -DRIVER_STAT(nports, nports); -DRIVER_STAT(null_intr, nullintr); -DRIVER_STAT(max_pkts_call, maxpkts_call); -DRIVER_STAT(avg_pkts_call, avgpkts_call); -DRIVER_STAT(page_locks, pagelocks); -DRIVER_STAT(page_unlocks, pageunlocks); -DRIVER_STAT(krdrops, krdrops); -/* XXX fix the 
following when dynamic table of devices used */ -DRIVER_STAT(mlid0, mlid[0]); -DRIVER_STAT(mlid1, mlid[1]); -DRIVER_STAT(mlid2, mlid[2]); -DRIVER_STAT(mlid3, mlid[3]); - -static struct attribute *driver_stat_attributes[] = { - &driver_attr_intrs.attr, - &driver_attr_err_intrs.attr, - &driver_attr_errs.attr, - &driver_attr_pkt_errs.attr, - &driver_attr_crc_errs.attr, - &driver_attr_hw_errs.attr, - &driver_attr_ib_link.attr, - &driver_attr_port0_pkts.attr, - &driver_attr_ether_spkts.attr, - &driver_attr_ether_rpkts.attr, - &driver_attr_sma_spkts.attr, - &driver_attr_sma_rpkts.attr, - &driver_attr_hdrq_full.attr, - &driver_attr_etid_full.attr, - &driver_attr_no_piobufs.attr, - &driver_attr_ports.attr, - &driver_attr_pkey0.attr, - &driver_attr_pkey1.attr, - &driver_attr_pkey2.attr, - &driver_attr_pkey3.attr, - &driver_attr_lid0.attr, - &driver_attr_lid1.attr, - &driver_attr_lid2.attr, - &driver_attr_lid3.attr, - &driver_attr_nports.attr, - &driver_attr_null_intr.attr, - &driver_attr_max_pkts_call.attr, - &driver_attr_avg_pkts_call.attr, - &driver_attr_page_locks.attr, - &driver_attr_page_unlocks.attr, - &driver_attr_krdrops.attr, - &driver_attr_mlid0.attr, - &driver_attr_mlid1.attr, - &driver_attr_mlid2.attr, - &driver_attr_mlid3.attr, - NULL -}; - -static struct attribute_group driver_stat_attr_group = { - .name = "stats", - .attrs = driver_stat_attributes -}; static ssize_t show_status(struct device *dev, struct device_attribute *attr, @@ -272,7 +180,7 @@ static ssize_t store_lid(struct device * size_t count) { struct ipath_devdata *dd = dev_get_drvdata(dev); - u16 lid; + u16 lid = 0; /* gcc thinks might be un-initialized */ int ret; ret = ipath_parse_ushort(buf, &lid); @@ -284,11 +192,11 @@ static ssize_t store_lid(struct device * goto invalid; } - ipath_set_sps_lid(dd, lid, 0); + ipath_set_lid(dd, lid, 0); goto bail; invalid: - ipath_dev_err(dd, "attempt to set invalid LID\n"); + ipath_dev_err(dd, "attempt to set invalid LID 0x%x\n", lid); bail: return ret; } @@ 
-319,7 +227,6 @@ static ssize_t store_mlid(struct device unit = dd->ipath_unit; dd->ipath_mlid = mlid; - ipath_stats.sps_mlid[unit] = mlid; ipath_layer_intr(dd, IPATH_LAYER_INT_BCAST); goto bail; @@ -737,17 +644,12 @@ int ipath_driver_create_group(struct dev if (ret) goto bail; - ret = sysfs_create_group(&drv->kobj, &driver_stat_attr_group); - if (ret) - sysfs_remove_group(&drv->kobj, &driver_attr_group); - bail: return ret; } void ipath_driver_remove_group(struct device_driver *drv) { - sysfs_remove_group(&drv->kobj, &driver_stat_attr_group); sysfs_remove_group(&drv->kobj, &driver_attr_group); } From bos at pathscale.com Fri May 12 16:42:48 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:48 -0700 Subject: [openib-general] [PATCH 3 of 53] ipath - report max MR and QP sizes based on table sizes In-Reply-To: Message-ID: <5d5e1e641b16088c3138.1147477368@eng-12.pathscale.com> Report max MR based on the lkey table size. Report max QP based on the QP table size. 
Signed-off-by: Bryan O'Sullivan diff -r 3ab7a7b10bf2 -r 5d5e1e641b16 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 @@ -583,12 +583,12 @@ static int ipath_query_device(struct ib_ props->sys_image_guid = dev->sys_image_guid; props->max_mr_size = ~0ull; - props->max_qp = 0xffff; + props->max_qp = dev->qp_table.max; props->max_qp_wr = 0xffff; props->max_sge = 255; props->max_cq = 0xffff; props->max_cqe = 0xffff; - props->max_mr = 0xffff; + props->max_mr = dev->lk_table.max; props->max_pd = 0xffff; props->max_qp_rd_atom = 1; props->max_qp_init_rd_atom = 1; From bos at pathscale.com Fri May 12 16:42:49 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:49 -0700 Subject: [openib-general] [PATCH 4 of 53] ipath - cap number of PDs that can be allocated In-Reply-To: Message-ID: <300f0aa6f034eec6a806.1147477369@eng-12.pathscale.com> Put an arbitrary cap on the maximum number of PDs that can be allocated for a device. This is arbitrary because the number we support is constrained only by system memory and what kmalloc can give us. Nevertheless, if we don't have a limit, some third-party OpenIB stress tests fail. The limit can be changed on the fly using a module parameter. 
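For readers following along, the accounting described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the demo_* names are hypothetical stand-ins (not driver symbols), malloc() stands in for kmalloc(), there is no locking, and the check uses >= rather than the == in the patch so that lowering the limit at runtime still blocks new allocations.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-device state and the PD object. */
struct demo_dev {
	unsigned int n_pds_allocated;	/* PDs currently allocated */
};

struct demo_pd {
	int user;
};

/* Plays the role of the max_pds module parameter (default 0xFFFF). */
static unsigned int demo_max_pds = 0xFFFF;

/*
 * Allocate a PD unless the cap has been reached.  Returns NULL and sets
 * *err to ENOMEM on failure.  The counter is bumped only after the
 * allocation succeeds, so a failed malloc() does not consume a slot.
 */
static struct demo_pd *demo_alloc_pd(struct demo_dev *dev, int *err)
{
	struct demo_pd *pd;

	if (dev->n_pds_allocated >= demo_max_pds) {
		*err = ENOMEM;
		return NULL;
	}

	pd = malloc(sizeof *pd);
	if (!pd) {
		*err = ENOMEM;
		return NULL;
	}

	dev->n_pds_allocated++;
	pd->user = 0;
	*err = 0;
	return pd;
}

/* Release a PD and give its slot back to the device. */
static void demo_dealloc_pd(struct demo_dev *dev, struct demo_pd *pd)
{
	assert(dev->n_pds_allocated > 0);
	dev->n_pds_allocated--;
	free(pd);
}
```

With demo_max_pds set to 2, a third demo_alloc_pd() call fails with ENOMEM until one of the first two PDs is released.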
Signed-off-by: Bryan O'Sullivan diff -r 5d5e1e641b16 -r 300f0aa6f034 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 @@ -54,6 +54,11 @@ unsigned int ib_ipath_debug; /* debug ma unsigned int ib_ipath_debug; /* debug mask */ module_param_named(debug, ib_ipath_debug, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(debug, "Verbs debug mask"); + +static unsigned int ib_ipath_max_pds = 0xFFFF; +module_param_named(max_pds, ib_ipath_max_pds, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_pds, + "Maximum number of protection domains to support"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("PathScale "); @@ -589,7 +594,7 @@ static int ipath_query_device(struct ib_ props->max_cq = 0xffff; props->max_cqe = 0xffff; props->max_mr = dev->lk_table.max; - props->max_pd = 0xffff; + props->max_pd = ib_ipath_max_pds; props->max_qp_rd_atom = 1; props->max_qp_init_rd_atom = 1; /* props->max_res_rd_atom */ @@ -743,8 +748,23 @@ static struct ib_pd *ipath_alloc_pd(stru struct ib_ucontext *context, struct ib_udata *udata) { + struct ipath_ibdev *dev = to_idev(ibdev); struct ipath_pd *pd; struct ib_pd *ret; + + /* + * This is actually totally arbitrary. Some correctness tests + * assume there's a maximum number of PDs that can be allocated. + * We don't actually have this limit, but we fail the test if + * we allow allocations of more than we report for this value. 
+ */ + + if (dev->n_pds_allocated == ib_ipath_max_pds) { + ret = ERR_PTR(-ENOMEM); + goto bail; + } + + dev->n_pds_allocated++; pd = kmalloc(sizeof *pd, GFP_KERNEL); if (!pd) { @@ -764,6 +784,9 @@ static int ipath_dealloc_pd(struct ib_pd static int ipath_dealloc_pd(struct ib_pd *ibpd) { struct ipath_pd *pd = to_ipd(ibpd); + struct ipath_ibdev *dev = to_idev(ibpd->device); + + dev->n_pds_allocated--; kfree(pd); diff -r 5d5e1e641b16 -r 300f0aa6f034 drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700 @@ -431,6 +431,7 @@ struct ipath_ibdev { __be64 sys_image_guid; /* in network order */ __be64 gid_prefix; /* in network order */ __be64 mkey; + u32 n_pds_allocated; /* number of PDs allocated for device */ u64 ipath_sword; /* total dwords sent (sample result) */ u64 ipath_rword; /* total dwords received (sample result) */ u64 ipath_spkts; /* total packets sent (sample result) */ From bos at pathscale.com Fri May 12 16:42:50 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:50 -0700 Subject: [openib-general] [PATCH 5 of 53] ipath - forbid creation of AHs with illegal ports In-Reply-To: Message-ID: Don't allow an AH to be created with an illegal port. 
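As a rough illustration of what "illegal port" means here, the check can be written as a range test. This is a sketch, not the driver code: demo_check_ah_port is a hypothetical helper, and it assumes a legal port is any value from 1 through the device's physical port count (channel adapter ports are numbered from 1). On a single-port device this accepts exactly port 1.

```c
#include <errno.h>

/*
 * Hypothetical helper: returns 0 if port_num names a physical port on a
 * device with phys_port_cnt ports, -EINVAL otherwise.  Port 0 is never
 * accepted because channel adapter ports are numbered starting at 1.
 */
static int demo_check_ah_port(unsigned int port_num,
			      unsigned int phys_port_cnt)
{
	if (port_num < 1 || port_num > phys_port_cnt)
		return -EINVAL;
	return 0;
}
```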
Signed-off-by: Bryan O'Sullivan diff -r 300f0aa6f034 -r db56c0ab6a64 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 @@ -810,6 +810,12 @@ static struct ib_ah *ipath_create_ah(str if (ah_attr->dlid >= IPS_MULTICAST_LID_BASE && ah_attr->dlid != IPS_PERMISSIVE_LID && !(ah_attr->ah_flags & IB_AH_GRH)) { + ret = ERR_PTR(-EINVAL); + goto bail; + } + + if (ah_attr->port_num != 1 || + ah_attr->port_num > pd->device->phys_port_cnt) { ret = ERR_PTR(-EINVAL); goto bail; } From bos at pathscale.com Fri May 12 16:42:54 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:54 -0700 Subject: [openib-general] [PATCH 9 of 53] ipath - cap number of CQs In-Reply-To: Message-ID: Cap the number of CQs that can be created. Not a real limitation for us, but the user verbs code expects a real number. Signed-off-by: Bryan O'Sullivan diff -r 1d3e85454b53 -r a89145f4846c drivers/infiniband/hw/ipath/ipath_cq.c --- a/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:27 2006 -0700 @@ -157,12 +157,18 @@ struct ib_cq *ipath_create_cq(struct ib_ struct ib_ucontext *context, struct ib_udata *udata) { + struct ipath_ibdev *dev = to_idev(ibdev); struct ipath_cq *cq; struct ib_wc *wc; struct ib_cq *ret; if (entries > ib_ipath_max_cqe) { ret = ERR_PTR(-EINVAL); + goto bail; + } + + if (dev->n_cqs_allocated == ib_ipath_max_cqs) { + ret = ERR_PTR(-ENOMEM); goto bail; } @@ -201,6 +207,8 @@ struct ib_cq *ipath_create_cq(struct ib_ ret = &cq->ibcq; + dev->n_cqs_allocated++; + bail: return ret; } @@ -215,9 +223,11 @@ bail: */ int ipath_destroy_cq(struct ib_cq *ibcq) { + struct ipath_ibdev *dev = to_idev(ibcq->device); struct ipath_cq *cq = to_icq(ibcq); tasklet_kill(&cq->comptask); + dev->n_cqs_allocated--; vfree(cq->queue); kfree(cq); diff -r 
1d3e85454b53 -r a89145f4846c drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 @@ -69,6 +69,11 @@ module_param_named(max_cqe, ib_ipath_max module_param_named(max_cqe, ib_ipath_max_cqe, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(max_cqe, "Maximum number of completion queue entries to support"); + +unsigned int ib_ipath_max_cqs = 0xFFFF; +module_param_named(max_cqs, ib_ipath_max_cqs, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_cqs, + "Maximum number of completion queues to support"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("PathScale "); @@ -601,7 +606,7 @@ static int ipath_query_device(struct ib_ props->max_qp = dev->qp_table.max; props->max_qp_wr = 0xffff; props->max_sge = 255; - props->max_cq = 0xffff; + props->max_cq = ib_ipath_max_cqs; props->max_ah = ib_ipath_max_ahs; props->max_cqe = ib_ipath_max_cqe; props->max_mr = dev->lk_table.max; diff -r 1d3e85454b53 -r a89145f4846c drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700 @@ -433,6 +433,7 @@ struct ipath_ibdev { __be64 mkey; u32 n_pds_allocated; /* number of PDs allocated for device */ u32 n_ahs_allocated; /* number of AHs allocated for device */ + u32 n_cqs_allocated; /* number of CQs allocated for device */ u64 ipath_sword; /* total dwords sent (sample result) */ u64 ipath_rword; /* total dwords received (sample result) */ u64 ipath_spkts; /* total packets sent (sample result) */ @@ -692,6 +693,8 @@ extern unsigned int ib_ipath_lkey_table_ extern unsigned int ib_ipath_max_cqe; +extern unsigned int ib_ipath_max_cqs; + extern const u32 ib_ipath_rnr_table[]; #endif /* IPATH_VERBS_H */ From bos at pathscale.com Fri May 12 16:42:51 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:51 -0700 
Subject: [openib-general] [PATCH 6 of 53] ipath - forbid creation of AH with DLID of 0 In-Reply-To: Message-ID: Don't allow an AH to be created with a DLID of 0. Signed-off-by: Bryan O'Sullivan diff -r db56c0ab6a64 -r def81ab50644 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 @@ -810,6 +810,11 @@ static struct ib_ah *ipath_create_ah(str if (ah_attr->dlid >= IPS_MULTICAST_LID_BASE && ah_attr->dlid != IPS_PERMISSIVE_LID && !(ah_attr->ah_flags & IB_AH_GRH)) { + ret = ERR_PTR(-EINVAL); + goto bail; + } + + if (ah_attr->dlid == 0) { ret = ERR_PTR(-EINVAL); goto bail; } From bos at pathscale.com Fri May 12 16:42:55 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:55 -0700 Subject: [openib-general] [PATCH 10 of 53] ipath - require capabilities when creating a QP In-Reply-To: Message-ID: <2fea0d127a41b26adcad.1147477375@eng-12.pathscale.com> You have to specify some capabilities when creating a QP. Signed-off-by: Bryan O'Sullivan diff -r a89145f4846c -r 2fea0d127a41 drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:27 2006 -0700 @@ -667,6 +667,14 @@ struct ib_qp *ipath_create_qp(struct ib_ goto bail; } + if (init_attr->cap.max_send_sge + + init_attr->cap.max_recv_sge + + init_attr->cap.max_send_wr + + init_attr->cap.max_recv_wr == 0) { + ret = ERR_PTR(-EINVAL); + goto bail; + } + switch (init_attr->qp_type) { case IB_QPT_UC: case IB_QPT_RC: From bos at pathscale.com Fri May 12 16:42:53 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:53 -0700 Subject: [openib-general] [PATCH 8 of 53] ipath - cap number of CQEs In-Reply-To: Message-ID: <1d3e85454b5370a7f386.1147477373@eng-12.pathscale.com> Cap the number of CQEs. 
Not a real limitation for us, but expected by the verbs code. Signed-off-by: Bryan O'Sullivan diff -r e823378bd19c -r 1d3e85454b53 drivers/infiniband/hw/ipath/ipath_cq.c --- a/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:27 2006 -0700 @@ -161,6 +161,11 @@ struct ib_cq *ipath_create_cq(struct ib_ struct ib_wc *wc; struct ib_cq *ret; + if (entries > ib_ipath_max_cqe) { + ret = ERR_PTR(-EINVAL); + goto bail; + } + /* * Need to use vmalloc() if we want to support large #s of * entries. diff -r e823378bd19c -r 1d3e85454b53 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 @@ -64,6 +64,11 @@ module_param_named(max_ahs, ib_ipath_max module_param_named(max_ahs, ib_ipath_max_ahs, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(max_ahs, "Maximum number of address handles to support"); + +unsigned int ib_ipath_max_cqe = 0xFFFF; +module_param_named(max_cqe, ib_ipath_max_cqe, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_cqe, + "Maximum number of completion queue entries to support"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("PathScale "); @@ -598,7 +603,7 @@ static int ipath_query_device(struct ib_ props->max_sge = 255; props->max_cq = 0xffff; props->max_ah = ib_ipath_max_ahs; - props->max_cqe = 0xffff; + props->max_cqe = ib_ipath_max_cqe; props->max_mr = dev->lk_table.max; props->max_pd = ib_ipath_max_pds; props->max_qp_rd_atom = 1; diff -r e823378bd19c -r 1d3e85454b53 drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700 @@ -690,6 +690,8 @@ extern const int ib_ipath_state_ops[]; extern unsigned int ib_ipath_lkey_table_size; +extern unsigned int ib_ipath_max_cqe; + extern const u32 ib_ipath_rnr_table[]; 
#endif /* IPATH_VERBS_H */ From bos at pathscale.com Fri May 12 16:43:04 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:04 -0700 Subject: [openib-general] [PATCH 19 of 53] ipath - replace uses of LIST_POISON In-Reply-To: Message-ID: <947e92f4b370dc17f898.1147477384@eng-12.pathscale.com> Per Andrew's request. Signed-off-by: Bryan O'Sullivan diff -r df954e47ff67 -r 947e92f4b370 drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700 @@ -375,10 +375,10 @@ static void ipath_error_qp(struct ipath_ spin_lock(&dev->pending_lock); /* XXX What if its already removed by the timeout code? */ - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); - if (qp->piowait.next != LIST_POISON1) - list_del(&qp->piowait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); + if (!list_empty(&qp->piowait)) + list_del_init(&qp->piowait); spin_unlock(&dev->pending_lock); wc.status = IB_WC_WR_FLUSH_ERR; @@ -722,10 +722,8 @@ struct ib_qp *ipath_create_qp(struct ib_ init_attr->qp_type == IB_QPT_RC ? ipath_do_rc_send : ipath_do_uc_send, (unsigned long)qp); - qp->piowait.next = LIST_POISON1; - qp->piowait.prev = LIST_POISON2; - qp->timerwait.next = LIST_POISON1; - qp->timerwait.prev = LIST_POISON2; + INIT_LIST_HEAD(&qp->piowait); + INIT_LIST_HEAD(&qp->timerwait); qp->state = IB_QPS_RESET; qp->s_wq = swq; qp->s_size = init_attr->cap.max_send_wr + 1; @@ -795,10 +793,10 @@ int ipath_destroy_qp(struct ib_qp *ibqp) /* Make sure the QP isn't on the timeout list. 
*/ spin_lock_irqsave(&dev->pending_lock, flags); - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); - if (qp->piowait.next != LIST_POISON1) - list_del(&qp->piowait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); + if (!list_empty(&qp->piowait)) + list_del_init(&qp->piowait); spin_unlock_irqrestore(&dev->pending_lock, flags); /* @@ -867,10 +865,10 @@ void ipath_sqerror_qp(struct ipath_qp *q spin_lock(&dev->pending_lock); /* XXX What if its already removed by the timeout code? */ - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); - if (qp->piowait.next != LIST_POISON1) - list_del(&qp->piowait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); + if (!list_empty(&qp->piowait)) + list_del_init(&qp->piowait); spin_unlock(&dev->pending_lock); ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1); diff -r df954e47ff67 -r 947e92f4b370 drivers/infiniband/hw/ipath/ipath_rc.c --- a/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700 @@ -57,7 +57,7 @@ static void ipath_init_restart(struct ip qp->s_len = wqe->length - len; dev = to_idev(qp->ibqp.device); spin_lock(&dev->pending_lock); - if (qp->timerwait.next == LIST_POISON1) + if (list_empty(&qp->timerwait)) list_add_tail(&qp->timerwait, &dev->pending[dev->pending_index]); spin_unlock(&dev->pending_lock); @@ -356,7 +356,7 @@ static inline int ipath_make_rc_req(stru if ((int)(qp->s_psn - qp->s_next_psn) > 0) qp->s_next_psn = qp->s_psn; spin_lock(&dev->pending_lock); - if (qp->timerwait.next == LIST_POISON1) + if (list_empty(&qp->timerwait)) list_add_tail(&qp->timerwait, &dev->pending[dev->pending_index]); spin_unlock(&dev->pending_lock); @@ -726,8 +726,8 @@ void ipath_restart_rc(struct ipath_qp *q */ dev = to_idev(qp->ibqp.device); spin_lock(&dev->pending_lock); - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); + if (!list_empty(&qp->timerwait)) 
+ list_del_init(&qp->timerwait); spin_unlock(&dev->pending_lock); if (wqe->wr.opcode == IB_WR_RDMA_READ) @@ -886,8 +886,8 @@ static int do_rc_ack(struct ipath_qp *qp * just won't find anything to restart if we ACK everything. */ spin_lock(&dev->pending_lock); - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); spin_unlock(&dev->pending_lock); /* @@ -1194,8 +1194,7 @@ static inline void ipath_rc_rcv_resp(str IB_WR_RDMA_READ)) goto ack_done; spin_lock(&dev->pending_lock); - if (qp->s_rnr_timeout == 0 && - qp->timerwait.next != LIST_POISON1) + if (qp->s_rnr_timeout == 0 && !list_empty(&qp->timerwait)) list_move_tail(&qp->timerwait, &dev->pending[dev->pending_index]); spin_unlock(&dev->pending_lock); diff -r df954e47ff67 -r 947e92f4b370 drivers/infiniband/hw/ipath/ipath_ruc.c --- a/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700 @@ -435,7 +435,7 @@ void ipath_no_bufs_available(struct ipat unsigned long flags; spin_lock_irqsave(&dev->pending_lock, flags); - if (qp->piowait.next == LIST_POISON1) + if (list_empty(&qp->piowait)) list_add_tail(&qp->piowait, &dev->piowait); spin_unlock_irqrestore(&dev->pending_lock, flags); /* diff -r df954e47ff67 -r 947e92f4b370 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 @@ -517,7 +517,7 @@ static void ipath_ib_timer(void *arg) last = &dev->pending[dev->pending_index]; while (!list_empty(last)) { qp = list_entry(last->next, struct ipath_qp, timerwait); - list_del(&qp->timerwait); + list_del_init(&qp->timerwait); qp->timer_next = resend; resend = qp; atomic_inc(&qp->refcount); @@ -527,7 +527,7 @@ static void ipath_ib_timer(void *arg) qp = list_entry(last->next, struct ipath_qp, timerwait); if 
(--qp->s_rnr_timeout == 0) { do { - list_del(&qp->timerwait); + list_del_init(&qp->timerwait); tasklet_hi_schedule(&qp->s_task); if (list_empty(last)) break; @@ -607,7 +607,7 @@ static int ipath_ib_piobufavail(void *ar while (!list_empty(&dev->piowait)) { qp = list_entry(dev->piowait.next, struct ipath_qp, piowait); - list_del(&qp->piowait); + list_del_init(&qp->piowait); tasklet_hi_schedule(&qp->s_task); } spin_unlock_irqrestore(&dev->pending_lock, flags); From bos at pathscale.com Fri May 12 16:43:03 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:03 -0700 Subject: [openib-general] [PATCH 18 of 53] ipath - make max mcast sizes configurable In-Reply-To: Message-ID: Make the max IB mcast sizes configurable. Signed-off-by: Bryan O'Sullivan diff -r c5f3731224bb -r df954e47ff67 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 @@ -81,6 +81,32 @@ unsigned int ib_ipath_max_sges = 0xFF; unsigned int ib_ipath_max_sges = 0xFF; module_param_named(max_sges, ib_ipath_max_sges, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(max_sges, "Maximum number of SGEs to support"); + +unsigned int ib_ipath_max_mcast_grps = 16384; +module_param_named(max_mcast_grps, ib_ipath_max_mcast_grps, uint, + S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_mcast_grps, + "Maximum number of multicast groups to support"); + +unsigned int ib_ipath_max_mcast_qp_attached = 16; +module_param_named(max_mcast_qp_attached, ib_ipath_max_mcast_qp_attached, + uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_mcast_qp_attached, + "Maximum number of attached QPs to support"); + +unsigned int ib_ipath_max_srqs = 1024; +module_param_named(max_srqs, ib_ipath_max_srqs, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_srqs, "Maximum number of SRQs to support"); + +unsigned int ib_ipath_max_srq_sges = 128; +module_param_named(max_srq_sges, 
ib_ipath_max_srq_sges, + uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_srq_sges, "Maximum number of SRQ SGEs to support"); + +unsigned int ib_ipath_max_srq_wrs = 0x1FFFF; +module_param_named(max_srq_wrs, ib_ipath_max_srq_wrs, + uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_srq_wrs, "Maximum number of SRQ WRs support"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("PathScale "); @@ -621,14 +647,14 @@ static int ipath_query_device(struct ib_ props->max_qp_rd_atom = 1; props->max_qp_init_rd_atom = 1; /* props->max_res_rd_atom */ - props->max_srq = 0xffff; - props->max_srq_wr = 0xffff; - props->max_srq_sge = 255; + props->max_srq = ib_ipath_max_srqs; + props->max_srq_wr = ib_ipath_max_srq_wrs; + props->max_srq_sge = ib_ipath_max_srq_sges; /* props->local_ca_ack_delay */ props->atomic_cap = IB_ATOMIC_HCA; props->max_pkeys = ipath_layer_get_npkeys(dev->dd); - props->max_mcast_grp = 0xffff; - props->max_mcast_qp_attach = 0xffff; + props->max_mcast_grp = ib_ipath_max_mcast_grps; + props->max_mcast_qp_attach = ib_ipath_max_mcast_qp_attached; props->max_total_mcast_qp_attach = props->max_mcast_qp_attach * props->max_mcast_grp; diff -r c5f3731224bb -r df954e47ff67 drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 @@ -148,6 +148,7 @@ struct ipath_mcast { struct list_head qp_list; wait_queue_head_t wait; atomic_t refcount; + int n_attached; }; /* Memory region */ @@ -434,6 +435,7 @@ struct ipath_ibdev { u32 n_pds_allocated; /* number of PDs allocated for device */ u32 n_ahs_allocated; /* number of AHs allocated for device */ u32 n_cqs_allocated; /* number of CQs allocated for device */ + u32 n_mcast_grps_allocated; /* number of mcast groups allocated */ u64 ipath_sword; /* total dwords sent (sample result) */ u64 ipath_rword; /* total dwords received (sample result) */ u64 ipath_spkts; /* total packets sent (sample result) */ @@ 
-699,6 +701,16 @@ extern unsigned int ib_ipath_max_qp_wrs; extern unsigned int ib_ipath_max_sges; +extern unsigned int ib_ipath_max_mcast_grps; + +extern unsigned int ib_ipath_max_mcast_qp_attached; + +extern unsigned int ib_ipath_max_srqs; + +extern unsigned int ib_ipath_max_srq_sges; + +extern unsigned int ib_ipath_max_srq_wrs; + extern const u32 ib_ipath_rnr_table[]; #endif /* IPATH_VERBS_H */ diff -r c5f3731224bb -r df954e47ff67 drivers/infiniband/hw/ipath/ipath_verbs_mcast.c --- a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri May 12 15:55:28 2006 -0700 @@ -92,6 +92,7 @@ static struct ipath_mcast *ipath_mcast_a INIT_LIST_HEAD(&mcast->qp_list); init_waitqueue_head(&mcast->wait); atomic_set(&mcast->refcount, 0); + mcast->n_attached = 0; bail: return mcast; @@ -157,7 +158,8 @@ bail: * the table but the QP was added. Return ESRCH if the QP was already * attached and neither structure was added. */ -static int ipath_mcast_add(struct ipath_mcast *mcast, +static int ipath_mcast_add(struct ipath_ibdev *dev, + struct ipath_mcast *mcast, struct ipath_mcast_qp *mqp) { struct rb_node **n = &mcast_tree.rb_node; @@ -188,16 +190,28 @@ static int ipath_mcast_add(struct ipath_ /* Search the QP list to see if this is already there. 
*/ list_for_each_entry_rcu(p, &tmcast->qp_list, list) { if (p->qp == mqp->qp) { - spin_unlock_irqrestore(&mcast_lock, flags); ret = ESRCH; goto bail; } } + if (tmcast->n_attached == ib_ipath_max_mcast_qp_attached) { + ret = ENOMEM; + goto bail; + } + + tmcast->n_attached++; + list_add_tail_rcu(&mqp->list, &tmcast->qp_list); - spin_unlock_irqrestore(&mcast_lock, flags); ret = EEXIST; goto bail; } + + if (dev->n_mcast_grps_allocated == ib_ipath_max_mcast_grps) { + ret = ENOMEM; + goto bail; + } + + dev->n_mcast_grps_allocated++; list_add_tail_rcu(&mqp->list, &mcast->qp_list); @@ -205,17 +219,18 @@ static int ipath_mcast_add(struct ipath_ rb_link_node(&mcast->rb_node, pn, n); rb_insert_color(&mcast->rb_node, &mcast_tree); + ret = 0; + +bail: spin_unlock_irqrestore(&mcast_lock, flags); - ret = 0; - -bail: return ret; } int ipath_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) { struct ipath_qp *qp = to_iqp(ibqp); + struct ipath_ibdev *dev = to_idev(ibqp->device); struct ipath_mcast *mcast; struct ipath_mcast_qp *mqp; int ret; @@ -235,7 +250,7 @@ int ipath_multicast_attach(struct ib_qp ret = -ENOMEM; goto bail; } - switch (ipath_mcast_add(mcast, mqp)) { + switch (ipath_mcast_add(dev, mcast, mqp)) { case ESRCH: /* Neither was used: can't attach the same QP twice. */ ipath_mcast_qp_free(mqp); @@ -245,6 +260,12 @@ int ipath_multicast_attach(struct ib_qp case EEXIST: /* The mcast wasn't used */ ipath_mcast_free(mcast); break; + case ENOMEM: + /* Exceeded the maximum number of mcast groups. 
*/ + ipath_mcast_qp_free(mqp); + ipath_mcast_free(mcast); + ret = -ENOMEM; + goto bail; default: break; } @@ -258,6 +279,7 @@ int ipath_multicast_detach(struct ib_qp int ipath_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) { struct ipath_qp *qp = to_iqp(ibqp); + struct ipath_ibdev *dev = to_idev(ibqp->device); struct ipath_mcast *mcast = NULL; struct ipath_mcast_qp *p, *tmp; struct rb_node *n; @@ -296,6 +318,7 @@ int ipath_multicast_detach(struct ib_qp * link until we are sure there are no list walkers. */ list_del_rcu(&p->list); + mcast->n_attached--; /* If this was the last attached QP, remove the GID too. */ if (list_empty(&mcast->qp_list)) { @@ -319,6 +342,7 @@ int ipath_multicast_detach(struct ib_qp atomic_dec(&mcast->refcount); wait_event(mcast->wait, !atomic_read(&mcast->refcount)); ipath_mcast_free(mcast); + dev->n_mcast_grps_allocated--; } ret = 0; From bos at pathscale.com Fri May 12 16:42:52 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:52 -0700 Subject: [openib-general] [PATCH 7 of 53] ipath - cap maximum number of AHs In-Reply-To: Message-ID: Cap the maximum number of address handles. 
Signed-off-by: Bryan O'Sullivan diff -r def81ab50644 -r e823378bd19c drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700 @@ -59,6 +59,11 @@ module_param_named(max_pds, ib_ipath_max module_param_named(max_pds, ib_ipath_max_pds, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(max_pds, "Maximum number of protection domains to support"); + +static unsigned int ib_ipath_max_ahs = 0xFFFF; +module_param_named(max_ahs, ib_ipath_max_ahs, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_ahs, + "Maximum number of address handles to support"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("PathScale "); @@ -592,6 +597,7 @@ static int ipath_query_device(struct ib_ props->max_qp_wr = 0xffff; props->max_sge = 255; props->max_cq = 0xffff; + props->max_ah = ib_ipath_max_ahs; props->max_cqe = 0xffff; props->max_mr = dev->lk_table.max; props->max_pd = ib_ipath_max_pds; @@ -764,13 +770,13 @@ static struct ib_pd *ipath_alloc_pd(stru goto bail; } - dev->n_pds_allocated++; - pd = kmalloc(sizeof *pd, GFP_KERNEL); if (!pd) { ret = ERR_PTR(-ENOMEM); goto bail; } + + dev->n_pds_allocated++; /* ib_alloc_pd() will initialize pd->ibpd. */ pd->user = udata != NULL; @@ -805,6 +811,12 @@ static struct ib_ah *ipath_create_ah(str { struct ipath_ah *ah; struct ib_ah *ret; + struct ipath_ibdev *dev = to_idev(pd->device); + + if (dev->n_ahs_allocated == ib_ipath_max_ahs) { + ret = ERR_PTR(-ENOMEM); + goto bail; + } /* A multicast address requires a GRH (see ch. 8.4.1). 
*/ if (ah_attr->dlid >= IPS_MULTICAST_LID_BASE && @@ -848,7 +860,10 @@ bail: */ static int ipath_destroy_ah(struct ib_ah *ibah) { + struct ipath_ibdev *dev = to_idev(ibah->device); struct ipath_ah *ah = to_iah(ibah); + + dev->n_ahs_allocated--; kfree(ah); diff -r def81ab50644 -r e823378bd19c drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700 @@ -432,6 +432,7 @@ struct ipath_ibdev { __be64 gid_prefix; /* in network order */ __be64 mkey; u32 n_pds_allocated; /* number of PDs allocated for device */ + u32 n_ahs_allocated; /* number of AHs allocated for device */ u64 ipath_sword; /* total dwords sent (sample result) */ u64 ipath_rword; /* total dwords received (sample result) */ u64 ipath_spkts; /* total packets sent (sample result) */ From bos at pathscale.com Fri May 12 16:42:58 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:58 -0700 Subject: [openib-general] [PATCH 13 of 53] ipath - limit number of SGEs and WRs per QP In-Reply-To: Message-ID: <02a05b853d209c1de666.1147477378@eng-12.pathscale.com> We can't create more than a certain number of SGEs or WRs per QP. 
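The per-QP limit check can be sketched outside the kernel as follows. The demo_* names and the struct are hypothetical stand-ins for the driver's tunables and for the cap fields of the QP init attributes; the defaults of 255 and the ENOMEM return mirror the values used in this patch, but treat the whole thing as an illustration rather than the driver's code.

```c
#include <errno.h>

/* Hypothetical mirror of the capability fields requested for a QP. */
struct demo_qp_cap {
	unsigned int max_send_wr;
	unsigned int max_recv_wr;
	unsigned int max_send_sge;
	unsigned int max_recv_sge;
};

/* Stand-ins for the max_sges and max_qp_wrs module parameters. */
static unsigned int demo_max_sges = 255;
static unsigned int demo_max_qp_wrs = 255;

/*
 * Returns 0 if every requested capability is within the configured
 * limits, -ENOMEM if any of the four values exceeds its limit.
 */
static int demo_check_qp_cap(const struct demo_qp_cap *cap)
{
	if (cap->max_send_sge > demo_max_sges ||
	    cap->max_recv_sge > demo_max_sges ||
	    cap->max_send_wr > demo_max_qp_wrs ||
	    cap->max_recv_wr > demo_max_qp_wrs)
		return -ENOMEM;
	return 0;
}
```

Because the limits are ordinary variables, changing one of them affects only QPs created afterwards; existing QPs keep whatever was granted at creation time.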
Signed-off-by: Bryan O'Sullivan diff -r ab2b013f1f95 -r 02a05b853d20 drivers/infiniband/hw/ipath/ipath_cq.c --- a/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:28 2006 -0700 @@ -162,7 +162,7 @@ struct ib_cq *ipath_create_cq(struct ib_ struct ib_wc *wc; struct ib_cq *ret; - if (entries > ib_ipath_max_cqe) { + if (entries > ib_ipath_max_cqes) { ret = ERR_PTR(-EINVAL); goto bail; } diff -r ab2b013f1f95 -r 02a05b853d20 drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700 @@ -663,8 +663,10 @@ struct ib_qp *ipath_create_qp(struct ib_ size_t sz; struct ib_qp *ret; - if (init_attr->cap.max_send_sge > 255 || - init_attr->cap.max_recv_sge > 255) { + if (init_attr->cap.max_send_sge > ib_ipath_max_sges || + init_attr->cap.max_recv_sge > ib_ipath_max_sges || + init_attr->cap.max_send_wr > ib_ipath_max_qp_wrs || + init_attr->cap.max_recv_wr > ib_ipath_max_qp_wrs) { ret = ERR_PTR(-ENOMEM); goto bail; } diff -r ab2b013f1f95 -r 02a05b853d20 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 @@ -62,18 +62,25 @@ MODULE_PARM_DESC(max_pds, static unsigned int ib_ipath_max_ahs = 0xFFFF; module_param_named(max_ahs, ib_ipath_max_ahs, uint, S_IWUSR | S_IRUGO); -MODULE_PARM_DESC(max_ahs, - "Maximum number of address handles to support"); - -unsigned int ib_ipath_max_cqe = 0xFFFF; -module_param_named(max_cqe, ib_ipath_max_cqe, uint, S_IWUSR | S_IRUGO); -MODULE_PARM_DESC(max_cqe, +MODULE_PARM_DESC(max_ahs, "Maximum number of address handles to support"); + +unsigned int ib_ipath_max_cqes = 0xFFFF; +module_param_named(max_cqes, ib_ipath_max_cqes, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_cqes, "Maximum number of 
completion queue entries to support"); unsigned int ib_ipath_max_cqs = 0xFFFF; module_param_named(max_cqs, ib_ipath_max_cqs, uint, S_IWUSR | S_IRUGO); -MODULE_PARM_DESC(max_cqs, - "Maximum number of completion queues to support"); +MODULE_PARM_DESC(max_cqs, "Maximum number of completion queues to support"); + +unsigned int ib_ipath_max_qp_wrs = 255; +module_param_named(max_qp_wrs, ib_ipath_max_qp_wrs, uint, + S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_qp_wrs, "Maximum number of QP WRs to support"); + +unsigned int ib_ipath_max_sges = 255; +module_param_named(max_sges, ib_ipath_max_sges, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(max_sges, "Maximum number of SGEs to support"); MODULE_LICENSE("GPL"); MODULE_AUTHOR("PathScale "); @@ -604,11 +611,11 @@ static int ipath_query_device(struct ib_ props->max_mr_size = ~0ull; props->max_qp = dev->qp_table.max; - props->max_qp_wr = 0xffff; - props->max_sge = 255; + props->max_qp_wr = ib_ipath_max_qp_wrs; + props->max_sge = ib_ipath_max_sges; props->max_cq = ib_ipath_max_cqs; props->max_ah = ib_ipath_max_ahs; - props->max_cqe = ib_ipath_max_cqe; + props->max_cqe = ib_ipath_max_cqes; props->max_mr = dev->lk_table.max; props->max_pd = ib_ipath_max_pds; props->max_qp_rd_atom = 1; diff -r ab2b013f1f95 -r 02a05b853d20 drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 @@ -691,10 +691,14 @@ extern const int ib_ipath_state_ops[]; extern unsigned int ib_ipath_lkey_table_size; -extern unsigned int ib_ipath_max_cqe; +extern unsigned int ib_ipath_max_cqes; extern unsigned int ib_ipath_max_cqs; +extern unsigned int ib_ipath_max_qp_wrs; + +extern unsigned int ib_ipath_max_sges; + extern const u32 ib_ipath_rnr_table[]; #endif /* IPATH_VERBS_H */ From bos at pathscale.com Fri May 12 16:43:08 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:08 -0700 Subject: 
[openib-general] [PATCH 23 of 53] ipath - [TRIVIAL] typo fixes In-Reply-To: Message-ID: <8b882bb46a320431f644.1147477388@eng-12.pathscale.com> A few typo fixes. Signed-off-by: Bryan O'Sullivan diff -r 1887e7b3e2a3 -r 8b882bb46a32 drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 @@ -753,7 +753,7 @@ irqreturn_t ipath_intr(int irq, void *da } /* - * We try to avoid readint the interrupt status register, since + * We try to avoid reading the interrupt status register, since * that's a PIO read, and stalls the processor for up to about * ~0.25 usec. The idea is that if we processed a port0 packet, * we blindly clear the port 0 receive interrupt bits, and nothing diff -r 1887e7b3e2a3 -r 8b882bb46a32 drivers/infiniband/hw/ipath/ipath_layer.c --- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700 @@ -882,7 +882,7 @@ static void copy_io(u32 __iomem *piobuf, /** * ipath_verbs_send - send a packet from the verbs layer * @dd: the infinipath device - * @hdrwords: the number of works in the header + * @hdrwords: the number of words in the header * @hdr: the packet header * @len: the length of the packet in bytes * @ss: the SGE to send From bos at pathscale.com Fri May 12 16:43:00 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:00 -0700 Subject: [openib-general] [PATCH 15 of 53] ipath - make some maximum values more sane In-Reply-To: Message-ID: <480ceff18a886d7504a5.1147477380@eng-12.pathscale.com> Increase the limits on some maximum values. 
Signed-off-by: Bryan O'Sullivan diff -r 5d9fbba3222e -r 480ceff18a88 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 @@ -64,21 +64,21 @@ module_param_named(max_ahs, ib_ipath_max module_param_named(max_ahs, ib_ipath_max_ahs, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(max_ahs, "Maximum number of address handles to support"); -unsigned int ib_ipath_max_cqes = 0xFFFF; +unsigned int ib_ipath_max_cqes = 0x2FFFF; module_param_named(max_cqes, ib_ipath_max_cqes, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(max_cqes, "Maximum number of completion queue entries to support"); -unsigned int ib_ipath_max_cqs = 0xFFFF; +unsigned int ib_ipath_max_cqs = 0x1FFFF; module_param_named(max_cqs, ib_ipath_max_cqs, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(max_cqs, "Maximum number of completion queues to support"); -unsigned int ib_ipath_max_qp_wrs = 255; +unsigned int ib_ipath_max_qp_wrs = 0x1FFFF; module_param_named(max_qp_wrs, ib_ipath_max_qp_wrs, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(max_qp_wrs, "Maximum number of QP WRs to support"); -unsigned int ib_ipath_max_sges = 255; +unsigned int ib_ipath_max_sges = 0xFF; module_param_named(max_sges, ib_ipath_max_sges, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(max_sges, "Maximum number of SGEs to support"); From bos at pathscale.com Fri May 12 16:43:01 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:01 -0700 Subject: [openib-general] [PATCH 16 of 53] ipath - fix reporting of driver version to userspace In-Reply-To: Message-ID: <176d1f0c26a3d2464eea.1147477381@eng-12.pathscale.com> Fix the interface version that gets exported to userspace. 
Signed-off-by: Bryan O'Sullivan diff -r 480ceff18a88 -r 176d1f0c26a3 drivers/infiniband/hw/ipath/ipath_file_ops.c --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700 @@ -139,7 +139,7 @@ static int ipath_get_base_info(struct ip kinfo->spi_piosize = dd->ipath_ibmaxlen; kinfo->spi_mtu = dd->ipath_ibmaxlen; /* maxlen, not ibmtu */ kinfo->spi_port = pd->port_port; - kinfo->spi_sw_version = IPATH_USER_SWVERSION; + kinfo->spi_sw_version = IPATH_KERN_SWVERSION; kinfo->spi_hw_version = dd->ipath_revision; if (copy_to_user(ubase, kinfo, sizeof(*kinfo))) From bos at pathscale.com Fri May 12 16:43:09 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:09 -0700 Subject: [openib-general] [PATCH 24 of 53] ipath - count dropped VL15 packets In-Reply-To: Message-ID: We need to count these for IB conformance. Signed-off-by: Bryan O'Sullivan diff -r 8b882bb46a32 -r e468ad0bd83e drivers/infiniband/hw/ipath/ipath_mad.c --- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700 @@ -646,6 +646,7 @@ struct ib_pma_portcounters { #define IB_PMA_SEL_PORT_RCV_ERRORS __constant_htons(0x0008) #define IB_PMA_SEL_PORT_RCV_REMPHYS_ERRORS __constant_htons(0x0010) #define IB_PMA_SEL_PORT_XMIT_DISCARDS __constant_htons(0x0040) +#define IB_PMA_SEL_PORT_VL15_DROPPED __constant_htons(0x0800) #define IB_PMA_SEL_PORT_XMIT_DATA __constant_htons(0x1000) #define IB_PMA_SEL_PORT_RCV_DATA __constant_htons(0x2000) #define IB_PMA_SEL_PORT_XMIT_PACKETS __constant_htons(0x4000) @@ -929,6 +930,10 @@ static int recv_pma_get_portcounters(str else p->port_xmit_discards = cpu_to_be16((u16)cntrs.port_xmit_discards); + if (dev->n_vl15_dropped > 0xFFFFUL) + p->vl15_dropped = __constant_cpu_to_be16(0xFFFF); + else + p->vl15_dropped = cpu_to_be16((u16)dev->n_vl15_dropped); if 
(cntrs.port_xmit_data > 0xFFFFFFFFUL) p->port_xmit_data = __constant_cpu_to_be32(0xFFFFFFFF); else @@ -1022,6 +1027,9 @@ static int recv_pma_set_portcounters(str if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DISCARDS) dev->n_port_xmit_discards = cntrs.port_xmit_discards; + + if (p->counter_select & IB_PMA_SEL_PORT_VL15_DROPPED) + dev->n_vl15_dropped = 0; if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DATA) dev->n_port_xmit_data = cntrs.port_xmit_data; diff -r 8b882bb46a32 -r e468ad0bd83e drivers/infiniband/hw/ipath/ipath_ud.c --- a/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700 @@ -554,7 +554,11 @@ void ipath_ud_rcv(struct ipath_ibdev *de spin_lock_irqsave(&rq->lock, flags); if (rq->tail == rq->head) { spin_unlock_irqrestore(&rq->lock, flags); - dev->n_pkt_drops++; + /* Count VL15 packets dropped due to no receive buffer */ + if (qp->ibqp.qp_num == 0) + dev->n_vl15_dropped++; + else + dev->n_pkt_drops++; goto bail; } /* Silently drop packets which are too big. */ diff -r 8b882bb46a32 -r e468ad0bd83e drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 @@ -468,6 +468,7 @@ struct ipath_ibdev { u32 n_other_naks; u32 n_timeouts; u32 n_pkt_drops; + u32 n_vl15_dropped; u32 n_wqe_errs; u32 n_rdma_dup_busy; u32 n_piowait; From bos at pathscale.com Fri May 12 16:43:06 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:06 -0700 Subject: [openib-general] [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt In-Reply-To: Message-ID: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com> I think Roland already has this patch. 
diff -r 201654fe1962 -r 4e0a07d20868 drivers/infiniband/hw/ipath/ipath_keys.c --- a/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:28 2006 -0700 @@ -126,11 +126,11 @@ int ipath_lkey_ok(struct ipath_lkey_tabl /* * We use LKEY == zero to mean a physical kmalloc() address. * This is a bit of a hack since we rely on dma_map_single() - * being reversible by calling bus_to_virt(). + * being reversible by calling phys_to_virt(). */ if (sge->lkey == 0) { isge->mr = NULL; - isge->vaddr = bus_to_virt(sge->addr); + isge->vaddr = phys_to_virt(sge->addr); isge->length = sge->length; isge->sge_length = sge->length; ret = 1; From bos at pathscale.com Fri May 12 16:43:07 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:07 -0700 Subject: [openib-general] [PATCH 22 of 53] ipath - fix "many lost ticks" warning In-Reply-To: Message-ID: <1887e7b3e2a3361b1edc.1147477387@eng-12.pathscale.com> Don't disable interrupts for long, or the kernel gets shirty. 
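One small piece of this change is that the lkey/rkey lookups walk the segment table with local indices and store them into the SGE once at the end. A rough standalone sketch of that walk follows. The types here are simplified and hypothetical: `SEGSZ` stands in for IPATH_SEGSZ, and `struct seg`, `struct segmap`, `struct seg_pos` and `find_seg()` are not driver names.

```c
#include <stddef.h>

#define SEGSZ 4 /* hypothetical; stands in for IPATH_SEGSZ */

struct seg { size_t length; };
struct segmap { struct seg segs[SEGSZ]; };

/* Where an offset lands: map index m, segment index n, remaining offset. */
struct seg_pos { unsigned m, n; size_t off; };

/*
 * Walk the segment table using local indices and hand back the result
 * in one go. The caller must ensure off lies inside the region.
 */
static struct seg_pos find_seg(struct segmap *const *map, size_t off)
{
	unsigned m = 0, n = 0;

	while (off >= map[m]->segs[n].length) {
		off -= map[m]->segs[n].length;
		n++;
		if (n >= SEGSZ) {
			m++;
			n = 0;
		}
	}
	return (struct seg_pos){ m, n, off };
}
```

Like the driver loop, the sketch does no bounds checking of its own, so it relies on the offset having been validated against the region length beforehand.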
Signed-off-by: Bryan O'Sullivan diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_keys.c --- a/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:28 2006 -0700 @@ -120,6 +120,7 @@ int ipath_lkey_ok(struct ipath_lkey_tabl struct ib_sge *sge, int acc) { struct ipath_mregion *mr; + unsigned n, m; size_t off; int ret; @@ -151,20 +152,22 @@ int ipath_lkey_ok(struct ipath_lkey_tabl } off += mr->offset; + m = 0; + n = 0; + while (off >= mr->map[m]->segs[n].length) { + off -= mr->map[m]->segs[n].length; + n++; + if (n >= IPATH_SEGSZ) { + m++; + n = 0; + } + } isge->mr = mr; - isge->m = 0; - isge->n = 0; - while (off >= mr->map[isge->m]->segs[isge->n].length) { - off -= mr->map[isge->m]->segs[isge->n].length; - isge->n++; - if (isge->n >= IPATH_SEGSZ) { - isge->m++; - isge->n = 0; - } - } - isge->vaddr = mr->map[isge->m]->segs[isge->n].vaddr + off; - isge->length = mr->map[isge->m]->segs[isge->n].length - off; + isge->vaddr = mr->map[m]->segs[n].vaddr + off; + isge->length = mr->map[m]->segs[n].length - off; isge->sge_length = sge->length; + isge->m = m; + isge->n = n; ret = 1; @@ -189,6 +192,7 @@ int ipath_rkey_ok(struct ipath_ibdev *de struct ipath_lkey_table *rkt = &dev->lk_table; struct ipath_sge *sge = &ss->sge; struct ipath_mregion *mr; + unsigned n, m; size_t off; int ret; @@ -206,20 +210,22 @@ int ipath_rkey_ok(struct ipath_ibdev *de } off += mr->offset; + m = 0; + n = 0; + while (off >= mr->map[m]->segs[n].length) { + off -= mr->map[m]->segs[n].length; + n++; + if (n >= IPATH_SEGSZ) { + m++; + n = 0; + } + } sge->mr = mr; - sge->m = 0; - sge->n = 0; - while (off >= mr->map[sge->m]->segs[sge->n].length) { - off -= mr->map[sge->m]->segs[sge->n].length; - sge->n++; - if (sge->n >= IPATH_SEGSZ) { - sge->m++; - sge->n = 0; - } - } - sge->vaddr = mr->map[sge->m]->segs[sge->n].vaddr + off; - sge->length = mr->map[sge->m]->segs[sge->n].length - off; + sge->vaddr = 
mr->map[m]->segs[n].vaddr + off; + sge->length = mr->map[m]->segs[n].length - off; sge->sge_length = len; + sge->m = m; + sge->n = n; ss->sg_list = NULL; ss->num_sge = 1; diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700 @@ -332,10 +332,11 @@ static void ipath_reset_qp(struct ipath_ qp->remote_qpn = 0; qp->qkey = 0; qp->qp_access_flags = 0; + clear_bit(IPATH_S_BUSY, &qp->s_flags); qp->s_hdrwords = 0; qp->s_psn = 0; qp->r_psn = 0; - atomic_set(&qp->msn, 0); + qp->r_msn = 0; if (qp->ibqp.qp_type == IB_QPT_RC) { qp->s_state = IB_OPCODE_RC_SEND_LAST; qp->r_state = IB_OPCODE_RC_SEND_LAST; @@ -344,7 +345,8 @@ static void ipath_reset_qp(struct ipath_ qp->r_state = IB_OPCODE_UC_SEND_LAST; } qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; - qp->s_nak_state = 0; + qp->r_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; + qp->r_nak_state = 0; qp->s_rnr_timeout = 0; qp->s_head = 0; qp->s_tail = 0; @@ -362,10 +364,10 @@ static void ipath_reset_qp(struct ipath_ * @qp: the QP to put into an error state * * Flushes both send and receive work queues. - * QP r_rq.lock and s_lock should be held. - */ - -static void ipath_error_qp(struct ipath_qp *qp) + * QP s_lock should be held. 
+ */ + +void ipath_error_qp(struct ipath_qp *qp) { struct ipath_ibdev *dev = to_idev(qp->ibqp.device); struct ib_wc wc; @@ -408,12 +410,14 @@ static void ipath_error_qp(struct ipath_ qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE; wc.opcode = IB_WC_RECV; + spin_lock(&qp->r_rq.lock); while (qp->r_rq.tail != qp->r_rq.head) { wc.wr_id = get_rwqe_ptr(&qp->r_rq, qp->r_rq.tail)->wr_id; if (++qp->r_rq.tail >= qp->r_rq.size) qp->r_rq.tail = 0; ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1); } + spin_unlock(&qp->r_rq.lock); } /** @@ -433,8 +437,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, unsigned long flags; int ret; - spin_lock_irqsave(&qp->r_rq.lock, flags); - spin_lock(&qp->s_lock); + spin_lock_irqsave(&qp->s_lock, flags); cur_state = attr_mask & IB_QP_CUR_STATE ? attr->cur_qp_state : qp->state; @@ -505,7 +508,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, } if (attr_mask & IB_QP_MIN_RNR_TIMER) - qp->s_min_rnr_timer = attr->min_rnr_timer; + qp->r_min_rnr_timer = attr->min_rnr_timer; if (attr_mask & IB_QP_QKEY) qp->qkey = attr->qkey; @@ -514,25 +517,13 @@ int ipath_modify_qp(struct ib_qp *ibqp, qp->s_pkey_index = attr->pkey_index; qp->state = new_state; - spin_unlock(&qp->s_lock); - spin_unlock_irqrestore(&qp->r_rq.lock, flags); - - /* - * If QP1 changed to the RTS state, try to move to the link to INIT - * even if it was ACTIVE so the SM will reinitialize the SMA's - * state. 
- */ - if (qp->ibqp.qp_num == 1 && new_state == IB_QPS_RTS) { - struct ipath_ibdev *dev = to_idev(ibqp->device); - - ipath_layer_set_linkstate(dev->dd, IPATH_IB_LINKDOWN); - } + spin_unlock_irqrestore(&qp->s_lock, flags); + ret = 0; goto bail; inval: - spin_unlock(&qp->s_lock); - spin_unlock_irqrestore(&qp->r_rq.lock, flags); + spin_unlock_irqrestore(&qp->s_lock, flags); ret = -EINVAL; bail: @@ -566,7 +557,7 @@ int ipath_query_qp(struct ib_qp *ibqp, s attr->sq_draining = 0; attr->max_rd_atomic = 1; attr->max_dest_rd_atomic = 1; - attr->min_rnr_timer = qp->s_min_rnr_timer; + attr->min_rnr_timer = qp->r_min_rnr_timer; attr->port_num = 1; attr->timeout = 0; attr->retry_cnt = qp->s_retry_cnt; @@ -593,16 +584,12 @@ int ipath_query_qp(struct ib_qp *ibqp, s * @qp: the queue pair to compute the AETH for * * Returns the AETH. - * - * The QP s_lock should be held. */ __be32 ipath_compute_aeth(struct ipath_qp *qp) { - u32 aeth = atomic_read(&qp->msn) & IPS_MSN_MASK; - - if (qp->s_nak_state) { - aeth |= qp->s_nak_state << IPS_AETH_CREDIT_SHIFT; - } else if (qp->ibqp.srq) { + u32 aeth = qp->r_msn & IPS_MSN_MASK; + + if (qp->ibqp.srq) { /* * Shared receive queues don't generate credits. * Set the credit field to the invalid value. diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_rc.c --- a/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700 @@ -41,7 +41,7 @@ * @qp: the QP who's SGE we're restarting * @wqe: the work queue to initialize the QP's SGE from * - * The QP s_lock should be held. + * The QP s_lock should be held and interrupts disabled. 
*/ static void ipath_init_restart(struct ipath_qp *qp, struct ipath_swqe *wqe) { @@ -64,7 +64,7 @@ static void ipath_init_restart(struct ip } /** - * ipath_make_rc_ack - construct a response packet (ACK, NAK, or RDMA read) + * ipath_make_rc_ack - construct a RDMA read response packet * @qp: a pointer to the QP * @ohdr: a pointer to the IB header being constructed * @pmtu: the path MTU @@ -76,7 +76,6 @@ u32 ipath_make_rc_ack(struct ipath_qp *q struct ipath_other_headers *ohdr, u32 pmtu) { - struct ipath_sge_state *ss; u32 hwords; u32 len; u32 bth0; @@ -90,7 +89,6 @@ u32 ipath_make_rc_ack(struct ipath_qp *q */ switch (qp->s_ack_state) { case OP(RDMA_READ_REQUEST): - ss = &qp->s_rdma_sge; len = qp->s_rdma_len; if (len > pmtu) { len = pmtu; @@ -107,7 +105,6 @@ u32 ipath_make_rc_ack(struct ipath_qp *q qp->s_ack_state = OP(RDMA_READ_RESPONSE_MIDDLE); /* FALLTHROUGH */ case OP(RDMA_READ_RESPONSE_MIDDLE): - ss = &qp->s_rdma_sge; len = qp->s_rdma_len; if (len > pmtu) len = pmtu; @@ -122,44 +119,18 @@ u32 ipath_make_rc_ack(struct ipath_qp *q case OP(RDMA_READ_RESPONSE_LAST): case OP(RDMA_READ_RESPONSE_ONLY): + default: /* * We have to prevent new requests from changing * the r_sge state while a ipath_verbs_send() * is in progress. - * Changing r_state allows the receiver - * to continue processing new packets. - * We do it here now instead of above so - * that we are sure the packet was sent before - * changing the state. - */ - qp->r_state = OP(RDMA_READ_RESPONSE_LAST); + */ qp->s_ack_state = OP(ACKNOWLEDGE); bth0 = 0; goto bail; - - case OP(COMPARE_SWAP): - case OP(FETCH_ADD): - ss = NULL; - len = 0; - qp->r_state = OP(SEND_LAST); - qp->s_ack_state = OP(ACKNOWLEDGE); - bth0 = OP(ATOMIC_ACKNOWLEDGE) << 24; - ohdr->u.at.aeth = ipath_compute_aeth(qp); - ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic); - hwords += sizeof(ohdr->u.at) / 4; - break; - - default: - /* Send a regular ACK. 
*/ - ss = NULL; - len = 0; - qp->s_ack_state = OP(ACKNOWLEDGE); - bth0 = qp->s_ack_state << 24; - ohdr->u.aeth = ipath_compute_aeth(qp); - hwords++; } qp->s_hdrwords = hwords; - qp->s_cur_sge = ss; + qp->s_cur_sge = &qp->s_rdma_sge; qp->s_cur_size = len; bail: @@ -175,7 +146,7 @@ bail: * @bth2p: pointer to the BTH PSN word * * Return 1 if constructed; otherwise, return 0. - * Note the QP s_lock must be held. + * Note the QP s_lock must be held and interrupts disabled. */ int ipath_make_rc_req(struct ipath_qp *qp, struct ipath_other_headers *ohdr, @@ -532,11 +503,16 @@ static void send_rc_ack(struct ipath_qp ohdr = &hdr.u.l.oth; lrh0 = IPS_LRH_GRH; } + /* read pkey_index w/o lock (its atomic) */ bth0 = ipath_layer_get_pkey(dev->dd, qp->s_pkey_index); - ohdr->u.aeth = ipath_compute_aeth(qp); - if (qp->s_ack_state >= OP(COMPARE_SWAP)) { + if (qp->r_nak_state) + ohdr->u.aeth = (qp->r_msn & IPS_MSN_MASK) | + (qp->r_nak_state << IPS_AETH_CREDIT_SHIFT); + else + ohdr->u.aeth = ipath_compute_aeth(qp); + if (qp->r_ack_state >= OP(COMPARE_SWAP)) { bth0 |= OP(ATOMIC_ACKNOWLEDGE) << 24; - ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic); + ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->r_atomic_data); hwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4; } else bth0 |= OP(ACKNOWLEDGE) << 24; @@ -547,13 +523,13 @@ static void send_rc_ack(struct ipath_qp hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd)); ohdr->bth[0] = cpu_to_be32(bth0); ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); - ohdr->bth[2] = cpu_to_be32(qp->s_ack_psn & IPS_PSN_MASK); + ohdr->bth[2] = cpu_to_be32(qp->r_ack_psn & IPS_PSN_MASK); /* * If we can send the ACK, clear the ACK state. 
*/ if (ipath_verbs_send(dev->dd, hwords, (u32 *) &hdr, 0, NULL) == 0) { - qp->s_ack_state = OP(ACKNOWLEDGE); + qp->r_ack_state = OP(ACKNOWLEDGE); dev->n_unicast_xmit++; } else dev->n_rc_qacks++; @@ -647,7 +623,7 @@ done: * @psn: packet sequence number for the request * @wc: the work completion request * - * The QP s_lock should be held. + * The QP s_lock should be held and interrupts disabled. */ void ipath_restart_rc(struct ipath_qp *qp, u32 psn, struct ib_wc *wc) { @@ -711,7 +687,7 @@ bail: * * This is called from ipath_rc_rcv_resp() to process an incoming RC ACK * for the given QP. - * Called at interrupt level with the QP s_lock held. + * Called at interrupt level with the QP s_lock held and interrupts disabled. * Returns 1 if OK, 0 if current operation should be aborted (NAK). */ static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode) @@ -1125,8 +1101,6 @@ static inline int ipath_rc_rcv_error(str { struct ib_reth *reth; - spin_lock(&qp->s_lock); - if (diff > 0) { /* * Packet sequence error. @@ -1134,15 +1108,16 @@ static inline int ipath_rc_rcv_error(str * Don't queue the NAK if a RDMA read, atomic, or * NAK is pending though. */ - if ((qp->s_ack_state >= OP(RDMA_READ_REQUEST) && - qp->s_ack_state != OP(ACKNOWLEDGE)) || - qp->s_nak_state != 0) + if (qp->s_ack_state != OP(ACKNOWLEDGE) || + qp->r_nak_state != 0) goto done; - qp->s_ack_state = OP(SEND_ONLY); - qp->s_nak_state = IB_NAK_PSN_ERROR; - /* Use the expected PSN. */ - qp->s_ack_psn = qp->r_psn; - goto resched; + if (qp->r_ack_state < OP(COMPARE_SWAP)) { + qp->r_ack_state = OP(SEND_ONLY); + qp->r_nak_state = IB_NAK_PSN_ERROR; + /* Use the expected PSN. */ + qp->r_ack_psn = qp->r_psn; + } + goto send_ack; } /* @@ -1156,30 +1131,29 @@ static inline int ipath_rc_rcv_error(str * send the earliest so that RDMA reads can be restarted at * the requester's expected PSN. 
*/ - if (qp->s_ack_state != OP(ACKNOWLEDGE) && - ipath_cmp24(psn, qp->s_ack_psn) >= 0) { - if (qp->s_ack_state < OP(RDMA_READ_REQUEST)) - qp->s_ack_psn = psn; - goto done; - } - switch (opcode) { - case OP(RDMA_READ_REQUEST): - /* - * We have to be careful to not change s_rdma_sge - * while ipath_do_rc_send() is using it and not - * holding the s_lock. - */ - if (qp->s_ack_state != OP(ACKNOWLEDGE) && - qp->s_ack_state >= OP(RDMA_READ_REQUEST)) { - dev->n_rdma_dup_busy++; - goto done; - } + if (opcode == OP(RDMA_READ_REQUEST)) { /* RETH comes after BTH */ if (!header_in_data) reth = &ohdr->u.rc.reth; else { reth = (struct ib_reth *)data; data += sizeof(*reth); + } + /* + * If we receive a duplicate RDMA request, it means the + * requester saw a sequence error and needs to restart + * from an earlier point. We can abort the current + * RDMA read send in that case. + */ + spin_lock_irq(&qp->s_lock); + if (qp->s_ack_state != OP(ACKNOWLEDGE) && + (qp->s_hdrwords || ipath_cmp24(psn, qp->s_ack_psn) >= 0)) { + /* + * We are already sending earlier requested data. + * Don't abort it to send later out of sequence data. + */ + spin_unlock_irq(&qp->s_lock); + goto done; } qp->s_rdma_len = be32_to_cpu(reth->length); if (qp->s_rdma_len != 0) { @@ -1194,8 +1168,10 @@ static inline int ipath_rc_rcv_error(str ok = ipath_rkey_ok(dev, &qp->s_rdma_sge, qp->s_rdma_len, vaddr, rkey, IB_ACCESS_REMOTE_READ); - if (unlikely(!ok)) + if (unlikely(!ok)) { + spin_unlock_irq(&qp->s_lock); goto done; + } } else { qp->s_rdma_sge.sg_list = NULL; qp->s_rdma_sge.num_sge = 0; @@ -1204,8 +1180,30 @@ static inline int ipath_rc_rcv_error(str qp->s_rdma_sge.sge.length = 0; qp->s_rdma_sge.sge.sge_length = 0; } - break; - + qp->s_ack_state = opcode; + qp->s_ack_psn = psn; + spin_unlock_irq(&qp->s_lock); + tasklet_hi_schedule(&qp->s_task); + goto send_ack; + } + + /* + * A pending RDMA read will ACK anything before it so + * ignore earlier duplicate requests. 
+ */ + if (qp->s_ack_state != OP(ACKNOWLEDGE)) + goto done; + + /* + * If an ACK is pending, don't replace the pending ACK + * with an earlier one since the later one will ACK the earlier. + * Also, if we already have a pending atomic, send it. + */ + if (qp->r_ack_state != OP(ACKNOWLEDGE) && + (ipath_cmp24(psn, qp->r_ack_psn) <= 0 || + qp->r_ack_state >= OP(COMPARE_SWAP))) + goto send_ack; + switch (opcode) { case OP(COMPARE_SWAP): case OP(FETCH_ADD): /* @@ -1214,17 +1212,15 @@ static inline int ipath_rc_rcv_error(str */ if ((psn & IPS_PSN_MASK) != qp->r_atomic_psn) goto done; - qp->s_ack_atomic = qp->r_atomic_data; break; } - qp->s_ack_state = opcode; - qp->s_nak_state = 0; - qp->s_ack_psn = psn; -resched: + qp->r_ack_state = opcode; + qp->r_nak_state = 0; + qp->r_ack_psn = psn; +send_ack: return 0; done: - spin_unlock(&qp->s_lock); return 1; } @@ -1249,7 +1245,6 @@ void ipath_rc_rcv(struct ipath_ibdev *de u32 hdrsize; u32 psn; u32 pad; - unsigned long flags; struct ib_wc wc; u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); int diff; @@ -1290,10 +1285,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de opcode <= OP(ATOMIC_ACKNOWLEDGE)) { ipath_rc_rcv_resp(dev, ohdr, data, tlen, qp, opcode, psn, hdrsize, pmtu, header_in_data); - goto bail; - } - - spin_lock_irqsave(&qp->r_rq.lock, flags); + goto done; + } /* Compute 24 bits worth of difference. */ diff = ipath_cmp24(psn, qp->r_psn); @@ -1301,7 +1294,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de if (ipath_rc_rcv_error(dev, ohdr, data, qp, opcode, psn, diff, header_in_data)) goto done; - goto resched; + goto send_ack; } /* Check for opcode sequence errors. */ @@ -1318,18 +1311,14 @@ void ipath_rc_rcv(struct ipath_ibdev *de * Don't queue the NAK if a RDMA read, atomic, or NAK * is pending though. 
*/ - spin_lock(&qp->s_lock); - if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) && - qp->s_ack_state != OP(ACKNOWLEDGE)) { - spin_unlock(&qp->s_lock); - goto done; - } + if (qp->r_ack_state >= OP(COMPARE_SWAP)) + goto send_ack; /* XXX Flush WQEs */ qp->state = IB_QPS_ERR; - qp->s_ack_state = OP(SEND_ONLY); - qp->s_nak_state = IB_NAK_INVALID_REQUEST; - qp->s_ack_psn = qp->r_psn; - goto resched; + qp->r_ack_state = OP(SEND_ONLY); + qp->r_nak_state = IB_NAK_INVALID_REQUEST; + qp->r_ack_psn = qp->r_psn; + goto send_ack; case OP(RDMA_WRITE_FIRST): case OP(RDMA_WRITE_MIDDLE): @@ -1338,20 +1327,6 @@ void ipath_rc_rcv(struct ipath_ibdev *de opcode == OP(RDMA_WRITE_LAST_WITH_IMMEDIATE)) break; goto nack_inv; - - case OP(RDMA_READ_REQUEST): - case OP(COMPARE_SWAP): - case OP(FETCH_ADD): - /* - * Drop all new requests until a response has been sent. A - * new request then ACKs the RDMA response we sent. Relaxed - * ordering would allow new requests to be processed but we - * would need to keep a queue of rwqe's for all that are in - * progress. Note that we can't RNR NAK this request since - * the RDMA READ or atomic response is already queued to be - * sent (unless we implement a response send queue). - */ - goto done; default: if (opcode == OP(SEND_MIDDLE) || @@ -1361,6 +1336,11 @@ void ipath_rc_rcv(struct ipath_ibdev *de opcode == OP(RDMA_WRITE_LAST) || opcode == OP(RDMA_WRITE_LAST_WITH_IMMEDIATE)) goto nack_inv; + /* + * Note that it is up to the requester to not send a new + * RDMA read or atomic operation before receiving an ACK + * for the previous operation. + */ break; } @@ -1377,16 +1357,12 @@ void ipath_rc_rcv(struct ipath_ibdev *de * Don't queue the NAK if a RDMA read or atomic * is pending though. 
*/ - spin_lock(&qp->s_lock); - if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) && - qp->s_ack_state != OP(ACKNOWLEDGE)) { - spin_unlock(&qp->s_lock); - goto done; - } - qp->s_ack_state = OP(SEND_ONLY); - qp->s_nak_state = IB_RNR_NAK | qp->s_min_rnr_timer; - qp->s_ack_psn = qp->r_psn; - goto resched; + if (qp->r_ack_state >= OP(COMPARE_SWAP)) + goto send_ack; + qp->r_ack_state = OP(SEND_ONLY); + qp->r_nak_state = IB_RNR_NAK | qp->r_min_rnr_timer; + qp->r_ack_psn = qp->r_psn; + goto send_ack; } qp->r_rcv_len = 0; /* FALLTHROUGH */ @@ -1443,7 +1419,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de if (unlikely(wc.byte_len > qp->r_len)) goto nack_inv; ipath_copy_sge(&qp->r_sge, data, tlen); - atomic_inc(&qp->msn); + qp->r_msn++; if (opcode == OP(RDMA_WRITE_LAST) || opcode == OP(RDMA_WRITE_ONLY)) break; @@ -1487,29 +1463,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de ok = ipath_rkey_ok(dev, &qp->r_sge, qp->r_len, vaddr, rkey, IB_ACCESS_REMOTE_WRITE); - if (unlikely(!ok)) { - nack_acc: - /* - * A NAK will ACK earlier sends and RDMA - * writes. Don't queue the NAK if a RDMA - * read, atomic, or NAK is pending though. - */ - spin_lock(&qp->s_lock); - nack_acc1: - if (qp->s_ack_state >= - OP(RDMA_READ_REQUEST) && - qp->s_ack_state != OP(ACKNOWLEDGE)) { - spin_unlock(&qp->s_lock); - goto done; - } - /* XXX Flush WQEs */ - qp->state = IB_QPS_ERR; - qp->s_ack_state = OP(RDMA_WRITE_ONLY); - qp->s_nak_state = - IB_NAK_REMOTE_ACCESS_ERROR; - qp->s_ack_psn = qp->r_psn; - goto resched; - } + if (unlikely(!ok)) + goto nack_acc; } else { qp->r_sge.sg_list = NULL; qp->r_sge.sge.mr = NULL; @@ -1539,16 +1494,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ))) goto nack_acc; - /* - * Ignore request if we already have an - * RDMA read or ATOMIC pending. 
- */ - spin_lock(&qp->s_lock); - if (qp->s_ack_state != OP(ACKNOWLEDGE) && - qp->s_ack_state >= OP(RDMA_READ_REQUEST)) { - spin_unlock(&qp->s_lock); - goto done; - } + spin_lock_irq(&qp->s_lock); qp->s_rdma_len = be32_to_cpu(reth->length); if (qp->s_rdma_len != 0) { u32 rkey = be32_to_cpu(reth->rkey); @@ -1560,7 +1506,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de qp->s_rdma_len, vaddr, rkey, IB_ACCESS_REMOTE_READ); if (unlikely(!ok)) - goto nack_acc1; + goto nack_acc; /* * Update the next expected PSN. We add 1 later * below, so only add the remainder here. @@ -1580,13 +1526,20 @@ void ipath_rc_rcv(struct ipath_ibdev *de * finish sending the result since a duplicate request would * increment it more than once. */ - atomic_inc(&qp->msn); + qp->r_msn++; + qp->s_ack_state = opcode; - qp->s_nak_state = 0; qp->s_ack_psn = psn; + spin_unlock_irq(&qp->s_lock); + qp->r_psn++; qp->r_state = opcode; - goto rdmadone; + qp->r_nak_state = 0; + + /* Call ipath_do_rc_send() in another thread. */ + tasklet_hi_schedule(&qp->s_task); + + goto done; case OP(COMPARE_SWAP): case OP(FETCH_ADD): { @@ -1615,7 +1568,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de goto nack_acc; /* Perform atomic OP and save result. */ sdata = be64_to_cpu(ateth->swap_data); - spin_lock(&dev->pending_lock); + spin_lock_irq(&dev->pending_lock); qp->r_atomic_data = *(u64 *) qp->r_sge.sge.vaddr; if (opcode == OP(FETCH_ADD)) *(u64 *) qp->r_sge.sge.vaddr = @@ -1623,8 +1576,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de else if (qp->r_atomic_data == be64_to_cpu(ateth->compare_data)) *(u64 *) qp->r_sge.sge.vaddr = sdata; - spin_unlock(&dev->pending_lock); - atomic_inc(&qp->msn); + spin_unlock_irq(&dev->pending_lock); + qp->r_msn++; qp->r_atomic_psn = psn & IPS_PSN_MASK; psn |= 1 << 31; break; @@ -1636,46 +1589,39 @@ void ipath_rc_rcv(struct ipath_ibdev *de } qp->r_psn++; qp->r_state = opcode; + qp->r_nak_state = 0; /* Send an ACK if requested or required. 
*/ if (psn & (1 << 31)) { /* * Coalesce ACKs unless there is a RDMA READ or * ATOMIC pending. */ - spin_lock(&qp->s_lock); - if (qp->s_ack_state == OP(ACKNOWLEDGE) || - qp->s_ack_state < OP(RDMA_READ_REQUEST)) { - qp->s_ack_state = opcode; - qp->s_nak_state = 0; - qp->s_ack_psn = psn; - qp->s_ack_atomic = qp->r_atomic_data; - goto resched; - } - spin_unlock(&qp->s_lock); - } + if (qp->r_ack_state < OP(COMPARE_SWAP)) { + qp->r_ack_state = opcode; + qp->r_ack_psn = psn; + } + goto send_ack; + } + goto done; + +nack_acc: + /* + * A NAK will ACK earlier sends and RDMA writes. + * Don't queue the NAK if a RDMA read, atomic, or NAK + * is pending though. + */ + if (qp->r_ack_state < OP(COMPARE_SWAP)) { + /* XXX Flush WQEs */ + qp->state = IB_QPS_ERR; + qp->r_ack_state = OP(RDMA_WRITE_ONLY); + qp->r_nak_state = IB_NAK_REMOTE_ACCESS_ERROR; + qp->r_ack_psn = qp->r_psn; + } +send_ack: + /* Send ACK right away unless a RDMA read is pending. */ + if (qp->s_ack_state == OP(ACKNOWLEDGE)) + send_rc_ack(qp); + done: - spin_unlock_irqrestore(&qp->r_rq.lock, flags); - goto bail; - -resched: - /* - * Try to send ACK right away but not if ipath_do_rc_send() is - * active. - */ - if (qp->s_hdrwords == 0 && - (qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST || - qp->s_ack_state >= IB_OPCODE_COMPARE_SWAP)) - send_rc_ack(qp); - else - dev->n_rc_qacks++; - -rdmadone: - spin_unlock(&qp->s_lock); - spin_unlock_irqrestore(&qp->r_rq.lock, flags); - - /* Call ipath_do_rc_send() in another thread. */ - tasklet_hi_schedule(&qp->s_task); - -bail: return; } diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_ruc.c --- a/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700 @@ -112,10 +112,11 @@ void ipath_insert_rnr_queue(struct ipath * * Return 0 if no RWQE is available, otherwise return 1. * - * Called at interrupt level with the QP r_rq.lock held. + * Can be called from interrupt level. 
*/ int ipath_get_rwqe(struct ipath_qp *qp, int wr_id_only) { + unsigned long flags; struct ipath_rq *rq; struct ipath_srq *srq; struct ipath_rwqe *wqe; @@ -123,6 +124,8 @@ int ipath_get_rwqe(struct ipath_qp *qp, if (!qp->ibqp.srq) { rq = &qp->r_rq; + spin_lock_irqsave(&rq->lock, flags); + if (unlikely(rq->tail == rq->head)) { ret = 0; goto bail; @@ -137,15 +140,14 @@ int ipath_get_rwqe(struct ipath_qp *qp, } if (++rq->tail >= rq->size) rq->tail = 0; - ret = 1; - goto bail; + goto done; } srq = to_isrq(qp->ibqp.srq); rq = &srq->rq; - spin_lock(&rq->lock); + spin_lock_irqsave(&rq->lock, flags); + if (unlikely(rq->tail == rq->head)) { - spin_unlock(&rq->lock); ret = 0; goto bail; } @@ -175,13 +177,13 @@ int ipath_get_rwqe(struct ipath_qp *qp, ev.event = IB_EVENT_SRQ_LIMIT_REACHED; srq->ibsrq.event_handler(&ev, srq->ibsrq.srq_context); - } else - spin_unlock(&rq->lock); - } else - spin_unlock(&rq->lock); + } + } +done: ret = 1; bail: + spin_unlock_irqrestore(&rq->lock, flags); return ret; } @@ -247,10 +249,8 @@ again: wc.imm_data = wqe->wr.imm_data; /* FALLTHROUGH */ case IB_WR_SEND: - spin_lock_irqsave(&qp->r_rq.lock, flags); if (!ipath_get_rwqe(qp, 0)) { rnr_nak: - spin_unlock_irqrestore(&qp->r_rq.lock, flags); /* Handle RNR NAK */ if (qp->ibqp.qp_type == IB_QPT_UC) goto send_comp; @@ -262,20 +262,17 @@ again: sqp->s_rnr_retry--; dev->n_rnr_naks++; sqp->s_rnr_timeout = - ib_ipath_rnr_table[sqp->s_min_rnr_timer]; + ib_ipath_rnr_table[sqp->r_min_rnr_timer]; ipath_insert_rnr_queue(sqp); goto done; } - spin_unlock_irqrestore(&qp->r_rq.lock, flags); break; case IB_WR_RDMA_WRITE_WITH_IMM: wc.wc_flags = IB_WC_WITH_IMM; wc.imm_data = wqe->wr.imm_data; - spin_lock_irqsave(&qp->r_rq.lock, flags); if (!ipath_get_rwqe(qp, 1)) goto rnr_nak; - spin_unlock_irqrestore(&qp->r_rq.lock, flags); /* FALLTHROUGH */ case IB_WR_RDMA_WRITE: if (wqe->length == 0) diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_uc.c --- a/drivers/infiniband/hw/ipath/ipath_uc.c Fri May 12 
15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_uc.c Fri May 12 15:55:28 2006 -0700 @@ -240,7 +240,6 @@ void ipath_uc_rcv(struct ipath_ibdev *de u32 hdrsize; u32 psn; u32 pad; - unsigned long flags; struct ib_wc wc; u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); struct ib_reth *reth; @@ -278,8 +277,6 @@ void ipath_uc_rcv(struct ipath_ibdev *de wc.imm_data = 0; wc.wc_flags = 0; - spin_lock_irqsave(&qp->r_rq.lock, flags); - /* Compare the PSN verses the expected PSN. */ if (unlikely(ipath_cmp24(psn, qp->r_psn) != 0)) { /* @@ -536,15 +533,11 @@ void ipath_uc_rcv(struct ipath_ibdev *de default: /* Drop packet for unknown opcodes. */ - spin_unlock_irqrestore(&qp->r_rq.lock, flags); dev->n_pkt_drops++; - goto bail; + goto done; } qp->r_psn++; qp->r_state = opcode; done: - spin_unlock_irqrestore(&qp->r_rq.lock, flags); - -bail: return; } diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 @@ -306,32 +306,33 @@ struct ipath_qp { u32 s_next_psn; /* PSN for next request */ u32 s_last_psn; /* last response PSN processed */ u32 s_psn; /* current packet sequence number */ + u32 s_ack_psn; /* PSN for RDMA_READ */ u32 s_rnr_timeout; /* number of milliseconds for RNR timeout */ - u32 s_ack_psn; /* PSN for next ACK or RDMA_READ */ - u64 s_ack_atomic; /* data for atomic ACK */ + u32 r_ack_psn; /* PSN for next ACK or atomic ACK */ u64 r_wr_id; /* ID for current receive WQE */ u64 r_atomic_data; /* data for last atomic op */ u32 r_atomic_psn; /* PSN of last atomic op */ u32 r_len; /* total length of r_sge */ u32 r_rcv_len; /* receive data len processed */ u32 r_psn; /* expected rcv packet sequence number */ + u32 r_msn; /* message sequence number */ u8 state; /* QP state */ u8 s_state; /* opcode of last packet sent */ u8 s_ack_state; /* opcode of packet to ACK */ - u8 s_nak_state; /* non-zero 
if NAK is pending */ u8 r_state; /* opcode of last packet received */ + u8 r_ack_state; /* opcode of packet to ACK */ + u8 r_nak_state; /* non-zero if NAK is pending */ + u8 r_min_rnr_timer; /* retry timeout value for RNR NAKs */ u8 r_reuse_sge; /* for UC receive errors */ u8 r_sge_inx; /* current index into sg_list */ + u8 qp_access_flags; u8 s_max_sge; /* size of s_wq->sg_list */ - u8 qp_access_flags; u8 s_retry_cnt; /* number of times to retry */ u8 s_rnr_retry_cnt; - u8 s_min_rnr_timer; u8 s_retry; /* requester retry counter */ u8 s_rnr_retry; /* requester RNR retry counter */ u8 s_pkey_index; /* PKEY index to use */ enum ib_mtu path_mtu; - atomic_t msn; /* message sequence number */ u32 remote_qpn; u32 qkey; /* QKEY for this QP (for UD or RD) */ u32 s_size; /* send work queue size */ From bos at pathscale.com Fri May 12 16:43:22 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:22 -0700 Subject: [openib-general] [PATCH 37 of 53] ipath - name zero counter offsets consistently In-Reply-To: Message-ID: Name zero counter offsets consistently so it's clear they aren't counters. Signed-off-by: Bryan O'Sullivan diff -r ec1934faf5d1 -r f8debae94d44 drivers/infiniband/hw/ipath/ipath_mad.c --- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700 @@ -251,7 +251,7 @@ static int recv_subn_get_portinfo(struct /* P_KeyViolations are counted by hardware. */ pip->pkey_violations = cpu_to_be16((ipath_layer_get_cr_errpkey(dev->dd) - - dev->n_pkey_violations) & 0xFFFF); + dev->z_pkey_violations) & 0xFFFF); pip->qkey_violations = cpu_to_be16(dev->qkey_violations); /* Only the hardware GUID is supported for now */ pip->guid_cap = 1; @@ -425,7 +425,7 @@ static int recv_subn_set_portinfo(struct * later. 
*/ if (pip->pkey_violations == 0) - dev->n_pkey_violations = + dev->z_pkey_violations = ipath_layer_get_cr_errpkey(dev->dd); if (pip->qkey_violations == 0) @@ -883,18 +883,18 @@ static int recv_pma_get_portcounters(str ipath_layer_get_counters(dev->dd, &cntrs); /* Adjust counters for any resets done. */ - cntrs.symbol_error_counter -= dev->n_symbol_error_counter; + cntrs.symbol_error_counter -= dev->z_symbol_error_counter; cntrs.link_error_recovery_counter -= - dev->n_link_error_recovery_counter; - cntrs.link_downed_counter -= dev->n_link_downed_counter; + dev->z_link_error_recovery_counter; + cntrs.link_downed_counter -= dev->z_link_downed_counter; cntrs.port_rcv_errors += dev->rcv_errors; - cntrs.port_rcv_errors -= dev->n_port_rcv_errors; - cntrs.port_rcv_remphys_errors -= dev->n_port_rcv_remphys_errors; - cntrs.port_xmit_discards -= dev->n_port_xmit_discards; - cntrs.port_xmit_data -= dev->n_port_xmit_data; - cntrs.port_rcv_data -= dev->n_port_rcv_data; - cntrs.port_xmit_packets -= dev->n_port_xmit_packets; - cntrs.port_rcv_packets -= dev->n_port_rcv_packets; + cntrs.port_rcv_errors -= dev->z_port_rcv_errors; + cntrs.port_rcv_remphys_errors -= dev->z_port_rcv_remphys_errors; + cntrs.port_xmit_discards -= dev->z_port_xmit_discards; + cntrs.port_xmit_data -= dev->z_port_xmit_data; + cntrs.port_rcv_data -= dev->z_port_rcv_data; + cntrs.port_xmit_packets -= dev->z_port_xmit_packets; + cntrs.port_rcv_packets -= dev->z_port_rcv_packets; cntrs.local_link_integrity_errors -= dev->z_local_link_integrity_errors; cntrs.excessive_buffer_overrun_errors -= @@ -981,10 +981,10 @@ static int recv_pma_get_portcounters_ext &rpkts, &xwait); /* Adjust counters for any resets done. 
*/ - swords -= dev->n_port_xmit_data; - rwords -= dev->n_port_rcv_data; - spkts -= dev->n_port_xmit_packets; - rpkts -= dev->n_port_rcv_packets; + swords -= dev->z_port_xmit_data; + rwords -= dev->z_port_rcv_data; + spkts -= dev->z_port_xmit_packets; + rpkts -= dev->z_port_rcv_packets; memset(pmp->data, 0, sizeof(pmp->data)); @@ -1020,25 +1020,25 @@ static int recv_pma_set_portcounters(str ipath_layer_get_counters(dev->dd, &cntrs); if (p->counter_select & IB_PMA_SEL_SYMBOL_ERROR) - dev->n_symbol_error_counter = cntrs.symbol_error_counter; + dev->z_symbol_error_counter = cntrs.symbol_error_counter; if (p->counter_select & IB_PMA_SEL_LINK_ERROR_RECOVERY) - dev->n_link_error_recovery_counter = + dev->z_link_error_recovery_counter = cntrs.link_error_recovery_counter; if (p->counter_select & IB_PMA_SEL_LINK_DOWNED) - dev->n_link_downed_counter = cntrs.link_downed_counter; + dev->z_link_downed_counter = cntrs.link_downed_counter; if (p->counter_select & IB_PMA_SEL_PORT_RCV_ERRORS) - dev->n_port_rcv_errors = + dev->z_port_rcv_errors = cntrs.port_rcv_errors + dev->rcv_errors; if (p->counter_select & IB_PMA_SEL_PORT_RCV_REMPHYS_ERRORS) - dev->n_port_rcv_remphys_errors = + dev->z_port_rcv_remphys_errors = cntrs.port_rcv_remphys_errors; if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DISCARDS) - dev->n_port_xmit_discards = cntrs.port_xmit_discards; + dev->z_port_xmit_discards = cntrs.port_xmit_discards; if (p->counter_select & IB_PMA_SEL_LOCAL_LINK_INTEGRITY_ERRORS) dev->z_local_link_integrity_errors = @@ -1052,16 +1052,16 @@ static int recv_pma_set_portcounters(str dev->n_vl15_dropped = 0; if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DATA) - dev->n_port_xmit_data = cntrs.port_xmit_data; + dev->z_port_xmit_data = cntrs.port_xmit_data; if (p->counter_select & IB_PMA_SEL_PORT_RCV_DATA) - dev->n_port_rcv_data = cntrs.port_rcv_data; + dev->z_port_rcv_data = cntrs.port_rcv_data; if (p->counter_select & IB_PMA_SEL_PORT_XMIT_PACKETS) - dev->n_port_xmit_packets = 
cntrs.port_xmit_packets; + dev->z_port_xmit_packets = cntrs.port_xmit_packets; if (p->counter_select & IB_PMA_SEL_PORT_RCV_PACKETS) - dev->n_port_rcv_packets = cntrs.port_rcv_packets; + dev->z_port_rcv_packets = cntrs.port_rcv_packets; return recv_pma_get_portcounters(pmp, ibdev, port); } @@ -1078,16 +1078,16 @@ static int recv_pma_set_portcounters_ext &rpkts, &xwait); if (p->counter_select & IB_PMA_SELX_PORT_XMIT_DATA) - dev->n_port_xmit_data = swords; + dev->z_port_xmit_data = swords; if (p->counter_select & IB_PMA_SELX_PORT_RCV_DATA) - dev->n_port_rcv_data = rwords; + dev->z_port_rcv_data = rwords; if (p->counter_select & IB_PMA_SELX_PORT_XMIT_PACKETS) - dev->n_port_xmit_packets = spkts; + dev->z_port_xmit_packets = spkts; if (p->counter_select & IB_PMA_SELX_PORT_RCV_PACKETS) - dev->n_port_rcv_packets = rpkts; + dev->z_port_rcv_packets = rpkts; if (p->counter_select & IB_PMA_SELX_PORT_UNI_XMIT_PACKETS) dev->n_unicast_xmit = 0; diff -r ec1934faf5d1 -r f8debae94d44 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 @@ -700,7 +700,7 @@ static int ipath_query_port(struct ib_de props->max_msg_sz = 4096; props->pkey_tbl_len = ipath_layer_get_npkeys(dev->dd); props->bad_pkey_cntr = ipath_layer_get_cr_errpkey(dev->dd) - - dev->n_pkey_violations; + dev->z_pkey_violations; props->qkey_viol_cntr = dev->qkey_violations; props->active_width = IB_WIDTH_4X; /* See rate_show() */ @@ -1034,18 +1034,18 @@ static void *ipath_register_ib_device(in /* Snapshot current HW counters to "clear" them. 
*/ ipath_layer_get_counters(dd, &cntrs); - idev->n_symbol_error_counter = cntrs.symbol_error_counter; - idev->n_link_error_recovery_counter = + idev->z_symbol_error_counter = cntrs.symbol_error_counter; + idev->z_link_error_recovery_counter = cntrs.link_error_recovery_counter; - idev->n_link_downed_counter = cntrs.link_downed_counter; - idev->n_port_rcv_errors = cntrs.port_rcv_errors; - idev->n_port_rcv_remphys_errors = + idev->z_link_downed_counter = cntrs.link_downed_counter; + idev->z_port_rcv_errors = cntrs.port_rcv_errors; + idev->z_port_rcv_remphys_errors = cntrs.port_rcv_remphys_errors; - idev->n_port_xmit_discards = cntrs.port_xmit_discards; - idev->n_port_xmit_data = cntrs.port_xmit_data; - idev->n_port_rcv_data = cntrs.port_rcv_data; - idev->n_port_xmit_packets = cntrs.port_xmit_packets; - idev->n_port_rcv_packets = cntrs.port_rcv_packets; + idev->z_port_xmit_discards = cntrs.port_xmit_discards; + idev->z_port_xmit_data = cntrs.port_xmit_data; + idev->z_port_rcv_data = cntrs.port_rcv_data; + idev->z_port_xmit_packets = cntrs.port_xmit_packets; + idev->z_port_rcv_packets = cntrs.port_rcv_packets; idev->z_local_link_integrity_errors = cntrs.local_link_integrity_errors; idev->z_excessive_buffer_overrun_errors = diff -r ec1934faf5d1 -r f8debae94d44 drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 @@ -448,17 +448,17 @@ struct ipath_ibdev { u64 n_unicast_rcv; /* total unicast packets received */ u64 n_multicast_xmit; /* total multicast packets sent */ u64 n_multicast_rcv; /* total multicast packets received */ - u64 n_symbol_error_counter; /* starting count for PMA */ - u64 n_link_error_recovery_counter; /* starting count for PMA */ - u64 n_link_downed_counter; /* starting count for PMA */ - u64 n_port_rcv_errors; /* starting count for PMA */ - u64 n_port_rcv_remphys_errors; /* starting count for PMA */ - u64 
n_port_xmit_discards; /* starting count for PMA */ - u64 n_port_xmit_data; /* starting count for PMA */ - u64 n_port_rcv_data; /* starting count for PMA */ - u64 n_port_xmit_packets; /* starting count for PMA */ - u64 n_port_rcv_packets; /* starting count for PMA */ - u32 n_pkey_violations; /* starting count for PMA */ + u64 z_symbol_error_counter; /* starting count for PMA */ + u64 z_link_error_recovery_counter; /* starting count for PMA */ + u64 z_link_downed_counter; /* starting count for PMA */ + u64 z_port_rcv_errors; /* starting count for PMA */ + u64 z_port_rcv_remphys_errors; /* starting count for PMA */ + u64 z_port_xmit_discards; /* starting count for PMA */ + u64 z_port_xmit_data; /* starting count for PMA */ + u64 z_port_rcv_data; /* starting count for PMA */ + u64 z_port_xmit_packets; /* starting count for PMA */ + u64 z_port_rcv_packets; /* starting count for PMA */ + u32 z_pkey_violations; /* starting count for PMA */ u32 z_local_link_integrity_errors; /* starting count for PMA */ u32 z_excessive_buffer_overrun_errors; /* starting count for PMA */ u32 n_rc_resends; From bos at pathscale.com Fri May 12 16:43:11 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:11 -0700 Subject: [openib-general] [PATCH 26 of 53] ipath - treat PE800 rev1 and rev2 as similar In-Reply-To: Message-ID: <8e2d63833cf2a2d13337.1147477391@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r 2b7918a7133e -r 8e2d63833cf2 drivers/infiniband/hw/ipath/ipath_pe800.c --- a/drivers/infiniband/hw/ipath/ipath_pe800.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_pe800.c Fri May 12 15:55:28 2006 -0700 @@ -532,7 +532,7 @@ static int ipath_pe_boardname(struct ipa if (n) snprintf(name, namelen, "%s", n); - if (dd->ipath_majrev != 4 || dd->ipath_minrev != 1) { + if (dd->ipath_majrev != 4 || !dd->ipath_minrev || dd->ipath_minrev>2) { ipath_dev_err(dd, "Unsupported PE-800 revision %u.%u!\n", dd->ipath_majrev, dd->ipath_minrev); ret 
= 1; From bos at pathscale.com Fri May 12 16:43:19 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:19 -0700 Subject: [openib-general] [PATCH 34 of 53] ipath - fix occasional hangs in SDP In-Reply-To: Message-ID: <09077b2f476f80594b82.1147477399@eng-12.pathscale.com> We were updating the head register multiple times in the rcvhdrq processing loop, and setting the counter on each update. Since that meant that the tail register was ahead of head for all but the last update, we would get extra interrupts. The fix was to not write the counter value except on the last update. Signed-off-by: Bryan O'Sullivan diff -r 5ddaf7c07cdf -r 09077b2f476f drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700 @@ -918,7 +918,7 @@ void ipath_kreceive(struct ipath_devdata const u32 maxcnt = dd->ipath_rcvhdrcnt * rsize; /* words */ u32 etail = -1, l, hdrqtail; struct ips_message_header *hdr; - u32 eflags, i, etype, tlen, pkttot = 0; + u32 eflags, i, etype, tlen, pkttot = 0, updegr=0; static u64 totcalls; /* stats, may eventually remove */ char emsg[128]; @@ -932,14 +932,14 @@ void ipath_kreceive(struct ipath_devdata if (test_and_set_bit(0, &dd->ipath_rcv_pending)) goto bail; - if (dd->ipath_port0head == - (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) + l = dd->ipath_port0head; + if(l == (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) goto done; /* read only once at start for performance */ hdrqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr); - for (i = 0, l = dd->ipath_port0head; l != hdrqtail; i++) { + for (i = 0; l != hdrqtail; i++) { u32 qp; u8 *bthbytes; @@ -1050,15 +1050,26 @@ void ipath_kreceive(struct ipath_devdata l += rsize; if (l >= maxcnt) l = 0; + if (etype != RCVHQ_RCV_TYPE_EXPECTED) + updegr = 1; /* - * update for each packet, to help prevent overflows if we - * have lots of packets. 
+ * update head regs on last packet, and every 16 packets. + * Reduce bus traffic, while still trying to prevent + * rcvhdrq overflows, for when the queue is nearly full */ - (void)ipath_write_ureg(dd, ur_rcvhdrhead, - dd->ipath_rhdrhead_intr_off | l, 0); - if (etype != RCVHQ_RCV_TYPE_EXPECTED) - (void)ipath_write_ureg(dd, ur_rcvegrindexhead, - etail, 0); + if(l == hdrqtail || (i && !(i&0xf))) { + u64 lval; + if(l == hdrqtail) /* want interrupt only on last */ + lval = dd->ipath_rhdrhead_intr_off | l; + else + lval = l; + (void)ipath_write_ureg(dd, ur_rcvhdrhead, lval, 0); + if(updegr) { + (void)ipath_write_ureg(dd, ur_rcvegrindexhead, + etail, 0); + updegr = 0; + } + } } pkttot += i; diff -r 5ddaf7c07cdf -r 09077b2f476f drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 @@ -350,7 +350,7 @@ static unsigned handle_frequent_errors(s return supp_msgs; } -static void handle_errors(struct ipath_devdata *dd, ipath_err_t errs) +static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) { char msg[512]; u64 ignore_this_time = 0; @@ -434,7 +434,7 @@ static void handle_errors(struct ipath_d INFINIPATH_E_IBSTATUSCHANGED); } if (!errs) - return; + return 0; if (!noprint) /* @@ -558,9 +558,7 @@ static void handle_errors(struct ipath_d wake_up_interruptible(&ipath_sma_state_wait); } - if (chkerrpkts) - /* process possible error packets in hdrq */ - ipath_kreceive(dd); + return chkerrpkts; } /* this is separate to allow for better optimization of ipath_intr() */ @@ -716,13 +714,14 @@ static void handle_urcv(struct ipath_dev } } + irqreturn_t ipath_intr(int irq, void *data, struct pt_regs *regs) { struct ipath_devdata *dd = data; - u32 istat; + u32 istat, chk0rcv = 0; ipath_err_t estat = 0; irqreturn_t ret; - u32 p0bits; + u32 p0bits, oldhead; static unsigned unexpected = 0; static const u32 port0rbits = (1U<ipath_port0head != 
- (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) { - u32 oldhead = dd->ipath_port0head; + oldhead = dd->ipath_port0head; + if (oldhead != (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) { if(dd->ipath_flags & IPATH_GPIO_INTR) { ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear, (u64) (1 << 2)); @@ -783,6 +781,8 @@ irqreturn_t ipath_intr(int irq, void *da } istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus); + p0bits = port0rbits; + if (unlikely(!istat)) { ipath_stats.sps_nullintr++; ret = IRQ_NONE; /* not our interrupt, or already handled */ @@ -820,10 +820,11 @@ irqreturn_t ipath_intr(int irq, void *da ipath_dev_err(dd, "Read of error status failed " "(all bits set); ignoring\n"); else - handle_errors(dd, estat); - } - - p0bits = port0rbits; + if(handle_errors(dd, estat)) + /* force calling ipath_kreceive() */ + chk0rcv = 1; + } + if (istat & INFINIPATH_I_GPIO) { /* * Packets are available in the port 0 rcv queue. @@ -845,8 +846,10 @@ irqreturn_t ipath_intr(int irq, void *da ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear, (u64) (1 << 2)); p0bits |= INFINIPATH_I_GPIO; - } - } + chk0rcv = 1; + } + } + chk0rcv |= istat & p0bits; /* * clear the ones we will deal with on this round @@ -858,18 +861,16 @@ irqreturn_t ipath_intr(int irq, void *da ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, istat); /* - * we check for both transition from empty to non-empty, and urgent - * packets (those with the interrupt bit set in the header), and - * if enabled, the GPIO bit 2 interrupt used for port0 on some - * HT-400 boards. - * Do this before checking for pio buffers available, since - * receives can overflow; piobuf waiters can afford a few - * extra cycles, since they were waiting anyway. - */ - if(istat & p0bits) { + * handle port0 receive before checking for pio buffers available, + * since receives can overflow; piobuf waiters can afford a few + * extra cycles, since they were waiting anyway, and user's waiting + * for receive are at the bottom. 
+ */ + if(chk0rcv) { ipath_kreceive(dd); istat &= ~port0rbits; } + if (istat & ((infinipath_i_rcvavail_mask << INFINIPATH_I_RCVAVAIL_SHIFT) | (infinipath_i_rcvurg_mask << From bos at pathscale.com Fri May 12 16:42:59 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:59 -0700 Subject: [openib-general] [PATCH 14 of 53] ipath - forbid empty MRs In-Reply-To: Message-ID: <5d9fbba3222eeb941679.1147477379@eng-12.pathscale.com> Don't allow zero-length regions to be created. Signed-off-by: Bryan O'Sullivan diff -r 02a05b853d20 -r 5d9fbba3222e drivers/infiniband/hw/ipath/ipath_mr.c --- a/drivers/infiniband/hw/ipath/ipath_mr.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_mr.c Fri May 12 15:55:28 2006 -0700 @@ -168,6 +168,11 @@ struct ib_mr *ipath_reg_user_mr(struct i struct ib_umem_chunk *chunk; int n, m, i; struct ib_mr *ret; + + if (region->length == 0) { + ret = ERR_PTR(-EINVAL); + goto bail; + } n = 0; list_for_each_entry(chunk, &region->chunk_list, list) From bos at pathscale.com Fri May 12 16:43:15 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:15 -0700 Subject: [openib-general] [PATCH 30 of 53] ipath - count VL15 packet drops due to bad VL or lack of buffers In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r 23519e578bf0 -r b098b021b6fd drivers/infiniband/hw/ipath/ipath_ud.c --- a/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700 @@ -554,11 +554,16 @@ void ipath_ud_rcv(struct ipath_ibdev *de spin_lock_irqsave(&rq->lock, flags); if (rq->tail == rq->head) { spin_unlock_irqrestore(&rq->lock, flags); - /* Count VL15 packets dropped due to no receive buffer */ + /* + * Count VL15 packets dropped due to no receive buffer. + * Otherwise, count them as buffer overruns since usually, + * the HW will be able to receive packets even if there are + * no QPs with posted receive buffers.
+ */ if (qp->ibqp.qp_num == 0) dev->n_vl15_dropped++; else - dev->n_pkt_drops++; + dev->rcv_errors++; goto bail; } /* Silently drop packets which are too big. */ From bos at pathscale.com Fri May 12 16:43:20 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:20 -0700 Subject: [openib-general] [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes In-Reply-To: Message-ID: Made in-memory rcvhdrq tail update be in dma_alloc'ed memory, not random user or special kernel memory (needed for powerpc, also "just the right thing to do"). Some cleanups to make unexpected link transitions less likely to produce complaints about packet errors, and also to not leave SMA packets stuck and unable to go out. Call dma_free_coherent without ipath_mutex held. A few other random debug and comment cleanups. Always init rcvhdrq head/tail registers to 0, to avoid race conditions (should have been that way some time ago). Signed-off-by: Bryan O'Sullivan diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_common.h --- a/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:28 2006 -0700 @@ -307,6 +307,9 @@ struct ipath_base_info { __u32 spi_rcv_egrchunksize; /* total size of mmap to cover full rcvegrbuffers */ __u32 spi_rcv_egrbuftotlen; + __u32 spi_filler_for_align; + /* address of readonly memory copy of the rcvhdrq tail register. */ + __u64 spi_rcvhdr_tailaddr; } __attribute__ ((aligned(8))); @@ -376,13 +379,7 @@ struct ipath_user_info { */ __u32 spu_rcvhdrsize; - /* - * cache line aligned (64 byte) user address to - * which the rcvhdrtail register will be written by infinipath - * whenever it changes, so that no chip registers are read in - * the performance path.
- */ - __u64 spu_rcvhdraddr; + __u64 spu_unused; /* kept for compatible layout */ /* * address of struct base_info to write to diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700 @@ -131,14 +131,6 @@ static struct pci_driver ipath_driver = .id_table = ipath_pci_tbl, }; -/* - * This is where port 0's rcvhdrtail register is written back; we also - * want nothing else sharing the cache line, so make it a cache line - * in size. Used for all units. - */ -volatile __le64 *ipath_port0_rcvhdrtail; -dma_addr_t ipath_port0_rcvhdrtail_dma; -static int port0_rcvhdrtail_refs; static inline void read_bars(struct ipath_devdata *dd, struct pci_dev *dev, u32 *bar0, u32 *bar1) @@ -171,14 +163,13 @@ static void ipath_free_devdata(struct pc list_del(&dd->ipath_list); spin_unlock_irqrestore(&ipath_devs_lock, flags); } - dma_free_coherent(&pdev->dev, sizeof(*dd), dd, dd->ipath_dma_addr); + vfree(dd); } static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev) { unsigned long flags; struct ipath_devdata *dd; - dma_addr_t dma_addr; int ret; if (!idr_pre_get(&unit_table, GFP_KERNEL)) { @@ -186,15 +177,13 @@ static struct ipath_devdata *ipath_alloc goto bail; } - dd = dma_alloc_coherent(&pdev->dev, sizeof(*dd), &dma_addr, - GFP_KERNEL); - + dd = vmalloc(sizeof(*dd)); if (!dd) { dd = ERR_PTR(-ENOMEM); goto bail; } - - dd->ipath_dma_addr = dma_addr; + memset(dd, 0, sizeof(*dd)); + dd->ipath_unit = -1; spin_lock_irqsave(&ipath_devs_lock, flags); @@ -272,47 +261,6 @@ int ipath_count_units(int *npresentp, in return nunits; } -static int init_port0_rcvhdrtail(struct pci_dev *pdev) -{ - int ret; - - mutex_lock(&ipath_mutex); - - if (!ipath_port0_rcvhdrtail) { - ipath_port0_rcvhdrtail = - dma_alloc_coherent(&pdev->dev, - IPATH_PORT0_RCVHDRTAIL_SIZE, - &ipath_port0_rcvhdrtail_dma, - GFP_KERNEL); - - 
if (!ipath_port0_rcvhdrtail) { - ret = -ENOMEM; - goto bail; - } - } - port0_rcvhdrtail_refs++; - ret = 0; - -bail: - mutex_unlock(&ipath_mutex); - - return ret; -} - -static void cleanup_port0_rcvhdrtail(struct pci_dev *pdev) -{ - mutex_lock(&ipath_mutex); - - if (!--port0_rcvhdrtail_refs) { - dma_free_coherent(&pdev->dev, IPATH_PORT0_RCVHDRTAIL_SIZE, - (void *) ipath_port0_rcvhdrtail, - ipath_port0_rcvhdrtail_dma); - ipath_port0_rcvhdrtail = NULL; - } - - mutex_unlock(&ipath_mutex); -} - /* * These next two routines are placeholders in case we don't have per-arch * code for controlling write combining. If explicit control of write @@ -337,20 +285,12 @@ static int __devinit ipath_init_one(stru u32 bar0 = 0, bar1 = 0; u8 rev; - ret = init_port0_rcvhdrtail(pdev); - if (ret < 0) { - printk(KERN_ERR IPATH_DRV_NAME - ": Could not allocate port0_rcvhdrtail: error %d\n", - -ret); - goto bail; - } - dd = ipath_alloc_devdata(pdev); if (IS_ERR(dd)) { ret = PTR_ERR(dd); printk(KERN_ERR IPATH_DRV_NAME ": Could not allocate devdata: error %d\n", -ret); - goto bail_rcvhdrtail; + goto bail; } ipath_cdbg(VERBOSE, "initializing unit #%u\n", dd->ipath_unit); @@ -562,9 +502,6 @@ bail_devdata: bail_devdata: ipath_free_devdata(pdev, dd); -bail_rcvhdrtail: - cleanup_port0_rcvhdrtail(pdev); - bail: return ret; } @@ -595,7 +532,6 @@ static void __devexit ipath_remove_one(s pci_disable_device(pdev); ipath_free_devdata(pdev, dd); - cleanup_port0_rcvhdrtail(pdev); } /* general driver use */ @@ -1372,26 +1308,20 @@ bail: * @dd: the infinipath device * @pd: the port data * - * this *must* be physically contiguous memory, and for now, - * that limits it to what kmalloc can do. + * this must be contiguous memory (from an i/o perspective), and must be + * DMA'able (which means for some systems, it will go through an IOMMU, + * or be forced into a low address range). 
*/ int ipath_create_rcvhdrq(struct ipath_devdata *dd, struct ipath_portdata *pd) { - int ret = 0, amt; - - amt = ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize * - sizeof(u32), PAGE_SIZE); + int ret = 0; + if (!pd->port_rcvhdrq) { - /* - * not using REPEAT isn't viable; at 128KB, we can easily - * fail this. The problem with REPEAT is we can block here - * "forever". There isn't an inbetween, unfortunately. We - * could reduce the risk by never freeing the rcvhdrq except - * at unload, but even then, the first time a port is used, - * we could delay for some time... - */ + dma_addr_t phys_hdrqtail; gfp_t gfp_flags = GFP_USER | __GFP_COMP; + int amt = ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize * + sizeof(u32), PAGE_SIZE); pd->port_rcvhdrq = dma_alloc_coherent( &dd->pcidev->dev, amt, &pd->port_rcvhdrq_phys, @@ -1404,6 +1334,16 @@ int ipath_create_rcvhdrq(struct ipath_de ret = -ENOMEM; goto bail; } + pd->port_rcvhdrtail_kvaddr = dma_alloc_coherent( + &dd->pcidev->dev, PAGE_SIZE, &phys_hdrqtail, GFP_KERNEL); + if (!pd->port_rcvhdrtail_kvaddr) { + ipath_dev_err(dd, "attempt to allocate 1 page " + "for port %u rcvhdrqtailaddr failed\n", + pd->port_port); + ret = -ENOMEM; + goto bail; + } + pd->port_rcvhdrqtailaddr_phys = phys_hdrqtail; pd->port_rcvhdrq_size = amt; @@ -1413,20 +1353,28 @@ int ipath_create_rcvhdrq(struct ipath_de (unsigned long) pd->port_rcvhdrq_phys, (unsigned long) pd->port_rcvhdrq_size, pd->port_port); - } else { - /* - * clear for security, sanity, and/or debugging, each - * time we reuse - */ - memset(pd->port_rcvhdrq, 0, amt); - } + + ipath_cdbg(VERBOSE, "port %d hdrtailaddr, %llx physical\n", + pd->port_port, + (unsigned long long) phys_hdrqtail); + } + else + ipath_cdbg(VERBOSE, "reuse port %d rcvhdrq @%p %llx phys; " + "hdrtailaddr@%p %llx physical\n", + pd->port_port, pd->port_rcvhdrq, + pd->port_rcvhdrq_phys, pd->port_rcvhdrtail_kvaddr, + (unsigned long long)pd->port_rcvhdrqtailaddr_phys); + + /* clear for security and sanity on 
each use */ + memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size); + memset((void *)pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE); /* * tell chip each time we init it, even if we are re-using previous - * memory (we zero it at process close) - */ - ipath_cdbg(VERBOSE, "writing port %d rcvhdraddr as %lx\n", - pd->port_port, (unsigned long) pd->port_rcvhdrq_phys); + * memory (we zero the register at process close) + */ + ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdrtailaddr, + pd->port_port, pd->port_rcvhdrqtailaddr_phys); ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdraddr, pd->port_port, pd->port_rcvhdrq_phys); @@ -1514,15 +1462,27 @@ void ipath_set_ib_lstate(struct ipath_de [INFINIPATH_IBCC_LINKCMD_ARMED] = "ARMED", [INFINIPATH_IBCC_LINKCMD_ACTIVE] = "ACTIVE" }; + int linkcmd = (which >> INFINIPATH_IBCC_LINKCMD_SHIFT) & + INFINIPATH_IBCC_LINKCMD_MASK; + ipath_cdbg(SMA, "Trying to move unit %u to %s, current ltstate " "is %s\n", dd->ipath_unit, - what[(which >> INFINIPATH_IBCC_LINKCMD_SHIFT) & - INFINIPATH_IBCC_LINKCMD_MASK], + what[linkcmd], ipath_ibcstatus_str[ (ipath_read_kreg64 (dd, dd->ipath_kregs->kr_ibcstatus) >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) & INFINIPATH_IBCS_LINKTRAININGSTATE_MASK]); + /* flush all queued sends when going to DOWN or INIT, to be sure that + * they don't block SMA and other MAD packets */ + if(!linkcmd || linkcmd == INFINIPATH_IBCC_LINKCMD_INIT) { + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + INFINIPATH_S_ABORT); + ipath_disarm_piobufs(dd, dd->ipath_lastport_piobuf, + (unsigned)(dd->ipath_piobcnt2k + + dd->ipath_piobcnt4k) - + dd->ipath_lastport_piobuf); + } ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl, dd->ipath_ibcctrl | which); @@ -1670,60 +1630,54 @@ void ipath_shutdown_device(struct ipath_ /** * ipath_free_pddata - free a port's allocated data * @dd: the infinipath device - * @port: the port - * @freehdrq: free the port data structure if true - * - * when closing, free up any allocated data for a port, if 
the - * reference count goes to zero - * Note: this also optionally frees the portdata itself! - * Any changes here have to be matched up with the reinit case - * of ipath_init_chip(), which calls this routine on reinit after reset. - */ -void ipath_free_pddata(struct ipath_devdata *dd, u32 port, int freehdrq) -{ - struct ipath_portdata *pd = dd->ipath_pd[port]; - + * @pd: the portdata structure + * + * free up any allocated data for a port + * This should not touch anything that would affect a simultaneous + * re-allocation of port data, because it is called after ipath_mutex + * is released (and can be called from reinit as well). + * It should never change any chip state, or global driver state. + * (The only exception to global state is freeing the port0 port0_skbs.) + */ +void ipath_free_pddata(struct ipath_devdata *dd, struct ipath_portdata *pd) +{ if (!pd) return; - if (freehdrq) - /* - * only clear and free portdata if we are going to also - * release the hdrq, otherwise we leak the hdrq on each - * open/close cycle - */ - dd->ipath_pd[port] = NULL; - if (freehdrq && pd->port_rcvhdrq) { + + if (pd->port_rcvhdrq) { ipath_cdbg(VERBOSE, "free closed port %d rcvhdrq @ %p " "(size=%lu)\n", pd->port_port, pd->port_rcvhdrq, (unsigned long) pd->port_rcvhdrq_size); dma_free_coherent(&dd->pcidev->dev, pd->port_rcvhdrq_size, pd->port_rcvhdrq, pd->port_rcvhdrq_phys); pd->port_rcvhdrq = NULL; - } - if (port && pd->port_rcvegrbuf) { - /* always free this */ - if (pd->port_rcvegrbuf) { - unsigned e; - - for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) { - void *base = pd->port_rcvegrbuf[e]; - size_t size = pd->port_rcvegrbuf_size; - - ipath_cdbg(VERBOSE, "egrbuf free(%p, %lu), " - "chunk %u/%u\n", base, - (unsigned long) size, - e, pd->port_rcvegrbuf_chunks); - dma_free_coherent( - &dd->pcidev->dev, size, base, - pd->port_rcvegrbuf_phys[e]); - } - vfree(pd->port_rcvegrbuf); - pd->port_rcvegrbuf = NULL; - vfree(pd->port_rcvegrbuf_phys); - pd->port_rcvegrbuf_phys = NULL; - } 
+ if(pd->port_rcvhdrtail_kvaddr) { + dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, + (void *)pd->port_rcvhdrtail_kvaddr, + pd->port_rcvhdrqtailaddr_phys); + pd->port_rcvhdrtail_kvaddr = NULL; + } + } + if(pd->port_port && pd->port_rcvegrbuf) { + unsigned e; + + for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) { + void *base = pd->port_rcvegrbuf[e]; + size_t size = pd->port_rcvegrbuf_size; + + ipath_cdbg(VERBOSE, "egrbuf free(%p, %lu), " + "chunk %u/%u\n", base, + (unsigned long) size, + e, pd->port_rcvegrbuf_chunks); + dma_free_coherent(&dd->pcidev->dev, size, + base, pd->port_rcvegrbuf_phys[e]); + } + vfree(pd->port_rcvegrbuf); + pd->port_rcvegrbuf = NULL; + vfree(pd->port_rcvegrbuf_phys); + pd->port_rcvegrbuf_phys = NULL; pd->port_rcvegrbuf_chunks = 0; - } else if (port == 0 && dd->ipath_port0_skbs) { + } else if (pd->port_port == 0 && dd->ipath_port0_skbs) { unsigned e; struct sk_buff **skbs = dd->ipath_port0_skbs; @@ -1735,10 +1689,8 @@ void ipath_free_pddata(struct ipath_devd dev_kfree_skb(skbs[e]); vfree(skbs); } - if (freehdrq) { - kfree(pd->port_tid_pg_list); - kfree(pd); - } + kfree(pd->port_tid_pg_list); + kfree(pd); } static int __init infinipath_init(void) @@ -1864,10 +1816,14 @@ static void cleanup_device(struct ipath_ /* * free any resources still in use (usually just kernel ports) - * at unload - */ - for (port = 0; port < dd->ipath_cfgports; port++) - ipath_free_pddata(dd, port, 1); + * at unload; we do for portcnt, not cfgports, because cfgports + * could have changed while we were loaded. 
+ */ + for (port = 0; port < dd->ipath_portcnt; port++) { + struct ipath_portdata *pd = dd->ipath_pd[port]; + dd->ipath_pd[port] = NULL; + ipath_free_pddata(dd, pd); + } kfree(dd->ipath_pd); /* * debuggability, in case some cleanup path tries to use it @@ -1908,19 +1864,19 @@ static void __exit infinipath_cleanup(vo } else ipath_dbg("irq is 0, not doing free_irq " "for unit %u\n", dd->ipath_unit); - - /* - * we check for NULL here, because it's outside - * the kregbase check, and we need to call it - * after the free_irq. Thus it's possible that - * the function pointers were never initialized. - */ - if (dd->ipath_f_cleanup) - /* clean up chip-specific stuff */ - dd->ipath_f_cleanup(dd); - dd->pcidev = NULL; } + + /* + * we check for NULL here, because it's outside the kregbase + * check, and we need to call it after the free_irq. Thus + * it's possible that the function pointers were never + * initialized. + */ + if (dd->ipath_f_cleanup) + /* clean up chip-specific stuff */ + dd->ipath_f_cleanup(dd); + spin_lock_irqsave(&ipath_devs_lock, flags); } diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_file_ops.c --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700 @@ -122,6 +122,7 @@ static int ipath_get_base_info(struct ip * on to yet another method of dealing with this */ kinfo->spi_rcvhdr_base = (u64) pd->port_rcvhdrq_phys; + kinfo->spi_rcvhdr_tailaddr = (u64)pd->port_rcvhdrqtailaddr_phys; kinfo->spi_rcv_egrbufs = (u64) pd->port_rcvegr_phys; kinfo->spi_pioavailaddr = (u64) dd->ipath_pioavailregs_phys; kinfo->spi_status = (u64) kinfo->spi_pioavailaddr + @@ -783,11 +784,12 @@ static int ipath_create_user_egr(struct bail_rcvegrbuf_phys: for (e = 0; e < pd->port_rcvegrbuf_chunks && - pd->port_rcvegrbuf[e]; e++) + pd->port_rcvegrbuf[e]; e++) { dma_free_coherent(&dd->pcidev->dev, size, pd->port_rcvegrbuf[e], pd->port_rcvegrbuf_phys[e]); + } 
vfree(pd->port_rcvegrbuf_phys); pd->port_rcvegrbuf_phys = NULL; bail_rcvegrbuf: @@ -802,10 +804,7 @@ static int ipath_do_user_init(struct ipa { int ret = 0; struct ipath_devdata *dd = pd->port_dd; - u64 physaddr, uaddr, off, atmp; - struct page *pagep; u32 head32; - u64 head; /* for now, if major version is different, bail */ if ((uinfo->spu_userversion >> 16) != IPATH_USER_SWMAJOR) { @@ -829,54 +828,6 @@ static int ipath_do_user_init(struct ipa } /* for now we do nothing with rcvhdrcnt: uinfo->spu_rcvhdrcnt */ - - /* set up for the rcvhdr Q tail register writeback to user memory */ - if (!uinfo->spu_rcvhdraddr || - !access_ok(VERIFY_WRITE, (u64 __user *) (unsigned long) - uinfo->spu_rcvhdraddr, sizeof(u64))) { - ipath_dbg("Port %d rcvhdrtail addr %llx not valid\n", - pd->port_port, - (unsigned long long) uinfo->spu_rcvhdraddr); - ret = -EINVAL; - goto done; - } - - off = offset_in_page(uinfo->spu_rcvhdraddr); - uaddr = PAGE_MASK & (unsigned long) uinfo->spu_rcvhdraddr; - ret = ipath_get_user_pages_nocopy(uaddr, &pagep); - if (ret) { - dev_info(&dd->pcidev->dev, "Failed to lookup and lock " - "address %llx for rcvhdrtail: errno %d\n", - (unsigned long long) uinfo->spu_rcvhdraddr, -ret); - goto done; - } - ipath_stats.sps_pagelocks++; - pd->port_rcvhdrtail_uaddr = uaddr; - pd->port_rcvhdrtail_pagep = pagep; - pd->port_rcvhdrtail_kvaddr = - page_address(pagep); - pd->port_rcvhdrtail_kvaddr += off; - physaddr = page_to_phys(pagep) + off; - ipath_cdbg(VERBOSE, "port %d user addr %llx hdrtailaddr, %llx " - "physical (off=%llx)\n", - pd->port_port, - (unsigned long long) uinfo->spu_rcvhdraddr, - (unsigned long long) physaddr, (unsigned long long) off); - ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdrtailaddr, - pd->port_port, physaddr); - atmp = ipath_read_kreg64_port(dd, - dd->ipath_kregs->kr_rcvhdrtailaddr, - pd->port_port); - if (physaddr != atmp) { - ipath_dev_err(dd, - "Catastrophic software error, " - "RcvHdrTailAddr%u written as %llx, " - "read back as 
%llx\n", pd->port_port, - (unsigned long long) physaddr, - (unsigned long long) atmp); - ret = -EINVAL; - goto done; - } /* for right now, kernel piobufs are at end, so port 1 is at 0 */ pd->port_piobufs = dd->ipath_piobufbase + @@ -896,26 +847,18 @@ static int ipath_do_user_init(struct ipa ret = ipath_create_user_egr(pd); if (ret) goto done; - /* enable receives now */ - /* atomically set enable bit for this port */ - set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port, - &dd->ipath_rcvctrl); /* - * set the head registers for this port to the current values + * set the eager head register for this port to the current values * of the tail pointers, since we don't know if they were * updated on last use of the port. */ - head32 = ipath_read_ureg32(dd, ur_rcvhdrtail, pd->port_port); - head = (u64) head32; - ipath_write_ureg(dd, ur_rcvhdrhead, head, pd->port_port); head32 = ipath_read_ureg32(dd, ur_rcvegrindextail, pd->port_port); ipath_write_ureg(dd, ur_rcvegrindexhead, head32, pd->port_port); dd->ipath_lastegrheads[pd->port_port] = -1; dd->ipath_lastrcvhdrqtails[pd->port_port] = -1; - ipath_cdbg(VERBOSE, "Wrote port%d head %llx, egrhead %x from " - "tail regs\n", pd->port_port, - (unsigned long long) head, head32); + ipath_cdbg(VERBOSE, "Wrote port%d egrhead %x from tail regs\n", + pd->port_port, head32); pd->port_tidcursor = 0; /* start at beginning after open */ /* * now enable the port; the tail registers will be written to memory @@ -924,13 +867,62 @@ static int ipath_do_user_init(struct ipa * transition from 0 to 1, so clear it first, then set it as part of * enabling the port. This will (very briefly) affect any other * open ports, but it shouldn't be long enough to be an issue. + * We explicitly set the in-memory copy to 0 beforehand, so we don't + * have to wait to be sure the DMA update has happened.
*/ + *pd->port_rcvhdrtail_kvaddr = 0ULL; + set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port, + &dd->ipath_rcvctrl); ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl & ~INFINIPATH_R_TAILUPD); ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); - done: + return ret; +} + + +/* common code for the mappings on dma_alloc_coherent mem */ +static int ipath_mmap_mem(struct vm_area_struct *vma, + struct ipath_portdata *pd, unsigned len, + int write_ok, dma_addr_t addr, char *what) +{ + struct ipath_devdata *dd = pd->port_dd; + unsigned pfn = (unsigned long)addr >> PAGE_SHIFT; + int ret; + + if ((vma->vm_end - vma->vm_start) > len) { + dev_info(&dd->pcidev->dev, + "FAIL on %s: len %lx > %x\n", what, + vma->vm_end - vma->vm_start, len); + ret = -EFAULT; + goto bail; + } + + if(!write_ok) { + if (vma->vm_flags & VM_WRITE) { + dev_info(&dd->pcidev->dev, + "%s must be mapped readonly\n", what); + ret = -EPERM; + goto bail; + } + + /* don't allow them to later change with mprotect */ + vma->vm_flags &= ~VM_MAYWRITE; + } + + ret = remap_pfn_range(vma, vma->vm_start, pfn, + len, vma->vm_page_prot); + if(ret) + dev_info(&dd->pcidev->dev, + "%s port%u mmap of %lx, %x bytes r%c failed: %d\n", + what, pd->port_port, (unsigned long)addr, len, + write_ok?'w':'o', ret); + else + ipath_cdbg(VERBOSE, "%s port%u mmaped %lx, %x bytes r%c\n", + what, pd->port_port, (unsigned long)addr, len, + write_ok?'w':'o'); +bail: return ret; } @@ -940,8 +932,11 @@ static int mmap_ureg(struct vm_area_stru unsigned long phys; int ret; - /* it's the real hardware, so io_remap works */ - + /* + * This is real hardware, so use io_remap. This is the mechanism + * for the user process to update the head registers for their port + * in the chip. 
+ */ if ((vma->vm_end - vma->vm_start) > PAGE_SIZE) { dev_info(&dd->pcidev->dev, "FAIL mmap userreg: reqlen " "%lx > PAGE\n", vma->vm_end - vma->vm_start); @@ -967,10 +962,11 @@ static int mmap_piobufs(struct vm_area_s int ret; /* - * When we map the PIO buffers, we want to map them as writeonly, no - * read possible. + * When we map the PIO buffers in the chip, we want to map them as + * writeonly, no read possible. This prevents access to previous + * process data, and catches users who might try to read the i/o + * space due to a bug. */ - if ((vma->vm_end - vma->vm_start) > (dd->ipath_pbufsport * dd->ipath_palign)) { dev_info(&dd->pcidev->dev, "FAIL mmap piobufs: " @@ -981,11 +977,10 @@ static int mmap_piobufs(struct vm_area_s } phys = dd->ipath_physaddr + pd->port_piobufs; + /* - * Do *NOT* mark this as non-cached (PWT bit), or we don't get the + * Don't mark this as non-cached, or we don't get the * write combining behavior we want on the PIO buffers! - * vma->vm_page_prot = - * pgprot_noncached(vma->vm_page_prot); */ if (vma->vm_flags & VM_READ) { @@ -997,8 +992,7 @@ static int mmap_piobufs(struct vm_area_s } /* don't allow them to later change to readable with mprotect */ - - vma->vm_flags &= ~VM_MAYWRITE; + vma->vm_flags &= ~VM_MAYREAD; vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND; ret = io_remap_pfn_range(vma, vma->vm_start, phys >> PAGE_SHIFT, @@ -1016,11 +1010,6 @@ static int mmap_rcvegrbufs(struct vm_are size_t total_size, i; dma_addr_t *phys; int ret; - - if (!pd->port_rcvegrbuf) { - ret = -EFAULT; - goto bail; - } size = pd->port_rcvegrbuf_size; total_size = pd->port_rcvegrbuf_chunks * size; @@ -1039,12 +1028,11 @@ static int mmap_rcvegrbufs(struct vm_are ret = -EPERM; goto bail; } + /* don't allow them to later change to writeable with mprotect */ + vma->vm_flags &= ~VM_MAYWRITE; start = vma->vm_start; phys = pd->port_rcvegrbuf_phys; - - /* don't allow them to later change to writeable with mprotect */ - vma->vm_flags &= ~VM_MAYWRITE; for (i = 0; i 
< pd->port_rcvegrbuf_chunks; i++, start += size) { ret = remap_pfn_range(vma, start, phys[i] >> PAGE_SHIFT, @@ -1054,78 +1042,6 @@ static int mmap_rcvegrbufs(struct vm_are } ret = 0; -bail: - return ret; -} - -static int mmap_rcvhdrq(struct vm_area_struct *vma, - struct ipath_portdata *pd) -{ - struct ipath_devdata *dd = pd->port_dd; - size_t total_size; - int ret; - - /* - * kmalloc'ed memory, physically contiguous; this is from - * spi_rcvhdr_base; we allow user to map read-write so they can - * write hdrq entries to allow protocol code to directly poll - * whether a hdrq entry has been written. - */ - total_size = ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize * - sizeof(u32), PAGE_SIZE); - if ((vma->vm_end - vma->vm_start) > total_size) { - dev_info(&dd->pcidev->dev, - "FAIL on rcvhdrq: reqlen %lx > actual %lx\n", - vma->vm_end - vma->vm_start, - (unsigned long) total_size); - ret = -EFAULT; - goto bail; - } - - ret = remap_pfn_range(vma, vma->vm_start, - pd->port_rcvhdrq_phys >> PAGE_SHIFT, - vma->vm_end - vma->vm_start, - vma->vm_page_prot); -bail: - return ret; -} - -static int mmap_pioavailregs(struct vm_area_struct *vma, - struct ipath_portdata *pd) -{ - struct ipath_devdata *dd = pd->port_dd; - int ret; - - /* - * when we map the PIO bufferavail registers, we want to map them as - * readonly, no write possible. 
- * - * kmalloc'ed memory, physically contiguous, one page only, readonly - */ - - if ((vma->vm_end - vma->vm_start) > PAGE_SIZE) { - dev_info(&dd->pcidev->dev, "FAIL on pioavailregs_dma: " - "reqlen %lx > actual %lx\n", - vma->vm_end - vma->vm_start, - (unsigned long) PAGE_SIZE); - ret = -EFAULT; - goto bail; - } - - if (vma->vm_flags & VM_WRITE) { - dev_info(&dd->pcidev->dev, - "Can't map pioavailregs as writable (flags=%lx)\n", - vma->vm_flags); - ret = -EPERM; - goto bail; - } - - /* don't allow them to later change with mprotect */ - vma->vm_flags &= ~VM_MAYWRITE; - - ret = remap_pfn_range(vma, vma->vm_start, - dd->ipath_pioavailregs_phys >> PAGE_SHIFT, - PAGE_SIZE, vma->vm_page_prot); bail: return ret; } @@ -1149,6 +1065,7 @@ static int ipath_mmap(struct file *fp, s pd = port_fp(fp); dd = pd->port_dd; + /* * This is the ipath_do_user_init() code, mapping the shared buffers * into the user process. The address referred to by vm_pgoff is the @@ -1158,28 +1075,59 @@ static int ipath_mmap(struct file *fp, s pgaddr = vma->vm_pgoff << PAGE_SHIFT; /* - * note that ureg does *NOT* have the kregvirt as part of it, to be - * sure that for 32 bit programs, we don't end up trying to map a > - * 44 address. Has to match ipath_get_base_info() code that sets - * __spi_uregbase + * Must fit in 40 bits for our hardware; some checked elsewhere, + * but we'll be paranoid. Check for 0 is mostly in case one of the + * allocations failed, but user called mmap anyway. We want to catch + * that before it can match. 
*/ - + if(!pgaddr || pgaddr >= (1ULL<<40)) { + ipath_dev_err(dd, "Bad physical address %llx, start %lx, end %lx\n", + (unsigned long long)pgaddr, vma->vm_start, vma->vm_end); + return -EINVAL; + } + + /* just the offset of the port user registers, not physical addr */ ureg = dd->ipath_uregbase + dd->ipath_palign * pd->port_port; ipath_cdbg(MM, "ushare: pgaddr %llx vm_start=%lx, vmlen %lx\n", (unsigned long long) pgaddr, vma->vm_start, vma->vm_end - vma->vm_start); - if (pgaddr == ureg) + if(vma->vm_start & (PAGE_SIZE-1)) { + ipath_dev_err(dd, + "vm_start not aligned: %lx, end=%lx phys %lx\n", + vma->vm_start, vma->vm_end, (unsigned long)pgaddr); + ret = -EINVAL; + } + else if (pgaddr == ureg) ret = mmap_ureg(vma, dd, ureg); else if (pgaddr == pd->port_piobufs) ret = mmap_piobufs(vma, dd, pd); else if (pgaddr == (u64) pd->port_rcvegr_phys) ret = mmap_rcvegrbufs(vma, pd); - else if (pgaddr == (u64) pd->port_rcvhdrq_phys) - ret = mmap_rcvhdrq(vma, pd); + else if (pgaddr == (u64) pd->port_rcvhdrq_phys) { + /* + * The rcvhdrq itself; readonly except on HT-400 (so have + * to allow writable mapping), multiple pages, contiguous + * from an i/o perspective. 
+ */ + unsigned total_size = + ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize + * sizeof(u32), PAGE_SIZE); + ret = ipath_mmap_mem(vma, pd, total_size, 1, + pd->port_rcvhdrq_phys, + "rcvhdrq"); + } + else if (pgaddr == (u64)pd->port_rcvhdrqtailaddr_phys) + /* in-memory copy of rcvhdrq tail register */ + ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0, + pd->port_rcvhdrqtailaddr_phys, + "rcvhdrq tail"); else if (pgaddr == dd->ipath_pioavailregs_phys) - ret = mmap_pioavailregs(vma, pd); + /* in-memory copy of pioavail registers */ + ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0, + dd->ipath_pioavailregs_phys, + "pioavail registers"); else ret = -EINVAL; @@ -1532,14 +1480,6 @@ static int ipath_close(struct inode *in, } if (dd->ipath_kregbase) { - if (pd->port_rcvhdrtail_uaddr) { - pd->port_rcvhdrtail_uaddr = 0; - pd->port_rcvhdrtail_kvaddr = NULL; - ipath_release_user_pages_on_close( - &pd->port_rcvhdrtail_pagep, 1); - pd->port_rcvhdrtail_pagep = NULL; - ipath_stats.sps_pageunlocks++; - } ipath_write_kreg_port( dd, dd->ipath_kregs->kr_rcvhdrtailaddr, port, 0ULL); @@ -1576,9 +1516,9 @@ static int ipath_close(struct inode *in, dd->ipath_f_clear_tids(dd, pd->port_port); - ipath_free_pddata(dd, pd->port_port, 0); - + dd->ipath_pd[pd->port_port] = NULL; /* before releasing mutex */ mutex_unlock(&ipath_mutex); + ipath_free_pddata(dd, pd); /* after releasing the mutex */ return ret; } @@ -1908,3 +1848,4 @@ bail: bail: return; } + diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_init_chip.c --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:28 2006 -0700 @@ -409,17 +409,8 @@ static int init_pioavailregs(struct ipat /* and its length */ dd->ipath_freezelen = L1_CACHE_BYTES - sizeof(dd->ipath_statusp[0]); - if (dd->ipath_unit * 64 > (IPATH_PORT0_RCVHDRTAIL_SIZE - 64)) { - ipath_dev_err(dd, "unit %u too large for port 0 " - "rcvhdrtail buffer size\n", 
dd->ipath_unit); - ret = -ENODEV; - } - else - ret = 0; - - /* so we can get current tail in ipath_kreceive(), per chip */ - dd->ipath_hdrqtailptr = &ipath_port0_rcvhdrtail[ - dd->ipath_unit * (64 / sizeof(*ipath_port0_rcvhdrtail))]; + ret = 0; + done: return ret; } @@ -652,7 +643,7 @@ int ipath_init_chip(struct ipath_devdata { int ret = 0, i; u32 val32, kpiobufs; - u64 val, atmp; + u64 val; struct ipath_portdata *pd = NULL; /* keep gcc4 happy */ ret = init_housekeeping(dd, &pd, reinit); @@ -775,24 +766,6 @@ int ipath_init_chip(struct ipath_devdata goto done; } - val = ipath_port0_rcvhdrtail_dma + dd->ipath_unit * 64; - - /* verify that the alignment requirement was met */ - ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdrtailaddr, - 0, val); - atmp = ipath_read_kreg64_port( - dd, dd->ipath_kregs->kr_rcvhdrtailaddr, 0); - if (val != atmp) { - ipath_dev_err(dd, "Catastrophic software error, " - "RcvHdrTailAddr0 written as %llx, " - "read back as %llx from %x\n", - (unsigned long long) val, - (unsigned long long) atmp, - dd->ipath_kregs->kr_rcvhdrtailaddr); - ret = -EINVAL; - goto done; - } - ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvbthqp, IPATH_KD_QP); /* @@ -841,12 +814,18 @@ int ipath_init_chip(struct ipath_devdata * re-init, the simplest way to handle this is to free * existing, and re-allocate. 
*/ - if (reinit) - ipath_free_pddata(dd, 0, 0); + if (reinit) { + struct ipath_portdata *pd = dd->ipath_pd[0]; + dd->ipath_pd[0] = NULL; + ipath_free_pddata(dd, pd); + } dd->ipath_f_tidtemplate(dd); ret = ipath_create_rcvhdrq(dd, pd); - if (!ret) + if (!ret) { + dd->ipath_hdrqtailptr = + (volatile __le64 *)pd->port_rcvhdrtail_kvaddr; ret = create_port0_egr(dd); + } if (ret) ipath_dev_err(dd, "failed to allocate port 0 (kernel) " "rcvhdrq and/or egr bufs\n"); diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 @@ -36,6 +36,7 @@ #include "ips_common.h" #include "ipath_layer.h" +/* These are all rcv-related errors which we want to count for stats */ #define E_SUM_PKTERRS \ (INFINIPATH_E_RHDRLEN | INFINIPATH_E_RBADTID | \ INFINIPATH_E_RBADVERSION | INFINIPATH_E_RHDR | \ @@ -44,12 +45,25 @@ INFINIPATH_E_RFORMATERR | INFINIPATH_E_RUNSUPVL | \ INFINIPATH_E_RUNEXPCHAR | INFINIPATH_E_REBP) +/* These are all send-related errors which we want to count for stats */ #define E_SUM_ERRS \ (INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SUNEXPERRPKTNUM | \ INFINIPATH_E_SDROPPEDDATAPKT | INFINIPATH_E_SDROPPEDSMPPKT | \ INFINIPATH_E_SMAXPKTLEN | INFINIPATH_E_SUNSUPVL | \ INFINIPATH_E_SMINPKTLEN | INFINIPATH_E_SPKTLEN | \ INFINIPATH_E_INVALIDADDR) + +/* + * these are errors that can occur when the link changes state while + * a packet is being sent or received. This doesn't cover things + * like EBP or VCRC that can be the result of a sender having the + * link change state, so we receive a "known bad" packet.
+ */ +#define E_SUM_LINK_PKTERRS \ + (INFINIPATH_E_SDROPPEDDATAPKT | INFINIPATH_E_SDROPPEDSMPPKT | \ + INFINIPATH_E_SMINPKTLEN | INFINIPATH_E_SPKTLEN | \ + INFINIPATH_E_RSHORTPKTLEN | INFINIPATH_E_RMINPKTLEN | \ + INFINIPATH_E_RUNEXPCHAR) static u64 handle_e_sum_errs(struct ipath_devdata *dd, ipath_err_t errs) { @@ -100,9 +114,7 @@ static u64 handle_e_sum_errs(struct ipat if (ipath_debug & __IPATH_PKTDBG) printk("\n"); } - if ((errs & (INFINIPATH_E_SDROPPEDDATAPKT | - INFINIPATH_E_SDROPPEDSMPPKT | - INFINIPATH_E_SMINPKTLEN)) && + if ((errs & E_SUM_LINK_PKTERRS) && !(dd->ipath_flags & IPATH_LINKACTIVE)) { /* * This can happen when SMA is trying to bring the link @@ -111,11 +123,9 @@ static u64 handle_e_sum_errs(struct ipat * valid. We don't want to confuse people, so we just * don't print them, except at debug */ - ipath_dbg("Ignoring pktsend errors %llx, because not " - "yet active\n", (unsigned long long) errs); - ignore_this_time = INFINIPATH_E_SDROPPEDDATAPKT | - INFINIPATH_E_SDROPPEDSMPPKT | - INFINIPATH_E_SMINPKTLEN; + ipath_dbg("Ignoring packet errors %llx, because link not " + "ACTIVE\n", (unsigned long long) errs); + ignore_this_time = errs & E_SUM_LINK_PKTERRS; } return ignore_this_time; @@ -156,7 +166,29 @@ static void handle_e_ibstatuschanged(str */ val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); lstate = val & IPATH_IBSTATE_MASK; - if (lstate == IPATH_IBSTATE_INIT || lstate == IPATH_IBSTATE_ARM || + + /* + * this is confusing enough when it happens that I want to always put it + * on the console and in the logs. If it was a requested state change, + * we'll have already cleared the flags, so we won't print this warning + */ + if ((lstate != IPATH_IBSTATE_ARM && lstate != IPATH_IBSTATE_ACTIVE) + && (dd->ipath_flags & (IPATH_LINKARMED | IPATH_LINKACTIVE))) { + dev_info(&dd->pcidev->dev, "Link state changed from %s to %s\n", + (dd->ipath_flags & IPATH_LINKARMED) ? 
"ARM" : "ACTIVE", + ib_linkstate(lstate)); + /* + * Flush all queued sends when link went to DOWN or INIT, + * to be sure that they don't block SMA and other MAD packets + */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + INFINIPATH_S_ABORT); + ipath_disarm_piobufs(dd, dd->ipath_lastport_piobuf, + (unsigned)(dd->ipath_piobcnt2k + + dd->ipath_piobcnt4k) - + dd->ipath_lastport_piobuf); + } + else if (lstate == IPATH_IBSTATE_INIT || lstate == IPATH_IBSTATE_ARM || lstate == IPATH_IBSTATE_ACTIVE) { /* * only print at SMA if there is a change, debug if not @@ -379,6 +411,19 @@ static int handle_errors(struct ipath_de if (errs & E_SUM_ERRS) ignore_this_time = handle_e_sum_errs(dd, errs); + else if ((errs & E_SUM_LINK_PKTERRS) && + !(dd->ipath_flags & IPATH_LINKACTIVE)) { + /* + * This can happen when SMA is trying to bring the link + * up, but the IB link changes state at the "wrong" time. + * The IB logic then complains that the packet isn't + * valid. We don't want to confuse people, so we just + * don't print them, except at debug + */ + ipath_dbg("Ignoring packet errors %llx, because link not " + "ACTIVE\n", (unsigned long long) errs); + ignore_this_time = errs & E_SUM_LINK_PKTERRS; + } if (supp_msgs == 250000) { /* diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700 @@ -61,9 +61,7 @@ struct ipath_portdata { /* rcvhdrq base, needs mmap before useful */ void *port_rcvhdrq; /* kernel virtual address where hdrqtail is updated */ - u64 *port_rcvhdrtail_kvaddr; - /* page * used for uaddr */ - struct page *port_rcvhdrtail_pagep; + volatile __le64 *port_rcvhdrtail_kvaddr; /* * temp buffer for expected send setup, allocated at open, instead * of each setup call @@ -78,11 +76,7 @@ struct ipath_portdata { dma_addr_t port_rcvegr_phys; /* mmap of hdrq, must fit in 44 bits */ dma_addr_t 
port_rcvhdrq_phys; - /* - * the actual user address that we ipath_mlock'ed, so we can - * ipath_munlock it at close - */ - unsigned long port_rcvhdrtail_uaddr; + dma_addr_t port_rcvhdrqtailaddr_phys; /* * number of opens on this instance (0 or 1; ignoring forks, dup, * etc. for now) @@ -167,7 +161,6 @@ struct ipath_devdata { * only written to by the chip, not the driver. */ volatile __le64 *ipath_hdrqtailptr; - dma_addr_t ipath_dma_addr; /* ipath_cfgports pointers */ struct ipath_portdata **ipath_pd; /* sk_buffs used by port 0 eager receive queue */ @@ -518,10 +511,6 @@ struct ipath_devdata { u8 ipath_lmc; }; -extern volatile __le64 *ipath_port0_rcvhdrtail; -extern dma_addr_t ipath_port0_rcvhdrtail_dma; - -#define IPATH_PORT0_RCVHDRTAIL_SIZE PAGE_SIZE extern struct list_head ipath_dev_list; extern spinlock_t ipath_devs_lock; @@ -582,7 +571,7 @@ void ipath_disarm_piobufs(struct ipath_d unsigned cnt); int ipath_create_rcvhdrq(struct ipath_devdata *, struct ipath_portdata *); -void ipath_free_pddata(struct ipath_devdata *, u32, int); +void ipath_free_pddata(struct ipath_devdata *, struct ipath_portdata *); int ipath_parse_ushort(const char *str, unsigned short *valp); From bos at pathscale.com Fri May 12 16:42:57 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:57 -0700 Subject: [openib-general] [PATCH 12 of 53] ipath - reduce overhead of receive interrupts In-Reply-To: Message-ID: Somewhat reduce overhead on receive interrupts, and count the number of interrupts where that works (fastrcvint). 
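For readers skimming the thread, the idea behind fastrcvint can be sketched roughly as follows. This is only a user-space simulation under assumed names (struct sim_dev, sim_intr, SIM_RCV_BITS and friends are hypothetical and are not the ipath driver's structures or registers): if the in-memory tail shows new port 0 packets, clear just the receive bits, process the queue, and return without the expensive status read; otherwise fall back to reading the status register.

```c
#include <stdint.h>

/* Hypothetical stand-in for chip and driver state; not the ipath structs. */
struct sim_dev {
    uint32_t head;          /* next receive-queue entry the driver will process */
    uint32_t tail;          /* in-memory copy of the tail, updated by the "chip" */
    uint32_t intstatus;     /* currently pending interrupt bits */
    unsigned status_reads;  /* how often the (slow) status register was read */
    unsigned fast_ints;     /* interrupts handled without a status read */
};

#define SIM_RCV_BITS 0x3u   /* assumed: port 0 "available" and "urgent" bits */
#define SIM_ERR_BIT  0x4u   /* assumed: some unrelated interrupt source */

/* Consume everything up to the tail value seen at entry. */
static void sim_kreceive(struct sim_dev *d)
{
    d->head = d->tail;
}

/* Models the PIO read of the interrupt status register. */
static uint32_t sim_read_status(struct sim_dev *d)
{
    d->status_reads++;
    return d->intstatus;
}

static void sim_intr(struct sim_dev *d)
{
    uint32_t istat;

    if (d->head != d->tail) {
        uint32_t oldhead = d->head;

        /* Clear only the receive bits first, so later packets re-interrupt. */
        d->intstatus &= ~SIM_RCV_BITS;
        sim_kreceive(d);
        if (oldhead != d->head) {
            d->fast_ints++;     /* fast path: no status read needed */
            return;
        }
    }

    /* Slow path: read the status register and clear whatever is pending. */
    istat = sim_read_status(d);
    d->intstatus &= ~istat;
}
```

In this sketch, an interrupt that arrives with packets queued and an error bit also pending takes the fast path once; the still-pending error bit would then cause a second interrupt, which finds the queue empty and takes the slow path.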
Signed-off-by: Bryan O'Sullivan diff -r cc6d7f2537b2 -r ab2b013f1f95 drivers/infiniband/hw/ipath/ipath_common.h --- a/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:28 2006 -0700 @@ -96,8 +96,8 @@ struct infinipath_stats { __u64 sps_hwerrs; /* number of times IB link changed state unexpectedly */ __u64 sps_iblink; - /* no longer used; left for compatibility */ - __u64 sps_unused3; + /* kernel receive interrupts that didn't read intstat */ + __u64 sps_fastrcvint; /* number of kernel (port0) packets received */ __u64 sps_port0pkts; /* number of "ethernet" packets sent by driver */ diff -r cc6d7f2537b2 -r ab2b013f1f95 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700 @@ -936,12 +936,7 @@ void ipath_kreceive(struct ipath_devdata (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) goto done; -gotmore: - /* - * read only once at start. If in flood situation, this helps - * performance slightly. 
If more arrive while we are processing, - * we'll come back here and do them - */ + /* read only once at start for performance */ hdrqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr); for (i = 0, l = dd->ipath_port0head; l != hdrqtail; i++) { @@ -1070,10 +1065,6 @@ gotmore: dd->ipath_port0head = l; - if (hdrqtail != (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) - /* more arrived while we handled first batch */ - goto gotmore; - if (pkttot > ipath_stats.sps_maxpkts_call) ipath_stats.sps_maxpkts_call = pkttot; ipath_stats.sps_port0pkts += pkttot; diff -r cc6d7f2537b2 -r ab2b013f1f95 drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 @@ -493,10 +493,10 @@ static void handle_errors(struct ipath_d continue; if (hd == (tl + 1) || (!hd && tl == dd->ipath_hdrqlast)) { + if (i == 0) + chkerrpkts = 1; dd->ipath_lastrcvhdrqtails[i] = tl; pd->port_hdrqfull++; - if (i == 0) - chkerrpkts = 1; } } } @@ -678,7 +678,12 @@ set: dd->ipath_sendctrl); } -static void handle_rcv(struct ipath_devdata *dd, u32 istat) +/* + * Handle receive interrupts for user ports; this means a user + * process was waiting for a packet to arrive, and didn't want + * to poll + */ +static void handle_urcv(struct ipath_devdata *dd, u32 istat) { u64 portr; int i; @@ -688,22 +693,17 @@ static void handle_rcv(struct ipath_devd infinipath_i_rcvavail_mask) | ((istat >> INFINIPATH_I_RCVURG_SHIFT) & infinipath_i_rcvurg_mask); - for (i = 0; i < dd->ipath_cfgports; i++) { + for (i = 1; i < dd->ipath_cfgports; i++) { struct ipath_portdata *pd = dd->ipath_pd[i]; - if (portr & (1 << i) && pd && - pd->port_cnt) { - if (i == 0) - ipath_kreceive(dd); - else if (test_bit(IPATH_PORT_WAITING_RCV, - &pd->port_flag)) { - int rcbit; - clear_bit(IPATH_PORT_WAITING_RCV, - &pd->port_flag); - rcbit = i + INFINIPATH_R_INTRAVAIL_SHIFT; - clear_bit(1UL << rcbit, &dd->ipath_rcvctrl); - 
wake_up_interruptible(&pd->port_wait); - rcvdint = 1; - } + if (portr & (1 << i) && pd && pd->port_cnt && + test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) { + int rcbit; + clear_bit(IPATH_PORT_WAITING_RCV, + &pd->port_flag); + rcbit = i + INFINIPATH_R_INTRAVAIL_SHIFT; + clear_bit(1UL << rcbit, &dd->ipath_rcvctrl); + wake_up_interruptible(&pd->port_wait); + rcvdint = 1; } } if (rcvdint) { @@ -721,19 +721,66 @@ irqreturn_t ipath_intr(int irq, void *da struct ipath_devdata *dd = data; u32 istat; ipath_err_t estat = 0; + irqreturn_t ret; + u32 p0bits; static unsigned unexpected = 0; - irqreturn_t ret; + static const u32 port0rbits = (1U<ipath_flags & IPATH_PRESENT)) { - /* this is mostly so we don't try to touch the chip while - * it is being reset */ - /* - * This return value is perhaps odd, but we do not want the + /* + * This return value is not great, but we do not want the * interrupt core code to remove our interrupt handler * because we don't appear to be handling an interrupt * during a chip reset. */ return IRQ_HANDLED; + } + + /* + * this needs to be flags&initted, not statusp, so we keep + * taking interrupts even after link goes down, etc. + * Also, we *must* clear the interrupt at some point, or we won't + * take it again, which can be real bad for errors, etc... + */ + + if (!(dd->ipath_flags & IPATH_INITTED)) { + ipath_bad_intr(dd, &unexpected); + ret = IRQ_NONE; + goto bail; + } + + /* + * We try to avoid reading the interrupt status register, since + * that's a PIO read, and stalls the processor for up to about + * ~0.25 usec. The idea is that if we processed a port0 packet, + * we blindly clear the port 0 receive interrupt bits, and nothing + * else, then return. If other interrupts are pending, the chip + * will re-interrupt us as soon as we write the intclear register. + * We then won't process any more kernel packets (if not the 2nd + * time, then the 3rd or 4th) and we'll then handle the other + * interrupts.
We clear the interrupts first so that we don't + * lose intr for later packets that arrive while we are processing. + */ + if (dd->ipath_port0head != + (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) { + u32 oldhead = dd->ipath_port0head; + if(dd->ipath_flags & IPATH_GPIO_INTR) { + ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear, + (u64) (1 << 2)); + p0bits = port0rbits | INFINIPATH_I_GPIO; + } + else + p0bits = port0rbits; + ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, p0bits); + ipath_kreceive(dd); + if(oldhead != dd->ipath_port0head) { + ipath_stats.sps_fastrcvint++; + goto done; + } + istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus); } istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus); @@ -749,31 +796,17 @@ irqreturn_t ipath_intr(int irq, void *da goto bail; } - ipath_stats.sps_ints++; - - /* - * this needs to be flags&initted, not statusp, so we keep - * taking interrupts even after link goes down, etc. - * Also, we *must* clear the interrupt at some point, or we won't - * take it again, which can be real bad for errors, etc... - */ - - if (!(dd->ipath_flags & IPATH_INITTED)) { - ipath_bad_intr(dd, &unexpected); - ret = IRQ_NONE; - goto bail; - } if (unexpected) unexpected = 0; - ipath_cdbg(VERBOSE, "intr stat=0x%x\n", istat); - - if (istat & ~infinipath_i_bitsextant) + if(unlikely(istat & ~infinipath_i_bitsextant)) ipath_dev_err(dd, "interrupt with unknown interrupts %x set\n", istat & (u32) ~ infinipath_i_bitsextant); - - if (istat & INFINIPATH_I_ERROR) { + else + ipath_cdbg(VERBOSE, "intr stat=0x%x\n", istat); + + if(unlikely(istat & INFINIPATH_I_ERROR)) { ipath_stats.sps_errints++; estat = ipath_read_kreg64(dd, dd->ipath_kregs->kr_errorstatus); @@ -791,7 +824,14 @@ irqreturn_t ipath_intr(int irq, void *da handle_errors(dd, estat); } + p0bits = port0rbits; if (istat & INFINIPATH_I_GPIO) { + /* + * Packets are available in the port 0 rcv queue. 
+ * Eventually this needs to be generalized to check + * IPATH_GPIO_INTR, and the specific GPIO bit, if + * GPIO interrupts are used for anything else. + */ if (unlikely(!(dd->ipath_flags & IPATH_GPIO_INTR))) { u32 gpiostatus; gpiostatus = ipath_read_kreg32( @@ -805,14 +845,7 @@ irqreturn_t ipath_intr(int irq, void *da /* Clear GPIO status bit 2 */ ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear, (u64) (1 << 2)); - - /* - * Packets are available in the port 0 rcv queue. - * Eventually this needs to be generalized to check - * IPATH_GPIO_INTR, and the specific GPIO bit, if - * GPIO interrupts are used for anything else. - */ - ipath_kreceive(dd); + p0bits |= INFINIPATH_I_GPIO; } } @@ -825,6 +858,25 @@ irqreturn_t ipath_intr(int irq, void *da */ ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, istat); + /* + * we check for both transition from empty to non-empty, and urgent + * packets (those with the interrupt bit set in the header), and + * if enabled, the GPIO bit 2 interrupt used for port0 on some + * HT-400 boards. + * Do this before checking for pio buffers available, since + * receives can overflow; piobuf waiters can afford a few + * extra cycles, since they were waiting anyway. 
+ */ + if(istat & p0bits) { + ipath_kreceive(dd); + istat &= ~port0rbits; + } + if (istat & ((infinipath_i_rcvavail_mask << + INFINIPATH_I_RCVAVAIL_SHIFT) + | (infinipath_i_rcvurg_mask << + INFINIPATH_I_RCVURG_SHIFT))) + handle_urcv(dd, istat); + if (istat & INFINIPATH_I_SPIOBUFAVAIL) { clear_bit(IPATH_S_PIOINTBUFAVAIL, &dd->ipath_sendctrl); ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, @@ -836,17 +888,7 @@ irqreturn_t ipath_intr(int irq, void *da handle_layer_pioavail(dd); } - /* - * we check for both transition from empty to non-empty, and urgent - * packets (those with the interrupt bit set in the header) - */ - - if (istat & ((infinipath_i_rcvavail_mask << - INFINIPATH_I_RCVAVAIL_SHIFT) - | (infinipath_i_rcvurg_mask << - INFINIPATH_I_RCVURG_SHIFT))) - handle_rcv(dd, istat); - +done: ret = IRQ_HANDLED; bail: diff -r cc6d7f2537b2 -r ab2b013f1f95 drivers/infiniband/hw/ipath/ipath_stats.c --- a/drivers/infiniband/hw/ipath/ipath_stats.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_stats.c Fri May 12 15:55:28 2006 -0700 @@ -185,7 +185,6 @@ static void ipath_qcheck(struct ipath_de dd->ipath_port0head, (unsigned long long) ipath_stats.sps_port0pkts); - ipath_kreceive(dd); } dd->ipath_lastport0rcv_cnt = ipath_stats.sps_port0pkts; } From bos at pathscale.com Fri May 12 16:42:56 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:42:56 -0700 Subject: [openib-general] [PATCH 11 of 53] ipath - don't modify QP if changes fail In-Reply-To: Message-ID: Make sure modify_qp won't modify the QP if any of the changes failed. 
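For illustration only, the validate-first pattern described above can be sketched in plain C as follows. Every name here (qp_state, qp_attr, the ATTR_* bits, NPKEYS, MULTICAST_LID_BASE, modify_qp_sketch) is hypothetical and is not the ipath driver's API; the limits are assumed values chosen to mirror the checks in the patch. All range checks run before any field is written, so a rejected request leaves the QP untouched.

```c
#include <stdint.h>

/* Hypothetical attribute mask bits (stand-ins for the IB_QP_* flags). */
#define ATTR_PKEY_INDEX     (1u << 0)
#define ATTR_MIN_RNR_TIMER  (1u << 1)
#define ATTR_DLID           (1u << 2)

/* Assumed limits, for the sketch only. */
#define NPKEYS              4u
#define MULTICAST_LID_BASE  0xC000u

struct qp_state {
	uint16_t pkey_index;
	uint8_t  min_rnr_timer;
	uint16_t dlid;
};

struct qp_attr {
	uint16_t pkey_index;
	uint8_t  min_rnr_timer;
	uint16_t dlid;
};

/* Phase 1: check every requested change; touch nothing. */
static int validate_attr(const struct qp_attr *attr, unsigned mask)
{
	if ((mask & ATTR_DLID) &&
	    (attr->dlid == 0 || attr->dlid >= MULTICAST_LID_BASE))
		return -1;
	if ((mask & ATTR_PKEY_INDEX) && attr->pkey_index >= NPKEYS)
		return -1;
	if ((mask & ATTR_MIN_RNR_TIMER) && attr->min_rnr_timer > 31)
		return -1;
	return 0;
}

/* Phase 2: apply the changes only after all checks have passed. */
int modify_qp_sketch(struct qp_state *qp, const struct qp_attr *attr,
		     unsigned mask)
{
	if (validate_attr(attr, mask) != 0)
		return -1;	/* QP is left exactly as it was */

	if (mask & ATTR_PKEY_INDEX)
		qp->pkey_index = attr->pkey_index;
	if (mask & ATTR_MIN_RNR_TIMER)
		qp->min_rnr_timer = attr->min_rnr_timer;
	if (mask & ATTR_DLID)
		qp->dlid = attr->dlid;
	return 0;
}
```

With the checks interleaved with the assignments instead, a request carrying a valid pkey index and an out-of-range RNR timer would update the pkey index before failing, which is the partial modification the patch is meant to rule out.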
Signed-off-by: Bryan O'Sullivan diff -r 2fea0d127a41 -r cc6d7f2537b2 drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:27 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700 @@ -427,6 +427,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask) { + struct ipath_ibdev *dev = to_idev(ibqp->device); struct ipath_qp *qp = to_iqp(ibqp); enum ib_qp_state cur_state, new_state; unsigned long flags; @@ -443,6 +444,19 @@ int ipath_modify_qp(struct ib_qp *ibqp, attr_mask)) goto inval; + if (attr_mask & IB_QP_AV) + if (attr->ah_attr.dlid == 0 || + attr->ah_attr.dlid >= IPS_MULTICAST_LID_BASE) + goto inval; + + if (attr_mask & IB_QP_PKEY_INDEX) + if (attr->pkey_index >= ipath_layer_get_npkeys(dev->dd)) + goto inval; + + if (attr_mask & IB_QP_MIN_RNR_TIMER) + if (attr->min_rnr_timer > 31) + goto inval; + switch (new_state) { case IB_QPS_RESET: ipath_reset_qp(qp); @@ -457,13 +471,8 @@ int ipath_modify_qp(struct ib_qp *ibqp, } - if (attr_mask & IB_QP_PKEY_INDEX) { - struct ipath_ibdev *dev = to_idev(ibqp->device); - - if (attr->pkey_index >= ipath_layer_get_npkeys(dev->dd)) - goto inval; + if (attr_mask & IB_QP_PKEY_INDEX) qp->s_pkey_index = attr->pkey_index; - } if (attr_mask & IB_QP_DEST_QPN) qp->remote_qpn = attr->dest_qp_num; @@ -479,12 +488,8 @@ int ipath_modify_qp(struct ib_qp *ibqp, if (attr_mask & IB_QP_ACCESS_FLAGS) qp->qp_access_flags = attr->qp_access_flags; - if (attr_mask & IB_QP_AV) { - if (attr->ah_attr.dlid == 0 || - attr->ah_attr.dlid >= IPS_MULTICAST_LID_BASE) - goto inval; + if (attr_mask & IB_QP_AV) qp->remote_ah_attr = attr->ah_attr; - } if (attr_mask & IB_QP_PATH_MTU) qp->path_mtu = attr->path_mtu; @@ -499,11 +504,8 @@ int ipath_modify_qp(struct ib_qp *ibqp, qp->s_rnr_retry_cnt = qp->s_rnr_retry; } - if (attr_mask & IB_QP_MIN_RNR_TIMER) { - if (attr->min_rnr_timer > 31) - goto inval; + if (attr_mask & 
IB_QP_MIN_RNR_TIMER) qp->s_min_rnr_timer = attr->min_rnr_timer; - } if (attr_mask & IB_QP_QKEY) qp->qkey = attr->qkey; From bos at pathscale.com Fri May 12 16:43:05 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:05 -0700 Subject: [openib-general] [PATCH 20 of 53] ipath - more sharing between RC and UC code In-Reply-To: Message-ID: <201654fe19625588a574.1147477385@eng-12.pathscale.com> Share more common code between RC and UC protocols. Signed-off-by: Bryan O'Sullivan diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700 @@ -718,9 +718,7 @@ struct ib_qp *ipath_create_qp(struct ib_ spin_lock_init(&qp->r_rq.lock); atomic_set(&qp->refcount, 0); init_waitqueue_head(&qp->wait); - tasklet_init(&qp->s_task, - init_attr->qp_type == IB_QPT_RC ? - ipath_do_rc_send : ipath_do_uc_send, + tasklet_init(&qp->s_task, ipath_do_ruc_send, (unsigned long)qp); INIT_LIST_HEAD(&qp->piowait); INIT_LIST_HEAD(&qp->timerwait); @@ -905,9 +903,9 @@ void ipath_get_credit(struct ipath_qp *q * as many packets as we like. Otherwise, we have to * honor the credit field. */ - if (credit == IPS_AETH_CREDIT_INVAL) { + if (credit == IPS_AETH_CREDIT_INVAL) qp->s_lsn = (u32) -1; - } else if (qp->s_lsn != (u32) -1) { + else if (qp->s_lsn != (u32) -1) { /* Compute new LSN (i.e., MSN + credit) */ credit = (aeth + credit_table[credit]) & IPS_MSN_MASK; if (ipath_cmp24(credit, qp->s_lsn) > 0) diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_rc.c --- a/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700 @@ -72,9 +72,9 @@ static void ipath_init_restart(struct ip * Return bth0 if constructed; otherwise, return 0. * Note the QP s_lock must be held. 
*/ -static inline u32 ipath_make_rc_ack(struct ipath_qp *qp, - struct ipath_other_headers *ohdr, - u32 pmtu) +u32 ipath_make_rc_ack(struct ipath_qp *qp, + struct ipath_other_headers *ohdr, + u32 pmtu) { struct ipath_sge_state *ss; u32 hwords; @@ -95,8 +95,7 @@ static inline u32 ipath_make_rc_ack(stru if (len > pmtu) { len = pmtu; qp->s_ack_state = OP(RDMA_READ_RESPONSE_FIRST); - } - else + } else qp->s_ack_state = OP(RDMA_READ_RESPONSE_ONLY); qp->s_rdma_len -= len; bth0 = qp->s_ack_state << 24; @@ -135,7 +134,8 @@ static inline u32 ipath_make_rc_ack(stru */ qp->r_state = OP(RDMA_READ_RESPONSE_LAST); qp->s_ack_state = OP(ACKNOWLEDGE); - return 0; + bth0 = 0; + goto bail; case OP(COMPARE_SWAP): case OP(FETCH_ADD): @@ -143,7 +143,7 @@ static inline u32 ipath_make_rc_ack(stru len = 0; qp->r_state = OP(SEND_LAST); qp->s_ack_state = OP(ACKNOWLEDGE); - bth0 = IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24; + bth0 = OP(ATOMIC_ACKNOWLEDGE) << 24; ohdr->u.at.aeth = ipath_compute_aeth(qp); ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic); hwords += sizeof(ohdr->u.at) / 4; @@ -162,6 +162,7 @@ static inline u32 ipath_make_rc_ack(stru qp->s_cur_sge = ss; qp->s_cur_size = len; +bail: return bth0; } @@ -176,9 +177,9 @@ static inline u32 ipath_make_rc_ack(stru * Return 1 if constructed; otherwise, return 0. * Note the QP s_lock must be held. 
*/ -static inline int ipath_make_rc_req(struct ipath_qp *qp, - struct ipath_other_headers *ohdr, - u32 pmtu, u32 *bth0p, u32 *bth2p) +int ipath_make_rc_req(struct ipath_qp *qp, + struct ipath_other_headers *ohdr, + u32 pmtu, u32 *bth0p, u32 *bth2p) { struct ipath_ibdev *dev = to_idev(qp->ibqp.device); struct ipath_sge_state *ss; @@ -257,7 +258,7 @@ static inline int ipath_make_rc_req(stru break; case IB_WR_RDMA_WRITE: - if (newreq) + if (newreq && qp->s_lsn != (u32) -1) qp->s_lsn++; /* FALLTHROUGH */ case IB_WR_RDMA_WRITE_WITH_IMM: @@ -283,8 +284,7 @@ static inline int ipath_make_rc_req(stru else { qp->s_state = OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE); - /* Immediate data comes - * after RETH */ + /* Immediate data comes after RETH */ ohdr->u.rc.imm_data = wqe->wr.imm_data; hwords += 1; if (wqe->wr.send_flags & IB_SEND_SOLICITED) @@ -304,7 +304,8 @@ static inline int ipath_make_rc_req(stru qp->s_state = OP(RDMA_READ_REQUEST); hwords += sizeof(ohdr->u.rc.reth) / 4; if (newreq) { - qp->s_lsn++; + if (qp->s_lsn != (u32) -1) + qp->s_lsn++; /* * Adjust s_next_psn to count the * expected number of responses. @@ -335,7 +336,8 @@ static inline int ipath_make_rc_req(stru wqe->wr.wr.atomic.compare_add); hwords += sizeof(struct ib_atomic_eth) / 4; if (newreq) { - qp->s_lsn++; + if (qp->s_lsn != (u32) -1) + qp->s_lsn++; wqe->lpsn = wqe->psn; } if (++qp->s_cur == qp->s_size) @@ -355,6 +357,11 @@ static inline int ipath_make_rc_req(stru bth2 |= qp->s_psn++ & IPS_PSN_MASK; if ((int)(qp->s_psn - qp->s_next_psn) > 0) qp->s_next_psn = qp->s_psn; + /* + * Put the QP on the pending list so lost ACKs will cause + * a retry. More than one request can be pending so the + * QP may already be on the dev->pending list. + */ spin_lock(&dev->pending_lock); if (list_empty(&qp->timerwait)) list_add_tail(&qp->timerwait, @@ -364,8 +371,8 @@ static inline int ipath_make_rc_req(stru case OP(RDMA_READ_RESPONSE_FIRST): /* - * This case can only happen if a send is restarted. See - * ipath_restart_rc(). 
+ * This case can only happen if a send is restarted. + * See ipath_restart_rc(). */ ipath_init_restart(qp, wqe); /* FALLTHROUGH */ @@ -496,176 +503,48 @@ done: return 0; } -static inline void ipath_make_rc_grh(struct ipath_qp *qp, - struct ib_global_route *grh, - u32 nwords) -{ - struct ipath_ibdev *dev = to_idev(qp->ibqp.device); - - /* GRH header size in 32-bit words. */ - qp->s_hdrwords += 10; - qp->s_hdr.u.l.grh.version_tclass_flow = - cpu_to_be32((6 << 28) | - (grh->traffic_class << 20) | - grh->flow_label); - qp->s_hdr.u.l.grh.paylen = - cpu_to_be16(((qp->s_hdrwords - 12) + nwords + - SIZE_OF_CRC) << 2); - /* next_hdr is defined by C8-7 in ch. 8.4.1 */ - qp->s_hdr.u.l.grh.next_hdr = 0x1B; - qp->s_hdr.u.l.grh.hop_limit = grh->hop_limit; - /* The SGID is 32-bit aligned. */ - qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix; - qp->s_hdr.u.l.grh.sgid.global.interface_id = - ipath_layer_get_guid(dev->dd); - qp->s_hdr.u.l.grh.dgid = grh->dgid; -} - /** - * ipath_do_rc_send - perform a send on an RC QP - * @data: contains a pointer to the QP + * send_rc_ack - Construct an ACK packet and send it + * @qp: a pointer to the QP * - * Process entries in the send work queue until credit or queue is - * exhausted. Only allow one CPU to send a packet per QP (tasklet). - * Otherwise, after we drop the QP s_lock, two threads could send - * packets out of order. + * This is called from ipath_rc_rcv() and only uses the receive + * side QP state. + * Note that RDMA reads are handled in the send side QP state and tasklet. 
*/ -void ipath_do_rc_send(unsigned long data) -{ - struct ipath_qp *qp = (struct ipath_qp *)data; - struct ipath_ibdev *dev = to_idev(qp->ibqp.device); - unsigned long flags; - u16 lrh0; - u32 nwords; - u32 extra_bytes; - u32 bth0; - u32 bth2; - u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); - struct ipath_other_headers *ohdr; - - if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags)) - goto bail; - - if (unlikely(qp->remote_ah_attr.dlid == - ipath_layer_get_lid(dev->dd))) { - struct ib_wc wc; - - /* - * Pass in an uninitialized ib_wc to be consistent with - * other places where ipath_ruc_loopback() is called. - */ - ipath_ruc_loopback(qp, &wc); - goto clear; - } - - ohdr = &qp->s_hdr.u.oth; - if (qp->remote_ah_attr.ah_flags & IB_AH_GRH) - ohdr = &qp->s_hdr.u.l.oth; - -again: - /* Check for a constructed packet to be sent. */ - if (qp->s_hdrwords != 0) { - /* - * If no PIO bufs are available, return. An interrupt will - * call ipath_ib_piobufavail() when one is available. - */ - _VERBS_INFO("h %u %p\n", qp->s_hdrwords, &qp->s_hdr); - _VERBS_INFO("d %u %p %u %p %u %u %u %u\n", qp->s_cur_size, - qp->s_cur_sge->sg_list, - qp->s_cur_sge->num_sge, - qp->s_cur_sge->sge.vaddr, - qp->s_cur_sge->sge.sge_length, - qp->s_cur_sge->sge.length, - qp->s_cur_sge->sge.m, - qp->s_cur_sge->sge.n); - if (ipath_verbs_send(dev->dd, qp->s_hdrwords, - (u32 *) &qp->s_hdr, qp->s_cur_size, - qp->s_cur_sge)) { - ipath_no_bufs_available(qp, dev); - goto bail; - } - dev->n_unicast_xmit++; - /* Record that we sent the packet and s_hdr is empty. */ - qp->s_hdrwords = 0; - } - - /* - * The lock is needed to synchronize between setting - * qp->s_ack_state, resend timer, and post_send(). - */ - spin_lock_irqsave(&qp->s_lock, flags); - - /* Sending responses has higher priority over sending requests. 
*/ - if (qp->s_ack_state != OP(ACKNOWLEDGE) && - (bth0 = ipath_make_rc_ack(qp, ohdr, pmtu)) != 0) - bth2 = qp->s_ack_psn++ & IPS_PSN_MASK; - else if (!ipath_make_rc_req(qp, ohdr, pmtu, &bth0, &bth2)) - goto done; - - spin_unlock_irqrestore(&qp->s_lock, flags); - - /* Construct the header. */ - extra_bytes = (4 - qp->s_cur_size) & 3; - nwords = (qp->s_cur_size + extra_bytes) >> 2; - lrh0 = IPS_LRH_BTH; - if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { - ipath_make_rc_grh(qp, &qp->remote_ah_attr.grh, nwords); - lrh0 = IPS_LRH_GRH; - } - lrh0 |= qp->remote_ah_attr.sl << 4; - qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); - qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); - qp->s_hdr.lrh[2] = cpu_to_be16(qp->s_hdrwords + nwords + - SIZE_OF_CRC); - qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd)); - bth0 |= ipath_layer_get_pkey(dev->dd, qp->s_pkey_index); - bth0 |= extra_bytes << 20; - ohdr->bth[0] = cpu_to_be32(bth0); - ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); - ohdr->bth[2] = cpu_to_be32(bth2); - - /* Check for more work to do. */ - goto again; - -done: - spin_unlock_irqrestore(&qp->s_lock, flags); -clear: - clear_bit(IPATH_S_BUSY, &qp->s_flags); -bail: - return; -} - static void send_rc_ack(struct ipath_qp *qp) { struct ipath_ibdev *dev = to_idev(qp->ibqp.device); u16 lrh0; u32 bth0; + u32 hwords; + struct ipath_ib_header hdr; struct ipath_other_headers *ohdr; /* Construct the header. */ - ohdr = &qp->s_hdr.u.oth; + ohdr = &hdr.u.oth; lrh0 = IPS_LRH_BTH; /* header size in 32-bit words LRH+BTH+AETH = (8+12+4)/4. 
*/ - qp->s_hdrwords = 6; + hwords = 6; if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { - ipath_make_rc_grh(qp, &qp->remote_ah_attr.grh, 0); - ohdr = &qp->s_hdr.u.l.oth; + hwords += ipath_make_grh(dev, &hdr.u.l.grh, + &qp->remote_ah_attr.grh, + hwords, 0); + ohdr = &hdr.u.l.oth; lrh0 = IPS_LRH_GRH; } bth0 = ipath_layer_get_pkey(dev->dd, qp->s_pkey_index); ohdr->u.aeth = ipath_compute_aeth(qp); if (qp->s_ack_state >= OP(COMPARE_SWAP)) { - bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24; + bth0 |= OP(ATOMIC_ACKNOWLEDGE) << 24; ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic); - qp->s_hdrwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4; - } - else + hwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4; + } else bth0 |= OP(ACKNOWLEDGE) << 24; lrh0 |= qp->remote_ah_attr.sl << 4; - qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); - qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); - qp->s_hdr.lrh[2] = cpu_to_be16(qp->s_hdrwords + SIZE_OF_CRC); - qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd)); + hdr.lrh[0] = cpu_to_be16(lrh0); + hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); + hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC); + hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd)); ohdr->bth[0] = cpu_to_be32(bth0); ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); ohdr->bth[2] = cpu_to_be32(qp->s_ack_psn & IPS_PSN_MASK); @@ -673,12 +552,93 @@ static void send_rc_ack(struct ipath_qp /* * If we can send the ACK, clear the ACK state. */ - if (ipath_verbs_send(dev->dd, qp->s_hdrwords, (u32 *) &qp->s_hdr, - 0, NULL) == 0) { + if (ipath_verbs_send(dev->dd, hwords, (u32 *) &hdr, 0, NULL) == 0) { qp->s_ack_state = OP(ACKNOWLEDGE); + dev->n_unicast_xmit++; + } else dev->n_rc_qacks++; - dev->n_unicast_xmit++; - } +} + +/** + * reset_psn - reset the QP state to send starting from PSN + * @qp: the QP + * @psn: the packet sequence number to restart at + * + * This is called from ipath_rc_rcv() to process an incoming RC ACK + * for the given QP. 
+ * Called at interrupt level with the QP s_lock held. + */ +static void reset_psn(struct ipath_qp *qp, u32 psn) +{ + u32 n = qp->s_last; + struct ipath_swqe *wqe = get_swqe_ptr(qp, n); + u32 opcode; + + qp->s_cur = n; + + /* + * If we are starting the request from the beginning, + * let the normal send code handle initialization. + */ + if (ipath_cmp24(psn, wqe->psn) <= 0) { + qp->s_state = OP(SEND_LAST); + goto done; + } + + /* Find the work request opcode corresponding to the given PSN. */ + opcode = wqe->wr.opcode; + for (;;) { + int diff; + + if (++n == qp->s_size) + n = 0; + if (n == qp->s_tail) + break; + wqe = get_swqe_ptr(qp, n); + diff = ipath_cmp24(psn, wqe->psn); + if (diff < 0) + break; + qp->s_cur = n; + /* + * If we are starting the request from the beginning, + * let the normal send code handle initialization. + */ + if (diff == 0) { + qp->s_state = OP(SEND_LAST); + goto done; + } + opcode = wqe->wr.opcode; + } + + /* + * Set the state to restart in the middle of a request. + * Don't change the s_sge, s_cur_sge, or s_cur_size. + * See ipath_do_rc_send(). + */ + switch (opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + qp->s_state = OP(RDMA_READ_RESPONSE_FIRST); + break; + + case IB_WR_RDMA_WRITE: + case IB_WR_RDMA_WRITE_WITH_IMM: + qp->s_state = OP(RDMA_READ_RESPONSE_LAST); + break; + + case IB_WR_RDMA_READ: + qp->s_state = OP(RDMA_READ_RESPONSE_MIDDLE); + break; + + default: + /* + * This case shouldn't happen since its only + * one PSN per req. + */ + qp->s_state = OP(SEND_LAST); + } +done: + qp->s_psn = psn; } /** @@ -693,7 +653,6 @@ void ipath_restart_rc(struct ipath_qp *q { struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last); struct ipath_ibdev *dev; - u32 n; /* * If there are no requests pending, we are done. 
@@ -735,130 +694,13 @@ void ipath_restart_rc(struct ipath_qp *q else dev->n_rc_resends += (int)qp->s_psn - (int)psn; - /* - * If we are starting the request from the beginning, let the normal - * send code handle initialization. - */ - qp->s_cur = qp->s_last; - if (ipath_cmp24(psn, wqe->psn) <= 0) { - qp->s_state = OP(SEND_LAST); - qp->s_psn = wqe->psn; - } else { - n = qp->s_cur; - for (;;) { - if (++n == qp->s_size) - n = 0; - if (n == qp->s_tail) { - if (ipath_cmp24(psn, qp->s_next_psn) >= 0) { - qp->s_cur = n; - wqe = get_swqe_ptr(qp, n); - } - break; - } - wqe = get_swqe_ptr(qp, n); - if (ipath_cmp24(psn, wqe->psn) < 0) - break; - qp->s_cur = n; - } - qp->s_psn = psn; - - /* - * Reset the state to restart in the middle of a request. - * Don't change the s_sge, s_cur_sge, or s_cur_size. - * See ipath_do_rc_send(). - */ - switch (wqe->wr.opcode) { - case IB_WR_SEND: - case IB_WR_SEND_WITH_IMM: - qp->s_state = OP(RDMA_READ_RESPONSE_FIRST); - break; - - case IB_WR_RDMA_WRITE: - case IB_WR_RDMA_WRITE_WITH_IMM: - qp->s_state = OP(RDMA_READ_RESPONSE_LAST); - break; - - case IB_WR_RDMA_READ: - qp->s_state = - OP(RDMA_READ_RESPONSE_MIDDLE); - break; - - default: - /* - * This case shouldn't happen since its only - * one PSN per req. - */ - qp->s_state = OP(SEND_LAST); - } - } + reset_psn(qp, psn); done: tasklet_hi_schedule(&qp->s_task); bail: return; -} - -/** - * reset_psn - reset the QP state to send starting from PSN - * @qp: the QP - * @psn: the packet sequence number to restart at - * - * This is called from ipath_rc_rcv() to process an incoming RC ACK - * for the given QP. - * Called at interrupt level with the QP s_lock held. 
- */ -static void reset_psn(struct ipath_qp *qp, u32 psn) -{ - struct ipath_swqe *wqe; - u32 n; - - n = qp->s_cur; - wqe = get_swqe_ptr(qp, n); - for (;;) { - if (++n == qp->s_size) - n = 0; - if (n == qp->s_tail) { - if (ipath_cmp24(psn, qp->s_next_psn) >= 0) { - qp->s_cur = n; - wqe = get_swqe_ptr(qp, n); - } - break; - } - wqe = get_swqe_ptr(qp, n); - if (ipath_cmp24(psn, wqe->psn) < 0) - break; - qp->s_cur = n; - } - qp->s_psn = psn; - - /* - * Set the state to restart in the middle of a - * request. Don't change the s_sge, s_cur_sge, or - * s_cur_size. See ipath_do_rc_send(). - */ - switch (wqe->wr.opcode) { - case IB_WR_SEND: - case IB_WR_SEND_WITH_IMM: - qp->s_state = OP(RDMA_READ_RESPONSE_FIRST); - break; - - case IB_WR_RDMA_WRITE: - case IB_WR_RDMA_WRITE_WITH_IMM: - qp->s_state = OP(RDMA_READ_RESPONSE_LAST); - break; - - case IB_WR_RDMA_READ: - qp->s_state = OP(RDMA_READ_RESPONSE_MIDDLE); - break; - - default: - /* - * This case shouldn't happen since its only - * one PSN per req. - */ - qp->s_state = OP(SEND_LAST); - } } /** @@ -867,7 +709,7 @@ static void reset_psn(struct ipath_qp *q * @psn: the packet sequence number of the ACK * @opcode: the opcode of the request that resulted in the ACK * - * This is called from ipath_rc_rcv() to process an incoming RC ACK + * This is called from ipath_rc_rcv_resp() to process an incoming RC ACK * for the given QP. * Called at interrupt level with the QP s_lock held. * Returns 1 if OK, 0 if current operation should be aborted (NAK). @@ -1011,17 +853,7 @@ static int do_rc_ack(struct ipath_qp *qp dev->n_rc_resends += (int)qp->s_psn - (int)psn; - /* - * If we are starting the request from the beginning, let - * the normal send code handle initialization. 
- */ - qp->s_cur = qp->s_last; - wqe = get_swqe_ptr(qp, qp->s_cur); - if (ipath_cmp24(psn, wqe->psn) <= 0) { - qp->s_state = OP(SEND_LAST); - qp->s_psn = wqe->psn; - } else - reset_psn(qp, psn); + reset_psn(qp, psn); qp->s_rnr_timeout = ib_ipath_rnr_table[(aeth >> IPS_AETH_CREDIT_SHIFT) & @@ -1182,32 +1014,33 @@ static inline void ipath_rc_rcv_resp(str goto ack_done; } rdma_read: - if (unlikely(qp->s_state != OP(RDMA_READ_REQUEST))) - goto ack_done; - if (unlikely(tlen != (hdrsize + pmtu + 4))) - goto ack_done; - if (unlikely(pmtu >= qp->s_len)) - goto ack_done; - /* We got a response so update the timeout. */ - if (unlikely(qp->s_last == qp->s_tail || - get_swqe_ptr(qp, qp->s_last)->wr.opcode != - IB_WR_RDMA_READ)) - goto ack_done; - spin_lock(&dev->pending_lock); - if (qp->s_rnr_timeout == 0 && !list_empty(&qp->timerwait)) - list_move_tail(&qp->timerwait, - &dev->pending[dev->pending_index]); - spin_unlock(&dev->pending_lock); - /* - * Update the RDMA receive state but do the copy w/o holding the - * locks and blocking interrupts. XXX Yet another place that - * affects relaxed RDMA order since we don't want s_sge modified. - */ - qp->s_len -= pmtu; - qp->s_last_psn = psn; - spin_unlock_irqrestore(&qp->s_lock, flags); - ipath_copy_sge(&qp->s_sge, data, pmtu); - goto bail; + if (unlikely(qp->s_state != OP(RDMA_READ_REQUEST))) + goto ack_done; + if (unlikely(tlen != (hdrsize + pmtu + 4))) + goto ack_done; + if (unlikely(pmtu >= qp->s_len)) + goto ack_done; + /* We got a response so update the timeout. */ + if (unlikely(qp->s_last == qp->s_tail || + get_swqe_ptr(qp, qp->s_last)->wr.opcode != + IB_WR_RDMA_READ)) + goto ack_done; + spin_lock(&dev->pending_lock); + if (qp->s_rnr_timeout == 0 && !list_empty(&qp->timerwait)) + list_move_tail(&qp->timerwait, + &dev->pending[dev->pending_index]); + spin_unlock(&dev->pending_lock); + /* + * Update the RDMA receive state but do the copy w/o + * holding the locks and blocking interrupts. 
+ * XXX Yet another place that affects relaxed RDMA order + * since we don't want s_sge modified. + */ + qp->s_len -= pmtu; + qp->s_last_psn = psn; + spin_unlock_irqrestore(&qp->s_lock, flags); + ipath_copy_sge(&qp->s_sge, data, pmtu); + goto bail; case OP(RDMA_READ_RESPONSE_LAST): /* ACKs READ req. */ @@ -1230,18 +1063,12 @@ static inline void ipath_rc_rcv_resp(str * ICRC (4). */ if (unlikely(tlen <= (hdrsize + pad + 8))) { - /* - * XXX Need to generate an error CQ - * entry. - */ + /* XXX Need to generate an error CQ entry. */ goto ack_done; } tlen -= hdrsize + pad + 8; if (unlikely(tlen != qp->s_len)) { - /* - * XXX Need to generate an error CQ - * entry. - */ + /* XXX Need to generate an error CQ entry. */ goto ack_done; } if (!header_in_data) @@ -1254,9 +1081,12 @@ static inline void ipath_rc_rcv_resp(str if (do_rc_ack(qp, aeth, psn, OP(RDMA_READ_RESPONSE_LAST))) { /* * Change the state so we contimue - * processing new requests. + * processing new requests and wake up the + * tasklet if there are posted sends. */ qp->s_state = OP(SEND_LAST); + if (qp->s_tail != qp->s_head) + tasklet_hi_schedule(&qp->s_task); } goto ack_done; } @@ -1295,6 +1125,8 @@ static inline int ipath_rc_rcv_error(str { struct ib_reth *reth; + spin_lock(&qp->s_lock); + if (diff > 0) { /* * Packet sequence error. @@ -1302,13 +1134,10 @@ static inline int ipath_rc_rcv_error(str * Don't queue the NAK if a RDMA read, atomic, or * NAK is pending though. */ - spin_lock(&qp->s_lock); if ((qp->s_ack_state >= OP(RDMA_READ_REQUEST) && - qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) || - qp->s_nak_state != 0) { - spin_unlock(&qp->s_lock); + qp->s_ack_state != OP(ACKNOWLEDGE)) || + qp->s_nak_state != 0) goto done; - } qp->s_ack_state = OP(SEND_ONLY); qp->s_nak_state = IB_NAK_PSN_ERROR; /* Use the expected PSN. */ @@ -1327,12 +1156,10 @@ static inline int ipath_rc_rcv_error(str * send the earliest so that RDMA reads can be restarted at * the requester's expected PSN. 
*/ - spin_lock(&qp->s_lock); - if (qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE && + if (qp->s_ack_state != OP(ACKNOWLEDGE) && ipath_cmp24(psn, qp->s_ack_psn) >= 0) { - if (qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST) + if (qp->s_ack_state < OP(RDMA_READ_REQUEST)) qp->s_ack_psn = psn; - spin_unlock(&qp->s_lock); goto done; } switch (opcode) { @@ -1343,8 +1170,7 @@ static inline int ipath_rc_rcv_error(str * holding the s_lock. */ if (qp->s_ack_state != OP(ACKNOWLEDGE) && - qp->s_ack_state >= IB_OPCODE_RDMA_READ_REQUEST) { - spin_unlock(&qp->s_lock); + qp->s_ack_state >= OP(RDMA_READ_REQUEST)) { dev->n_rdma_dup_busy++; goto done; } @@ -1383,13 +1209,11 @@ static inline int ipath_rc_rcv_error(str case OP(COMPARE_SWAP): case OP(FETCH_ADD): /* - * Check for the PSN of the last atomic operations + * Check for the PSN of the last atomic operation * performed and resend the result if found. */ - if ((psn & IPS_PSN_MASK) != qp->r_atomic_psn) { - spin_unlock(&qp->s_lock); + if ((psn & IPS_PSN_MASK) != qp->r_atomic_psn) goto done; - } qp->s_ack_atomic = qp->r_atomic_data; break; } @@ -1400,6 +1224,7 @@ resched: return 0; done: + spin_unlock(&qp->s_lock); return 1; } @@ -1453,11 +1278,6 @@ void ipath_rc_rcv(struct ipath_ibdev *de } else psn = be32_to_cpu(ohdr->bth[2]); } - /* - * The opcode is in the low byte when its in network order - * (top byte when in host order). - */ - opcode = be32_to_cpu(ohdr->bth[0]) >> 24; /* * Process responses (ACKs) before anything else. Note that the @@ -1465,6 +1285,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de * queue rather than the expected receive packet sequence number. * In other words, this QP is the requester. 
*/ + opcode = be32_to_cpu(ohdr->bth[0]) >> 24; if (opcode >= OP(RDMA_READ_RESPONSE_FIRST) && opcode <= OP(ATOMIC_ACKNOWLEDGE)) { ipath_rc_rcv_resp(dev, ohdr, data, tlen, qp, opcode, psn, @@ -1492,22 +1313,23 @@ void ipath_rc_rcv(struct ipath_ibdev *de opcode == OP(SEND_LAST_WITH_IMMEDIATE)) break; nack_inv: - /* - * A NAK will ACK earlier sends and RDMA writes. Don't queue the - * NAK if a RDMA read, atomic, or NAK is pending though. - */ - spin_lock(&qp->s_lock); - if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) && - qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) { - spin_unlock(&qp->s_lock); - goto done; - } - /* XXX Flush WQEs */ - qp->state = IB_QPS_ERR; - qp->s_ack_state = OP(SEND_ONLY); - qp->s_nak_state = IB_NAK_INVALID_REQUEST; - qp->s_ack_psn = qp->r_psn; - goto resched; + /* + * A NAK will ACK earlier sends and RDMA writes. + * Don't queue the NAK if a RDMA read, atomic, or NAK + * is pending though. + */ + spin_lock(&qp->s_lock); + if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) && + qp->s_ack_state != OP(ACKNOWLEDGE)) { + spin_unlock(&qp->s_lock); + goto done; + } + /* XXX Flush WQEs */ + qp->state = IB_QPS_ERR; + qp->s_ack_state = OP(SEND_ONLY); + qp->s_nak_state = IB_NAK_INVALID_REQUEST; + qp->s_ack_psn = qp->r_psn; + goto resched; case OP(RDMA_WRITE_FIRST): case OP(RDMA_WRITE_MIDDLE): @@ -1556,9 +1378,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de * is pending though. */ spin_lock(&qp->s_lock); - if (qp->s_ack_state >= - OP(RDMA_READ_REQUEST) && - qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) { + if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) && + qp->s_ack_state != OP(ACKNOWLEDGE)) { spin_unlock(&qp->s_lock); goto done; } @@ -1674,10 +1495,10 @@ void ipath_rc_rcv(struct ipath_ibdev *de * read, atomic, or NAK is pending though. 
*/ spin_lock(&qp->s_lock); + nack_acc1: if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) && - qp->s_ack_state != - IB_OPCODE_ACKNOWLEDGE) { + qp->s_ack_state != OP(ACKNOWLEDGE)) { spin_unlock(&qp->s_lock); goto done; } @@ -1715,9 +1536,16 @@ void ipath_rc_rcv(struct ipath_ibdev *de reth = (struct ib_reth *)data; data += sizeof(*reth); } + if (unlikely(!(qp->qp_access_flags & + IB_ACCESS_REMOTE_READ))) + goto nack_acc; + /* + * Ignore request if we already have an + * RDMA read or ATOMIC pending. + */ spin_lock(&qp->s_lock); if (qp->s_ack_state != OP(ACKNOWLEDGE) && - qp->s_ack_state >= IB_OPCODE_RDMA_READ_REQUEST) { + qp->s_ack_state >= OP(RDMA_READ_REQUEST)) { spin_unlock(&qp->s_lock); goto done; } @@ -1731,10 +1559,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de ok = ipath_rkey_ok(dev, &qp->s_rdma_sge, qp->s_rdma_len, vaddr, rkey, IB_ACCESS_REMOTE_READ); - if (unlikely(!ok)) { - spin_unlock(&qp->s_lock); - goto nack_acc; - } + if (unlikely(!ok)) + goto nack_acc1; /* * Update the next expected PSN. We add 1 later * below, so only add the remainder here. 
@@ -1749,9 +1575,6 @@ void ipath_rc_rcv(struct ipath_ibdev *de qp->s_rdma_sge.sge.length = 0; qp->s_rdma_sge.sge.sge_length = 0; } - if (unlikely(!(qp->qp_access_flags & - IB_ACCESS_REMOTE_READ))) - goto nack_acc; /* * We need to increment the MSN here instead of when we * finish sending the result since a duplicate request would @@ -1821,7 +1644,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de */ spin_lock(&qp->s_lock); if (qp->s_ack_state == OP(ACKNOWLEDGE) || - qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST) { + qp->s_ack_state < OP(RDMA_READ_REQUEST)) { qp->s_ack_state = opcode; qp->s_nak_state = 0; qp->s_ack_psn = psn; @@ -1843,6 +1666,8 @@ resched: (qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST || qp->s_ack_state >= IB_OPCODE_COMPARE_SWAP)) send_rc_ack(qp); + else + dev->n_rc_qacks++; rdmadone: spin_unlock(&qp->s_lock); diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_ruc.c --- a/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700 @@ -31,6 +31,7 @@ */ #include "ipath_verbs.h" +#include "ips_common.h" /* * Convert the AETH RNR timeout code into the number of milliseconds. @@ -187,7 +188,6 @@ bail: /** * ipath_ruc_loopback - handle UC and RC lookback requests * @sqp: the loopback QP - * @wc: the work completion entry * * This is called from ipath_do_uc_send() or ipath_do_rc_send() to * forward a WQE addressed to the same HCA. @@ -196,13 +196,14 @@ bail: * receive interrupts since this is a connected protocol and all packets * will pass through here. 
*/ -void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc) +static void ipath_ruc_loopback(struct ipath_qp *sqp) { struct ipath_ibdev *dev = to_idev(sqp->ibqp.device); struct ipath_qp *qp; struct ipath_swqe *wqe; struct ipath_sge *sge; unsigned long flags; + struct ib_wc wc; u64 sdata; qp = ipath_lookup_qpn(&dev->qp_table, sqp->remote_qpn); @@ -233,8 +234,8 @@ again: wqe = get_swqe_ptr(sqp, sqp->s_last); spin_unlock_irqrestore(&sqp->s_lock, flags); - wc->wc_flags = 0; - wc->imm_data = 0; + wc.wc_flags = 0; + wc.imm_data = 0; sqp->s_sge.sge = wqe->sg_list[0]; sqp->s_sge.sg_list = wqe->sg_list + 1; @@ -242,8 +243,8 @@ again: sqp->s_len = wqe->length; switch (wqe->wr.opcode) { case IB_WR_SEND_WITH_IMM: - wc->wc_flags = IB_WC_WITH_IMM; - wc->imm_data = wqe->wr.imm_data; + wc.wc_flags = IB_WC_WITH_IMM; + wc.imm_data = wqe->wr.imm_data; /* FALLTHROUGH */ case IB_WR_SEND: spin_lock_irqsave(&qp->r_rq.lock, flags); @@ -254,7 +255,7 @@ again: if (qp->ibqp.qp_type == IB_QPT_UC) goto send_comp; if (sqp->s_rnr_retry == 0) { - wc->status = IB_WC_RNR_RETRY_EXC_ERR; + wc.status = IB_WC_RNR_RETRY_EXC_ERR; goto err; } if (sqp->s_rnr_retry_cnt < 7) @@ -269,8 +270,8 @@ again: break; case IB_WR_RDMA_WRITE_WITH_IMM: - wc->wc_flags = IB_WC_WITH_IMM; - wc->imm_data = wqe->wr.imm_data; + wc.wc_flags = IB_WC_WITH_IMM; + wc.imm_data = wqe->wr.imm_data; spin_lock_irqsave(&qp->r_rq.lock, flags); if (!ipath_get_rwqe(qp, 1)) goto rnr_nak; @@ -284,20 +285,20 @@ again: wqe->wr.wr.rdma.rkey, IB_ACCESS_REMOTE_WRITE))) { acc_err: - wc->status = IB_WC_REM_ACCESS_ERR; + wc.status = IB_WC_REM_ACCESS_ERR; err: - wc->wr_id = wqe->wr.wr_id; - wc->opcode = ib_ipath_wc_opcode[wqe->wr.opcode]; - wc->vendor_err = 0; - wc->byte_len = 0; - wc->qp_num = sqp->ibqp.qp_num; - wc->src_qp = sqp->remote_qpn; - wc->pkey_index = 0; - wc->slid = sqp->remote_ah_attr.dlid; - wc->sl = sqp->remote_ah_attr.sl; - wc->dlid_path_bits = 0; - wc->port_num = 0; - ipath_sqerror_qp(sqp, wc); + wc.wr_id = wqe->wr.wr_id; + 
wc.opcode = ib_ipath_wc_opcode[wqe->wr.opcode]; + wc.vendor_err = 0; + wc.byte_len = 0; + wc.qp_num = sqp->ibqp.qp_num; + wc.src_qp = sqp->remote_qpn; + wc.pkey_index = 0; + wc.slid = sqp->remote_ah_attr.dlid; + wc.sl = sqp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; + wc.port_num = 0; + ipath_sqerror_qp(sqp, &wc); goto done; } break; @@ -373,22 +374,22 @@ again: goto send_comp; if (wqe->wr.opcode == IB_WR_RDMA_WRITE_WITH_IMM) - wc->opcode = IB_WC_RECV_RDMA_WITH_IMM; + wc.opcode = IB_WC_RECV_RDMA_WITH_IMM; else - wc->opcode = IB_WC_RECV; - wc->wr_id = qp->r_wr_id; - wc->status = IB_WC_SUCCESS; - wc->vendor_err = 0; - wc->byte_len = wqe->length; - wc->qp_num = qp->ibqp.qp_num; - wc->src_qp = qp->remote_qpn; + wc.opcode = IB_WC_RECV; + wc.wr_id = qp->r_wr_id; + wc.status = IB_WC_SUCCESS; + wc.vendor_err = 0; + wc.byte_len = wqe->length; + wc.qp_num = qp->ibqp.qp_num; + wc.src_qp = qp->remote_qpn; /* XXX do we know which pkey matched? Only needed for GSI. */ - wc->pkey_index = 0; - wc->slid = qp->remote_ah_attr.dlid; - wc->sl = qp->remote_ah_attr.sl; - wc->dlid_path_bits = 0; + wc.pkey_index = 0; + wc.slid = qp->remote_ah_attr.dlid; + wc.sl = qp->remote_ah_attr.sl; + wc.dlid_path_bits = 0; /* Signal completion event if the solicited bit is set. 
*/ - ipath_cq_enter(to_icq(qp->ibqp.recv_cq), wc, + ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, wqe->wr.send_flags & IB_SEND_SOLICITED); send_comp: @@ -396,19 +397,19 @@ send_comp: if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &sqp->s_flags) || (wqe->wr.send_flags & IB_SEND_SIGNALED)) { - wc->wr_id = wqe->wr.wr_id; - wc->status = IB_WC_SUCCESS; - wc->opcode = ib_ipath_wc_opcode[wqe->wr.opcode]; - wc->vendor_err = 0; - wc->byte_len = wqe->length; - wc->qp_num = sqp->ibqp.qp_num; - wc->src_qp = 0; - wc->pkey_index = 0; - wc->slid = 0; - wc->sl = 0; - wc->dlid_path_bits = 0; - wc->port_num = 0; - ipath_cq_enter(to_icq(sqp->ibqp.send_cq), wc, 0); + wc.wr_id = wqe->wr.wr_id; + wc.status = IB_WC_SUCCESS; + wc.opcode = ib_ipath_wc_opcode[wqe->wr.opcode]; + wc.vendor_err = 0; + wc.byte_len = wqe->length; + wc.qp_num = sqp->ibqp.qp_num; + wc.src_qp = 0; + wc.pkey_index = 0; + wc.slid = 0; + wc.sl = 0; + wc.dlid_path_bits = 0; + wc.port_num = 0; + ipath_cq_enter(to_icq(sqp->ibqp.send_cq), &wc, 0); } /* Update s_last now that we are finished with the SWQE */ @@ -454,11 +455,11 @@ void ipath_no_bufs_available(struct ipat } /** - * ipath_post_rc_send - post RC and UC sends + * ipath_post_ruc_send - post RC and UC sends * @qp: the QP to post on * @wr: the work request to send */ -int ipath_post_rc_send(struct ipath_qp *qp, struct ib_send_wr *wr) +int ipath_post_ruc_send(struct ipath_qp *qp, struct ib_send_wr *wr) { struct ipath_swqe *wqe; unsigned long flags; @@ -533,13 +534,149 @@ int ipath_post_rc_send(struct ipath_qp * qp->s_head = next; spin_unlock_irqrestore(&qp->s_lock, flags); - if (qp->ibqp.qp_type == IB_QPT_UC) - ipath_do_uc_send((unsigned long) qp); - else - ipath_do_rc_send((unsigned long) qp); + ipath_do_ruc_send((unsigned long) qp); ret = 0; bail: return ret; } + +/** + * ipath_make_grh - construct a GRH header + * @dev: a pointer to the ipath device + * @hdr: a pointer to the GRH header being constructed + * @grh: the global route address to send to + * @hwords: the 
number of 32 bit words of header being sent + * @nwords: the number of 32 bit words of data being sent + * + * Return the size of the header in 32 bit words. + */ +u32 ipath_make_grh(struct ipath_ibdev *dev, struct ib_grh *hdr, + struct ib_global_route *grh, u32 hwords, u32 nwords) +{ + hdr->version_tclass_flow = + cpu_to_be32((6 << 28) | + (grh->traffic_class << 20) | + grh->flow_label); + hdr->paylen = cpu_to_be16((hwords - 2 + nwords + SIZE_OF_CRC) << 2); + /* next_hdr is defined by C8-7 in ch. 8.4.1 */ + hdr->next_hdr = 0x1B; + hdr->hop_limit = grh->hop_limit; + /* The SGID is 32-bit aligned. */ + hdr->sgid.global.subnet_prefix = dev->gid_prefix; + hdr->sgid.global.interface_id = ipath_layer_get_guid(dev->dd); + hdr->dgid = grh->dgid; + + /* GRH header size in 32-bit words. */ + return sizeof(struct ib_grh) / sizeof(u32); +} + +/** + * ipath_do_ruc_send - perform a send on an RC or UC QP + * @data: contains a pointer to the QP + * + * Process entries in the send work queue until credit or queue is + * exhausted. Only allow one CPU to send a packet per QP (tasklet). + * Otherwise, after we drop the QP s_lock, two threads could send + * packets out of order. + */ +void ipath_do_ruc_send(unsigned long data) +{ + struct ipath_qp *qp = (struct ipath_qp *)data; + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + unsigned long flags; + u16 lrh0; + u32 nwords; + u32 extra_bytes; + u32 bth0; + u32 bth2; + u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); + struct ipath_other_headers *ohdr; + + if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags)) + goto bail; + + if (unlikely(qp->remote_ah_attr.dlid == + ipath_layer_get_lid(dev->dd))) { + ipath_ruc_loopback(qp); + goto clear; + } + + ohdr = &qp->s_hdr.u.oth; + if (qp->remote_ah_attr.ah_flags & IB_AH_GRH) + ohdr = &qp->s_hdr.u.l.oth; + +again: + /* Check for a constructed packet to be sent. */ + if (qp->s_hdrwords != 0) { + /* + * If no PIO bufs are available, return. 
An interrupt will + * call ipath_ib_piobufavail() when one is available. + */ + if (ipath_verbs_send(dev->dd, qp->s_hdrwords, + (u32 *) &qp->s_hdr, qp->s_cur_size, + qp->s_cur_sge)) { + ipath_no_bufs_available(qp, dev); + goto bail; + } + dev->n_unicast_xmit++; + /* Record that we sent the packet and s_hdr is empty. */ + qp->s_hdrwords = 0; + } + + /* + * The lock is needed to synchronize between setting + * qp->s_ack_state, resend timer, and post_send(). + */ + spin_lock_irqsave(&qp->s_lock, flags); + + /* Sending responses has higher priority over sending requests. */ + if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE && + (bth0 = ipath_make_rc_ack(qp, ohdr, pmtu)) != 0) + bth2 = qp->s_ack_psn++ & IPS_PSN_MASK; + else if (!((qp->ibqp.qp_type == IB_QPT_RC) ? + ipath_make_rc_req(qp, ohdr, pmtu, &bth0, &bth2) : + ipath_make_uc_req(qp, ohdr, pmtu, &bth0, &bth2))) { + /* + * Clear the busy bit before unlocking to avoid races with + * adding new work queue items and then failing to process + * them. + */ + clear_bit(IPATH_S_BUSY, &qp->s_flags); + spin_unlock_irqrestore(&qp->s_lock, flags); + goto bail; + } + + spin_unlock_irqrestore(&qp->s_lock, flags); + + /* Construct the header. 
*/ + extra_bytes = (4 - qp->s_cur_size) & 3; + nwords = (qp->s_cur_size + extra_bytes) >> 2; + lrh0 = IPS_LRH_BTH; + if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { + qp->s_hdrwords += ipath_make_grh(dev, &qp->s_hdr.u.l.grh, + &qp->remote_ah_attr.grh, + qp->s_hdrwords, nwords); + lrh0 = IPS_LRH_GRH; + } + lrh0 |= qp->remote_ah_attr.sl << 4; + qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); + qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); + qp->s_hdr.lrh[2] = cpu_to_be16(qp->s_hdrwords + nwords + + SIZE_OF_CRC); + qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd)); + bth0 |= ipath_layer_get_pkey(dev->dd, qp->s_pkey_index); + bth0 |= extra_bytes << 20; + ohdr->bth[0] = cpu_to_be32(bth0); + ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); + ohdr->bth[2] = cpu_to_be32(bth2); + + /* Check for more work to do. */ + goto again; + +clear: + clear_bit(IPATH_S_BUSY, &qp->s_flags); +bail: + return; +} diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_uc.c --- a/drivers/infiniband/hw/ipath/ipath_uc.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_uc.c Fri May 12 15:55:28 2006 -0700 @@ -61,90 +61,40 @@ static void complete_last_send(struct ip } /** - * ipath_do_uc_send - do a send on a UC queue - * @data: contains a pointer to the QP to send on - * - * Process entries in the send work queue until the queue is exhausted. - * Only allow one CPU to send a packet per QP (tasklet). - * Otherwise, after we drop the QP lock, two threads could send - * packets out of order. - * This is similar to ipath_do_rc_send() below except we don't have - * timeouts or resends. + * ipath_make_uc_req - construct a request packet (SEND, RDMA write) + * @qp: a pointer to the QP + * @ohdr: a pointer to the IB header being constructed + * @pmtu: the path MTU + * @bth0p: pointer to the BTH opcode word + * @bth2p: pointer to the BTH PSN word + * + * Return 1 if constructed; otherwise, return 0. 
+ * Note the QP s_lock must be held and interrupts disabled. */ -void ipath_do_uc_send(unsigned long data) +int ipath_make_uc_req(struct ipath_qp *qp, + struct ipath_other_headers *ohdr, + u32 pmtu, u32 *bth0p, u32 *bth2p) { - struct ipath_qp *qp = (struct ipath_qp *)data; - struct ipath_ibdev *dev = to_idev(qp->ibqp.device); struct ipath_swqe *wqe; - unsigned long flags; - u16 lrh0; u32 hwords; - u32 nwords; - u32 extra_bytes; u32 bth0; - u32 bth2; - u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu); u32 len; - struct ipath_other_headers *ohdr; struct ib_wc wc; - if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags)) - goto bail; - - if (unlikely(qp->remote_ah_attr.dlid == - ipath_layer_get_lid(dev->dd))) { - /* Pass in an uninitialized ib_wc to save stack space. */ - ipath_ruc_loopback(qp, &wc); - clear_bit(IPATH_S_BUSY, &qp->s_flags); - goto bail; - } - - ohdr = &qp->s_hdr.u.oth; - if (qp->remote_ah_attr.ah_flags & IB_AH_GRH) - ohdr = &qp->s_hdr.u.l.oth; - -again: - /* Check for a constructed packet to be sent. */ - if (qp->s_hdrwords != 0) { - /* - * If no PIO bufs are available, return. - * An interrupt will call ipath_ib_piobufavail() - * when one is available. - */ - if (ipath_verbs_send(dev->dd, qp->s_hdrwords, - (u32 *) &qp->s_hdr, - qp->s_cur_size, - qp->s_cur_sge)) { - ipath_no_bufs_available(qp, dev); - goto bail; - } - dev->n_unicast_xmit++; - /* Record that we sent the packet and s_hdr is empty. */ - qp->s_hdrwords = 0; - } - - lrh0 = IPS_LRH_BTH; + if (!(ib_ipath_state_ops[qp->state] & IPATH_PROCESS_SEND_OK)) + goto done; + /* header size in 32-bit words LRH+BTH = (8+12)/4. */ hwords = 5; - - /* - * The lock is needed to synchronize between - * setting qp->s_ack_state and post_send(). - */ - spin_lock_irqsave(&qp->s_lock, flags); - - if (!(ib_ipath_state_ops[qp->state] & IPATH_PROCESS_SEND_OK)) - goto done; - - bth0 = ipath_layer_get_pkey(dev->dd, qp->s_pkey_index); - - /* Send a request. */ + bth0 = 0; + + /* Get the next send request. 
*/ wqe = get_swqe_ptr(qp, qp->s_last); switch (qp->s_state) { default: /* - * Signal the completion of the last send (if there is - * one). + * Signal the completion of the last send + * (if there is one). */ if (qp->s_last != qp->s_tail) complete_last_send(qp, wqe, &wc); @@ -257,61 +207,16 @@ again: } break; } - bth2 = qp->s_next_psn++ & IPS_PSN_MASK; qp->s_len -= len; - bth0 |= qp->s_state << 24; - - spin_unlock_irqrestore(&qp->s_lock, flags); - - /* Construct the header. */ - extra_bytes = (4 - len) & 3; - nwords = (len + extra_bytes) >> 2; - if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) { - /* Header size in 32-bit words. */ - hwords += 10; - lrh0 = IPS_LRH_GRH; - qp->s_hdr.u.l.grh.version_tclass_flow = - cpu_to_be32((6 << 28) | - (qp->remote_ah_attr.grh.traffic_class - << 20) | - qp->remote_ah_attr.grh.flow_label); - qp->s_hdr.u.l.grh.paylen = - cpu_to_be16(((hwords - 12) + nwords + - SIZE_OF_CRC) << 2); - /* next_hdr is defined by C8-7 in ch. 8.4.1 */ - qp->s_hdr.u.l.grh.next_hdr = 0x1B; - qp->s_hdr.u.l.grh.hop_limit = - qp->remote_ah_attr.grh.hop_limit; - /* The SGID is 32-bit aligned. */ - qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = - dev->gid_prefix; - qp->s_hdr.u.l.grh.sgid.global.interface_id = - ipath_layer_get_guid(dev->dd); - qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid; - } qp->s_hdrwords = hwords; qp->s_cur_sge = &qp->s_sge; qp->s_cur_size = len; - lrh0 |= qp->remote_ah_attr.sl << 4; - qp->s_hdr.lrh[0] = cpu_to_be16(lrh0); - /* DEST LID */ - qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid); - qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC); - qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd)); - bth0 |= extra_bytes << 20; - ohdr->bth[0] = cpu_to_be32(bth0); - ohdr->bth[1] = cpu_to_be32(qp->remote_qpn); - ohdr->bth[2] = cpu_to_be32(bth2); - - /* Check for more work to do. 
*/ - goto again; + *bth0p = bth0 | (qp->s_state << 24); + *bth2p = qp->s_next_psn++ & IPS_PSN_MASK; + return 1; done: - spin_unlock_irqrestore(&qp->s_lock, flags); - clear_bit(IPATH_S_BUSY, &qp->s_flags); - -bail: - return; + return 0; } /** @@ -535,12 +440,13 @@ void ipath_uc_rcv(struct ipath_ibdev *de if (qp->r_len != 0) { u32 rkey = be32_to_cpu(reth->rkey); u64 vaddr = be64_to_cpu(reth->vaddr); + int ok; /* Check rkey */ - if (unlikely(!ipath_rkey_ok( - dev, &qp->r_sge, qp->r_len, - vaddr, rkey, - IB_ACCESS_REMOTE_WRITE))) { + ok = ipath_rkey_ok(dev, &qp->r_sge, qp->r_len, + vaddr, rkey, + IB_ACCESS_REMOTE_WRITE); + if (unlikely(!ok)) { dev->n_pkt_drops++; goto done; } @@ -558,8 +464,7 @@ void ipath_uc_rcv(struct ipath_ibdev *de } if (opcode == OP(RDMA_WRITE_ONLY)) goto rdma_last; - else if (opcode == - OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE)) + else if (opcode == OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE)) goto rdma_last_imm; /* FALLTHROUGH */ case OP(RDMA_WRITE_MIDDLE): @@ -592,9 +497,9 @@ void ipath_uc_rcv(struct ipath_ibdev *de dev->n_pkt_drops++; goto done; } - if (qp->r_reuse_sge) { + if (qp->r_reuse_sge) qp->r_reuse_sge = 0; - } else if (!ipath_get_rwqe(qp, 1)) { + else if (!ipath_get_rwqe(qp, 1)) { dev->n_pkt_drops++; goto done; } diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 @@ -246,7 +246,7 @@ static int ipath_post_send(struct ib_qp switch (qp->ibqp.qp_type) { case IB_QPT_UC: case IB_QPT_RC: - err = ipath_post_rc_send(qp, wr); + err = ipath_post_ruc_send(qp, wr); break; case IB_QPT_SMI: diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 @@ -585,10 +585,6 @@ void ipath_sqerror_qp(struct ipath_qp 
*q void ipath_get_credit(struct ipath_qp *qp, u32 aeth); -void ipath_do_rc_send(unsigned long data); - -void ipath_do_uc_send(unsigned long data); - void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig); int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, @@ -601,7 +597,7 @@ void ipath_copy_sge(struct ipath_sge_sta void ipath_skip_sge(struct ipath_sge_state *ss, u32 length); -int ipath_post_rc_send(struct ipath_qp *qp, struct ib_send_wr *wr); +int ipath_post_ruc_send(struct ipath_qp *qp, struct ib_send_wr *wr); void ipath_uc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr, int has_grh, void *data, u32 tlen, struct ipath_qp *qp); @@ -683,7 +679,19 @@ void ipath_insert_rnr_queue(struct ipath int ipath_get_rwqe(struct ipath_qp *qp, int wr_id_only); -void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc); +u32 ipath_make_grh(struct ipath_ibdev *dev, struct ib_grh *hdr, + struct ib_global_route *grh, u32 hwords, u32 nwords); + +void ipath_do_ruc_send(unsigned long data); + +u32 ipath_make_rc_ack(struct ipath_qp *qp, struct ipath_other_headers *ohdr, + u32 pmtu); + +int ipath_make_rc_req(struct ipath_qp *qp, struct ipath_other_headers *ohdr, + u32 pmtu, u32 *bth0p, u32 *bth2p); + +int ipath_make_uc_req(struct ipath_qp *qp, struct ipath_other_headers *ohdr, + u32 pmtu, u32 *bth0p, u32 *bth2p); extern const enum ib_wc_opcode ib_ipath_wc_opcode[]; From bos at pathscale.com Fri May 12 16:43:02 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:02 -0700 Subject: [openib-general] [PATCH 17 of 53] ipath - fail properly if GID missing In-Reply-To: Message-ID: Return -EINVAL if we can't find a multicast GID. 
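A minimal sketch of the behaviour described above, assuming a simplified user-space linked list rather than the driver's real multicast lookup. The names struct mcast_entry and mcast_detach() are hypothetical and exist only for illustration; the point is that a failed lookup returns -EINVAL rather than 0:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a multicast group entry. */
struct mcast_entry {
	unsigned char gid[16];		/* multicast GID as raw bytes */
	int attached;			/* non-zero while a QP is attached */
	struct mcast_entry *next;
};

/*
 * Detach from the group whose GID matches @gid.
 * Returns 0 on success, or -EINVAL if no entry carries that GID.
 */
static int mcast_detach(struct mcast_entry *head, const unsigned char *gid)
{
	struct mcast_entry *n;

	for (n = head; n != NULL; n = n->next) {
		if (memcmp(n->gid, gid, sizeof(n->gid)) == 0) {
			n->attached = 0;
			return 0;
		}
	}
	/* Not found: report the error instead of claiming success. */
	return -EINVAL;
}
```

With the old "ret = 0" behaviour a caller could not tell a successful detach from a request naming a GID that was never attached; returning -EINVAL makes the two cases distinguishable.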
Signed-off-by: Bryan O'Sullivan

diff -r 176d1f0c26a3 -r c5f3731224bb drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c	Fri May 12 15:55:28 2006 -0700
@@ -272,7 +272,7 @@ int ipath_multicast_detach(struct ib_qp
 	while (1) {
 		if (n == NULL) {
 			spin_unlock_irqrestore(&mcast_lock, flags);
-			ret = 0;
+			ret = -EINVAL;
 			goto bail;
 		}
 

From bos at pathscale.com Fri May 12 16:43:25 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:25 -0700
Subject: [openib-general] [PATCH 40 of 53] ipath - remember to drop spinlock
In-Reply-To:
Message-ID: <160a111381ae9f6f5df2.1147477405@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r 5b565c24d62a -r 160a111381ae drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c	Fri May 12 15:55:29 2006 -0700
@@ -1505,8 +1505,10 @@ void ipath_rc_rcv(struct ipath_ibdev *de
 			ok = ipath_rkey_ok(dev, &qp->s_rdma_sge,
 					   qp->s_rdma_len, vaddr, rkey,
 					   IB_ACCESS_REMOTE_READ);
-			if (unlikely(!ok))
+			if (unlikely(!ok)) {
+				spin_unlock_irq(&qp->s_lock);
 				goto nack_acc;
+			}
 			/*
 			 * Update the next expected PSN. We add 1 later
 			 * below, so only add the remainder here.

From bos at pathscale.com Fri May 12 16:43:10 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:10 -0700
Subject: [openib-general] [PATCH 25 of 53] ipath - remove some duplicated lines of code
In-Reply-To:
Message-ID: <2b7918a7133eafcc21bb.1147477390@eng-12.pathscale.com>

Cosmetic fixes.
Signed-off-by: Bryan O'Sullivan

diff -r e468ad0bd83e -r 2b7918a7133e drivers/infiniband/hw/ipath/ipath_ht400.c
--- a/drivers/infiniband/hw/ipath/ipath_ht400.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ht400.c	Fri May 12 15:55:28 2006 -0700
@@ -1555,7 +1555,6 @@ void ipath_init_ht400_funcs(struct ipath
 	dd->ipath_f_reset = ipath_setup_ht_reset;
 	dd->ipath_f_get_boardname = ipath_ht_boardname;
 	dd->ipath_f_init_hwerrors = ipath_ht_init_hwerrors;
-	dd->ipath_f_init_hwerrors = ipath_ht_init_hwerrors;
 	dd->ipath_f_early_init = ipath_ht_early_init;
 	dd->ipath_f_handle_hwerrors = ipath_ht_handle_hwerrors;
 	dd->ipath_f_quiet_serdes = ipath_ht_quiet_serdes;
diff -r e468ad0bd83e -r 2b7918a7133e drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c	Fri May 12 15:55:28 2006 -0700
@@ -513,9 +513,6 @@ int ipath_modify_qp(struct ib_qp *ibqp,
 	if (attr_mask & IB_QP_QKEY)
 		qp->qkey = attr->qkey;
 
-	if (attr_mask & IB_QP_PKEY_INDEX)
-		qp->s_pkey_index = attr->pkey_index;
-
 	qp->state = new_state;
 	spin_unlock_irqrestore(&qp->s_lock, flags);
 

From bos at pathscale.com Fri May 12 16:43:34 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:34 -0700
Subject: [openib-general] [PATCH 49 of 53] ipath - NULL-terminate pci_device_id table
In-Reply-To:
Message-ID: <40532fdc53f0f1befcd7.1147477414@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r 49b446b12f16 -r 40532fdc53f0 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Fri May 12 15:55:29 2006 -0700
@@ -120,6 +120,7 @@ static const struct pci_device_id ipath_
 		    PCI_DEVICE_ID_INFINIPATH_HT)},
 	{PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE,
 		    PCI_DEVICE_ID_INFINIPATH_PE800)},
+	{0}
 };
 
 MODULE_DEVICE_TABLE(pci, ipath_pci_tbl);

From bos at pathscale.com Fri May 12 16:43:21 2006
From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:21 -0700 Subject: [openib-general] [PATCH 36 of 53] ipath - count local link integrity errors In-Reply-To: Message-ID: Signed-off-by: Bryan O'Sullivan diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700 @@ -446,6 +446,8 @@ static int __devinit ipath_init_one(stru * by ipath_setup_htconfig. */ dd->ipath_flags = 0; + dd->ipath_lli_counter = 0; + dd->ipath_lli_errors = 0; if (dd->ipath_f_bus(dd, pdev)) ipath_dev_err(dd, "Failed to setup config space; " @@ -927,6 +929,18 @@ void ipath_kreceive(struct ipath_devdata "tlen=%x opcode=%x egridx=%x: %s\n", eflags, l, etype, tlen, bthbytes[0], ips_get_index((__le32 *) rc), emsg); + /* Count local link integrity errors. */ + if (eflags & (INFINIPATH_RHF_H_ICRCERR | + INFINIPATH_RHF_H_VCRCERR)) { + u8 n = (dd->ipath_ibcctrl >> + INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) & + INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK; + + if (++dd->ipath_lli_counter > n) { + dd->ipath_lli_counter = 0; + dd->ipath_lli_errors++; + } + } } else if (etype == RCVHQ_RCV_TYPE_NON_KD) { int ret = __ipath_verbs_rcv(dd, rc + 1, ebuf, tlen); @@ -934,6 +948,8 @@ void ipath_kreceive(struct ipath_devdata ipath_cdbg(VERBOSE, "received IB packet, " "not SMA (QP=%x)\n", qp); + if (dd->ipath_lli_counter) + dd->ipath_lli_counter--; } else if (etype == RCVHQ_RCV_TYPE_EAGER) { if (qp == IPATH_KD_QP && bthbytes[0] == ipath_layer_rcv_opcode && @@ -1864,19 +1880,19 @@ static void __exit infinipath_cleanup(vo } else ipath_dbg("irq is 0, not doing free_irq " "for unit %u\n", dd->ipath_unit); + + /* + * we check for NULL here, because it's outside + * the kregbase check, and we need to call it + * after the free_irq. Thus it's possible that + * the function pointers were never initialized. 
+ */ + if (dd->ipath_f_cleanup) + /* clean up chip-specific stuff */ + dd->ipath_f_cleanup(dd); + dd->pcidev = NULL; } - - /* - * we check for NULL here, because it's outside the kregbase - * check, and we need to call it after the free_irq. Thus - * it's possible that the function pointers were never - * initialized. - */ - if (dd->ipath_f_cleanup) - /* clean up chip-specific stuff */ - dd->ipath_f_cleanup(dd); - spin_lock_irqsave(&ipath_devs_lock, flags); } diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700 @@ -261,6 +261,7 @@ static void handle_e_ibstatuschanged(str | IPATH_LINKACTIVE | IPATH_LINKARMED); *dd->ipath_statusp &= ~IPATH_STATUS_IB_READY; + dd->ipath_lli_counter = 0; if (!noprint) { if (((dd->ipath_lastibcstat >> INFINIPATH_IBCS_LINKSTATE_SHIFT) & diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700 @@ -509,6 +509,11 @@ struct ipath_devdata { u8 ipath_pci_cacheline; /* LID mask control */ u8 ipath_lmc; + + /* local link integrity counter */ + u32 ipath_lli_counter; + /* local link integrity errors */ + u32 ipath_lli_errors; }; diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_layer.c --- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700 @@ -1013,6 +1013,11 @@ int ipath_layer_get_counters(struct ipat ipath_snap_cntr(dd, dd->ipath_cregs->cr_ibsymbolerrcnt); cntrs->link_error_recovery_counter = ipath_snap_cntr(dd, dd->ipath_cregs->cr_iblinkerrrecovcnt); + /* + * The link downed counter counts when the other side downs the + * connection. 
We add in the number of times we downed the link + * due to local link integrity errors to compensate. + */ cntrs->link_downed_counter = ipath_snap_cntr(dd, dd->ipath_cregs->cr_iblinkdowncnt); cntrs->port_rcv_errors = @@ -1037,6 +1042,8 @@ int ipath_layer_get_counters(struct ipat ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktsendcnt); cntrs->port_rcv_packets = ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktrcvcnt); + cntrs->local_link_integrity_errors = dd->ipath_lli_errors; + cntrs->excessive_buffer_overrun_errors = 0; /* XXX */ ret = 0; diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_layer.h --- a/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 15:55:28 2006 -0700 @@ -54,6 +54,8 @@ struct ipath_layer_counters { u64 port_rcv_data; u64 port_xmit_packets; u64 port_rcv_packets; + u32 local_link_integrity_errors; + u32 excessive_buffer_overrun_errors; }; /* diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_mad.c --- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700 @@ -646,6 +646,8 @@ struct ib_pma_portcounters { #define IB_PMA_SEL_PORT_RCV_ERRORS __constant_htons(0x0008) #define IB_PMA_SEL_PORT_RCV_REMPHYS_ERRORS __constant_htons(0x0010) #define IB_PMA_SEL_PORT_XMIT_DISCARDS __constant_htons(0x0040) +#define IB_PMA_SEL_LOCAL_LINK_INTEGRITY_ERRORS __constant_htons(0x0200) +#define IB_PMA_SEL_EXCESSIVE_BUFFER_OVERRUNS __constant_htons(0x0400) #define IB_PMA_SEL_PORT_VL15_DROPPED __constant_htons(0x0800) #define IB_PMA_SEL_PORT_XMIT_DATA __constant_htons(0x1000) #define IB_PMA_SEL_PORT_RCV_DATA __constant_htons(0x2000) @@ -893,6 +895,10 @@ static int recv_pma_get_portcounters(str cntrs.port_rcv_data -= dev->n_port_rcv_data; cntrs.port_xmit_packets -= dev->n_port_xmit_packets; cntrs.port_rcv_packets -= dev->n_port_rcv_packets; + 
cntrs.local_link_integrity_errors -= + dev->z_local_link_integrity_errors; + cntrs.excessive_buffer_overrun_errors -= + dev->z_excessive_buffer_overrun_errors; memset(pmp->data, 0, sizeof(pmp->data)); @@ -930,6 +936,12 @@ static int recv_pma_get_portcounters(str else p->port_xmit_discards = cpu_to_be16((u16)cntrs.port_xmit_discards); + if (cntrs.local_link_integrity_errors > 0xFUL) + cntrs.local_link_integrity_errors = 0xFUL; + if (cntrs.excessive_buffer_overrun_errors > 0xFUL) + cntrs.excessive_buffer_overrun_errors = 0xFUL; + p->lli_ebor_errors = (cntrs.local_link_integrity_errors << 4) | + cntrs.excessive_buffer_overrun_errors; if (dev->n_vl15_dropped > 0xFFFFUL) p->vl15_dropped = __constant_cpu_to_be16(0xFFFF); else @@ -1028,6 +1040,14 @@ static int recv_pma_set_portcounters(str if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DISCARDS) dev->n_port_xmit_discards = cntrs.port_xmit_discards; + if (p->counter_select & IB_PMA_SEL_LOCAL_LINK_INTEGRITY_ERRORS) + dev->z_local_link_integrity_errors = + cntrs.local_link_integrity_errors; + + if (p->counter_select & IB_PMA_SEL_EXCESSIVE_BUFFER_OVERRUNS) + dev->z_excessive_buffer_overrun_errors = + cntrs.excessive_buffer_overrun_errors; + if (p->counter_select & IB_PMA_SEL_PORT_VL15_DROPPED) dev->n_vl15_dropped = 0; diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700 @@ -1046,6 +1046,10 @@ static void *ipath_register_ib_device(in idev->n_port_rcv_data = cntrs.port_rcv_data; idev->n_port_xmit_packets = cntrs.port_xmit_packets; idev->n_port_rcv_packets = cntrs.port_rcv_packets; + idev->z_local_link_integrity_errors = + cntrs.local_link_integrity_errors; + idev->z_excessive_buffer_overrun_errors = + cntrs.excessive_buffer_overrun_errors; /* * The system image GUID is supposed to be the same for all diff -r e29625bd9050 -r ec1934faf5d1 
drivers/infiniband/hw/ipath/ipath_verbs.h --- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700 @@ -459,6 +459,8 @@ struct ipath_ibdev { u64 n_port_xmit_packets; /* starting count for PMA */ u64 n_port_rcv_packets; /* starting count for PMA */ u32 n_pkey_violations; /* starting count for PMA */ + u32 z_local_link_integrity_errors; /* starting count for PMA */ + u32 z_excessive_buffer_overrun_errors; /* starting count for PMA */ u32 n_rc_resends; u32 n_rc_acks; u32 n_rc_qacks; From bos at pathscale.com Fri May 12 16:43:27 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:27 -0700 Subject: [openib-general] [PATCH 42 of 53] ipath - increment pointer properly when doing a diag read In-Reply-To: Message-ID: <0aba84dce5063d74c3da.1147477407@eng-12.pathscale.com> Signed-off-by: Bryan O'Sullivan diff -r 83f1832c6015 -r 0aba84dce506 drivers/infiniband/hw/ipath/ipath_diag.c --- a/drivers/infiniband/hw/ipath/ipath_diag.c Fri May 12 15:55:29 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_diag.c Fri May 12 15:55:29 2006 -0700 @@ -113,7 +113,7 @@ static int ipath_read_umem64(struct ipat goto bail; } reg_addr++; - uaddr++; + uaddr += sizeof(u64); } ret = 0; bail: @@ -153,7 +153,7 @@ static int ipath_write_umem64(struct ipa writeq(data, reg_addr); reg_addr++; - uaddr++; + uaddr += sizeof(u64); } ret = 0; bail: @@ -191,7 +191,8 @@ static int ipath_read_umem32(struct ipat } reg_addr++; - uaddr++; + uaddr += sizeof(u32); + } ret = 0; bail: @@ -230,7 +231,7 @@ static int ipath_write_umem32(struct ipa writel(data, reg_addr); reg_addr++; - uaddr++; + uaddr += sizeof(u32); } ret = 0; bail: From bos at pathscale.com Fri May 12 16:43:24 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:24 -0700 Subject: [openib-general] [PATCH 39 of 53] ipath - count PE800 receive interrupts on user ports In-Reply-To: Message-ID: 
<5b565c24d62ad0e355ae.1147477404@eng-12.pathscale.com>

Fixed so it works on the PE-800. It had not previously been updated to
match PE-800 receive interrupt differences from HT-400.

Signed-off-by: Bryan O'Sullivan

diff -r e9306861dc6a -r 5b565c24d62a drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c	Fri May 12 15:55:28 2006 -0700
@@ -1172,6 +1172,10 @@ static unsigned int ipath_poll(struct fi
 
 	if (tail == head) {
 		set_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag);
+		if(dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */
+			(void)ipath_write_ureg(dd, ur_rcvhdrhead,
+					       dd->ipath_rhdrhead_intr_off
+					       | head, pd->port_port);
 		poll_wait(fp, &pd->port_wait, pt);
 
 		if (test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) {

From bos at pathscale.com Fri May 12 16:43:36 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:36 -0700
Subject: [openib-general] [PATCH 51 of 53] ipath - fix reporting of vendor ID and a few other trivial bits
In-Reply-To:
Message-ID: <5f665c503f0d2c2a9a0e.1147477416@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r bd1de2e983db -r 5f665c503f0d drivers/infiniband/hw/ipath/ipath_layer.c
--- a/drivers/infiniband/hw/ipath/ipath_layer.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c	Fri May 12 16:41:45 2006 -0700
@@ -339,18 +339,26 @@ u32 ipath_layer_get_nguid(struct ipath_d
 
 EXPORT_SYMBOL_GPL(ipath_layer_get_nguid);
 
-int ipath_layer_query_device(struct ipath_devdata *dd, u32 * vendor,
-			     u32 * boardrev, u32 * majrev, u32 * minrev)
-{
-	*vendor = dd->ipath_vendorid;
-	*boardrev = dd->ipath_boardrev;
-	*majrev = dd->ipath_majrev;
-	*minrev = dd->ipath_minrev;
-
-	return 0;
-}
-
-EXPORT_SYMBOL_GPL(ipath_layer_query_device);
+u32 ipath_layer_get_majrev(struct ipath_devdata *dd)
+{
+	return dd->ipath_majrev;
+}
+
+EXPORT_SYMBOL_GPL(ipath_layer_get_majrev);
+
+u32
ipath_layer_get_minrev(struct ipath_devdata *dd)
+{
+	return dd->ipath_minrev;
+}
+
+EXPORT_SYMBOL_GPL(ipath_layer_get_minrev);
+
+u32 ipath_layer_get_pcirev(struct ipath_devdata *dd)
+{
+	return dd->ipath_pcirev;
+}
+
+EXPORT_SYMBOL_GPL(ipath_layer_get_pcirev);
 
 u32 ipath_layer_get_flags(struct ipath_devdata *dd)
 {
@@ -372,6 +380,13 @@ u16 ipath_layer_get_deviceid(struct ipat
 }
 
 EXPORT_SYMBOL_GPL(ipath_layer_get_deviceid);
+
+u32 ipath_layer_get_vendorid(struct ipath_devdata *dd)
+{
+	return dd->ipath_vendorid;
+}
+
+EXPORT_SYMBOL_GPL(ipath_layer_get_vendorid);
 
 u64 ipath_layer_get_lastibcstat(struct ipath_devdata *dd)
 {
diff -r bd1de2e983db -r 5f665c503f0d drivers/infiniband/hw/ipath/ipath_layer.h
--- a/drivers/infiniband/hw/ipath/ipath_layer.h	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.h	Fri May 12 16:41:45 2006 -0700
@@ -145,11 +145,13 @@ int ipath_layer_set_guid(struct ipath_de
 int ipath_layer_set_guid(struct ipath_devdata *, __be64 guid);
 __be64 ipath_layer_get_guid(struct ipath_devdata *);
 u32 ipath_layer_get_nguid(struct ipath_devdata *);
-int ipath_layer_query_device(struct ipath_devdata *, u32 * vendor,
-			     u32 * boardrev, u32 * majrev, u32 * minrev);
+u32 ipath_layer_get_majrev(struct ipath_devdata *);
+u32 ipath_layer_get_minrev(struct ipath_devdata *);
+u32 ipath_layer_get_pcirev(struct ipath_devdata *);
 u32 ipath_layer_get_flags(struct ipath_devdata *dd);
 struct device *ipath_layer_get_device(struct ipath_devdata *dd);
 u16 ipath_layer_get_deviceid(struct ipath_devdata *dd);
+u32 ipath_layer_get_vendorid(struct ipath_devdata *);
 u64 ipath_layer_get_lastibcstat(struct ipath_devdata *dd);
 u32 ipath_layer_get_ibmtu(struct ipath_devdata *dd);
 int ipath_layer_enable_timer(struct ipath_devdata *dd);
diff -r bd1de2e983db -r 5f665c503f0d drivers/infiniband/hw/ipath/ipath_mad.c
--- a/drivers/infiniband/hw/ipath/ipath_mad.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c	Fri May 12 16:41:45 2006 -0700
@@
-84,7 +84,7 @@ static int recv_subn_get_nodeinfo(struct
 {
 	struct nodeinfo *nip = (struct nodeinfo *)&smp->data;
 	struct ipath_devdata *dd = to_idev(ibdev)->dd;
-	u32 vendor, boardid, majrev, minrev;
+	u32 vendor, majrev, minrev;
 
 	if (smp->attr_mod)
 		smp->status |= IB_SMP_INVALID_FIELD;
@@ -104,9 +104,11 @@ static int recv_subn_get_nodeinfo(struct
 	nip->port_guid = nip->sys_guid;
 	nip->partition_cap = cpu_to_be16(ipath_layer_get_npkeys(dd));
 	nip->device_id = cpu_to_be16(ipath_layer_get_deviceid(dd));
-	ipath_layer_query_device(dd, &vendor, &boardid, &majrev, &minrev);
+	majrev = ipath_layer_get_majrev(dd);
+	minrev = ipath_layer_get_minrev(dd);
 	nip->revision = cpu_to_be32((majrev << 16) | minrev);
 	nip->local_port_num = port;
+	vendor = ipath_layer_get_vendorid(dd);
 	nip->vendor_id[0] = 0;
 	nip->vendor_id[1] = vendor >> 8;
 	nip->vendor_id[2] = vendor;
diff -r bd1de2e983db -r 5f665c503f0d drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Fri May 12 16:41:45 2006 -0700
@@ -620,18 +620,15 @@ static int ipath_query_device(struct ib_
 			      struct ib_device_attr *props)
 {
 	struct ipath_ibdev *dev = to_idev(ibdev);
-	u32 vendor, boardrev, majrev, minrev;
 
 	memset(props, 0, sizeof(*props));
 
 	props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
 		IB_DEVICE_BAD_QKEY_CNTR | IB_DEVICE_SHUTDOWN_PORT |
 		IB_DEVICE_SYS_IMAGE_GUID;
-	ipath_layer_query_device(dev->dd, &vendor, &boardrev,
-				 &majrev, &minrev);
-	props->vendor_id = vendor;
-	props->vendor_part_id = boardrev;
-	props->hw_ver = boardrev << 16 | majrev << 8 | minrev;
+	props->vendor_id = ipath_layer_get_vendorid(dev->dd);
+	props->vendor_part_id = ipath_layer_get_deviceid(dev->dd);
+	props->hw_ver = ipath_layer_get_pcirev(dev->dd);
 
 	props->sys_image_guid = dev->sys_image_guid;
 
@@ -1220,11 +1217,8 @@ static ssize_t show_rev(struct class_dev
 {
 	struct ipath_ibdev *dev =
 		container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
-	int
vendor, boardrev, majrev, minrev;
-
-	ipath_layer_query_device(dev->dd, &vendor, &boardrev,
-				 &majrev, &minrev);
-	return sprintf(buf, "%d.%d\n", majrev, minrev);
+
+	return sprintf(buf, "%x\n", ipath_layer_get_pcirev(dev->dd));
 }
 
 static ssize_t show_hca(struct class_device *cdev, char *buf)
@@ -1253,7 +1247,7 @@ static ssize_t show_stats(struct class_d
 	len = sprintf(buf,
 		      "RC resends %d\n"
 		      "RC no QACK %d\n"
-		      "RC ACKs %d\n"
+		      "RC ACKs %d\n"
 		      "RC SEQ NAKs %d\n"
 		      "RC RDMA seq %d\n"
 		      "RC RNR NAKs %d\n"
@@ -1263,7 +1257,7 @@ static ssize_t show_stats(struct class_d
 		      "piobuf wait %d\n"
 		      "no piobuf %d\n"
 		      "PKT drops %d\n"
-		      "WQE errs %d\n",
+		      "WQE errs %d\n",
 		      dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks,
 		      dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks,
 		      dev->n_other_naks, dev->n_timeouts,

From bos at pathscale.com Fri May 12 16:43:31 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:31 -0700
Subject: [openib-general] [PATCH 46 of 53] ipath - enable GPIO interrupt on HT-460
In-Reply-To:
Message-ID: <04c86dd11b2780e114ab.1147477411@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_eeprom.c
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c	Fri May 12 15:55:29 2006 -0700
@@ -505,11 +505,10 @@ static u8 flash_csum(struct ipath_flash
  * ipath_get_guid - get the GUID from the i2c device
  * @dd: the infinipath device
  *
- * When we add the multi-chip support, we will probably have to add
- * the ability to use the number of guids field, and get the guid from
- * the first chip's flash, to use for all of them.
- */
-void ipath_get_guid(struct ipath_devdata *dd)
+ * We have the capability to use the ipath_nguid field, and get
+ * the guid from the first chip's flash, to use for all of them.
+ */
+void ipath_get_eeprom_info(struct ipath_devdata *dd)
 {
 	void *buf;
 	struct ipath_flash *ifp;
diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_ht400.c
--- a/drivers/infiniband/hw/ipath/ipath_ht400.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ht400.c	Fri May 12 15:55:29 2006 -0700
@@ -607,7 +607,12 @@ static int ipath_ht_boardname(struct ipa
 	case 4:		/* Ponderosa is one of the bringup boards */
 		n = "Ponderosa";
 		break;
-	case 5:		/* HT-460 original production board */
+	case 5:
+		/*
+		 * HT-460 original production board; two production levels, with
+		 * different serial number ranges. See ipath_ht_early_init() for
+		 * case where we enable IPATH_GPIO_INTR for later serial # range.
+		 */
 		n = "InfiniPath_HT-460";
 		break;
 	case 6:
@@ -1520,6 +1525,18 @@ static int ipath_ht_early_init(struct ip
 	 */
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
 			 INFINIPATH_S_ABORT);
+
+	ipath_get_eeprom_info(dd);
+	if(dd->ipath_boardrev == 5 && dd->ipath_serial[0] == '1' &&
+		dd->ipath_serial[1] == '2' && dd->ipath_serial[2] == '8') {
+		/*
+		 * Later production HT-460 has same changes as HT-465, so
+		 * can use GPIO interrupts. They have serial #'s starting
+		 * with 128, rather than 112.
+		 */
+		dd->ipath_flags |= IPATH_GPIO_INTR;
+		dd->ipath_flags &= ~IPATH_POLL_RX_INTR;
+	}
 
 	return 0;
 }
diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c	Fri May 12 15:55:29 2006 -0700
@@ -857,7 +857,6 @@ int ipath_init_chip(struct ipath_devdata
 
 done:
 	if (!ret) {
-		ipath_get_guid(dd);
 		*dd->ipath_statusp |= IPATH_STATUS_CHIP_PRESENT;
 		if (!dd->ipath_f_intrsetup(dd)) {
 			/* now we can enable all interrupts from the chip */
diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Fri May 12 15:55:29 2006 -0700
@@ -646,7 +646,7 @@ void ipath_init_pe800_funcs(struct ipath
 void ipath_init_pe800_funcs(struct ipath_devdata *);
 /* init HT-400-specific func */
 void ipath_init_ht400_funcs(struct ipath_devdata *);
-void ipath_get_guid(struct ipath_devdata *);
+void ipath_get_eeprom_info(struct ipath_devdata *);
 u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg);
 
 /*
diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_pe800.c
--- a/drivers/infiniband/hw/ipath/ipath_pe800.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_pe800.c	Fri May 12 15:55:29 2006 -0700
@@ -1180,6 +1180,8 @@ static int ipath_pe_early_init(struct ip
 	 */
 	dd->ipath_rhdrhead_intr_off = 1ULL<<32;
 
+	ipath_get_eeprom_info(dd);
+
 	return 0;
 }
 

From bos at pathscale.com Fri May 12 16:43:16 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:16 -0700
Subject: [openib-general] [PATCH 31 of 53] ipath - forbid sending of bad packet sizes
In-Reply-To:
Message-ID: <4868daa7f215e1546295.1147477396@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r b098b021b6fd -r 4868daa7f215 drivers/infiniband/hw/ipath/ipath_ud.c
---
a/drivers/infiniband/hw/ipath/ipath_ud.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ud.c	Fri May 12 15:55:28 2006 -0700
@@ -273,6 +273,11 @@ int ipath_post_ud_send(struct ipath_qp *
 		}
 		len += wr->sg_list[i].length;
 		ss.num_sge++;
+	}
+	/* Check for invalid packet size. */
+	if (len > ipath_layer_get_ibmtu(dev->dd)) {
+		ret = -EINVAL;
+		goto bail;
 	}
 	extra_bytes = (4 - len) & 3;
 	nwords = (len + extra_bytes) >> 2;

From bos at pathscale.com Fri May 12 16:43:38 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:38 -0700
Subject: [openib-general] [PATCH 53 of 53] ipath - add memory barrier when waiting for writes
In-Reply-To:
Message-ID:

Signed-off-by: Bryan O'Sullivan

diff -r fd9bdeea5b10 -r f8ebb8c1e436 drivers/infiniband/hw/ipath/ipath_eeprom.c
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c	Fri May 12 16:42:39 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c	Fri May 12 16:42:39 2006 -0700
@@ -185,6 +185,7 @@ bail:
  */
 static void i2c_wait_for_writes(struct ipath_devdata *dd)
 {
+	mb();
 	(void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
 }
 

From bos at pathscale.com Fri May 12 16:43:17 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:17 -0700
Subject: [openib-general] [PATCH 32 of 53] ipath - fix NULL dereference during cleanup
In-Reply-To:
Message-ID:

Fix NULL deref due to pcidev being clobbered before dd->ipath_f_cleanup()
was called.
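
The ordering problem can be shown in isolation with a minimal sketch. The
types and names below (pci_stub, dev_stub, chip_cleanup, teardown) are
hypothetical stand-ins chosen for illustration, not the driver's own; the
only point is that a cleanup hook which dereferences pcidev has to run
before pcidev is set to NULL, and that the hook itself may be NULL.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev. */
struct pci_stub {
	int enabled;
};

/* Hypothetical stand-in for the per-device data. */
struct dev_stub {
	struct pci_stub *pcidev;
	/* May be NULL if initialization never got far enough to set it. */
	void (*f_cleanup)(struct dev_stub *dd);
};

/* Chip-specific cleanup: dereferences pcidev, so it must still be valid. */
static void chip_cleanup(struct dev_stub *dd)
{
	dd->pcidev->enabled = 0;
}

/* Run the hook first (if there is one), then drop the pointer it uses. */
static void teardown(struct dev_stub *dd)
{
	if (dd->f_cleanup)
		dd->f_cleanup(dd);
	dd->pcidev = NULL;
}
```

Swapping the two statements in teardown() reproduces the class of bug the
description refers to: the hook would run with a NULL pcidev.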
Signed-off-by: Bryan O'Sullivan

diff -r 4868daa7f215 -r b9fd1a46c910 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Fri May 12 15:55:28 2006 -0700
@@ -1897,19 +1897,19 @@ static void __exit infinipath_cleanup(vo
 			} else
 				ipath_dbg("irq is 0, not doing free_irq "
 					  "for unit %u\n", dd->ipath_unit);
+
+			/*
+			 * we check for NULL here, because it's outside
+			 * the kregbase check, and we need to call it
+			 * after the free_irq. Thus it's possible that
+			 * the function pointers were never initialized.
+			 */
+			if (dd->ipath_f_cleanup)
+				/* clean up chip-specific stuff */
+				dd->ipath_f_cleanup(dd);
+
 			dd->pcidev = NULL;
 		}
-
-		/*
-		 * we check for NULL here, because it's outside the kregbase
-		 * check, and we need to call it after the free_irq. Thus
-		 * it's possible that the function pointers were never
-		 * initialized.
-		 */
-		if (dd->ipath_f_cleanup)
-			/* clean up chip-specific stuff */
-			dd->ipath_f_cleanup(dd);
-
 		spin_lock_irqsave(&ipath_devs_lock, flags);
 	}
 

From bos at pathscale.com Fri May 12 16:43:26 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:26 -0700
Subject: [openib-general] [PATCH 41 of 53] ipath - disable interrupts while holding spinlock in RWQE get
In-Reply-To:
Message-ID: <83f1832c601594846868.1147477406@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r 160a111381ae -r 83f1832c6015 drivers/infiniband/hw/ipath/ipath_ruc.c
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c	Fri May 12 15:55:29 2006 -0700
@@ -171,12 +171,13 @@ int ipath_get_rwqe(struct ipath_qp *qp,
 		n = rq->head - rq->tail;
 		if (n < srq->limit) {
 			srq->limit = 0;
-			spin_unlock(&rq->lock);
+			spin_unlock_irqrestore(&rq->lock, flags);
 			ev.device = qp->ibqp.device;
 			ev.element.srq = qp->ibqp.srq;
 			ev.event = IB_EVENT_SRQ_LIMIT_REACHED;
 			srq->ibsrq.event_handler(&ev,
 						 srq->ibsrq.srq_context);
+			spin_lock_irqsave(&rq->lock, flags);
 		}
 	}
 done:

From bos at pathscale.com Fri May 12 16:43:18 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:18 -0700
Subject: [openib-general] [PATCH 33 of 53] ipath - clean up some comments
In-Reply-To:
Message-ID: <5ddaf7c07cdf82fedd4d.1147477398@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r b9fd1a46c910 -r 5ddaf7c07cdf drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Fri May 12 15:55:28 2006 -0700
@@ -720,13 +720,8 @@ u64 ipath_read_kreg64_port(const struct
  * @port: port number
  *
  * Return the contents of a register that is virtualized to be per port.
- * Prints a debug message and returns -1 on errors (not distinguishable from
- * valid contents at runtime; we may add a separate error variable at some
- * point).
- *
- * This is normally not used by the kernel, but may be for debugging, and
- * has a different implementation than user mode, which is why it's not in
- * _common.h.
+ * Returns -1 on errors (not distinguishable from valid contents at
+ * runtime; we may add a separate error variable at some point).
  */
 static inline u32 ipath_read_ureg32(const struct ipath_devdata *dd,
 				    ipath_ureg regno, int port)

From bos at pathscale.com Fri May 12 16:43:33 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:33 -0700
Subject: [openib-general] [PATCH 48 of 53] ipath - QP should ignore receive queue size if SRQ specified
In-Reply-To:
Message-ID: <49b446b12f1698106975.1147477413@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r a1615956e57f -r 49b446b12f16 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c	Fri May 12 15:55:29 2006 -0700
@@ -684,16 +684,22 @@ struct ib_qp *ipath_create_qp(struct ib_
 			ret = ERR_PTR(-ENOMEM);
 			goto bail;
 		}
-		qp->r_rq.size = init_attr->cap.max_recv_wr + 1;
-		sz = sizeof(struct ipath_sge) *
-			init_attr->cap.max_recv_sge +
-			sizeof(struct ipath_rwqe);
-		qp->r_rq.wq = vmalloc(qp->r_rq.size * sz);
-		if (!qp->r_rq.wq) {
-			kfree(qp);
-			vfree(swq);
-			ret = ERR_PTR(-ENOMEM);
-			goto bail;
+		if (init_attr->srq) {
+			qp->r_rq.size = 0;
+			qp->r_rq.max_sge = 0;
+			qp->r_rq.wq = NULL;
+		} else {
+			qp->r_rq.size = init_attr->cap.max_recv_wr + 1;
+			qp->r_rq.max_sge = init_attr->cap.max_recv_sge;
+			sz = (sizeof(struct ipath_sge) * qp->r_rq.max_sge) +
+				sizeof(struct ipath_rwqe);
+			qp->r_rq.wq = vmalloc(qp->r_rq.size * sz);
+			if (!qp->r_rq.wq) {
+				kfree(qp);
+				vfree(swq);
+				ret = ERR_PTR(-ENOMEM);
+				goto bail;
+			}
 		}
 
 		/*
@@ -712,7 +718,6 @@ struct ib_qp *ipath_create_qp(struct ib_
 		qp->s_wq = swq;
 		qp->s_size = init_attr->cap.max_send_wr + 1;
 		qp->s_max_sge = init_attr->cap.max_send_sge;
-		qp->r_rq.max_sge = init_attr->cap.max_recv_sge;
 		qp->s_flags = init_attr->sq_sig_type == IB_SIGNAL_REQ_WR ?
 			1 << IPATH_S_SIGNAL_REQ_WR : 0;
 		dev = to_idev(ibpd->device);

From bos at pathscale.com Fri May 12 16:43:30 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:30 -0700
Subject: [openib-general] [PATCH 45 of 53] ipath - fix memory leak when create of QP fails
In-Reply-To:
Message-ID:

Signed-off-by: Bryan O'Sullivan

diff -r 28d938eb0463 -r b41e576e5202 drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c	Fri May 12 15:55:29 2006 -0700
@@ -114,6 +114,7 @@ static int create_port0_egr(struct ipath
 					  "eager TID %u\n", e);
 			while (e != 0)
 				dev_kfree_skb(skbs[--e]);
+			vfree(skbs);
 			ret = -ENOMEM;
 			goto bail;
 		}

From bos at pathscale.com Fri May 12 16:43:14 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:14 -0700
Subject: [openib-general] [PATCH 29 of 53] ipath - remove redundant register read
In-Reply-To:
Message-ID: <23519e578bf04f4582ab.1147477394@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r 47f1df66d097 -r 23519e578bf0 drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c	Fri May 12 15:55:28 2006 -0700
@@ -780,7 +780,6 @@ irqreturn_t ipath_intr(int irq, void *da
 				ipath_stats.sps_fastrcvint++;
 				goto done;
 			}
-			istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus);
 		}
 
 	istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus);

From bos at pathscale.com Fri May 12 16:43:37 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:37 -0700
Subject: [openib-general] [PATCH 52 of 53] ipath - register as IB device owner
In-Reply-To:
Message-ID:

Signed-off-by: Bryan O'Sullivan

diff -r 5f665c503f0d -r fd9bdeea5b10 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Fri May 12 16:41:45 2006 -0700
+++
b/drivers/infiniband/hw/ipath/ipath_verbs.c	Fri May 12 16:42:39 2006 -0700
@@ -1060,6 +1060,7 @@ static void *ipath_register_ib_device(in
 	idev->dd = dd;
 
 	strlcpy(dev->name, "ipath%d", IB_DEVICE_NAME_MAX);
+	dev->owner = THIS_MODULE;
 	dev->node_guid = ipath_layer_get_guid(dd);
 	dev->uverbs_abi_ver = IPATH_UVERBS_ABI_VERSION;
 	dev->uverbs_cmd_mask =

From bos at pathscale.com Fri May 12 16:43:13 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:13 -0700
Subject: [openib-general] [PATCH 28 of 53] ipath - forbid setting of invalid MLID
In-Reply-To:
Message-ID: <47f1df66d0979b655d01.1147477393@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r 551966b88d7c -r 47f1df66d097 drivers/infiniband/hw/ipath/ipath_sysfs.c
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c	Fri May 12 15:55:28 2006 -0700
@@ -221,7 +221,7 @@ static ssize_t store_mlid(struct device
 	int ret;
 
 	ret = ipath_parse_ushort(buf, &mlid);
-	if (ret < 0)
+	if (ret < 0 || mlid < 0xc000)
 		goto invalid;
 
 	unit = dd->ipath_unit;

From bos at pathscale.com Fri May 12 16:43:23 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:23 -0700
Subject: [openib-general] [PATCH 38 of 53] ipath - SRQ compliance checks
In-Reply-To:
Message-ID:

We were not rigorous enough in checking SRQs.
Signed-off-by: Bryan O'Sullivan

diff -r f8debae94d44 -r e9306861dc6a drivers/infiniband/hw/ipath/ipath_srq.c
--- a/drivers/infiniband/hw/ipath/ipath_srq.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_srq.c	Fri May 12 15:55:28 2006 -0700
@@ -125,11 +125,23 @@ struct ib_srq *ipath_create_srq(struct i
 				struct ib_srq_init_attr *srq_init_attr,
 				struct ib_udata *udata)
 {
+	struct ipath_ibdev *dev = to_idev(ibpd->device);
 	struct ipath_srq *srq;
 	u32 sz;
 	struct ib_srq *ret;
 
-	if (srq_init_attr->attr.max_sge < 1) {
+	if (dev->n_srqs_allocated == ib_ipath_max_srqs) {
+		ret = ERR_PTR(-ENOMEM);
+		goto bail;
+	}
+
+	if (srq_init_attr->attr.max_wr == 0) {
+		ret = ERR_PTR(-EINVAL);
+		goto bail;
+	}
+
+	if ((srq_init_attr->attr.max_sge > ib_ipath_max_srq_sges) ||
+	    (srq_init_attr->attr.max_wr > ib_ipath_max_srq_wrs)) {
 		ret = ERR_PTR(-EINVAL);
 		goto bail;
 	}
@@ -164,6 +176,8 @@ struct ib_srq *ipath_create_srq(struct i
 
 	ret = &srq->ibsrq;
 
+	dev->n_srqs_allocated++;
+
 bail:
 	return ret;
 }
@@ -181,24 +195,26 @@ int ipath_modify_srq(struct ib_srq *ibsr
 	unsigned long flags;
 	int ret;
 
-	if (attr_mask & IB_SRQ_LIMIT) {
-		spin_lock_irqsave(&srq->rq.lock, flags);
-		srq->limit = attr->srq_limit;
-		spin_unlock_irqrestore(&srq->rq.lock, flags);
-	}
+	if (attr_mask & IB_SRQ_MAX_WR)
+		if ((attr->max_wr > ib_ipath_max_srq_wrs) ||
+		    (attr->max_sge > srq->rq.max_sge)) {
+			ret = -EINVAL;
+			goto bail;
+		}
+
+	if (attr_mask & IB_SRQ_LIMIT)
+		if (attr->srq_limit >= srq->rq.size) {
+			ret = -EINVAL;
+			goto bail;
+		}
+
 	if (attr_mask & IB_SRQ_MAX_WR) {
-		u32 size = attr->max_wr + 1;
 		struct ipath_rwqe *wq, *p;
-		u32 n;
-		u32 sz;
-
-		if (attr->max_sge < srq->rq.max_sge) {
-			ret = -EINVAL;
-			goto bail;
-		}
+		u32 sz, size, n;
 
 		sz = sizeof(struct ipath_rwqe) +
 			attr->max_sge * sizeof(struct ipath_sge);
+		size = attr->max_wr + 1;
 		wq = vmalloc(size * sz);
 		if (!wq) {
 			ret = -ENOMEM;
@@ -242,6 +258,11 @@ int ipath_modify_srq(struct ib_srq *ibsr
 		spin_unlock_irqrestore(&srq->rq.lock, flags);
 	}
 
+	if
(attr_mask & IB_SRQ_LIMIT) {
+		spin_lock_irqsave(&srq->rq.lock, flags);
+		srq->limit = attr->srq_limit;
+		spin_unlock_irqrestore(&srq->rq.lock, flags);
+	}
 	ret = 0;
 
 bail:
@@ -265,7 +286,9 @@ int ipath_destroy_srq(struct ib_srq *ibs
 int ipath_destroy_srq(struct ib_srq *ibsrq)
 {
 	struct ipath_srq *srq = to_isrq(ibsrq);
-
+	struct ipath_ibdev *dev = to_idev(ibsrq->device);
+
+	dev->n_srqs_allocated--;
 	vfree(srq->rq.wq);
 	kfree(srq);
 
diff -r f8debae94d44 -r e9306861dc6a drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h	Fri May 12 15:55:28 2006 -0700
@@ -436,6 +436,7 @@ struct ipath_ibdev {
 	u32 n_pds_allocated;	/* number of PDs allocated for device */
 	u32 n_ahs_allocated;	/* number of AHs allocated for device */
 	u32 n_cqs_allocated;	/* number of CQs allocated for device */
+	u32 n_srqs_allocated;	/* number of SRQs allocated for device */
 	u32 n_mcast_grps_allocated; /* number of mcast groups allocated */
 	u64 ipath_sword;	/* total dwords sent (sample result) */
 	u64 ipath_rword;	/* total dwords received (sample result) */

From bos at pathscale.com Fri May 12 16:43:12 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:12 -0700
Subject: [openib-general] [PATCH 27 of 53] ipath - fix accounting of data packets with bad VLs
In-Reply-To:
Message-ID: <551966b88d7c74827a54.1147477392@eng-12.pathscale.com>

For better IB conformance.
Signed-off-by: Bryan O'Sullivan

diff -r 8e2d63833cf2 -r 551966b88d7c drivers/infiniband/hw/ipath/ipath_layer.c
--- a/drivers/infiniband/hw/ipath/ipath_layer.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c	Fri May 12 15:55:28 2006 -0700
@@ -1019,13 +1019,11 @@ int ipath_layer_get_counters(struct ipat
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_rxdroppktcnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_rcvovflcnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_portovflcnt) +
-		ipath_snap_cntr(dd, dd->ipath_cregs->cr_errrcvflowctrlcnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_err_rlencnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_invalidrlencnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_erricrccnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_errvcrccnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_errlpcrccnt) +
-		ipath_snap_cntr(dd, dd->ipath_cregs->cr_errlinkcnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_badformatcnt);
 	cntrs->port_rcv_remphys_errors =
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_rcvebpcnt);
diff -r 8e2d63833cf2 -r 551966b88d7c drivers/infiniband/hw/ipath/ipath_mad.c
--- a/drivers/infiniband/hw/ipath/ipath_mad.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c	Fri May 12 15:55:28 2006 -0700
@@ -1316,32 +1316,8 @@ int ipath_process_mad(struct ib_device *
 		      struct ib_wc *in_wc, struct ib_grh *in_grh,
 		      struct ib_mad *in_mad, struct ib_mad *out_mad)
 {
-	struct ipath_ibdev *dev = to_idev(ibdev);
 	int ret;
 
-	/*
-	 * Snapshot current HW counters to "clear" them.
-	 * This should be done when the driver is loaded except that for
-	 * some reason we get a zillion errors when brining up the link.
-	 */
-	if (dev->rcv_errors == 0) {
-		struct ipath_layer_counters cntrs;
-
-		ipath_layer_get_counters(to_idev(ibdev)->dd, &cntrs);
-		dev->rcv_errors++;
-		dev->n_symbol_error_counter = cntrs.symbol_error_counter;
-		dev->n_link_error_recovery_counter =
-			cntrs.link_error_recovery_counter;
-		dev->n_link_downed_counter = cntrs.link_downed_counter;
-		dev->n_port_rcv_errors = cntrs.port_rcv_errors + 1;
-		dev->n_port_rcv_remphys_errors =
-			cntrs.port_rcv_remphys_errors;
-		dev->n_port_xmit_discards = cntrs.port_xmit_discards;
-		dev->n_port_xmit_data = cntrs.port_xmit_data;
-		dev->n_port_rcv_data = cntrs.port_rcv_data;
-		dev->n_port_xmit_packets = cntrs.port_xmit_packets;
-		dev->n_port_rcv_packets = cntrs.port_rcv_packets;
-	}
 	switch (in_mad->mad_hdr.mgmt_class) {
 	case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
 	case IB_MGMT_CLASS_SUBN_LID_ROUTED:
diff -r 8e2d63833cf2 -r 551966b88d7c drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Fri May 12 15:55:28 2006 -0700
@@ -981,6 +981,7 @@ static int ipath_verbs_register_sysfs(st
  */
 static void *ipath_register_ib_device(int unit, struct ipath_devdata *dd)
 {
+	struct ipath_layer_counters cntrs;
 	struct ipath_ibdev *idev;
 	struct ib_device *dev;
 	int ret;
@@ -1030,6 +1031,21 @@ static void *ipath_register_ib_device(in
 	idev->pma_counter_select[3] = IB_PMA_PORT_RCV_PKTS;
 	idev->pma_counter_select[5] = IB_PMA_PORT_XMIT_WAIT;
 	idev->link_width_enabled = 3;	/* 1x or 4x */
+
+	/* Snapshot current HW counters to "clear" them.
*/
+	ipath_layer_get_counters(dd, &cntrs);
+	idev->n_symbol_error_counter = cntrs.symbol_error_counter;
+	idev->n_link_error_recovery_counter =
+		cntrs.link_error_recovery_counter;
+	idev->n_link_downed_counter = cntrs.link_downed_counter;
+	idev->n_port_rcv_errors = cntrs.port_rcv_errors;
+	idev->n_port_rcv_remphys_errors =
+		cntrs.port_rcv_remphys_errors;
+	idev->n_port_xmit_discards = cntrs.port_xmit_discards;
+	idev->n_port_xmit_data = cntrs.port_xmit_data;
+	idev->n_port_rcv_data = cntrs.port_rcv_data;
+	idev->n_port_xmit_packets = cntrs.port_xmit_packets;
+	idev->n_port_rcv_packets = cntrs.port_rcv_packets;
 
 	/*
 	 * The system image GUID is supposed to be the same for all

From bos at pathscale.com Fri May 12 16:43:28 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:28 -0700
Subject: [openib-general] [PATCH 43 of 53] ipath - fix memory leak when creating a QP fails
In-Reply-To:
Message-ID: <7634b2f0fc40d4998445.1147477408@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan

diff -r 0aba84dce506 -r 7634b2f0fc40 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c	Fri May 12 15:55:29 2006 -0700
@@ -680,6 +680,7 @@ struct ib_qp *ipath_create_qp(struct ib_
 	case IB_QPT_GSI:
 		qp = kmalloc(sizeof(*qp), GFP_KERNEL);
 		if (!qp) {
+			vfree(swq);
 			ret = ERR_PTR(-ENOMEM);
 			goto bail;
 		}
@@ -690,6 +691,7 @@ struct ib_qp *ipath_create_qp(struct ib_
 		qp->r_rq.wq = vmalloc(qp->r_rq.size * sz);
 		if (!qp->r_rq.wq) {
 			kfree(qp);
+			vfree(swq);
 			ret = ERR_PTR(-ENOMEM);
 			goto bail;
 		}

From bos at pathscale.com Fri May 12 16:43:35 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:35 -0700
Subject: [openib-general] [PATCH 50 of 53] ipath - reduce maximum table sizes
In-Reply-To:
Message-ID:

Decrease the number of WRs and SGEs we support from 131071/255 to 16383/60.
This decreases our maximum memory usage per QP from ~1800MB down to about
40MB. This is still a lot, but it's better than 2GB.

Signed-off-by: Bryan O'Sullivan

diff -r 40532fdc53f0 -r bd1de2e983db drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Fri May 12 15:55:29 2006 -0700
@@ -73,12 +73,12 @@ module_param_named(max_cqs, ib_ipath_max
 module_param_named(max_cqs, ib_ipath_max_cqs, uint, S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(max_cqs, "Maximum number of completion queues to support");
 
-unsigned int ib_ipath_max_qp_wrs = 0x1FFFF;
+unsigned int ib_ipath_max_qp_wrs = 0x3FFF;
 module_param_named(max_qp_wrs, ib_ipath_max_qp_wrs, uint,
 		   S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(max_qp_wrs, "Maximum number of QP WRs to support");
 
-unsigned int ib_ipath_max_sges = 0xFF;
+unsigned int ib_ipath_max_sges = 0x60;
 module_param_named(max_sges, ib_ipath_max_sges, uint, S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(max_sges, "Maximum number of SGEs to support");
 

From bos at pathscale.com Fri May 12 16:43:32 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 12 May 2006 16:43:32 -0700
Subject: [openib-general] [PATCH 47 of 53] ipath - fix problem with lost interrupts on HT-400
In-Reply-To:
Message-ID:

We can have a race clearing chip interrupt with another interrupt about to
be delivered and can clear it before it is delivered on the GPIO
workaround. By doing the extra check here for the in-memory tail register
updating while we were doing earlier packets, we "almost" guarantee we have
covered that case.
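
The bounded re-check described above can be sketched in isolation as
follows. This is a minimal illustration under stated assumptions, not the
driver code: struct rxq, drain_with_recheck() and late_arrivals() are
hypothetical names, and the producer is simulated by a callback rather
than by hardware updating an in-memory tail.

```c
#include <stdint.h>

/* Hypothetical stand-in for a receive header queue (not the ipath types). */
struct rxq {
	uint32_t head;		/* next entry the consumer will process */
	volatile uint32_t tail;	/* advanced by the producer */
	uint32_t size;		/* number of slots in the ring */
	void (*on_packet)(struct rxq *q, uint32_t idx);
};

/*
 * Drain entries up to a snapshot of the tail, then re-read the tail once.
 * If it moved while we were busy, loop one extra time (and only one), so
 * entries that arrived during processing are not left waiting for an
 * interrupt that may already have been cleared.
 * Returns the number of entries processed.
 */
static uint32_t drain_with_recheck(struct rxq *q)
{
	uint32_t snap = q->tail;
	uint32_t n = 0;
	int relooped = 0;

again:
	while (q->head != snap) {
		if (q->on_packet)
			q->on_packet(q, q->head);
		q->head = (q->head + 1) % q->size;
		n++;
	}
	if (!relooped) {
		uint32_t now = q->tail;

		if (now != snap) {
			snap = now;
			relooped = 1;	/* loop 1 extra time at most */
			goto again;
		}
	}
	return n;
}

/* Simulated producer: two more entries show up while entry 0 is handled. */
static void late_arrivals(struct rxq *q, uint32_t idx)
{
	if (idx == 0)
		q->tail = (q->tail + 2) % q->size;
}
```

Because the extra pass is limited to one, entries that arrive during that
second pass are still left for the next interrupt, which is why the
guarantee is only "almost".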
Signed-off-by: Bryan O'Sullivan

diff -r 04c86dd11b27 -r a1615956e57f drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Fri May 12 15:55:29 2006 -0700
@@ -858,7 +858,7 @@ void ipath_kreceive(struct ipath_devdata
 	const u32 maxcnt = dd->ipath_rcvhdrcnt * rsize;	/* words */
 	u32 etail = -1, l, hdrqtail;
 	struct ips_message_header *hdr;
-	u32 eflags, i, etype, tlen, pkttot = 0, updegr=0;
+	u32 eflags, i, etype, tlen, pkttot = 0, updegr=0, reloop=0;
 	static u64 totcalls;	/* stats, may eventually remove */
 	char emsg[128];
 
@@ -873,12 +873,11 @@ void ipath_kreceive(struct ipath_devdata
 		goto bail;
 
 	l = dd->ipath_port0head;
-	if(l == (u32)le64_to_cpu(*dd->ipath_hdrqtailptr))
+	hdrqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr);
+	if(l == hdrqtail)
 		goto done;
 
-	/* read only once at start for performance */
-	hdrqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr);
-
+reloop:
 	for (i = 0; l != hdrqtail; i++) {
 		u32 qp;
 		u8 *bthbytes;
@@ -1013,16 +1012,34 @@ void ipath_kreceive(struct ipath_devdata
 		 */
 		if(l == hdrqtail || (i && !(i&0xf))) {
 			u64 lval;
-			if(l == hdrqtail) /* want interrupt only on last */
+			if(l == hdrqtail) {
+				/* PE-800 interrupt only on last */
 				lval = dd->ipath_rhdrhead_intr_off | l;
+			}
 			else
 				lval = l;
 			(void)ipath_write_ureg(dd, ur_rcvhdrhead, lval, 0);
 			if(updegr) {
-				(void)ipath_write_ureg(dd, ur_rcvegrindexhead,
+				ipath_write_ureg(dd, ur_rcvegrindexhead,
 						       etail, 0);
 				updegr = 0;
 			}
+		}
+	}
+	if(!dd->ipath_rhdrhead_intr_off && !reloop) {
+		/* HT-400 workaround; we can have a race clearing chip
+		 * interrupt with another interrupt about to be delivered,
+		 * and can clear it before it is delivered on the GPIO
+		 * workaround. By doing the extra check here for the
+		 * in-memory tail register updating while we were doing
+		 * earlier packets, we "almost" guarantee we have covered
+		 * that case.
+ */ + u32 hqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr); + if(hqtail != hdrqtail) { + hdrqtail = hqtail; + reloop = 1; /* loop 1 extra time at most */ + goto reloop; } } diff -r 04c86dd11b27 -r a1615956e57f drivers/infiniband/hw/ipath/ipath_intr.c --- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:29 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:29 2006 -0700 @@ -761,13 +761,14 @@ static void handle_urcv(struct ipath_dev } + irqreturn_t ipath_intr(int irq, void *data, struct pt_regs *regs) { struct ipath_devdata *dd = data; u32 istat, chk0rcv = 0; ipath_err_t estat = 0; irqreturn_t ret; - u32 p0bits, oldhead; + u32 oldhead, curtail; static unsigned unexpected = 0; static const u32 port0rbits = (1U<ipath_port0head; - if (oldhead != (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) { + curtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr); + if (oldhead != curtail) { if(dd->ipath_flags & IPATH_GPIO_INTR) { ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear, (u64) (1 << 2)); - p0bits = port0rbits | INFINIPATH_I_GPIO; + istat = port0rbits | INFINIPATH_I_GPIO; } else - p0bits = port0rbits; - ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, p0bits); + istat = port0rbits; + ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, istat); ipath_kreceive(dd); if(oldhead != dd->ipath_port0head) { ipath_stats.sps_fastrcvint++; @@ -827,7 +829,6 @@ irqreturn_t ipath_intr(int irq, void *da } istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus); - p0bits = port0rbits; if (unlikely(!istat)) { ipath_stats.sps_nullintr++; @@ -890,19 +891,19 @@ irqreturn_t ipath_intr(int irq, void *da else { /* Clear GPIO status bit 2 */ ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear, - (u64) (1 << 2)); - p0bits |= INFINIPATH_I_GPIO; + (u64) (1 << 2)); chk0rcv = 1; } } - chk0rcv |= istat & p0bits; - - /* - * clear the ones we will deal with on this round - * We clear it early, mostly for receive interrupts, so we - * know the chip will have seen this by 
the time we process - * the queue, and will re-interrupt if necessary. The processor - * itself won't take the interrupt again until we return. + chk0rcv |= istat & port0rbits; + + /* + * Clear the interrupt bits we found set, unless they are receive + * related, in which case we already cleared them above, and don't + * want to clear them again, because we might lose an interrupt. + * Clear it early, so we "know" know the chip will have seen this by + * the time we process the queue, and will re-interrupt if necessary. + * The processor itself won't take the interrupt again until we return. */ ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, istat); From bos at pathscale.com Fri May 12 16:43:29 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Fri, 12 May 2006 16:43:29 -0700 Subject: [openib-general] [PATCH 44 of 53] ipath - allow diags on any unit In-Reply-To: Message-ID: <28d938eb04630e0d1c41.1147477409@eng-12.pathscale.com> Previously, we hardwired all diags to unit 0. Signed-off-by: Bryan O'Sullivan diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_diag.c --- a/drivers/infiniband/hw/ipath/ipath_diag.c Fri May 12 15:55:29 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_diag.c Fri May 12 15:55:29 2006 -0700 @@ -66,18 +66,20 @@ static struct file_operations diag_file_ .release = ipath_diag_release }; -static struct cdev *diag_cdev; -static struct class_device *diag_class_dev; - -int ipath_diag_init(void) -{ - return ipath_cdev_init(IPATH_DIAG_MINOR, "ipath_diag", - &diag_file_ops, &diag_cdev, &diag_class_dev); -} - -void ipath_diag_cleanup(void) -{ - ipath_cdev_cleanup(&diag_cdev, &diag_class_dev); +int ipath_diag_add(struct ipath_devdata *dd) +{ + char name[16]; + + snprintf(name, sizeof(name), "ipath_diag%d", dd->ipath_unit); + + return ipath_cdev_init(IPATH_DIAG_MINOR_BASE + dd->ipath_unit, name, + &diag_file_ops, &dd->diag_cdev, + &dd->diag_class_dev); +} + +void ipath_diag_remove(struct ipath_devdata *dd) +{ + 
ipath_cdev_cleanup(&dd->diag_cdev, &dd->diag_class_dev); } /** @@ -101,8 +103,7 @@ static int ipath_read_umem64(struct ipat int ret; /* not very efficient, but it works for now */ - if (reg_addr < dd->ipath_kregbase || - reg_end > dd->ipath_kregend) { + if (reg_addr < dd->ipath_kregbase || reg_end > dd->ipath_kregend) { ret = -EINVAL; goto bail; } @@ -139,8 +140,7 @@ static int ipath_write_umem64(struct ipa int ret; /* not very efficient, but it works for now */ - if (reg_addr < dd->ipath_kregbase || - reg_end > dd->ipath_kregend) { + if (reg_addr < dd->ipath_kregbase || reg_end > dd->ipath_kregend) { ret = -EINVAL; goto bail; } @@ -240,59 +240,45 @@ bail: static int ipath_diag_open(struct inode *in, struct file *fp) { + int unit = iminor(in) - IPATH_DIAG_MINOR_BASE; struct ipath_devdata *dd; - int unit = 0; /* XXX this is bogus */ - unsigned long flags; - int ret; - - dd = ipath_lookup(unit); + int ret; mutex_lock(&ipath_mutex); - spin_lock_irqsave(&ipath_devs_lock, flags); if (ipath_diag_inuse) { ret = -EBUSY; goto bail; } - list_for_each_entry(dd, &ipath_dev_list, ipath_list) { - /* - * we need at least one infinipath device to be present - * (don't use INITTED, because we want to be able to open - * even if device is in freeze mode, which cleared INITTED). - * There is a small amount of risk to this, which is why we - * also verify kregbase is set. - */ - - if (!(dd->ipath_flags & IPATH_PRESENT) || - !dd->ipath_kregbase) - continue; - - ipath_diag_inuse = 1; - diag_set_link = 0; - ret = 0; - goto bail; - } - - ret = -ENODEV; - -bail: - spin_unlock_irqrestore(&ipath_devs_lock, flags); + dd = ipath_lookup(unit); + + if (dd == NULL || !(dd->ipath_flags & IPATH_PRESENT) || + !dd->ipath_kregbase) { + ret = -ENODEV; + goto bail; + } + + fp->private_data = dd; + ipath_diag_inuse = 1; + diag_set_link = 0; + ret = 0; /* Only expose a way to reset the device if we make it into diag mode. 
*/ - if (ret == 0) - ipath_expose_reset(&dd->pcidev->dev); - + ipath_expose_reset(&dd->pcidev->dev); + +bail: mutex_unlock(&ipath_mutex); return ret; } -static int ipath_diag_release(struct inode *i, struct file *f) +static int ipath_diag_release(struct inode *in, struct file *fp) { mutex_lock(&ipath_mutex); ipath_diag_inuse = 0; + fp->private_data = NULL; mutex_unlock(&ipath_mutex); return 0; } @@ -300,16 +286,9 @@ static ssize_t ipath_diag_read(struct fi static ssize_t ipath_diag_read(struct file *fp, char __user *data, size_t count, loff_t *off) { - int unit = 0; /* XXX provide for reads on other units some day */ - struct ipath_devdata *dd; + struct ipath_devdata *dd = fp->private_data; void __iomem *kreg_base; ssize_t ret; - - dd = ipath_lookup(unit); - if (!dd) { - ret = -ENODEV; - goto bail; - } kreg_base = dd->ipath_kregbase; @@ -329,23 +308,16 @@ static ssize_t ipath_diag_read(struct fi ret = count; } -bail: return ret; } static ssize_t ipath_diag_write(struct file *fp, const char __user *data, size_t count, loff_t *off) { - int unit = 0; /* XXX this is bogus */ - struct ipath_devdata *dd; + struct ipath_devdata *dd = fp->private_data; void __iomem *kreg_base; ssize_t ret; - dd = ipath_lookup(unit); - if (!dd) { - ret = -ENODEV; - goto bail; - } kreg_base = dd->ipath_kregbase; if (count == 0) @@ -364,6 +336,6 @@ static ssize_t ipath_diag_write(struct f ret = count; } -bail: - return ret; -} + return ret; +} + diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:29 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:29 2006 -0700 @@ -488,6 +488,7 @@ static int __devinit ipath_init_one(stru ipath_device_create_group(&pdev->dev, dd); ipathfs_add_device(dd); ipath_user_add(dd); + ipath_diag_add(dd); ipath_layer_add(dd); goto bail; @@ -517,8 +518,9 @@ static void __devexit ipath_remove_one(s return; dd = pci_get_drvdata(pdev); - 
ipath_layer_del(dd); - ipath_user_del(dd); + ipath_layer_remove(dd); + ipath_diag_remove(dd); + ipath_user_remove(dd); ipathfs_remove_device(dd); ipath_device_remove_group(&pdev->dev, dd); ipath_cdbg(VERBOSE, "Releasing pci memory regions, dd %p, " diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_file_ops.c --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:29 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:29 2006 -0700 @@ -1390,16 +1390,16 @@ done: static int ipath_open(struct inode *in, struct file *fp) { - int ret, minor; + int ret, user_minor; mutex_lock(&ipath_mutex); - minor = iminor(in); + user_minor = iminor(in) - IPATH_USER_MINOR_BASE; ipath_cdbg(VERBOSE, "open on dev %lx (minor %d)\n", - (long)in->i_rdev, minor); - - if (minor) - ret = find_free_port(minor - 1, fp); + (long)in->i_rdev, user_minor); + + if (user_minor) + ret = find_free_port(user_minor - 1, fp); else ret = find_best_unit(fp); @@ -1799,19 +1799,13 @@ int ipath_user_add(struct ipath_devdata "error %d\n", -ret); goto bail; } - ret = ipath_diag_init(); - if (ret < 0) { - ipath_dev_err(dd, "Unable to set up diag support: " - "error %d\n", -ret); - goto bail_sma; - } ret = init_cdev(0, "ipath", &ipath_file_ops, &wildcard_cdev, &wildcard_class_dev); if (ret < 0) { ipath_dev_err(dd, "Could not create wildcard " "minor: error %d\n", -ret); - goto bail_diag; + goto bail_sma; } atomic_set(&user_setup, 1); @@ -1820,31 +1814,28 @@ int ipath_user_add(struct ipath_devdata snprintf(name, sizeof(name), "ipath%d", dd->ipath_unit); ret = init_cdev(dd->ipath_unit + 1, name, &ipath_file_ops, - &dd->cdev, &dd->class_dev); + &dd->user_cdev, &dd->user_class_dev); if (ret < 0) ipath_dev_err(dd, "Could not create user minor %d, %s\n", dd->ipath_unit + 1, name); goto bail; -bail_diag: - ipath_diag_cleanup(); bail_sma: user_cleanup(); bail: return ret; } -void ipath_user_del(struct ipath_devdata *dd) -{ - cleanup_cdev(&dd->cdev, &dd->class_dev); 
+void ipath_user_remove(struct ipath_devdata *dd) +{ + cleanup_cdev(&dd->user_cdev, &dd->user_class_dev); if (atomic_dec_return(&user_count) == 0) { if (atomic_read(&user_setup) == 0) goto bail; cleanup_cdev(&wildcard_cdev, &wildcard_class_dev); - ipath_diag_cleanup(); user_cleanup(); atomic_set(&user_setup, 0); diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:29 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:29 2006 -0700 @@ -347,8 +347,10 @@ struct ipath_devdata { char *ipath_freezemsg; /* pci access data structure */ struct pci_dev *pcidev; - struct cdev *cdev; - struct class_device *class_dev; + struct cdev *user_cdev; + struct cdev *diag_cdev; + struct class_device *user_class_dev; + struct class_device *diag_class_dev; /* timer used to prevent stats overflow, error throttling, etc. */ struct timer_list ipath_stats_timer; /* check for stale messages in rcv queue */ @@ -531,7 +533,7 @@ extern int __ipath_verbs_rcv(struct ipat extern int __ipath_verbs_rcv(struct ipath_devdata *, void *, void *, u32); void ipath_layer_add(struct ipath_devdata *); -void ipath_layer_del(struct ipath_devdata *); +void ipath_layer_remove(struct ipath_devdata *); int ipath_init_chip(struct ipath_devdata *, int); int ipath_enable_wc(struct ipath_devdata *dd); @@ -545,14 +547,14 @@ void ipath_cdev_cleanup(struct cdev **cd void ipath_cdev_cleanup(struct cdev **cdevp, struct class_device **class_devp); -int ipath_diag_init(void); -void ipath_diag_cleanup(void); +int ipath_diag_add(struct ipath_devdata *); +void ipath_diag_remove(struct ipath_devdata *); void ipath_diag_bringup_link(struct ipath_devdata *); extern wait_queue_head_t ipath_sma_state_wait; int ipath_user_add(struct ipath_devdata *dd); -void ipath_user_del(struct ipath_devdata *dd); +void ipath_user_remove(struct ipath_devdata *dd); struct sk_buff *ipath_alloc_skb(struct ipath_devdata *dd, gfp_t); @@ 
-831,9 +833,10 @@ extern struct mutex ipath_mutex; #define IPATH_DRV_NAME "ipath_core" #define IPATH_MAJOR 233 +#define IPATH_USER_MINOR_BASE 0 #define IPATH_SMA_MINOR 128 -#define IPATH_DIAG_MINOR 129 -#define IPATH_NMINORS 130 +#define IPATH_DIAG_MINOR_BASE 129 +#define IPATH_NMINORS 255 #define ipath_dev_err(dd,fmt,...) \ do { \ diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_layer.c --- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:29 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:29 2006 -0700 @@ -402,7 +402,7 @@ void ipath_layer_add(struct ipath_devdat mutex_unlock(&ipath_layer_mutex); } -void ipath_layer_del(struct ipath_devdata *dd) +void ipath_layer_remove(struct ipath_devdata *dd) { mutex_lock(&ipath_layer_mutex); From hycsw at ca.sandia.gov Fri May 12 17:21:21 2006 From: hycsw at ca.sandia.gov (Helen Chen) Date: Fri, 12 May 2006 17:21:21 -0700 (PDT) Subject: [openib-general][patch review] srp: fmr implementation, Message-ID: <200605130021.RAA18736@ca.sandia.gov> Hi Vu, I am very excited to hear that you are achieving such impressive SRP performance. We are evaluating the Engenio SRP and are having trouble with the SRP initiator using the srp stack from the openfabric-1.0rc4 revision. We are able to discover the SRP LUNs, but can't seem to complete mke2fs; dmesg shows the output below. Dave Ellis claims that you had successfully conducted the interoperability test with the Engenio storage a while ago, and may be able to lend us a hand in troubleshooting. Is this a configuration issue or a protocol compatibility problem? We'd really appreciate your taking time out of your busy schedule to help. Sincerely, Helen Chen ........ printk: 97 messages suppressed. ib_srp: Target has req_lim 0 printk: 147 messages suppressed. ib_srp: Target has req_lim 0 printk: 104 messages suppressed. ib_srp: Target has req_lim 0 printk: 103 messages suppressed.
ib_srp: Target has req_lim 0 printk: 119 messages suppressed. ib_srp: Target has req_lim 0 printk: 146 messages suppressed. ib_srp: Target has req_lim 0 printk: 65 messages suppressed. ib_srp: Target has req_lim 0 printk: 90 messages suppressed. ib_srp: Target has req_lim 0 printk: 139 messages suppressed. ib_srp: Target has req_lim 0 printk: 116 messages suppressed. ib_srp: Target has req_lim 0 printk: 133 messages suppressed. ib_srp: Target has req_lim 0 -----Original Message----- >From openib-general-bounces at openib.org Wed May 10 08:37:04 2006 >Roland Dreier wrote: >> BTW, does Mellanox (or anyone else) have any numbers showing that >> using FMRs makes any difference in performance on a semi-realistic benchmark? >> > >I'm using xdd to test the performance >www.ioperformance.com/products.htm > >The target is Mellanox srp target reference implementation >with 14 SATA spindles > >I can get ~780 MB/s max without FMRs and ~920 MB/s with FMRs >(using 256 KB sequential read direct IO request) > >Vu >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >tO UNSUBSCribe, please visit http://openib.org/mailman/listinfo/openib-general From tom at opengridcomputing.com Fri May 12 18:40:14 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 12 May 2006 20:40:14 -0500 Subject: [openib-general] RE: [PATCH][UVERBS][RFC] node type in ibv_context In-Reply-To: Message-ID: <20060513013648.011AB66CF4@mail.es335.com> -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Thursday, May 11, 2006 12:01 AM To: Tom Tucker Cc: Sean Hefty; Roland Dreier; openib-general at openib.org Subject: Re: [PATCH][UVERBS][RFC] node type in ibv_context Tom> Yeah, I originally had it there, but I waffled because I was Tom> worried (no use case btw) if the type check was ever in the Tom> performance path that it would involve one extra pointer Tom>
dereference. That's a valid point. But that seems like a pathologically stupid app to me, to be honest. [tt] Yeah - but these apps seem to dominate my consciousness -- thus my history with start-ups... ;-) [tt] I'll move it -- no worries... - R. From zhushisongzhu at yahoo.com Sat May 13 05:02:11 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Sat, 13 May 2006 05:02:11 -0700 (PDT) Subject: [openib-general] Re: sdp can't support many connections (>2000) In-Reply-To: <20060510111355.GR21036@mellanox.co.il> Message-ID: <20060513120211.76841.qmail@web36904.mail.mud.yahoo.com> My OS: redhat EL-4.3 for X86_64 After I upgraded kernel-smp-2.6.9-34.OpenIB.6829.trunk.EL.root.x86_64.rpm kernel-smp-devel-2.6.9-34.OpenIB.6829.trunk.EL.root.x86_64.rpm and openib-usermode-6829-1.x86_64.rpm, there are some something wrong. (1) sdp connection failed. ./ttcp.aio -r ( on Server) ./ttcp.aio -t 193.12.10.14 (on Client) (2) I can't ping another machine through ib. After I run opensm once, I can. I stop it for a while , I can't. How can I set the IB for its work correctly? tks zhu --- "Michael S. Tsirkin" wrote: > Quoting r. zhu shi song : > > Subject: Re: sdp can't support many connections > (>2000) > > > > I can't get the latest source from " > > svn co https://openfabrics.org/svn/gen2" in one > whole > > day, it's so slow. > > I use openib.org/svn/gen2 but I expect its just a > redirection. > Hmm. We'll be putting up a tarball about Monday I > think. > > > Do you think the lastest source solve the > problem? > > It should. > > > Or > > can you test sdp for > 2000 concurrent > connections? > > tks > > zhu > > I'll try to go test it around next week, busy now. > > > --- "Michael S. Tsirkin" > wrote: > > > > > Quoting r. zhu shi song > : > > > > Subject: Re: sdp can't support many > connections > > > (>2000) > > > > > > > > ab send the request to squid cache server > running > > > on > > > > Machine B. Then squid send the real request > to > > > google > > > > website. 
> > > > So how can I upgrade my version to solve the > > > > problem? > > > > > > > > zhu > > > > > > Try getting latest stack snapshot from svn. > > > > > > -- > > > MST __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From zhushisongzhu at yahoo.com Sat May 13 06:51:31 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Sat, 13 May 2006 06:51:31 -0700 (PDT) Subject: [openib-general] openib-usermode 6829-1 and IBG2-2.0.1 which is newer In-Reply-To: <20060510111355.GR21036@mellanox.co.il> Message-ID: <20060513135131.18320.qmail@web36912.mail.mud.yahoo.com> I upgraded the IBG2-2.0.1 install using openib-usermode-6829-1.x86_64.rpm. Which version is newer? Under the 6829-1 version, libsdp.so doesn't work, but an AF_INET_SDP program is OK. See below: [root at localhost ~]# LD_PRELOAD=libsdp.so ab -c 1 -n 1 -X 193.12.10.14:3129 http://www.google.com/index.html default libsdp configuration is used This is ApacheBench, Version 2.0.41-dev <$Revision: 1.141 $> apache-2.0 Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ Copyright (c) 1998-2002 The Apache Software Foundation, http://www.apache.org/ Benchmarking www.google.com [through 193.12.10.14:3129] (be patient)... Test aborted after 10 failures apr_connect(): Transport endpoint is already connected (106) [root at localhost ~]# What's wrong? zhu From mst at mellanox.co.il Sun May 14 00:26:33 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Sun, 14 May 2006 10:26:33 +0300 Subject: [openib-general] Re: sdp can't support many connections (>2000) In-Reply-To: <20060513120211.76841.qmail@web36904.mail.mud.yahoo.com> References: <20060510111355.GR21036@mellanox.co.il> <20060513120211.76841.qmail@web36904.mail.mud.yahoo.com> Message-ID: <20060514072633.GC28876@mellanox.co.il> Quoting r. zhu shi song : > Subject: Re: sdp can't support many connections (>2000) > > My OS: redhat EL-4.3 for X86_64 > After I upgraded > kernel-smp-2.6.9-34.OpenIB.6829.trunk.EL.root.x86_64.rpm > kernel-smp-devel-2.6.9-34.OpenIB.6829.trunk.EL.root.x86_64.rpm > and > openib-usermode-6829-1.x86_64.rpm, there are some > something wrong. > (1) > sdp connection failed. > > ./ttcp.aio -r ( on Server) > > ./ttcp.aio -t 193.12.10.14 (on Client) Try stracing it, see what's wrong. > (2) I can't ping another machine through ib. After I > run opensm once, I can. I stop it for a while , I > can't. > > How can I set the IB for its work correctly? > > tks > zhu You need an SM running at all times. -- MST From tziporet at mellanox.co.il Sun May 14 00:54:03 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 14 May 2006 10:54:03 +0300 Subject: [openib-general] RE: [openfabrics-ewg] Re: OFED 1.0 rc4 won't compile on orig FC5 kernel Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA6F4A@mtlexch01.mtl.com> Michael is correct. OFED supports FC5 only with latest kernel 2.6.16.x from kernel.org. Tziporet -----Original Message----- From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Michael S. Tsirkin Sent: Thursday, May 11, 2006 7:37 PM To: Scott Weitzenkamp (sweitzen) Cc: openfabrics-ewg at openib.org; openib-general at openib.org Subject: [openfabrics-ewg] Re: OFED 1.0 rc4 won't compile on orig FC5 kernel Quoting r. 
Scott Weitzenkamp (sweitzen) : > Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel > > Is this a useful kernel to try, or should get latest FC5 kernel or 2.6.16 from kernel.org? I think you should go to latest update. -- MST _______________________________________________ openfabrics-ewg mailing list openfabrics-ewg at openib.org http://openib.org/mailman/listinfo/openfabrics-ewg From dotanb at mellanox.co.il Sun May 14 01:36:31 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Sun, 14 May 2006 11:36:31 +0300 Subject: [openib-general] Re: [DAPL] latest DAPL cannot be compiled with the latest librdmacm In-Reply-To: References: <200605101652.26604.dotanb@mellanox.co.il> Message-ID: <200605141136.31661.dotanb@mellanox.co.il> On Friday 12 May 2006 22:43, James Lentini wrote: > > On Wed, 10 May 2006, Dotan Barak wrote: > > > Hi. > > > > The latest DAPL cannot be compiled with the latest librdmacm after > > an API change in the librdmacm. > > Fixed in revision 7141. Let me know if you have any problems. > The compilation now is passing. thanks Dotan From dotanb at mellanox.co.il Sun May 14 01:43:07 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Sun, 14 May 2006 11:43:07 +0300 Subject: [openib-general] Quick RDMA Write with Immediate Data Item question In-Reply-To: <2cfcf21e0605121524p38579385i75b8682f070e28d6@mail.gmail.com> References: <2cfcf21e0605120546p22b71d3ax84e3fcc6cd43145d@mail.gmail.com> <446507F3.7090908@ichips.intel.com> <2cfcf21e0605121524p38579385i75b8682f070e28d6@mail.gmail.com> Message-ID: <200605141143.07152.dotanb@mellanox.co.il> Hi. On Saturday 13 May 2006 01:24, Steven Wooding wrote: > > > This leads me to another question I had about memory protection for RDMA > writes. What's the best way to stop the sender accidentally writing a larger > message than they should of, if I didn't want to use a different rkey for > each message (as setting up rkeys is expensive and too inflexible for my > application). > > Any thoughts? 
If you create one large MR (and have only one rkey), you cannot give this key to a remote side and still prevent it from writing to all of the MR, so maybe you should consider holding several MRs. The IB spec defines a mechanism, called Memory Window, that should help you manage remote access in a more flexible way, but the gen2 driver doesn't support MW (yet?). Dotan From ogerlitz at voltaire.com Sun May 14 03:45:54 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 14 May 2006 13:45:54 +0300 (IDT) Subject: [openib-general] Re: slab error while removing ib_mad In-Reply-To: Message-ID: On Sun, 23 Apr 2006, Or Gerlitz wrote: > I am getting the below trace on 2.6.17-rc2 / AMD x86_64 / PCIX HCA > with both the IB sources that come with the kernel and svn trunk 6520. > > This happens if i just modprobe -r ib_mthca after fresh reboot, can > anyone reproduce it on her/his system as well? The module does get > modprobed out. > > Or. > > $ modprobe -r ib_mthca > > $ dmesg > > slab error in kmem_cache_destroy(): cache `ib_mad': Can't free all objects > > Call Trace: {kmem_cache_destroy+150} > {:ib_mad:ib_mad_cleanup_module+25} > {sys_delete_module+415} > {__up_write+20} > {sys_munmap+91} > {system_call+126} > > ib_mad: Failed to destroy ib_mad cache Roland, I think you were on vacation when I posted this. There were two responses saying they were not able to reproduce it, but no one had tried 2.6.17-X. The current status is that I can still reproduce it, now with 2.6.17-rc4-git2 and the IB code that comes with the kernel.
However i am ***not*** able to reproduce the failure i had with the trivial stand alone module which i have posted in http://openib.org/pipermail/openib-general/2006-April/020582.html So it seems there might be more then one issue here, or that the various problems i had on 2.6.17-rc1/2/3 were related but not totally solved with 2.6.17-rc4-git2 Following this failure the system gets into an unstable state, eg if i try to cat /proc/slabinfo i get the below oops Or. Unable to handle kernel paging request at ffffffff88092be5 RIP: {strnlen+12} PGD 1003027 PUD 1005027 PMD 37f32067 PTE 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: usbserial nfsd exportfs parport_pc lp parport edd joydev sg st sr_mod button battery ac ipv6 nfs lockd sunrpc sata_sil libata ohci_hcd hw_random i2c_amd8111 i2c_core e100 mii tg3 sd_mod scsi_mod dm_mod Pid: 16415, comm: cat Not tainted 2.6.17-rc4-git2 #1 RIP: 0010:[] {strnlen+12} RSP: 0018:ffff810019645d10 EFLAGS: 00010297 RAX: ffffffff88092be5 RBX: ffff810019645d68 RCX: 000000000000000a RDX: ffff810019645d98 RSI: fffffffffffffffe RDI: ffffffff88092be5 RBP: ffffffff88092be5 R08: 00000000ffffffff R09: 00000000000001d0 R10: 0000000000000000 R11: 0000000000000002 R12: ffff81003c964185 R13: 0000000000000011 R14: 0000000000000010 R15: ffff81003c964fff FS: 00002b8716e9b6e0(0000) GS:ffff81003f7a1768(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffff88092be5 CR3: 0000000034424000 CR4: 00000000000006e0 Process cat (pid: 16415, threadinfo ffff810019644000, task ffff81001f778880) Stack: ffffffff81132586 0000000000000e7b ffff81003c964185 ffffffff81295c05 ffff8100322d4930 ffff81001582a240 0000000000000000 0000000000000000 ffff8100322d4930 0000000000000000 Call Trace: {vsnprintf+765} {seq_printf+165} {__alloc_pages+101} {__handle_mm_fault+2357} {s_start+21} {debug_mutex_add_waiter+144} {__mutex_lock_slowpath+768} {s_show+616} {seq_read+264} {vfs_read+209} {sys_read+69} {system_call+126} Code: 80 3f 00 74 11 48 ff ce 
48 ff c0 48 83 fe ff 74 05 80 38 00 RIP {strnlen+12} RSP CR2: ffffffff88092be5 <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43 in_atomic():0, irqs_disabled():1 Call Trace: {__might_sleep+196} {blocking_notifier_call_chain+31} {do_exit+34} {do_unblank_screen+123} {do_page_fault+1864} {proc_alloc_inode+18} {proc_alloc_inode+64} {alloc_inode+272} {bit_waitqueue+53} {d_rehash+113} {error_exit+0} {strnlen+12} {vsnprintf+765} {seq_printf+165} {__alloc_pages+101} {__handle_mm_fault+2357} {s_start+21} {debug_mutex_add_waiter+144} {__mutex_lock_slowpath+768} {s_show+616} {seq_read+264} {vfs_read+209} {sys_read+69} {system_call+126} Losing some ticks... checking if CPU frequency changed. BUG: cat/16415, active lock [ffff8100322d4960(ffff8100322d4930-ffff8100322d4a30)] freed! Call Trace: {mutex_debug_check_no_locks_freed+273} {kfree+115} {seq_release+24} {__fput+181} {filp_close+91} {put_files_struct+105} {do_exit+652} {do_unblank_screen+123} {do_page_fault+1864} {proc_alloc_inode+18} {proc_alloc_inode+64} {alloc_inode+272} {bit_waitqueue+53} {d_rehash+113} {error_exit+0} {strnlen+12} {vsnprintf+765} {seq_printf+165} {__alloc_pages+101} {__handle_mm_fault+2357} {s_start+21} {debug_mutex_add_waiter+144} {__mutex_lock_slowpath+768} {s_show+616} {seq_read+264} {vfs_read+209} {sys_read+69} {system_call+126} [ffff8100322d4960] {seq_open} .. held by: cat:16415 [ffff81001f778880, 116] ... acquired at: seq_read+0x36/0x2a4 From ishai at mellanox.co.il Sun May 14 04:02:01 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Sun, 14 May 2006 14:02:01 +0300 Subject: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading In-Reply-To: References: Message-ID: <20060514110201.GA23308@mellanox.co.il> Hi, After loading ib_srp module, adding a target and then unloading the ib_srp target the scsi_host directory in /sys/class/scsi_host/ still exists. It looks like the srp code does not release the scsi_host it had allocated. 
After examining the code I found out that when executing srp_remove_work (the removal of one target) scsi_host_put is called twice, but when unloading the module in srp_remove_one scsi_host_put is called only once. It looks like the correct thing is to execute scsi_host_put twice (once for the call to scsi_add_host and once for the call to scsi_host_alloc). So, I suggest the next patch: -------------------------------------------------------------------------------- Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 13:09:23.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 13:25:48.000000000 +0300 @@ -357,7 +357,6 @@ static void srp_remove_work(void *target spin_lock_irq(target->scsi_host->host_lock); if (target->state != SRP_TARGET_DEAD) { spin_unlock_irq(target->scsi_host->host_lock); - scsi_host_put(target->scsi_host); return; } target->state = SRP_TARGET_REMOVED; @@ -1790,6 +1789,11 @@ static void srp_remove_one(struct ib_dev srp_disconnect_target(target); ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); + /* + * We need 2 scsi_host_put becuase there are two get: + * in scsi_host_alloc and in scsi_add_host + */ + scsi_host_put(target->scsi_host); scsi_host_put(target->scsi_host); } From halr at voltaire.com Sun May 14 04:40:25 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 14 May 2006 07:40:25 -0400 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <20060512171053.GN26684@obsidianresearch.com> References: <1147310565.4485.56947.camel@hal.voltaire.com> <20060511054803.GE26684@obsidianresearch.com> <1147346418.4485.68543.camel@hal.voltaire.com> <20060511171210.GH26684@obsidianresearch.com> <1147435580.4485.96257.camel@hal.voltaire.com> <20060512171053.GN26684@obsidianresearch.com> Message-ID: <1147606819.4485.150215.camel@hal.voltaire.com> On 
Fri, 2006-05-12 at 13:10, Jason Gunthorpe wrote: > On Fri, May 12, 2006 at 08:11:17AM -0400, Hal Rosenstock wrote: > > > > To allow what Roland is talking about you need an unambiguous > > > mechanism where the SA can signal to the client that the path > > > needs a GRH. > > > > Ah, you are referring to the SA path record response not the request. > > Yes.. Though I think we are still talking about different things in a > few places ;> > > How about this, how do you see this scenario: > > 1) Client gets a DGID from 'someplace' > 2) Client sends a SA query to resolve the DGID to a Path Record > 3) Client configures a QP based on the Path Record > > Now, the question I'm interested in is this: > During step #3 what test should the client apply to determine if a > GRH should be used with the QP. > > Other issues around the GRH like management MAD responses use and > multicast I feel are well specified and don't need more consideration. Thanks for clarifying. > > > Think of it the other way, HopLimit < 2 means it _can't_ be forwarded > > > off subnet, so that result from the SA should _always_ cause the > > > requesting client to not use a GRH for that path. > > > > Not always true in terms of local subnet (multicast and management MAD > > response exceptions). > > Yes, but these are well specified. Multicast must always have a GRH. > MAD requests are covered under my scenario above and MAD responses > to MAD requests with GRH's are specified to use the GRH and set the > HopLimit = 0xFF. Where does the spec say HopLmt needs to be 0xFF for multicast ? > Also, I would assume when building a router that multicast packets > with a hop limit of 0 are non-forwardable based on the rules in IBA. 0 or 1 hop limit for both unicast and multicast > > Are you saying HopLimit is supplied to the SA in the request ? It could > > be but it's optional in general. In the router case, an off subnet DGID > > should be sufficient. 
I would think the HopLimit (as well as the other > > GRH fields) would need to be returned by the SA to the client. > > Talking about a request for a Path to the SA from a client now: > I would suggest that if the client wishes to restrict itself to paths > that are only on-link then it could send a SA request with the > path record HopLimit=0. Yes (or HopLimit=1). > A SA request with HopLimit=* (masked out > of component mask) should let the SA return routed paths. Yes. > I also think that the SA response should have a HopLimit of 0 for > local paths 1 would also be valid here too. > and a HopLimit >= 2 for routed paths. Yes. > However, I can't find any wording in IBA that would require this > behavior. In terms of the SA responses to Path/MultiPathRecord requests, the HopLimit is required to be filled in in the response. Is that what you are asking ? It's up to the SA to determine this and for the client to use the values returned subsequently just as it does for DLIDs, SLs, etc. > > Not sure exactly what you mean by full control over the routing header > > (GRH). The SA supplies the info for the headers to the client and the > > client is responsible for putting the correct info in the headers. Do > > you mean supplies sufficient info for the client to do this correctly ? > > If so, I agree. > > As far as I can see IBA includes all header information for the GRH > and LRH in the PathRecord response. It does not define a how to > determine if the path described by a PathRecord response requires > a GRH or not. I think the rules are there: Multicasts always have GRH. Unicasts off subnet have GRH and on subnet they are optional. Off subnet is either determined by the prefix comparison or HopLimit >=2 in the response from the SA. The latter is implied by C8-16 on p. 229. 
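The rules above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: struct path_rec_view and the helper names are hypothetical and are not taken from the OpenIB headers, and the HopLimit >= 2 test is the reading of C8-16 proposed in this thread, not something the spec spells out as a client rule.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, trimmed-down view of the fields of an SA PathRecord
 * response that matter for the GRH decision. Field names are
 * illustrative only.
 */
struct path_rec_view {
	uint8_t sgid[16];   /* source GID, network byte order */
	uint8_t dgid[16];   /* destination GID, network byte order */
	uint8_t hop_limit;  /* HopLimit as returned by the SA */
};

/* Multicast GIDs start with the byte 0xFF. */
static bool gid_is_multicast(const uint8_t gid[16])
{
	return gid[0] == 0xff;
}

/*
 * HopLimit-based test: multicast always carries a GRH; unicast
 * carries one when the SA reports a routed path (HopLimit >= 2).
 */
static bool path_needs_grh(const struct path_rec_view *p)
{
	if (gid_is_multicast(p->dgid))
		return true;
	return p->hop_limit >= 2;
}

/*
 * Alternative test: compare the 64-bit subnet prefixes (the upper
 * 8 bytes of each GID). This leaves the SA out of the decision.
 */
static bool gid_prefixes_differ(const struct path_rec_view *p)
{
	return memcmp(p->sgid, p->dgid, 8) != 0;
}
```

With the first helper the SA stays in control of GRH usage through the HopLimit it returns; with the second the client decides on its own from the two prefixes.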
-- Hal > Thanks, > Jason From halr at voltaire.com Sun May 14 04:53:40 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 14 May 2006 07:53:40 -0400 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <4464CC2A.80207@ichips.intel.com> References: <1147310565.4485.56947.camel@hal.voltaire.com> <20060511054803.GE26684@obsidianresearch.com> <1147346418.4485.68543.camel@hal.voltaire.com> <20060511171210.GH26684@obsidianresearch.com> <1147435580.4485.96257.camel@hal.voltaire.com> <20060512171053.GN26684@obsidianresearch.com> <4464CC2A.80207@ichips.intel.com> Message-ID: <1147607619.4485.150481.camel@hal.voltaire.com> On Fri, 2006-05-12 at 13:55, Sean Hefty wrote: > Jason Gunthorpe wrote: > > How about this, how do you see this scenario: > > > > 1) Client gets a DGID from 'someplace' > > 2) Client sends a SA query to resolve the DGID to a Path Record > > 3) Client configures a QP based on the Path Record > > > > Now, the question I'm interested in is this: > > During step #3 what test should the client apply to determine if a > > GRH should be used with the QP. > > This is the scenario that I need to resolve. > > What would happen if the GRH flag were always set? That would work but there would be additional overhead (especially for small packets this would be more noticeable) in the local subnet case. > Set only if the GID prefixes of the SGID/DGID were different? That's one way although it is more complex than what Jason has been proposing for this (SA response with HopLimit>=2). I'm not yet sure that the latter is sufficient as I think there may be other factors as to whether a packet is forwarded off subnet. One is the prefix scope (but I would think link local scopes should be limited in HopLimit except for multicasts (Jason cited that multicasts were required to have HopLimit 0xFF) but they require GRHs anyhow) so maybe I'm wrong about this and HopLimit>=2 is sufficient. 
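To put a rough number on the small-packet overhead mentioned earlier in this reply, here is a back-of-the-envelope sketch. It assumes the usual InfiniBand header sizes (LRH 8 bytes, GRH 40, BTH 12, ICRC 4, VCRC 2) and ignores extended transport headers such as DETH or RETH; the function names are made up for the illustration.

```c
#include <stddef.h>

/* Per-packet header and trailer sizes in bytes. */
enum {
	IB_LRH_BYTES  = 8,
	IB_GRH_BYTES  = 40,
	IB_BTH_BYTES  = 12,
	IB_ICRC_BYTES = 4,
	IB_VCRC_BYTES = 2,
};

/* Total bytes on the wire for one packet carrying `payload` bytes. */
static size_t wire_bytes(size_t payload, int with_grh)
{
	return IB_LRH_BYTES + (with_grh ? IB_GRH_BYTES : 0) +
	       IB_BTH_BYTES + payload + IB_ICRC_BYTES + IB_VCRC_BYTES;
}

/* Payload share of the wire bytes, in parts per thousand (truncated). */
static unsigned efficiency_permille(size_t payload, int with_grh)
{
	return (unsigned)(payload * 1000 / wire_bytes(payload, with_grh));
}
```

Under these assumptions a 64-byte payload takes 90 bytes on the wire without a GRH and 130 bytes with one, so always setting the GRH flag costs noticeably more for small messages than for large ones.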
-- Hal > - Sean From sashak at voltaire.com Sun May 14 05:23:34 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 14 May 2006 15:23:34 +0300 Subject: [openib-general] [PATCH] opensm: fix SL2VL capability check for switch's external ports Message-ID: <20060514122334.24603.63195.stgit@sashak.voltaire.com> Fix SL2VL capability check for case of switch's external ports - PortInfo::CapabilityMask is not used for such ports and capability check should be based on number of supported data VLs. Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_qos.c | 10 ++++++++-- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/osm/opensm/osm_qos.c b/osm/opensm/osm_qos.c index 5944e9e..e72a0c0 100644 --- a/osm/opensm/osm_qos.c +++ b/osm/opensm/osm_qos.c @@ -206,8 +206,14 @@ static ib_api_status_t sl2vl_update(osm_ unsigned i, num_ports; ib_port_info_t *p_pi = osm_physp_get_port_info_ptr(p); - if (p_pi && !(p_pi->capability_mask & IB_PORT_CAP_HAS_SL_MAP)) - return IB_SUCCESS; + if (!p_pi) + return IB_ERROR; + if (port_num == 0) { + if (!(p_pi->capability_mask & IB_PORT_CAP_HAS_SL_MAP)) + return IB_SUCCESS; + } + else if (ib_port_info_get_vl_cap(p_pi) == 1) + return IB_SUCCESS; if (osm_node_get_type(osm_physp_get_node_ptr(p)) == IB_NODE_TYPE_SWITCH) num_ports = osm_node_get_num_physp(osm_physp_get_node_ptr(p)); From ishai at mellanox.co.il Sun May 14 06:09:03 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Sun, 14 May 2006 16:09:03 +0300 Subject: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading In-Reply-To: <20060514110201.GA23308@mellanox.co.il> References: <20060514110201.GA23308@mellanox.co.il> Message-ID: <20060514130903.GA24687@mellanox.co.il> On Sun, May 14, 2006 at 02:02:01PM +0300, Ishai Rabinovitz wrote: > Hi, > > After loading ib_srp module, adding a target and then unloading the ib_srp > target the scsi_host directory in /sys/class/scsi_host/ still exists. 
> It looks like the srp code does not release the scsi_host it had allocated. > > After examining the code I found out that when executing srp_remove_work > (the removal of one target) scsi_host_put is called twice, but when unloading > the module in srp_remove_one scsi_host_put is called only once. > > It looks like the correct thing is to execute scsi_host_put twice (once for > the call to scsi_add_host and once for the call to scsi_host_alloc). > > So, I suggest the next patch: Forgot to sign. Here it is again with the Signed-off-by line. Signed-off-by: Ishai Rabinovitz -------------------------------------------------------------------------------- Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 13:09:23.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 13:25:48.000000000 +0300 @@ -357,7 +357,6 @@ static void srp_remove_work(void *target spin_lock_irq(target->scsi_host->host_lock); if (target->state != SRP_TARGET_DEAD) { spin_unlock_irq(target->scsi_host->host_lock); - scsi_host_put(target->scsi_host); return; } target->state = SRP_TARGET_REMOVED; @@ -1790,6 +1789,11 @@ static void srp_remove_one(struct ib_dev srp_disconnect_target(target); ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); + /* + * We need 2 scsi_host_put becuase there are two get: + * in scsi_host_alloc and in scsi_add_host + */ + scsi_host_put(target->scsi_host); scsi_host_put(target->scsi_host); } From bugzilla-daemon at openib.org Sun May 14 06:41:30 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Sun, 14 May 2006 06:41:30 -0700 (PDT) Subject: [openib-general] [Bug 78] OFED 1.0 RC 4 iser install fails if patches already applied Message-ID: <20060514134130.D192E22841F@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=78 tziporet at mellanox.co.il changed: What 
|Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |vlad at mellanox.co.il Status|ASSIGNED |NEW ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From glebn at voltaire.com Sun May 14 06:42:40 2006 From: glebn at voltaire.com (Gleb Natapov) Date: Sun, 14 May 2006 16:42:40 +0300 Subject: [openib-general] [resend][RFC][PATCH] adding call to madvise In-Reply-To: References: <20060511134217.GW5319@minantech.com> <20060511185926.GA1561@minantech.com> Message-ID: <20060514134240.GZ5319@minantech.com> Hello Roland, Here is the new version of the patch. It tries to address most of your comments. I've looked at possibility to use autoconf for detecting MADV_* defines, but I haven't found AC_CHECK_DEFUN or something like this to check for available defines. Besides what should we do in case the define is available? Define HAVE_MADV_DOFORK and check this instead of MADV_DOFORK? It seem redundant to me. 
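As a standalone illustration of the approach taken in the patch that follows, here is a minimal sketch: fall back to the numeric values when the libc headers do not define the flags, allocate a page-aligned buffer, and mark it MADV_DONTFORK. The names alloc_dontfork and free_dofork are made up for this example and are not the helpers added by the patch; unlike the patch, the sketch rounds the length up to a whole page and reports the madvise() result to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback values for headers that predate these flags. */
#ifndef MADV_DONTFORK
#define MADV_DONTFORK 10
#endif
#ifndef MADV_DOFORK
#define MADV_DOFORK 11
#endif

/* Round size up to a whole number of pages. */
static size_t round_to_pages(size_t size, size_t page)
{
	return (size + page - 1) / page * page;
}

/*
 * Allocate a page-aligned buffer and ask the kernel not to copy it
 * into children on fork(). The madvise() result is stored in
 * *advise_rc so the caller can decide whether a failure (for example
 * on a kernel without MADV_DONTFORK) is fatal.
 */
static void *alloc_dontfork(size_t size, int *advise_rc)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf = NULL;
	size_t len;

	if (page <= 0)
		return NULL;

	len = round_to_pages(size, (size_t)page);
	if (posix_memalign(&buf, (size_t)page, len))
		return NULL;

	*advise_rc = madvise(buf, len, MADV_DONTFORK);
	return buf;
}

/* Restore normal fork behaviour before returning the pages to malloc. */
static void free_dofork(void *buf, size_t size)
{
	long page = sysconf(_SC_PAGESIZE);

	if (page > 0)
		madvise(buf, round_to_pages(size, (size_t)page), MADV_DOFORK);
	free(buf);
}
```

Restoring MADV_DOFORK before free() matters because the allocator may hand the same pages to unrelated data that a child process would expect to inherit.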
Index: libibverbs/include/infiniband/verbs.h =================================================================== --- libibverbs/include/infiniband/verbs.h (revision 7141) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -289,6 +289,8 @@ struct ibv_mr { uint32_t handle; uint32_t lkey; uint32_t rkey; + void *addr; + size_t length; }; struct ibv_global_route { Index: libibverbs/src/verbs.c =================================================================== --- libibverbs/src/verbs.c (revision 7141) +++ libibverbs/src/verbs.c (working copy) @@ -154,10 +154,13 @@ struct ibv_mr *ibv_reg_mr(struct ibv_pd { struct ibv_mr *mr; + ibv_dontfork_range(addr, length); mr = pd->context->ops.reg_mr(pd, addr, length, access); if (mr) { mr->context = pd->context; mr->pd = pd; + } else { + ibv_dofork_range(addr, length); } return mr; @@ -165,7 +168,12 @@ struct ibv_mr *ibv_reg_mr(struct ibv_pd int ibv_dereg_mr(struct ibv_mr *mr) { - return mr->context->ops.dereg_mr(mr); + int rc = mr->context->ops.dereg_mr(mr); + + if (!rc) + ibv_dofork_range(mr->addr, mr->length); + + return rc; } static struct ibv_comp_channel *ibv_create_comp_channel_v2(struct ibv_context *context) Index: libibverbs/src/ibverbs.h =================================================================== --- libibverbs/src/ibverbs.h (revision 7141) +++ libibverbs/src/ibverbs.h (working copy) @@ -61,8 +61,8 @@ extern HIDDEN int abi_ver; extern HIDDEN int ibverbs_init(struct ibv_device ***list); extern HIDDEN int ibv_init_mem_map(void); -extern HIDDEN int ibv_lock_range(void *base, size_t size); -extern HIDDEN int ibv_unlock_range(void *base, size_t size); +extern HIDDEN int ibv_dontfork_range(void *base, size_t size); +extern HIDDEN int ibv_dofork_range(void *base, size_t size); #define IBV_INIT_CMD(cmd, size, opcode) \ do { \ Index: libibverbs/src/cmd.c =================================================================== --- libibverbs/src/cmd.c (revision 7141) +++ libibverbs/src/cmd.c (working copy) @@ -238,6 
+238,8 @@ int ibv_cmd_reg_mr(struct ibv_pd *pd, vo mr->handle = resp.mr_handle; mr->lkey = resp.lkey; mr->rkey = resp.rkey; + mr->addr = addr; + mr->length = length; return 0; } Index: libibverbs/src/memory.c =================================================================== --- libibverbs/src/memory.c (revision 7141) +++ libibverbs/src/memory.c (working copy) @@ -43,6 +43,13 @@ #include "ibverbs.h" +#ifndef MADV_DONTFORK +#define MADV_DONTFORK 10 +#endif +#ifndef MADV_DOFORK +#define MADV_DOFORK 11 +#endif + /* * We keep a linked list of page ranges that have been locked along with a * reference count to manage overlapping registrations, etc. @@ -136,7 +143,7 @@ static void __mm_remove(struct ibv_mem_n node->next->prev = node->prev; } -int ibv_lock_range(void *base, size_t size) +int ibv_dontfork_range(void *base, size_t size) { uintptr_t start, end; struct ibv_mem_node *node, *tmp; @@ -187,8 +194,8 @@ int ibv_lock_range(void *base, size_t si if (node->refcnt++ == 0) { - ret = mlock((void *) node->start, - node->end - node->start + 1); + ret = madvise((void *) node->start, + node->end - node->start + 1, MADV_DONTFORK); if (ret) goto out; } @@ -202,7 +209,7 @@ out: return ret; } -int ibv_unlock_range(void *base, size_t size) +int ibv_dofork_range(void *base, size_t size) { uintptr_t start, end; struct ibv_mem_node *node, *tmp; @@ -226,8 +233,8 @@ int ibv_unlock_range(void *base, size_t while (node && node->end <= end) { if (--node->refcnt == 0) { - ret = munlock((void *) node->start, - node->end - node->start + 1); + ret = madvise((void *) node->start, + node->end - node->start + 1, MADV_DOFORK); } if (__mm_prev(node) && node->refcnt == __mm_prev(node)->refcnt) { Index: libmthca/src/qp.c =================================================================== --- libmthca/src/qp.c (revision 7141) +++ libmthca/src/qp.c (working copy) @@ -819,8 +819,10 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd qp->buf_size = qp->send_wqe_offset + (qp->sq.max << qp->sq.wqe_shift); - if 
(posix_memalign(&qp->buf, to_mdev(pd->context->device)->page_size, - align(qp->buf_size, to_mdev(pd->context->device)->page_size))) { + if (mthca_memalign_dontfork(&qp->buf, + to_mdev(pd->context->device)->page_size, + align(qp->buf_size, + to_mdev(pd->context->device)->page_size))) { free(qp->wrid); return -1; } Index: libmthca/src/mthca.h =================================================================== --- libmthca/src/mthca.h (revision 7141) +++ libmthca/src/mthca.h (working copy) @@ -36,6 +36,8 @@ #ifndef MTHCA_H #define MTHCA_H +#include + #include #include @@ -341,4 +343,32 @@ void mthca_free_av(struct mthca_ah *ah); int mthca_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); int mthca_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); +#ifndef MADV_DONTFORK +#define MADV_DONTFORK 10 +#endif +#ifndef MADV_DOFORK +#define MADV_DOFORK 11 +#endif + +static inline int mthca_memalign_dontfork(void **memptr, size_t alignment, + size_t size) +{ + int ret; + + ret = posix_memalign(memptr, alignment, size); + + if (ret) + return ret; + + madvise(*memptr, size, MADV_DONTFORK); + + return 0; +} + +static inline void mthca_free_dofork(void *ptr, size_t size) +{ + madvise(ptr, size, MADV_DOFORK); + free(ptr); +} + #endif /* MTHCA_H */ Index: libmthca/src/verbs.c =================================================================== --- libmthca/src/verbs.c (revision 7141) +++ libmthca/src/verbs.c (working copy) @@ -247,7 +247,7 @@ err_unreg: mthca_dereg_mr(cq->mr); err_buf: - free(cq->buf); + mthca_free_dofork(cq->buf, cqe * MTHCA_CQ_ENTRY_SIZE); err: free(cq); @@ -263,6 +263,7 @@ int mthca_resize_cq(struct ibv_cq *ibcq, void *buf; int old_cqe; int ret; + size_t length; pthread_spin_lock(&cq->lock); @@ -282,7 +283,7 @@ int mthca_resize_cq(struct ibv_cq *ibcq, cqe * MTHCA_CQ_ENTRY_SIZE, 0, IBV_ACCESS_LOCAL_WRITE); if (!mr) { - free(buf); + mthca_free_dofork(buf, cqe * MTHCA_CQ_ENTRY_SIZE); ret = ENOMEM; goto out; } @@ -295,14 +296,15 @@ int 
mthca_resize_cq(struct ibv_cq *ibcq, ret = ibv_cmd_resize_cq(ibcq, cqe - 1, &cmd.ibv_cmd, sizeof cmd); if (ret) { mthca_dereg_mr(mr); - free(buf); + mthca_free_dofork(buf, cqe * MTHCA_CQ_ENTRY_SIZE); goto out; } mthca_cq_resize_copy_cqes(cq, buf, old_cqe); + length = cq->mr->length; mthca_dereg_mr(cq->mr); - free(cq->buf); + mthca_free_dofork(cq->buf, length); cq->buf = buf; cq->mr = mr; @@ -315,6 +317,7 @@ out: int mthca_destroy_cq(struct ibv_cq *cq) { int ret; + size_t length; ret = ibv_cmd_destroy_cq(cq); if (ret) @@ -327,9 +330,10 @@ int mthca_destroy_cq(struct ibv_cq *cq) to_mcq(cq)->arm_db_index); } + length = to_mcq(cq)->mr->length; mthca_dereg_mr(to_mcq(cq)->mr); - free(to_mcq(cq)->buf); + mthca_free_dofork(to_mcq(cq)->buf, length); free(to_mcq(cq)); return 0; @@ -422,7 +426,7 @@ err_unreg: err_free: free(srq->wrid); - free(srq->buf); + mthca_free_dofork(srq->buf, srq->buf_size); err: free(srq); @@ -461,7 +465,7 @@ int mthca_destroy_srq(struct ibv_srq *sr mthca_dereg_mr(to_msrq(srq)->mr); - free(to_msrq(srq)->buf); + mthca_free_dofork(to_msrq(srq)->buf, to_msrq(srq)->buf_size); free(to_msrq(srq)->wrid); free(to_msrq(srq)); @@ -566,7 +570,7 @@ err_unreg: err_free: free(qp->wrid); - free(qp->buf); + mthca_free_dofork(qp->buf, qp->buf_size); err: free(qp); @@ -648,7 +652,7 @@ int mthca_destroy_qp(struct ibv_qp *qp) mthca_dereg_mr(to_mqp(qp)->mr); - free(to_mqp(qp)->buf); + mthca_free_dofork(to_mqp(qp)->buf, to_mqp(qp)->buf_size); free(to_mqp(qp)->wrid); free(to_mqp(qp)); Index: libmthca/src/cq.c =================================================================== --- libmthca/src/cq.c (revision 7141) +++ libmthca/src/cq.c (working copy) @@ -606,8 +606,9 @@ void *mthca_alloc_cq_buf(struct mthca_de void *buf; int i; - if (posix_memalign(&buf, dev->page_size, - align(nent * MTHCA_CQ_ENTRY_SIZE, dev->page_size))) + if (mthca_memalign_dontfork(&buf, dev->page_size, + align(nent * MTHCA_CQ_ENTRY_SIZE, + dev->page_size))) return NULL; for (i = 0; i < nent; ++i) Index: 
libmthca/src/srq.c =================================================================== --- libmthca/src/srq.c (revision 7141) +++ libmthca/src/srq.c (working copy) @@ -291,8 +291,10 @@ int mthca_alloc_srq_buf(struct ibv_pd *p srq->buf_size = srq->max << srq->wqe_shift; - if (posix_memalign(&srq->buf, to_mdev(pd->context->device)->page_size, - align(srq->buf_size, to_mdev(pd->context->device)->page_size))) { + if (mthca_memalign_dontfork(&srq->buf, + to_mdev(pd->context->device)->page_size, + align(srq->buf_size, + to_mdev(pd->context->device)->page_size))) { free(srq->wrid); return -1; } Index: libmthca/src/ah.c =================================================================== --- libmthca/src/ah.c (revision 7141) +++ libmthca/src/ah.c (working copy) @@ -59,14 +59,14 @@ static struct mthca_ah_page *__add_page( if (!page) return NULL; - if (posix_memalign(&page->buf, page_size, page_size)) { + if (mthca_memalign_dontfork(&page->buf, page_size, page_size)) { free(page); return NULL; } page->mr = mthca_reg_mr(&pd->ibv_pd, page->buf, page_size, 0); if (!page->mr) { - free(page->buf); + mthca_free_dofork(page->buf, page_size); free(page); return NULL; } @@ -175,6 +175,8 @@ void mthca_free_av(struct mthca_ah *ah) page->free[i / (8 * sizeof (int))] |= 1 << (i % (8 * sizeof (int))); if (!--page->use_cnt) { + size_t length; + if (page->prev) page->prev->next = page->next; else @@ -182,8 +184,9 @@ void mthca_free_av(struct mthca_ah *ah) if (page->next) page->next->prev = page->prev; + length = page->mr->length; mthca_dereg_mr(page->mr); - free(page->buf); + mthca_free_dofork(page->buf, length); free(page); } -- Gleb. From mst at mellanox.co.il Sun May 14 07:21:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 14 May 2006 17:21:43 +0300 Subject: [openib-general] CMA IPv6 support Message-ID: <20060514142143.GB30211@mellanox.co.il> Sean, CMA currently does not support IPv6 addresses at all. Is that right? 
However, while I have no immediate need to make real IPv6 addressing work, some applications (notably Java) always use AF_INET6 sockets for both IPv4 and IPv6 communications. (Applications that want to restrict an AF_INET6 socket to IPv6 communications only can set the IPV6_V6ONLY option.) I suggest implementing support for IPv4-mapped IPv6 addresses in CMA as a first step, with an eye towards full IPv6 support in the future. What do you think? -- MST From jgunthorpe at obsidianresearch.com Sun May 14 12:30:58 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Sun, 14 May 2006 13:30:58 -0600 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <1147606819.4485.150215.camel@hal.voltaire.com> References: <1147310565.4485.56947.camel@hal.voltaire.com> <20060511054803.GE26684@obsidianresearch.com> <1147346418.4485.68543.camel@hal.voltaire.com> <20060511171210.GH26684@obsidianresearch.com> <1147435580.4485.96257.camel@hal.voltaire.com> <20060512171053.GN26684@obsidianresearch.com> <1147606819.4485.150215.camel@hal.voltaire.com> Message-ID: <20060514193058.GC6954@obsidianresearch.com> On Sun, May 14, 2006 at 07:40:25AM -0400, Hal Rosenstock wrote: > > > Not always true in terms of local subnet (multicast and management MAD > > > response exceptions). > > > > Yes, but these are well specified. Multicast must always have a GRH. > > MAD requests are covered under my scenario above and MAD responses > > to MAD requests with GRHs are specified to use the GRH and set the > > HopLimit = 0xFF. > > Where does the spec say HopLmt needs to be 0xFF for multicast ? I meant that the spec says a MAD response with a GRH should have 0xFF for HopLmt. (13.5.4.4) I'd expect the Multicast HopLmt to come from the SA, just like in the unicast case. > Off subnet is either determined by the prefix comparison or HopLimit >=2 > in the response from the SA. The latter is implied by C8-16 on p. 229.
The only possible downside of using HopLimit that I can see is compatibility with existing SAs. Do all existing SAs set HopLmt to 0 or 1 in path record responses? (Since no SAs support routers, that would be correct.) Scope should not be a problem because the SA can follow whatever scope-based rules might exist and then set HopLimit properly. FWIW, my vote would be to use HopLimit, since that lets the SA tell the client if it should use a GRH. With prefix comparison GRH usage is not under the control of the SA - so it is less flexible. Jason From halr at voltaire.com Sun May 14 16:02:21 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 14 May 2006 19:02:21 -0400 Subject: [openib-general] question regarding GRH flag in ib_ah_attr In-Reply-To: <20060514193058.GC6954@obsidianresearch.com> References: <1147310565.4485.56947.camel@hal.voltaire.com> <20060511054803.GE26684@obsidianresearch.com> <1147346418.4485.68543.camel@hal.voltaire.com> <20060511171210.GH26684@obsidianresearch.com> <1147435580.4485.96257.camel@hal.voltaire.com> <20060512171053.GN26684@obsidianresearch.com> <1147606819.4485.150215.camel@hal.voltaire.com> <20060514193058.GC6954@obsidianresearch.com> Message-ID: <1147647739.4485.164154.camel@hal.voltaire.com> On Sun, 2006-05-14 at 15:30, Jason Gunthorpe wrote: > On Sun, May 14, 2006 at 07:40:25AM -0400, Hal Rosenstock wrote: > > > > Not always true in terms of local subnet (multicast and management MAD > > > > response exceptions). > > > > > > Yes, but these are well specified. Multicast must always have a GRH. > > > MAD requests are covered under my scenario above and MAD responses > > > to MAD requests with GRHs are specified to use the GRH and set the > > > HopLimit = 0xFF. > > > > Where does the spec say HopLmt needs to be 0xFF for multicast ? > > I meant that the spec says a MAD response with a GRH should have 0xFF > for HopLmt. (13.5.4.4) Right; from the MAD response rules.
> I'd expect the Multicast HopLmt to come from the SA, just like in the > unicast case. OK; that's what I thought. > > Off subnet is either determined by the prefix comparison or HopLimit >=2 > > in the response from the SA. The latter is implied by C8-16 on p. 229. > > The only possible downside of using HopLimit that I can see is > compatibility with existing SAs. Do all existing SAs set HopLmt to 0 > or 1 in path record responses? (Since no SAs support routers, > that would be correct.) I would argue that the implementations would not be conformant if that were not the case currently. > Scope should not be a problem because the SA can follow whatever > scope-based rules might exist and then set HopLimit properly. Sure, the SA would certainly use the scope to know whether it needs to go beyond the local subnet for path resolution (both unicast and multicast). > FWIW, my vote would be to use HopLimit, since that lets the SA > tell the client if it should use a GRH. With prefix comparison GRH > usage is not under the control of the SA - so it is less flexible. Makes sense to me (now)... -- Hal > Jason From halr at voltaire.com Mon May 15 03:24:55 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 15 May 2006 06:24:55 -0400 Subject: [openib-general] Re: [PATCH] opensm: fix SL2VL capability check for switch's external ports In-Reply-To: <20060514122334.24603.63195.stgit@sashak.voltaire.com> References: <20060514122334.24603.63195.stgit@sashak.voltaire.com> Message-ID: <1147688603.4485.178159.camel@hal.voltaire.com> On Sun, 2006-05-14 at 08:23, Sasha Khapyorsky wrote: > Fix SL2VL capability check for case of switch's external ports - > PortInfo::CapabilityMask is not used for such ports and capability check > should be based on number of supported data VLs. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied.
-- Hal From halr at voltaire.com Mon May 15 03:28:10 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 15 May 2006 06:28:10 -0400 Subject: [openib-general] Re: [PATCH] opensm: fix SL2VL capability check for switch's external ports In-Reply-To: <20060514122334.24603.63195.stgit@sashak.voltaire.com> References: <20060514122334.24603.63195.stgit@sashak.voltaire.com> Message-ID: <1147688694.4485.178184.camel@hal.voltaire.com> On Sun, 2006-05-14 at 08:23, Sasha Khapyorsky wrote: > Fix SL2VL capability check for case of switch's external ports - > PortInfo::CapabilityMask is not used for such ports and capability check > should be based on number of supported data VLs. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. -- Hal From halr at voltaire.com Mon May 15 05:51:28 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 15 May 2006 08:51:28 -0400 Subject: [openib-general] complib and __WIN__ conditionalization Message-ID: <1147697376.4485.181438.camel@hal.voltaire.com> Hi Eitan, Is my understanding that there is a separate complib for Windows (v. Linux) for OpenSM correct ? If so, any objections to removing the following (from the Linux complib): complib/cl_log.c:#ifdef __WIN__ include/complib/cl_memory_osd.h:#ifndef __WIN__ include/complib/cl_types.h:#ifdef __WIN__ Thanks. -- Hal From zhushisongzhu at yahoo.com Mon May 15 07:02:33 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Mon, 15 May 2006 07:02:33 -0700 (PDT) Subject: [openib-general] can mellanox IBGD-1.8.2 support Redhat EL 4.3 Message-ID: <20060515140233.41098.qmail@web36915.mail.mud.yahoo.com> Using IBGD2-2.0.1 I have experienced an SDP connection problem, so I have downloaded IBGD-1.8.2 to test. But vstat and ibls can't find the InfiniBand driver. OS: Redhat EL 4.3 Kernel: kernel 2.6.9-34 HCA: MT 25204 Can IBGD-1.8.2 support kernel 2.6.9-34? tks zhu
From ianbrn at gmail.com Mon May 15 07:26:50 2006 From: ianbrn at gmail.com (Ian Brown) Date: Mon, 15 May 2006 17:26:50 +0300 Subject: [openib-general] RDMA enabled NICs- newbie Message-ID: Hello, A Google search for "RNIC driver" gives about 60 results. Are there Ethernet NICs on the market (or will there be soon) that are RDMA-enabled NICs? And if the answer is positive, does the Linux kernel have support for such NICs? IB From rheflin at atipa.com Mon May 15 08:02:37 2006 From: rheflin at atipa.com (Roger Heflin) Date: Mon, 15 May 2006 10:02:37 -0500 Subject: [openib-general] [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4 In-Reply-To: References: Message-ID: <4468980D.3000704@atipa.com> Bryan O'Sullivan wrote: > Hi, Roland - > > Here is a series of patches to bring the ipath driver up to date. I > believe you may already have two of them (but I've included them just > in case), but the others should all be new. > > They apply on top of Linus's current -git. > > Cheers, > > Hi Hal, We were working towards merging the complibs. This is why the __WIN__ is there. I am not sure we should remove those, as one day we might get back to this task. But if you find these annoying, we can remove them and add them back only if such work is restarted. EZ Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Monday, May 15, 2006 3:51 PM > To: Eitan Zahavi > Cc: openib-general at openib.org > Subject: complib and __WIN__ conditionalization > > Hi Eitan, > > Is my understanding that there is a separate complib for Windows (v. > Linux) for OpenSM correct ?
If so, any objections to removing the > following (from the Linux complib): > > complib/cl_log.c:#ifdef __WIN__ > include/complib/cl_memory_osd.h:#ifndef __WIN__ > include/complib/cl_types.h:#ifdef __WIN__ > > Thanks. > > -- Hal From rdreier at cisco.com Mon May 15 08:44:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 08:44:46 -0700 Subject: [openib-general] Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4 In-Reply-To: (Bryan O'Sullivan's message of "Fri, 12 May 2006 16:42:45 -0700") References: Message-ID: Umm... dumping a 53 patch series into the kernel at this stage in the release cycle isn't going to work. You need to sort out the patches that need to go into 2.6.17 from patches that can wait. For example, a 1500+ line patch to factor out common code is clearly not appropriate now. Pretty much the only patches that should be going in now are changes that fix crashes or other serious bugs. (You can send both sets of patches at the same time -- just let me know which ones are for 2.6.17 and which ones can be queued for 2.6.18) I have some more specific comments in reply to individual patches, although I didn't try to review all 53. - R. From rdreier at cisco.com Mon May 15 08:45:49 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 08:45:49 -0700 Subject: [openib-general] Re: [PATCH 4 of 53] ipath - cap number of PDs that can be allocated In-Reply-To: <300f0aa6f034eec6a806.1147477369@eng-12.pathscale.com> (Bryan O'Sullivan's message of "Fri, 12 May 2006 16:42:49 -0700") References: <300f0aa6f034eec6a806.1147477369@eng-12.pathscale.com> Message-ID: > Put an arbitrary cap on the maximum number of PDs that can be allocated > for a device. This is arbitrary because the number we support > is constrained only by system memory and what kmalloc can give us. > Nevertheless, if we don't have a limit, some third-party OpenIB stress > tests fail. The limit can be changed on the fly using a module parameter.
Would it make more sense to fix the stress test? - R. From rdreier at cisco.com Mon May 15 08:46:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 08:46:59 -0700 Subject: [openib-general] Re: [PATCH 14 of 53] ipath - forbid empty MRs In-Reply-To: <5d9fbba3222eeb941679.1147477379@eng-12.pathscale.com> (Bryan O'Sullivan's message of "Fri, 12 May 2006 16:42:59 -0700") References: <5d9fbba3222eeb941679.1147477379@eng-12.pathscale.com> Message-ID: > Don't allow zero-length regions to be created. Why are zero-length regions forbidden? - R. From rdreier at cisco.com Mon May 15 08:48:12 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 08:48:12 -0700 Subject: [openib-general] Re: [PATCH 15 of 53] ipath - make some maximum values more sane In-Reply-To: <480ceff18a886d7504a5.1147477380@eng-12.pathscale.com> (Bryan O'Sullivan's message of "Fri, 12 May 2006 16:43:00 -0700") References: <480ceff18a886d7504a5.1147477380@eng-12.pathscale.com> Message-ID: > -unsigned int ib_ipath_max_cqes = 0xFFFF; > +unsigned int ib_ipath_max_cqes = 0x2FFFF; You just added this limit in patch 8/53. How about just fixing that patch to do what you want? - R. From rdreier at cisco.com Mon May 15 08:50:27 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 08:50:27 -0700 Subject: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt In-Reply-To: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com> (Bryan O'Sullivan's message of "Fri, 12 May 2006 16:43:06 -0700") References: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com> Message-ID: > I think Roland already has this patch. > * This is a bit of a hack since we rely on dma_map_single() > - * being reversible by calling bus_to_virt(). > + * being reversible by calling phys_to_virt(). Actually I NAK'ed this patch. It compiles the same thing on x86_64 but makes the source code wrong -- dma_map_single() returns a bus address, not a physical address. 
- R. From rdreier at cisco.com Mon May 15 08:53:44 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 08:53:44 -0700 Subject: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes In-Reply-To: (Bryan O'Sullivan's message of "Fri, 12 May 2006 16:43:20 -0700") References: Message-ID: This looks like a pastiche of several patches. Why can't it be split up into logical pieces? > Call dma_free_coherent without ipath_mutex held. Why? Doesn't freeing work with the mutex held? - R. From rdreier at cisco.com Mon May 15 08:55:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 08:55:37 -0700 Subject: [openib-general] Re: [PATCH 41 of 53] ipath - disable interrupts while holding spinlock in RWQE get In-Reply-To: <83f1832c601594846868.1147477406@eng-12.pathscale.com> (Bryan O'Sullivan's message of "Fri, 12 May 2006 16:43:26 -0700") References: <83f1832c601594846868.1147477406@eng-12.pathscale.com> Message-ID: > @@ -171,12 +171,13 @@ int ipath_get_rwqe(struct ipath_qp *qp, > n = rq->head - rq->tail; > if (n < srq->limit) { > srq->limit = 0; > - spin_unlock(&rq->lock); > + spin_unlock_irqrestore(&rq->lock, flags); > ev.device = qp->ibqp.device; > ev.element.srq = qp->ibqp.srq; > ev.event = IB_EVENT_SRQ_LIMIT_REACHED; > srq->ibsrq.event_handler(&ev, > srq->ibsrq.srq_context); > + spin_lock_irqsave(&rq->lock, flags); ipath_get_rwqe() in the kernel now doesn't even have a flags variable. So this looks like a bug introduced earlier in this patch series. Please roll the fix up into the place where you added the bug. - R. 
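For readers following the locking hunk quoted above: the pattern under review is to drop the queue lock before calling out to an event handler and to take it again afterwards, so the handler never runs with the lock held. Below is a rough userspace analogue using a pthread mutex, offered only as a sketch. struct rq, check_limit() and count_event() are made-up names for illustration and are not ipath code; in the kernel the corresponding calls would be spin_lock_irqsave() and spin_unlock_irqrestore(), which need a local flags variable in the calling function.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical receive queue; the field names only loosely mirror the hunk. */
struct rq {
	pthread_mutex_t lock;
	unsigned int head;
	unsigned int tail;
	unsigned int limit;	/* 0 means the limit event is disarmed */
};

typedef void (*limit_handler)(void *ctx);

/* Example handler: counts how many limit events were delivered. */
static void count_event(void *ctx)
{
	++*(int *)ctx;
}

/*
 * Caller must hold rq->lock. If fewer than rq->limit entries are queued,
 * disarm the limit, drop the lock, run the handler, and retake the lock.
 * The lock is held again on return. Returns 1 if the handler ran, else 0.
 */
static int check_limit(struct rq *rq, limit_handler handler, void *ctx)
{
	unsigned int n = rq->head - rq->tail;

	assert(handler != NULL);
	if (n >= rq->limit)
		return 0;

	rq->limit = 0;				/* one-shot event */
	pthread_mutex_unlock(&rq->lock);	/* never call out with the lock held */
	handler(ctx);
	pthread_mutex_lock(&rq->lock);		/* caller expects it held again */
	return 1;
}
```

Because the lock is released around the callback, any queue state read before the call has to be treated as stale once check_limit() returns.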
From rdreier at cisco.com Mon May 15 08:57:41 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 08:57:41 -0700 Subject: [openib-general] Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes In-Reply-To: (Bryan O'Sullivan's message of "Fri, 12 May 2006 16:43:38 -0700") References: Message-ID: > static void i2c_wait_for_writes(struct ipath_devdata *dd) > { > + mb(); > (void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); > } This needs a comment explaining why it's needed. A memory barrier before a readl() looks very strange since readl() should be ordered anyway. - R. From rheflin at atipa.com Mon May 15 09:00:28 2006 From: rheflin at atipa.com (Roger Heflin) Date: Mon, 15 May 2006 11:00:28 -0500 Subject: [openib-general] Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4 In-Reply-To: References: Message-ID: <4468A59C.2030400@atipa.com> Roland Dreier wrote: > Umm... dumping a 53 patch series into the kernel at this stage in the > release cycle isn't going to work. You need to sort out the patches > that need to go into 2.6.17 from patches that can wait. For example, > a 1500+ line patch to factor out common code is clearly not > appropriate now. Pretty much the only patches that should be going in > now are changes that fix crashes or other serious bugs. > > (You can send both sets of patches at the same time -- just let me know > which ones are for 2.6.17 and which ones can be queued for 2.6.18) > > I have some more specific comments in reply to individual patches, > although I didn't try to review all 53. > > - R. Roland, What should these patches apply against? I have tried rc4 and a number of them fail, and I have also tried one of your git trees (though maybe not the right one), and at least some of the same patches seem to fail to apply there. If I can get an idea what they should apply to I will apply them against that and see how things look. I have at least one of the nasty bugs that I know about.
Roger From rdreier at cisco.com Mon May 15 09:00:18 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 09:00:18 -0700 Subject: [openib-general] Re: [PATCH 50 of 53] ipath - reduce maximum table sizes In-Reply-To: (Bryan O'Sullivan's message of "Fri, 12 May 2006 16:43:35 -0700") References: Message-ID: This is the third patch in the series that changes these -- how about making up your mind ;) From rdreier at cisco.com Mon May 15 09:04:30 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 09:04:30 -0700 Subject: [openib-general] Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4 In-Reply-To: <4468A59C.2030400@atipa.com> (Roger Heflin's message of "Mon, 15 May 2006 11:00:28 -0500") References: <4468A59C.2030400@atipa.com> Message-ID: Roger> What should these patches apply against? No idea. Bryan said they apply against Linus's current git, but I didn't actually try. - R. From rdreier at cisco.com Mon May 15 09:05:36 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 09:05:36 -0700 Subject: [openib-general] Re: slab error while removing ib_mad In-Reply-To: (Or Gerlitz's message of "Sun, 14 May 2006 13:45:54 +0300 (IDT)") References: Message-ID: Or> I think you were on vacation when i posted this, there were Or> two responses saying they were not able to reproduce it, but Or> no one was trying 2.6.17-X Not sure why you expect me to solve this -- other than the fact that I am a great debugger ;) Anyway I would guess the problem is in the NUMA slab stuff. Do you have CONFIG_NUMA set in your config? Do you still see the issue if you take CONFIG_NUMA out? - R. From jlentini at netapp.com Mon May 15 09:26:49 2006 From: jlentini at netapp.com (James Lentini) Date: Mon, 15 May 2006 12:26:49 -0400 (EDT) Subject: [openib-general] RDMA enabled NICs- newbie In-Reply-To: References: Message-ID: On Mon, 15 May 2006, Ian Brown wrote: > Hello, > google search for "RNIC driver" gives about 60 results. 
> > Are there Etherenet NICS in the market (or will there be soon such nics) > such NICs which are RDMA nics (RDMA enabled NICs)? > And in case the answer is positive - does linux kernel has support > for such nics? There are a number of companies that have produced RDMA NICs (RNICs). To date, software support for these devices has not been accepted into the mainline Linux kernel. The OpenFabrics project (formerly OpenIB) is been developing support for RNICs. This work is being done in a branch of the source tree: https://openib.org/svn/gen2/branches/iwarp/ This branch contains drivers for Ammasso AMSO1100 and Chelsio T3-based adapters with support for additional adapters possible in the future. The goal is to merge this work back into the OpenFabrics trunk and eventually submit it for inclusion into the mainline Linux kernel. From mst at mellanox.co.il Mon May 15 09:38:11 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 15 May 2006 19:38:11 +0300 Subject: [openib-general] [PATCH] cm: dont flush wqq before destroy Message-ID: <20060515163811.GA19104@mellanox.co.il> Sean, is there some reason CM flushes WQ an extra time? If not ... --- destroy_workqueue already does flush_workqueue. Signed-off-by: Michael S. Tsirkin Index: cm.c =================================================================== --- cm.c (revision 7145) +++ cm.c (working copy) @@ -3362,7 +3362,6 @@ error: static void __exit ib_cm_cleanup(void) { - flush_workqueue(cm.wq); destroy_workqueue(cm.wq); ib_unregister_client(&cm_client); idr_destroy(&cm.local_id_table); -- MST From mshefty at ichips.intel.com Mon May 15 09:35:34 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 15 May 2006 09:35:34 -0700 Subject: [openib-general] CMA IPv6 support In-Reply-To: <20060514142143.GB30211@mellanox.co.il> References: <20060514142143.GB30211@mellanox.co.il> Message-ID: <4468ADD6.1040700@ichips.intel.com> Michael S. 
Tsirkin wrote: > Sean, CMA currently does not support IPv6 addresses at all. Is that right? This is correct. At best, there's some code in places to handle it. > However, while I don't have immediate need to make real IPv6 addressing to work, > some applications (notably Java) always use AF_INET6 sockets for both IPv4 and > IPv6 communications. (Applications that want to restrict their use of AF_INET6 > socket to IPv6 communications only can set the IPV6_V6ONLY option). > > I suggest implementing the support for IPv6 mapped IPv4 addresses in CMA > as a first step, with an eye towards full IPv6 support in the future. > > What do you think? I agree that it needs to be added. What I lack at the moment is the time to do it. I don't think that it would be a huge effort to add proper support, so I'd rather not put in a temporary solution. - Sean From mst at mellanox.co.il Mon May 15 09:44:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 15 May 2006 19:44:43 +0300 Subject: [openib-general] CMA IPv6 support In-Reply-To: <4468ADD6.1040700@ichips.intel.com> References: <20060514142143.GB30211@mellanox.co.il> <4468ADD6.1040700@ichips.intel.com> Message-ID: <20060515164443.GA19163@mellanox.co.il> Quoting r. Sean Hefty : > >I suggest implementing the support for IPv6 mapped IPv4 addresses in CMA as a > >first step, with an eye towards full IPv6 support in the future. > > > >What do you think? > > I agree that it needs to be added. What I lack at the moment is the time > to do it. I don't think that it would be a huge effort to add proper > support, so I'd rather not put in a temporary solution. OK. May we discuss the design/API for now? My understanding is an IPv4 socket should only listen on IPv4 requests, while IPv6 socket should listen on both IPv4 and IPv6, unless IPV6_V6ONLY is set. Is that right? What will the API be for this? Maybe create_cm_id should get an address family parameter?
-- MST From mshefty at ichips.intel.com Mon May 15 09:41:23 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 15 May 2006 09:41:23 -0700 Subject: [openib-general] RDMA enabled NICs- newbie In-Reply-To: References: Message-ID: <4468AF33.5010205@ichips.intel.com> Ian Brown wrote: > Are there Etherenet NICS in the market (or will there be soon such nics) > such NICs which are RDMA nics (RDMA enabled NICs)? > And in case the answer is positive - does linux kernel has support > for such nics? The Linux kernel does not have support for RNICs at this time, but there is active work to add it. There are RNICs that exist, but I'm not sure that any of the companies that made them still exist. There are a couple of RNIC developers on the list who can provide more details. - Sean From ianbrn at gmail.com Mon May 15 09:58:04 2006 From: ianbrn at gmail.com (Ian Brown) Date: Mon, 15 May 2006 19:58:04 +0300 Subject: [openib-general] RDMA enabled NICs- newbie In-Reply-To: <4468AF33.5010205@ichips.intel.com> References: <4468AF33.5010205@ichips.intel.com> Message-ID: Thanks all. I indeed found that http://www.ammasso.com/ responds with "There is no website configured at this address." while http://www.chelsio.com/ does exist. Is there a reason why manufacturers will refrain from producing RDMA? (I mean, are there better technologies which are a substitute for RDMA for Ethernet?) Regards, IB On 5/15/06, Sean Hefty wrote: > Ian Brown wrote: > > Are there Etherenet NICS in the market (or will there be soon such nics) > > such NICs which are RDMA nics (RDMA enabled NICs)? > > And in case the answer is positive - does linux kernel has support > > for such nics? > > The Linux kernel does not have support for RNICs at this time, but there is > active work to add it. There are RNICs that exist, but I'm not sure that any of > the companies that made them still exist. There are a couple of RNIC developers > on the list who can provide more details.
> > - Sean > From sean.hefty at intel.com Mon May 15 09:55:23 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 15 May 2006 09:55:23 -0700 Subject: [openib-general] RE: [PATCH] cm: dont flush wqq before destroy In-Reply-To: <20060515163811.GA19104@mellanox.co.il> Message-ID: Thanks! - applied. - Sean From sean.hefty at intel.com Mon May 15 10:05:04 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 15 May 2006 10:05:04 -0700 Subject: [openib-general] CMA IPv6 support In-Reply-To: <20060515164443.GA19163@mellanox.co.il> Message-ID: >OK. May we discuss the design/API for now? Sounds good. >My understanding is an IPv4 socket should only listen on IPv4 requests, >while IPv6 socket should listen on both IPv4 and IPv6, unless >IPV6_V6ONLY is set. > >Is that right? What will the API be for this? Maybe create_cm_id should get an >address family parameter? I came to the same conclusion a couple of weeks ago. Rdma_create_id() will likely need an address family parameter, or the user must explicitly bind before calling listen. - Sean From rheflin at atipa.com Mon May 15 10:14:14 2006 From: rheflin at atipa.com (Roger Heflin) Date: Mon, 15 May 2006 12:14:14 -0500 Subject: [openib-general] RDMA enabled NICs- newbie In-Reply-To: References: <4468AF33.5010205@ichips.intel.com> Message-ID: <4468B6E6.2030201@atipa.com> Ian Brown wrote: > Thanks all. > I indeed fround that > http://www.ammasso.com/ responds with > "There is no website configured at this address." > while > http://www.chelsio.com/ > does exist. > > Is there a reason why manufacturers will refrain from > producing RDMA ? (I mean , are there better technologies > which are a substitute for RDMA for ethernet ?) > Regards, > IB I kind of think that the market is too small to support a company making a card that is at best just slightly cheaper than things like Infiniband, and Myrinet, and is actually slower than the Infiniband and Myrinet. 
Consider how many cards one has to sell to pay a single engineer's salary when you are at best making $100-$150 a card over production costs. The numbers don't look that good to me, and consider that previous to Ammasso and Chelsio there has been a long string of companies producing accelerated niche network cards of various types (going back as far as the early 90's), and all of them have failed to get enough market share to stay in business. About the only thing that makes one of these companies viable is being bought out by someone large enough to support the needed funding. Level 5 is making accelerated Ethernet cards; I believe most of the acceleration is in software in some manner (kernel bypass), and I don't know if their card could be made to do RDMA. Roger From Thomas.Talpey at netapp.com Mon May 15 10:26:40 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Mon, 15 May 2006 13:26:40 -0400 Subject: [openib-general] CMA IPv6 support In-Reply-To: References: <20060515164443.GA19163@mellanox.co.il> Message-ID: <7.0.1.0.2.20060515131807.041caef8@netapp.com> At 01:05 PM 5/15/2006, Sean Hefty wrote: >I came to the same conclusion a couple of weeks ago. Rdma_create_id() will >likely need an address family parameter, or the user must explicitly >bind before calling listen. Rdma_create_id() already takes a struct sockaddr *, which has an address family selector (sa_family) to define the contained address format. Why is that one not sufficient? Tom.
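To make the address-family point in this thread concrete, here is a small userspace sketch, assuming a POSIX system. It shows that the sa_family field of a generic struct sockaddr identifies the address format that follows, and that an IPv4-mapped IPv6 address (::ffff:a.b.c.d) can be recognized with the standard IN6_IS_ADDR_V4MAPPED macro. The names addr_kind, classify_addr() and classify_text() are hypothetical and exist only for this illustration; none of this is CMA code and it says nothing about how the CMA API will end up looking.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical classification used only in this sketch. */
enum addr_kind { ADDR_V4, ADDR_V4_MAPPED, ADDR_V6, ADDR_UNKNOWN };

/* Inspect sa_family to decide how the rest of the sockaddr is laid out. */
static enum addr_kind classify_addr(const struct sockaddr *sa)
{
	assert(sa != NULL);

	switch (sa->sa_family) {
	case AF_INET:
		return ADDR_V4;
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 =
			(const struct sockaddr_in6 *)sa;

		/* ::ffff:a.b.c.d is an IPv4 address carried in IPv6 form. */
		if (IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
			return ADDR_V4_MAPPED;
		return ADDR_V6;
	}
	default:
		return ADDR_UNKNOWN;
	}
}

/* Convenience wrapper: parse a textual address, then classify it. */
static enum addr_kind classify_text(const char *text)
{
	struct sockaddr_in6 sin6 = { .sin6_family = AF_INET6 };
	struct sockaddr_in sin = { .sin_family = AF_INET };

	if (inet_pton(AF_INET6, text, &sin6.sin6_addr) == 1)
		return classify_addr((const struct sockaddr *)&sin6);
	if (inet_pton(AF_INET, text, &sin.sin_addr) == 1)
		return classify_addr((const struct sockaddr *)&sin);
	return ADDR_UNKNOWN;
}
```

An AF_INET6 listener without IPV6_V6ONLY would see IPv4 peers as the ADDR_V4_MAPPED case, which is the situation the Java example earlier in the thread runs into.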
From schihei at de.ibm.com Mon May 15 10:41:13 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:41:13 +0200 Subject: [openib-general] [PATCH 01/16] ehca: module infrastructure Message-ID: <4468BD39.3010008@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_main.c | 966 +++++++++++++++++++++++++++++++++ 1 file changed, 966 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_main.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_main.c 2006-05-15 19:17:26.000000000 +0200 @@ -0,0 +1,966 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * module start stop, hca detection + * + * Authors: Heiko J Schick + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#define DEB_PREFIX "shca" + +#include "ehca_classes.h" +#include "ehca_iverbs.h" +#include "ehca_mrmw.h" +#include "ehca_tools.h" +#include "hcp_if.h" + +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_AUTHOR("Christoph Raisch "); +MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver"); +MODULE_VERSION("SVNEHCA_0006"); + +struct ehca_comp_pool* ehca_pool; + +int ehca_open_aqp1 = 0; +int ehca_debug_level = -1; +int ehca_hw_level = 0; +int ehca_nr_ports = 2; +int ehca_use_hp_mr = 0; +int ehca_port_act_time = 30; +int ehca_poll_all_eqs = 1; +int ehca_static_rate = -1; + +module_param_named(open_aqp1, ehca_open_aqp1, int, 0); +module_param_named(debug_level, ehca_debug_level, int, 0); +module_param_named(hw_level, ehca_hw_level, int, 0); +module_param_named(nr_ports, ehca_nr_ports, int, 0); +module_param_named(use_hp_mr, ehca_use_hp_mr, int, 0); +module_param_named(port_act_time, ehca_port_act_time, int, 0); +module_param_named(poll_all_eqs, ehca_poll_all_eqs, int, 0); +module_param_named(static_rate, ehca_static_rate, int, 0); + +MODULE_PARM_DESC(open_aqp1, + "AQP1 on startup (0: no (default), 1: yes)"); +MODULE_PARM_DESC(debug_level, + "debug level" + " (0: node, 6: only errors (default), 9: all)"); +MODULE_PARM_DESC(hw_level, + "hardware level" + " (0: autosensing (default), 1: v. 0.20, 2: v. 
0.21)"); +MODULE_PARM_DESC(nr_ports, + "number of connected ports (default: 2)"); +MODULE_PARM_DESC(use_hp_mr, + "high performance MRs (0: no (default), 1: yes)"); +MODULE_PARM_DESC(port_act_time, + "time to wait for port activation (default: 30 sec)"); +MODULE_PARM_DESC(poll_all_eqs, + "polls all event queues periodically" + " (0: no, 1: yes (default))"); +MODULE_PARM_DESC(static_rate, + "set permanent static rate (default: disabled)"); + +/* This external trace mask controls what will end up in the + * kernel ring buffer. Number 6 means, that everything between + * 0 and 5 will be stored. + */ +u8 ehca_edeb_mask[EHCA_EDEB_TRACE_MASK_SIZE]={6, 6, 6, 6, + 6, 6, 6, 6, + 6, 6, 6, 6, + 6, 6, 6, 6, + 6, 6, 6, 6, + 6, 6, 6, 6, + 6, 6, 6, 6, + 6, 6, 0, 0}; + +spinlock_t ehca_qp_idr_lock; +spinlock_t ehca_cq_idr_lock; +DEFINE_IDR(ehca_qp_idr); +DEFINE_IDR(ehca_cq_idr); + +struct ehca_module ehca_module; + +void ehca_init_trace(void) +{ + EDEB_EN(7, ""); + + if (ehca_debug_level != -1) { + int i; + for (i = 0; i < EHCA_EDEB_TRACE_MASK_SIZE; i++) + ehca_edeb_mask[i] = ehca_debug_level; + } + + EDEB_EX(7, ""); +} + +int ehca_create_slab_caches(struct ehca_module *ehca_module) +{ + int ret = 0; + + EDEB_EN(7, ""); + + ehca_module->cache_pd = + kmem_cache_create("ehca_cache_pd", + sizeof(struct ehca_pd), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!ehca_module->cache_pd) { + EDEB_ERR(4, "Cannot create PD SLAB cache."); + ret = -ENOMEM; + goto create_slab_caches1; + } + + ehca_module->cache_cq = + kmem_cache_create("ehca_cache_cq", + sizeof(struct ehca_cq), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!ehca_module->cache_cq) { + EDEB_ERR(4, "Cannot create CQ SLAB cache."); + ret = -ENOMEM; + goto create_slab_caches2; + } + + ehca_module->cache_qp = + kmem_cache_create("ehca_cache_qp", + sizeof(struct ehca_qp), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!ehca_module->cache_qp) { + EDEB_ERR(4, "Cannot create QP SLAB cache."); + ret = -ENOMEM; + goto create_slab_caches3; + } 
+ + ehca_module->cache_av = + kmem_cache_create("ehca_cache_av", + sizeof(struct ehca_av), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!ehca_module->cache_av) { + EDEB_ERR(4, "Cannot create AV SLAB cache."); + ret = -ENOMEM; + goto create_slab_caches4; + } + + ehca_module->cache_mw = + kmem_cache_create("ehca_cache_mw", + sizeof(struct ehca_mw), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!ehca_module->cache_mw) { + EDEB_ERR(4, "Cannot create MW SLAB cache."); + ret = -ENOMEM; + goto create_slab_caches5; + } + + ehca_module->cache_mr = + kmem_cache_create("ehca_cache_mr", + sizeof(struct ehca_mr), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + if (!ehca_module->cache_mr) { + EDEB_ERR(4, "Cannot create MR SLAB cache."); + ret = -ENOMEM; + goto create_slab_caches6; + } + + EDEB_EX(7, "ret=%x", ret); + + return ret; + +create_slab_caches6: + kmem_cache_destroy(ehca_module->cache_mw); + +create_slab_caches5: + kmem_cache_destroy(ehca_module->cache_av); + +create_slab_caches4: + kmem_cache_destroy(ehca_module->cache_qp); + +create_slab_caches3: + kmem_cache_destroy(ehca_module->cache_cq); + +create_slab_caches2: + kmem_cache_destroy(ehca_module->cache_pd); + +create_slab_caches1: + EDEB_EX(7, "ret=%x", ret); + + return ret; +} + +int ehca_destroy_slab_caches(struct ehca_module *ehca_module) +{ + int ret; + + EDEB_EN(7, ""); + + ret = kmem_cache_destroy(ehca_module->cache_pd); + if (ret) + EDEB_ERR(4, "Cannot destroy PD SLAB cache. ret=%x", ret); + + ret = kmem_cache_destroy(ehca_module->cache_cq); + if (ret) + EDEB_ERR(4, "Cannot destroy CQ SLAB cache. ret=%x", ret); + + ret = kmem_cache_destroy(ehca_module->cache_qp); + if (ret) + EDEB_ERR(4, "Cannot destroy QP SLAB cache. ret=%x", ret); + + ret = kmem_cache_destroy(ehca_module->cache_av); + if (ret) + EDEB_ERR(4, "Cannot destroy AV SLAB cache. ret=%x", ret); + + ret = kmem_cache_destroy(ehca_module->cache_mw); + if (ret) + EDEB_ERR(4, "Cannot destroy MW SLAB cache. 
ret=%x", ret); + + ret = kmem_cache_destroy(ehca_module->cache_mr); + if (ret) + EDEB_ERR(4, "Cannot destroy MR SLAB cache. ret=%x", ret); + + EDEB_EX(7, ""); + + return 0; +} + +#define EHCA_HCAAVER EHCA_BMASK_IBM(32,39) +#define EHCA_REVID EHCA_BMASK_IBM(40,63) + +int ehca_sense_attributes(struct ehca_shca *shca) +{ + int ret = -EINVAL; + u64 h_ret = H_SUCCESS; + struct hipz_query_hca *rblock; + + EDEB_EN(7, "shca=%p", shca); + + rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (!rblock) { + EDEB_ERR(4, "Cannot allocate rblock memory."); + ret = -ENOMEM; + goto num_ports0; + } + + h_ret = hipz_h_query_hca(shca->ipz_hca_handle, rblock); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "Cannot query device properties. h_ret=%lx", h_ret); + ret = -EPERM; + goto num_ports1; + } + + if (ehca_nr_ports == 1) + shca->num_ports = 1; + else + shca->num_ports = (u8)rblock->num_ports; + + EDEB(6, " ... found %x ports", rblock->num_ports); + + if (ehca_hw_level == 0) { + u32 hcaaver; + u32 revid; + + hcaaver = EHCA_BMASK_GET(EHCA_HCAAVER, rblock->hw_ver); + revid = EHCA_BMASK_GET(EHCA_REVID, rblock->hw_ver); + + EDEB(6, " ... hardware version=%x:%x", + hcaaver, revid); + + if ((hcaaver == 1) && (revid == 0)) + shca->hw_level = 0; + else if ((hcaaver == 1) && (revid == 1)) + shca->hw_level = 1; + else if ((hcaaver == 1) && (revid == 2)) + shca->hw_level = 2; + } + EDEB(6, " ... 
hardware level=%x", shca->hw_level); + + shca->sport[0].rate = IB_RATE_30_GBPS; + shca->sport[1].rate = IB_RATE_30_GBPS; + + ret = 0; + +num_ports1: + kfree(rblock); + +num_ports0: + EDEB_EX(7, "ret=%x", ret); + + return ret; +} + +static int init_node_guid(struct ehca_shca* shca) +{ + int ret = 0; + struct hipz_query_hca *rblock; + + EDEB_EN(7, ""); + + rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (!rblock) { + EDEB_ERR(4, "Can't allocate rblock memory."); + ret = -ENOMEM; + goto init_node_guid0; + } + + if (hipz_h_query_hca(shca->ipz_hca_handle, rblock) != H_SUCCESS) { + EDEB_ERR(4, "Can't query device properties"); + ret = -EINVAL; + goto init_node_guid1; + } + + memcpy(&shca->ib_device.node_guid, &rblock->node_guid, (sizeof(u64))); + +init_node_guid1: + kfree(rblock); + +init_node_guid0: + EDEB_EX(7, "node_guid=%lx ret=%x", shca->ib_device.node_guid, ret); + + return ret; +} + +int ehca_register_device(struct ehca_shca *shca) +{ + int ret = 0; + + EDEB_EN(7, "shca=%p", shca); + + ret = init_node_guid(shca); + if (ret) + return ret; + + strlcpy(shca->ib_device.name, "ehca%d", IB_DEVICE_NAME_MAX); + shca->ib_device.owner = THIS_MODULE; + + shca->ib_device.uverbs_abi_ver = 5; + shca->ib_device.uverbs_cmd_mask = + (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) | + (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) | + (1ull << IB_USER_VERBS_CMD_QUERY_PORT) | + (1ull << IB_USER_VERBS_CMD_ALLOC_PD) | + (1ull << IB_USER_VERBS_CMD_DEALLOC_PD) | + (1ull << IB_USER_VERBS_CMD_REG_MR) | + (1ull << IB_USER_VERBS_CMD_DEREG_MR) | + (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | + (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | + (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | + (1ull << IB_USER_VERBS_CMD_CREATE_QP) | + (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | + (1ull << IB_USER_VERBS_CMD_QUERY_QP) | + (1ull << IB_USER_VERBS_CMD_DESTROY_QP) | + (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) | + (1ull << IB_USER_VERBS_CMD_DETACH_MCAST); + + shca->ib_device.node_type = RDMA_NODE_IB_CA; + 
shca->ib_device.phys_port_cnt = shca->num_ports; + shca->ib_device.dma_device = &shca->ibmebus_dev->ofdev.dev; + shca->ib_device.query_device = ehca_query_device; + shca->ib_device.query_port = ehca_query_port; + shca->ib_device.query_gid = ehca_query_gid; + shca->ib_device.query_pkey = ehca_query_pkey; + /* shca->in_device.modify_device = ehca_modify_device */ + shca->ib_device.modify_port = ehca_modify_port; + shca->ib_device.alloc_ucontext = ehca_alloc_ucontext; + shca->ib_device.dealloc_ucontext = ehca_dealloc_ucontext; + shca->ib_device.alloc_pd = ehca_alloc_pd; + shca->ib_device.dealloc_pd = ehca_dealloc_pd; + shca->ib_device.create_ah = ehca_create_ah; + /* shca->ib_device.modify_ah = ehca_modify_ah; */ + shca->ib_device.query_ah = ehca_query_ah; + shca->ib_device.destroy_ah = ehca_destroy_ah; + shca->ib_device.create_qp = ehca_create_qp; + shca->ib_device.modify_qp = ehca_modify_qp; + shca->ib_device.query_qp = ehca_query_qp; + shca->ib_device.destroy_qp = ehca_destroy_qp; + shca->ib_device.post_send = ehca_post_send; + shca->ib_device.post_recv = ehca_post_recv; + shca->ib_device.create_cq = ehca_create_cq; + shca->ib_device.destroy_cq = ehca_destroy_cq; + shca->ib_device.resize_cq = ehca_resize_cq; + shca->ib_device.poll_cq = ehca_poll_cq; + /* shca->ib_device.peek_cq = ehca_peek_cq; */ + shca->ib_device.req_notify_cq = ehca_req_notify_cq; + /* shca->ib_device.req_ncomp_notif = ehca_req_ncomp_notif; */ + shca->ib_device.get_dma_mr = ehca_get_dma_mr; + shca->ib_device.reg_phys_mr = ehca_reg_phys_mr; + shca->ib_device.reg_user_mr = ehca_reg_user_mr; + shca->ib_device.query_mr = ehca_query_mr; + shca->ib_device.dereg_mr = ehca_dereg_mr; + shca->ib_device.rereg_phys_mr = ehca_rereg_phys_mr; + shca->ib_device.alloc_mw = ehca_alloc_mw; + shca->ib_device.bind_mw = ehca_bind_mw; + shca->ib_device.dealloc_mw = ehca_dealloc_mw; + shca->ib_device.alloc_fmr = ehca_alloc_fmr; + shca->ib_device.map_phys_fmr = ehca_map_phys_fmr; + shca->ib_device.unmap_fmr = 
ehca_unmap_fmr; + shca->ib_device.dealloc_fmr = ehca_dealloc_fmr; + shca->ib_device.attach_mcast = ehca_attach_mcast; + shca->ib_device.detach_mcast = ehca_detach_mcast; + /* shca->ib_device.process_mad = ehca_process_mad; */ + shca->ib_device.mmap = ehca_mmap; + + ret = ib_register_device(&shca->ib_device); + + EDEB_EX(7, "ret=%x", ret); + + return ret; +} + +static int ehca_create_aqp1(struct ehca_shca *shca, u32 port) +{ + struct ehca_sport *sport; + struct ib_cq *ibcq; + struct ib_qp *ibqp; + struct ib_qp_init_attr qp_init_attr; + int ret = 0; + + EDEB_EN(7, "shca=%p port=%x", shca, port); + + sport = &shca->sport[port - 1]; + + if (sport->ibcq_aqp1) { + EDEB_ERR(4, "AQP1 CQ is already created."); + return -EPERM; + } + + ibcq = ib_create_cq(&shca->ib_device, NULL, NULL, (void*)(-1), 10); + if (IS_ERR(ibcq)) { + EDEB_ERR(4, "Cannot create AQP1 CQ."); + return PTR_ERR(ibcq); + } + sport->ibcq_aqp1 = ibcq; + + if (sport->ibqp_aqp1) { + EDEB_ERR(4, "AQP1 QP is already created."); + ret = -EPERM; + goto create_aqp1; + } + + memset(&qp_init_attr, 0, sizeof(struct ib_qp_init_attr)); + qp_init_attr.send_cq = ibcq; + qp_init_attr.recv_cq = ibcq; + qp_init_attr.sq_sig_type = IB_SIGNAL_ALL_WR; + qp_init_attr.cap.max_send_wr = 100; + qp_init_attr.cap.max_recv_wr = 100; + qp_init_attr.cap.max_send_sge = 2; + qp_init_attr.cap.max_recv_sge = 1; + qp_init_attr.qp_type = IB_QPT_GSI; + qp_init_attr.port_num = port; + qp_init_attr.qp_context = NULL; + qp_init_attr.event_handler = NULL; + qp_init_attr.srq = NULL; + + ibqp = ib_create_qp(&shca->pd->ib_pd, &qp_init_attr); + if (IS_ERR(ibqp)) { + EDEB_ERR(4, "Cannot create AQP1 QP."); + ret = PTR_ERR(ibqp); + goto create_aqp1; + } + sport->ibqp_aqp1 = ibqp; + + EDEB_EX(7, "ret=%x", ret); + + return ret; + +create_aqp1: + ib_destroy_cq(sport->ibcq_aqp1); + + EDEB_EX(7, "ret=%x", ret); + + return ret; +} + +static int ehca_destroy_aqp1(struct ehca_sport *sport) +{ + int ret = 0; + + EDEB_EN(7, "sport=%p", sport); + + ret = 
ib_destroy_qp(sport->ibqp_aqp1); + if (ret) { + EDEB_ERR(4, "Cannot destroy AQP1 QP. ret=%x", ret); + goto destroy_aqp1; + } + + ret = ib_destroy_cq(sport->ibcq_aqp1); + if (ret) + EDEB_ERR(4, "Cannot destroy AQP1 CQ. ret=%x", ret); + +destroy_aqp1: + EDEB_EX(7, "ret=%x", ret); + + return ret; +} + +static ssize_t ehca_show_debug_mask(struct device_driver *ddp, char *buf) +{ + int i; + int total = 0; + total += snprintf(buf + total, PAGE_SIZE - total, "%d", + ehca_edeb_mask[0]); + for (i = 1; i < EHCA_EDEB_TRACE_MASK_SIZE; i++) { + total += snprintf(buf + total, PAGE_SIZE - total, "%d", + ehca_edeb_mask[i]); + } + + total += snprintf(buf + total, PAGE_SIZE - total, "\n"); + + return total; +} + +static ssize_t ehca_store_debug_mask(struct device_driver *ddp, + const char *buf, size_t count) +{ + int i; + for (i = 0; i < EHCA_EDEB_TRACE_MASK_SIZE; i++) { + char value = buf[i] - '0'; + if ((value <= 9) && (count >= i)) { + ehca_edeb_mask[i] = value; + } + } + return count; +} +DRIVER_ATTR(debug_mask, S_IRUSR | S_IWUSR, + ehca_show_debug_mask, ehca_store_debug_mask); + +void ehca_create_driver_sysfs(struct ibmebus_driver *drv) +{ + driver_create_file(&drv->driver, &driver_attr_debug_mask); +} + +void ehca_remove_driver_sysfs(struct ibmebus_driver *drv) +{ + driver_remove_file(&drv->driver, &driver_attr_debug_mask); +} + +#define EHCA_RESOURCE_ATTR(name) \ +static ssize_t ehca_show_##name(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + struct ehca_shca *shca; \ + struct hipz_query_hca *rblock; \ + int data; \ + \ + shca = dev->driver_data; \ + \ + rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); \ + if (!rblock) { \ + EDEB_ERR(4, "Can't allocate rblock memory."); \ + return 0; \ + } \ + \ + if (hipz_h_query_hca(shca->ipz_hca_handle, rblock) != H_SUCCESS) { \ + EDEB_ERR(4, "Can't query device properties"); \ + kfree(rblock); \ + return 0; \ + } \ + \ + data = rblock->name; \ + kfree(rblock); \ + \ + if ((strcmp(#name, "num_ports") == 0) && 
(ehca_nr_ports == 1)) \ + return snprintf(buf, 256, "1\n"); \ + else \ + return snprintf(buf, 256, "%d\n", data); \ + \ +} \ +static DEVICE_ATTR(name, S_IRUGO, ehca_show_##name, NULL); + +EHCA_RESOURCE_ATTR(num_ports); +EHCA_RESOURCE_ATTR(hw_ver); +EHCA_RESOURCE_ATTR(max_eq); +EHCA_RESOURCE_ATTR(cur_eq); +EHCA_RESOURCE_ATTR(max_cq); +EHCA_RESOURCE_ATTR(cur_cq); +EHCA_RESOURCE_ATTR(max_qp); +EHCA_RESOURCE_ATTR(cur_qp); +EHCA_RESOURCE_ATTR(max_mr); +EHCA_RESOURCE_ATTR(cur_mr); +EHCA_RESOURCE_ATTR(max_mw); +EHCA_RESOURCE_ATTR(cur_mw); +EHCA_RESOURCE_ATTR(max_pd); +EHCA_RESOURCE_ATTR(max_ah); + +static ssize_t ehca_show_adapter_handle(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct ehca_shca *shca = dev->driver_data; + + return sprintf(buf, "%lx\n", shca->ipz_hca_handle.handle); + +} +static DEVICE_ATTR(adapter_handle, S_IRUGO, ehca_show_adapter_handle, NULL); + + + +void ehca_create_device_sysfs(struct ibmebus_dev *dev) +{ + device_create_file(&dev->ofdev.dev, &dev_attr_adapter_handle); + device_create_file(&dev->ofdev.dev, &dev_attr_num_ports); + device_create_file(&dev->ofdev.dev, &dev_attr_hw_ver); + device_create_file(&dev->ofdev.dev, &dev_attr_max_eq); + device_create_file(&dev->ofdev.dev, &dev_attr_cur_eq); + device_create_file(&dev->ofdev.dev, &dev_attr_max_cq); + device_create_file(&dev->ofdev.dev, &dev_attr_cur_cq); + device_create_file(&dev->ofdev.dev, &dev_attr_max_qp); + device_create_file(&dev->ofdev.dev, &dev_attr_cur_qp); + device_create_file(&dev->ofdev.dev, &dev_attr_max_mr); + device_create_file(&dev->ofdev.dev, &dev_attr_cur_mr); + device_create_file(&dev->ofdev.dev, &dev_attr_max_mw); + device_create_file(&dev->ofdev.dev, &dev_attr_cur_mw); + device_create_file(&dev->ofdev.dev, &dev_attr_max_pd); + device_create_file(&dev->ofdev.dev, &dev_attr_max_ah); +} + +void ehca_remove_device_sysfs(struct ibmebus_dev *dev) +{ + device_remove_file(&dev->ofdev.dev, &dev_attr_adapter_handle); + device_remove_file(&dev->ofdev.dev, 
&dev_attr_num_ports); + device_remove_file(&dev->ofdev.dev, &dev_attr_hw_ver); + device_remove_file(&dev->ofdev.dev, &dev_attr_max_eq); + device_remove_file(&dev->ofdev.dev, &dev_attr_cur_eq); + device_remove_file(&dev->ofdev.dev, &dev_attr_max_cq); + device_remove_file(&dev->ofdev.dev, &dev_attr_cur_cq); + device_remove_file(&dev->ofdev.dev, &dev_attr_max_qp); + device_remove_file(&dev->ofdev.dev, &dev_attr_cur_qp); + device_remove_file(&dev->ofdev.dev, &dev_attr_max_mr); + device_remove_file(&dev->ofdev.dev, &dev_attr_cur_mr); + device_remove_file(&dev->ofdev.dev, &dev_attr_max_mw); + device_remove_file(&dev->ofdev.dev, &dev_attr_cur_mw); + device_remove_file(&dev->ofdev.dev, &dev_attr_max_pd); + device_remove_file(&dev->ofdev.dev, &dev_attr_max_ah); +} + +static int __devinit ehca_probe(struct ibmebus_dev *dev, + const struct of_device_id *id) +{ + struct ehca_shca *shca; + u64 *handle; + struct ib_pd *ibpd; + int ret = 0; + + EDEB_EN(7, "name=%s", dev->name); + + handle = (u64 *)get_property(dev->ofdev.node, "ibm,hca-handle", NULL); + if (!handle) { + EDEB_ERR(4, "Cannot get eHCA handle for adapter: %s.", + dev->ofdev.node->full_name); + return -ENODEV; + } + + if (!(*handle)) { + EDEB_ERR(4, "Wrong eHCA handle for adapter: %s.", + dev->ofdev.node->full_name); + return -ENODEV; + } + + shca = (struct ehca_shca *)ib_alloc_device(sizeof(*shca)); + if (shca == NULL) { + EDEB_ERR(4, "Cannot allocate shca memory."); + return -ENOMEM; + } + + shca->ibmebus_dev = dev; + shca->ipz_hca_handle.handle = *handle; + dev->ofdev.dev.driver_data = shca; + + ret = ehca_sense_attributes(shca); + if (ret < 0) { + EDEB_ERR(4, "Cannot sense eHCA attributes."); + goto probe1; + } + + /* create event queues */ + ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048); + if (ret) { + EDEB_ERR(4, "Cannot create EQ."); + goto probe1; + } + + ret = ehca_create_eq(shca, &shca->neq, EHCA_NEQ, 513); + if (ret) { + EDEB_ERR(4, "Cannot create NEQ."); + goto probe2; + } + + /* create internal 
protection domain */ + ibpd = ehca_alloc_pd(&shca->ib_device, (void*)(-1), NULL); + if (IS_ERR(ibpd)) { + EDEB_ERR(4, "Cannot create internal PD."); + ret = PTR_ERR(ibpd); + goto probe3; + } + + shca->pd = container_of(ibpd, struct ehca_pd, ib_pd); + shca->pd->ib_pd.device = &shca->ib_device; + + /* create internal max MR */ + ret = ehca_reg_internal_maxmr(shca, shca->pd, &shca->maxmr); + if (ret) { + EDEB_ERR(4, "Cannot create internal MR. ret=%x", ret); + goto probe4; + } + + ret = ehca_register_device(shca); + if (ret) { + EDEB_ERR(4, "Cannot register Infiniband device."); + goto probe5; + } + + /* create AQP1 for port 1 */ + if (ehca_open_aqp1 == 1) { + shca->sport[0].port_state = IB_PORT_DOWN; + ret = ehca_create_aqp1(shca, 1); + if (ret) { + EDEB_ERR(4, "Cannot create AQP1 for port 1."); + goto probe6; + } + } + + /* create AQP1 for port 2 */ + if ((ehca_open_aqp1 == 1) && (shca->num_ports == 2)) { + shca->sport[1].port_state = IB_PORT_DOWN; + ret = ehca_create_aqp1(shca, 2); + if (ret) { + EDEB_ERR(4, "Cannot create AQP1 for port 2."); + goto probe7; + } + } + + ehca_create_device_sysfs(dev); + + spin_lock(&ehca_module.shca_lock); + list_add(&shca->shca_list, &ehca_module.shca_list); + spin_unlock(&ehca_module.shca_lock); + + EDEB_EX(7, "ret=%x", ret); + + return 0; + +probe7: + ret = ehca_destroy_aqp1(&shca->sport[0]); + if (ret) + EDEB_ERR(4, "Cannot destroy AQP1 for port 1. ret=%x", ret); + +probe6: + ib_unregister_device(&shca->ib_device); + +probe5: + ret = ehca_dereg_internal_maxmr(shca); + if (ret) + EDEB_ERR(4, "Cannot destroy internal MR. ret=%x", ret); + +probe4: + ret = ehca_dealloc_pd(&shca->pd->ib_pd); + if (ret != 0) + EDEB_ERR(4, "Cannot destroy internal PD. ret=%x", ret); + +probe3: + ret = ehca_destroy_eq(shca, &shca->neq); + if (ret != 0) + EDEB_ERR(4, "Cannot destroy NEQ. ret=%x", ret); + +probe2: + ret = ehca_destroy_eq(shca, &shca->eq); + if (ret != 0) + EDEB_ERR(4, "Cannot destroy EQ. 
ret=%x", ret);
+
+probe1:
+	ib_dealloc_device(&shca->ib_device);
+
+	EDEB_EX(4, "ret=%x", ret);
+
+	return -EINVAL;
+}
+
+static int __devexit ehca_remove(struct ibmebus_dev *dev)
+{
+	struct ehca_shca *shca = dev->ofdev.dev.driver_data;
+	int ret;
+
+	EDEB_EN(7, "shca=%p", shca);
+
+	ehca_remove_device_sysfs(dev);
+
+	if (ehca_open_aqp1 == 1) {
+		int i;
+
+		for (i = 0; i < shca->num_ports; i++) {
+			ret = ehca_destroy_aqp1(&shca->sport[i]);
+			if (ret != 0)
+				EDEB_ERR(4, "Cannot destroy AQP1 for port %x."
+					 " ret=%x", i, ret);
+		}
+	}
+
+	ib_unregister_device(&shca->ib_device);
+
+	ret = ehca_dereg_internal_maxmr(shca);
+	if (ret)
+		EDEB_ERR(4, "Cannot destroy internal MR. ret=%x", ret);
+
+	ret = ehca_dealloc_pd(&shca->pd->ib_pd);
+	if (ret)
+		EDEB_ERR(4, "Cannot destroy internal PD. ret=%x", ret);
+
+	ret = ehca_destroy_eq(shca, &shca->eq);
+	if (ret)
+		EDEB_ERR(4, "Cannot destroy EQ. ret=%x", ret);
+
+	ret = ehca_destroy_eq(shca, &shca->neq);
+	if (ret)
+		EDEB_ERR(4, "Cannot destroy NEQ.
ret=%x", ret); + + ib_dealloc_device(&shca->ib_device); + + spin_lock(&ehca_module.shca_lock); + list_del(&shca->shca_list); + spin_unlock(&ehca_module.shca_lock); + + EDEB_EX(7, "ret=%x", ret); + + return ret; +} + +static struct of_device_id ehca_device_table[] = +{ + { + .name = "lhca", + .compatible = "IBM,lhca", + }, + {}, +}; + +static struct ibmebus_driver ehca_driver = { + .name = "ehca", + .id_table = ehca_device_table, + .probe = ehca_probe, + .remove = ehca_remove, +}; + +int __init ehca_module_init(void) +{ + int ret = 0; + + printk(KERN_INFO "eHCA Infiniband Device Driver " + "(Rel.: SVNEHCA_0006)\n"); + EDEB_EN(7, ""); + + idr_init(&ehca_qp_idr); + idr_init(&ehca_cq_idr); + spin_lock_init(&ehca_qp_idr_lock); + spin_lock_init(&ehca_cq_idr_lock); + + INIT_LIST_HEAD(&ehca_module.shca_list); + spin_lock_init(&ehca_module.shca_lock); + + ehca_init_trace(); + + ehca_pool = ehca_create_comp_pool(); + if (ehca_pool == NULL) { + EDEB_ERR(4, "Cannot create comp pool."); + ret = -EINVAL; + goto module_init0; + } + + if ((ret = ehca_create_slab_caches(&ehca_module))) { + EDEB_ERR(4, "Cannot create SLAB caches"); + ret = -ENOMEM; + goto module_init1; + } + + if ((ret = ibmebus_register_driver(&ehca_driver))) { + EDEB_ERR(4, "Cannot register eHCA device driver"); + ret = -EINVAL; + goto module_init2; + } + + ehca_create_driver_sysfs(&ehca_driver); + + if (ehca_poll_all_eqs != 1) { + EDEB_ERR(4, "WARNING!!!"); + EDEB_ERR(4, "It is possible to lose interrupts."); + + return 0; + } + + init_timer(&ehca_module.timer); + ehca_module.timer.function = ehca_poll_eqs; + ehca_module.timer.data = (unsigned long)(void*)&ehca_module; + ehca_module.timer.expires = jiffies + HZ; + add_timer(&ehca_module.timer); + + EDEB_EX(7, "ret=%x", ret); + + return 0; + +module_init2: + ehca_destroy_slab_caches(&ehca_module); + +module_init1: + ehca_destroy_comp_pool(ehca_pool); + +module_init0: + EDEB_EX(7, "ret=%x", ret); + + return ret; +}; + +void __exit ehca_module_exit(void) +{ + 
EDEB_EN(7, ""); + + if (ehca_poll_all_eqs == 1) + del_timer_sync(&ehca_module.timer); + + ehca_remove_driver_sysfs(&ehca_driver); + ibmebus_unregister_driver(&ehca_driver); + + if (ehca_destroy_slab_caches(&ehca_module) != 0) + EDEB_ERR(4, "Cannot destroy SLAB caches"); + + ehca_destroy_comp_pool(ehca_pool); + + idr_destroy(&ehca_cq_idr); + idr_destroy(&ehca_qp_idr); + + EDEB_EX(7, ""); +}; + +module_init(ehca_module_init); +module_exit(ehca_module_exit); From schihei at de.ibm.com Mon May 15 10:41:21 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:41:21 +0200 Subject: [openib-general] [PATCH 02/16] ehca: structure definitions Message-ID: <4468BD41.4010803@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_classes.h | 350 ++++++++++++++++++++++ drivers/infiniband/hw/ehca/ehca_classes_pSeries.h | 251 +++++++++++++++ 2 files changed, 601 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_classes.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_classes.h 2006-05-12 12:48:21.000000000 +0200 @@ -0,0 +1,350 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Struct definition for eHCA internal structures + * + * Authors: Heiko J Schick + * Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. 
+ * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef __EHCA_CLASSES_H__ +#define __EHCA_CLASSES_H__ + +#include "ehca_classes.h" +#include "ipz_pt_fn.h" + +struct ehca_module; +struct ehca_qp; +struct ehca_cq; +struct ehca_eq; +struct ehca_mr; +struct ehca_mw; +struct ehca_pd; +struct ehca_av; + +#ifdef CONFIG_PPC64 +#include "ehca_classes_pSeries.h" +#endif + +#include +#include + +#include "ehca_irq.h" + +struct ehca_module { + struct list_head shca_list; + spinlock_t shca_lock; + struct timer_list timer; + kmem_cache_t *cache_pd; + kmem_cache_t *cache_cq; + kmem_cache_t *cache_qp; + kmem_cache_t *cache_av; + kmem_cache_t *cache_mr; + kmem_cache_t *cache_mw; + struct ehca_pfmodule pf; +}; + +struct ehca_eq { + u32 length; + struct ipz_queue ipz_queue; + struct ipz_eq_handle ipz_eq_handle; + struct work_struct work; + struct h_galpas galpas; + int is_initialized; + struct ehca_pfeq pf; + spinlock_t spinlock; + struct tasklet_struct interrupt_task; + u32 ist; +}; + +struct ehca_sport { + struct ib_cq *ibcq_aqp1; + struct ib_qp *ibqp_aqp1; + enum ib_rate rate; + enum ib_port_state port_state; +}; + +struct ehca_shca { + struct ib_device ib_device; + struct ibmebus_dev *ibmebus_dev; + u8 num_ports; + int hw_level; + struct list_head shca_list; + struct ipz_adapter_handle ipz_hca_handle; + struct ehca_sport sport[2]; + struct ehca_eq eq; + struct ehca_eq neq; + struct ehca_mr *maxmr; + struct ehca_pd *pd; + struct ehca_pfshca pf; + struct h_galpas galpas; +}; + +struct ehca_pd { + struct ib_pd ib_pd; + struct ipz_pd fw_pd; + struct ehca_pfpd pf; + u32 ownpid; +}; + +struct ehca_qp { + struct ib_qp ib_qp; + u32 qp_type; + struct ipz_queue ipz_squeue; + struct ipz_queue ipz_rqueue; + struct h_galpas galpas; + u32 qkey; + u32 real_qp_num; + u32 token; + spinlock_t spinlock_s; + spinlock_t spinlock_r; + u32 sq_max_inline_data_size; + struct ipz_qp_handle ipz_qp_handle; + struct ehca_pfqp pf; + struct ib_qp_init_attr init_attr; + u64 uspace_squeue; + u64 uspace_rqueue; + u64 uspace_fwh; + struct ehca_cq 
*send_cq; + struct ehca_cq *recv_cq; + unsigned int sqerr_purgeflag; + struct hlist_node list_entries; +}; + +/* must be power of 2 */ +#define QP_HASHTAB_LEN 8 + +struct ehca_cq { + struct ib_cq ib_cq; + struct ipz_queue ipz_queue; + struct h_galpas galpas; + spinlock_t spinlock; + u32 cq_number; + u32 token; + u32 nr_of_entries; + struct ipz_cq_handle ipz_cq_handle; + struct ehca_pfcq pf; + spinlock_t cb_lock; + u64 uspace_queue; + u64 uspace_fwh; + struct hlist_head qp_hashtab[QP_HASHTAB_LEN]; + struct list_head entry; + u32 nr_callbacks; + spinlock_t task_lock; + u32 ownpid; +}; + +enum ehca_mr_flag { + EHCA_MR_FLAG_FMR = 0x80000000, /* FMR, created with ehca_alloc_fmr */ + EHCA_MR_FLAG_MAXMR = 0x40000000, /* max-MR */ +}; + +struct ehca_mr { + union { + struct ib_mr ib_mr; /* must always be first in ehca_mr */ + struct ib_fmr ib_fmr; /* must always be first in ehca_mr */ + } ib; + spinlock_t mrlock; + + enum ehca_mr_flag flags; + u32 num_pages; /* number of MR pages */ + u32 num_4k; /* number of 4k "page" portions to form MR */ + int acl; /* ACL (stored here for usage in reregister) */ + u64 *start; /* virtual start address (stored here for */ + /* usage in reregister) */ + u64 size; /* size (stored here for usage in reregister) */ + u32 fmr_page_size; /* page size for FMR */ + u32 fmr_max_pages; /* max pages for FMR */ + u32 fmr_max_maps; /* max outstanding maps for FMR */ + u32 fmr_map_cnt; /* map counter for FMR */ + /* fw specific data */ + struct ipz_mrmw_handle ipz_mr_handle; /* MR handle for h-calls */ + struct h_galpas galpas; + /* data for userspace bridge */ + u32 nr_of_pages; + void *pagearray; + + struct ehca_pfmr pf; /* platform specific part of MR */ +}; + +struct ehca_mw { + struct ib_mw ib_mw; /* gen2 mw, must always be first in ehca_mw */ + spinlock_t mwlock; + + u8 never_bound; /* indication MW was never bound */ + struct ipz_mrmw_handle ipz_mw_handle; /* MW handle for h-calls */ + struct h_galpas galpas; + + struct ehca_pfmw pf; /* platform 
specific part of MW */ +}; + +enum ehca_mr_pgi_type { + EHCA_MR_PGI_PHYS = 1, /* type of ehca_reg_phys_mr, + * ehca_rereg_phys_mr, + * ehca_reg_internal_maxmr */ + EHCA_MR_PGI_USER = 2, /* type of ehca_reg_user_mr */ + EHCA_MR_PGI_FMR = 3 /* type of ehca_map_phys_fmr */ +}; + +struct ehca_mr_pginfo { + enum ehca_mr_pgi_type type; + u64 num_pages; + u64 page_cnt; + u64 num_4k; /* number of 4k "page" portions */ + u64 page_4k_cnt; /* counter for 4k "page" portions */ + u64 next_4k; /* next 4k "page" portion in buffer/chunk/listelem */ + + /* type EHCA_MR_PGI_PHYS section */ + int num_phys_buf; + struct ib_phys_buf *phys_buf_array; + u64 next_buf; + + /* type EHCA_MR_PGI_USER section */ + struct ib_umem *region; + struct ib_umem_chunk *next_chunk; + u64 next_nmap; + + /* type EHCA_MR_PGI_FMR section */ + u64 *page_list; + u64 next_listelem; + /* next_4k also used within EHCA_MR_PGI_FMR */ +}; + +/* output parameters for MR/FMR hipz calls */ +struct ehca_mr_hipzout_parms { + struct ipz_mrmw_handle handle; + u32 lkey; + u32 rkey; + u64 len; + u64 vaddr; + u32 acl; +}; + +/* output parameters for MW hipz calls */ +struct ehca_mw_hipzout_parms { + struct ipz_mrmw_handle handle; + u32 rkey; +}; + +struct ehca_av { + struct ib_ah ib_ah; + struct ehca_ud_av av; +}; + +struct ehca_ucontext { + struct ib_ucontext ib_ucontext; +}; + +struct ehca_module *ehca_module_new(void); + +int ehca_module_delete(struct ehca_module *me); + +int ehca_eq_ctor(struct ehca_eq *eq); + +int ehca_eq_dtor(struct ehca_eq *eq); + +struct ehca_shca *ehca_shca_new(void); + +int ehca_shca_delete(struct ehca_shca *me); + +struct ehca_sport *ehca_sport_new(struct ehca_shca *anchor); + +extern spinlock_t ehca_qp_idr_lock; +extern spinlock_t ehca_cq_idr_lock; +extern struct idr ehca_qp_idr; +extern struct idr ehca_cq_idr; + +struct ipzu_queue_resp { + u64 queue; /* points to first queue entry */ + u32 qe_size; /* queue entry size */ + u32 act_nr_of_sg; + u32 queue_length; /* queue length allocated in bytes 
*/ + u32 pagesize; + u32 toggle_state; + u32 dummy; /* padding for 8 byte alignment */ +}; + +struct ehca_create_cq_resp { + u32 cq_number; + u32 token; + struct ipzu_queue_resp ipz_queue; + struct h_galpas galpas; +}; + +struct ehca_create_qp_resp { + u32 qp_num; + u32 token; + u32 qp_type; + u32 qkey; + /* qp_num assigned by ehca: sqp0/1 may have got different numbers */ + u32 real_qp_num; + u32 dummy; /* padding for 8 byte alignment */ + struct ipzu_queue_resp ipz_squeue; + struct ipzu_queue_resp ipz_rqueue; + struct h_galpas galpas; +}; + +struct ehca_alloc_cq_parms { + u32 nr_cqe; + u32 act_nr_of_entries; + u32 act_pages; + struct ipz_eq_handle eq_handle; +}; + +struct ehca_alloc_qp_parms { + int servicetype; + int sigtype; + int daqp_ctrl; + int max_send_sge; + int max_recv_sge; + int ud_av_l_key_ctl; + + u16 act_nr_send_wqes; + u16 act_nr_recv_wqes; + u8 act_nr_recv_sges; + u8 act_nr_send_sges; + + u32 nr_rq_pages; + u32 nr_sq_pages; + + struct ipz_eq_handle ipz_eq_handle; + struct ipz_pd pd; +}; + +int ehca_cq_assign_qp(struct ehca_cq *cq, struct ehca_qp *qp); +int ehca_cq_unassign_qp(struct ehca_cq *cq, unsigned int qp_num); +struct ehca_qp* ehca_cq_get_qp(struct ehca_cq *cq, int qp_num); + +#endif --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_classes_pSeries.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_classes_pSeries.h 2006-04-28 14:20:07.000000000 +0200 @@ -0,0 +1,251 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * pSeries interface definitions + * + * Authors: Waleri Fomin + * Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. 
+ * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef __EHCA_CLASSES_PSERIES_H__ +#define __EHCA_CLASSES_PSERIES_H__ + +#include "hcp_phyp.h" +#include "ipz_pt_fn.h" + + +struct ehca_pfmodule { +}; + +struct ehca_pfshca { +}; + +struct ehca_pfqp { + struct ipz_qpt sqpt; + struct ipz_qpt rqpt; +}; + +struct ehca_pfcq { + struct ipz_qpt qpt; + u32 cqnr; +}; + +struct ehca_pfeq { + struct ipz_qpt qpt; + struct h_galpa galpa; + u32 eqnr; +}; + +struct ehca_pfpd { +}; + +struct ehca_pfmr { +}; + +struct ehca_pfmw { +}; + +struct ipz_adapter_handle { + u64 handle; +}; + +struct ipz_cq_handle { + u64 handle; +}; + +struct ipz_eq_handle { + u64 handle; +}; + +struct ipz_qp_handle { + u64 handle; +}; +struct ipz_mrmw_handle { + u64 handle; +}; + +struct ipz_pd { + u32 value; +}; + +struct hcp_modify_qp_control_block { + u32 qkey; /* 00 */ + u32 rdd; /* reliable datagram domain */ + u32 send_psn; /* 02 */ + u32 receive_psn; /* 03 */ + u32 prim_phys_port; /* 04 */ + u32 alt_phys_port; /* 05 */ + u32 prim_p_key_idx; /* 06 */ + u32 alt_p_key_idx; /* 07 */ + u32 rdma_atomic_ctrl; /* 08 */ + u32 qp_state; /* 09 */ + u32 reserved_10; /* 10 */ + u32 rdma_nr_atomic_resp_res; /* 11 */ + u32 path_migration_state; /* 12 */ + u32 rdma_atomic_outst_dest_qp; /* 13 */ + u32 dest_qp_nr; /* 14 */ + u32 min_rnr_nak_timer_field; /* 15 */ + u32 service_level; /* 16 */ + u32 send_grh_flag; /* 17 */ + u32 retry_count; /* 18 */ + u32 timeout; /* 19 */ + u32 path_mtu; /* 20 */ + u32 max_static_rate; /* 21 */ + u32 dlid; /* 22 */ + u32 rnr_retry_count; /* 23 */ + u32 source_path_bits; /* 24 */ + u32 traffic_class; /* 25 */ + u32 hop_limit; /* 26 */ + u32 source_gid_idx; /* 27 */ + u32 flow_label; /* 28 */ + u32 reserved_29; /* 29 */ + union { /* 30 */ + u64 dw[2]; + u8 byte[16]; + } dest_gid; + u32 service_level_al; /* 34 */ + u32 send_grh_flag_al; /* 35 */ + u32 retry_count_al; /* 36 */ + u32 timeout_al; /* 37 */ + u32 max_static_rate_al; /* 38 */ + u32 dlid_al; /* 39 */ + u32 rnr_retry_count_al; /* 40 */ + u32 source_path_bits_al; /* 
41 */ + u32 traffic_class_al; /* 42 */ + u32 hop_limit_al; /* 43 */ + u32 source_gid_idx_al; /* 44 */ + u32 flow_label_al; /* 45 */ + u32 reserved_46; /* 46 */ + u32 reserved_47; /* 47 */ + union { /* 48 */ + u64 dw[2]; + u8 byte[16]; + } dest_gid_al; + u32 max_nr_outst_send_wr; /* 52 */ + u32 max_nr_outst_recv_wr; /* 53 */ + u32 disable_ete_credit_check; /* 54 */ + u32 qp_number; /* 55 */ + u64 send_queue_handle; /* 56 */ + u64 recv_queue_handle; /* 58 */ + u32 actual_nr_sges_in_sq_wqe; /* 60 */ + u32 actual_nr_sges_in_rq_wqe; /* 61 */ + u32 qp_enable; /* 62 */ + u32 curr_srq_limit; /* 63 */ + u64 qp_aff_asyn_ev_log_reg; /* 64 */ + u64 shared_rq_hndl; /* 66 */ + u64 trigg_doorbell_qp_hndl; /* 68 */ + u32 reserved_70_127[58]; /* 70 */ +}; + +#define MQPCB_MASK_QKEY EHCA_BMASK_IBM(0,0) +#define MQPCB_MASK_SEND_PSN EHCA_BMASK_IBM(2,2) +#define MQPCB_MASK_RECEIVE_PSN EHCA_BMASK_IBM(3,3) +#define MQPCB_MASK_PRIM_PHYS_PORT EHCA_BMASK_IBM(4,4) +#define MQPCB_PRIM_PHYS_PORT EHCA_BMASK_IBM(24,31) +#define MQPCB_MASK_ALT_PHYS_PORT EHCA_BMASK_IBM(5,5) +#define MQPCB_MASK_PRIM_P_KEY_IDX EHCA_BMASK_IBM(6,6) +#define MQPCB_PRIM_P_KEY_IDX EHCA_BMASK_IBM(24,31) +#define MQPCB_MASK_ALT_P_KEY_IDX EHCA_BMASK_IBM(7,7) +#define MQPCB_MASK_RDMA_ATOMIC_CTRL EHCA_BMASK_IBM(8,8) +#define MQPCB_MASK_QP_STATE EHCA_BMASK_IBM(9,9) +#define MQPCB_QP_STATE EHCA_BMASK_IBM(24,31) +#define MQPCB_MASK_RDMA_NR_ATOMIC_RESP_RES EHCA_BMASK_IBM(11,11) +#define MQPCB_MASK_PATH_MIGRATION_STATE EHCA_BMASK_IBM(12,12) +#define MQPCB_MASK_RDMA_ATOMIC_OUTST_DEST_QP EHCA_BMASK_IBM(13,13) +#define MQPCB_MASK_DEST_QP_NR EHCA_BMASK_IBM(14,14) +#define MQPCB_MASK_MIN_RNR_NAK_TIMER_FIELD EHCA_BMASK_IBM(15,15) +#define MQPCB_MASK_SERVICE_LEVEL EHCA_BMASK_IBM(16,16) +#define MQPCB_MASK_SEND_GRH_FLAG EHCA_BMASK_IBM(17,17) +#define MQPCB_MASK_RETRY_COUNT EHCA_BMASK_IBM(18,18) +#define MQPCB_MASK_TIMEOUT EHCA_BMASK_IBM(19,19) +#define MQPCB_MASK_PATH_MTU EHCA_BMASK_IBM(20,20) +#define MQPCB_PATH_MTU EHCA_BMASK_IBM(24,31) 
+#define MQPCB_MASK_MAX_STATIC_RATE EHCA_BMASK_IBM(21,21) +#define MQPCB_MAX_STATIC_RATE EHCA_BMASK_IBM(24,31) +#define MQPCB_MASK_DLID EHCA_BMASK_IBM(22,22) +#define MQPCB_DLID EHCA_BMASK_IBM(16,31) +#define MQPCB_MASK_RNR_RETRY_COUNT EHCA_BMASK_IBM(23,23) +#define MQPCB_RNR_RETRY_COUNT EHCA_BMASK_IBM(29,31) +#define MQPCB_MASK_SOURCE_PATH_BITS EHCA_BMASK_IBM(24,24) +#define MQPCB_SOURCE_PATH_BITS EHCA_BMASK_IBM(25,31) +#define MQPCB_MASK_TRAFFIC_CLASS EHCA_BMASK_IBM(25,25) +#define MQPCB_TRAFFIC_CLASS EHCA_BMASK_IBM(24,31) +#define MQPCB_MASK_HOP_LIMIT EHCA_BMASK_IBM(26,26) +#define MQPCB_HOP_LIMIT EHCA_BMASK_IBM(24,31) +#define MQPCB_MASK_SOURCE_GID_IDX EHCA_BMASK_IBM(27,27) +#define MQPCB_SOURCE_GID_IDX EHCA_BMASK_IBM(24,31) +#define MQPCB_MASK_FLOW_LABEL EHCA_BMASK_IBM(28,28) +#define MQPCB_FLOW_LABEL EHCA_BMASK_IBM(12,31) +#define MQPCB_MASK_DEST_GID EHCA_BMASK_IBM(30,30) +#define MQPCB_MASK_SERVICE_LEVEL_AL EHCA_BMASK_IBM(31,31) +#define MQPCB_SERVICE_LEVEL_AL EHCA_BMASK_IBM(28,31) +#define MQPCB_MASK_SEND_GRH_FLAG_AL EHCA_BMASK_IBM(32,32) +#define MQPCB_SEND_GRH_FLAG_AL EHCA_BMASK_IBM(31,31) +#define MQPCB_MASK_RETRY_COUNT_AL EHCA_BMASK_IBM(33,33) +#define MQPCB_RETRY_COUNT_AL EHCA_BMASK_IBM(29,31) +#define MQPCB_MASK_TIMEOUT_AL EHCA_BMASK_IBM(34,34) +#define MQPCB_TIMEOUT_AL EHCA_BMASK_IBM(27,31) +#define MQPCB_MASK_MAX_STATIC_RATE_AL EHCA_BMASK_IBM(35,35) +#define MQPCB_MAX_STATIC_RATE_AL EHCA_BMASK_IBM(24,31) +#define MQPCB_MASK_DLID_AL EHCA_BMASK_IBM(36,36) +#define MQPCB_DLID_AL EHCA_BMASK_IBM(16,31) +#define MQPCB_MASK_RNR_RETRY_COUNT_AL EHCA_BMASK_IBM(37,37) +#define MQPCB_RNR_RETRY_COUNT_AL EHCA_BMASK_IBM(29,31) +#define MQPCB_MASK_SOURCE_PATH_BITS_AL EHCA_BMASK_IBM(38,38) +#define MQPCB_SOURCE_PATH_BITS_AL EHCA_BMASK_IBM(25,31) +#define MQPCB_MASK_TRAFFIC_CLASS_AL EHCA_BMASK_IBM(39,39) +#define MQPCB_TRAFFIC_CLASS_AL EHCA_BMASK_IBM(24,31) +#define MQPCB_MASK_HOP_LIMIT_AL EHCA_BMASK_IBM(40,40) +#define MQPCB_HOP_LIMIT_AL EHCA_BMASK_IBM(24,31) 
+#define MQPCB_MASK_SOURCE_GID_IDX_AL EHCA_BMASK_IBM(41,41)
+#define MQPCB_SOURCE_GID_IDX_AL EHCA_BMASK_IBM(24,31)
+#define MQPCB_MASK_FLOW_LABEL_AL EHCA_BMASK_IBM(42,42)
+#define MQPCB_FLOW_LABEL_AL EHCA_BMASK_IBM(12,31)
+#define MQPCB_MASK_DEST_GID_AL EHCA_BMASK_IBM(44,44)
+#define MQPCB_MASK_MAX_NR_OUTST_SEND_WR EHCA_BMASK_IBM(45,45)
+#define MQPCB_MAX_NR_OUTST_SEND_WR EHCA_BMASK_IBM(16,31)
+#define MQPCB_MASK_MAX_NR_OUTST_RECV_WR EHCA_BMASK_IBM(46,46)
+#define MQPCB_MAX_NR_OUTST_RECV_WR EHCA_BMASK_IBM(16,31)
+#define MQPCB_MASK_DISABLE_ETE_CREDIT_CHECK EHCA_BMASK_IBM(47,47)
+#define MQPCB_DISABLE_ETE_CREDIT_CHECK EHCA_BMASK_IBM(31,31)
+#define MQPCB_QP_NUMBER EHCA_BMASK_IBM(8,31)
+#define MQPCB_MASK_QP_ENABLE EHCA_BMASK_IBM(48,48)
+#define MQPCB_QP_ENABLE EHCA_BMASK_IBM(31,31)
+#define MQPCB_MASK_CURR_SQR_LIMIT EHCA_BMASK_IBM(49,49)
+#define MQPCB_CURR_SQR_LIMIT EHCA_BMASK_IBM(15,31)
+#define MQPCB_MASK_QP_AFF_ASYN_EV_LOG_REG EHCA_BMASK_IBM(50,50)
+#define MQPCB_MASK_SHARED_RQ_HNDL EHCA_BMASK_IBM(51,51)
+
+#endif /* __EHCA_CLASSES_PSERIES_H__ */

From schihei at de.ibm.com Mon May 15 10:41:01 2006
From: schihei at de.ibm.com (Heiko J Schick)
Date: Mon, 15 May 2006 19:41:01 +0200
Subject: [openib-general] [PATCH 00/16] ehca: IBM eHCA InfiniBand Device Driver
Message-ID: <4468BD2D.5000103@de.ibm.com>

Hello,

many thanks for your comments. They are very helpful for us.

All 17 patches have to be applied, otherwise the driver won't compile.
We would appreciate any comments and feedback.

Signed-off-by: Heiko J Schick
Changelog-by: Heiko J Schick

Changelog:

Differences to PatchSet http://openib.org/pipermail/openib-general/2006-April/020584.html
Differences to PatchSet http://openib.org/pipermail/openib-general/2006-March/018144.html
Differences to PatchSet http://openib.org/pipermail/openib-general/2006-March/017412.html

- Linux kernel coding style
- Reduce number of parameters passed to firmware interface wrappers
- Remove ehca_kernel.h
- Remove implementation of plpar_hcall_7arg_7ret() and
  plpar_hcall_9arg_9ret(), which are now included in kernel code
- Remove simulation stub

 drivers/infiniband/hw/ehca/Kconfig                |    6
 drivers/infiniband/hw/ehca/Makefile               |   16
 drivers/infiniband/hw/ehca/ehca_av.c              |  306 ++
 drivers/infiniband/hw/ehca/ehca_classes.h         |  350 +++
 drivers/infiniband/hw/ehca/ehca_classes_pSeries.h |  251 ++
 drivers/infiniband/hw/ehca/ehca_cq.c              |  431 +++
 drivers/infiniband/hw/ehca/ehca_eq.c              |  222 +
 drivers/infiniband/hw/ehca/ehca_hca.c             |  282 ++
 drivers/infiniband/hw/ehca/ehca_irq.c             |  710 ++++++
 drivers/infiniband/hw/ehca/ehca_irq.h             |   77
 drivers/infiniband/hw/ehca/ehca_iverbs.h          |  181 +
 drivers/infiniband/hw/ehca/ehca_main.c            |  966 ++++++++
 drivers/infiniband/hw/ehca/ehca_mcast.c           |  194 +
 drivers/infiniband/hw/ehca/ehca_mrmw.c            | 2474 ++++++++++++++++++++++
 drivers/infiniband/hw/ehca/ehca_mrmw.h            |  143 +
 drivers/infiniband/hw/ehca/ehca_pd.c              |  118 +
 drivers/infiniband/hw/ehca/ehca_qes.h             |  274 ++
 drivers/infiniband/hw/ehca/ehca_qp.c              | 1565 +++++++++++++
 drivers/infiniband/hw/ehca/ehca_reqs.c            |  683 ++++++
 drivers/infiniband/hw/ehca/ehca_sqp.c             |  123 +
 drivers/infiniband/hw/ehca/ehca_tools.h           |  411 +++
 drivers/infiniband/hw/ehca/ehca_uverbs.c          |  391 +++
 drivers/infiniband/hw/ehca/hcp_if.c               | 1476 +++++++++++++
 drivers/infiniband/hw/ehca/hcp_if.h               |  330 ++
 drivers/infiniband/hw/ehca/hcp_phyp.c             |   92
 drivers/infiniband/hw/ehca/hcp_phyp.h             |   95
 drivers/infiniband/hw/ehca/hipz_fns.h             |   68
 drivers/infiniband/hw/ehca/hipz_fns_core.h        |  122 +
 drivers/infiniband/hw/ehca/hipz_hw.h              |  395 +++
 drivers/infiniband/hw/ehca/ipz_pt_fn.c            |  177 +
 drivers/infiniband/hw/ehca/ipz_pt_fn.h            |  254 ++
 31 files changed, 13183 insertions(+)

From schihei at de.ibm.com Mon May 15 10:41:30 2006
From: schihei at de.ibm.com (Heiko J Schick)
Date: Mon, 15 May 2006 19:41:30 +0200
Subject: [openib-general] [PATCH 03/16] ehca: userspace support
Message-ID: <4468BD4A.6080108@de.ibm.com>

Signed-off-by: Heiko J Schick

 drivers/infiniband/hw/ehca/ehca_uverbs.c |  391 +++++++++++++++++++++++++++++++
 1 file changed, 391 insertions(+)

--- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_uverbs.c 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_uverbs.c 2006-05-12 12:31:52.000000000 +0200
@@ -0,0 +1,391 @@
+/*
+ * IBM eServer eHCA Infiniband device driver for Linux on POWER
+ *
+ * userspace support verbs
+ *
+ * Authors: Christoph Raisch
+ *          Hoang-Nam Nguyen
+ *          Heiko J Schick
+ *
+ * Copyright (c) 2005 IBM Corporation
+ *
+ * All rights reserved.
+ *
+ * This source code is distributed under a dual license of GPL v2.0 and OpenIB
+ * BSD.
+ *
+ * OpenIB BSD License
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * Redistributions of source code must retain the above copyright notice, this
+ * list of conditions and the following disclaimer.
+ *
+ * Redistributions in binary form must reproduce the above copyright notice,
+ * this list of conditions and the following disclaimer in the documentation
+ * and/or other materials
+ * provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+ * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#undef DEB_PREFIX
+#define DEB_PREFIX "uver"
+
+#include
+
+#include "ehca_classes.h"
+#include "ehca_iverbs.h"
+#include "ehca_mrmw.h"
+#include "ehca_tools.h"
+#include "hcp_if.h"
+
+struct ib_ucontext *ehca_alloc_ucontext(struct ib_device *device,
+					struct ib_udata *udata)
+{
+	struct ehca_ucontext *my_context = NULL;
+
+	EHCA_CHECK_ADR_P(device);
+	EDEB_EN(7, "device=%p name=%s", device, device->name);
+
+	my_context = kzalloc(sizeof *my_context, GFP_KERNEL);
+	if (!my_context) {
+		EDEB_ERR(4, "Out of memory device=%p", device);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	EDEB_EX(7, "device=%p ucontext=%p", device, my_context);
+
+	return &my_context->ib_ucontext;
+}
+
+int ehca_dealloc_ucontext(struct ib_ucontext *context)
+{
+	struct ehca_ucontext *my_context = NULL;
+	EHCA_CHECK_ADR(context);
+	EDEB_EN(7, "ucontext=%p", context);
+	my_context = container_of(context, struct ehca_ucontext, ib_ucontext);
+	kfree(my_context);
+	EDEB_EX(7, "ucontext=%p", context);
+	return 0;
+}
+
+struct page *ehca_nopage(struct vm_area_struct *vma,
+			 unsigned long address, int *type)
+{
+	struct page *mypage = NULL;
+	u64 fileoffset = vma->vm_pgoff << PAGE_SHIFT;
+	u32 idr_handle = fileoffset >> 32;
+	u32 q_type = (fileoffset >> 28) & 0xF; /* CQ, QP,...
*/ + u32 rsrc_type = (fileoffset >> 24) & 0xF; /* sq,rq,cmnd_window */ + u32 cur_pid = current->tgid; + unsigned long flags; + + EDEB_EN(7, "vm_start=%lx vm_end=%lx vm_page_prot=%lx vm_fileoff=%lx " + "address=%lx", + vma->vm_start, vma->vm_end, vma->vm_page_prot, fileoffset, + address); + + if (q_type == 1) { /* CQ */ + struct ehca_cq *cq = NULL; + u64 offset; + void *vaddr = NULL; + + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq = idr_find(&ehca_cq_idr, idr_handle); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + + if (cq->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, cq->ownpid); + return NOPAGE_SIGBUS; + } + + /* make sure this mmap really belongs to the authorized user */ + if (!cq) { + EDEB_ERR(4, "cq is NULL ret=NOPAGE_SIGBUS"); + return NOPAGE_SIGBUS; + } + if (rsrc_type == 2) { + EDEB(6, "cq=%p cq queuearea", cq); + offset = address - vma->vm_start; + vaddr = ipz_qeit_calc(&cq->ipz_queue, offset); + EDEB(6, "offset=%lx vaddr=%p", offset, vaddr); + mypage = virt_to_page(vaddr); + } + } else if (q_type == 2) { /* QP */ + struct ehca_qp *qp = NULL; + struct ehca_pd *pd = NULL; + u64 offset; + void *vaddr = NULL; + + spin_lock_irqsave(&ehca_qp_idr_lock, flags); + qp = idr_find(&ehca_qp_idr, idr_handle); + spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); + + + pd = container_of(qp->ib_qp.pd, struct ehca_pd, ib_pd); + if (pd->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, pd->ownpid); + return NOPAGE_SIGBUS; + } + + /* make sure this mmap really belongs to the authorized user */ + if (!qp) { + EDEB_ERR(4, "qp is NULL ret=NOPAGE_SIGBUS"); + return NOPAGE_SIGBUS; + } + if (rsrc_type == 2) { /* rqueue */ + EDEB(6, "qp=%p qp rqueuearea", qp); + offset = address - vma->vm_start; + vaddr = ipz_qeit_calc(&qp->ipz_rqueue, offset); + EDEB(6, "offset=%lx vaddr=%p", offset, vaddr); + mypage = virt_to_page(vaddr); + } else if (rsrc_type == 3) { /* squeue */ + EDEB(6, "qp=%p qp squeuearea", qp); 
+ offset = address - vma->vm_start; + vaddr = ipz_qeit_calc(&qp->ipz_squeue, offset); + EDEB(6, "offset=%lx vaddr=%p", offset, vaddr); + mypage = virt_to_page(vaddr); + } + } + + if (!mypage) { + EDEB_ERR(4, "Invalid page adr==NULL ret=NOPAGE_SIGBUS"); + return NOPAGE_SIGBUS; + } + get_page(mypage); + EDEB_EX(7, "page adr=%p", mypage); + return mypage; +} + +static struct vm_operations_struct ehcau_vm_ops = { + .nopage = ehca_nopage, +}; + +int ehca_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) +{ + u64 fileoffset = vma->vm_pgoff << PAGE_SHIFT; + u32 idr_handle = fileoffset >> 32; + u32 q_type = (fileoffset >> 28) & 0xF; /* CQ, QP,... */ + u32 rsrc_type = (fileoffset >> 24) & 0xF; /* sq,rq,cmnd_window */ + u32 ret = -EFAULT; /* assume the worst */ + u64 vsize = 0; /* must be calculated/set below */ + u64 physical = 0; /* must be calculated/set below */ + u32 cur_pid = current->tgid; + unsigned long flags; + + EDEB_EN(7, "vm_start=%lx vm_end=%lx vm_page_prot=%lx vm_fileoff=%lx", + vma->vm_start, vma->vm_end, vma->vm_page_prot, fileoffset); + + if (q_type == 1) { /* CQ */ + struct ehca_cq *cq; + + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq = idr_find(&ehca_cq_idr, idr_handle); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + + if (cq->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, cq->ownpid); + return -ENOMEM; + } + + /* make sure this mmap really belongs to the authorized user */ + if (!cq) + return -EINVAL; + if (!cq->ib_cq.uobject) + return -EINVAL; + if (cq->ib_cq.uobject->context != context) + return -EINVAL; + if (rsrc_type == 1) { /* galpa fw handle */ + EDEB(6, "cq=%p cq triggerarea", cq); + vma->vm_flags |= VM_RESERVED; + vsize = vma->vm_end - vma->vm_start; + if (vsize != EHCA_PAGESIZE) { + EDEB_ERR(4, "invalid vsize=%lx", + vma->vm_end - vma->vm_start); + ret = -EINVAL; + goto mmap_exit0; + } + + physical = cq->galpas.user.fw_handle; + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + 
vma->vm_flags |= VM_IO | VM_RESERVED; + + EDEB(6, "vsize=%lx physical=%lx", vsize, physical); + ret = remap_pfn_range(vma, vma->vm_start, + physical >> PAGE_SHIFT, vsize, + vma->vm_page_prot); + if (ret) { + EDEB_ERR(4, "remap_pfn_range() failed ret=%x", + ret); + ret = -ENOMEM; + } + goto mmap_exit0; + } else if (rsrc_type == 2) { /* cq queue_addr */ + EDEB(6, "cq=%p cq q_addr", cq); + /* vma->vm_page_prot = + * pgprot_noncached(vma->vm_page_prot); */ + vma->vm_flags |= VM_RESERVED; + vma->vm_ops = &ehcau_vm_ops; + ret = 0; + goto mmap_exit0; + } else { + EDEB_ERR(6, "bad resource type %x", rsrc_type); + ret = -EINVAL; + goto mmap_exit0; + } + } else if (q_type == 2) { /* QP */ + struct ehca_qp *qp = NULL; + struct ehca_pd *pd = NULL; + + spin_lock_irqsave(&ehca_qp_idr_lock, flags); + qp = idr_find(&ehca_qp_idr, idr_handle); + spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); + + pd = container_of(qp->ib_qp.pd, struct ehca_pd, ib_pd); + if (pd->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, pd->ownpid); + return -ENOMEM; + } + + /* make sure this mmap really belongs to the authorized user */ + if (!qp || !qp->ib_qp.uobject || + qp->ib_qp.uobject->context != context) { + EDEB(6, "qp=%p, uobject=%p, context=%p", + qp, qp->ib_qp.uobject, qp->ib_qp.uobject->context); + ret = -EINVAL; + goto mmap_exit0; + } + if (rsrc_type == 1) { /* galpa fw handle */ + EDEB(6, "qp=%p qp triggerarea", qp); + vma->vm_flags |= VM_RESERVED; + vsize = vma->vm_end - vma->vm_start; + if (vsize != EHCA_PAGESIZE) { + EDEB_ERR(4, "invalid vsize=%lx", + vma->vm_end - vma->vm_start); + ret = -EINVAL; + goto mmap_exit0; + } + + physical = qp->galpas.user.fw_handle; + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_flags |= VM_IO | VM_RESERVED; + + EDEB(6, "vsize=%lx physical=%lx", vsize, physical); + ret = remap_pfn_range(vma, vma->vm_start, + physical >> PAGE_SHIFT, vsize, + vma->vm_page_prot); + if (ret) { + EDEB_ERR(4, "remap_pfn_range() 
failed ret=%x", + ret); + ret = -ENOMEM; + } + goto mmap_exit0; + } else if (rsrc_type == 2) { /* qp rqueue_addr */ + EDEB(6, "qp=%p qp rqueue_addr", qp); + vma->vm_flags |= VM_RESERVED; + vma->vm_ops = &ehcau_vm_ops; + ret = 0; + goto mmap_exit0; + } else if (rsrc_type == 3) { /* qp squeue_addr */ + EDEB(6, "qp=%p qp squeue_addr", qp); + vma->vm_flags |= VM_RESERVED; + vma->vm_ops = &ehcau_vm_ops; + ret = 0; + goto mmap_exit0; + } else { + EDEB_ERR(4, "bad resource type %x", rsrc_type); + ret = -EINVAL; + goto mmap_exit0; + } + } else { + EDEB_ERR(4, "bad queue type %x", q_type); + ret = -EINVAL; + goto mmap_exit0; + } + +mmap_exit0: + EDEB_EX(7, "ret=%x", ret); + return ret; +} + +int ehca_mmap_nopage(u64 foffset, u64 length, void ** mapped, + struct vm_area_struct ** vma) +{ + EDEB_EN(7, "foffset=%lx length=%lx", foffset, length); + down_write(&current->mm->mmap_sem); + *mapped = (void*) + do_mmap(NULL,0, + length, + PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, + foffset); + up_write(&current->mm->mmap_sem); + if (*mapped) { + *vma = find_vma(current->mm,(u64)*mapped); + if (*vma) { + (*vma)->vm_flags |= VM_RESERVED; + (*vma)->vm_ops = &ehcau_vm_ops; + } else + EDEB_ERR(4, "couldn't find queue vma queue=%p", *mapped); + } else + EDEB_ERR(4, "couldn't create mmap length=%lx", length); + EDEB_EX(7, "mapped=%p", *mapped); + return 0; +} + +int ehca_mmap_register(u64 physical, void ** mapped, + struct vm_area_struct ** vma) +{ + int ret = 0; + unsigned long vsize; + /* ehca hw supports only 4k page */ + ehca_mmap_nopage(0, EHCA_PAGESIZE, mapped, vma); + (*vma)->vm_flags |= VM_RESERVED; + vsize = (*vma)->vm_end - (*vma)->vm_start; + if (vsize != EHCA_PAGESIZE) { + EDEB_ERR(4, "invalid vsize=%lx", + (*vma)->vm_end - (*vma)->vm_start); + ret = -EINVAL; + return ret; + } + + (*vma)->vm_page_prot = pgprot_noncached((*vma)->vm_page_prot); + (*vma)->vm_flags |= VM_IO | VM_RESERVED; + + EDEB(6, "vsize=%lx physical=%lx", vsize, physical); + ret = remap_pfn_range((*vma), (*vma)->vm_start, + 
physical >> PAGE_SHIFT, vsize, + (*vma)->vm_page_prot); + if (ret) { + EDEB_ERR(4, "remap_pfn_range() failed ret=%x", ret); + ret = -ENOMEM; + } + return ret; + +} + +int ehca_munmap(unsigned long addr, size_t len) { + int ret = 0; + struct mm_struct *mm = current->mm; + if (mm) { + down_write(&mm->mmap_sem); + ret = do_munmap(mm, addr, len); + up_write(&mm->mmap_sem); + } + return ret; +} From schihei at de.ibm.com Mon May 15 10:41:38 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:41:38 +0200 Subject: [openib-general] [PATCH 04/16] ehca: InfiniBand query and multicast functionality Message-ID: <4468BD52.8080506@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_hca.c | 282 ++++++++++++++++++++++++++++++++ drivers/infiniband/hw/ehca/ehca_mcast.c | 194 ++++++++++++++++++++++ 2 files changed, 476 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_hca.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_hca.c 2006-05-02 10:55:26.000000000 +0200 @@ -0,0 +1,282 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * HCA query functions + * + * Authors: Heiko J Schick + * Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#undef DEB_PREFIX +#define DEB_PREFIX "shca" + +#include "ehca_tools.h" + +#include "hcp_if.h" + +int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props) +{ + int ret = 0; + struct ehca_shca *shca; + struct hipz_query_hca *rblock; + + EDEB_EN(7, ""); + + memset(props, 0, sizeof(struct ib_device_attr)); + shca = container_of(ibdev, struct ehca_shca, ib_device); + + rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (!rblock) { + EDEB_ERR(4, "Can't allocate rblock memory."); + ret = -ENOMEM; + goto query_device0; + } + + if (hipz_h_query_hca(shca->ipz_hca_handle, rblock) != H_SUCCESS) { + EDEB_ERR(4, "Can't query device properties"); + ret = -EINVAL; + goto query_device1; + } + props->fw_ver = rblock->hw_ver; + props->max_mr_size = rblock->max_mr_size; + props->vendor_id = rblock->vendor_id >> 8; + props->vendor_part_id = rblock->vendor_part_id >> 16; + props->hw_ver = rblock->hw_ver; + props->max_qp = min_t(int, rblock->max_qp, INT_MAX); + props->max_qp_wr = min_t(int, rblock->max_wqes_wq, INT_MAX); + props->max_sge = min_t(int, rblock->max_sge, INT_MAX); + props->max_sge_rd = min_t(int, rblock->max_sge_rd, INT_MAX); + props->max_cq = min_t(int, 
rblock->max_cq, INT_MAX); + props->max_cqe = min_t(int, rblock->max_cqe, INT_MAX); + props->max_mr = min_t(int, rblock->max_mr, INT_MAX); + props->max_mw = min_t(int, rblock->max_mw, INT_MAX); + props->max_pd = min_t(int, rblock->max_pd, INT_MAX); + props->max_ah = min_t(int, rblock->max_ah, INT_MAX); + props->max_fmr = min_t(int, rblock->max_mr, INT_MAX); + props->max_srq = 0; + props->max_srq_wr = 0; + props->max_srq_sge = 0; + props->max_pkeys = 16; + props->local_ca_ack_delay + = rblock->local_ca_ack_delay; + props->max_raw_ipv6_qp + = min_t(int, rblock->max_raw_ipv6_qp, INT_MAX); + props->max_raw_ethy_qp + = min_t(int, rblock->max_raw_ethy_qp, INT_MAX); + props->max_mcast_grp + = min_t(int, rblock->max_mcast_grp, INT_MAX); + props->max_mcast_qp_attach + = min_t(int, rblock->max_mcast_qp_attach, INT_MAX); + props->max_total_mcast_qp_attach + = min_t(int, rblock->max_total_mcast_qp_attach, INT_MAX); + +query_device1: + kfree(rblock); + +query_device0: + EDEB_EX(7, "ret=%x", ret); + + return ret; +} + +int ehca_query_port(struct ib_device *ibdev, + u8 port, struct ib_port_attr *props) +{ + int ret = 0; + struct ehca_shca *shca; + struct hipz_query_port *rblock; + + EDEB_EN(7, "port=%x", port); + + memset(props, 0, sizeof(struct ib_port_attr)); + shca = container_of(ibdev, struct ehca_shca, ib_device); + + rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (!rblock) { + EDEB_ERR(4, "Can't allocate rblock memory."); + ret = -ENOMEM; + goto query_port0; + } + + if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) { + EDEB_ERR(4, "Can't query port properties"); + ret = -EINVAL; + goto query_port1; + } + + props->state = rblock->state; + + switch (rblock->max_mtu) { + case 0x1: + props->active_mtu = props->max_mtu = IB_MTU_256; + break; + case 0x2: + props->active_mtu = props->max_mtu = IB_MTU_512; + break; + case 0x3: + props->active_mtu = props->max_mtu = IB_MTU_1024; + break; + case 0x4: + props->active_mtu = props->max_mtu = IB_MTU_2048; + 
break; + case 0x5: + props->active_mtu = props->max_mtu = IB_MTU_4096; + break; + default: + EDEB_ERR(4, "Unknown MTU size: %x.", rblock->max_mtu); + } + + props->gid_tbl_len = rblock->gid_tbl_len; + props->max_msg_sz = rblock->max_msg_sz; + props->bad_pkey_cntr = rblock->bad_pkey_cntr; + props->qkey_viol_cntr = rblock->qkey_viol_cntr; + props->pkey_tbl_len = rblock->pkey_tbl_len; + props->lid = rblock->lid; + props->sm_lid = rblock->sm_lid; + props->lmc = rblock->lmc; + props->sm_sl = rblock->sm_sl; + props->subnet_timeout = rblock->subnet_timeout; + props->init_type_reply = rblock->init_type_reply; + + props->active_width = IB_WIDTH_12X; + props->active_speed = 0x1; + +query_port1: + kfree(rblock); + +query_port0: + EDEB_EX(7, "ret=%x", ret); + + return ret; +} + +int ehca_query_pkey(struct ib_device *ibdev, u8 port, u16 index, u16 *pkey) +{ + int ret = 0; + struct ehca_shca *shca; + struct hipz_query_port *rblock; + + EDEB_EN(7, "port=%x index=%x", port, index); + + if (index > 16) { + EDEB_ERR(4, "Invalid index: %x.", index); + ret = -EINVAL; + goto query_pkey0; + } + + shca = container_of(ibdev, struct ehca_shca, ib_device); + + rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (!rblock) { + EDEB_ERR(4, "Can't allocate rblock memory."); + ret = -ENOMEM; + goto query_pkey0; + } + + if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) { + EDEB_ERR(4, "Can't query port properties"); + ret = -EINVAL; + goto query_pkey1; + } + + memcpy(pkey, &rblock->pkey_entries + index, sizeof(u16)); + +query_pkey1: + kfree(rblock); + +query_pkey0: + EDEB_EX(7, "ret=%x", ret); + + return ret; +} + +int ehca_query_gid(struct ib_device *ibdev, u8 port, + int index, union ib_gid *gid) +{ + int ret = 0; + struct ehca_shca *shca; + struct hipz_query_port *rblock; + + EDEB_EN(7, "port=%x index=%x", port, index); + + if (index > 255) { + EDEB_ERR(4, "Invalid index: %x.", index); + ret = -EINVAL; + goto query_gid0; + } + + shca = container_of(ibdev, struct ehca_shca, 
ib_device); + + rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (!rblock) { + EDEB_ERR(4, "Can't allocate rblock memory."); + ret = -ENOMEM; + goto query_gid0; + } + + if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) { + EDEB_ERR(4, "Can't query port properties"); + ret = -EINVAL; + goto query_gid1; + } + + memcpy(&gid->raw[0], &rblock->gid_prefix, sizeof(u64)); + memcpy(&gid->raw[8], &rblock->guid_entries[index], sizeof(u64)); + +query_gid1: + kfree(rblock); + +query_gid0: + EDEB_EX(7, "ret=%x GID=%lx%lx", ret, + *(u64 *) & gid->raw[0], + *(u64 *) & gid->raw[8]); + + return ret; +} + +int ehca_modify_port(struct ib_device *ibdev, + u8 port, int port_modify_mask, + struct ib_port_modify *props) +{ + int ret = 0; + + EDEB_EN(7, "port=%x", port); + + /* Not implemented yet. */ + + EDEB_EX(7, "ret=%x", ret); + + return ret; +} --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_mcast.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_mcast.c 2006-05-15 09:54:24.000000000 +0200 @@ -0,0 +1,194 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * mcast functions + * + * Authors: Khadija Souissi + * Waleri Fomin + * Reinhard Ernst + * Hoang-Nam Nguyen + * Heiko J Schick + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#define DEB_PREFIX "mcas" + +#include +#include +#include "ehca_classes.h" +#include "ehca_tools.h" +#include "ehca_qes.h" +#include "ehca_iverbs.h" + +#include "hcp_if.h" + +#define MAX_MC_LID 0xFFFE +#define MIN_MC_LID 0xC000 /* Multicast limits */ +#define EHCA_VALID_MULTICAST_GID(gid) ((gid)[0] == 0xFF) +#define EHCA_VALID_MULTICAST_LID(lid) (((lid) >= MIN_MC_LID) && ((lid) <= MAX_MC_LID)) + +int ehca_attach_mcast(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) +{ + struct ehca_qp *my_qp = NULL; + struct ehca_shca *shca = NULL; + union ib_gid my_gid; + u64 h_ret = H_SUCCESS; + int ret = 0; + + EHCA_CHECK_ADR(ibqp); + EHCA_CHECK_ADR(gid); + + my_qp = container_of(ibqp, struct ehca_qp, ib_qp); + + EHCA_CHECK_QP(my_qp); + if (ibqp->qp_type != IB_QPT_UD) { + EDEB_ERR(4, "invalid qp_type %x gid, ret=%x", + ibqp->qp_type, EINVAL); + return -EINVAL; + } + + shca = container_of(ibqp->pd->device, struct ehca_shca, ib_device); + EHCA_CHECK_ADR(shca); + + if (!(EHCA_VALID_MULTICAST_GID(gid->raw))) { + EDEB_ERR(4, "gid is not valid mulitcast gid ret=%x", + EINVAL); + return -EINVAL; + } else if ((lid < MIN_MC_LID) || (lid > MAX_MC_LID)) { + EDEB_ERR(4, "lid=%x is not valid 
mulitcast lid ret=%x", + lid, EINVAL); + return -EINVAL; + } + + memcpy(&my_gid.raw, gid->raw, sizeof(union ib_gid)); + + h_ret = hipz_h_attach_mcqp(shca->ipz_hca_handle, + my_qp->ipz_qp_handle, + my_qp->galpas.kernel, + lid, my_gid.global.subnet_prefix, + my_gid.global.interface_id); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, + "ehca_qp=%p qp_num=%x hipz_h_attach_mcqp() failed " + "h_ret=%lx", my_qp, ibqp->qp_num, h_ret); + } + ret = ehca2ib_return_code(h_ret); + + EDEB_EX(7, "mcast attach ret=%x\n" + "ehca_qp=%p qp_num=%x lid=%x\n" + "my_gid= %x %x %x %x\n" + " %x %x %x %x\n" + " %x %x %x %x\n" + " %x %x %x %x\n", + ret, my_qp, ibqp->qp_num, lid, + my_gid.raw[0], my_gid.raw[1], + my_gid.raw[2], my_gid.raw[3], + my_gid.raw[4], my_gid.raw[5], + my_gid.raw[6], my_gid.raw[7], + my_gid.raw[8], my_gid.raw[9], + my_gid.raw[10], my_gid.raw[11], + my_gid.raw[12], my_gid.raw[13], + my_gid.raw[14], my_gid.raw[15]); + + return ret; +} + +int ehca_detach_mcast(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) +{ + struct ehca_qp *my_qp = NULL; + struct ehca_shca *shca = NULL; + union ib_gid my_gid; + u64 h_ret = H_SUCCESS; + int ret = 0; + + EHCA_CHECK_ADR(ibqp); + EHCA_CHECK_ADR(gid); + + my_qp = container_of(ibqp, struct ehca_qp, ib_qp); + + EHCA_CHECK_QP(my_qp); + if (ibqp->qp_type != IB_QPT_UD) { + EDEB_ERR(4, "invalid qp_type %x gid, ret=%x", + ibqp->qp_type, EINVAL); + return -EINVAL; + } + + shca = container_of(ibqp->pd->device, struct ehca_shca, ib_device); + EHCA_CHECK_ADR(shca); + + if (!(EHCA_VALID_MULTICAST_GID(gid->raw))) { + EDEB_ERR(4, "gid is not valid mulitcast gid ret=%x", + EINVAL); + return -EINVAL; + } else if ((lid < MIN_MC_LID) || (lid > MAX_MC_LID)) { + EDEB_ERR(4, "lid=%x is not valid mulitcast lid ret=%x", + lid, EINVAL); + return -EINVAL; + } + + EDEB_EN(7, "dgid=%p qp_numl=%x lid=%x", + gid, ibqp->qp_num, lid); + + memcpy(&my_gid.raw, gid->raw, sizeof(union ib_gid)); + + h_ret = hipz_h_detach_mcqp(shca->ipz_hca_handle, + my_qp->ipz_qp_handle, + 
my_qp->galpas.kernel, + lid, my_gid.global.subnet_prefix, + my_gid.global.interface_id); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, + "ehca_qp=%p qp_num=%x hipz_h_detach_mcqp() failed " + "h_ret=%lx", my_qp, ibqp->qp_num, h_ret); + } + ret = ehca2ib_return_code(h_ret); + + EDEB_EX(7, "mcast detach ret=%x\n" + "ehca_qp=%p qp_num=%x lid=%x\n" + "my_gid= %x %x %x %x\n" + " %x %x %x %x\n" + " %x %x %x %x\n" + " %x %x %x %x\n", + ret, my_qp, ibqp->qp_num, lid, + my_gid.raw[0], my_gid.raw[1], + my_gid.raw[2], my_gid.raw[3], + my_gid.raw[4], my_gid.raw[5], + my_gid.raw[6], my_gid.raw[7], + my_gid.raw[8], my_gid.raw[9], + my_gid.raw[10], my_gid.raw[11], + my_gid.raw[12], my_gid.raw[13], + my_gid.raw[14], my_gid.raw[15]); + + return ret; +} From schihei at de.ibm.com Mon May 15 10:41:55 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:41:55 +0200 Subject: [openib-general] [PATCH 06/16] ehca: interrupt handling routines Message-ID: <4468BD63.6070509@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_irq.c | 710 ++++++++++++++++++++++++++++++++++ drivers/infiniband/hw/ehca/ehca_irq.h | 77 +++ 2 files changed, 787 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_irq.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_irq.h 2006-05-10 08:07:51.000000000 +0200 @@ -0,0 +1,77 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Function definitions and structs for EQs, NEQs and interrupts + * + * Authors: Heiko J Schick + * Khadija Souissi + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. 
+ * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef __EHCA_IRQ_H +#define __EHCA_IRQ_H + + +struct ehca_shca; + +#include +#include +#include + +int ehca_error_data(struct ehca_shca *shca, void *data, u64 resource); + +irqreturn_t ehca_interrupt_neq(int irq, void *dev_id, struct pt_regs *regs); +void ehca_tasklet_neq(unsigned long data); + +irqreturn_t ehca_interrupt_eq(int irq, void *dev_id, struct pt_regs *regs); +void ehca_tasklet_eq(unsigned long data); + +struct ehca_cpu_comp_task { + wait_queue_head_t wait_queue; + struct list_head cq_list; + struct task_struct *task; + spinlock_t task_lock; +}; + +struct ehca_comp_pool { + struct ehca_cpu_comp_task *cpu_comp_tasks; + int last_cpu; + spinlock_t last_cpu_lock; +}; + +struct ehca_comp_pool *ehca_create_comp_pool(void); +void ehca_destroy_comp_pool(struct ehca_comp_pool *pool); +void ehca_queue_comp_task(struct ehca_comp_pool *pool, struct ehca_cq *__cq); + +#endif --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_irq.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_irq.c 2006-05-15 13:29:49.000000000 +0200 @@ -0,0 +1,710 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Functions for EQs, NEQs and interrupts + * + * Authors: Heiko J Schick + * Khadija Souissi + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#define DEB_PREFIX "eirq" + +#include "ehca_classes.h" +#include "ehca_irq.h" +#include "ehca_iverbs.h" +#include "ehca_tools.h" +#include "hcp_if.h" +#include "hipz_fns.h" + +#define EQE_COMPLETION_EVENT EHCA_BMASK_IBM(1,1) +#define EQE_CQ_QP_NUMBER EHCA_BMASK_IBM(8,31) +#define EQE_EE_IDENTIFIER EHCA_BMASK_IBM(2,7) +#define EQE_CQ_NUMBER EHCA_BMASK_IBM(8,31) +#define EQE_QP_NUMBER EHCA_BMASK_IBM(8,31) +#define EQE_QP_TOKEN EHCA_BMASK_IBM(32,63) +#define EQE_CQ_TOKEN EHCA_BMASK_IBM(32,63) + +#define NEQE_COMPLETION_EVENT EHCA_BMASK_IBM(1,1) +#define NEQE_EVENT_CODE EHCA_BMASK_IBM(2,7) +#define NEQE_PORT_NUMBER EHCA_BMASK_IBM(8,15) +#define NEQE_PORT_AVAILABILITY EHCA_BMASK_IBM(16,16) + +#define ERROR_DATA_LENGTH EHCA_BMASK_IBM(52,63) +#define ERROR_DATA_TYPE EHCA_BMASK_IBM(0,7) + +static inline void comp_event_callback(struct ehca_cq *cq) +{ + EDEB_EN(7, "cq=%p", cq); + + if (!cq->ib_cq.comp_handler) + return; + + spin_lock(&cq->cb_lock); + cq->ib_cq.comp_handler(&cq->ib_cq, cq->ib_cq.cq_context); + spin_unlock(&cq->cb_lock); + + EDEB_EX(7, "cq=%p", cq); + + return; +} + +static void print_error_data(struct ehca_shca * shca, void* data, + u64* rblock, int length) +{ + 
u64 type = EHCA_BMASK_GET(ERROR_DATA_TYPE, rblock[2]); + u64 resource = rblock[1]; + + EDEB_EN(7, "shca=%p data=%p rblock=%p length=%x", + shca, data, rblock, length); + + switch (type) { + case 0x1: /* Queue Pair */ + { + struct ehca_qp *qp = (struct ehca_qp*)data; + + /* only print error data if AER is set */ + if (rblock[6] == 0) + return; + + EDEB_ERR(4, "QP 0x%x (resource=%lx) has errors.", + qp->ib_qp.qp_num, resource); + break; + } + case 0x4: /* Completion Queue */ + { + struct ehca_cq *cq = (struct ehca_cq*)data; + + EDEB_ERR(4, "CQ 0x%x (resource=%lx) has errors.", + cq->cq_number, resource); + break; + } + default: + EDEB_ERR(4, "Unknown errror type: %lx on %s.", + type, shca->ib_device.name); + break; + } + + EDEB_ERR(4, "Error data is available: %lx.", resource); + EDEB_ERR(4, "EHCA ----- error data begin " + "---------------------------------------------------"); + EDEB_DMP(4, rblock, length, "resource=%lx", resource); + EDEB_ERR(4, "EHCA ----- error data end " + "----------------------------------------------------"); + + EDEB_EX(7, ""); + + return; +} + +int ehca_error_data(struct ehca_shca *shca, void *data, + u64 resource) +{ + + unsigned long ret = 0; + u64 *rblock; + unsigned long block_count; + + EDEB_EN(7, "shca=%p data=%p resource=%lx", shca, data, resource); + + rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (!rblock) { + EDEB_ERR(4, "Cannot allocate rblock memory."); + ret = -ENOMEM; + goto error_data1; + } + + ret = hipz_h_error_data(shca->ipz_hca_handle, + resource, + rblock, + &block_count); + + if (ret == H_R_STATE) { + EDEB_ERR(4, "No error data is available: %lx.", resource); + } + else if (ret == H_SUCCESS) { + int length; + + length = EHCA_BMASK_GET(ERROR_DATA_LENGTH, rblock[0]); + + if (length > PAGE_SIZE) + length = PAGE_SIZE; + + print_error_data(shca, data, rblock, length); + } + else { + EDEB_ERR(4, "Error data could not be fetched: %lx", resource); + } + + kfree(rblock); + +error_data1: + return ret; + +} + +static void 
qp_event_callback(struct ehca_shca *shca, + u64 eqe, + enum ib_event_type event_type) +{ + struct ib_event event; + struct ehca_qp *qp; + unsigned long flags; + u32 token = EHCA_BMASK_GET(EQE_QP_TOKEN, eqe); + + EDEB_EN(7, "eqe=%lx", eqe); + + spin_lock_irqsave(&ehca_qp_idr_lock, flags); + qp = idr_find(&ehca_qp_idr, token); + spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); + + + if (!qp) + return; + + ehca_error_data(shca, qp, qp->ipz_qp_handle.handle); + + if (!qp->ib_qp.event_handler) + return; + + event.device = &shca->ib_device; + event.event = event_type; + event.element.qp = &qp->ib_qp; + + qp->ib_qp.event_handler(&event, qp->ib_qp.qp_context); + + EDEB_EX(7, "qp=%p", qp); + + return; +} + +static void cq_event_callback(struct ehca_shca *shca, + u64 eqe) +{ + struct ehca_cq *cq; + unsigned long flags; + u32 token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe); + + EDEB_EN(7, "eqe=%lx", eqe); + + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + cq = idr_find(&ehca_cq_idr, token); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + + if (!cq) + return; + + ehca_error_data(shca, cq, cq->ipz_cq_handle.handle); + + EDEB_EX(7, "cq=%p", cq); + + return; +} + +static void parse_identifier(struct ehca_shca *shca, u64 eqe) +{ + u8 identifier = EHCA_BMASK_GET(EQE_EE_IDENTIFIER, eqe); + + EDEB_EN(7, "shca=%p eqe=%lx", shca, eqe); + + switch (identifier) { + case 0x02: /* path migrated */ + qp_event_callback(shca, eqe, IB_EVENT_PATH_MIG); + break; + case 0x03: /* communication established */ + qp_event_callback(shca, eqe, IB_EVENT_COMM_EST); + break; + case 0x04: /* send queue drained */ + qp_event_callback(shca, eqe, IB_EVENT_SQ_DRAINED); + break; + case 0x05: /* QP error */ + case 0x06: /* QP error */ + qp_event_callback(shca, eqe, IB_EVENT_QP_FATAL); + break; + case 0x07: /* CQ error */ + case 0x08: /* CQ error */ + cq_event_callback(shca, eqe); + break; + case 0x09: /* MRMWPTE error */ + EDEB_ERR(4, "MRMWPTE error."); + break; + case 0x0A: /* port event */ + EDEB_ERR(4, "Port 
event."); + break; + case 0x0B: /* MR access error */ + EDEB_ERR(4, "MR access error."); + break; + case 0x0C: /* EQ error */ + EDEB_ERR(4, "EQ error."); + break; + case 0x0D: /* P/Q_Key mismatch */ + EDEB_ERR(4, "P/Q_Key mismatch."); + break; + case 0x10: /* sampling complete */ + EDEB_ERR(4, "Sampling complete."); + break; + case 0x11: /* unaffiliated access error */ + EDEB_ERR(4, "Unaffiliated access error."); + break; + case 0x12: /* path migrating error */ + EDEB_ERR(4, "Path migration error."); + break; + case 0x13: /* interface trace stopped */ + EDEB_ERR(4, "Interface trace stopped."); + break; + case 0x14: /* first error capture info available */ + default: + EDEB_ERR(4, "Unknown identifier: %x on %s.", + identifier, shca->ib_device.name); + break; + } + + EDEB_EX(7, "eqe=%lx identifier=%x", eqe, identifier); + + return; +} + +static void parse_ec(struct ehca_shca *shca, u64 eqe) +{ + struct ib_event event; + u8 ec = EHCA_BMASK_GET(NEQE_EVENT_CODE, eqe); + u8 port = EHCA_BMASK_GET(NEQE_PORT_NUMBER, eqe); + + EDEB_EN(7, "shca=%p eqe=%lx", shca, eqe); + + switch (ec) { + case 0x30: /* port availability change */ + if (EHCA_BMASK_GET(NEQE_PORT_AVAILABILITY, eqe)) { + EDEB(4, "%s: port %x is active.", + shca->ib_device.name, port); + event.device = &shca->ib_device; + event.event = IB_EVENT_PORT_ACTIVE; + event.element.port_num = port; + shca->sport[port - 1].port_state = IB_PORT_ACTIVE; + ib_dispatch_event(&event); + } else { + EDEB(4, "%s: port %x is inactive.", + shca->ib_device.name, port); + event.device = &shca->ib_device; + event.event = IB_EVENT_PORT_ERR; + event.element.port_num = port; + shca->sport[port - 1].port_state = IB_PORT_DOWN; + ib_dispatch_event(&event); + } + break; + case 0x31: + /* port configuration change */ + /* disruptive change is caused by */ + /* LID, PKEY or SM change */ + EDEB(4, "EHCA disruptive port %x " + "configuration change.", port); + + EDEB(4, "%s: port %x is inactive.", + shca->ib_device.name, port); + event.device = 
&shca->ib_device; + event.event = IB_EVENT_PORT_ERR; + event.element.port_num = port; + shca->sport[port - 1].port_state = IB_PORT_DOWN; + ib_dispatch_event(&event); + + EDEB(4, "%s: port %x is active.", + shca->ib_device.name, port); + event.device = &shca->ib_device; + event.event = IB_EVENT_PORT_ACTIVE; + event.element.port_num = port; + shca->sport[port - 1].port_state = IB_PORT_ACTIVE; + ib_dispatch_event(&event); + break; + case 0x32: /* adapter malfunction */ + EDEB_ERR(4, "Adapter malfunction."); + break; + case 0x33: /* trace stopped */ + EDEB_ERR(4, "Trace stopped."); + break; + default: + EDEB_ERR(4, "Unknown event code: %x on %s.", + ec, shca->ib_device.name); + break; + } + + EDEB_EX(7, "eqe=%lx ec=%x", eqe, ec); + + return; +} + +static inline void reset_eq_pending(struct ehca_cq *cq) +{ + u64 CQx_EP = 0; + struct h_galpa gal = cq->galpas.kernel; + + EDEB_EN(7, "cq=%p", cq); + + hipz_galpa_store_cq(gal, cqx_ep, 0x0); + CQx_EP = hipz_galpa_load(gal, CQTEMM_OFFSET(cqx_ep)); + EDEB(7, "CQx_EP=%lx", CQx_EP); + + EDEB_EX(7, "cq=%p", cq); + + return; +} + +irqreturn_t ehca_interrupt_neq(int irq, void *dev_id, struct pt_regs *regs) +{ + struct ehca_shca *shca = (struct ehca_shca*)dev_id; + + EDEB_EN(7, "dev_id=%p", dev_id); + + tasklet_hi_schedule(&shca->neq.interrupt_task); + + EDEB_EX(7, ""); + + return IRQ_HANDLED; +} + +void ehca_tasklet_neq(unsigned long data) +{ + struct ehca_shca *shca = (struct ehca_shca*)data; + struct ehca_eqe *eqe; + u64 ret = H_SUCCESS; + + EDEB_EN(7, "shca=%p", shca); + + eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->neq); + + while (eqe) { + if (!EHCA_BMASK_GET(NEQE_COMPLETION_EVENT, eqe->entry)) + parse_ec(shca, eqe->entry); + + eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->neq); + } + + ret = hipz_h_reset_event(shca->ipz_hca_handle, + shca->neq.ipz_eq_handle, 0xFFFFFFFFFFFFFFFFL); + + if (ret != H_SUCCESS) + EDEB_ERR(4, "Can't clear notification events."); + + EDEB_EX(7, "shca=%p", shca); + + return; +} +
+irqreturn_t ehca_interrupt_eq(int irq, void *dev_id, struct pt_regs *regs) +{ + struct ehca_shca *shca = (struct ehca_shca*)dev_id; + + EDEB_EN(7, "dev_id=%p", dev_id); + + tasklet_hi_schedule(&shca->eq.interrupt_task); + + EDEB_EX(7, ""); + + return IRQ_HANDLED; +} + +void ehca_tasklet_eq(unsigned long data) +{ + struct ehca_shca *shca = (struct ehca_shca*)data; + struct ehca_eqe *eqe; + int int_state; + + EDEB_EN(7, "shca=%p", shca); + + do { + eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); + + if ((shca->hw_level >= 2) && eqe) + int_state = 1; + else + int_state = 0; + + while ((int_state == 1) || eqe) { + while (eqe) { + u64 eqe_value = eqe->entry; + + EDEB(7, "eqe_value=%lx", eqe_value); + + /* TODO: better structure */ + if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT, + eqe_value)) { + extern struct ehca_comp_pool* ehca_pool; + extern struct idr ehca_cq_idr; + unsigned long flags; + u32 token; + struct ehca_cq *cq; + + EDEB(6, "... completion event"); + token = + EHCA_BMASK_GET(EQE_CQ_TOKEN, + eqe_value); + spin_lock_irqsave(&ehca_cq_idr_lock, + flags); + cq = idr_find(&ehca_cq_idr, token); + + if (cq == NULL) { + /* restore the irq flags saved by spin_lock_irqsave() */ + spin_unlock_irqrestore(&ehca_cq_idr_lock, + flags); + break; + } + + reset_eq_pending(cq); + ehca_queue_comp_task(ehca_pool, cq); + spin_unlock_irqrestore(&ehca_cq_idr_lock, + flags); + } else { + EDEB(6, "... 
non completion event"); + parse_identifier(shca, eqe_value); + } + eqe = + (struct ehca_eqe *)ehca_poll_eq(shca, + &shca->eq); + } + + if (shca->hw_level >= 2) + int_state = + hipz_h_query_int_state(shca->ipz_hca_handle, + shca->eq.ist); + eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq); + + } + } while (int_state != 0); + + EDEB_EX(7, "shca=%p", shca); + + return; +} + +static inline int find_next_online_cpu(struct ehca_comp_pool* pool) +{ + unsigned long flags_last_cpu; + + spin_lock_irqsave(&pool->last_cpu_lock, flags_last_cpu); + pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map); + + if (pool->last_cpu == NR_CPUS) + pool->last_cpu = 0; + + spin_unlock_irqrestore(&pool->last_cpu_lock, flags_last_cpu); + + return pool->last_cpu; +} + +void ehca_queue_comp_task(struct ehca_comp_pool *pool, struct ehca_cq *__cq) +{ + int cpu; + int cpu_id; + struct ehca_cpu_comp_task *cct; + unsigned long flags_cct; + unsigned long flags_cq; + + cpu = get_cpu(); + cpu_id = find_next_online_cpu(pool); + + EDEB_EN(7, "pool=%p cq=%p cq_nr=%x CPU=%x:%x:%x:%x", + pool, __cq, __cq->cq_number, + cpu, cpu_id, num_online_cpus(), num_possible_cpus()); + + BUG_ON(!cpu_online(cpu_id)); + + cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id); + + spin_lock_irqsave(&cct->task_lock, flags_cct); + spin_lock_irqsave(&__cq->task_lock, flags_cq); + + if (__cq->nr_callbacks == 0) { + __cq->nr_callbacks++; + list_add_tail(&__cq->entry, &cct->cq_list); + wake_up(&cct->wait_queue); + } + else + __cq->nr_callbacks++; + + spin_unlock_irqrestore(&__cq->task_lock, flags_cq); + spin_unlock_irqrestore(&cct->task_lock, flags_cct); + + put_cpu(); + + EDEB_EX(7, "cct=%p", cct); + + return; +} + +static void run_comp_task(struct ehca_cpu_comp_task* cct) +{ + struct ehca_cq *cq = NULL; + unsigned long flags_cct; + unsigned long flags_cq; + + + EDEB_EN(7, "cct=%p", cct); + + spin_lock_irqsave(&cct->task_lock, flags_cct); + + while (!list_empty(&cct->cq_list)) { + cq = list_entry(cct->cq_list.next, struct 
ehca_cq, entry); + spin_unlock_irqrestore(&cct->task_lock, flags_cct); + comp_event_callback(cq); + spin_lock_irqsave(&cct->task_lock, flags_cct); + + spin_lock_irqsave(&cq->task_lock, flags_cq); + cq->nr_callbacks--; + if (cq->nr_callbacks == 0) + list_del_init(cct->cq_list.next); + spin_unlock_irqrestore(&cq->task_lock, flags_cq); + + } + + spin_unlock_irqrestore(&cct->task_lock, flags_cct); + + EDEB_EX(7, "cct=%p cq=%p", cct, cq); + + return; +} + +static int comp_task(void *__cct) +{ + struct ehca_cpu_comp_task* cct = __cct; + DECLARE_WAITQUEUE(wait, current); + + EDEB_EN(7, "cct=%p", cct); + + set_current_state(TASK_INTERRUPTIBLE); + while(!kthread_should_stop()) { + add_wait_queue(&cct->wait_queue, &wait); + + if (list_empty(&cct->cq_list)) + schedule(); + else + __set_current_state(TASK_RUNNING); + + remove_wait_queue(&cct->wait_queue, &wait); + + if (!list_empty(&cct->cq_list)) + run_comp_task(__cct); + + set_current_state(TASK_INTERRUPTIBLE); + } + __set_current_state(TASK_RUNNING); + + EDEB_EX(7, ""); + + return 0; +} + +static struct task_struct *create_comp_task(struct ehca_comp_pool *pool, + int cpu) +{ + struct ehca_cpu_comp_task *cct; + + EDEB_EN(7, "cpu=%d:%d", cpu, NR_CPUS); + + cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu); + spin_lock_init(&cct->task_lock); + INIT_LIST_HEAD(&cct->cq_list); + init_waitqueue_head(&cct->wait_queue); + cct->task = kthread_create(comp_task, cct, "ehca_comp/%d", cpu); + + EDEB_EX(7, "cct/%d=%p", cpu, cct); + + return cct->task; +} + +static void destroy_comp_task(struct ehca_comp_pool *pool, + int cpu) +{ + struct ehca_cpu_comp_task *cct; + struct task_struct *task; + + EDEB_EN(7, "pool=%p cpu=%d:%d", pool, cpu, NR_CPUS); + + cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu); + /* read the task pointer before clearing it, so it can be stopped */ + task = cct->task; + cct->task = NULL; + + if (task) + kthread_stop(task); + + EDEB_EX(7, ""); + + return; +} + +struct ehca_comp_pool *ehca_create_comp_pool(void) +{ + struct ehca_comp_pool *pool; + int cpu; + struct task_struct *task; + + EDEB_EN(7, 
""); + + pool = kzalloc(sizeof(struct ehca_comp_pool), GFP_KERNEL); + if (pool == NULL) + return NULL; + + spin_lock_init(&pool->last_cpu_lock); + pool->last_cpu = any_online_cpu(cpu_online_map); + + pool->cpu_comp_tasks = alloc_percpu(struct ehca_cpu_comp_task); + if (pool->cpu_comp_tasks == NULL) { + kfree(pool); + return NULL; + } + + for_each_online_cpu(cpu) { + task = create_comp_task(pool, cpu); + if (task) { + kthread_bind(task, cpu); + wake_up_process(task); + } + } + + EDEB_EX(7, "pool=%p", pool); + + return pool; +} + +void ehca_destroy_comp_pool(struct ehca_comp_pool *pool) +{ + int i; + + EDEB_EN(7, "pool=%p", pool); + + for (i = 0; i < NR_CPUS; i++) { + if (cpu_online(i)) + destroy_comp_task(pool, i); + } + + EDEB_EN(7, ""); + + return; +} From schihei at de.ibm.com Mon May 15 10:41:47 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:41:47 +0200 Subject: [openib-general] [PATCH 05/16] ehca: common include files Message-ID: <4468BD5B.1060406@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_iverbs.h | 181 +++++++++++++ drivers/infiniband/hw/ehca/ehca_tools.h | 411 +++++++++++++++++++++++++++++++ 2 files changed, 592 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_iverbs.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_iverbs.h 2006-04-28 14:20:08.000000000 +0200 @@ -0,0 +1,181 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Function definitions for internal functions + * + * Authors: Heiko J Schick + * Dietmar Decker + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. 
+ * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef __EHCA_IVERBS_H__ +#define __EHCA_IVERBS_H__ + +#include "ehca_classes.h" + +int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props); + +int ehca_query_port(struct ib_device *ibdev, u8 port, + struct ib_port_attr *props); + +int ehca_query_pkey(struct ib_device *ibdev, u8 port, u16 index, u16 * pkey); + +int ehca_query_gid(struct ib_device *ibdev, u8 port, int index, + union ib_gid *gid); + +int ehca_modify_port(struct ib_device *ibdev, u8 port, int port_modify_mask, + struct ib_port_modify *props); + +struct ib_pd *ehca_alloc_pd(struct ib_device *device, + struct ib_ucontext *context, + struct ib_udata *udata); + +int ehca_dealloc_pd(struct ib_pd *pd); + +struct ib_ah *ehca_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr); + +int ehca_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr); + +int ehca_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr); + +int ehca_destroy_ah(struct ib_ah *ah); + +struct ib_mr *ehca_get_dma_mr(struct ib_pd *pd, int mr_access_flags); + +struct ib_mr *ehca_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, u64 *iova_start); + +struct ib_mr *ehca_reg_user_mr(struct ib_pd *pd, + struct ib_umem *region, + int mr_access_flags, struct ib_udata *udata); + +int ehca_rereg_phys_mr(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, int mr_access_flags, u64 *iova_start); + +int ehca_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr); + +int ehca_dereg_mr(struct ib_mr *mr); + +struct ib_mw *ehca_alloc_mw(struct ib_pd *pd); + +int ehca_bind_mw(struct ib_qp *qp, struct ib_mw *mw, + struct ib_mw_bind *mw_bind); + +int ehca_dealloc_mw(struct ib_mw *mw); + +struct ib_fmr *ehca_alloc_fmr(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr); + +int ehca_map_phys_fmr(struct ib_fmr *fmr, + u64 *page_list, int list_len, u64 iova); + +int ehca_unmap_fmr(struct 
list_head *fmr_list); + +int ehca_dealloc_fmr(struct ib_fmr *fmr); + +enum ehca_eq_type { + EHCA_EQ = 0, /* Event Queue */ + EHCA_NEQ /* Notification Event Queue */ +}; + +int ehca_create_eq(struct ehca_shca *shca, struct ehca_eq *eq, + enum ehca_eq_type type, const u32 length); + +int ehca_destroy_eq(struct ehca_shca *shca, struct ehca_eq *eq); + +void *ehca_poll_eq(struct ehca_shca *shca, struct ehca_eq *eq); + + +struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, + struct ib_ucontext *context, + struct ib_udata *udata); + +int ehca_destroy_cq(struct ib_cq *cq); + +int ehca_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata); + +int ehca_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc); + +int ehca_peek_cq(struct ib_cq *cq, int wc_cnt); + +int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify); + +struct ib_qp *ehca_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *init_attr, + struct ib_udata *udata); + +int ehca_destroy_qp(struct ib_qp *qp); + +int ehca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask); + +int ehca_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, + int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr); + +int ehca_post_send(struct ib_qp *qp, struct ib_send_wr *send_wr, + struct ib_send_wr **bad_send_wr); + +int ehca_post_recv(struct ib_qp *qp, struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr); + +u64 ehca_define_sqp(struct ehca_shca *shca, struct ehca_qp *ibqp, + struct ib_qp_init_attr *qp_init_attr); + +int ehca_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid); + +int ehca_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid); + +struct ib_ucontext *ehca_alloc_ucontext(struct ib_device *device, + struct ib_udata *udata); + +int ehca_dealloc_ucontext(struct ib_ucontext *context); + +int ehca_mmap(struct ib_ucontext *context, struct vm_area_struct *vma); + +void ehca_poll_eqs(unsigned long data); + +int ehca_mmap_nopage(u64 foffset,u64 
length,void **mapped, + struct vm_area_struct **vma); + +int ehca_mmap_register(u64 physical,void **mapped, + struct vm_area_struct **vma); + +int ehca_munmap(unsigned long addr, size_t len); + +#endif --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_tools.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_tools.h 2006-05-03 13:44:15.000000000 +0200 @@ -0,0 +1,411 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * auxiliary functions + * + * Authors: Christoph Raisch + * Hoang-Nam Nguyen + * Khadija Souissi + * Waleri Fomin + * Heiko J Schick + * + * Copyright (c) 2005 IBM Corporation + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + + +#ifndef EHCA_TOOLS_H +#define EHCA_TOOLS_H + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#define EHCA_EDEB_TRACE_MASK_SIZE 32 +extern u8 ehca_edeb_mask[EHCA_EDEB_TRACE_MASK_SIZE]; +#define EDEB_ID_TO_U32(str4) (str4[3] | (str4[2] << 8) | (str4[1] << 16) | \ + (str4[0] << 24)) + +static inline u64 ehca_edeb_filter(const u32 level, + const u32 id, const u32 line) +{ + u64 ret = 0; + u32 filenr = 0; + u32 filter_level = 9; + u32 dynamic_level = 0; + + /* This code is written for the gcc -O2 optimizer, which should collapse + * it to two single ints. filter_level is the first level kicked out by + * the compiler, which means: trace everything below 6.
*/ + if (id == EDEB_ID_TO_U32("ehav")) { + filenr = 0x01; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("clas")) { + filenr = 0x02; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("cqeq")) { + filenr = 0x03; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("shca")) { + filenr = 0x05; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("eirq")) { + filenr = 0x06; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("lMad")) { + filenr = 0x07; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("mcas")) { + filenr = 0x08; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("mrmw")) { + filenr = 0x09; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("vpd ")) { + filenr = 0x0a; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("e_qp")) { + filenr = 0x0b; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("uqes")) { + filenr = 0x0c; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("PHYP")) { + filenr = 0x0d; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("hcpi")) { + filenr = 0x0e; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("iptz")) { + filenr = 0x0f; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("spta")) { + filenr = 0x10; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("simp")) { + filenr = 0x11; + filter_level = 8; + } + if (id == EDEB_ID_TO_U32("reqs")) { + filenr = 0x12; + filter_level = 8; + } + + if ((filenr - 1) > sizeof(ehca_edeb_mask)) { + filenr = 0; + } + + if (filenr == 0) { + filter_level = 9; + } /* default */ + ret = filenr * 0x10000 + line; + if (filter_level <= level) { + return ret | 0x100000000L; /* this is the flag to not trace */ + } + dynamic_level = ehca_edeb_mask[filenr]; + if (likely(dynamic_level <= level)) { + ret = ret | 0x100000000L; + }; + return ret; +} + +#ifdef EHCA_USE_HCALL_KERNEL +#ifdef CONFIG_PPC_PSERIES + +#include + +/** + * IS_EDEB_ON - Checks if debug is on for the given level. 
+ */ +#define IS_EDEB_ON(level) \ + ((ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__) & 0x100000000L)==0) + +#define EDEB_P_GENERIC(level,idstring,format,args...) \ +do { \ + u64 ehca_edeb_filterresult = \ + ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__);\ + if ((ehca_edeb_filterresult & 0x100000000L) == 0) \ + printk("PU%04x %08x:%s " idstring " "format "\n", \ + get_paca()->paca_index, (u32)(ehca_edeb_filterresult), \ + __func__, ##args); \ +} while (1 == 0) + +#elif REAL_HCALL + +#define EDEB_P_GENERIC(level,idstring,format,args...) \ +do { \ + u64 ehca_edeb_filterresult = \ + ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__); \ + if ((ehca_edeb_filterresult & 0x100000000L) == 0) \ + printk("%08x:%s " idstring " "format "\n", \ + (u32)(ehca_edeb_filterresult), \ + __func__, ##args); \ +} while (1 == 0) + +#endif +#else + +#define IS_EDEB_ON(level) (1) + +#define EDEB_P_GENERIC(level,idstring,format,args...) \ +do { \ + printk("%s " idstring " "format "\n", \ + __func__, ##args); \ +} while (1 == 0) + +#endif + +/** + * EDEB - Trace output macro. + * @level tracelevel + * @format optional format string, use "" if not desired + * @args printf like arguments for trace, use %Lx for u64, %x for u32 + * %p for pointer + */ +#define EDEB(level,format,args...) \ + EDEB_P_GENERIC(level,"",format,##args) +#define EDEB_ERR(level,format,args...) \ + EDEB_P_GENERIC(level,"HCAD_ERROR ",format,##args) +#define EDEB_EN(level,format,args...) \ + EDEB_P_GENERIC(level,">>>",format,##args) +#define EDEB_EX(level,format,args...) \ + EDEB_P_GENERIC(level,"<<<",format,##args) + +/** + * EDEB macro to dump a memory block, whose length is n*8 bytes. + * Each line has the following layout: + * adr=X ofs=Y <8 bytes hex> <8 bytes hex> + */ +#define EDEB_DMP(level,adr,len,format,args...) 
\ + do { \ + unsigned int x; \ + unsigned int l = (unsigned int)(len); \ + unsigned char *deb = (unsigned char*)(adr); \ + for (x = 0; x < l; x += 16) { \ + EDEB(level, format " adr=%p ofs=%04x %016lx %016lx", \ + ##args, deb, x, *((u64 *)&deb[0]), *((u64 *)&deb[8])); \ + deb += 16; \ + } \ + } while (0) + +/* define a bitmask, little endian version */ +#define EHCA_BMASK(pos,length) (((pos)<<16)+(length)) +/* define a bitmask, the ibm way... */ +#define EHCA_BMASK_IBM(from,to) (((63-to)<<16)+((to)-(from)+1)) +/* internal function, don't use */ +#define EHCA_BMASK_SHIFTPOS(mask) (((mask)>>16)&0xffff) +/* internal function, don't use */ +#define EHCA_BMASK_MASK(mask) (0xffffffffffffffffULL >> ((64-(mask))&0xffff)) +/* return value shifted and masked by mask\n + * variable|=HCA_BMASK_SET(MY_MASK,0x4711) ORs the bits in variable\n + * variable&=~HCA_BMASK_SET(MY_MASK,-1) clears the bits from the mask + * in variable + */ +#define EHCA_BMASK_SET(mask,value) \ + ((EHCA_BMASK_MASK(mask) & ((u64)(value)))<<EHCA_BMASK_SHIFTPOS(mask)) +/* extract a parameter from value by mask */ +#define EHCA_BMASK_GET(mask,value) \ + (EHCA_BMASK_MASK(mask) & (((u64)(value))>>EHCA_BMASK_SHIFTPOS(mask))) + +#define PARANOIA_MODE +#ifdef PARANOIA_MODE + +#define EHCA_CHECK_ADR_P(adr) \ + if (unlikely(adr == 0)) { \ + EDEB_ERR(4, "adr=%p check failed line %i", adr, \ + __LINE__); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_CHECK_ADR(adr) \ + if (unlikely(adr == 0)) { \ + EDEB_ERR(4, "adr=%p check failed line %i", adr, \ + __LINE__); \ + return -EFAULT; } + +#define EHCA_CHECK_DEVICE_P(device) \ + if (unlikely(device == 0)) { \ + EDEB_ERR(4, "device=%p check failed", device); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_CHECK_DEVICE(device) \ + if (unlikely(device == 0)) { \ + EDEB_ERR(4, "device=%p check failed", device); \ + return -EFAULT; } + +#define EHCA_CHECK_PD(pd) \ + if (unlikely(pd == 0)) { \ + EDEB_ERR(4, "pd=%p check failed", pd); \ + return -EFAULT; } + +#define EHCA_CHECK_PD_P(pd) \ + if (unlikely(pd == 0)) { \ + EDEB_ERR(4, "pd=%p check failed", pd); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_CHECK_AV(av) \ + if 
(unlikely(av == 0)) { \ + EDEB_ERR(4, "av=%p check failed", av); \ + return -EFAULT; } + +#define EHCA_CHECK_AV_P(av) \ + if (unlikely(av == 0)) { \ + EDEB_ERR(4, "av=%p check failed", av); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_CHECK_CQ(cq) \ + if (unlikely(cq == 0)) { \ + EDEB_ERR(4, "cq=%p check failed", cq); \ + return -EFAULT; } + +#define EHCA_CHECK_CQ_P(cq) \ + if (unlikely(cq == 0)) { \ + EDEB_ERR(4, "cq=%p check failed", cq); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_CHECK_EQ(eq) \ + if (unlikely(eq == 0)) { \ + EDEB_ERR(4, "eq=%p check failed", eq); \ + return -EFAULT; } + +#define EHCA_CHECK_EQ_P(eq) \ + if (unlikely(eq == 0)) { \ + EDEB_ERR(4, "eq=%p check failed", eq); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_CHECK_QP(qp) \ + if (unlikely(qp == 0)) { \ + EDEB_ERR(4, "qp=%p check failed", qp); \ + return -EFAULT; } + +#define EHCA_CHECK_QP_P(qp) \ + if (unlikely(qp == 0)) { \ + EDEB_ERR(4, "qp=%p check failed", qp); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_CHECK_MR(mr) \ + if (unlikely(mr == 0)) { \ + EDEB_ERR(4, "mr=%p check failed", mr); \ + return -EFAULT; } + +#define EHCA_CHECK_MR_P(mr) \ + if (unlikely(mr == 0)) { \ + EDEB_ERR(4, "mr=%p check failed", mr); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_CHECK_MW(mw) \ + if (unlikely(mw == 0)) { \ + EDEB_ERR(4, "mw=%p check failed", mw); \ + return -EFAULT; } + +#define EHCA_CHECK_MW_P(mw) \ + if (unlikely(mw == 0)) { \ + EDEB_ERR(4, "mw=%p check failed", mw); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_CHECK_FMR(fmr) \ + if (unlikely(fmr == 0)) { \ + EDEB_ERR(4, "fmr=%p check failed", fmr); \ + return -EFAULT; } + +#define EHCA_CHECK_FMR_P(fmr) \ + if (unlikely(fmr == 0)) { \ + EDEB_ERR(4, "fmr=%p check failed", fmr); \ + return ERR_PTR(-EFAULT); } + +#define EHCA_REGISTER_PD(device,pd) +#define EHCA_REGISTER_AV(pd,av) +#define EHCA_DEREGISTER_PD(PD) +#define EHCA_DEREGISTER_AV(av) +#else +#define EHCA_CHECK_DEVICE_P(device) + +#define EHCA_CHECK_PD(pd) +#define 
EHCA_REGISTER_PD(device,pd) +#define EHCA_DEREGISTER_PD(PD) +#endif + +/** + * ehca_adr_bad - Handle to be used for address translation mechanisms, + * currently a placeholder. + */ +static inline int ehca_adr_bad(void *adr) +{ + return !adr; +} + +/** + * ehca2ib_return_code - Returns ib return code corresponding to the given + * ehca return code. + */ +static inline int ehca2ib_return_code(u64 ehca_rc) +{ + switch (ehca_rc) { + case H_SUCCESS: + return 0; + case H_BUSY: + return -EBUSY; + case H_NO_MEM: + return -ENOMEM; + default: + return -EINVAL; + } +} + +#endif /* EHCA_TOOLS_H */ From Thomas.Talpey at netapp.com Mon May 15 10:40:52 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Mon, 15 May 2006 13:40:52 -0400 Subject: [openib-general] CMA IPv6 support In-Reply-To: <7.0.1.0.2.20060515131807.041caef8@netapp.com> References: <20060515164443.GA19163@mellanox.co.il> <7.0.1.0.2.20060515131807.041caef8@netapp.com> Message-ID: <7.0.1.0.2.20060515132921.041caef8@netapp.com> At 01:26 PM 5/15/2006, Talpey, Thomas wrote: >At 01:05 PM 5/15/2006, Sean Hefty wrote: >>I came to the same conclusion a couple of weeks ago. Rdma_create_id() will >>likely need an address family parameter, or the user must explicitly >>bind before calling listen. > >Rdma_create_id() already takes a struct sockaddr *, which has an address >family selector (sa_family) to define the contained address format. Why is >that one not sufficient? Scratch that, I was looking at our usage one layer up in the NFS/RDMA code, which does have the struct sockaddr *. Looking at rdma_listen(), the code I see checks for bound state before proceeding to listen: int rdma_listen(struct rdma_cm_id *id, int backlog) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN)) return -EINVAL; ... This makes sense, because sockets work this way, and servers generally want to listen on a port of their own choosing.
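To illustrate the bind-before-listen gate in the rdma_listen() excerpt above, here is a rough user-space sketch. It is not the CMA code: it assumes cma_comp_exch() acts as a compare-and-exchange on the id's state, every demo_* name is hypothetical, and C11 atomics merely stand in for whatever synchronization the kernel code really uses.

```c
/* User-space sketch only; all demo_* names are hypothetical, not the CMA's. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_state { DEMO_IDLE, DEMO_ADDR_BOUND, DEMO_LISTEN };

struct demo_id {
	atomic_int state;
};

/* Move from 'comp' to 'exch' only if the current state equals 'comp'. */
static bool demo_comp_exch(struct demo_id *id, enum demo_state comp,
			   enum demo_state exch)
{
	int expected = comp;

	return atomic_compare_exchange_strong(&id->state, &expected, exch);
}

static void demo_init(struct demo_id *id)
{
	atomic_init(&id->state, DEMO_IDLE);
}

/* Binding is only legal from the idle state. */
static int demo_bind(struct demo_id *id)
{
	return demo_comp_exch(id, DEMO_IDLE, DEMO_ADDR_BOUND) ? 0 : -EINVAL;
}

/* Listening is only legal once an address has been bound. */
static int demo_listen(struct demo_id *id)
{
	return demo_comp_exch(id, DEMO_ADDR_BOUND, DEMO_LISTEN) ? 0 : -EINVAL;
}

/* Walk the sequence a server would follow; returns 0 if all checks hold. */
static int demo_selfcheck(void)
{
	struct demo_id id;

	demo_init(&id);
	assert(demo_listen(&id) == -EINVAL);	/* not bound yet */
	assert(demo_bind(&id) == 0);
	assert(demo_listen(&id) == 0);
	assert(demo_listen(&id) == -EINVAL);	/* already listening */
	return 0;
}
```

In this sketch a listen on an unbound id fails with -EINVAL, which matches the behaviour described for rdma_listen() above; whether the real code should bind implicitly for a given address family is a separate question.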
So, I think it's already there. Right? Tom. From schihei at de.ibm.com Mon May 15 10:42:03 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:42:03 +0200 Subject: [openib-general] [PATCH 07/16] ehca: memory region Message-ID: <4468BD6B.6090200@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_mrmw.c | 2474 +++++++++++++++++++++++++++++++++ drivers/infiniband/hw/ehca/ehca_mrmw.h | 143 + 2 files changed, 2617 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_mrmw.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_mrmw.h 2006-04-28 15:18:05.000000000 +0200 @@ -0,0 +1,143 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * MR/MW declarations and inline functions + * + * Authors: Dietmar Decker + * Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _EHCA_MRMW_H_ +#define _EHCA_MRMW_H_ + +#undef DEB_PREFIX +#define DEB_PREFIX "mrmw" + +int ehca_reg_mr(struct ehca_shca *shca, + struct ehca_mr *e_mr, + u64 *iova_start, + u64 size, + int acl, + struct ehca_pd *e_pd, + struct ehca_mr_pginfo *pginfo, + u32 *lkey, + u32 *rkey); + +int ehca_reg_mr_rpages(struct ehca_shca *shca, + struct ehca_mr *e_mr, + struct ehca_mr_pginfo *pginfo); + +int ehca_rereg_mr(struct ehca_shca *shca, + struct ehca_mr *e_mr, + u64 *iova_start, + u64 size, + int mr_access_flags, + struct ehca_pd *e_pd, + struct ehca_mr_pginfo *pginfo, + u32 *lkey, + u32 *rkey); + +int ehca_unmap_one_fmr(struct ehca_shca *shca, + struct ehca_mr *e_fmr); + +int ehca_reg_smr(struct ehca_shca *shca, + struct ehca_mr *e_origmr, + struct ehca_mr *e_newmr, + u64 *iova_start, + int acl, + struct ehca_pd *e_pd, + u32 *lkey, + u32 *rkey); + +int ehca_reg_internal_maxmr(struct ehca_shca *shca, + struct ehca_pd *e_pd, + struct ehca_mr **maxmr); + +int ehca_reg_maxmr(struct ehca_shca *shca, + struct ehca_mr *e_newmr, + u64 *iova_start, + int acl, + struct ehca_pd *e_pd, + u32 *lkey, + u32 *rkey); + +int ehca_dereg_internal_maxmr(struct ehca_shca *shca); + +int ehca_mr_chk_buf_and_calc_size(struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + u64 *iova_start, + u64 *size); + +int ehca_fmr_check_page_list(struct ehca_mr *e_fmr, + u64 *page_list, + int list_len); + +int ehca_set_pagebuf(struct ehca_mr *e_mr, + struct 
ehca_mr_pginfo *pginfo, + u32 number, + u64 *kpage); + +int ehca_set_pagebuf_1(struct ehca_mr *e_mr, + struct ehca_mr_pginfo *pginfo, + u64 *rpage); + +int ehca_mr_is_maxmr(u64 size, + u64 *iova_start); + +void ehca_mrmw_map_acl(int ib_acl, + u32 *hipz_acl); + +void ehca_mrmw_set_pgsize_hipz_acl(u32 *hipz_acl); + +void ehca_mrmw_reverse_map_acl(const u32 *hipz_acl, + int *ib_acl); + +int ehca_mrmw_map_hrc_alloc(const u64 hipz_rc); + +int ehca_mrmw_map_hrc_rrpg_last(const u64 hipz_rc); + +int ehca_mrmw_map_hrc_rrpg_notlast(const u64 hipz_rc); + +int ehca_mrmw_map_hrc_query_mr(const u64 hipz_rc); + +int ehca_mrmw_map_hrc_free_mr(const u64 hipz_rc); + +int ehca_mrmw_map_hrc_free_mw(const u64 hipz_rc); + +int ehca_mrmw_map_hrc_reg_smr(const u64 hipz_rc); + +void ehca_mr_deletenew(struct ehca_mr *mr); + +#endif /*_EHCA_MRMW_H_*/ --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_mrmw.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_mrmw.c 2006-05-15 15:43:31.000000000 +0200 @@ -0,0 +1,2474 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * MR/MW functions + * + * Authors: Dietmar Decker + * Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#undef DEB_PREFIX +#define DEB_PREFIX "mrmw" + +#include + +#include "ehca_iverbs.h" +#include "ehca_mrmw.h" +#include "hcp_if.h" +#include "hipz_hw.h" + +extern int ehca_use_hp_mr; + +static struct ehca_mr *ehca_mr_new(void) +{ + extern struct ehca_module ehca_module; + struct ehca_mr *me; + + me = kmem_cache_alloc(ehca_module.cache_mr, SLAB_KERNEL); + if (me) { + memset(me, 0, sizeof(struct ehca_mr)); + spin_lock_init(&me->mrlock); + EDEB_EX(7, "ehca_mr=%p sizeof(ehca_mr_t)=%x", me, + (u32) sizeof(struct ehca_mr)); + } else { + EDEB_ERR(3, "alloc failed"); + } + + return me; +} + +static void ehca_mr_delete(struct ehca_mr *me) +{ + extern struct ehca_module ehca_module; + + kmem_cache_free(ehca_module.cache_mr, me); +} + +static struct ehca_mw *ehca_mw_new(void) +{ + extern struct ehca_module ehca_module; + struct ehca_mw *me; + + me = kmem_cache_alloc(ehca_module.cache_mw, SLAB_KERNEL); + if (me) { + memset(me, 0, sizeof(struct ehca_mw)); + spin_lock_init(&me->mwlock); + EDEB_EX(7, "ehca_mw=%p sizeof(ehca_mw_t)=%x", me, + (u32) sizeof(struct ehca_mw)); + } else { + EDEB_ERR(3, "alloc failed"); + } + + return me; +} + +static void ehca_mw_delete(struct ehca_mw *me) +{ 
+ extern struct ehca_module ehca_module; + + kmem_cache_free(ehca_module.cache_mw, me); +} + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +struct ib_mr *ehca_get_dma_mr(struct ib_pd *pd, int mr_access_flags) +{ + struct ib_mr *ib_mr = NULL; + int ret = 0; + struct ehca_mr *e_maxmr = NULL; + struct ehca_pd *e_pd = NULL; + struct ehca_shca *shca = NULL; + + EDEB_EN(7, "pd=%p mr_access_flags=%x", pd, mr_access_flags); + + EHCA_CHECK_PD_P(pd); + e_pd = container_of(pd, struct ehca_pd, ib_pd); + shca = container_of(pd->device, struct ehca_shca, ib_device); + + if (shca->maxmr) { + e_maxmr = ehca_mr_new(); + if (!e_maxmr) { + EDEB_ERR(4, "out of memory"); + ib_mr = ERR_PTR(-ENOMEM); + goto get_dma_mr_exit0; + } + + ret = ehca_reg_maxmr(shca, e_maxmr, (u64*)KERNELBASE, + mr_access_flags, e_pd, + &e_maxmr->ib.ib_mr.lkey, + &e_maxmr->ib.ib_mr.rkey); + if (ret) { + ib_mr = ERR_PTR(ret); + goto get_dma_mr_exit0; + } + ib_mr = &e_maxmr->ib.ib_mr; + } else { + EDEB_ERR(4, "no internal max-MR exist!"); + ib_mr = ERR_PTR(-EINVAL); + goto get_dma_mr_exit0; + } + +get_dma_mr_exit0: + if (IS_ERR(ib_mr)) + EDEB_EX(4, "rc=%lx pd=%p mr_access_flags=%x ", + PTR_ERR(ib_mr), pd, mr_access_flags); + else + EDEB_EX(7, "ib_mr=%p lkey=%x rkey=%x", + ib_mr, ib_mr->lkey, ib_mr->rkey); + return ib_mr; +} /* end ehca_get_dma_mr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +struct ib_mr *ehca_reg_phys_mr(struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start) +{ + struct ib_mr *ib_mr = NULL; + int ret = 0; + struct ehca_mr *e_mr = NULL; + struct ehca_shca *shca = NULL; + struct ehca_pd *e_pd = NULL; + u64 size = 0; + struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; + u32 num_pages_mr = 0; + 
u32 num_pages_4k = 0; /* 4k portion "pages" */ + + EDEB_EN(7, "pd=%p phys_buf_array=%p num_phys_buf=%x " + "mr_access_flags=%x iova_start=%p", pd, phys_buf_array, + num_phys_buf, mr_access_flags, iova_start); + + EHCA_CHECK_PD_P(pd); + if ((num_phys_buf <= 0) || ehca_adr_bad(phys_buf_array)) { + EDEB_ERR(4, "bad input values: num_phys_buf=%x " + "phys_buf_array=%p", num_phys_buf, phys_buf_array); + ib_mr = ERR_PTR(-EINVAL); + goto reg_phys_mr_exit0; + } + if (((mr_access_flags & IB_ACCESS_REMOTE_WRITE) && + !(mr_access_flags & IB_ACCESS_LOCAL_WRITE)) || + ((mr_access_flags & IB_ACCESS_REMOTE_ATOMIC) && + !(mr_access_flags & IB_ACCESS_LOCAL_WRITE))) { + /* Remote Write Access requires Local Write Access */ + /* Remote Atomic Access requires Local Write Access */ + EDEB_ERR(4, "bad input values: mr_access_flags=%x", + mr_access_flags); + ib_mr = ERR_PTR(-EINVAL); + goto reg_phys_mr_exit0; + } + + /* check physical buffer list and calculate size */ + ret = ehca_mr_chk_buf_and_calc_size(phys_buf_array, num_phys_buf, + iova_start, &size); + if (ret) { + ib_mr = ERR_PTR(ret); + goto reg_phys_mr_exit0; + } + if ((size == 0) || + (((u64)iova_start + size) < (u64)iova_start)) { + EDEB_ERR(4, "bad input values: size=%lx iova_start=%p", + size, iova_start); + ib_mr = ERR_PTR(-EINVAL); + goto reg_phys_mr_exit0; + } + + e_pd = container_of(pd, struct ehca_pd, ib_pd); + shca = container_of(pd->device, struct ehca_shca, ib_device); + + e_mr = ehca_mr_new(); + if (!e_mr) { + EDEB_ERR(4, "out of memory"); + ib_mr = ERR_PTR(-ENOMEM); + goto reg_phys_mr_exit0; + } + + /* determine number of MR pages */ + num_pages_mr = ((((u64)iova_start % PAGE_SIZE) + size + + PAGE_SIZE - 1) / PAGE_SIZE); + num_pages_4k = ((((u64)iova_start % EHCA_PAGESIZE) + size + + EHCA_PAGESIZE - 1) / EHCA_PAGESIZE); + + /* register MR on HCA */ + if (ehca_mr_is_maxmr(size, iova_start)) { + e_mr->flags |= EHCA_MR_FLAG_MAXMR; + ret = ehca_reg_maxmr(shca, e_mr, iova_start, mr_access_flags, + e_pd, 
&e_mr->ib.ib_mr.lkey, + &e_mr->ib.ib_mr.rkey); + if (ret) { + ib_mr = ERR_PTR(ret); + goto reg_phys_mr_exit1; + } + } else { + pginfo.type = EHCA_MR_PGI_PHYS; + pginfo.num_pages = num_pages_mr; + pginfo.num_4k = num_pages_4k; + pginfo.num_phys_buf = num_phys_buf; + pginfo.phys_buf_array = phys_buf_array; + pginfo.next_4k = (((u64)iova_start & ~PAGE_MASK) / + EHCA_PAGESIZE); + + ret = ehca_reg_mr(shca, e_mr, iova_start, size, mr_access_flags, + e_pd, &pginfo, &e_mr->ib.ib_mr.lkey, + &e_mr->ib.ib_mr.rkey); + if (ret) { + ib_mr = ERR_PTR(ret); + goto reg_phys_mr_exit1; + } + } + + /* successful registration of all pages */ + ib_mr = &e_mr->ib.ib_mr; + goto reg_phys_mr_exit0; + +reg_phys_mr_exit1: + ehca_mr_delete(e_mr); +reg_phys_mr_exit0: + if (IS_ERR(ib_mr)) + EDEB_EX(4, "rc=%lx pd=%p phys_buf_array=%p " + "num_phys_buf=%x mr_access_flags=%x iova_start=%p", + PTR_ERR(ib_mr), pd, phys_buf_array, + num_phys_buf, mr_access_flags, iova_start); + else + EDEB_EX(7, "ib_mr=%p lkey=%x rkey=%x", + ib_mr, ib_mr->lkey, ib_mr->rkey); + return ib_mr; +} /* end ehca_reg_phys_mr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +struct ib_mr *ehca_reg_user_mr(struct ib_pd *pd, + struct ib_umem *region, + int mr_access_flags, + struct ib_udata *udata) +{ + struct ib_mr *ib_mr = NULL; + struct ehca_mr *e_mr = NULL; + struct ehca_shca *shca = NULL; + struct ehca_pd *e_pd = NULL; + struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; + int ret = 0; + u32 num_pages_mr = 0; + u32 num_pages_4k = 0; /* 4k portion "pages" */ + + EDEB_EN(7, "pd=%p region=%p mr_access_flags=%x udata=%p", + pd, region, mr_access_flags, udata); + + EHCA_CHECK_PD_P(pd); + if (ehca_adr_bad(region)) { + EDEB_ERR(4, "bad input values: region=%p", region); + ib_mr = ERR_PTR(-EINVAL); + goto reg_user_mr_exit0; + } + if (((mr_access_flags & IB_ACCESS_REMOTE_WRITE) && + !(mr_access_flags & 
IB_ACCESS_LOCAL_WRITE)) || + ((mr_access_flags & IB_ACCESS_REMOTE_ATOMIC) && + !(mr_access_flags & IB_ACCESS_LOCAL_WRITE))) { + /* Remote Write Access requires Local Write Access */ + /* Remote Atomic Access requires Local Write Access */ + EDEB_ERR(4, "bad input values: mr_access_flags=%x", + mr_access_flags); + ib_mr = ERR_PTR(-EINVAL); + goto reg_user_mr_exit0; + } + EDEB(7, "user_base=%lx virt_base=%lx length=%lx offset=%x page_size=%x " + "chunk_list.next=%p", + region->user_base, region->virt_base, region->length, + region->offset, region->page_size, region->chunk_list.next); + if (region->page_size != PAGE_SIZE) { + EDEB_ERR(4, "page size not supported, region->page_size=%x", + region->page_size); + ib_mr = ERR_PTR(-EINVAL); + goto reg_user_mr_exit0; + } + + if ((region->length == 0) || + ((region->virt_base + region->length) < region->virt_base)) { + EDEB_ERR(4, "bad input values: length=%lx virt_base=%lx", + region->length, region->virt_base); + ib_mr = ERR_PTR(-EINVAL); + goto reg_user_mr_exit0; + } + + e_pd = container_of(pd, struct ehca_pd, ib_pd); + shca = container_of(pd->device, struct ehca_shca, ib_device); + + e_mr = ehca_mr_new(); + if (!e_mr) { + EDEB_ERR(4, "out of memory"); + ib_mr = ERR_PTR(-ENOMEM); + goto reg_user_mr_exit0; + } + + /* determine number of MR pages */ + num_pages_mr = (((region->virt_base % PAGE_SIZE) + region->length + + PAGE_SIZE - 1) / PAGE_SIZE); + num_pages_4k = (((region->virt_base % EHCA_PAGESIZE) + region->length + + EHCA_PAGESIZE - 1) / EHCA_PAGESIZE); + + /* register MR on HCA */ + pginfo.type = EHCA_MR_PGI_USER; + pginfo.num_pages = num_pages_mr; + pginfo.num_4k = num_pages_4k; + pginfo.region = region; + pginfo.next_4k = region->offset / EHCA_PAGESIZE; + pginfo.next_chunk = list_prepare_entry(pginfo.next_chunk, + (&region->chunk_list), + list); + + ret = ehca_reg_mr(shca, e_mr, (u64*)region->virt_base, + region->length, mr_access_flags, e_pd, &pginfo, + &e_mr->ib.ib_mr.lkey, &e_mr->ib.ib_mr.rkey); + if (ret) { + ib_mr
= ERR_PTR(ret); + goto reg_user_mr_exit1; + } + + /* successful registration of all pages */ + ib_mr = &e_mr->ib.ib_mr; + goto reg_user_mr_exit0; + +reg_user_mr_exit1: + ehca_mr_delete(e_mr); +reg_user_mr_exit0: + if (IS_ERR(ib_mr)) + EDEB_EX(4, "rc=%lx pd=%p region=%p mr_access_flags=%x " + "udata=%p", + PTR_ERR(ib_mr), pd, region, mr_access_flags, udata); + else + EDEB_EX(7, "ib_mr=%p lkey=%x rkey=%x", + ib_mr, ib_mr->lkey, ib_mr->rkey); + return ib_mr; +} /* end ehca_reg_user_mr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_rereg_phys_mr(struct ib_mr *mr, + int mr_rereg_mask, + struct ib_pd *pd, + struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + int mr_access_flags, + u64 *iova_start) +{ + int ret = 0; + struct ehca_shca *shca = NULL; + struct ehca_mr *e_mr = NULL; + u64 new_size = 0; + u64 *new_start = NULL; + u32 new_acl = 0; + struct ehca_pd *new_pd = NULL; + u32 tmp_lkey = 0; + u32 tmp_rkey = 0; + unsigned long sl_flags; + u32 num_pages_mr = 0; + u32 num_pages_4k = 0; /* 4k portion "pages" */ + struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; + struct ehca_pd *my_pd = NULL; + u32 cur_pid = current->tgid; + + EDEB_EN(7, "mr=%p mr_rereg_mask=%x pd=%p phys_buf_array=%p " + "num_phys_buf=%x mr_access_flags=%x iova_start=%p", + mr, mr_rereg_mask, pd, phys_buf_array, num_phys_buf, + mr_access_flags, iova_start); + + EHCA_CHECK_MR(mr); + my_pd = container_of(mr->pd, struct ehca_pd, ib_pd); + if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + (my_pd->ownpid != cur_pid)) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + ret = -EINVAL; + goto rereg_phys_mr_exit0; + } + + if (!(mr_rereg_mask & IB_MR_REREG_TRANS)) { + /* TODO not supported, because PHYP rereg hCall needs pages*/ + /* TODO: We will follow this with Tom ....*/ + EDEB_ERR(4, "rereg without IB_MR_REREG_TRANS not 
supported yet," + " mr_rereg_mask=%x", mr_rereg_mask); + ret = -EINVAL; + goto rereg_phys_mr_exit0; + } + + e_mr = container_of(mr, struct ehca_mr, ib.ib_mr); + if (mr_rereg_mask & IB_MR_REREG_PD) { + EHCA_CHECK_PD(pd); + } + + if ((mr_rereg_mask & + ~(IB_MR_REREG_TRANS | IB_MR_REREG_PD | IB_MR_REREG_ACCESS)) || + (mr_rereg_mask == 0)) { + ret = -EINVAL; + goto rereg_phys_mr_exit0; + } + + shca = container_of(mr->device, struct ehca_shca, ib_device); + + /* check other parameters */ + if (e_mr == shca->maxmr) { + /* should be impossible, however reject to be sure */ + EDEB_ERR(3, "rereg internal max-MR impossible, mr=%p " + "shca->maxmr=%p mr->lkey=%x", + mr, shca->maxmr, mr->lkey); + ret = -EINVAL; + goto rereg_phys_mr_exit0; + } + if (mr_rereg_mask & IB_MR_REREG_TRANS) { /* transl., i.e. addr/size */ + if (e_mr->flags & EHCA_MR_FLAG_FMR) { + EDEB_ERR(4, "not supported for FMR, mr=%p flags=%x", + mr, e_mr->flags); + ret = -EINVAL; + goto rereg_phys_mr_exit0; + } + if (ehca_adr_bad(phys_buf_array) || num_phys_buf <= 0) { + EDEB_ERR(4, "bad input values: mr_rereg_mask=%x " + "phys_buf_array=%p num_phys_buf=%x", + mr_rereg_mask, phys_buf_array, num_phys_buf); + ret = -EINVAL; + goto rereg_phys_mr_exit0; + } + } + if ((mr_rereg_mask & IB_MR_REREG_ACCESS) && /* change ACL */ + (((mr_access_flags & IB_ACCESS_REMOTE_WRITE) && + !(mr_access_flags & IB_ACCESS_LOCAL_WRITE)) || + ((mr_access_flags & IB_ACCESS_REMOTE_ATOMIC) && + !(mr_access_flags & IB_ACCESS_LOCAL_WRITE)))) { + /* Remote Write Access requires Local Write Access */ + /* Remote Atomic Access requires Local Write Access */ + EDEB_ERR(4, "bad input values: mr_rereg_mask=%x " + "mr_access_flags=%x", mr_rereg_mask, mr_access_flags); + ret = -EINVAL; + goto rereg_phys_mr_exit0; + } + + /* set requested values dependent on rereg request */ + spin_lock_irqsave(&e_mr->mrlock, sl_flags); + new_start = e_mr->start; /* new == old address */ + new_size = e_mr->size; /* new == old length */ + new_acl = e_mr->acl; /* new == 
old access control */ + new_pd = container_of(mr->pd,struct ehca_pd,ib_pd); /*new == old PD*/ + + if (mr_rereg_mask & IB_MR_REREG_TRANS) { + new_start = iova_start; /* change address */ + /* check physical buffer list and calculate size */ + ret = ehca_mr_chk_buf_and_calc_size(phys_buf_array, + num_phys_buf, iova_start, + &new_size); + if (ret) + goto rereg_phys_mr_exit1; + if ((new_size == 0) || + (((u64)iova_start + new_size) < (u64)iova_start)) { + EDEB_ERR(4, "bad input values: new_size=%lx " + "iova_start=%p", new_size, iova_start); + ret = -EINVAL; + goto rereg_phys_mr_exit1; + } + num_pages_mr = ((((u64)new_start % PAGE_SIZE) + new_size + + PAGE_SIZE - 1) / PAGE_SIZE); + num_pages_4k = ((((u64)new_start % EHCA_PAGESIZE) + new_size + + EHCA_PAGESIZE - 1) / EHCA_PAGESIZE); + pginfo.type = EHCA_MR_PGI_PHYS; + pginfo.num_pages = num_pages_mr; + pginfo.num_4k = num_pages_4k; + pginfo.num_phys_buf = num_phys_buf; + pginfo.phys_buf_array = phys_buf_array; + pginfo.next_4k = (((u64)iova_start & ~PAGE_MASK) / + EHCA_PAGESIZE); + } + if (mr_rereg_mask & IB_MR_REREG_ACCESS) + new_acl = mr_access_flags; + if (mr_rereg_mask & IB_MR_REREG_PD) + new_pd = container_of(pd, struct ehca_pd, ib_pd); + + EDEB(7, "mr=%p new_start=%p new_size=%lx new_acl=%x new_pd=%p " + "num_pages_mr=%x num_pages_4k=%x", e_mr, new_start, new_size, + new_acl, new_pd, num_pages_mr, num_pages_4k); + + ret = ehca_rereg_mr(shca, e_mr, new_start, new_size, new_acl, + new_pd, &pginfo, &tmp_lkey, &tmp_rkey); + if (ret) + goto rereg_phys_mr_exit1; + + /* successful reregistration */ + if (mr_rereg_mask & IB_MR_REREG_PD) + mr->pd = pd; + mr->lkey = tmp_lkey; + mr->rkey = tmp_rkey; + +rereg_phys_mr_exit1: + spin_unlock_irqrestore(&e_mr->mrlock, sl_flags); +rereg_phys_mr_exit0: + if (ret) + EDEB_EX(4, "ret=%x mr=%p mr_rereg_mask=%x pd=%p " + "phys_buf_array=%p num_phys_buf=%x mr_access_flags=%x " + "iova_start=%p", + ret, mr, mr_rereg_mask, pd, phys_buf_array, + num_phys_buf, mr_access_flags, iova_start); + 
else + EDEB_EX(7, "mr=%p mr_rereg_mask=%x pd=%p phys_buf_array=%p " + "num_phys_buf=%x mr_access_flags=%x iova_start=%p", + mr, mr_rereg_mask, pd, phys_buf_array, num_phys_buf, + mr_access_flags, iova_start); + + return ret; +} /* end ehca_rereg_phys_mr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr) +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + struct ehca_shca *shca = NULL; + struct ehca_mr *e_mr = NULL; + struct ehca_pd *my_pd = NULL; + u32 cur_pid = current->tgid; + unsigned long sl_flags; + struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0}; + + EDEB_EN(7, "mr=%p mr_attr=%p", mr, mr_attr); + + EHCA_CHECK_MR(mr); + + my_pd = container_of(mr->pd, struct ehca_pd, ib_pd); + if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + (my_pd->ownpid != cur_pid)) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + ret = -EINVAL; + goto query_mr_exit0; + } + + e_mr = container_of(mr, struct ehca_mr, ib.ib_mr); + if (ehca_adr_bad(mr_attr)) { + EDEB_ERR(4, "bad input values: mr_attr=%p", mr_attr); + ret = -EINVAL; + goto query_mr_exit0; + } + if ((e_mr->flags & EHCA_MR_FLAG_FMR)) { + EDEB_ERR(4, "not supported for FMR, mr=%p e_mr=%p " + "e_mr->flags=%x", mr, e_mr, e_mr->flags); + ret = -EINVAL; + goto query_mr_exit0; + } + + shca = container_of(mr->device, struct ehca_shca, ib_device); + memset(mr_attr, 0, sizeof(struct ib_mr_attr)); + spin_lock_irqsave(&e_mr->mrlock, sl_flags); + + h_ret = hipz_h_query_mr(shca->ipz_hca_handle, e_mr, &hipzout); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_mr_query failed, h_ret=%lx mr=%p " + "hca_hndl=%lx mr_hndl=%lx lkey=%x", + h_ret, mr, shca->ipz_hca_handle.handle, + e_mr->ipz_mr_handle.handle, mr->lkey); + ret = ehca_mrmw_map_hrc_query_mr(h_ret); + goto query_mr_exit1; + } + mr_attr->pd = mr->pd; + 
mr_attr->device_virt_addr = hipzout.vaddr; + mr_attr->size = hipzout.len; + mr_attr->lkey = hipzout.lkey; + mr_attr->rkey = hipzout.rkey; + ehca_mrmw_reverse_map_acl(&hipzout.acl, &mr_attr->mr_access_flags); + +query_mr_exit1: + spin_unlock_irqrestore(&e_mr->mrlock, sl_flags); +query_mr_exit0: + if (ret) + EDEB_EX(4, "ret=%x mr=%p mr_attr=%p", ret, mr, mr_attr); + else + EDEB_EX(7, "pd=%p device_virt_addr=%lx size=%lx " + "mr_access_flags=%x lkey=%x rkey=%x", + mr_attr->pd, mr_attr->device_virt_addr, + mr_attr->size, mr_attr->mr_access_flags, + mr_attr->lkey, mr_attr->rkey); + return ret; +} /* end ehca_query_mr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_dereg_mr(struct ib_mr *mr) +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + struct ehca_shca *shca = NULL; + struct ehca_mr *e_mr = NULL; + struct ehca_pd *my_pd = NULL; + u32 cur_pid = current->tgid; + + EDEB_EN(7, "mr=%p", mr); + + EHCA_CHECK_MR(mr); + my_pd = container_of(mr->pd, struct ehca_pd, ib_pd); + if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + (my_pd->ownpid != cur_pid)) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + ret = -EINVAL; + goto dereg_mr_exit0; + } + + e_mr = container_of(mr, struct ehca_mr, ib.ib_mr); + shca = container_of(mr->device, struct ehca_shca, ib_device); + + if ((e_mr->flags & EHCA_MR_FLAG_FMR)) { + EDEB_ERR(4, "not supported for FMR, mr=%p e_mr=%p " + "e_mr->flags=%x", mr, e_mr, e_mr->flags); + ret = -EINVAL; + goto dereg_mr_exit0; + } else if (e_mr == shca->maxmr) { + /* should be impossible, however reject to be sure */ + EDEB_ERR(3, "dereg internal max-MR impossible, mr=%p " + "shca->maxmr=%p mr->lkey=%x", + mr, shca->maxmr, mr->lkey); + ret = -EINVAL; + goto dereg_mr_exit0; + } + + /* TODO: BUSY: MR still has bound window(s) */ + h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_mr); + if (h_ret != 
H_SUCCESS) { + EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx shca=%p e_mr=%p" + " hca_hndl=%lx mr_hndl=%lx mr->lkey=%x", + h_ret, shca, e_mr, shca->ipz_hca_handle.handle, + e_mr->ipz_mr_handle.handle, mr->lkey); + ret = ehca_mrmw_map_hrc_free_mr(h_ret); + goto dereg_mr_exit0; + } + + /* successful deregistration */ + ehca_mr_delete(e_mr); + +dereg_mr_exit0: + if (ret) + EDEB_EX(4, "ret=%x mr=%p", ret, mr); + else + EDEB_EX(7, ""); + return ret; +} /* end ehca_dereg_mr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +struct ib_mw *ehca_alloc_mw(struct ib_pd *pd) +{ + struct ib_mw *ib_mw = NULL; + u64 h_ret = H_SUCCESS; + struct ehca_shca *shca = NULL; + struct ehca_mw *e_mw = NULL; + struct ehca_pd *e_pd = NULL; + struct ehca_mw_hipzout_parms hipzout = {{0},0}; + + EDEB_EN(7, "pd=%p", pd); + + EHCA_CHECK_PD_P(pd); + e_pd = container_of(pd, struct ehca_pd, ib_pd); + shca = container_of(pd->device, struct ehca_shca, ib_device); + + e_mw = ehca_mw_new(); + if (!e_mw) { + ib_mw = ERR_PTR(-ENOMEM); + goto alloc_mw_exit0; + } + + h_ret = hipz_h_alloc_resource_mw(shca->ipz_hca_handle, e_mw, + e_pd->fw_pd, &hipzout); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_mw_allocate failed, h_ret=%lx shca=%p " + "hca_hndl=%lx mw=%p", h_ret, shca, + shca->ipz_hca_handle.handle, e_mw); + ib_mw = ERR_PTR(ehca_mrmw_map_hrc_alloc(h_ret)); + goto alloc_mw_exit1; + } + /* successful MW allocation */ + e_mw->ipz_mw_handle = hipzout.handle; + e_mw->ib_mw.rkey = hipzout.rkey; + ib_mw = &e_mw->ib_mw; + goto alloc_mw_exit0; + +alloc_mw_exit1: + ehca_mw_delete(e_mw); +alloc_mw_exit0: + if (IS_ERR(ib_mw)) + EDEB_EX(4, "rc=%lx pd=%p", PTR_ERR(ib_mw), pd); + else + EDEB_EX(7, "ib_mw=%p rkey=%x", ib_mw, ib_mw->rkey); + return ib_mw; +} /* end ehca_alloc_mw() */ + +/*----------------------------------------------------------------------*/ 
+/*----------------------------------------------------------------------*/ + +int ehca_bind_mw(struct ib_qp *qp, + struct ib_mw *mw, + struct ib_mw_bind *mw_bind) +{ + int ret = 0; + + /* TODO: not supported up to now */ + EDEB_ERR(4, "bind MW currently not supported by HCAD"); + ret = -EPERM; + goto bind_mw_exit0; + +bind_mw_exit0: + if (ret) + EDEB_EX(4, "ret=%x qp=%p mw=%p mw_bind=%p", + ret, qp, mw, mw_bind); + else + EDEB_EX(7, "qp=%p mw=%p mw_bind=%p", qp, mw, mw_bind); + return ret; +} /* end ehca_bind_mw() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_dealloc_mw(struct ib_mw *mw) +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + struct ehca_shca *shca = NULL; + struct ehca_mw *e_mw = NULL; + + EDEB_EN(7, "mw=%p", mw); + + EHCA_CHECK_MW(mw); + e_mw = container_of(mw, struct ehca_mw, ib_mw); + shca = container_of(mw->device, struct ehca_shca, ib_device); + + h_ret = hipz_h_free_resource_mw(shca->ipz_hca_handle, e_mw); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_free_mw failed, h_ret=%lx shca=%p mw=%p " + "rkey=%x hca_hndl=%lx mw_hndl=%lx", + h_ret, shca, mw, mw->rkey, shca->ipz_hca_handle.handle, + e_mw->ipz_mw_handle.handle); + ret = ehca_mrmw_map_hrc_free_mw(h_ret); + goto dealloc_mw_exit0; + } + /* successful deallocation */ + ehca_mw_delete(e_mw); + +dealloc_mw_exit0: + if (ret) + EDEB_EX(4, "ret=%x mw=%p", ret, mw); + else + EDEB_EX(7, ""); + return ret; +} /* end ehca_dealloc_mw() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +struct ib_fmr *ehca_alloc_fmr(struct ib_pd *pd, + int mr_access_flags, + struct ib_fmr_attr *fmr_attr) +{ + struct ib_fmr *ib_fmr = NULL; + struct ehca_shca *shca = NULL; + struct ehca_mr *e_fmr = NULL; + int ret = 0; + struct ehca_pd *e_pd = NULL; + u32 tmp_lkey = 0; + u32 tmp_rkey = 0; + struct 
ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; + + EDEB_EN(7, "pd=%p mr_access_flags=%x fmr_attr=%p", + pd, mr_access_flags, fmr_attr); + + EHCA_CHECK_PD_P(pd); + if (ehca_adr_bad(fmr_attr)) { + EDEB_ERR(4, "bad input values: fmr_attr=%p", fmr_attr); + ib_fmr = ERR_PTR(-EINVAL); + goto alloc_fmr_exit0; + } + + EDEB(7, "max_pages=%x max_maps=%x page_shift=%x", + fmr_attr->max_pages, fmr_attr->max_maps, fmr_attr->page_shift); + + /* check other parameters */ + if (((mr_access_flags & IB_ACCESS_REMOTE_WRITE) && + !(mr_access_flags & IB_ACCESS_LOCAL_WRITE)) || + ((mr_access_flags & IB_ACCESS_REMOTE_ATOMIC) && + !(mr_access_flags & IB_ACCESS_LOCAL_WRITE))) { + /* Remote Write Access requires Local Write Access */ + /* Remote Atomic Access requires Local Write Access */ + EDEB_ERR(4, "bad input values: mr_access_flags=%x", + mr_access_flags); + ib_fmr = ERR_PTR(-EINVAL); + goto alloc_fmr_exit0; + } + if (mr_access_flags & IB_ACCESS_MW_BIND) { + EDEB_ERR(4, "bad input values: mr_access_flags=%x", + mr_access_flags); + ib_fmr = ERR_PTR(-EINVAL); + goto alloc_fmr_exit0; + } + if ((fmr_attr->max_pages == 0) || (fmr_attr->max_maps == 0)) { + EDEB_ERR(4, "bad input values: fmr_attr->max_pages=%x " + "fmr_attr->max_maps=%x fmr_attr->page_shift=%x", + fmr_attr->max_pages, fmr_attr->max_maps, + fmr_attr->page_shift); + ib_fmr = ERR_PTR(-EINVAL); + goto alloc_fmr_exit0; + } + if (((1 << fmr_attr->page_shift) != EHCA_PAGESIZE) && + ((1 << fmr_attr->page_shift) != PAGE_SIZE)) { + EDEB_ERR(4, "unsupported fmr_attr->page_shift=%x", + fmr_attr->page_shift); + ib_fmr = ERR_PTR(-EINVAL); + goto alloc_fmr_exit0; + } + + e_pd = container_of(pd, struct ehca_pd, ib_pd); + shca = container_of(pd->device, struct ehca_shca, ib_device); + + e_fmr = ehca_mr_new(); + if (!e_fmr) { + ib_fmr = ERR_PTR(-ENOMEM); + goto alloc_fmr_exit0; + } + e_fmr->flags |= EHCA_MR_FLAG_FMR; + + /* register MR on HCA */ + ret = ehca_reg_mr(shca, e_fmr, NULL, + fmr_attr->max_pages * (1 << 
fmr_attr->page_shift), + mr_access_flags, e_pd, &pginfo, + &tmp_lkey, &tmp_rkey); + if (ret) { + ib_fmr = ERR_PTR(ret); + goto alloc_fmr_exit1; + } + + /* successful */ + e_fmr->fmr_page_size = 1 << fmr_attr->page_shift; + e_fmr->fmr_max_pages = fmr_attr->max_pages; + e_fmr->fmr_max_maps = fmr_attr->max_maps; + e_fmr->fmr_map_cnt = 0; + ib_fmr = &e_fmr->ib.ib_fmr; + goto alloc_fmr_exit0; + +alloc_fmr_exit1: + ehca_mr_delete(e_fmr); +alloc_fmr_exit0: + if (IS_ERR(ib_fmr)) + EDEB_EX(4, "rc=%lx pd=%p mr_access_flags=%x " + "fmr_attr=%p", PTR_ERR(ib_fmr), pd, + mr_access_flags, fmr_attr); + else + EDEB_EX(7, "ib_fmr=%p tmp_lkey=%x tmp_rkey=%x", + ib_fmr, tmp_lkey, tmp_rkey); + return ib_fmr; +} /* end ehca_alloc_fmr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_map_phys_fmr(struct ib_fmr *fmr, + u64 *page_list, + int list_len, + u64 iova) +{ + int ret = 0; + struct ehca_shca *shca = NULL; + struct ehca_mr *e_fmr = NULL; + struct ehca_pd *e_pd = NULL; + struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; + u32 tmp_lkey = 0; + u32 tmp_rkey = 0; + + EDEB_EN(7, "fmr=%p page_list=%p list_len=%x iova=%lx", + fmr, page_list, list_len, iova); + + EHCA_CHECK_FMR(fmr); + e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr); + shca = container_of(fmr->device, struct ehca_shca, ib_device); + e_pd = container_of(fmr->pd, struct ehca_pd, ib_pd); + + if (!(e_fmr->flags & EHCA_MR_FLAG_FMR)) { + EDEB_ERR(4, "not a FMR, e_fmr=%p e_fmr->flags=%x", + e_fmr, e_fmr->flags); + ret = -EINVAL; + goto map_phys_fmr_exit0; + } + ret = ehca_fmr_check_page_list(e_fmr, page_list, list_len); + if (ret) + goto map_phys_fmr_exit0; + if (iova % e_fmr->fmr_page_size) { + /* only whole-numbered pages */ + EDEB_ERR(4, "bad iova, iova=%lx fmr_page_size=%x", + iova, e_fmr->fmr_page_size); + ret = -EINVAL; + goto map_phys_fmr_exit0; + } + if (e_fmr->fmr_map_cnt >= 
e_fmr->fmr_max_maps) { + /* HCAD does not limit the maps, however trace this anyway */ + EDEB(6, "map limit exceeded, fmr=%p e_fmr->fmr_map_cnt=%x " + "e_fmr->fmr_max_maps=%x", + fmr, e_fmr->fmr_map_cnt, e_fmr->fmr_max_maps); + } + + pginfo.type = EHCA_MR_PGI_FMR; + pginfo.num_pages = list_len; + pginfo.page_list = page_list; + pginfo.next_4k = ((iova & (e_fmr->fmr_page_size-1)) / + EHCA_PAGESIZE); + + ret = ehca_rereg_mr(shca, e_fmr, (u64*)iova, + list_len * e_fmr->fmr_page_size, + e_fmr->acl, e_pd, &pginfo, &tmp_lkey, &tmp_rkey); + if (ret) + goto map_phys_fmr_exit0; + + /* successful reregistration */ + e_fmr->fmr_map_cnt++; + e_fmr->ib.ib_fmr.lkey = tmp_lkey; + e_fmr->ib.ib_fmr.rkey = tmp_rkey; + +map_phys_fmr_exit0: + if (ret) + EDEB_EX(4, "ret=%x fmr=%p page_list=%p list_len=%x iova=%lx", + ret, fmr, page_list, list_len, iova); + else + EDEB_EX(7, "lkey=%x rkey=%x", + e_fmr->ib.ib_fmr.lkey, e_fmr->ib.ib_fmr.rkey); + return ret; +} /* end ehca_map_phys_fmr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_unmap_fmr(struct list_head *fmr_list) +{ + int ret = 0; + struct ib_fmr *ib_fmr = NULL; + struct ehca_shca *shca = NULL; + struct ehca_shca *prev_shca = NULL; + struct ehca_mr *e_fmr = NULL; + u32 num_fmr = 0; + u32 unmap_fmr_cnt = 0; + + EDEB_EN(7, "fmr_list=%p", fmr_list); + + /* check all FMR belong to same SHCA, and check internal flag */ + list_for_each_entry(ib_fmr, fmr_list, list) { + prev_shca = shca; + shca = container_of(ib_fmr->device, struct ehca_shca, + ib_device); + EHCA_CHECK_FMR(ib_fmr); + e_fmr = container_of(ib_fmr, struct ehca_mr, ib.ib_fmr); + if ((shca != prev_shca) && prev_shca) { + EDEB_ERR(4, "SHCA mismatch, shca=%p prev_shca=%p " + "e_fmr=%p", shca, prev_shca, e_fmr); + ret = -EINVAL; + goto unmap_fmr_exit0; + } + if (!(e_fmr->flags & EHCA_MR_FLAG_FMR)) { + EDEB_ERR(4, "not a FMR, e_fmr=%p e_fmr->flags=%x", + e_fmr, 
e_fmr->flags); + ret = -EINVAL; + goto unmap_fmr_exit0; + } + num_fmr++; + } + + /* loop over all FMRs to unmap */ + list_for_each_entry(ib_fmr, fmr_list, list) { + unmap_fmr_cnt++; + e_fmr = container_of(ib_fmr, struct ehca_mr, ib.ib_fmr); + shca = container_of(ib_fmr->device, struct ehca_shca, + ib_device); + ret = ehca_unmap_one_fmr(shca, e_fmr); + if (ret) { + /* unmap failed, stop unmapping of rest of FMRs */ + EDEB_ERR(4, "unmap of one FMR failed, stop rest, " + "e_fmr=%p num_fmr=%x unmap_fmr_cnt=%x lkey=%x", + e_fmr, num_fmr, unmap_fmr_cnt, + e_fmr->ib.ib_fmr.lkey); + goto unmap_fmr_exit0; + } + } + +unmap_fmr_exit0: + if (ret) + EDEB_EX(4, "ret=%x fmr_list=%p num_fmr=%x unmap_fmr_cnt=%x", + ret, fmr_list, num_fmr, unmap_fmr_cnt); + else + EDEB_EX(7, "num_fmr=%x", num_fmr); + return ret; +} /* end ehca_unmap_fmr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_dealloc_fmr(struct ib_fmr *fmr) +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + struct ehca_shca *shca = NULL; + struct ehca_mr *e_fmr = NULL; + + EDEB_EN(7, "fmr=%p", fmr); + + EHCA_CHECK_FMR(fmr); + e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr); + shca = container_of(fmr->device, struct ehca_shca, ib_device); + + if (!(e_fmr->flags & EHCA_MR_FLAG_FMR)) { + EDEB_ERR(4, "not a FMR, e_fmr=%p e_fmr->flags=%x", + e_fmr, e_fmr->flags); + ret = -EINVAL; + goto free_fmr_exit0; + } + + h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_fmr); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx e_fmr=%p " + "hca_hndl=%lx fmr_hndl=%lx fmr->lkey=%x", + h_ret, e_fmr, shca->ipz_hca_handle.handle, + e_fmr->ipz_mr_handle.handle, fmr->lkey); + ret = ehca_mrmw_map_hrc_free_mr(h_ret); + goto free_fmr_exit0; + } + /* successful deregistration */ + ehca_mr_delete(e_fmr); + +free_fmr_exit0: + if (ret) + EDEB_EX(4, "ret=%x fmr=%p", ret, fmr); + else + EDEB_EX(7, ""); + return
ret; +} /* end ehca_dealloc_fmr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_reg_mr(struct ehca_shca *shca, + struct ehca_mr *e_mr, + u64 *iova_start, + u64 size, + int acl, + struct ehca_pd *e_pd, + struct ehca_mr_pginfo *pginfo, + u32 *lkey, /*OUT*/ + u32 *rkey) /*OUT*/ +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + u32 hipz_acl = 0; + struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0}; + + EDEB_EN(7, "shca=%p e_mr=%p iova_start=%p size=%lx acl=%x e_pd=%p " + "pginfo=%p num_pages=%lx num_4k=%lx", shca, e_mr, iova_start, + size, acl, e_pd, pginfo, pginfo->num_pages, pginfo->num_4k); + + ehca_mrmw_map_acl(acl, &hipz_acl); + ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl); + if (ehca_use_hp_mr == 1) + hipz_acl |= 0x00000001; + + h_ret = hipz_h_alloc_resource_mr(shca->ipz_hca_handle, e_mr, + (u64)iova_start, size, hipz_acl, + e_pd->fw_pd, &hipzout); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_alloc_mr failed, h_ret=%lx hca_hndl=%lx", + h_ret, shca->ipz_hca_handle.handle); + ret = ehca_mrmw_map_hrc_alloc(h_ret); + goto ehca_reg_mr_exit0; + } + + e_mr->ipz_mr_handle = hipzout.handle; + + ret = ehca_reg_mr_rpages(shca, e_mr, pginfo); + if (ret) + goto ehca_reg_mr_exit1; + + /* successful registration */ + e_mr->num_pages = pginfo->num_pages; + e_mr->num_4k = pginfo->num_4k; + e_mr->start = iova_start; + e_mr->size = size; + e_mr->acl = acl; + *lkey = hipzout.lkey; + *rkey = hipzout.rkey; + goto ehca_reg_mr_exit0; + +ehca_reg_mr_exit1: + h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_mr); + if (h_ret != H_SUCCESS) { + EDEB_ERR(1, "h_ret=%lx shca=%p e_mr=%p iova_start=%p " + "size=%lx acl=%x e_pd=%p lkey=%x pginfo=%p " + "num_pages=%lx num_4k=%lx ret=%x", h_ret, shca, e_mr, + iova_start, size, acl, e_pd, hipzout.lkey, pginfo, + pginfo->num_pages, pginfo->num_4k, ret); + EDEB_ERR(1, "internal error in ehca_reg_mr, not recoverable"); + } 
+ehca_reg_mr_exit0: + if (ret) + EDEB_EX(4, "ret=%x shca=%p e_mr=%p iova_start=%p size=%lx " + "acl=%x e_pd=%p pginfo=%p num_pages=%lx num_4k=%lx", + ret, shca, e_mr, iova_start, size, acl, e_pd, pginfo, + pginfo->num_pages, pginfo->num_4k); + else + EDEB_EX(7, "ret=%x lkey=%x rkey=%x", ret, *lkey, *rkey); + return ret; +} /* end ehca_reg_mr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_reg_mr_rpages(struct ehca_shca *shca, + struct ehca_mr *e_mr, + struct ehca_mr_pginfo *pginfo) +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + u32 rnum = 0; + u64 rpage = 0; + u32 i; + u64 *kpage = NULL; + + EDEB_EN(7, "shca=%p e_mr=%p pginfo=%p num_pages=%lx num_4k=%lx", + shca, e_mr, pginfo, pginfo->num_pages, pginfo->num_4k); + + kpage = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (!kpage) { + EDEB_ERR(4, "kpage alloc failed"); + ret = -ENOMEM; + goto ehca_reg_mr_rpages_exit0; + } + + /* max 512 pages per shot */ + for (i = 0; i < ((pginfo->num_4k + 512 - 1) / 512); i++) { + + if (i == ((pginfo->num_4k + 512 - 1) / 512) - 1) { + rnum = pginfo->num_4k % 512; /* last shot */ + if (rnum == 0) + rnum = 512; /* last shot is full */ + } else + rnum = 512; + + if (rnum > 1) { + ret = ehca_set_pagebuf(e_mr, pginfo, rnum, kpage); + if (ret) { + EDEB_ERR(4, "ehca_set_pagebuf bad rc, ret=%x " + "rnum=%x kpage=%p", ret, rnum, kpage); + ret = -EFAULT; + goto ehca_reg_mr_rpages_exit1; + } + rpage = virt_to_abs(kpage); + if (!rpage) { + EDEB_ERR(4, "kpage=%p i=%x", kpage, i); + ret = -EFAULT; + goto ehca_reg_mr_rpages_exit1; + } + } else { /* rnum==1 */ + ret = ehca_set_pagebuf_1(e_mr, pginfo, &rpage); + if (ret) { + EDEB_ERR(4, "ehca_set_pagebuf_1 bad rc, " + "ret=%x i=%x", ret, i); + ret = -EFAULT; + goto ehca_reg_mr_rpages_exit1; + } + } + + EDEB(9, "i=%x rnum=%x rpage=%lx", i, rnum, rpage); + + h_ret = hipz_h_register_rpage_mr(shca->ipz_hca_handle, e_mr, + 0, /* pagesize 
4k */ + 0, rpage, rnum); + + if (i == ((pginfo->num_4k + 512 - 1) / 512) - 1) { + /* check for 'registration complete'==H_SUCCESS */ + /* and for 'page registered'==H_PAGE_REGISTERED */ + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "last hipz_reg_rpage_mr failed, " + "h_ret=%lx e_mr=%p i=%x hca_hndl=%lx " + "mr_hndl=%lx lkey=%x", h_ret, e_mr, i, + shca->ipz_hca_handle.handle, + e_mr->ipz_mr_handle.handle, + e_mr->ib.ib_mr.lkey); + ret = ehca_mrmw_map_hrc_rrpg_last(h_ret); + break; + } else + ret = 0; + } else if (h_ret != H_PAGE_REGISTERED) { + EDEB_ERR(4, "hipz_reg_rpage_mr failed, h_ret=%lx " + "e_mr=%p i=%x lkey=%x hca_hndl=%lx " + "mr_hndl=%lx", h_ret, e_mr, i, + e_mr->ib.ib_mr.lkey, + shca->ipz_hca_handle.handle, + e_mr->ipz_mr_handle.handle); + ret = ehca_mrmw_map_hrc_rrpg_notlast(h_ret); + break; + } else + ret = 0; + } /* end for(i) */ + + +ehca_reg_mr_rpages_exit1: + kfree(kpage); +ehca_reg_mr_rpages_exit0: + if (ret) + EDEB_EX(4, "ret=%x shca=%p e_mr=%p pginfo=%p num_pages=%lx " + "num_4k=%lx", ret, shca, e_mr, pginfo, + pginfo->num_pages, pginfo->num_4k); + else + EDEB_EX(7, "ret=%x", ret); + return ret; +} /* end ehca_reg_mr_rpages() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +inline int ehca_rereg_mr_rereg1(struct ehca_shca *shca, + struct ehca_mr *e_mr, + u64 *iova_start, + u64 size, + u32 acl, + struct ehca_pd *e_pd, + struct ehca_mr_pginfo *pginfo, + u32 *lkey, /*OUT*/ + u32 *rkey) /*OUT*/ +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + u32 hipz_acl = 0; + u64 *kpage = NULL; + u64 rpage = 0; + struct ehca_mr_pginfo pginfo_save; + struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0}; + + EDEB_EN(7, "shca=%p e_mr=%p iova_start=%p size=%lx acl=%x " + "e_pd=%p pginfo=%p num_pages=%lx num_4k=%lx", shca, e_mr, + iova_start, size, acl, e_pd, pginfo, pginfo->num_pages, + pginfo->num_4k); + + ehca_mrmw_map_acl(acl, &hipz_acl); + 
ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl); + + kpage = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (!kpage) { + EDEB_ERR(4, "kpage alloc failed"); + ret = -ENOMEM; + goto ehca_rereg_mr_rereg1_exit0; + } + + pginfo_save = *pginfo; + ret = ehca_set_pagebuf(e_mr, pginfo, pginfo->num_4k, kpage); + if (ret) { + EDEB_ERR(4, "set pagebuf failed, e_mr=%p pginfo=%p type=%x " + "num_pages=%lx num_4k=%lx kpage=%p", e_mr, pginfo, + pginfo->type, pginfo->num_pages, pginfo->num_4k,kpage); + goto ehca_rereg_mr_rereg1_exit1; + } + rpage = virt_to_abs(kpage); + if (!rpage) { + EDEB_ERR(4, "kpage=%p", kpage); + ret = -EFAULT; + goto ehca_rereg_mr_rereg1_exit1; + } + h_ret = hipz_h_reregister_pmr(shca->ipz_hca_handle, e_mr, + (u64)iova_start, size, hipz_acl, + e_pd->fw_pd, rpage, &hipzout); + if (h_ret != H_SUCCESS) { + /* reregistration unsuccessful, */ + /* try it again with the 3 hCalls, */ + /* e.g. this is required in case H_MR_CONDITION */ + /* (MW bound or MR is shared) */ + EDEB(6, "hipz_h_reregister_pmr failed (Rereg1), h_ret=%lx " + "e_mr=%p", h_ret, e_mr); + *pginfo = pginfo_save; + ret = -EAGAIN; + } else if ((u64*)hipzout.vaddr != iova_start) { + EDEB_ERR(4, "PHYP changed iova_start in rereg_pmr, " + "iova_start=%p iova_start_out=%lx e_mr=%p " + "mr_handle=%lx lkey=%x lkey_out=%x", iova_start, + hipzout.vaddr, e_mr, e_mr->ipz_mr_handle.handle, + e_mr->ib.ib_mr.lkey, hipzout.lkey); + ret = -EFAULT; + } else { + /* successful reregistration */ + /* note: start and start_out are identical for eServer HCAs */ + e_mr->num_pages = pginfo->num_pages; + e_mr->num_4k = pginfo->num_4k; + e_mr->start = iova_start; + e_mr->size = size; + e_mr->acl = acl; + *lkey = hipzout.lkey; + *rkey = hipzout.rkey; + } + +ehca_rereg_mr_rereg1_exit1: + kfree(kpage); +ehca_rereg_mr_rereg1_exit0: + if ( ret && (ret != -EAGAIN) ) + EDEB_EX(4, "ret=%x h_ret=%lx lkey=%x rkey=%x pginfo=%p " + "num_pages=%lx num_4k=%lx", ret, h_ret, *lkey, *rkey, + pginfo, pginfo->num_pages, pginfo->num_4k); + else + 
EDEB_EX(7, "ret=%x h_ret=%lx lkey=%x rkey=%x pginfo=%p " + "num_pages=%lx num_4k=%lx", ret, h_ret, *lkey, *rkey, + pginfo, pginfo->num_pages, pginfo->num_4k); + return ret; +} /* end ehca_rereg_mr_rereg1() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_rereg_mr(struct ehca_shca *shca, + struct ehca_mr *e_mr, + u64 *iova_start, + u64 size, + int acl, + struct ehca_pd *e_pd, + struct ehca_mr_pginfo *pginfo, + u32 *lkey, + u32 *rkey) +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + int rereg_1_hcall = 1; /* 1: use hipz_h_reregister_pmr directly */ + int rereg_3_hcall = 0; /* 1: use 3 hipz calls for reregistration */ + + EDEB_EN(7, "shca=%p e_mr=%p iova_start=%p size=%lx acl=%x " + "e_pd=%p pginfo=%p num_pages=%lx num_4k=%lx", shca, e_mr, + iova_start, size, acl, e_pd, pginfo, pginfo->num_pages, + pginfo->num_4k); + + /* first determine reregistration hCall(s) */ + if ((pginfo->num_4k > 512) || (e_mr->num_4k > 512) || + (pginfo->num_4k > e_mr->num_4k)) { + EDEB(7, "Rereg3 case, pginfo->num_4k=%lx " + "e_mr->num_4k=%x", pginfo->num_4k, e_mr->num_4k); + rereg_1_hcall = 0; + rereg_3_hcall = 1; + } + + if (e_mr->flags & EHCA_MR_FLAG_MAXMR) { /* check for max-MR */ + rereg_1_hcall = 0; + rereg_3_hcall = 1; + e_mr->flags &= ~EHCA_MR_FLAG_MAXMR; + EDEB(4, "Rereg MR for max-MR! 
e_mr=%p", e_mr); + } + + if (rereg_1_hcall) { + ret = ehca_rereg_mr_rereg1(shca, e_mr, iova_start, size, + acl, e_pd, pginfo, lkey, rkey); + if (ret) { + if (ret == -EAGAIN) + rereg_3_hcall = 1; + else + goto ehca_rereg_mr_exit0; + } + } + + if (rereg_3_hcall) { + struct ehca_mr save_mr; + + /* first deregister old MR */ + h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_mr); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx e_mr=%p " + "hca_hndl=%lx mr_hndl=%lx mr->lkey=%x", + h_ret, e_mr, shca->ipz_hca_handle.handle, + e_mr->ipz_mr_handle.handle, + e_mr->ib.ib_mr.lkey); + ret = ehca_mrmw_map_hrc_free_mr(h_ret); + goto ehca_rereg_mr_exit0; + } + /* clean ehca_mr_t, without changing struct ib_mr and lock */ + save_mr = *e_mr; + ehca_mr_deletenew(e_mr); + + /* set some MR values */ + e_mr->flags = save_mr.flags; + e_mr->fmr_page_size = save_mr.fmr_page_size; + e_mr->fmr_max_pages = save_mr.fmr_max_pages; + e_mr->fmr_max_maps = save_mr.fmr_max_maps; + e_mr->fmr_map_cnt = save_mr.fmr_map_cnt; + + ret = ehca_reg_mr(shca, e_mr, iova_start, size, acl, + e_pd, pginfo, lkey, rkey); + if (ret) { + u32 offset = (u64)(&e_mr->flags) - (u64)e_mr; + memcpy(&e_mr->flags, &(save_mr.flags), + sizeof(struct ehca_mr) - offset); + goto ehca_rereg_mr_exit0; + } + } + +ehca_rereg_mr_exit0: + if (ret) + EDEB_EX(4, "ret=%x shca=%p e_mr=%p iova_start=%p size=%lx " + "acl=%x e_pd=%p pginfo=%p num_pages=%lx lkey=%x rkey=%x" + " rereg_1_hcall=%x rereg_3_hcall=%x", ret, shca, e_mr, + iova_start, size, acl, e_pd, pginfo, pginfo->num_pages, + *lkey, *rkey, rereg_1_hcall, rereg_3_hcall); + else + EDEB_EX(7, "ret=%x shca=%p e_mr=%p iova_start=%p size=%lx " + "acl=%x e_pd=%p pginfo=%p num_pages=%lx lkey=%x rkey=%x" + " rereg_1_hcall=%x rereg_3_hcall=%x", ret, shca, e_mr, + iova_start, size, acl, e_pd, pginfo, pginfo->num_pages, + *lkey, *rkey, rereg_1_hcall, rereg_3_hcall); + + return ret; +} /* end ehca_rereg_mr() */ + 
+/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_unmap_one_fmr(struct ehca_shca *shca, + struct ehca_mr *e_fmr) +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + int rereg_1_hcall = 1; /* 1: use hipz_mr_reregister directly */ + int rereg_3_hcall = 0; /* 1: use 3 hipz calls for unmapping */ + struct ehca_pd *e_pd = NULL; + struct ehca_mr save_fmr; + u32 tmp_lkey = 0; + u32 tmp_rkey = 0; + struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; + struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0}; + + EDEB_EN(7, "shca=%p e_fmr=%p", shca, e_fmr); + + /* first check if reregistration hCall can be used for unmap */ + if (e_fmr->fmr_max_pages > 512) { + rereg_1_hcall = 0; + rereg_3_hcall = 1; + } + + e_pd = container_of(e_fmr->ib.ib_fmr.pd, struct ehca_pd, ib_pd); + + if (rereg_1_hcall) { + /* note: after using rereg hcall with len=0, */ + /* rereg hcall must be used again for registering pages */ + h_ret = hipz_h_reregister_pmr(shca->ipz_hca_handle, e_fmr, 0, + 0, 0, e_pd->fw_pd, 0, &hipzout); + if (h_ret != H_SUCCESS) { + /* should not happen, because length checked above, */ + /* FMRs are not shared and no MW bound to FMRs */ + EDEB_ERR(4, "hipz_reregister_pmr failed (Rereg1), " + "h_ret=%lx e_fmr=%p hca_hndl=%lx mr_hndl=%lx " + "lkey=%x lkey_out=%x", h_ret, e_fmr, + shca->ipz_hca_handle.handle, + e_fmr->ipz_mr_handle.handle, + e_fmr->ib.ib_fmr.lkey, hipzout.lkey); + rereg_3_hcall = 1; + } else { + /* successful reregistration */ + e_fmr->start = NULL; + e_fmr->size = 0; + tmp_lkey = hipzout.lkey; + tmp_rkey = hipzout.rkey; + } + } + + if (rereg_3_hcall) { + struct ehca_mr save_mr; + + /* first free old FMR */ + h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_fmr); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx e_fmr=%p " + "hca_hndl=%lx mr_hndl=%lx lkey=%x", h_ret, + e_fmr, shca->ipz_hca_handle.handle, + 
e_fmr->ipz_mr_handle.handle, + e_fmr->ib.ib_fmr.lkey); + ret = ehca_mrmw_map_hrc_free_mr(h_ret); + goto ehca_unmap_one_fmr_exit0; + } + /* clean ehca_mr_t, without changing lock */ + save_fmr = *e_fmr; + ehca_mr_deletenew(e_fmr); + + /* set some MR values */ + e_fmr->flags = save_fmr.flags; + e_fmr->fmr_page_size = save_fmr.fmr_page_size; + e_fmr->fmr_max_pages = save_fmr.fmr_max_pages; + e_fmr->fmr_max_maps = save_fmr.fmr_max_maps; + e_fmr->fmr_map_cnt = save_fmr.fmr_map_cnt; + e_fmr->acl = save_fmr.acl; + + pginfo.type = EHCA_MR_PGI_FMR; + pginfo.num_pages = 0; + pginfo.num_4k = 0; + ret = ehca_reg_mr(shca, e_fmr, NULL, + (e_fmr->fmr_max_pages * e_fmr->fmr_page_size), + e_fmr->acl, e_pd, &pginfo, &tmp_lkey, + &tmp_rkey); + if (ret) { + u32 offset = (u64)(&e_fmr->flags) - (u64)e_fmr; + memcpy(&e_fmr->flags, &(save_fmr.flags), + sizeof(struct ehca_mr) - offset); + goto ehca_unmap_one_fmr_exit0; + } + } + +ehca_unmap_one_fmr_exit0: + EDEB_EX(7, "ret=%x tmp_lkey=%x tmp_rkey=%x fmr_max_pages=%x " + "rereg_1_hcall=%x rereg_3_hcall=%x", ret, tmp_lkey, tmp_rkey, + e_fmr->fmr_max_pages, rereg_1_hcall, rereg_3_hcall); + return ret; +} /* end ehca_unmap_one_fmr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_reg_smr(struct ehca_shca *shca, + struct ehca_mr *e_origmr, + struct ehca_mr *e_newmr, + u64 *iova_start, + int acl, + struct ehca_pd *e_pd, + u32 *lkey, /*OUT*/ + u32 *rkey) /*OUT*/ +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + u32 hipz_acl = 0; + struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0}; + + EDEB_EN(7,"shca=%p e_origmr=%p e_newmr=%p iova_start=%p acl=%x e_pd=%p", + shca, e_origmr, e_newmr, iova_start, acl, e_pd); + + ehca_mrmw_map_acl(acl, &hipz_acl); + ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl); + + h_ret = hipz_h_register_smr(shca->ipz_hca_handle, e_newmr, e_origmr, + (u64)iova_start, hipz_acl, e_pd->fw_pd, + &hipzout); + if (h_ret != 
H_SUCCESS) { + EDEB_ERR(4, "hipz_reg_smr failed, h_ret=%lx shca=%p e_origmr=%p" + " e_newmr=%p iova_start=%p acl=%x e_pd=%p hca_hndl=%lx" + " mr_hndl=%lx lkey=%x", h_ret, shca, e_origmr, e_newmr, + iova_start, acl, e_pd, shca->ipz_hca_handle.handle, + e_origmr->ipz_mr_handle.handle, + e_origmr->ib.ib_mr.lkey); + ret = ehca_mrmw_map_hrc_reg_smr(h_ret); + goto ehca_reg_smr_exit0; + } + /* successful registration */ + e_newmr->num_pages = e_origmr->num_pages; + e_newmr->num_4k = e_origmr->num_4k; + e_newmr->start = iova_start; + e_newmr->size = e_origmr->size; + e_newmr->acl = acl; + e_newmr->ipz_mr_handle = hipzout.handle; + *lkey = hipzout.lkey; + *rkey = hipzout.rkey; + goto ehca_reg_smr_exit0; + +ehca_reg_smr_exit0: + if (ret) + EDEB_EX(4, "ret=%x shca=%p e_origmr=%p e_newmr=%p " + "iova_start=%p acl=%x e_pd=%p", + ret, shca, e_origmr, e_newmr, iova_start, acl, e_pd); + else + EDEB_EX(7, "ret=%x lkey=%x rkey=%x", ret, *lkey, *rkey); + return ret; +} /* end ehca_reg_smr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* register internal max-MR to internal SHCA */ +int ehca_reg_internal_maxmr( + struct ehca_shca *shca, + struct ehca_pd *e_pd, + struct ehca_mr **e_maxmr) /*OUT*/ +{ + int ret = 0; + struct ehca_mr *e_mr = NULL; + u64 *iova_start = NULL; + u64 size_maxmr = 0; + struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0}; + struct ib_phys_buf ib_pbuf; + u32 num_pages_mr = 0; + u32 num_pages_4k = 0; /* 4k portion "pages" */ + + EDEB_EN(7, "shca=%p e_pd=%p e_maxmr=%p", shca, e_pd, e_maxmr); + + if (ehca_adr_bad(shca) || ehca_adr_bad(e_pd) || ehca_adr_bad(e_maxmr)) { + EDEB_ERR(4, "bad input values: shca=%p e_pd=%p e_maxmr=%p", + shca, e_pd, e_maxmr); + ret = -EINVAL; + goto ehca_reg_internal_maxmr_exit0; + } + + e_mr = ehca_mr_new(); + if (!e_mr) { + EDEB_ERR(4, "out of memory"); + ret = -ENOMEM; + goto 
ehca_reg_internal_maxmr_exit0; + } + e_mr->flags |= EHCA_MR_FLAG_MAXMR; + + /* register internal max-MR on HCA */ + size_maxmr = (u64)high_memory - PAGE_OFFSET; + EDEB(7, "high_memory=%p PAGE_OFFSET=%lx", high_memory, PAGE_OFFSET); + iova_start = (u64*)KERNELBASE; + ib_pbuf.addr = 0; + ib_pbuf.size = size_maxmr; + num_pages_mr = ((((u64)iova_start % PAGE_SIZE) + size_maxmr + + PAGE_SIZE - 1) / PAGE_SIZE); + num_pages_4k = ((((u64)iova_start % EHCA_PAGESIZE) + size_maxmr + + EHCA_PAGESIZE - 1) / EHCA_PAGESIZE); + + pginfo.type = EHCA_MR_PGI_PHYS; + pginfo.num_pages = num_pages_mr; + pginfo.num_4k = num_pages_4k; + pginfo.num_phys_buf = 1; + pginfo.phys_buf_array = &ib_pbuf; + + ret = ehca_reg_mr(shca, e_mr, iova_start, size_maxmr, 0, e_pd, + &pginfo, &e_mr->ib.ib_mr.lkey, + &e_mr->ib.ib_mr.rkey); + if (ret) { + EDEB_ERR(4, "reg of internal max MR failed, e_mr=%p " + "iova_start=%p size_maxmr=%lx num_pages_mr=%x " + "num_pages_4k=%x", e_mr, iova_start, size_maxmr, + num_pages_mr, num_pages_4k); + goto ehca_reg_internal_maxmr_exit1; + } + + /* successful registration of all pages */ + e_mr->ib.ib_mr.device = e_pd->ib_pd.device; + e_mr->ib.ib_mr.pd = &e_pd->ib_pd; + e_mr->ib.ib_mr.uobject = NULL; + atomic_inc(&(e_pd->ib_pd.usecnt)); + atomic_set(&(e_mr->ib.ib_mr.usecnt), 0); + *e_maxmr = e_mr; + goto ehca_reg_internal_maxmr_exit0; + +ehca_reg_internal_maxmr_exit1: + ehca_mr_delete(e_mr); +ehca_reg_internal_maxmr_exit0: + if (ret) + EDEB_EX(4, "ret=%x shca=%p e_pd=%p e_maxmr=%p", + ret, shca, e_pd, e_maxmr); + else + EDEB_EX(7, "*e_maxmr=%p lkey=%x rkey=%x", + *e_maxmr, (*e_maxmr)->ib.ib_mr.lkey, + (*e_maxmr)->ib.ib_mr.rkey); + return ret; +} /* end ehca_reg_internal_maxmr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_reg_maxmr(struct ehca_shca *shca, + struct ehca_mr *e_newmr, + u64 *iova_start, + int acl, + struct ehca_pd *e_pd, + u32 *lkey, + 
u32 *rkey) +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + struct ehca_mr *e_origmr = shca->maxmr; + u32 hipz_acl = 0; + struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0}; + + EDEB_EN(7,"shca=%p e_origmr=%p e_newmr=%p iova_start=%p acl=%x e_pd=%p", + shca, e_origmr, e_newmr, iova_start, acl, e_pd); + + ehca_mrmw_map_acl(acl, &hipz_acl); + ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl); + + h_ret = hipz_h_register_smr(shca->ipz_hca_handle, e_newmr, e_origmr, + (u64)iova_start, hipz_acl, e_pd->fw_pd, + &hipzout); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_reg_smr failed, h_ret=%lx e_origmr=%p " + "hca_hndl=%lx mr_hndl=%lx lkey=%x", + h_ret, e_origmr, shca->ipz_hca_handle.handle, + e_origmr->ipz_mr_handle.handle, + e_origmr->ib.ib_mr.lkey); + ret = ehca_mrmw_map_hrc_reg_smr(h_ret); + goto ehca_reg_maxmr_exit0; + } + /* successful registration */ + e_newmr->num_pages = e_origmr->num_pages; + e_newmr->num_4k = e_origmr->num_4k; + e_newmr->start = iova_start; + e_newmr->size = e_origmr->size; + e_newmr->acl = acl; + e_newmr->ipz_mr_handle = hipzout.handle; + *lkey = hipzout.lkey; + *rkey = hipzout.rkey; + +ehca_reg_maxmr_exit0: + EDEB_EX(7, "ret=%x lkey=%x rkey=%x", ret, *lkey, *rkey); + return ret; +} /* end ehca_reg_maxmr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +int ehca_dereg_internal_maxmr(struct ehca_shca *shca) +{ + int ret = 0; + struct ehca_mr *e_maxmr = NULL; + struct ib_pd *ib_pd = NULL; + + EDEB_EN(7, "shca=%p shca->maxmr=%p", shca, shca->maxmr); + + if (!shca->maxmr) { + EDEB_ERR(4, "bad call, shca=%p", shca); + ret = -EINVAL; + goto ehca_dereg_internal_maxmr_exit0; + } + + e_maxmr = shca->maxmr; + ib_pd = e_maxmr->ib.ib_mr.pd; + shca->maxmr = NULL; /* remove internal max-MR indication from SHCA */ + + ret = ehca_dereg_mr(&e_maxmr->ib.ib_mr); + if (ret) { + EDEB_ERR(3, "dereg internal max-MR failed, " + "ret=%x e_maxmr=%p shca=%p lkey=%x", 
+ ret, e_maxmr, shca, e_maxmr->ib.ib_mr.lkey); + shca->maxmr = e_maxmr; + goto ehca_dereg_internal_maxmr_exit0; + } + + atomic_dec(&ib_pd->usecnt); + +ehca_dereg_internal_maxmr_exit0: + if (ret) + EDEB_EX(4, "ret=%x shca=%p shca->maxmr=%p", + ret, shca, shca->maxmr); + else + EDEB_EX(7, ""); + return ret; +} /* end ehca_dereg_internal_maxmr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* check physical buffer array of MR verbs for validness and + * calculates MR size + */ +int ehca_mr_chk_buf_and_calc_size(struct ib_phys_buf *phys_buf_array, + int num_phys_buf, + u64 *iova_start, + u64 *size) +{ + struct ib_phys_buf *pbuf = phys_buf_array; + u64 size_count = 0; + u32 i; + + if (num_phys_buf == 0) { + EDEB_ERR(4, "bad phys buf array len, num_phys_buf=0"); + return -EINVAL; + } + /* check first buffer */ + if (((u64)iova_start & ~PAGE_MASK) != (pbuf->addr & ~PAGE_MASK)) { + EDEB_ERR(4, "iova_start/addr mismatch, iova_start=%p " + "pbuf->addr=%lx pbuf->size=%lx", + iova_start, pbuf->addr, pbuf->size); + return -EINVAL; + } + if (((pbuf->addr + pbuf->size) % PAGE_SIZE) && + (num_phys_buf > 1)) { + EDEB_ERR(4, "addr/size mismatch in 1st buf, pbuf->addr=%lx " + "pbuf->size=%lx", pbuf->addr, pbuf->size); + return -EINVAL; + } + + for (i = 0; i < num_phys_buf; i++) { + if ((i > 0) && (pbuf->addr % PAGE_SIZE)) { + EDEB_ERR(4, "bad address, i=%x pbuf->addr=%lx " + "pbuf->size=%lx", i, pbuf->addr, pbuf->size); + return -EINVAL; + } + if (((i > 0) && /* not 1st */ + (i < (num_phys_buf - 1)) && /* not last */ + (pbuf->size % PAGE_SIZE)) || (pbuf->size == 0)) { + EDEB_ERR(4, "bad size, i=%x pbuf->size=%lx", + i, pbuf->size); + return -EINVAL; + } + size_count += pbuf->size; + pbuf++; + } + + *size = size_count; + return 0; +} /* end ehca_mr_chk_buf_and_calc_size() */ + +/*----------------------------------------------------------------------*/ 
+/*----------------------------------------------------------------------*/ + +/* check page list of map FMR verb for validness */ +int ehca_fmr_check_page_list(struct ehca_mr *e_fmr, + u64 *page_list, + int list_len) +{ + u32 i; + u64 *page = NULL; + + if (ehca_adr_bad(page_list)) { + EDEB_ERR(4, "bad page_list, page_list=%p fmr=%p", + page_list, e_fmr); + return -EINVAL; + } + + if ((list_len == 0) || (list_len > e_fmr->fmr_max_pages)) { + EDEB_ERR(4, "bad list_len, list_len=%x e_fmr->fmr_max_pages=%x " + "fmr=%p", list_len, e_fmr->fmr_max_pages, e_fmr); + return -EINVAL; + } + + /* each page must be aligned */ + page = page_list; + for (i = 0; i < list_len; i++) { + if (*page % e_fmr->fmr_page_size) { + EDEB_ERR(4, "bad page, i=%x *page=%lx page=%p " + "fmr=%p fmr_page_size=%x", + i, *page, page, e_fmr, e_fmr->fmr_page_size); + return -EINVAL; + } + page++; + } + + return 0; +} /* end ehca_fmr_check_page_list() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* setup page buffer from page info */ +int ehca_set_pagebuf(struct ehca_mr *e_mr, + struct ehca_mr_pginfo *pginfo, + u32 number, + u64 *kpage) +{ + int ret = 0; + struct ib_umem_chunk *prev_chunk = NULL; + struct ib_umem_chunk *chunk = NULL; + struct ib_phys_buf *pbuf = NULL; + u64 *fmrlist = NULL; + u64 num4k = 0; + u64 pgaddr = 0; + u64 offs4k = 0; + u32 i = 0; + u32 j = 0; + + EDEB_EN(7, "pginfo=%p type=%x num_pages=%lx num_4k=%lx next_buf=%lx " + "next_4k=%lx number=%x kpage=%p page_cnt=%lx page_4k_cnt=%lx " + "next_listelem=%lx region=%p next_chunk=%p next_nmap=%lx", + pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k, + pginfo->next_buf, pginfo->next_4k, number, kpage, + pginfo->page_cnt, pginfo->page_4k_cnt, pginfo->next_listelem, + pginfo->region, pginfo->next_chunk, pginfo->next_nmap); + + if (pginfo->type == EHCA_MR_PGI_PHYS) { + /* loop over desired phys_buf_array entries */ + 
while (i < number) { + pbuf = pginfo->phys_buf_array + pginfo->next_buf; + num4k = ((pbuf->addr % EHCA_PAGESIZE) + pbuf->size + + EHCA_PAGESIZE - 1) / EHCA_PAGESIZE; + offs4k = (pbuf->addr & ~PAGE_MASK) / EHCA_PAGESIZE; + while (pginfo->next_4k < offs4k + num4k) { + /* sanity check */ + if ((pginfo->page_cnt >= pginfo->num_pages) || + (pginfo->page_4k_cnt >= pginfo->num_4k)) { + EDEB_ERR(4, "page_cnt >= num_pages, " + "page_cnt=%lx num_pages=%lx " + "page_4k_cnt=%lx num_4k=%lx " + "i=%x", pginfo->page_cnt, + pginfo->num_pages, + pginfo->page_4k_cnt, + pginfo->num_4k, i); + ret = -EFAULT; + goto ehca_set_pagebuf_exit0; + } + *kpage = phys_to_abs( + (pbuf->addr & (EHCA_PAGESIZE-1)) + + (pginfo->next_4k * EHCA_PAGESIZE)); + if ( !(*kpage) && pbuf->addr ) { + EDEB_ERR(4, "pbuf->addr=%lx " + "pbuf->size=%lx next_4k=%lx", + pbuf->addr, pbuf->size, + pginfo->next_4k); + ret = -EFAULT; + goto ehca_set_pagebuf_exit0; + } + (pginfo->page_4k_cnt)++; + (pginfo->next_4k)++; + if(pginfo->next_4k >= PAGE_SIZE/EHCA_PAGESIZE) + (pginfo->page_cnt)++; + kpage++; + i++; + if (i >= number) break; + } + if (pginfo->next_4k >= offs4k + num4k) { + (pginfo->next_buf)++; + pginfo->next_4k = 0; + } + } + } else if (pginfo->type == EHCA_MR_PGI_USER) { + /* loop over desired chunk entries */ + chunk = pginfo->next_chunk; + prev_chunk = pginfo->next_chunk; + list_for_each_entry_continue(chunk, + (&(pginfo->region->chunk_list)), + list) { + EDEB(9, "chunk->page_list[0]=%lx", + (u64)sg_dma_address(&chunk->page_list[0])); + for (i = pginfo->next_nmap; i < chunk->nmap; ) { + pgaddr = ( page_to_pfn(chunk->page_list[i].page) + << PAGE_SHIFT ); + *kpage = phys_to_abs(pgaddr + + (pginfo->next_4k * + EHCA_PAGESIZE)); + EDEB(9,"pgaddr=%lx *kpage=%lx next_4k=%lx", + pgaddr, *kpage, pginfo->next_4k); + if ( !(*kpage) ) { + EDEB_ERR(4, "pgaddr=%lx " + "chunk->page_list[i]=%lx i=%x " + "next_4k=%lx mr=%p", pgaddr, + (u64)sg_dma_address( + &chunk->page_list[i]), + i, pginfo->next_4k, e_mr); + ret = -EFAULT; + goto ehca_set_pagebuf_exit0; + 
} + (pginfo->page_4k_cnt)++; + (pginfo->next_4k)++; + kpage++; + if (pginfo->next_4k >= PAGE_SIZE/EHCA_PAGESIZE) { + (pginfo->page_cnt)++; + (pginfo->next_nmap)++; + pginfo->next_4k = 0; + i++; + } + j++; + if (j >= number) break; + } + if ((pginfo->next_nmap >= chunk->nmap) && + (j >= number)) { + pginfo->next_nmap = 0; + prev_chunk = chunk; + break; + } else if (pginfo->next_nmap >= chunk->nmap) { + pginfo->next_nmap = 0; + prev_chunk = chunk; + } else if (j >= number) + break; + else + prev_chunk = chunk; + } + pginfo->next_chunk = + list_prepare_entry(prev_chunk, + (&(pginfo->region->chunk_list)), + list); + } else if (pginfo->type == EHCA_MR_PGI_FMR) { + /* loop over desired page_list entries */ + fmrlist = pginfo->page_list + pginfo->next_listelem; + for (i = 0; i < number; i++) { + *kpage = phys_to_abs((*fmrlist & (EHCA_PAGESIZE-1)) + + pginfo->next_4k * EHCA_PAGESIZE); + if ( !(*kpage) ) { + EDEB_ERR(4, "*fmrlist=%lx fmrlist=%p " + "next_listelem=%lx next_4k=%lx", + *fmrlist, fmrlist, + pginfo->next_listelem,pginfo->next_4k); + ret = -EFAULT; + goto ehca_set_pagebuf_exit0; + } + (pginfo->page_4k_cnt)++; + (pginfo->next_4k)++; + kpage++; + if ( pginfo->next_4k >= + ((e_mr->fmr_page_size) / EHCA_PAGESIZE) ) { + (pginfo->page_cnt)++; + (pginfo->next_listelem)++; + fmrlist++; + pginfo->next_4k = 0; + } + } + } else { + EDEB_ERR(4, "bad pginfo->type=%x", pginfo->type); + ret = -EFAULT; + goto ehca_set_pagebuf_exit0; + } + +ehca_set_pagebuf_exit0: + if (ret) + EDEB_EX(4, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx " + "num_4k=%lx next_buf=%lx next_4k=%lx number=%x " + "kpage=%p page_cnt=%lx page_4k_cnt=%lx i=%x " + "next_listelem=%lx region=%p next_chunk=%p " + "next_nmap=%lx", ret, e_mr, pginfo, pginfo->type, + pginfo->num_pages, pginfo->num_4k, pginfo->next_buf, + pginfo->next_4k, number, kpage, pginfo->page_cnt, + pginfo->page_4k_cnt, i, pginfo->next_listelem, + pginfo->region, pginfo->next_chunk, pginfo->next_nmap); + else + EDEB_EX(7, "ret=%x e_mr=%p 
pginfo=%p type=%x num_pages=%lx " + "num_4k=%lx next_buf=%lx next_4k=%lx number=%x " + "kpage=%p page_cnt=%lx page_4k_cnt=%lx i=%x " + "next_listelem=%lx region=%p next_chunk=%p " + "next_nmap=%lx", ret, e_mr, pginfo, pginfo->type, + pginfo->num_pages, pginfo->num_4k, pginfo->next_buf, + pginfo->next_4k, number, kpage, pginfo->page_cnt, + pginfo->page_4k_cnt, i, pginfo->next_listelem, + pginfo->region, pginfo->next_chunk, pginfo->next_nmap); + return ret; +} /* end ehca_set_pagebuf() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* setup 1 page from page info page buffer */ +int ehca_set_pagebuf_1(struct ehca_mr *e_mr, + struct ehca_mr_pginfo *pginfo, + u64 *rpage) +{ + int ret = 0; + struct ib_phys_buf *tmp_pbuf = NULL; + u64 *fmrlist = NULL; + struct ib_umem_chunk *chunk = NULL; + struct ib_umem_chunk *prev_chunk = NULL; + u64 pgaddr = 0; + u64 num4k = 0; + u64 offs4k = 0; + + EDEB_EN(7, "pginfo=%p type=%x num_pages=%lx num_4k=%lx next_buf=%lx " + "next_4k=%lx rpage=%p page_cnt=%lx page_4k_cnt=%lx " + "next_listelem=%lx region=%p next_chunk=%p next_nmap=%lx", + pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k, + pginfo->next_buf, pginfo->next_4k, rpage, pginfo->page_cnt, + pginfo->page_4k_cnt, pginfo->next_listelem, pginfo->region, + pginfo->next_chunk, pginfo->next_nmap); + + if (pginfo->type == EHCA_MR_PGI_PHYS) { + /* sanity check */ + if ((pginfo->page_cnt >= pginfo->num_pages) || + (pginfo->page_4k_cnt >= pginfo->num_4k)) { + EDEB_ERR(4, "page_cnt >= num_pages, page_cnt=%lx " + "num_pages=%lx page_4k_cnt=%lx num_4k=%lx", + pginfo->page_cnt, pginfo->num_pages, + pginfo->page_4k_cnt, pginfo->num_4k); + ret = -EFAULT; + goto ehca_set_pagebuf_1_exit0; + } + tmp_pbuf = pginfo->phys_buf_array + pginfo->next_buf; + num4k = ((tmp_pbuf->addr % EHCA_PAGESIZE) + tmp_pbuf->size + + EHCA_PAGESIZE - 1) / EHCA_PAGESIZE; + offs4k = (tmp_pbuf->addr & 
~PAGE_MASK) / EHCA_PAGESIZE; + *rpage = phys_to_abs((tmp_pbuf->addr & (EHCA_PAGESIZE-1)) + + (pginfo->next_4k * EHCA_PAGESIZE)); + if ( !(*rpage) && tmp_pbuf->addr ) { + EDEB_ERR(4, "tmp_pbuf->addr=%lx" + " tmp_pbuf->size=%lx next_4k=%lx", + tmp_pbuf->addr, tmp_pbuf->size, + pginfo->next_4k); + ret = -EFAULT; + goto ehca_set_pagebuf_1_exit0; + } + (pginfo->page_4k_cnt)++; + (pginfo->next_4k)++; + if(pginfo->next_4k >= PAGE_SIZE/EHCA_PAGESIZE) + (pginfo->page_cnt)++; + if (pginfo->next_4k >= offs4k + num4k) { + (pginfo->next_buf)++; + pginfo->next_4k = 0; + } + } else if (pginfo->type == EHCA_MR_PGI_USER) { + chunk = pginfo->next_chunk; + prev_chunk = pginfo->next_chunk; + list_for_each_entry_continue(chunk, + (&(pginfo->region->chunk_list)), + list) { + pgaddr = ( page_to_pfn(chunk->page_list[ + pginfo->next_nmap].page) + << PAGE_SHIFT); + *rpage = phys_to_abs(pgaddr + + (pginfo->next_4k * EHCA_PAGESIZE)); + EDEB(9,"pgaddr=%lx *rpage=%lx next_4k=%lx", pgaddr, + *rpage, pginfo->next_4k); + if ( !(*rpage) ) { + EDEB_ERR(4, "pgaddr=%lx chunk->page_list[]=%lx " + "next_nmap=%lx next_4k=%lx mr=%p", + pgaddr, (u64)sg_dma_address( + &chunk->page_list[ + pginfo->next_nmap]), + pginfo->next_nmap, pginfo->next_4k, + e_mr); + ret = -EFAULT; + goto ehca_set_pagebuf_1_exit0; + } + (pginfo->page_4k_cnt)++; + (pginfo->next_4k)++; + if (pginfo->next_4k >= PAGE_SIZE/EHCA_PAGESIZE) { + (pginfo->page_cnt)++; + (pginfo->next_nmap)++; + pginfo->next_4k = 0; + } + if (pginfo->next_nmap >= chunk->nmap) { + pginfo->next_nmap = 0; + prev_chunk = chunk; + } + break; + } + pginfo->next_chunk = + list_prepare_entry(prev_chunk, + (&(pginfo->region->chunk_list)), + list); + } else if (pginfo->type == EHCA_MR_PGI_FMR) { + fmrlist = pginfo->page_list + pginfo->next_listelem; + *rpage = phys_to_abs((*fmrlist & (EHCA_PAGESIZE-1)) + + pginfo->next_4k * EHCA_PAGESIZE); + if ( !(*rpage) ) { + EDEB_ERR(4, "*fmrlist=%lx fmrlist=%p next_listelem=%lx " + "next_4k=%lx", *fmrlist, fmrlist, + 
pginfo->next_listelem, pginfo->next_4k); + ret = -EFAULT; + goto ehca_set_pagebuf_1_exit0; + } + (pginfo->page_4k_cnt)++; + (pginfo->next_4k)++; + if (pginfo->next_4k >= (e_mr->fmr_page_size)/EHCA_PAGESIZE) { + (pginfo->page_cnt)++; + (pginfo->next_listelem)++; + pginfo->next_4k = 0; + } + } else { + EDEB_ERR(4, "bad pginfo->type=%x", pginfo->type); + ret = -EFAULT; + goto ehca_set_pagebuf_1_exit0; + } + +ehca_set_pagebuf_1_exit0: + if (ret) + EDEB_EX(4, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx " + "num_4k=%lx next_buf=%lx next_4k=%lx rpage=%p " + "page_cnt=%lx page_4k_cnt=%lx next_listelem=%lx " + "region=%p next_chunk=%p next_nmap=%lx", ret, e_mr, + pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k, + pginfo->next_buf, pginfo->next_4k, rpage, + pginfo->page_cnt, pginfo->page_4k_cnt, + pginfo->next_listelem, pginfo->region, + pginfo->next_chunk, pginfo->next_nmap); + else + EDEB_EX(7, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx " + "num_4k=%lx next_buf=%lx next_4k=%lx rpage=%p " + "page_cnt=%lx page_4k_cnt=%lx next_listelem=%lx " + "region=%p next_chunk=%p next_nmap=%lx", ret, e_mr, + pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k, + pginfo->next_buf, pginfo->next_4k, rpage, + pginfo->page_cnt, pginfo->page_4k_cnt, + pginfo->next_listelem, pginfo->region, + pginfo->next_chunk, pginfo->next_nmap); + return ret; +} /* end ehca_set_pagebuf_1() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* check MR if it is a max-MR, i.e. 
uses whole memory + * in case it's a max-MR 1 is returned, else 0 + */ +int ehca_mr_is_maxmr(u64 size, + u64 *iova_start) +{ + /* a MR is treated as max-MR only if it fits following: */ + if ((size == ((u64)high_memory - PAGE_OFFSET)) && + (iova_start == (void*)KERNELBASE)) { + EDEB(6, "this is a max-MR"); + return 1; + } else + return 0; +} /* end ehca_mr_is_maxmr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ +/* map access control for MR/MW. This routine is used for MR and MW. */ +void ehca_mrmw_map_acl(int ib_acl, + u32 *hipz_acl) +{ + *hipz_acl = 0; + if (ib_acl & IB_ACCESS_REMOTE_READ) + *hipz_acl |= HIPZ_ACCESSCTRL_R_READ; + if (ib_acl & IB_ACCESS_REMOTE_WRITE) + *hipz_acl |= HIPZ_ACCESSCTRL_R_WRITE; + if (ib_acl & IB_ACCESS_REMOTE_ATOMIC) + *hipz_acl |= HIPZ_ACCESSCTRL_R_ATOMIC; + if (ib_acl & IB_ACCESS_LOCAL_WRITE) + *hipz_acl |= HIPZ_ACCESSCTRL_L_WRITE; + if (ib_acl & IB_ACCESS_MW_BIND) + *hipz_acl |= HIPZ_ACCESSCTRL_MW_BIND; +} /* end ehca_mrmw_map_acl() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* sets page size in hipz access control for MR/MW. */ +void ehca_mrmw_set_pgsize_hipz_acl(u32 *hipz_acl) /*INOUT*/ +{ + return; /* HCA supports only 4k */ +} /* end ehca_mrmw_set_pgsize_hipz_acl() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* reverse map access control for MR/MW. + * This routine is used for MR and MW. 
+ */ +void ehca_mrmw_reverse_map_acl(const u32 *hipz_acl, + int *ib_acl) /*OUT*/ +{ + *ib_acl = 0; + if (*hipz_acl & HIPZ_ACCESSCTRL_R_READ) + *ib_acl |= IB_ACCESS_REMOTE_READ; + if (*hipz_acl & HIPZ_ACCESSCTRL_R_WRITE) + *ib_acl |= IB_ACCESS_REMOTE_WRITE; + if (*hipz_acl & HIPZ_ACCESSCTRL_R_ATOMIC) + *ib_acl |= IB_ACCESS_REMOTE_ATOMIC; + if (*hipz_acl & HIPZ_ACCESSCTRL_L_WRITE) + *ib_acl |= IB_ACCESS_LOCAL_WRITE; + if (*hipz_acl & HIPZ_ACCESSCTRL_MW_BIND) + *ib_acl |= IB_ACCESS_MW_BIND; +} /* end ehca_mrmw_reverse_map_acl() */ + + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* map HIPZ rc to IB retcodes for MR/MW allocations + * Used for hipz_mr_reg_alloc and hipz_mw_alloc. + */ +int ehca_mrmw_map_hrc_alloc(const u64 hipz_rc) +{ + switch (hipz_rc) { + case H_SUCCESS: /* successful completion */ + return 0; + case H_ADAPTER_PARM: /* invalid adapter handle */ + case H_RT_PARM: /* invalid resource type */ + case H_NOT_ENOUGH_RESOURCES: /* insufficient resources */ + case H_MLENGTH_PARM: /* invalid memory length */ + case H_MEM_ACCESS_PARM: /* invalid access controls */ + case H_CONSTRAINED: /* resource constraint */ + return -EINVAL; + case H_BUSY: /* long busy */ + return -EBUSY; + default: + return -EINVAL; + } +} /* end ehca_mrmw_map_hrc_alloc() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* map HIPZ rc to IB retcodes for MR register rpage + * Used for hipz_h_register_rpage_mr at registering last page + */ +int ehca_mrmw_map_hrc_rrpg_last(const u64 hipz_rc) +{ + switch (hipz_rc) { + case H_SUCCESS: /* registration complete */ + return 0; + case H_PAGE_REGISTERED: /* page registered */ + case H_ADAPTER_PARM: /* invalid adapter handle */ + case H_RH_PARM: /* invalid resource handle */ +/* case H_QT_PARM: invalid queue type */ + case 
H_PARAMETER: /* invalid logical address, */ + /* or count zero or greater 512 */ + case H_TABLE_FULL: /* page table full */ + case H_HARDWARE: /* HCA not operational */ + return -EINVAL; + case H_BUSY: /* long busy */ + return -EBUSY; + default: + return -EINVAL; + } +} /* end ehca_mrmw_map_hrc_rrpg_last() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* map HIPZ rc to IB retcodes for MR register rpage + * Used for hipz_h_register_rpage_mr at registering one page, but not last page + */ +int ehca_mrmw_map_hrc_rrpg_notlast(const u64 hipz_rc) +{ + switch (hipz_rc) { + case H_PAGE_REGISTERED: /* page registered */ + return 0; + case H_SUCCESS: /* registration complete */ + case H_ADAPTER_PARM: /* invalid adapter handle */ + case H_RH_PARM: /* invalid resource handle */ +/* case H_QT_PARM: invalid queue type */ + case H_PARAMETER: /* invalid logical address, */ + /* or count zero or greater 512 */ + case H_TABLE_FULL: /* page table full */ + case H_HARDWARE: /* HCA not operational */ + return -EINVAL; + case H_BUSY: /* long busy */ + return -EBUSY; + default: + return -EINVAL; + } +} /* end ehca_mrmw_map_hrc_rrpg_notlast() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* map HIPZ rc to IB retcodes for MR query. Used for hipz_mr_query. 
*/ +int ehca_mrmw_map_hrc_query_mr(const u64 hipz_rc) +{ + switch (hipz_rc) { + case H_SUCCESS: /* successful completion */ + return 0; + case H_ADAPTER_PARM: /* invalid adapter handle */ + case H_RH_PARM: /* invalid resource handle */ + return -EINVAL; + case H_BUSY: /* long busy */ + return -EBUSY; + default: + return -EINVAL; + } +} /* end ehca_mrmw_map_hrc_query_mr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* map HIPZ rc to IB retcodes for freeing MR resource + * Used for hipz_h_free_resource_mr + */ +int ehca_mrmw_map_hrc_free_mr(const u64 hipz_rc) +{ + switch (hipz_rc) { + case H_SUCCESS: /* resource freed */ + return 0; + case H_ADAPTER_PARM: /* invalid adapter handle */ + case H_RH_PARM: /* invalid resource handle */ + case H_R_STATE: /* invalid resource state */ + case H_HARDWARE: /* HCA not operational */ + return -EINVAL; + case H_RESOURCE: /* Resource in use */ + case H_BUSY: /* long busy */ + return -EBUSY; + default: + return -EINVAL; + } +} /* end ehca_mrmw_map_hrc_free_mr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* map HIPZ rc to IB retcodes for freeing MW resource + * Used for hipz_h_free_resource_mw + */ +int ehca_mrmw_map_hrc_free_mw(const u64 hipz_rc) +{ + switch (hipz_rc) { + case H_SUCCESS: /* resource freed */ + return 0; + case H_ADAPTER_PARM: /* invalid adapter handle */ + case H_RH_PARM: /* invalid resource handle */ + case H_R_STATE: /* invalid resource state */ + case H_HARDWARE: /* HCA not operational */ + return -EINVAL; + case H_RESOURCE: /* Resource in use */ + case H_BUSY: /* long busy */ + return -EBUSY; + default: + return -EINVAL; + } +} /* end ehca_mrmw_map_hrc_free_mw() */ + +/*----------------------------------------------------------------------*/ 
+/*----------------------------------------------------------------------*/ + +/* map HIPZ rc to IB retcodes for SMR registrations + * Used for hipz_h_register_smr. + */ +int ehca_mrmw_map_hrc_reg_smr(const u64 hipz_rc) +{ + switch (hipz_rc) { + case H_SUCCESS: /* successful completion */ + return 0; + case H_ADAPTER_PARM: /* invalid adapter handle */ + case H_RH_PARM: /* invalid resource handle */ + case H_MEM_PARM: /* invalid MR virtual address */ + case H_MEM_ACCESS_PARM: /* invalid access controls */ + case H_NOT_ENOUGH_RESOURCES: /* insufficient resources */ + return -EINVAL; + case H_BUSY: /* long busy */ + return -EBUSY; + default: + return -EINVAL; + } +} /* end ehca_mrmw_map_hrc_reg_smr() */ + +/*----------------------------------------------------------------------*/ +/*----------------------------------------------------------------------*/ + +/* MR destructor and constructor + * used in Reregister MR verb, sets all fields in ehca_mr_t to 0, + * except struct ib_mr and spinlock + */ +void ehca_mr_deletenew(struct ehca_mr *mr) +{ + mr->flags = 0; + mr->num_pages = 0; + mr->num_4k = 0; + mr->acl = 0; + mr->start = NULL; + mr->fmr_page_size = 0; + mr->fmr_max_pages = 0; + mr->fmr_max_maps = 0; + mr->fmr_map_cnt = 0; + memset(&mr->ipz_mr_handle, 0, sizeof(mr->ipz_mr_handle)); + memset(&mr->galpas, 0, sizeof(mr->galpas)); + mr->nr_of_pages = 0; + mr->pagearray = NULL; + memset(&mr->pf, 0, sizeof(mr->pf)); +} /* end ehca_mr_deletenew() */ From schihei at de.ibm.com Mon May 15 10:42:14 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:42:14 +0200 Subject: [openib-general] [PATCH 08/16] ehca: protection domain and address vector Message-ID: <4468BD76.6010505@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_av.c | 306 +++++++++++++++++++++++++++++++++++ drivers/infiniband/hw/ehca/ehca_pd.c | 118 +++++++++++++ 2 files changed, 424 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_av.c 
1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_av.c 2006-05-12 12:45:25.000000000 +0200 @@ -0,0 +1,306 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * address vector functions + * + * Authors: Hoang-Nam Nguyen + * Khadija Souissi + * Reinhard Ernst + * Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + + +#define DEB_PREFIX "ehav" + +#include + +#include "ehca_tools.h" +#include "ehca_iverbs.h" +#include "hcp_if.h" + +struct ib_ah *ehca_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr) +{ + extern struct ehca_module ehca_module; + extern int ehca_static_rate; + int ret = 0; + struct ehca_av *av = NULL; + struct ehca_shca *shca = NULL; + + EHCA_CHECK_PD_P(pd); + EHCA_CHECK_ADR_P(ah_attr); + + shca = container_of(pd->device, struct ehca_shca, ib_device); + + EDEB_EN(7, "pd=%p ah_attr=%p", pd, ah_attr); + + av = kmem_cache_alloc(ehca_module.cache_av, SLAB_KERNEL); + if (!av) { + EDEB_ERR(4, "Out of memory pd=%p ah_attr=%p", pd, ah_attr); + ret = -ENOMEM; + goto create_ah_exit0; + } + + av->av.sl = ah_attr->sl; + av->av.dlid = ntohs(ah_attr->dlid); + av->av.slid_path_bits = ah_attr->src_path_bits; + + if (ehca_static_rate < 0) { + int ah_mult = ib_rate_to_mult(ah_attr->static_rate); + int ehca_mult = + ib_rate_to_mult(shca->sport[ah_attr->port_num].rate ); + + if (ah_mult >= ehca_mult) + av->av.ipd = 0; + else + av->av.ipd = (ah_mult > 0) ? 
+ ((ehca_mult - 1) / ah_mult) : 0; + } else + av->av.ipd = ehca_static_rate; + + EDEB(7, "IPD av->av.ipd set =%x ah_attr->static_rate=%x " + "shca_ib_rate=%x ",av->av.ipd, ah_attr->static_rate, + shca->sport[ah_attr->port_num].rate); + + av->av.lnh = ah_attr->ah_flags; + av->av.grh.word_0 |= EHCA_BMASK_SET(GRH_IPVERSION_MASK, 6); + av->av.grh.word_0 |= EHCA_BMASK_SET(GRH_TCLASS_MASK, + ah_attr->grh.traffic_class); + av->av.grh.word_0 |= EHCA_BMASK_SET(GRH_FLOWLABEL_MASK, + ah_attr->grh.flow_label); + av->av.grh.word_0 |= EHCA_BMASK_SET(GRH_HOPLIMIT_MASK, + ah_attr->grh.hop_limit); + av->av.grh.word_0 |= EHCA_BMASK_SET(GRH_NEXTHEADER_MASK, 0x1B); + /* IB transport */ + av->av.grh.word_0 = be64_to_cpu(av->av.grh.word_0); + /* set sgid in grh.word_1 */ + if (ah_attr->ah_flags & IB_AH_GRH) { + int rc = 0; + struct ib_port_attr port_attr; + union ib_gid gid; + memset(&port_attr, 0, sizeof(port_attr)); + rc = ehca_query_port(pd->device, ah_attr->port_num, + &port_attr); + if (rc) { /* invalid port number */ + ret = -EINVAL; + EDEB_ERR(4, "Invalid port number " + "ehca_query_port() returned %x " + "pd=%p ah_attr=%p", rc, pd, ah_attr); + goto create_ah_exit1; + } + memset(&gid, 0, sizeof(gid)); + rc = ehca_query_gid(pd->device, + ah_attr->port_num, + ah_attr->grh.sgid_index, &gid); + if (rc) { + ret = -EINVAL; + EDEB_ERR(4, "Failed to retrieve sgid " + "ehca_query_gid() returned %x " + "pd=%p ah_attr=%p", rc, pd, ah_attr); + goto create_ah_exit1; + } + memcpy(&av->av.grh.word_1, &gid, sizeof(gid)); + } + /* for the time being we use a hard coded PMTU of 2048 Bytes */ + av->av.pmtu = 4; + + /* dgid comes in grh.word_3 */ + memcpy(&av->av.grh.word_3, &ah_attr->grh.dgid, + sizeof(ah_attr->grh.dgid)); + + EHCA_REGISTER_AV(device, pd); + + EDEB_EX(7, "pd=%p ah_attr=%p av=%p", pd, ah_attr, av); + return &av->ib_ah; + +create_ah_exit1: + kmem_cache_free(ehca_module.cache_av, av); + +create_ah_exit0: + EDEB_EX(7, "ret=%x pd=%p ah_attr=%p", ret, pd, ah_attr); + + return 
ERR_PTR(ret); +} + +int ehca_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) +{ + struct ehca_av *av = NULL; + struct ehca_ud_av new_ehca_av; + struct ehca_pd *my_pd = NULL; + u32 cur_pid = current->tgid; + int ret = 0; + + EHCA_CHECK_AV(ah); + EHCA_CHECK_ADR(ah_attr); + + EDEB_EN(7, "ah=%p ah_attr=%p", ah, ah_attr); + + my_pd = container_of(ah->pd, struct ehca_pd, ib_pd); + if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + my_pd->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + return -EINVAL; + } + + memset(&new_ehca_av, 0, sizeof(new_ehca_av)); + new_ehca_av.sl = ah_attr->sl; + new_ehca_av.dlid = ntohs(ah_attr->dlid); + new_ehca_av.slid_path_bits = ah_attr->src_path_bits; + new_ehca_av.ipd = ah_attr->static_rate; + new_ehca_av.lnh = EHCA_BMASK_SET(GRH_FLAG_MASK, + ((ah_attr->ah_flags & IB_AH_GRH) > 0)); + new_ehca_av.grh.word_0 = EHCA_BMASK_SET(GRH_TCLASS_MASK, + ah_attr->grh.traffic_class); + new_ehca_av.grh.word_0 |= EHCA_BMASK_SET(GRH_FLOWLABEL_MASK, + ah_attr->grh.flow_label); + new_ehca_av.grh.word_0 |= EHCA_BMASK_SET(GRH_HOPLIMIT_MASK, + ah_attr->grh.hop_limit); + new_ehca_av.grh.word_0 |= EHCA_BMASK_SET(GRH_NEXTHEADER_MASK, 0x1b); + new_ehca_av.grh.word_0 = be64_to_cpu(new_ehca_av.grh.word_0); + + /* set sgid in grh.word_1 */ + if (ah_attr->ah_flags & IB_AH_GRH) { + int rc = 0; + struct ib_port_attr port_attr; + union ib_gid gid; + memset(&port_attr, 0, sizeof(port_attr)); + rc = ehca_query_port(ah->device, ah_attr->port_num, + &port_attr); + if (rc) { /* invalid port number */ + ret = -EINVAL; + EDEB_ERR(4, "Invalid port number " + "ehca_query_port() returned %x " + "ah=%p ah_attr=%p port_num=%x", + rc, ah, ah_attr, ah_attr->port_num); + goto modify_ah_exit1; + } + memset(&gid, 0, sizeof(gid)); + rc = ehca_query_gid(ah->device, + ah_attr->port_num, + ah_attr->grh.sgid_index, &gid); + if (rc) { + ret = -EINVAL; + EDEB_ERR(4, "Failed to retrieve sgid " + "ehca_query_gid() returned %x " 
+ "ah=%p ah_attr=%p port_num=%x " + "sgid_index=%x", + rc, ah, ah_attr, ah_attr->port_num, + ah_attr->grh.sgid_index); + goto modify_ah_exit1; + } + memcpy(&new_ehca_av.grh.word_1, &gid, sizeof(gid)); + } + + new_ehca_av.pmtu = 4; /* see also comment in create_ah() */ + + memcpy(&new_ehca_av.grh.word_3, &ah_attr->grh.dgid, + sizeof(ah_attr->grh.dgid)); + + av = container_of(ah, struct ehca_av, ib_ah); + av->av = new_ehca_av; + +modify_ah_exit1: + EDEB_EX(7, "ret=%x ah=%p ah_attr=%p", ret, ah, ah_attr); + + return ret; +} + +int ehca_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr) +{ + int ret = 0; + struct ehca_av *av = NULL; + struct ehca_pd *my_pd = NULL; + u32 cur_pid = current->tgid; + + EHCA_CHECK_AV(ah); + EHCA_CHECK_ADR(ah_attr); + + EDEB_EN(7, "ah=%p ah_attr=%p", ah, ah_attr); + + my_pd = container_of(ah->pd, struct ehca_pd, ib_pd); + if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + my_pd->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + return -EINVAL; + } + + av = container_of(ah, struct ehca_av, ib_ah); + memcpy(&ah_attr->grh.dgid, &av->av.grh.word_3, + sizeof(ah_attr->grh.dgid)); + ah_attr->sl = av->av.sl; + + ah_attr->dlid = av->av.dlid; + + ah_attr->src_path_bits = av->av.slid_path_bits; + ah_attr->static_rate = av->av.ipd; + ah_attr->ah_flags = EHCA_BMASK_GET(GRH_FLAG_MASK, av->av.lnh); + ah_attr->grh.traffic_class = EHCA_BMASK_GET(GRH_TCLASS_MASK, + av->av.grh.word_0); + ah_attr->grh.hop_limit = EHCA_BMASK_GET(GRH_HOPLIMIT_MASK, + av->av.grh.word_0); + ah_attr->grh.flow_label = EHCA_BMASK_GET(GRH_FLOWLABEL_MASK, + av->av.grh.word_0); + + EDEB_EX(7, "ah=%p ah_attr=%p ret=%x", ah, ah_attr, ret); + return ret; +} + +int ehca_destroy_ah(struct ib_ah *ah) +{ + extern struct ehca_module ehca_module; + struct ehca_pd *my_pd = NULL; + u32 cur_pid = current->tgid; + int ret = 0; + + EHCA_CHECK_AV(ah); + EHCA_DEREGISTER_AV(ah); + + EDEB_EN(7, "ah=%p", ah); + + my_pd = container_of(ah->pd, 
struct ehca_pd, ib_pd); + if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + my_pd->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + return -EINVAL; + } + + kmem_cache_free(ehca_module.cache_av, + container_of(ah, struct ehca_av, ib_ah)); + + EDEB_EX(7, "ret=%x ah=%p", ret, ah); + return ret; +} --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_pd.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_pd.c 2006-05-15 13:35:24.000000000 +0200 @@ -0,0 +1,118 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * PD functions + * + * Authors: Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + + +#define DEB_PREFIX "vpd " + +#include + +#include "ehca_tools.h" +#include "ehca_iverbs.h" + +struct ib_pd *ehca_alloc_pd(struct ib_device *device, + struct ib_ucontext *context, struct ib_udata *udata) +{ + extern struct ehca_module ehca_module; + struct ib_pd *mypd = NULL; + struct ehca_pd *pd = NULL; + + EDEB_EN(7, "device=%p context=%p udata=%p", device, context, udata); + + EHCA_CHECK_DEVICE_P(device); + + pd = kmem_cache_alloc(ehca_module.cache_pd, SLAB_KERNEL); + if (!pd) { + EDEB_ERR(4, "ERROR device=%p context=%p pd=%p" + " out of memory", device, context, mypd); + return ERR_PTR(-ENOMEM); + } + + memset(pd, 0, sizeof(struct ehca_pd)); + pd->ownpid = current->tgid; + + /* Kernel PD: when device = -1, 0 + * User PD: when context != -1 + */ + if (!context) { + /* After init, kernel PDs always reuse + * the one created in ehca_shca_reopen() + */ + struct ehca_shca *shca = container_of(device, struct ehca_shca, + ib_device); + pd->fw_pd.value = shca->pd->fw_pd.value; + } else + pd->fw_pd.value = (u64)pd; + + mypd = &pd->ib_pd; + + EHCA_REGISTER_PD(device, pd); + + EDEB_EX(7, "device=%p context=%p pd=%p", device, context, mypd); + + return mypd; +} + +int ehca_dealloc_pd(struct ib_pd *pd) +{ + extern struct ehca_module ehca_module; + int ret = 0; + u32 cur_pid = current->tgid; + struct ehca_pd *my_pd = NULL; + + EDEB_EN(7, "pd=%p", pd); + + EHCA_CHECK_PD(pd); + my_pd = container_of(pd, struct ehca_pd, ib_pd); + if 
(my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + my_pd->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + return -EINVAL; + } + + EHCA_DEREGISTER_PD(pd); + + kmem_cache_free(ehca_module.cache_pd, + container_of(pd, struct ehca_pd, ib_pd)); + + EDEB_EX(7, "pd=%p", pd); + + return ret; +} From schihei at de.ibm.com Mon May 15 10:42:22 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:42:22 +0200 Subject: [openib-general] [PATCH 09/16] ehca: event queue Message-ID: <4468BD7E.4040805@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_eq.c | 222 +++++++++++++++++++++++++++++++++++ 1 file changed, 222 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_eq.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_eq.c 2006-05-15 13:29:49.000000000 +0200 @@ -0,0 +1,222 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Event queue handling + * + * Authors: Waleri Fomin + * Khadija Souissi + * Reinhard Ernst + * Heiko J Schick + * Hoang-Nam Nguyen + * + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#define DEB_PREFIX "e_eq" + +#include "ehca_classes.h" +#include "ehca_irq.h" +#include "ehca_iverbs.h" +#include "ehca_qes.h" +#include "hcp_if.h" +#include "ipz_pt_fn.h" + +int ehca_create_eq(struct ehca_shca *shca, + struct ehca_eq *eq, + const enum ehca_eq_type type, const u32 length) +{ + u64 ret = H_SUCCESS; + u32 nr_pages = 0; + u32 i; + void *vpage = NULL; + + EDEB_EN(7, "shca=%p eq=%p length=%x", shca, eq, length); + EHCA_CHECK_ADR(shca); + EHCA_CHECK_ADR(eq); + + spin_lock_init(&eq->spinlock); + eq->is_initialized = 0; + + if (type != EHCA_EQ && type != EHCA_NEQ) { + EDEB_ERR(4, "Invalid EQ type %x. eq=%p", type, eq); + return -EINVAL; + } + if (length == 0) { + EDEB_ERR(4, "EQ length must not be zero. eq=%p", eq); + return -EINVAL; + } + + ret = hipz_h_alloc_resource_eq(shca->ipz_hca_handle, + &eq->pf, + type, + length, + &eq->ipz_eq_handle, + &eq->length, + &nr_pages, &eq->ist); + + if (ret != H_SUCCESS) { + EDEB_ERR(4, "Can't allocate EQ / NEQ. eq=%p", eq); + return -EINVAL; + } + + ret = ipz_queue_ctor(&eq->ipz_queue, nr_pages, + EHCA_PAGESIZE, sizeof(struct ehca_eqe), 0); + if (!ret) { + EDEB_ERR(4, "Can't allocate EQ pages. 
eq=%p", eq); + goto create_eq_exit1; + } + + for (i = 0; i < nr_pages; i++) { + u64 rpage; + + if (!(vpage = ipz_qpageit_get_inc(&eq->ipz_queue))) { + ret = H_RESOURCE; + goto create_eq_exit2; + } + + rpage = virt_to_abs(vpage); + ret = hipz_h_register_rpage_eq(shca->ipz_hca_handle, + eq->ipz_eq_handle, + &eq->pf, + 0, 0, rpage, 1); + + if (i == (nr_pages - 1)) { + /* last page */ + vpage = ipz_qpageit_get_inc(&eq->ipz_queue); + if (ret != H_SUCCESS || vpage) + goto create_eq_exit2; + } else { + if (ret != H_PAGE_REGISTERED || !vpage) + goto create_eq_exit2; + } + } + + ipz_qeit_reset(&eq->ipz_queue); + + /* register interrupt handlers and initialize work queues */ + if (type == EHCA_EQ) { + ret = ibmebus_request_irq(NULL, eq->ist, ehca_interrupt_eq, + SA_INTERRUPT, "ehca_eq", + (void *)shca); + if (ret < 0) + EDEB_ERR(4, "Can't map interrupt handler."); + + tasklet_init(&eq->interrupt_task, ehca_tasklet_eq, (long)shca); + } else if (type == EHCA_NEQ) { + ret = ibmebus_request_irq(NULL, eq->ist, ehca_interrupt_neq, + SA_INTERRUPT, "ehca_neq", + (void *)shca); + if (ret < 0) + EDEB_ERR(4, "Can't map interrupt handler."); + + tasklet_init(&eq->interrupt_task, ehca_tasklet_neq, (long)shca); + } + + eq->is_initialized = 1; + + EDEB_EX(7, "ret=%lx", ret); + + return 0; + +create_eq_exit2: + ipz_queue_dtor(&eq->ipz_queue); + +create_eq_exit1: + hipz_h_destroy_eq(shca->ipz_hca_handle, eq); + + EDEB_EX(7, "ret=%lx", ret); + + return -EINVAL; +} + +void *ehca_poll_eq(struct ehca_shca *shca, struct ehca_eq *eq) +{ + unsigned long flags = 0; + void *eqe = NULL; + + EDEB_EN(7, "shca=%p eq=%p", shca, eq); + EHCA_CHECK_ADR_P(shca); + EHCA_CHECK_EQ_P(eq); + + spin_lock_irqsave(&eq->spinlock, flags); + eqe = ipz_eqit_eq_get_inc_valid(&eq->ipz_queue); + spin_unlock_irqrestore(&eq->spinlock, flags); + + EDEB_EX(7, "eq=%p eqe=%p", eq, eqe); + + return eqe; +} + +void ehca_poll_eqs(unsigned long data) +{ + struct ehca_shca *shca; + struct ehca_module *module = (struct 
ehca_module*)data; + + spin_lock(&module->shca_lock); + list_for_each_entry(shca, &module->shca_list, shca_list) { + if (shca->eq.is_initialized) + ehca_tasklet_eq((unsigned long)(void*)shca); + } + mod_timer(&module->timer, jiffies + HZ); + spin_unlock(&module->shca_lock); + + return; +} + +int ehca_destroy_eq(struct ehca_shca *shca, struct ehca_eq *eq) +{ + unsigned long flags = 0; + u64 h_ret = H_SUCCESS; + + EDEB_EN(7, "shca=%p eq=%p", shca, eq); + EHCA_CHECK_ADR(shca); + EHCA_CHECK_EQ(eq); + + spin_lock_irqsave(&eq->spinlock, flags); + ibmebus_free_irq(NULL, eq->ist, (void *)shca); + + h_ret = hipz_h_destroy_eq(shca->ipz_hca_handle, eq); + + spin_unlock_irqrestore(&eq->spinlock, flags); + + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "Can't free EQ resources."); + return -EINVAL; + } + ipz_queue_dtor(&eq->ipz_queue); + + EDEB_EX(7, "h_ret=%lx", h_ret); + + return h_ret; +} From schihei at de.ibm.com Mon May 15 10:42:29 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:42:29 +0200 Subject: [openib-general] [PATCH 10/16] ehca: completion queue Message-ID: <4468BD85.8090805@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_cq.c | 431 +++++++++++++++++++++++++++++++++++ 1 file changed, 431 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_cq.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_cq.c 2006-05-15 14:56:34.000000000 +0200 @@ -0,0 +1,431 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Completion queue handling + * + * Authors: Waleri Fomin + * Khadija Souissi + * Reinhard Ernst + * Heiko J Schick + * Hoang-Nam Nguyen + * + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. 
+ * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#define DEB_PREFIX "e_cq" + +#include + +#include "ehca_iverbs.h" +#include "ehca_classes.h" +#include "ehca_irq.h" +#include "hcp_if.h" + +int ehca_cq_assign_qp(struct ehca_cq *cq, struct ehca_qp *qp) +{ + unsigned int qp_num = qp->real_qp_num; + unsigned int key = qp_num & (QP_HASHTAB_LEN-1); + unsigned long spl_flags = 0; + + spin_lock_irqsave(&cq->spinlock, spl_flags); + hlist_add_head(&qp->list_entries, &cq->qp_hashtab[key]); + spin_unlock_irqrestore(&cq->spinlock, spl_flags); + + EDEB(7, "cq_num=%x real_qp_num=%x", cq->cq_number, qp_num); + + return 0; +} + +int ehca_cq_unassign_qp(struct ehca_cq *cq, unsigned int real_qp_num) +{ + int ret = -EINVAL; + unsigned int key = real_qp_num & (QP_HASHTAB_LEN-1); + struct hlist_node *iter = NULL; + struct ehca_qp *qp = NULL; + unsigned long spl_flags = 0; + + spin_lock_irqsave(&cq->spinlock, spl_flags); + hlist_for_each(iter, &cq->qp_hashtab[key]) { + qp = hlist_entry(iter, struct ehca_qp, list_entries); + if (qp->real_qp_num == real_qp_num) { + hlist_del(iter); + EDEB(7, "removed qp from cq .cq_num=%x real_qp_num=%x", + cq->cq_number, real_qp_num); + ret = 0; + break; + } + } + spin_unlock_irqrestore(&cq->spinlock, spl_flags); + if (ret) { + EDEB_ERR(4, "qp not found cq_num=%x real_qp_num=%x", + cq->cq_number, real_qp_num); + } + + return ret; +} + +struct ehca_qp* ehca_cq_get_qp(struct ehca_cq *cq, int real_qp_num) +{ + struct ehca_qp *ret = NULL; + unsigned int key = real_qp_num & (QP_HASHTAB_LEN-1); + struct hlist_node *iter = NULL; + struct ehca_qp *qp = NULL; + hlist_for_each(iter, &cq->qp_hashtab[key]) { + qp = hlist_entry(iter, struct ehca_qp, list_entries); + if (qp->real_qp_num == real_qp_num) { + ret = qp; + break; + } + } + return ret; +} + +struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, + struct ib_ucontext *context, + struct ib_udata *udata) +{ + extern struct ehca_module ehca_module; + struct ib_cq *cq = NULL; + struct ehca_cq *my_cq = NULL; + struct ehca_shca *shca = NULL; + 
struct ipz_adapter_handle adapter_handle; + /* h_call's out parameters */ + struct ehca_alloc_cq_parms param; + u32 counter = 0; + void *vpage = NULL; + u64 rpage = 0; + struct h_galpa gal; + u64 cqx_fec = 0; + u64 h_ret = 0; + int ipz_rc = 0; + int ret = 0; + const u32 additional_cqe=20; + int i= 0; + unsigned long flags; + + EHCA_CHECK_DEVICE_P(device); + EDEB_EN(7, "device=%p cqe=%x context=%p", device, cqe, context); + + if (cqe >= 0xFFFFFFFF - 64 - additional_cqe) + return ERR_PTR(-EINVAL); + + my_cq = kmem_cache_alloc(ehca_module.cache_cq, SLAB_KERNEL); + if (!my_cq) { + cq = ERR_PTR(-ENOMEM); + EDEB_ERR(4, "Out of memory for ehca_cq struct device=%p", + device); + goto create_cq_exit0; + } + + memset(my_cq, 0, sizeof(struct ehca_cq)); + memset(&param, 0, sizeof(struct ehca_alloc_cq_parms)); + + spin_lock_init(&my_cq->spinlock); + spin_lock_init(&my_cq->cb_lock); + spin_lock_init(&my_cq->task_lock); + my_cq->ownpid = current->tgid; + + cq = &my_cq->ib_cq; + + shca = container_of(device, struct ehca_shca, ib_device); + adapter_handle = shca->ipz_hca_handle; + param.eq_handle = shca->eq.ipz_eq_handle; + + + do { + if (!idr_pre_get(&ehca_cq_idr, GFP_KERNEL)) { + cq = ERR_PTR(-ENOMEM); + EDEB_ERR(4, + "Can't reserve idr resources. " + "device=%p", device); + goto create_cq_exit1; + } + + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + ret = idr_get_new(&ehca_cq_idr, my_cq, &my_cq->token); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + + } while (ret == -EAGAIN); + + if (ret) { + cq = ERR_PTR(-ENOMEM); + EDEB_ERR(4, + "Can't allocate new idr entry. " + "device=%p", device); + goto create_cq_exit1; + } + + /* CQs maximum depth is 4GB-64, but we need additional 20 as buffer + * for receiving errors CQEs. 
+ */ + param.nr_cqe = cqe + additional_cqe; + h_ret = hipz_h_alloc_resource_cq(adapter_handle, my_cq, &param); + + if (h_ret != H_SUCCESS) { + EDEB_ERR(4,"hipz_h_alloc_resource_cq() failed " + "h_ret=%lx device=%p", h_ret, device); + cq = ERR_PTR(ehca2ib_return_code(h_ret)); + goto create_cq_exit2; + } + + ipz_rc = ipz_queue_ctor(&my_cq->ipz_queue, param.act_pages, + EHCA_PAGESIZE, sizeof(struct ehca_cqe), 0); + if (!ipz_rc) { + EDEB_ERR(4, + "ipz_queue_ctor() failed " + "ipz_rc=%x device=%p", ipz_rc, device); + cq = ERR_PTR(-EINVAL); + goto create_cq_exit3; + } + + for (counter = 0; counter < param.act_pages; counter++) { + vpage = ipz_qpageit_get_inc(&my_cq->ipz_queue); + if (!vpage) { + EDEB_ERR(4, "ipz_qpageit_get_inc() " + "returns NULL device=%p", device); + cq = ERR_PTR(-EAGAIN); + goto create_cq_exit4; + } + rpage = virt_to_abs(vpage); + + h_ret = hipz_h_register_rpage_cq(adapter_handle, + my_cq->ipz_cq_handle, + &my_cq->pf, + 0, + 0, + rpage, + 1, + my_cq->galpas. + kernel); + + if (h_ret < H_SUCCESS) { + EDEB_ERR(4, "hipz_h_register_rpage_cq() failed " + "ehca_cq=%p cq_num=%x h_ret=%lx " + "counter=%i act_pages=%i", + my_cq, my_cq->cq_number, + h_ret, counter, param.act_pages); + cq = ERR_PTR(-EINVAL); + goto create_cq_exit4; + } + + if (counter == (param.act_pages - 1)) { + vpage = ipz_qpageit_get_inc(&my_cq->ipz_queue); + if ((h_ret != H_SUCCESS) || vpage) { + EDEB_ERR(4, "Registration of pages not " + "complete ehca_cq=%p cq_num=%x " + "h_ret=%lx", + my_cq, my_cq->cq_number, h_ret); + cq = ERR_PTR(-EAGAIN); + goto create_cq_exit4; + } + } else { + if (h_ret != H_PAGE_REGISTERED) { + EDEB_ERR(4, "Registration of page failed " + "ehca_cq=%p cq_num=%x h_ret=%lx" + "counter=%i act_pages=%i", + my_cq, my_cq->cq_number, + h_ret, counter, param.act_pages); + cq = ERR_PTR(-ENOMEM); + goto create_cq_exit4; + } + } + } + + ipz_qeit_reset(&my_cq->ipz_queue); + + gal = my_cq->galpas.kernel; + cqx_fec = hipz_galpa_load(gal, CQTEMM_OFFSET(cqx_fec)); + EDEB(8, 
"ehca_cq=%p cq_num=%x CQX_FEC=%lx", + my_cq, my_cq->cq_number, cqx_fec); + + my_cq->ib_cq.cqe = my_cq->nr_of_entries = + param.act_nr_of_entries - additional_cqe; + my_cq->cq_number = (my_cq->ipz_cq_handle.handle) & 0xffff; + + for (i = 0; i < QP_HASHTAB_LEN; i++) + INIT_HLIST_HEAD(&my_cq->qp_hashtab[i]); + + if (context) { + struct ipz_queue *ipz_queue = &my_cq->ipz_queue; + struct ehca_create_cq_resp resp; + struct vm_area_struct *vma = NULL; + memset(&resp, 0, sizeof(resp)); + resp.cq_number = my_cq->cq_number; + resp.token = my_cq->token; + resp.ipz_queue.qe_size = ipz_queue->qe_size; + resp.ipz_queue.act_nr_of_sg = ipz_queue->act_nr_of_sg; + resp.ipz_queue.queue_length = ipz_queue->queue_length; + resp.ipz_queue.pagesize = ipz_queue->pagesize; + resp.ipz_queue.toggle_state = ipz_queue->toggle_state; + ehca_mmap_nopage(((u64) (my_cq->token) << 32) | 0x12000000, + ipz_queue->queue_length, + ((void**)&resp.ipz_queue.queue), + &vma); + my_cq->uspace_queue = resp.ipz_queue.queue; + resp.galpas = my_cq->galpas; + ehca_mmap_register(my_cq->galpas.user.fw_handle, + ((void**)&resp.galpas.kernel.fw_handle), + &vma); + my_cq->uspace_fwh = (u64)resp.galpas.kernel.fw_handle; + if (ib_copy_to_udata(udata, &resp, sizeof(resp))) { + EDEB_ERR(4, "Copy to udata failed."); + goto create_cq_exit4; + } + } + + EDEB_EX(7,"retcode=%p ehca_cq=%p cq_num=%x cq_size=%x", + cq, my_cq, my_cq->cq_number, param.act_nr_of_entries); + return cq; + +create_cq_exit4: + ipz_queue_dtor(&my_cq->ipz_queue); + +create_cq_exit3: + h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 1); + EDEB(3, "hipz_h_destroy_cq() failed ehca_cq=%p cq_num=%x h_ret=%lx", + my_cq, my_cq->cq_number, h_ret); + +create_cq_exit2: + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + idr_remove(&ehca_cq_idr, my_cq->token); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + +create_cq_exit1: + kmem_cache_free(ehca_module.cache_cq, my_cq); + +create_cq_exit0: + EDEB_EX(7, "An error has occured retcode=%p ", cq); + return cq; +} 
+ +int ehca_destroy_cq(struct ib_cq *cq) +{ + extern struct ehca_module ehca_module; + u64 h_ret = 0; + int ret = 0; + struct ehca_cq *my_cq = NULL; + int cq_num = 0; + struct ib_device *device = NULL; + struct ehca_shca *shca = NULL; + struct ipz_adapter_handle adapter_handle; + u32 cur_pid = current->tgid; + unsigned long flags; + + EHCA_CHECK_CQ(cq); + my_cq = container_of(cq, struct ehca_cq, ib_cq); + cq_num = my_cq->cq_number; + device = cq->device; + EHCA_CHECK_DEVICE(device); + shca = container_of(device, struct ehca_shca, ib_device); + adapter_handle = shca->ipz_hca_handle; + EDEB_EN(7, "ehca_cq=%p cq_num=%x", + my_cq, my_cq->cq_number); + + spin_lock_irqsave(&ehca_cq_idr_lock, flags); + while (my_cq->nr_callbacks) + yield(); + + idr_remove(&ehca_cq_idr, my_cq->token); + spin_unlock_irqrestore(&ehca_cq_idr_lock, flags); + + if (my_cq->uspace_queue && my_cq->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_cq->ownpid); + return -EINVAL; + } + + /* un-mmap if vma alloc */ + if (my_cq->uspace_queue ) { + ret = ehca_munmap(my_cq->uspace_queue, + my_cq->ipz_queue.queue_length); + ret = ehca_munmap(my_cq->uspace_fwh, 4096); + } + + h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 0); + if (h_ret == H_R_STATE) { + /* cq in err: read err data and destroy it forcibly */ + EDEB(4, "ehca_cq=%p cq_num=%x ressource=%lx in err state. 
" + "Try to delete it forcibly.", + my_cq, my_cq->cq_number, my_cq->ipz_cq_handle.handle); + ehca_error_data(shca, my_cq, my_cq->ipz_cq_handle.handle); + h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 1); + if (h_ret == H_SUCCESS) + EDEB(4, "ehca_cq=%p cq_num=%x deleted successfully.", + my_cq, my_cq->cq_number); + } + if (h_ret != H_SUCCESS) { + EDEB_ERR(4,"hipz_h_destroy_cq() failed " + "h_ret=%lx ehca_cq=%p cq_num=%x", + h_ret, my_cq, my_cq->cq_number); + ret = ehca2ib_return_code(h_ret); + goto destroy_cq_exit0; + } + ipz_queue_dtor(&my_cq->ipz_queue); + kmem_cache_free(ehca_module.cache_cq, my_cq); + +destroy_cq_exit0: + EDEB_EX(7, "ehca_cq=%p cq_num=%x ret=%x ", + my_cq, cq_num, ret); + return ret; +} + +int ehca_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata) +{ + int ret = 0; + struct ehca_cq *my_cq = NULL; + u32 cur_pid = current->tgid; + + if (unlikely(!cq)) { + EDEB_ERR(4, "cq is NULL"); + return -EFAULT; + } + + my_cq = container_of(cq, struct ehca_cq, ib_cq); + EDEB_EN(7, "ehca_cq=%p cq_num=%x", + my_cq, my_cq->cq_number); + + if (my_cq->uspace_queue && my_cq->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_cq->ownpid); + return -EINVAL; + } + + /* TODO: proper resize needs to be done */ + ret = -EFAULT; + EDEB_ERR(4, "not implemented yet"); + + EDEB_EX(7, "ehca_cq=%p cq_num=%x", + my_cq, my_cq->cq_number); + return ret; +} From schihei at de.ibm.com Mon May 15 10:42:49 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:42:49 +0200 Subject: [openib-general] [PATCH 12/16] ehca: firmware InfiniBand interface Message-ID: <4468BD99.5050505@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/hcp_if.c | 1476 ++++++++++++++++++++++++++++++++++++ drivers/infiniband/hw/ehca/hcp_if.h | 330 ++++++++ 2 files changed, 1806 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/hcp_if.h 1970-01-01 01:00:00.000000000 +0100 +++ 
linux-2.6.17-rc2/drivers/infiniband/hw/ehca/hcp_if.h 2006-05-12 12:48:21.000000000 +0200 @@ -0,0 +1,330 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Firmware Infiniband Interface code for POWER + * + * Authors: Christoph Raisch + * Hoang-Nam Nguyen + * Gerd Bayer + * Waleri Fomin + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef __HCP_IF_H__ +#define __HCP_IF_H__ + +#include "ehca_classes.h" +#include "ehca_tools.h" +#include "hipz_hw.h" + +/** + * hipz_h_alloc_resource_eq - Allocate EQ resources in HW and FW, initalize + * resources, create the empty EQPT (ring). + * + * @eq_handle: eq handle for this queue + * @act_nr_of_entries: actual number of queue entries + * @act_pages: actual number of queue pages + * @eq_ist: used by hcp_H_XIRR() call + */ +u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle, + struct ehca_pfeq *pfeq, + const u32 neq_control, + const u32 number_of_entries, + struct ipz_eq_handle *eq_handle, + u32 * act_nr_of_entries, + u32 * act_pages, + u32 * eq_ist); + +u64 hipz_h_reset_event(const struct ipz_adapter_handle adapter_handle, + struct ipz_eq_handle eq_handle, + const u64 event_mask); +/** + * hipz_h_allocate_resource_cq - Allocate CQ resources in HW and FW, initialize + * resources, create the empty CQPT (ring). + * + * @eq_handle: eq handle to use for this cq + * @cq_handle: cq handle for this queue + * @act_nr_of_entries: actual number of queue entries + * @act_pages: actual number of queue pages + * @galpas: contain logical adress of priv. storage and + * log_user_storage + */ +u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle, + struct ehca_cq *cq, + struct ehca_alloc_cq_parms *param); + + +/** + * hipz_h_alloc_resource_qp - Allocate QP resources in HW and FW, + * initialize resources, create empty QPPTs (2 rings). 
+ * + * @h_galpas to access HCA resident QP attributes + */ +u64 hipz_h_alloc_resource_qp(const struct ipz_adapter_handle adapter_handle, + struct ehca_qp *qp, + struct ehca_alloc_qp_parms *parms); + +u64 hipz_h_query_port(const struct ipz_adapter_handle adapter_handle, + const u8 port_id, + struct hipz_query_port *query_port_response_block); + +u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle, + struct hipz_query_hca *query_hca_rblock); + +/** + * hipz_h_register_rpage - hcp_if.h internal function for all + * hcp_H_REGISTER_RPAGE calls. + * + * @logical_address_of_page: kv transformation to GX address in this routine + */ +u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle, + const u8 pagesize, + const u8 queue_type, + const u64 resource_handle, + const u64 logical_address_of_page, + u64 count); + +u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle, + const struct ipz_eq_handle eq_handle, + struct ehca_pfeq *pfeq, + const u8 pagesize, + const u8 queue_type, + const u64 logical_address_of_page, + const u64 count); + +u32 hipz_h_query_int_state(const struct ipz_adapter_handle + hcp_adapter_handle, + u32 ist); + +u64 hipz_h_register_rpage_cq(const struct ipz_adapter_handle adapter_handle, + const struct ipz_cq_handle cq_handle, + struct ehca_pfcq *pfcq, + const u8 pagesize, + const u8 queue_type, + const u64 logical_address_of_page, + const u64 count, + const struct h_galpa gal); + +u64 hipz_h_register_rpage_qp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct ehca_pfqp *pfqp, + const u8 pagesize, + const u8 queue_type, + const u64 logical_address_of_page, + const u64 count, + const struct h_galpa galpa); + +u64 hipz_h_disable_and_get_wqe(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct ehca_pfqp *pfqp, + void **log_addr_next_sq_wqe_tb_processed, + void **log_addr_next_rq_wqe_tb_processed, + int 
dis_and_get_function_code); +enum hcall_sigt { + HCALL_SIGT_NO_CQE = 0, + HCALL_SIGT_BY_WQE = 1, + HCALL_SIGT_EVERY = 2 +}; + +u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct ehca_pfqp *pfqp, + const u64 update_mask, + struct hcp_modify_qp_control_block *mqpcb, + struct h_galpa gal); + +u64 hipz_h_query_qp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct ehca_pfqp *pfqp, + struct hcp_modify_qp_control_block *qqpcb, + struct h_galpa gal); + +u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle, + struct ehca_qp *qp); + +u64 hipz_h_define_aqp0(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct h_galpa gal, + u32 port); + +u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct h_galpa gal, + u32 port, u32 * pma_qp_nr, + u32 * bma_qp_nr); + +u64 hipz_h_attach_mcqp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct h_galpa gal, + u16 mcg_dlid, + u64 subnet_prefix, u64 interface_id); + +u64 hipz_h_detach_mcqp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct h_galpa gal, + u16 mcg_dlid, + u64 subnet_prefix, u64 interface_id); + +u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle, + struct ehca_cq *cq, + u8 force_flag); + +u64 hipz_h_destroy_eq(const struct ipz_adapter_handle adapter_handle, + struct ehca_eq *eq); + +/** + * hipz_h_alloc_resource_mr - Allocate MR resources in HW and FW, initialize + * resources. 
+ * + * @mr: ehca MR + * @vaddr: Memory Region I/O Virtual Address + * @length: Memory Region Length + * @access_ctrl: Memory Region Access Controls + * @pd: Protection Domain + * @outparms: output parameters + */ +u64 hipz_h_alloc_resource_mr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + const u64 vaddr, + const u64 length, + const u32 access_ctrl, + const struct ipz_pd pd, + struct ehca_mr_hipzout_parms *outparms); + +/** + * hipz_h_register_rpage_mr - Register MR resource page in HW and FW . + * + * @mr: ehca MR + * @queue_type: must be zero for MR + */ +u64 hipz_h_register_rpage_mr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + const u8 pagesize, + const u8 queue_type, + const u64 logical_address_of_page, + const u64 count); + +/** + * hipz_h_query_mr - Query MR in HW and FW. + * + * @mr: ehca MR + * @outparms: output parameters + */ +u64 hipz_h_query_mr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + struct ehca_mr_hipzout_parms *outparms); + +/** + * hipz_h_free_resource_mr - Free MR resources in HW and FW. + * + * @mr: ehca MR + */ +u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr); + +/** + * hipz_h_reregister_pmr - Reregister MR in HW and FW. + * + * @mr: ehca MR + * @vaddr_in: Memory Region I/O Virtual Address + * @length: Memory Region Length + * @access_ctrl: Memory Region Access Controls + * @pd: Protection Domain + * @mr_addr_cb: Logical Address of MR Control Block + * @outparms: output parameters + */ +u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + const u64 vaddr_in, + const u64 length, + const u32 access_ctrl, + const struct ipz_pd pd, + const u64 mr_addr_cb, + struct ehca_mr_hipzout_parms *outparms); + +/** + * hipz_h_register_smr - Register shared MR in HW and FW. 
+ * + * @mr: ehca MR + * @orig_mr: original ehca MR + * @vaddr_in: Memory Region I/O Virtual Address of new shared MR + * @access_ctrl: Memory Region Access Controls of new shared MR + * @pd: Protection Domain of new shared MR + * @outparms: output parameters + */ +u64 hipz_h_register_smr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + const struct ehca_mr *orig_mr, + const u64 vaddr_in, + const u32 access_ctrl, + const struct ipz_pd pd, + struct ehca_mr_hipzout_parms *outparms); + +/** + * hipz_h_alloc_resource_mw - Allocate MR resources in HW and FW, initialize + * resources. + * + * @mw: ehca MW + * @pd: Protection Domain + * @outparms: output parameters + */ +u64 hipz_h_alloc_resource_mw(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mw *mw, + const struct ipz_pd pd, + struct ehca_mw_hipzout_parms *outparms); + +/** + * hipz_h_query_mw - Query MW in HW and FW. + * + * @mw: ehca MW + * @outparms: output parameters + */ +u64 hipz_h_query_mw(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mw *mw, + struct ehca_mw_hipzout_parms *outparms); + +/** + * hipz_h_free_resource_mw - Free MW resources in HW and FW. + * + * @mw: ehca MW + */ +u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mw *mw); + +u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle, + const u64 ressource_handle, + void *rblock, + unsigned long *byte_count); + +#endif /* __HCP_IF_H__ */ --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/hcp_if.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/hcp_if.c 2006-05-15 15:43:31.000000000 +0200 @@ -0,0 +1,1476 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Firmware Infiniband Interface code for POWER + * + * Authors: Christoph Raisch + * Hoang-Nam Nguyen + * Gerd Bayer + * Waleri Fomin + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. 
+ * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#define DEB_PREFIX "hcpi" + +#include +#include "ehca_tools.h" +#include "hcp_if.h" +#include "hcp_phyp.h" +#include "hipz_fns.h" + +#define H_ALL_RES_QP_ENHANCED_OPS EHCA_BMASK_IBM(9,11) +#define H_ALL_RES_QP_PTE_PIN EHCA_BMASK_IBM(12,12) +#define H_ALL_RES_QP_SERVICE_TYPE EHCA_BMASK_IBM(13,15) +#define H_ALL_RES_QP_LL_RQ_CQE_POSTING EHCA_BMASK_IBM(18,18) +#define H_ALL_RES_QP_LL_SQ_CQE_POSTING EHCA_BMASK_IBM(19,21) +#define H_ALL_RES_QP_SIGNALING_TYPE EHCA_BMASK_IBM(22,23) +#define H_ALL_RES_QP_UD_AV_LKEY_CTRL EHCA_BMASK_IBM(31,31) +#define H_ALL_RES_QP_RESOURCE_TYPE EHCA_BMASK_IBM(56,63) + +#define H_ALL_RES_QP_MAX_OUTST_SEND_WR EHCA_BMASK_IBM(0,15) +#define H_ALL_RES_QP_MAX_OUTST_RECV_WR EHCA_BMASK_IBM(16,31) +#define H_ALL_RES_QP_MAX_SEND_SGE EHCA_BMASK_IBM(32,39) +#define H_ALL_RES_QP_MAX_RECV_SGE EHCA_BMASK_IBM(40,47) + +#define H_ALL_RES_QP_ACT_OUTST_SEND_WR EHCA_BMASK_IBM(16,31) +#define H_ALL_RES_QP_ACT_OUTST_RECV_WR EHCA_BMASK_IBM(48,63) +#define H_ALL_RES_QP_ACT_SEND_SGE EHCA_BMASK_IBM(8,15) +#define H_ALL_RES_QP_ACT_RECV_SGE EHCA_BMASK_IBM(24,31) + +#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES EHCA_BMASK_IBM(0,31) +#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES EHCA_BMASK_IBM(32,63) + +/* direct access qp controls */ +#define DAQP_CTRL_ENABLE 0x01 +#define DAQP_CTRL_SEND_COMP 0x20 +#define DAQP_CTRL_RECV_COMP 0x40 + +static u32 get_longbusy_msecs(int longbusy_rc) +{ + switch (longbusy_rc) { + case H_LONG_BUSY_ORDER_1_MSEC: + return 1; + case H_LONG_BUSY_ORDER_10_MSEC: + return 10; + case H_LONG_BUSY_ORDER_100_MSEC: + return 100; + case H_LONG_BUSY_ORDER_1_SEC: + return 1000; + case H_LONG_BUSY_ORDER_10_SEC: + return 10000; + case H_LONG_BUSY_ORDER_100_SEC: + return 100000; + default: + return 1; + } +} + +static long ehca_hcall_7arg_7ret(unsigned long opcode, + unsigned long arg1, + unsigned long arg2, + unsigned long arg3, + unsigned long arg4, + unsigned long arg5, + unsigned long arg6, + unsigned long arg7, + unsigned long *out1, + unsigned long *out2, + 
unsigned long *out3, + unsigned long *out4, + unsigned long *out5, + unsigned long *out6, + unsigned long *out7) +{ + long ret = H_SUCCESS; + int i, sleep_msecs; + + EDEB_EN(7, "opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx" + " arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5, + arg6, arg7); + + for (i = 0; i < 5; i++) { + ret = plpar_hcall_7arg_7ret(opcode, + arg1, arg2, arg3, arg4, + arg5, arg6, arg7, + out1, out2, out3, out4, + out5, out6,out7); + + if (H_IS_LONG_BUSY(ret)) { + sleep_msecs = get_longbusy_msecs(ret); + msleep_interruptible(sleep_msecs); + continue; + } + + if (ret < H_SUCCESS) + EDEB_ERR(4, "opcode=%lx ret=%lx" + " arg1=%lx arg2=%lx arg3=%lx arg4=%lx" + " arg5=%lx arg6=%lx arg7=%lx" + " out1=%lx out2=%lx out3=%lx out4=%lx" + " out5=%lx out6=%lx out7=%lx", + opcode, ret, + arg1, arg2, arg3, arg4, + arg5, arg6, arg7, + *out1, *out2, *out3, *out4, + *out5, *out6, *out7); + + EDEB_EX(7, "opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx " + "out4=%lx out5=%lx out6=%lx out7=%lx", + opcode, ret, *out1, *out2, *out3, *out4, *out5, + *out6, *out7); + return ret; + } + + EDEB_EX(7, "opcode=%lx ret=H_BUSY", opcode); + + return H_BUSY; +} + +static long ehca_hcall_9arg_9ret(unsigned long opcode, + unsigned long arg1, + unsigned long arg2, + unsigned long arg3, + unsigned long arg4, + unsigned long arg5, + unsigned long arg6, + unsigned long arg7, + unsigned long arg8, + unsigned long arg9, + unsigned long *out1, + unsigned long *out2, + unsigned long *out3, + unsigned long *out4, + unsigned long *out5, + unsigned long *out6, + unsigned long *out7, + unsigned long *out8, + unsigned long *out9) +{ + long ret = H_SUCCESS; + int i, sleep_msecs; + + EDEB_EN(7, "opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx " + "arg5=%lx arg6=%lx arg7=%lx arg8=%lx arg9=%lx", + opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7, + arg8, arg9); + + + for (i = 0; i < 5; i++) { + ret = plpar_hcall_9arg_9ret(opcode, + arg1, arg2, arg3, arg4, + arg5, arg6, arg7, arg8, + 
arg9, + out1, out2, out3, out4, + out5, out6, out7, out8, + out9); + + if (H_IS_LONG_BUSY(ret)) { + sleep_msecs = get_longbusy_msecs(ret); + msleep_interruptible(sleep_msecs); + continue; + } + + if (ret < H_SUCCESS) + EDEB_ERR(4, "opcode=%lx ret=%lx" + " arg1=%lx arg2=%lx arg3=%lx arg4=%lx" + " arg5=%lx arg6=%lx arg7=%lx arg8=%lx" + " arg9=%lx" + " out1=%lx out2=%lx out3=%lx out4=%lx" + " out5=%lx out6=%lx out7=%lx out8=%lx" + " out9=%lx", + opcode, ret, + arg1, arg2, arg3, arg4, + arg5, arg6, arg7, arg8, + arg9, + *out1, *out2, *out3, *out4, + *out5, *out6, *out7, *out8, + *out9); + + EDEB_EX(7, "opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx " + "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx out9=%lx", + opcode, ret,*out1, *out2, *out3, *out4, *out5, *out6, + *out7, *out8, *out9); + return ret; + + } + + EDEB_EX(7, "opcode=%lx ret=H_BUSY", opcode); + return H_BUSY; +} + +u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle, + struct ehca_pfeq *pfeq, + const u32 neq_control, + const u32 number_of_entries, + struct ipz_eq_handle *eq_handle, + u32 * act_nr_of_entries, + u32 * act_pages, + u32 * eq_ist) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 act_nr_of_entries_out = 0; + u64 act_pages_out = 0; + u64 eq_ist_out = 0; + u64 allocate_controls = 0; + u32 x = (u64)(&x); + + EDEB_EN(7, "pfeq=%p adapter_handle=%lx new_control=%x" + " number_of_entries=%x", + pfeq, adapter_handle.handle, neq_control, + number_of_entries); + + /* resource type */ + allocate_controls = 3ULL; + + /* ISN is associated */ + if (neq_control != 1) + allocate_controls = (1ULL << (63 - 7)) | allocate_controls; + else /* notification event queue */ + allocate_controls = (1ULL << 63) | allocate_controls; + + ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, + adapter_handle.handle, /* r4 */ + allocate_controls, /* r5 */ + number_of_entries, /* r6 */ + 0, 0, 0, 0, + &eq_handle->handle, /* r4 */ + &dummy, /* r5 */ + &dummy, /* r6 */ + &act_nr_of_entries_out, /* r7 */ + 
&act_pages_out, /* r8 */ + &eq_ist_out, /* r8 */ + &dummy); + + *act_nr_of_entries = (u32)act_nr_of_entries_out; + *act_pages = (u32)act_pages_out; + *eq_ist = (u32)eq_ist_out; + + if (ret == H_NOT_ENOUGH_RESOURCES) + EDEB_ERR(4, "Not enough resource - ret=%lx ", ret); + + EDEB_EX(7, "act_nr_of_entries=%x act_pages=%x eq_ist=%x", + *act_nr_of_entries, *act_pages, *eq_ist); + + return ret; +} + +u64 hipz_h_reset_event(const struct ipz_adapter_handle adapter_handle, + struct ipz_eq_handle eq_handle, + const u64 event_mask) +{ + u64 ret = H_SUCCESS; + u64 dummy; + + EDEB_EN(7, "eq_handle=%lx, adapter_handle=%lx event_mask=%lx", + eq_handle.handle, adapter_handle.handle, event_mask); + + ret = ehca_hcall_7arg_7ret(H_RESET_EVENTS, + adapter_handle.handle, /* r4 */ + eq_handle.handle, /* r5 */ + event_mask, /* r6 */ + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + EDEB(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle, + struct ehca_cq *cq, + struct ehca_alloc_cq_parms *param) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 act_nr_of_entries_out; + u64 act_pages_out; + u64 g_la_privileged_out; + u64 g_la_user_out; + + EDEB_EN(7, "Adapter_handle=%lx eq_handle=%lx cq_token=%x" + " cq_number_of_entries=%x", + adapter_handle.handle, param->eq_handle.handle, + cq->token, param->nr_cqe); + + ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, + adapter_handle.handle, /* r4 */ + 2, /* r5 */ + param->eq_handle.handle, /* r6 */ + cq->token, /* r7 */ + param->nr_cqe, /* r8 */ + 0, 0, + &cq->ipz_cq_handle.handle, /* r4 */ + &dummy, /* r5 */ + &dummy, /* r6 */ + &act_nr_of_entries_out, /* r7 */ + &act_pages_out, /* r8 */ + &g_la_privileged_out, /* r9 */ + &g_la_user_out); /* r10 */ + + param->act_nr_of_entries = (u32)act_nr_of_entries_out; + param->act_pages = (u32)act_pages_out; + + if (ret == H_SUCCESS) + hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out); + + if (ret == 
H_NOT_ENOUGH_RESOURCES) + EDEB_ERR(4, "Not enough resources. ret=%lx", ret); + + EDEB_EX(7, "cq_handle=%lx act_nr_of_entries=%x act_pages=%x", + cq->ipz_cq_handle.handle, param->act_nr_of_entries, param->act_pages); + + return ret; +} + +u64 hipz_h_alloc_resource_qp(const struct ipz_adapter_handle adapter_handle, + struct ehca_qp *qp, + struct ehca_alloc_qp_parms *parms) +{ + u64 ret = H_SUCCESS; + u64 allocate_controls; + u64 max_r10_reg; + u64 dummy = 0; + u64 qp_nr_out = 0; + u64 r6_out = 0; + u64 r7_out = 0; + u64 r8_out = 0; + u64 g_la_user_out = 0; + u64 r11_out = 0; + u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1; + u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1; + int daqp_ctrl = parms->daqp_ctrl; + + EDEB_EN(7, "Adapter_handle=%lx servicetype=%x signalingtype=%x" + " ud_av_l_key=%x send_cq_handle=%lx receive_cq_handle=%lx" + " async_eq_handle=%lx qp_token=%x pd=%x max_nr_send_wqes=%x" + " max_nr_receive_wqes=%x max_nr_send_sges=%x" + " max_nr_receive_sges=%x ud_av_l_key=%x galpa.pid=%x", + adapter_handle.handle, parms->servicetype, parms->sigtype, + parms->ud_av_l_key_ctl, qp->send_cq->ipz_cq_handle.handle, + qp->recv_cq->ipz_cq_handle.handle, parms->ipz_eq_handle.handle, + qp->token, parms->pd.value, max_nr_send_wqes, + max_nr_receive_wqes, parms->max_send_sge, parms->max_recv_sge, + parms->ud_av_l_key_ctl, qp->galpas.pid); + + allocate_controls = + EHCA_BMASK_SET(H_ALL_RES_QP_ENHANCED_OPS, + (daqp_ctrl & DAQP_CTRL_ENABLE) ? 1 : 0) + | EHCA_BMASK_SET(H_ALL_RES_QP_PTE_PIN, 0) + | EHCA_BMASK_SET(H_ALL_RES_QP_SERVICE_TYPE, parms->servicetype) + | EHCA_BMASK_SET(H_ALL_RES_QP_SIGNALING_TYPE, parms->sigtype) + | EHCA_BMASK_SET(H_ALL_RES_QP_LL_RQ_CQE_POSTING, + (daqp_ctrl & DAQP_CTRL_RECV_COMP) ? 1 : 0) + | EHCA_BMASK_SET(H_ALL_RES_QP_LL_SQ_CQE_POSTING, + (daqp_ctrl & DAQP_CTRL_SEND_COMP) ? 
1 : 0) + | EHCA_BMASK_SET(H_ALL_RES_QP_UD_AV_LKEY_CTRL, + parms->ud_av_l_key_ctl) + | EHCA_BMASK_SET(H_ALL_RES_QP_RESOURCE_TYPE, 1); + + max_r10_reg = + EHCA_BMASK_SET(H_ALL_RES_QP_MAX_OUTST_SEND_WR, + max_nr_send_wqes) + | EHCA_BMASK_SET(H_ALL_RES_QP_MAX_OUTST_RECV_WR, + max_nr_receive_wqes) + | EHCA_BMASK_SET(H_ALL_RES_QP_MAX_SEND_SGE, + parms->max_send_sge) + | EHCA_BMASK_SET(H_ALL_RES_QP_MAX_RECV_SGE, + parms->max_recv_sge); + + + ret = ehca_hcall_9arg_9ret(H_ALLOC_RESOURCE, + adapter_handle.handle, /* r4 */ + allocate_controls, /* r5 */ + qp->send_cq->ipz_cq_handle.handle, + qp->recv_cq->ipz_cq_handle.handle, + parms->ipz_eq_handle.handle, + ((u64)qp->token << 32) | parms->pd.value, + max_r10_reg, /* r10 */ + parms->ud_av_l_key_ctl, /* r11 */ + 0, + &qp->ipz_qp_handle.handle, + &qp_nr_out, /* r5 */ + &r6_out, /* r6 */ + &r7_out, /* r7 */ + &r8_out, /* r8 */ + &dummy, /* r9 */ + &g_la_user_out, /* r10 */ + &r11_out, + &dummy); + + /* extract outputs */ + qp->real_qp_num = (u32)qp_nr_out; + + parms->act_nr_send_wqes = + (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, r6_out); + parms->act_nr_recv_wqes = + (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, r6_out); + parms->act_nr_send_sges = + (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, r7_out); + parms->act_nr_recv_sges = + (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, r7_out); + parms->nr_sq_pages = + (u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, r8_out); + parms->nr_rq_pages = + (u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, r8_out); + + if (ret == H_SUCCESS) + hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out); + + if (ret == H_NOT_ENOUGH_RESOURCES) + EDEB_ERR(4, "Not enough resources. 
ret=%lx",ret); + + EDEB_EX(7, "qp_nr=%x act_nr_send_wqes=%x" + " act_nr_receive_wqes=%x act_nr_send_sges=%x" + " act_nr_receive_sges=%x nr_sq_pages=%x" + " nr_rq_pages=%x galpa.user=%lx galpa.kernel=%lx", + qp->real_qp_num, parms->act_nr_send_wqes, + parms->act_nr_recv_wqes, parms->act_nr_send_sges, + parms->act_nr_recv_sges, parms->nr_sq_pages, + parms->nr_rq_pages, qp->galpas.user.fw_handle, + qp->galpas.kernel.fw_handle); + + return ret; +} + +u64 hipz_h_query_port(const struct ipz_adapter_handle adapter_handle, + const u8 port_id, + struct hipz_query_port *query_port_response_block) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 r_cb; + + EDEB_EN(7, "adapter_handle=%lx port_id %x", + adapter_handle.handle, port_id); + + if (((u64)query_port_response_block) & 0xfff) { + EDEB_ERR(4, "response block not page aligned"); + ret = H_PARAMETER; + return ret; + } + + r_cb = virt_to_abs(query_port_response_block); + + ret = ehca_hcall_7arg_7ret(H_QUERY_PORT, + adapter_handle.handle, /* r4 */ + port_id, /* r5 */ + r_cb, /* r6 */ + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + EDEB_DMP(7, query_port_response_block, 64, "query_port_response_block"); + EDEB(7, "offset31=%x offset35=%x offset36=%x", + ((u32*)query_port_response_block)[32], + ((u32*)query_port_response_block)[36], + ((u32*)query_port_response_block)[37]); + EDEB(7, "offset200=%x offset201=%x offset202=%x " + "offset203=%x", + ((u32*)query_port_response_block)[0x200], + ((u32*)query_port_response_block)[0x201], + ((u32*)query_port_response_block)[0x202], + ((u32*)query_port_response_block)[0x203]); + + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle, + struct hipz_query_hca *query_hca_rblock) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 r_cb; + EDEB_EN(7, "adapter_handle=%lx", adapter_handle.handle); + + if (((u64)query_hca_rblock) & 0xfff) { + EDEB_ERR(4, "response block not page aligned"); + ret = 
H_PARAMETER; + return ret; + } + + r_cb = virt_to_abs(query_hca_rblock); + + ret = ehca_hcall_7arg_7ret(H_QUERY_HCA, + adapter_handle.handle, /* r4 */ + r_cb, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + EDEB(7, "offset0=%x offset1=%x offset2=%x offset3=%x", + ((u32*)query_hca_rblock)[0], + ((u32*)query_hca_rblock)[1], + ((u32*)query_hca_rblock)[2], ((u32*)query_hca_rblock)[3]); + EDEB(7, "offset4=%x offset5=%x offset6=%x offset7=%x", + ((u32*)query_hca_rblock)[4], + ((u32*)query_hca_rblock)[5], + ((u32*)query_hca_rblock)[6], ((u32*)query_hca_rblock)[7]); + EDEB(7, "offset8=%x offset9=%x offseta=%x offsetb=%x", + ((u32*)query_hca_rblock)[8], + ((u32*)query_hca_rblock)[9], + ((u32*)query_hca_rblock)[10], ((u32*)query_hca_rblock)[11]); + EDEB(7, "offsetc=%x offsetd=%x offsete=%x offsetf=%x", + ((u32*)query_hca_rblock)[12], + ((u32*)query_hca_rblock)[13], + ((u32*)query_hca_rblock)[14], ((u32*)query_hca_rblock)[15]); + EDEB(7, "offset136=%x offset192=%x offset204=%x", + ((u32*)query_hca_rblock)[32], + ((u32*)query_hca_rblock)[48], ((u32*)query_hca_rblock)[51]); + EDEB(7, "offset231=%x offset235=%x", + ((u32*)query_hca_rblock)[57], ((u32*)query_hca_rblock)[58]); + EDEB(7, "offset200=%x offset201=%x offset202=%x offset203=%x", + ((u32*)query_hca_rblock)[0x201], + ((u32*)query_hca_rblock)[0x202], + ((u32*)query_hca_rblock)[0x203], + ((u32*)query_hca_rblock)[0x204]); + + EDEB_EX(7, "ret=%lx adapter_handle=%lx", + ret, adapter_handle.handle); + + return ret; +} + +u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle, + const u8 pagesize, + const u8 queue_type, + const u64 resource_handle, + const u64 logical_address_of_page, + u64 count) +{ + u64 ret = H_SUCCESS; + u64 dummy; + + EDEB_EN(7, "adapter_handle=%lx pagesize=%x queue_type=%x" + " resource_handle=%lx logical_address_of_page=%lx count=%lx", + adapter_handle.handle, pagesize, queue_type, + resource_handle, logical_address_of_page, count); + 
+ ret = ehca_hcall_7arg_7ret(H_REGISTER_RPAGES, + adapter_handle.handle, /* r4 */ + queue_type | pagesize << 8, /* r5 */ + resource_handle, /* r6 */ + logical_address_of_page, /* r7 */ + count, /* r8 */ + 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle, + const struct ipz_eq_handle eq_handle, + struct ehca_pfeq *pfeq, + const u8 pagesize, + const u8 queue_type, + const u64 logical_address_of_page, + const u64 count) +{ + u64 ret = H_SUCCESS; + + EDEB_EN(7, "pfeq=%p adapter_handle=%lx eq_handle=%lx pagesize=%x" + " queue_type=%x logical_address_of_page=%lx count=%lx", + pfeq, adapter_handle.handle, eq_handle.handle, pagesize, + queue_type, logical_address_of_page, count); + + if (count != 1) { + EDEB_ERR(4, "Page counter=%lx", count); + return H_PARAMETER; + } + ret = hipz_h_register_rpage(adapter_handle, + pagesize, + queue_type, + eq_handle.handle, + logical_address_of_page, count); + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle, + u32 ist) +{ + u32 ret = H_SUCCESS; + u64 dummy = 0; + + EDEB_EN(7, "ist=%x", ist); + + ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE, + adapter_handle.handle, /* r4 */ + ist, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + if (ret != H_SUCCESS && ret != H_BUSY) + EDEB_ERR(4, "Could not query interrupt state."); + + EDEB_EX(7, "interrupt state: %x", ret); + + return ret; +} + +u64 hipz_h_register_rpage_cq(const struct ipz_adapter_handle adapter_handle, + const struct ipz_cq_handle cq_handle, + struct ehca_pfcq *pfcq, + const u8 pagesize, + const u8 queue_type, + const u64 logical_address_of_page, + const u64 count, + const struct h_galpa gal) +{ + u64 ret = H_SUCCESS; + + EDEB_EN(7, "pfcq=%p adapter_handle=%lx cq_handle=%lx pagesize=%x" + " 
queue_type=%x logical_address_of_page=%lx count=%lx", + pfcq, adapter_handle.handle, cq_handle.handle, pagesize, + queue_type, logical_address_of_page, count); + + if (count != 1) { + EDEB_ERR(4, "Page counter=%lx", count); + return H_PARAMETER; + } + + ret = hipz_h_register_rpage(adapter_handle, pagesize, queue_type, + cq_handle.handle, logical_address_of_page, + count); + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_register_rpage_qp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct ehca_pfqp *pfqp, + const u8 pagesize, + const u8 queue_type, + const u64 logical_address_of_page, + const u64 count, + const struct h_galpa galpa) +{ + u64 ret = H_SUCCESS; + + EDEB_EN(7, "pfqp=%p adapter_handle=%lx qp_handle=%lx pagesize=%x" + " queue_type=%x logical_address_of_page=%lx count=%lx", + pfqp, adapter_handle.handle, qp_handle.handle, pagesize, + queue_type, logical_address_of_page, count); + + if (count != 1) { + EDEB_ERR(4, "Page counter=%lx", count); + return H_PARAMETER; + } + + ret = hipz_h_register_rpage(adapter_handle,pagesize,queue_type, + qp_handle.handle,logical_address_of_page, + count); + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_disable_and_get_wqe(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct ehca_pfqp *pfqp, + void **log_addr_next_sq_wqe2processed, + void **log_addr_next_rq_wqe2processed, + int dis_and_get_function_code) +{ + u64 ret = H_SUCCESS; + u8 function_code = 1; + u64 dummy, dummy1, dummy2; + + EDEB_EN(7, "pfqp=%p adapter_handle=%lx function=%x qp_handle=%lx", + pfqp, adapter_handle.handle, function_code, qp_handle.handle); + + if (!log_addr_next_sq_wqe2processed) + log_addr_next_sq_wqe2processed = (void**)&dummy1; + if (!log_addr_next_rq_wqe2processed) + log_addr_next_rq_wqe2processed = (void**)&dummy2; + + ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC, + adapter_handle.handle, /* r4 */ + dis_and_get_function_code, /* 
r5 */ + qp_handle.handle, /* r6 */ + 0, 0, 0, 0, + (void*)log_addr_next_sq_wqe2processed, + (void*)log_addr_next_rq_wqe2processed, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + EDEB_EX(7, "ret=%lx ladr_next_sq_wqe_out=%p" + " ladr_next_rq_wqe_out=%p", ret, + *log_addr_next_sq_wqe2processed, + *log_addr_next_rq_wqe2processed); + + return ret; +} + +u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct ehca_pfqp *pfqp, + const u64 update_mask, + struct hcp_modify_qp_control_block *mqpcb, + struct h_galpa gal) +{ + u64 ret = H_SUCCESS; + u64 invalid_attribute_identifier = 0; + u64 rc_attrib_mask = 0; + u64 dummy; + u64 r_cb; + EDEB_EN(7, "pfqp=%p adapter_handle=%lx qp_handle=%lx" + " update_mask=%lx qp_state=%x mqpcb=%p", + pfqp, adapter_handle.handle, qp_handle.handle, + update_mask, mqpcb->qp_state, mqpcb); + + r_cb = virt_to_abs(mqpcb); + ret = ehca_hcall_7arg_7ret(H_MODIFY_QP, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + update_mask, /* r6 */ + r_cb, /* r7 */ + 0, 0, 0, + &invalid_attribute_identifier, /* r4 */ + &dummy, /* r5 */ + &dummy, /* r6 */ + &dummy, /* r7 */ + &dummy, /* r8 */ + &rc_attrib_mask, /* r9 */ + &dummy); + if (ret == H_NOT_ENOUGH_RESOURCES) + EDEB_ERR(4, "Insufficient resources ret=%lx", ret); + + EDEB_EX(7, "ret=%lx invalid_attribute_identifier=%lx" + " invalid_attribute_MASK=%lx", ret, + invalid_attribute_identifier, rc_attrib_mask); + + return ret; +} + +u64 hipz_h_query_qp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct ehca_pfqp *pfqp, + struct hcp_modify_qp_control_block *qqpcb, + struct h_galpa gal) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 r_cb; + EDEB_EN(7, "adapter_handle=%lx qp_handle=%lx", + adapter_handle.handle, qp_handle.handle); + + r_cb = virt_to_abs(qqpcb); + EDEB(7, "r_cb=%lx", r_cb); + + ret = ehca_hcall_7arg_7ret(H_QUERY_QP, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 
*/ + r_cb, /* r6 */ + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle, + struct ehca_qp *qp) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 ladr_next_sq_wqe_out; + u64 ladr_next_rq_wqe_out; + + EDEB_EN(7, "qp=%p ipz_qp_handle=%lx adapter_handle=%lx", + qp, qp->ipz_qp_handle.handle, adapter_handle.handle); + + ret = hcp_galpas_dtor(&qp->galpas); + if (ret) { + EDEB_ERR(4, "Could not destruct qp->galpas"); + return H_RESOURCE; + } + ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC, + adapter_handle.handle, /* r4 */ + /* function code */ + 1, /* r5 */ + qp->ipz_qp_handle.handle, /* r6 */ + 0, 0, 0, 0, + &ladr_next_sq_wqe_out, /* r4 */ + &ladr_next_rq_wqe_out, /* r5 */ + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + if (ret == H_HARDWARE) + EDEB_ERR(4, "HCA not operational. ret=%lx", ret); + + ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + qp->ipz_qp_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + if (ret == H_RESOURCE) + EDEB_ERR(4, "Resource still in use. 
ret=%lx", ret); + + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_define_aqp0(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct h_galpa gal, + u32 port) +{ + u64 ret = H_SUCCESS; + u64 dummy; + + EDEB_EN(7, "port=%x ipz_qp_handle=%lx adapter_handle=%lx", + port, qp_handle.handle, adapter_handle.handle); + + ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP0, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + port, /* r6 */ + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct h_galpa gal, + u32 port, u32 * pma_qp_nr, + u32 * bma_qp_nr) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 pma_qp_nr_out; + u64 bma_qp_nr_out; + + EDEB_EN(7, "port=%x qp_handle=%lx adapter_handle=%lx", + port, qp_handle.handle, adapter_handle.handle); + + ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + port, /* r6 */ + 0, 0, 0, 0, + &pma_qp_nr_out, /* r4 */ + &bma_qp_nr_out, /* r5 */ + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + *pma_qp_nr = (u32)pma_qp_nr_out; + *bma_qp_nr = (u32)bma_qp_nr_out; + + if (ret == H_ALIAS_EXIST) + EDEB_ERR(4, "AQP1 already exists. ret=%lx", ret); + + EDEB_EX(7, "ret=%lx pma_qp_nr=%i bma_qp_nr=%i", + ret, (int)*pma_qp_nr, (int)*bma_qp_nr); + + return ret; +} + +u64 hipz_h_attach_mcqp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct h_galpa gal, + u16 mcg_dlid, + u64 subnet_prefix, u64 interface_id) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u8 *dgid_sp = (u8*)&subnet_prefix; + u8 *dgid_ii = (u8*)&interface_id; + + EDEB_EN(7, "qp_handle=%lx adapter_handle=%lx\nMCG_DGID =" + " %d.%d.%d.%d.%d.%d.%d.%d." 
+ " %d.%d.%d.%d.%d.%d.%d.%d", + qp_handle.handle, adapter_handle.handle, + dgid_sp[0], dgid_sp[1], + dgid_sp[2], dgid_sp[3], + dgid_sp[4], dgid_sp[5], + dgid_sp[6], dgid_sp[7], + dgid_ii[0], dgid_ii[1], + dgid_ii[2], dgid_ii[3], + dgid_ii[4], dgid_ii[5], + dgid_ii[6], dgid_ii[7]); + + ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + mcg_dlid, /* r6 */ + interface_id, /* r7 */ + subnet_prefix, /* r8 */ + 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + if (ret == H_NOT_ENOUGH_RESOURCES) + EDEB_ERR(4, "Not enough resources. ret=%lx", ret); + + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_detach_mcqp(const struct ipz_adapter_handle adapter_handle, + const struct ipz_qp_handle qp_handle, + struct h_galpa gal, + u16 mcg_dlid, + u64 subnet_prefix, u64 interface_id) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u8 *dgid_sp = (u8*)&subnet_prefix; + u8 *dgid_ii = (u8*)&interface_id; + + EDEB_EN(7, "qp_handle=%lx adapter_handle=%lx\nMCG_DGID =" + " %d.%d.%d.%d.%d.%d.%d.%d." 
+ " %d.%d.%d.%d.%d.%d.%d.%d", + qp_handle.handle, adapter_handle.handle, + dgid_sp[0], dgid_sp[1], + dgid_sp[2], dgid_sp[3], + dgid_sp[4], dgid_sp[5], + dgid_sp[6], dgid_sp[7], + dgid_ii[0], dgid_ii[1], + dgid_ii[2], dgid_ii[3], + dgid_ii[4], dgid_ii[5], + dgid_ii[6], dgid_ii[7]); + ret = ehca_hcall_7arg_7ret(H_DETACH_MCQP, + adapter_handle.handle, /* r4 */ + qp_handle.handle, /* r5 */ + mcg_dlid, /* r6 */ + interface_id, /* r7 */ + subnet_prefix, /* r8 */ + 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + EDEB(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle, + struct ehca_cq *cq, + u8 force_flag) +{ + u64 ret = H_SUCCESS; + u64 dummy; + + EDEB_EN(7, "cq->pf=%p cq=%p ipz_cq_handle=%lx adapter_handle=%lx", + &cq->pf, cq, cq->ipz_cq_handle.handle, adapter_handle.handle); + + ret = hcp_galpas_dtor(&cq->galpas); + if (ret) { + EDEB_ERR(4, "Could not destruct cq->galpas"); + return H_RESOURCE; + } + + ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + cq->ipz_cq_handle.handle, /* r5 */ + force_flag != 0 ? 
1L : 0L, /* r6 */ + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + if (ret == H_RESOURCE) + EDEB(4, "ret=%lx ", ret); + + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_destroy_eq(const struct ipz_adapter_handle adapter_handle, + struct ehca_eq *eq) +{ + u64 ret = H_SUCCESS; + u64 dummy; + + EDEB_EN(7, "eq->pf=%p eq=%p ipz_eq_handle=%lx adapter_handle=%lx", + &eq->pf, eq, eq->ipz_eq_handle.handle, + adapter_handle.handle); + + ret = hcp_galpas_dtor(&eq->galpas); + if (ret) { + EDEB_ERR(4, "Could not destruct eq->galpas"); + return H_RESOURCE; + } + + ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + eq->ipz_eq_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + + if (ret == H_RESOURCE) + EDEB_ERR(4, "Resource in use. ret=%lx ", ret); + + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_alloc_resource_mr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + const u64 vaddr, + const u64 length, + const u32 access_ctrl, + const struct ipz_pd pd, + struct ehca_mr_hipzout_parms *outparms) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 lkey_out; + u64 rkey_out; + + EDEB_EN(7, "adapter_handle=%lx mr=%p vaddr=%lx length=%lx" + " access_ctrl=%x pd=%x", + adapter_handle.handle, mr, vaddr, length, access_ctrl, + pd.value); + + ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, + adapter_handle.handle, /* r4 */ + 5, /* r5 */ + vaddr, /* r6 */ + length, /* r7 */ + (((u64)access_ctrl) << 32ULL), /* r8 */ + pd.value, /* r9 */ + 0, + &(outparms->handle.handle), /* r4 */ + &dummy, /* r5 */ + &lkey_out, /* r6 */ + &rkey_out, /* r7 */ + &dummy, + &dummy, + &dummy); + outparms->lkey = (u32)lkey_out; + outparms->rkey = (u32)rkey_out; + + EDEB_EX(7, "ret=%lx mr_handle=%lx lkey=%x rkey=%x", + ret, outparms->handle.handle, outparms->lkey, outparms->rkey); + + return ret; +} + +u64 hipz_h_register_rpage_mr(const 
struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + const u8 pagesize, + const u8 queue_type, + const u64 logical_address_of_page, + const u64 count) +{ + u64 ret = H_SUCCESS; + + EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx pagesize=%x" + " queue_type=%x logical_address_of_page=%lx count=%lx", + adapter_handle.handle, mr, mr->ipz_mr_handle.handle, pagesize, + queue_type, logical_address_of_page, count); + + if ((count > 1) && (logical_address_of_page & 0xfff)) { + EDEB_ERR(4, "logical_address_of_page not on a 4k boundary " + "adapter_handle=%lx mr=%p mr_handle=%lx " + "pagesize=%x queue_type=%x logical_address_of_page=%lx" + " count=%lx", + adapter_handle.handle, mr, mr->ipz_mr_handle.handle, + pagesize, queue_type, logical_address_of_page, count); + ret = H_PARAMETER; + } else + ret = hipz_h_register_rpage(adapter_handle, pagesize, + queue_type, + mr->ipz_mr_handle.handle, + logical_address_of_page, count); + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_query_mr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + struct ehca_mr_hipzout_parms *outparms) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 remote_len_out; + u64 remote_vaddr_out; + u64 acc_ctrl_pd_out; + u64 r9_out; + + EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx", + adapter_handle.handle, mr, mr->ipz_mr_handle.handle); + + ret = ehca_hcall_7arg_7ret(H_QUERY_MR, + adapter_handle.handle, /* r4 */ + mr->ipz_mr_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, + &outparms->len, /* r4 */ + &outparms->vaddr, /* r5 */ + &remote_len_out, /* r6 */ + &remote_vaddr_out, /* r7 */ + &acc_ctrl_pd_out, /* r8 */ + &r9_out, + &dummy); + + outparms->acl = acc_ctrl_pd_out >> 32; + outparms->lkey = (u32)(r9_out >> 32); + outparms->rkey = (u32)(r9_out & (0xffffffff)); + + EDEB_EX(7, "ret=%lx mr_local_length=%lx mr_local_vaddr=%lx " + "mr_remote_length=%lx mr_remote_vaddr=%lx access_ctrl=%x " + "pd=%x lkey=%x rkey=%x", ret, outparms->len, + outparms->vaddr, 
remote_len_out, remote_vaddr_out, + outparms->acl, outparms->acl, outparms->lkey, outparms->rkey); + + return ret; +} + +u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr) +{ + u64 ret = H_SUCCESS; + u64 dummy; + + EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx", + adapter_handle.handle, mr, mr->ipz_mr_handle.handle); + + ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + mr->ipz_mr_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + const u64 vaddr_in, + const u64 length, + const u32 access_ctrl, + const struct ipz_pd pd, + const u64 mr_addr_cb, + struct ehca_mr_hipzout_parms *outparms) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 lkey_out; + u64 rkey_out; + + EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx vaddr_in=%lx " + "length=%lx access_ctrl=%x pd=%x mr_addr_cb=%lx", + adapter_handle.handle, mr, mr->ipz_mr_handle.handle, vaddr_in, + length, access_ctrl, pd.value, mr_addr_cb); + + ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR, + adapter_handle.handle, /* r4 */ + mr->ipz_mr_handle.handle, /* r5 */ + vaddr_in, /* r6 */ + length, /* r7 */ + /* r8 */ + ((((u64)access_ctrl) << 32ULL) | pd.value), + mr_addr_cb, /* r9 */ + 0, + &dummy, /* r4 */ + &outparms->vaddr, /* r5 */ + &lkey_out, /* r6 */ + &rkey_out, /* r7 */ + &dummy, + &dummy, + &dummy); + + outparms->lkey = (u32)lkey_out; + outparms->rkey = (u32)rkey_out; + + EDEB_EX(7, "ret=%lx vaddr=%lx lkey=%x rkey=%x", + ret, outparms->vaddr, outparms->lkey, outparms->rkey); + return ret; +} + +u64 hipz_h_register_smr(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mr *mr, + const struct ehca_mr *orig_mr, + const u64 vaddr_in, + const u32 access_ctrl, + const struct ipz_pd pd, + struct 
ehca_mr_hipzout_parms *outparms) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 lkey_out; + u64 rkey_out; + + EDEB_EN(7, "adapter_handle=%lx orig_mr=%p orig_mr_handle=%lx " + "vaddr_in=%lx access_ctrl=%x pd=%x", adapter_handle.handle, + orig_mr, orig_mr->ipz_mr_handle.handle, vaddr_in, access_ctrl, + pd.value); + + + ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR, + adapter_handle.handle, /* r4 */ + orig_mr->ipz_mr_handle.handle, /* r5 */ + vaddr_in, /* r6 */ + (((u64)access_ctrl) << 32ULL), /* r7 */ + pd.value, /* r8 */ + 0, 0, + &(outparms->handle.handle), /* r4 */ + &dummy, /* r5 */ + &lkey_out, /* r6 */ + &rkey_out, /* r7 */ + &dummy, + &dummy, + &dummy); + outparms->lkey = (u32)lkey_out; + outparms->rkey = (u32)rkey_out; + + EDEB_EX(7, "ret=%lx mr_handle=%lx lkey=%x rkey=%x", + ret, outparms->handle.handle, outparms->lkey, outparms->rkey); + + return ret; +} + +u64 hipz_h_alloc_resource_mw(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mw *mw, + const struct ipz_pd pd, + struct ehca_mw_hipzout_parms *outparms) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 rkey_out; + + EDEB_EN(7, "adapter_handle=%lx mw=%p pd=%x", + adapter_handle.handle, mw, pd.value); + + ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE, + adapter_handle.handle, /* r4 */ + 6, /* r5 */ + pd.value, /* r6 */ + 0, 0, 0, 0, + &(outparms->handle.handle), /* r4 */ + &dummy, /* r5 */ + &dummy, /* r6 */ + &rkey_out, /* r7 */ + &dummy, + &dummy, + &dummy); + + outparms->rkey = (u32)rkey_out; + + EDEB_EX(7, "ret=%lx mw_handle=%lx rkey=%x", + ret, outparms->handle.handle, outparms->rkey); + return ret; +} + +u64 hipz_h_query_mw(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mw *mw, + struct ehca_mw_hipzout_parms *outparms) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 pd_out; + u64 rkey_out; + + EDEB_EN(7, "adapter_handle=%lx mw=%p mw_handle=%lx", + adapter_handle.handle, mw, mw->ipz_mw_handle.handle); + + ret = ehca_hcall_7arg_7ret(H_QUERY_MW, + adapter_handle.handle, /* r4 
*/ + mw->ipz_mw_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, /* r4 */ + &dummy, /* r5 */ + &dummy, /* r6 */ + &rkey_out, /* r7 */ + &pd_out, /* r8 */ + &dummy, + &dummy); + outparms->rkey = (u32)rkey_out; + + EDEB_EX(7, "ret=%lx rkey=%x pd=%lx", ret, outparms->rkey, pd_out); + + return ret; +} + +u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle, + const struct ehca_mw *mw) +{ + u64 ret = H_SUCCESS; + u64 dummy; + + EDEB_EN(7, "adapter_handle=%lx mw=%p mw_handle=%lx", + adapter_handle.handle, mw, mw->ipz_mw_handle.handle); + + ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE, + adapter_handle.handle, /* r4 */ + mw->ipz_mw_handle.handle, /* r5 */ + 0, 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} + +u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle, + const u64 ressource_handle, + void *rblock, + unsigned long *byte_count) +{ + u64 ret = H_SUCCESS; + u64 dummy; + u64 r_cb; + + EDEB_EN(7, "adapter_handle=%lx ressource_handle=%lx rblock=%p", + adapter_handle.handle, ressource_handle, rblock); + + if (((u64)rblock) & 0xfff) { + EDEB_ERR(4, "rblock not page aligned."); + ret = H_PARAMETER; + return ret; + } + + r_cb = virt_to_abs(rblock); + + ret = ehca_hcall_7arg_7ret(H_ERROR_DATA, + adapter_handle.handle, + ressource_handle, + r_cb, + 0, 0, 0, 0, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy, + &dummy); + + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} From schihei at de.ibm.com Mon May 15 10:42:38 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:42:38 +0200 Subject: [openib-general] [PATCH 11/16] ehca: queue pair Message-ID: <4468BD8E.7010102@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ehca_qes.h | 274 +++++ drivers/infiniband/hw/ehca/ehca_qp.c | 1565 +++++++++++++++++++++++++++++++++ drivers/infiniband/hw/ehca/ehca_reqs.c | 683 ++++++++++++++ 
drivers/infiniband/hw/ehca/ehca_sqp.c | 123 ++ 4 files changed, 2645 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_qes.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_qes.h 2006-05-02 10:55:26.000000000 +0200 @@ -0,0 +1,274 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Hardware request structures + * + * Authors: Waleri Fomin + * Reinhard Ernst + * Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + + +#ifndef _EHCA_QES_H_ +#define _EHCA_QES_H_ + +#include "ehca_tools.h" + +/** + * virtual scatter gather entry to specify remote addresses with length + */ +struct ehca_vsgentry { + u64 vaddr; + u32 lkey; + u32 length; +}; + +#define GRH_FLAG_MASK EHCA_BMASK_IBM(7,7) +#define GRH_IPVERSION_MASK EHCA_BMASK_IBM(0,3) +#define GRH_TCLASS_MASK EHCA_BMASK_IBM(4,12) +#define GRH_FLOWLABEL_MASK EHCA_BMASK_IBM(13,31) +#define GRH_PAYLEN_MASK EHCA_BMASK_IBM(32,47) +#define GRH_NEXTHEADER_MASK EHCA_BMASK_IBM(48,55) +#define GRH_HOPLIMIT_MASK EHCA_BMASK_IBM(56,63) + +/** + * Unreliable Datagram Address Vector Format + * see IBTA Vol1 chapter 8.3 Global Routing Header + */ +struct ehca_ud_av { + u8 sl; + u8 lnh; + u16 dlid; + u8 reserved1; + u8 reserved2; + u8 reserved3; + u8 slid_path_bits; + u8 reserved4; + u8 ipd; + u8 reserved5; + u8 pmtu; + u32 reserved6; + u64 reserved7; + union { + struct { + u64 word_0; /* always set to 6 */ + /*should be 0x1B for IB transport */ + u64 word_1; + u64 word_2; + u64 word_3; + u64 word_4; + } grh; + struct { + u32 wd_0; + u32 wd_1; + /* DWord_1 --> SGID */ + + u32 sgid_wd3; + /* bits 127 - 96 */ + + u32 sgid_wd2; + /* bits 95 - 64 */ + /* DWord_2 */ + + u32 sgid_wd1; + /* bits 63 - 32 */ + + u32 sgid_wd0; + /* bits 31 - 0 */ + /* DWord_3 --> DGID */ + + u32 dgid_wd3; + /* bits 127 - 96 + **/ + u32 dgid_wd2; + /* bits 95 - 64 + DWord_4 */ + u32 dgid_wd1; + /* bits 63 - 32 */ + + u32 dgid_wd0; + /* bits 31 - 0 */ + } grh_l; + }; +}; + +/* maximum number of sg entries allowed in a WQE */ +#define MAX_WQE_SG_ENTRIES 252 + +#define WQE_OPTYPE_SEND 0x80 +#define WQE_OPTYPE_RDMAREAD 0x40 +#define WQE_OPTYPE_RDMAWRITE 0x20 +#define WQE_OPTYPE_CMPSWAP 0x10 +#define WQE_OPTYPE_FETCHADD 0x08 +#define WQE_OPTYPE_BIND 0x04 + +#define WQE_WRFLAG_REQ_SIGNAL_COM 0x80 +#define WQE_WRFLAG_FENCE 0x40 +#define WQE_WRFLAG_IMM_DATA_PRESENT 0x20 +#define WQE_WRFLAG_SOLIC_EVENT 0x10 + +#define WQEF_CACHE_HINT 0x80 +#define WQEF_CACHE_HINT_RD_WR 0x40 
+#define WQEF_TIMED_WQE 0x20 +#define WQEF_PURGE 0x08 +#define WQEF_HIGH_NIBBLE 0xF0 + +#define MW_BIND_ACCESSCTRL_R_WRITE 0x40 +#define MW_BIND_ACCESSCTRL_R_READ 0x20 +#define MW_BIND_ACCESSCTRL_R_ATOMIC 0x10 + +struct ehca_wqe { + u64 work_request_id; + u8 optype; + u8 wr_flag; + u16 pkeyi; + u8 wqef; + u8 nr_of_data_seg; + u16 wqe_provided_slid; + u32 destination_qp_number; + u32 resync_psn_sqp; + u32 local_ee_context_qkey; + u32 immediate_data; + union { + struct { + u64 remote_virtual_adress; + u32 rkey; + u32 reserved; + u64 atomic_1st_op_dma_len; + u64 atomic_2nd_op; + struct ehca_vsgentry sg_list[MAX_WQE_SG_ENTRIES]; + + } nud; + struct { + u64 ehca_ud_av_ptr; + u64 reserved1; + u64 reserved2; + u64 reserved3; + struct ehca_vsgentry sg_list[MAX_WQE_SG_ENTRIES]; + } ud_avp; + struct { + struct ehca_ud_av ud_av; + struct ehca_vsgentry sg_list[MAX_WQE_SG_ENTRIES - + 2]; + } ud_av; + struct { + u64 reserved0; + u64 reserved1; + u64 reserved2; + u64 reserved3; + struct ehca_vsgentry sg_list[MAX_WQE_SG_ENTRIES]; + } all_rcv; + + struct { + u64 reserved; + u32 rkey; + u32 old_rkey; + u64 reserved1; + u64 reserved2; + u64 virtual_address; + u32 reserved3; + u32 length; + u32 reserved4; + u16 reserved5; + u8 reserved6; + u8 lr_ctl; + u32 lkey; + u32 reserved7; + u64 reserved8; + u64 reserved9; + u64 reserved10; + u64 reserved11; + } bind; + struct { + u64 reserved12; + u64 reserved13; + u32 size; + u32 start; + } inline_data; + } u; + +}; + +#define WC_SEND_RECEIVE EHCA_BMASK_IBM(0,0) +#define WC_IMM_DATA EHCA_BMASK_IBM(1,1) +#define WC_GRH_PRESENT EHCA_BMASK_IBM(2,2) +#define WC_SE_BIT EHCA_BMASK_IBM(3,3) +#define WC_STATUS_ERROR_BIT 0x80000000 +#define WC_STATUS_REMOTE_ERROR_FLAGS 0x0000F800 +#define WC_STATUS_PURGE_BIT 0x10 + +struct ehca_cqe { + u64 work_request_id; + u8 optype; + u8 w_completion_flags; + u16 reserved1; + u32 nr_bytes_transferred; + u32 immediate_data; + u32 local_qp_number; + u8 freed_resource_count; + u8 service_level; + u16 wqe_count; + u32 
qp_token; + u32 qkey_ee_token; + u32 remote_qp_number; + u16 dlid; + u16 rlid; + u16 reserved2; + u16 pkey_index; + u32 cqe_timestamp; + u32 wqe_timestamp; + u8 wqe_timestamp_valid; + u8 reserved3; + u8 reserved4; + u8 cqe_flags; + u32 status; +}; + +struct ehca_eqe { + u64 entry; +}; + +struct ehca_mrte { + u64 starting_va; + u64 length; /* length of memory region in bytes*/ + u32 pd; + u8 key_instance; + u8 pagesize; + u8 mr_control; + u8 local_remote_access_ctrl; + u8 reserved[0x20 - 0x18]; + u64 at_pointer[4]; +}; +#endif /*_EHCA_QES_H_*/ --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_qp.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_qp.c 2006-05-15 15:43:31.000000000 +0200 @@ -0,0 +1,1565 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * QP functions + * + * Authors: Waleri Fomin + * Hoang-Nam Nguyen + * Reinhard Ernst + * Heiko J Schick + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + + +#define DEB_PREFIX "e_qp" + +#include + +#include "ehca_classes.h" +#include "ehca_tools.h" +#include "ehca_qes.h" +#include "ehca_iverbs.h" +#include "hcp_if.h" +#include "hipz_fns.h" + +/** + * attributes not supported by query qp + */ +#define QP_ATTR_QUERY_NOT_SUPPORTED (IB_QP_MAX_DEST_RD_ATOMIC | \ + IB_QP_MAX_QP_RD_ATOMIC | \ + IB_QP_ACCESS_FLAGS | \ + IB_QP_EN_SQD_ASYNC_NOTIFY) + +/** + * ehca (internal) qp state values + */ +enum ehca_qp_state { + EHCA_QPS_RESET = 1, + EHCA_QPS_INIT = 2, + EHCA_QPS_RTR = 3, + EHCA_QPS_RTS = 5, + EHCA_QPS_SQD = 6, + EHCA_QPS_SQE = 8, + EHCA_QPS_ERR = 128 +}; + +/** + * qp state transitions as defined by IB Arch Rel 1.1 page 431 + */ +enum ib_qp_statetrans { + IB_QPST_ANY2RESET, + IB_QPST_ANY2ERR, + IB_QPST_RESET2INIT, + IB_QPST_INIT2RTR, + IB_QPST_INIT2INIT, + IB_QPST_RTR2RTS, + IB_QPST_RTS2SQD, + IB_QPST_RTS2RTS, + IB_QPST_SQD2RTS, + IB_QPST_SQE2RTS, + IB_QPST_SQD2SQD, + IB_QPST_MAX /* nr of transitions, this must be last!!! 
*/ +}; + +/** + * ib2ehca_qp_state - maps IB to ehca qp_state + * returns ehca qp state corresponding to given ib qp state + */ +static inline enum ehca_qp_state ib2ehca_qp_state(enum ib_qp_state ib_qp_state) +{ + switch (ib_qp_state) { + case IB_QPS_RESET: + return EHCA_QPS_RESET; + case IB_QPS_INIT: + return EHCA_QPS_INIT; + case IB_QPS_RTR: + return EHCA_QPS_RTR; + case IB_QPS_RTS: + return EHCA_QPS_RTS; + case IB_QPS_SQD: + return EHCA_QPS_SQD; + case IB_QPS_SQE: + return EHCA_QPS_SQE; + case IB_QPS_ERR: + return EHCA_QPS_ERR; + default: + EDEB_ERR(4, "invalid ib_qp_state=%x", ib_qp_state); + return -EINVAL; + } +} + +/** + * ehca2ib_qp_state - maps ehca to IB qp_state + * returns ib qp state corresponding to given ehca qp state + */ +static inline enum ib_qp_state ehca2ib_qp_state(enum ehca_qp_state + ehca_qp_state) +{ + switch (ehca_qp_state) { + case EHCA_QPS_RESET: + return IB_QPS_RESET; + case EHCA_QPS_INIT: + return IB_QPS_INIT; + case EHCA_QPS_RTR: + return IB_QPS_RTR; + case EHCA_QPS_RTS: + return IB_QPS_RTS; + case EHCA_QPS_SQD: + return IB_QPS_SQD; + case EHCA_QPS_SQE: + return IB_QPS_SQE; + case EHCA_QPS_ERR: + return IB_QPS_ERR; + default: + EDEB_ERR(4,"invalid ehca_qp_state=%x",ehca_qp_state); + return -EINVAL; + } +} + +/** + * ehca_qp_type - used as index for req_attr and opt_attr of + * struct ehca_modqp_statetrans + */ +enum ehca_qp_type { + QPT_RC = 0, + QPT_UC = 1, + QPT_UD = 2, + QPT_SQP = 3, + QPT_MAX +}; + +/** + * ib2ehcaqptype - maps Ib to ehca qp_type + * returns ehca qp type corresponding to ib qp type + */ +static inline enum ehca_qp_type ib2ehcaqptype(enum ib_qp_type ibqptype) +{ + switch (ibqptype) { + case IB_QPT_SMI: + case IB_QPT_GSI: + return QPT_SQP; + case IB_QPT_RC: + return QPT_RC; + case IB_QPT_UC: + return QPT_UC; + case IB_QPT_UD: + return QPT_UD; + default: + EDEB_ERR(4,"Invalid ibqptype=%x", ibqptype); + return -EINVAL; + } +} + +static inline enum ib_qp_statetrans get_modqp_statetrans(int ib_fromstate, + int 
ib_tostate) +{ + int index = -EINVAL; + switch (ib_tostate) { + case IB_QPS_RESET: + index = IB_QPST_ANY2RESET; + break; + case IB_QPS_INIT: + if (ib_fromstate == IB_QPS_RESET) + index = IB_QPST_RESET2INIT; + else if (ib_fromstate == IB_QPS_INIT) + index = IB_QPST_INIT2INIT; + break; + case IB_QPS_RTR: + if (ib_fromstate == IB_QPS_INIT) + index = IB_QPST_INIT2RTR; + break; + case IB_QPS_RTS: + if (ib_fromstate == IB_QPS_RTR) + index = IB_QPST_RTR2RTS; + else if (ib_fromstate == IB_QPS_RTS) + index = IB_QPST_RTS2RTS; + else if (ib_fromstate == IB_QPS_SQD) + index = IB_QPST_SQD2RTS; + else if (ib_fromstate == IB_QPS_SQE) + index = IB_QPST_SQE2RTS; + break; + case IB_QPS_SQD: + if (ib_fromstate == IB_QPS_RTS) + index = IB_QPST_RTS2SQD; + break; + case IB_QPS_SQE: + break; + case IB_QPS_ERR: + index = IB_QPST_ANY2ERR; + break; + default: + break; + } + return index; +} + +enum ehca_service_type { + ST_RC = 0, + ST_UC = 1, + ST_RD = 2, + ST_UD = 3 +}; + +/** + * ibqptype2servicetype - returns hcp service type corresponding to given + * ib qp type used by create_qp() + */ +static inline int ibqptype2servicetype(enum ib_qp_type ibqptype) +{ + switch (ibqptype) { + case IB_QPT_SMI: + case IB_QPT_GSI: + return ST_UD; + case IB_QPT_RC: + return ST_RC; + case IB_QPT_UC: + return ST_UC; + case IB_QPT_UD: + return ST_UD; + case IB_QPT_RAW_IPV6: + return -EINVAL; + case IB_QPT_RAW_ETY: + return -EINVAL; + default: + EDEB_ERR(4, "Invalid ibqptype=%x", ibqptype); + return -EINVAL; + } +} + +/** + * init_qp_queues - Initializes/constructs r/squeue and registers queue pages. 
+ */ +static inline int init_qp_queues(struct ipz_adapter_handle ipz_hca_handle, + struct ehca_qp *my_qp, + int nr_sq_pages, + int nr_rq_pages, + int swqe_size, + int rwqe_size, + int nr_send_sges, int nr_receive_sges) +{ + int ret = -EINVAL; + int cnt = 0; + void *vpage = NULL; + u64 rpage = 0; + int ipz_rc = -1; + u64 h_ret = H_PARAMETER; + + ipz_rc = ipz_queue_ctor(&my_qp->ipz_squeue, + nr_sq_pages, + EHCA_PAGESIZE, swqe_size, nr_send_sges); + if (!ipz_rc) { + EDEB_ERR(4, "Cannot allocate page for squeue. ipz_rc=%x", + ipz_rc); + ret = -EBUSY; + return ret; + } + + ipz_rc = ipz_queue_ctor(&my_qp->ipz_rqueue, + nr_rq_pages, + EHCA_PAGESIZE, rwqe_size, nr_receive_sges); + if (!ipz_rc) { + EDEB_ERR(4, "Cannot allocate page for rqueue. ipz_rc=%x", + ipz_rc); + ret = -EBUSY; + goto init_qp_queues0; + } + /* register SQ pages */ + for (cnt = 0; cnt < nr_sq_pages; cnt++) { + vpage = ipz_qpageit_get_inc(&my_qp->ipz_squeue); + if (!vpage) { + EDEB_ERR(4, "SQ ipz_qpageit_get_inc() " + "failed p_vpage= %p", vpage); + ret = -EINVAL; + goto init_qp_queues1; + } + rpage = virt_to_abs(vpage); + + h_ret = hipz_h_register_rpage_qp(ipz_hca_handle, + my_qp->ipz_qp_handle, + &my_qp->pf, 0, 0, + rpage, 1, + my_qp->galpas.kernel); + if (h_ret < H_SUCCESS) { + EDEB_ERR(4,"SQ hipz_qp_register_rpage() failed " + "rc=%lx", h_ret); + ret = ehca2ib_return_code(h_ret); + goto init_qp_queues1; + } + } + + ipz_qeit_reset(&my_qp->ipz_squeue); + + /* register RQ pages */ + for (cnt = 0; cnt < nr_rq_pages; cnt++) { + vpage = ipz_qpageit_get_inc(&my_qp->ipz_rqueue); + if (!vpage) { + EDEB_ERR(4,"RQ ipz_qpageit_get_inc() " + "failed p_vpage = %p", vpage); + h_ret = H_RESOURCE; + ret = -EINVAL; + goto init_qp_queues1; + } + + rpage = virt_to_abs(vpage); + + h_ret = hipz_h_register_rpage_qp(ipz_hca_handle, + my_qp->ipz_qp_handle, + &my_qp->pf, 0, 1, + rpage, 1,my_qp->galpas.kernel); + if (h_ret < H_SUCCESS) { + EDEB_ERR(4, "RQ hipz_qp_register_rpage() failed " + "rc=%lx", h_ret); + ret = 
ehca2ib_return_code(h_ret); + goto init_qp_queues1; + } + if (cnt == (nr_rq_pages - 1)) { /* last page! */ + if (h_ret != H_SUCCESS) { + EDEB_ERR(4,"RQ hipz_qp_register_rpage() " + "h_ret= %lx ", h_ret); + ret = ehca2ib_return_code(h_ret); + goto init_qp_queues1; + } + vpage = ipz_qpageit_get_inc(&my_qp->ipz_rqueue); + if (vpage) { + EDEB_ERR(4,"ipz_qpageit_get_inc() " + "should not succeed vpage=%p", + vpage); + ret = -EINVAL; + goto init_qp_queues1; + } + } else { + if (h_ret != H_PAGE_REGISTERED) { + EDEB_ERR(4,"RQ hipz_qp_register_rpage() " + "h_ret= %lx ", h_ret); + ret = ehca2ib_return_code(h_ret); + goto init_qp_queues1; + } + } + } + + ipz_qeit_reset(&my_qp->ipz_rqueue); + + return 0; + +init_qp_queues1: + ipz_queue_dtor(&my_qp->ipz_rqueue); +init_qp_queues0: + ipz_queue_dtor(&my_qp->ipz_squeue); + return ret; +} + + +struct ib_qp *ehca_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *init_attr, + struct ib_udata *udata) +{ + extern struct ehca_module ehca_module; + static int da_msg_size[]={ 128, 256, 512, 1024, 2048, 4096 }; + int ret = -EINVAL; + + struct ehca_qp *my_qp = NULL; + struct ehca_pd *my_pd = NULL; + struct ehca_shca *shca = NULL; + struct ib_ucontext *context = NULL; + u64 h_ret = H_PARAMETER; + int max_send_sge; + int max_recv_sge; + + /* h_call's out parameters */ + struct ehca_alloc_qp_parms parms; + u32 qp_nr = 0, swqe_size = 0, rwqe_size = 0; + u8 daqp_completion, isdaqp; + unsigned long flags; + + EDEB_EN(7,"pd=%p init_attr=%p", pd, init_attr); + EHCA_CHECK_PD_P(pd); + EHCA_CHECK_ADR_P(init_attr); + + if (init_attr->sq_sig_type != IB_SIGNAL_REQ_WR && + init_attr->sq_sig_type != IB_SIGNAL_ALL_WR) { + EDEB_ERR(4, "init_attr->sg_sig_type=%x not allowed", + init_attr->sq_sig_type); + return ERR_PTR(-EINVAL); + } + + /* save daqp completion bits */ + daqp_completion = init_attr->qp_type & 0x60; + /* save daqp bit */ + isdaqp = (init_attr->qp_type & 0x80) ? 
1 : 0; + init_attr->qp_type = init_attr->qp_type & 0x1F; + + if (init_attr->qp_type != IB_QPT_UD && + init_attr->qp_type != IB_QPT_SMI && + init_attr->qp_type != IB_QPT_GSI && + init_attr->qp_type != IB_QPT_UC && + init_attr->qp_type != IB_QPT_RC) { + EDEB_ERR(4,"wrong QP Type=%x",init_attr->qp_type); + return ERR_PTR(-EINVAL); + } + if (init_attr->qp_type != IB_QPT_RC && isdaqp != 0) { + EDEB_ERR(4,"unsupported LL QP Type=%x",init_attr->qp_type); + return ERR_PTR(-EINVAL); + } + + if (pd->uobject && udata) + context = pd->uobject->context; + + my_qp = kmem_cache_alloc(ehca_module.cache_qp, SLAB_KERNEL); + if (!my_qp) { + EDEB_ERR(4, "pd=%p not enough memory to alloc qp", pd); + return ERR_PTR(-ENOMEM); + } + + memset(my_qp, 0, sizeof(struct ehca_qp)); + memset (&parms, 0, sizeof(struct ehca_alloc_qp_parms)); + spin_lock_init(&my_qp->spinlock_s); + spin_lock_init(&my_qp->spinlock_r); + + my_pd = container_of(pd, struct ehca_pd, ib_pd); + + shca = container_of(pd->device, struct ehca_shca, ib_device); + my_qp->recv_cq = + container_of(init_attr->recv_cq, struct ehca_cq, ib_cq); + my_qp->send_cq = + container_of(init_attr->send_cq, struct ehca_cq, ib_cq); + + my_qp->init_attr = *init_attr; + + do { + if (!idr_pre_get(&ehca_qp_idr, GFP_KERNEL)) { + ret = -ENOMEM; + EDEB_ERR(4, "Can't reserve idr resources."); + goto create_qp_exit0; + } + + spin_lock_irqsave(&ehca_qp_idr_lock, flags); + ret = idr_get_new(&ehca_qp_idr, my_qp, &my_qp->token); + spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); + + } while (ret == -EAGAIN); + + if (ret) { + ret = -ENOMEM; + EDEB_ERR(4, "Can't allocate new idr entry."); + goto create_qp_exit0; + } + + parms.servicetype = ibqptype2servicetype(init_attr->qp_type); + if (parms.servicetype < 0) { + ret = -EINVAL; + EDEB_ERR(4, "Invalid qp_type=%x", init_attr->qp_type); + goto create_qp_exit0; + } + + if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR) + parms.sigtype = HCALL_SIGT_EVERY; + else + parms.sigtype = HCALL_SIGT_BY_WQE; + + /* UD_AV 
CIRCUMVENTION */ + max_send_sge = init_attr->cap.max_send_sge; + max_recv_sge = init_attr->cap.max_recv_sge; + if (IB_QPT_UD == init_attr->qp_type || + IB_QPT_GSI == init_attr->qp_type || + IB_QPT_SMI == init_attr->qp_type) { + max_send_sge += 2; + max_recv_sge += 2; + } + + EDEB(7, "isdaqp=%x daqp_completion=%x", isdaqp, daqp_completion); + + parms.ipz_eq_handle = shca->eq.ipz_eq_handle; + parms.daqp_ctrl = isdaqp | daqp_completion; + parms.pd = my_pd->fw_pd; + parms.max_recv_sge = max_recv_sge; + parms.max_send_sge = max_send_sge; + + h_ret = hipz_h_alloc_resource_qp(shca->ipz_hca_handle, my_qp, &parms); + + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "h_alloc_resource_qp() failed h_ret=%lx", h_ret); + ret = ehca2ib_return_code(h_ret); + goto create_qp_exit1; + } + + switch (init_attr->qp_type) { + case IB_QPT_RC: + if (isdaqp == 0) { + swqe_size = offsetof(struct ehca_wqe, u.nud.sg_list[ + (parms.act_nr_send_sges)]); + rwqe_size = offsetof(struct ehca_wqe, u.nud.sg_list[ + (parms.act_nr_recv_sges)]); + } else { /* for daqp we need to use msg size, not wqe size */ + swqe_size = da_msg_size[max_send_sge]; + rwqe_size = da_msg_size[max_recv_sge]; + parms.act_nr_send_sges = 1; + parms.act_nr_recv_sges = 1; + } + break; + case IB_QPT_UC: + swqe_size = offsetof(struct ehca_wqe, + u.nud.sg_list[parms.act_nr_send_sges]); + rwqe_size = offsetof(struct ehca_wqe, + u.nud.sg_list[parms.act_nr_recv_sges]); + break; + + case IB_QPT_UD: + case IB_QPT_GSI: + case IB_QPT_SMI: + /* UD circumvention */ + parms.act_nr_recv_sges -= 2; + parms.act_nr_send_sges -= 2; + swqe_size = offsetof(struct ehca_wqe, + u.ud_av.sg_list[parms.act_nr_send_sges]); + rwqe_size = offsetof(struct ehca_wqe, + u.ud_av.sg_list[parms.act_nr_recv_sges]); + + if (IB_QPT_GSI == init_attr->qp_type || + IB_QPT_SMI == init_attr->qp_type) { + parms.act_nr_send_wqes = init_attr->cap.max_send_wr; + parms.act_nr_recv_wqes = init_attr->cap.max_recv_wr; + parms.act_nr_send_sges = init_attr->cap.max_send_sge; + 
parms.act_nr_recv_sges = init_attr->cap.max_recv_sge; + my_qp->real_qp_num = + (init_attr->qp_type == IB_QPT_SMI) ? 0 : 1; + } + + break; + + default: + break; + } + + /* initializes r/squeue and registers queue pages */ + ret = init_qp_queues(shca->ipz_hca_handle, my_qp, + parms.nr_sq_pages, parms.nr_rq_pages, + swqe_size, rwqe_size, + parms.act_nr_send_sges, parms.act_nr_recv_sges); + if (ret) { + EDEB_ERR(4,"Couldn't initialize r/squeue and pages ret=%x", + ret); + goto create_qp_exit2; + } + + my_qp->ib_qp.pd = &my_pd->ib_pd; + my_qp->ib_qp.device = my_pd->ib_pd.device; + + my_qp->ib_qp.recv_cq = init_attr->recv_cq; + my_qp->ib_qp.send_cq = init_attr->send_cq; + + my_qp->ib_qp.qp_num = my_qp->real_qp_num; + my_qp->ib_qp.qp_type = init_attr->qp_type; + + my_qp->qp_type = init_attr->qp_type; + my_qp->ib_qp.srq = init_attr->srq; + + my_qp->ib_qp.qp_context = init_attr->qp_context; + my_qp->ib_qp.event_handler = init_attr->event_handler; + + init_attr->cap.max_inline_data = 0; /* not supported yet */ + init_attr->cap.max_recv_sge = parms.act_nr_recv_sges; + init_attr->cap.max_recv_wr = parms.act_nr_recv_wqes; + init_attr->cap.max_send_sge = parms.act_nr_send_sges; + init_attr->cap.max_send_wr = parms.act_nr_send_wqes; + + /* NOTE: define_apq0() not supported yet */ + if (init_attr->qp_type == IB_QPT_GSI) { + h_ret = ehca_define_sqp(shca, my_qp, init_attr); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "ehca_define_sqp() failed rc=%lx",h_ret); + ret = ehca2ib_return_code(h_ret); + goto create_qp_exit3; + } + } + if (init_attr->send_cq) { + struct ehca_cq *cq = container_of(init_attr->send_cq, + struct ehca_cq, ib_cq); + ret = ehca_cq_assign_qp(cq, my_qp); + if (ret) { + EDEB_ERR(4, "Couldn't assign qp to send_cq ret=%x", + ret); + goto create_qp_exit3; + } + my_qp->send_cq = cq; + } + /* copy queues, galpa data to user space */ + if (context && udata) { + struct ipz_queue *ipz_rqueue = &my_qp->ipz_rqueue; + struct ipz_queue *ipz_squeue = &my_qp->ipz_squeue; + struct 
ehca_create_qp_resp resp; + struct vm_area_struct * vma; + memset(&resp, 0, sizeof(resp)); + + resp.qp_num = my_qp->real_qp_num; + resp.token = my_qp->token; + resp.qp_type = my_qp->qp_type; + resp.qkey = my_qp->qkey; + resp.real_qp_num = my_qp->real_qp_num; + /* rqueue properties */ + resp.ipz_rqueue.qe_size = ipz_rqueue->qe_size; + resp.ipz_rqueue.act_nr_of_sg = ipz_rqueue->act_nr_of_sg; + resp.ipz_rqueue.queue_length = ipz_rqueue->queue_length; + resp.ipz_rqueue.pagesize = ipz_rqueue->pagesize; + resp.ipz_rqueue.toggle_state = ipz_rqueue->toggle_state; + ehca_mmap_nopage(((u64)(my_qp->token) << 32) | 0x22000000, + ipz_rqueue->queue_length, + ((void**)&resp.ipz_rqueue.queue), + &vma); + my_qp->uspace_rqueue = resp.ipz_rqueue.queue; + /* squeue properties */ + resp.ipz_squeue.qe_size = ipz_squeue->qe_size; + resp.ipz_squeue.act_nr_of_sg = ipz_squeue->act_nr_of_sg; + resp.ipz_squeue.queue_length = ipz_squeue->queue_length; + resp.ipz_squeue.pagesize = ipz_squeue->pagesize; + resp.ipz_squeue.toggle_state = ipz_squeue->toggle_state; + ehca_mmap_nopage(((u64)(my_qp->token) << 32) | 0x23000000, + ipz_squeue->queue_length, + ((void**)&resp.ipz_squeue.queue), + &vma); + my_qp->uspace_squeue = resp.ipz_squeue.queue; + /* fw_handle */ + resp.galpas = my_qp->galpas; + ehca_mmap_register(my_qp->galpas.user.fw_handle, + ((void**)&resp.galpas.kernel.fw_handle), + &vma); + my_qp->uspace_fwh = (u64)resp.galpas.kernel.fw_handle; + + if (ib_copy_to_udata(udata, &resp, sizeof resp)) { + EDEB_ERR(4, "Copy to udata failed"); + ret = -EINVAL; + goto create_qp_exit3; + } + } + + EDEB_EX(7, "ehca_qp=%p qp_num=%x, token=%x", + my_qp, qp_nr, my_qp->token); + return &my_qp->ib_qp; + +create_qp_exit3: + ipz_queue_dtor(&my_qp->ipz_rqueue); + ipz_queue_dtor(&my_qp->ipz_squeue); + +create_qp_exit2: + hipz_h_destroy_qp(shca->ipz_hca_handle, my_qp); + +create_qp_exit1: + spin_lock_irqsave(&ehca_qp_idr_lock, flags); + idr_remove(&ehca_qp_idr, my_qp->token); + 
spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); + +create_qp_exit0: + kmem_cache_free(ehca_module.cache_qp, my_qp); + EDEB_EX(4, "failed ret=%x", ret); + return ERR_PTR(ret); + +} + +/** + * prepare_sqe_rts - called by internal_modify_qp() at trans sqe -> rts + * set purge bit of bad wqe and subsequent wqes to avoid reentering sqe + * returns total number of bad wqes in bad_wqe_cnt + */ +static int prepare_sqe_rts(struct ehca_qp *my_qp, struct ehca_shca *shca, + int *bad_wqe_cnt) +{ + int ret = 0; + u64 h_ret = H_SUCCESS; + struct ipz_queue *squeue = NULL; + void *bad_send_wqe_p = NULL; + void *bad_send_wqe_v = NULL; + void *squeue_start_p = NULL; + void *squeue_end_p = NULL; + void *squeue_start_v = NULL; + void *squeue_end_v = NULL; + struct ehca_wqe *wqe = NULL; + int qp_num = my_qp->ib_qp.qp_num; + + EDEB_EN(7, "ehca_qp=%p qp_num=%x ", my_qp, qp_num); + + /* get send wqe pointer */ + h_ret = hipz_h_disable_and_get_wqe(shca->ipz_hca_handle, + my_qp->ipz_qp_handle, &my_qp->pf, + &bad_send_wqe_p, NULL, 2); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_h_disable_and_get_wqe() failed " + "ehca_qp=%p qp_num=%x h_ret=%lx",my_qp, qp_num, h_ret); + ret = ehca2ib_return_code(h_ret); + goto prepare_sqe_rts_exit1; + } + bad_send_wqe_p = (void*)((u64)bad_send_wqe_p & (~(1L<<63))); + EDEB(7, "qp_num=%x bad_send_wqe_p=%p", qp_num, bad_send_wqe_p); + /* convert wqe pointer to vadr */ + bad_send_wqe_v = abs_to_virt((u64)bad_send_wqe_p); + EDEB_DMP(6, bad_send_wqe_v, 32, "qp_num=%x bad_wqe", qp_num); + squeue = &my_qp->ipz_squeue; + squeue_start_p = (void*)virt_to_abs(ipz_qeit_calc(squeue, 0L)); + squeue_end_p = squeue_start_p+squeue->queue_length; + squeue_start_v = abs_to_virt((u64)squeue_start_p); + squeue_end_v = abs_to_virt((u64)squeue_end_p); + EDEB(6, "qp_num=%x squeue_start_v=%p squeue_end_v=%p", + qp_num, squeue_start_v, squeue_end_v); + + /* loop sets wqe's purge bit */ + wqe = (struct ehca_wqe*)bad_send_wqe_v; + *bad_wqe_cnt = 0; + while (wqe->optype != 0xff && 
wqe->wqef != 0xff) { + EDEB_DMP(6, wqe, 32, "qp_num=%x wqe", qp_num); + wqe->nr_of_data_seg = 0; /* suppress data access */ + wqe->wqef = WQEF_PURGE; /* WQE to be purged */ + wqe = (struct ehca_wqe*)((u8*)wqe+squeue->qe_size); + *bad_wqe_cnt = (*bad_wqe_cnt)+1; + if ((void*)wqe >= squeue_end_v) { + wqe = squeue_start_v; + } + } + /* bad wqe will be reprocessed and ignored when poll_cq() is called, + * i.e. nr of wqes with flush error status is one less + */ + EDEB(6, "qp_num=%x flusherr_wqe_cnt=%x", qp_num, (*bad_wqe_cnt)-1); + wqe->wqef = 0; + +prepare_sqe_rts_exit1: + + EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x", my_qp, qp_num, ret); + return ret; +} + +/** + * internal_modify_qp - with circumvention to handle aqp0 properly + * smi_reset2init indicates if this is an internal reset-to-init-call for + * smi. This flag must always be zero if called from ehca_modify_qp()! + * This internal func was introduced to avoid recursion of ehca_modify_qp()! + */ +static int internal_modify_qp(struct ib_qp *ibqp, + struct ib_qp_attr *attr, + int attr_mask, int smi_reset2init) +{ + enum ib_qp_state qp_cur_state = 0, qp_new_state = 0; + int cnt = 0, qp_attr_idx = 0, ret = 0; + + enum ib_qp_statetrans statetrans; + struct hcp_modify_qp_control_block *mqpcb = NULL; + struct ehca_qp *my_qp = NULL; + struct ehca_shca *shca = NULL; + u64 update_mask = 0; + u64 h_ret = H_SUCCESS; + int bad_wqe_cnt = 0; + int squeue_locked = 0; + unsigned long spl_flags = 0; + + my_qp = container_of(ibqp, struct ehca_qp, ib_qp); + shca = container_of(ibqp->pd->device, struct ehca_shca, ib_device); + + EDEB_EN(7, "ehca_qp=%p qp_num=%x ibqp_type=%x " + "new qp_state=%x attribute_mask=%x", + my_qp, ibqp->qp_num, ibqp->qp_type, + attr->qp_state, attr_mask); + + /* do query_qp to obtain current attr values */ + mqpcb = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); + if (mqpcb == NULL) { + ret = -ENOMEM; + EDEB_ERR(4, "Could not get zeroed page for mqpcb " + "ehca_qp=%p qp_num=%x ", my_qp, ibqp->qp_num); + goto 
modify_qp_exit0; + } + + h_ret = hipz_h_query_qp(shca->ipz_hca_handle, + my_qp->ipz_qp_handle, + &my_qp->pf, + mqpcb, my_qp->galpas.kernel); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_h_query_qp() failed " + "ehca_qp=%p qp_num=%x h_ret=%lx", + my_qp, ibqp->qp_num, h_ret); + ret = ehca2ib_return_code(h_ret); + goto modify_qp_exit1; + } + EDEB(7, "ehca_qp=%p qp_num=%x ehca_qp_state=%x", + my_qp, ibqp->qp_num, mqpcb->qp_state); + + qp_cur_state = ehca2ib_qp_state(mqpcb->qp_state); + + if (qp_cur_state == -EINVAL) { /* invalid qp state */ + ret = -EINVAL; + EDEB_ERR(4, "Invalid current ehca_qp_state=%x " + "ehca_qp=%p qp_num=%x", + mqpcb->qp_state, my_qp, ibqp->qp_num); + goto modify_qp_exit1; + } + /* circumvention to set aqp0 initial state to init + as expected by IB spec */ + if (smi_reset2init == 0 && + ibqp->qp_type == IB_QPT_SMI && + qp_cur_state == IB_QPS_RESET && + (attr_mask & IB_QP_STATE) && + attr->qp_state == IB_QPS_INIT) { /* RESET -> INIT */ + struct ib_qp_attr smiqp_attr = { + .qp_state = IB_QPS_INIT, + .port_num = my_qp->init_attr.port_num, + .pkey_index = 0, + .qkey = 0 + }; + int smiqp_attr_mask = IB_QP_STATE | IB_QP_PORT | + IB_QP_PKEY_INDEX | IB_QP_QKEY; + int smirc = internal_modify_qp( + ibqp, &smiqp_attr, smiqp_attr_mask, 1); + if (smirc) { + EDEB_ERR(4, "SMI RESET -> INIT failed. " + "ehca_modify_qp() rc=%x", smirc); + ret = H_PARAMETER; + goto modify_qp_exit1; + } + qp_cur_state = IB_QPS_INIT; + EDEB(7, "SMI RESET -> INIT succeeded"); + } + /* is transmitted current state equal to "real" current state */ + if ((attr_mask & IB_QP_CUR_STATE) && + qp_cur_state != attr->cur_qp_state) { + ret = -EINVAL; + EDEB_ERR(4, "Invalid IB_QP_CUR_STATE attr->curr_qp_state=%x <>" + " actual cur_qp_state=%x. 
ehca_qp=%p qp_num=%x", + attr->cur_qp_state, qp_cur_state, my_qp, ibqp->qp_num); + goto modify_qp_exit1; + } + + EDEB(7, "ehca_qp=%p qp_num=%x current qp_state=%x " + "new qp_state=%x attribute_mask=%x", + my_qp, ibqp->qp_num, qp_cur_state, attr->qp_state, attr_mask); + + qp_new_state = attr_mask & IB_QP_STATE ? attr->qp_state : qp_cur_state; + if (!smi_reset2init && + !ib_modify_qp_is_ok(qp_cur_state, qp_new_state, ibqp->qp_type, + attr_mask)) { + ret = -EINVAL; + EDEB_ERR(4, "Invalid qp transition new_state=%x cur_state=%x " + "ehca_qp=%p qp_num=%x attr_mask=%x", + qp_new_state, qp_cur_state, my_qp, ibqp->qp_num, + attr_mask); + goto modify_qp_exit1; + } + + if ((mqpcb->qp_state = ib2ehca_qp_state(qp_new_state))) + update_mask = EHCA_BMASK_SET(MQPCB_MASK_QP_STATE, 1); + else { + ret = -EINVAL; + EDEB_ERR(4, "Invalid new qp state=%x " + "ehca_qp=%p qp_num=%x", + qp_new_state, my_qp, ibqp->qp_num); + goto modify_qp_exit1; + } + + /* retrieve state transition struct to get req and opt attrs */ + statetrans = get_modqp_statetrans(qp_cur_state, qp_new_state); + if (statetrans < 0) { + ret = -EINVAL; + EDEB_ERR(4, " qp_cur_state=%x " + "new_qp_state=%x State_xsition=%x " + "ehca_qp=%p qp_num=%x", + qp_cur_state, qp_new_state, + statetrans, my_qp, ibqp->qp_num); + goto modify_qp_exit1; + } + + qp_attr_idx = ib2ehcaqptype(ibqp->qp_type); + + if (qp_attr_idx < 0) { + ret = qp_attr_idx; + EDEB_ERR(4, "Invalid QP type=%x ehca_qp=%p qp_num=%x", + ibqp->qp_type, my_qp, ibqp->qp_num); + goto modify_qp_exit1; + } + + EDEB(7, "ehca_qp=%p qp_num=%x qp_state_xsit=%x", + my_qp, ibqp->qp_num, statetrans); + + /* sqe -> rts: set purge bit of bad wqe before actual trans */ + if ((my_qp->qp_type == IB_QPT_UD || + my_qp->qp_type == IB_QPT_GSI || + my_qp->qp_type == IB_QPT_SMI) && + statetrans == IB_QPST_SQE2RTS) { + /* mark next free wqe if kernel */ + if (my_qp->uspace_squeue == 0) { + struct ehca_wqe *wqe = NULL; + /* lock send queue */ + spin_lock_irqsave(&my_qp->spinlock_s, 
spl_flags); + squeue_locked = 1; + /* mark next free wqe */ + wqe = (struct ehca_wqe*) + ipz_qeit_get(&my_qp->ipz_squeue); + wqe->optype = wqe->wqef = 0xff; + EDEB(7, "qp_num=%x next_free_wqe=%p", + ibqp->qp_num, wqe); + } + ret = prepare_sqe_rts(my_qp, shca, &bad_wqe_cnt); + if (ret) { + EDEB_ERR(4, "prepare_sqe_rts() failed " + "ehca_qp=%p qp_num=%x ret=%x", + my_qp, ibqp->qp_num, ret); + goto modify_qp_exit2; + } + } + + /* enable RDMA_Atomic_Control if reset->init und reliable con + this is necessary since gen2 does not provide that flag, + but pHyp requires it */ + if (statetrans == IB_QPST_RESET2INIT && + (ibqp->qp_type == IB_QPT_RC || ibqp->qp_type == IB_QPT_UC)) { + mqpcb->rdma_atomic_ctrl = 3; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RDMA_ATOMIC_CTRL, 1); + } + /* circ. pHyp requires #RDMA/Atomic Resp Res for UC INIT -> RTR */ + if (statetrans == IB_QPST_INIT2RTR && + (ibqp->qp_type == IB_QPT_UC) && + !(attr_mask & IB_QP_MAX_DEST_RD_ATOMIC)) { + mqpcb->rdma_nr_atomic_resp_res = 1; /* default to 1 */ + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_RDMA_NR_ATOMIC_RESP_RES, 1); + } + + if (attr_mask & IB_QP_PKEY_INDEX) { + mqpcb->prim_p_key_idx = attr->pkey_index; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PRIM_P_KEY_IDX, 1); + EDEB(7, "ehca_qp=%p qp_num=%x " + "IB_QP_PKEY_INDEX update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_PORT) { + if (attr->port_num < 1 || attr->port_num > shca->num_ports) { + ret = -EINVAL; + EDEB_ERR(4, "Invalid port=%x. 
" + "ehca_qp=%p qp_num=%x num_ports=%x", + attr->port_num, my_qp, ibqp->qp_num, + shca->num_ports); + goto modify_qp_exit2; + } + mqpcb->prim_phys_port = attr->port_num; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PRIM_PHYS_PORT, 1); + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_PORT update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_QKEY) { + mqpcb->qkey = attr->qkey; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_QKEY, 1); + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_QKEY update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_AV) { + int ah_mult = ib_rate_to_mult(attr->ah_attr.static_rate); + int ehca_mult = ib_rate_to_mult(shca->sport[my_qp-> + init_attr.port_num].rate); + + mqpcb->dlid = attr->ah_attr.dlid; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_DLID, 1); + mqpcb->source_path_bits = attr->ah_attr.src_path_bits; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_SOURCE_PATH_BITS, 1); + mqpcb->service_level = attr->ah_attr.sl; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_SERVICE_LEVEL, 1); + + if (ah_mult < ehca_mult) + mqpcb->max_static_rate = (ah_mult > 0) ? + ((ehca_mult - 1) / ah_mult) : 0; + else + mqpcb->max_static_rate = 0; + + EDEB(7, " ipd=mqpcb->max_static_rate set %x " + " ah_mult=%x ehca_mult=%x " + " attr->ah_attr.static_rate=%x", + mqpcb->max_static_rate,ah_mult,ehca_mult, + attr->ah_attr.static_rate); + + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_MAX_STATIC_RATE, 1); + + /* only if GRH is TRUE we might consider SOURCE_GID_IDX + * and DEST_GID otherwise phype will return H_ATTR_PARM!!! 
+ */ + if (attr->ah_attr.ah_flags == IB_AH_GRH) { + mqpcb->send_grh_flag = 1 << 31; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_SEND_GRH_FLAG, 1); + mqpcb->source_gid_idx = attr->ah_attr.grh.sgid_index; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_SOURCE_GID_IDX, 1); + + for (cnt = 0; cnt < 16; cnt++) + mqpcb->dest_gid.byte[cnt] = + attr->ah_attr.grh.dgid.raw[cnt]; + + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_DEST_GID, 1); + mqpcb->flow_label = attr->ah_attr.grh.flow_label; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_FLOW_LABEL, 1); + mqpcb->hop_limit = attr->ah_attr.grh.hop_limit; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_HOP_LIMIT, 1); + mqpcb->traffic_class = attr->ah_attr.grh.traffic_class; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_TRAFFIC_CLASS, 1); + } + + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_AV update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + + if (attr_mask & IB_QP_PATH_MTU) { + mqpcb->path_mtu = attr->path_mtu; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PATH_MTU, 1); + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_PATH_MTU update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_TIMEOUT) { + mqpcb->timeout = attr->timeout; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_TIMEOUT, 1); + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_TIMEOUT update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_RETRY_CNT) { + mqpcb->retry_count = attr->retry_cnt; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RETRY_COUNT, 1); + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_RETRY_CNT update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_RNR_RETRY) { + mqpcb->rnr_retry_count = attr->rnr_retry; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RNR_RETRY_COUNT, 1); + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_RNR_RETRY update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_RQ_PSN) { + mqpcb->receive_psn = attr->rq_psn; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RECEIVE_PSN, 1); + EDEB(7, "ehca_qp=%p 
qp_num=%x IB_QP_RQ_PSN update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) { + mqpcb->rdma_nr_atomic_resp_res = attr->max_dest_rd_atomic < 3 ? + attr->max_dest_rd_atomic : 2; /* max is 2 */ + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_RDMA_NR_ATOMIC_RESP_RES, 1); + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_MAX_DEST_RD_ATOMIC " + "update_mask=%lx", my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) { + mqpcb->rdma_atomic_outst_dest_qp = attr->max_rd_atomic; + update_mask |= + EHCA_BMASK_SET + (MQPCB_MASK_RDMA_ATOMIC_OUTST_DEST_QP, 1); + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_MAX_QP_RD_ATOMIC " + "update_mask=%lx", my_qp, ibqp->qp_num, update_mask); + } + if (attr_mask & IB_QP_ALT_PATH) { + int ah_mult = ib_rate_to_mult(attr->alt_ah_attr.static_rate); + int ehca_mult = ib_rate_to_mult( + shca->sport[my_qp->init_attr.port_num].rate); + + mqpcb->dlid_al = attr->alt_ah_attr.dlid; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_DLID_AL, 1); + mqpcb->source_path_bits_al = attr->alt_ah_attr.src_path_bits; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_SOURCE_PATH_BITS_AL, 1); + mqpcb->service_level_al = attr->alt_ah_attr.sl; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_SERVICE_LEVEL_AL, 1); + + if (ah_mult < ehca_mult) + mqpcb->max_static_rate_al = (ah_mult > 0) ? + ((ehca_mult - 1) / ah_mult) : 0; + else + mqpcb->max_static_rate_al = 0; + + EDEB(7, " ipd=mqpcb->max_static_rate_al set %x," + " ah_mult=%x ehca_mult=%x", + mqpcb->max_static_rate_al,ah_mult,ehca_mult); + + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_MAX_STATIC_RATE_AL, 1); + + /* only if GRH is TRUE we might consider SOURCE_GID_IDX + * and DEST_GID otherwise phype will return H_ATTR_PARM!!! 
+ */ + if (attr->alt_ah_attr.ah_flags == IB_AH_GRH) { + mqpcb->send_grh_flag_al = 1 << 31; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_SEND_GRH_FLAG_AL, 1); + mqpcb->source_gid_idx_al = + attr->alt_ah_attr.grh.sgid_index; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_SOURCE_GID_IDX_AL, 1); + + for (cnt = 0; cnt < 16; cnt++) + mqpcb->dest_gid_al.byte[cnt] = + attr->alt_ah_attr.grh.dgid.raw[cnt]; + + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_DEST_GID_AL, 1); + mqpcb->flow_label_al = attr->alt_ah_attr.grh.flow_label; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_FLOW_LABEL_AL, 1); + mqpcb->hop_limit_al = attr->alt_ah_attr.grh.hop_limit; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_HOP_LIMIT_AL, 1); + mqpcb->traffic_class_al = + attr->alt_ah_attr.grh.traffic_class; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_TRAFFIC_CLASS_AL, 1); + } + + EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_ALT_PATH update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + + if (attr_mask & IB_QP_MIN_RNR_TIMER) { + mqpcb->min_rnr_nak_timer_field = attr->min_rnr_timer; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_MIN_RNR_NAK_TIMER_FIELD, 1); + EDEB(7, "ehca_qp=%p qp_num=%x " + "IB_QP_MIN_RNR_TIMER update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + + if (attr_mask & IB_QP_SQ_PSN) { + mqpcb->send_psn = attr->sq_psn; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_SEND_PSN, 1); + EDEB(7, "ehca_qp=%p qp_num=%x " + "IB_QP_SQ_PSN update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + + if (attr_mask & IB_QP_DEST_QPN) { + mqpcb->dest_qp_nr = attr->dest_qp_num; + update_mask |= EHCA_BMASK_SET(MQPCB_MASK_DEST_QP_NR, 1); + EDEB(7, "ehca_qp=%p qp_num=%x " + "IB_QP_DEST_QPN update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + } + + if (attr_mask & IB_QP_PATH_MIG_STATE) { + mqpcb->path_migration_state = attr->path_mig_state; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_PATH_MIGRATION_STATE, 1); + EDEB(7, "ehca_qp=%p qp_num=%x " + "IB_QP_PATH_MIG_STATE update_mask=%lx", my_qp, + ibqp->qp_num, 
update_mask); + } + + if (attr_mask & IB_QP_CAP) { + mqpcb->max_nr_outst_send_wr = attr->cap.max_send_wr+1; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_MAX_NR_OUTST_SEND_WR, 1); + mqpcb->max_nr_outst_recv_wr = attr->cap.max_recv_wr+1; + update_mask |= + EHCA_BMASK_SET(MQPCB_MASK_MAX_NR_OUTST_RECV_WR, 1); + EDEB(7, "ehca_qp=%p qp_num=%x " + "IB_QP_CAP update_mask=%lx", + my_qp, ibqp->qp_num, update_mask); + /* no support for max_send/recv_sge yet */ + } + + EDEB_DMP(7, mqpcb, 4*70, "ehca_qp=%p qp_num=%x", my_qp, ibqp->qp_num); + + h_ret = hipz_h_modify_qp(shca->ipz_hca_handle, + my_qp->ipz_qp_handle, + &my_qp->pf, + update_mask, + mqpcb, my_qp->galpas.kernel); + + if (h_ret != H_SUCCESS) { + ret = ehca2ib_return_code(h_ret); + EDEB_ERR(4, "hipz_h_modify_qp() failed rc=%lx " + "ehca_qp=%p qp_num=%x", + h_ret, my_qp, ibqp->qp_num); + goto modify_qp_exit2; + } + + if ((my_qp->qp_type == IB_QPT_UD || + my_qp->qp_type == IB_QPT_GSI || + my_qp->qp_type == IB_QPT_SMI) && + statetrans == IB_QPST_SQE2RTS) { + /* doorbell to reprocessing wqes */ + iosync(); /* serialize GAL register access */ + hipz_update_sqa(my_qp, bad_wqe_cnt-1); + EDEB(6, "doorbell for %x wqes", bad_wqe_cnt); + } + + if (statetrans == IB_QPST_RESET2INIT || + statetrans == IB_QPST_INIT2INIT) { + mqpcb->qp_enable = 1; + mqpcb->qp_state = EHCA_QPS_INIT; + update_mask = 0; + update_mask = EHCA_BMASK_SET(MQPCB_MASK_QP_ENABLE, 1); + + EDEB(7, "ehca_qp=%p qp_num=%x " + "RESET_2_INIT needs an additional enable " + "-> update_mask=%lx", my_qp, ibqp->qp_num, update_mask); + + h_ret = hipz_h_modify_qp(shca->ipz_hca_handle, + my_qp->ipz_qp_handle, + &my_qp->pf, + update_mask, + mqpcb, + my_qp->galpas.kernel); + + if (h_ret != H_SUCCESS) { + ret = ehca2ib_return_code(h_ret); + EDEB_ERR(4, "ENABLE in context of " + "RESET_2_INIT failed! 
" + "Maybe you didn't get a LID" + "h_ret=%lx ehca_qp=%p qp_num=%x", + h_ret, my_qp, ibqp->qp_num); + goto modify_qp_exit2; + } + } + + if (statetrans == IB_QPST_ANY2RESET) { + ipz_qeit_reset(&my_qp->ipz_rqueue); + ipz_qeit_reset(&my_qp->ipz_squeue); + } + + if (attr_mask & IB_QP_QKEY) + my_qp->qkey = attr->qkey; + +modify_qp_exit2: + if (squeue_locked) { /* this means: sqe -> rts */ + spin_unlock_irqrestore(&my_qp->spinlock_s, spl_flags); + my_qp->sqerr_purgeflag = 1; + } + +modify_qp_exit1: + kfree(mqpcb); + +modify_qp_exit0: + EDEB_EX(7, "ehca_qp=%p qp_num=%x ibqp_type=%x ret=%x", + my_qp, ibqp->qp_num, ibqp->qp_type, ret); + return ret; +} + +int ehca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask) +{ + int ret = 0; + struct ehca_qp *my_qp = NULL; + struct ehca_pd *my_pd = NULL; + u32 cur_pid = current->tgid; + + EHCA_CHECK_ADR(ibqp); + EHCA_CHECK_ADR(attr); + EHCA_CHECK_ADR(ibqp->device); + + my_qp = container_of(ibqp, struct ehca_qp, ib_qp); + + EDEB_EN(7, "ehca_qp=%p qp_num=%x ibqp_type=%x attr_mask=%x", + my_qp, ibqp->qp_num, ibqp->qp_type, attr_mask); + + my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, ib_pd); + if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + my_pd->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + ret = -EINVAL; + } else + ret = internal_modify_qp(ibqp, attr, attr_mask, 0); + + EDEB_EX(7, "ehca_qp=%p qp_num=%x ibqp_type=%x ret=%x", + my_qp, ibqp->qp_num, ibqp->qp_type, ret); + return ret; +} + +int ehca_query_qp(struct ib_qp *qp, + struct ib_qp_attr *qp_attr, + int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr) +{ + struct ehca_qp *my_qp = NULL; + struct ehca_shca *shca = NULL; + struct hcp_modify_qp_control_block *qpcb = NULL; + struct ipz_adapter_handle adapter_handle; + struct ehca_pd *my_pd = NULL; + u32 cur_pid = current->tgid; + int cnt = 0, ret = 0; + u64 h_ret = H_SUCCESS; + + EHCA_CHECK_ADR(qp); + EHCA_CHECK_ADR(qp_attr); + 
EHCA_CHECK_DEVICE(qp->device); + + my_qp = container_of(qp, struct ehca_qp, ib_qp); + + EDEB_EN(7, "ehca_qp=%p qp_num=%x " + "qp_attr=%p qp_attr_mask=%x qp_init_attr=%p", + my_qp, qp->qp_num, qp_attr, qp_attr_mask, qp_init_attr); + + my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, ib_pd); + if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + my_pd->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + ret = -EINVAL; + goto query_qp_exit0; + } + + shca = container_of(qp->device, struct ehca_shca, ib_device); + adapter_handle = shca->ipz_hca_handle; + + if (qp_attr_mask & QP_ATTR_QUERY_NOT_SUPPORTED) { + ret = -EINVAL; + EDEB_ERR(4,"Invalid attribute mask " + "ehca_qp=%p qp_num=%x qp_attr_mask=%x ", + my_qp, qp->qp_num, qp_attr_mask); + goto query_qp_exit0; + } + + qpcb = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL ); + if (!qpcb) { + ret = -ENOMEM; + EDEB_ERR(4,"Out of memory for qpcb " + "ehca_qp=%p qp_num=%x", my_qp, qp->qp_num); + goto query_qp_exit0; + } + + h_ret = hipz_h_query_qp(adapter_handle, + my_qp->ipz_qp_handle, + &my_qp->pf, + qpcb, my_qp->galpas.kernel); + + if (h_ret != H_SUCCESS) { + ret = ehca2ib_return_code(h_ret); + EDEB_ERR(4,"hipz_h_query_qp() failed " + "ehca_qp=%p qp_num=%x h_ret=%lx", + my_qp, qp->qp_num, h_ret); + goto query_qp_exit1; + } + + qp_attr->cur_qp_state = ehca2ib_qp_state(qpcb->qp_state); + qp_attr->qp_state = qp_attr->cur_qp_state; + if (qp_attr->cur_qp_state == -EINVAL) { + ret = -EINVAL; + EDEB_ERR(4,"Got invalid ehca_qp_state=%x " + "ehca_qp=%p qp_num=%x", + qpcb->qp_state, my_qp, qp->qp_num); + goto query_qp_exit1; + } + + if (qp_attr->qp_state == IB_QPS_SQD) + qp_attr->sq_draining = 1; + + qp_attr->qkey = qpcb->qkey; + qp_attr->path_mtu = qpcb->path_mtu; + qp_attr->path_mig_state = qpcb->path_migration_state; + qp_attr->rq_psn = qpcb->receive_psn; + qp_attr->sq_psn = qpcb->send_psn; + qp_attr->min_rnr_timer = qpcb->min_rnr_nak_timer_field; + qp_attr->cap.max_send_wr = 
qpcb->max_nr_outst_send_wr-1; + qp_attr->cap.max_recv_wr = qpcb->max_nr_outst_recv_wr-1; + /* UD_AV CIRCUMVENTION */ + if (my_qp->qp_type == IB_QPT_UD) { + qp_attr->cap.max_send_sge = + qpcb->actual_nr_sges_in_sq_wqe - 2; + qp_attr->cap.max_recv_sge = + qpcb->actual_nr_sges_in_rq_wqe - 2; + } else { + qp_attr->cap.max_send_sge = + qpcb->actual_nr_sges_in_sq_wqe; + qp_attr->cap.max_recv_sge = + qpcb->actual_nr_sges_in_rq_wqe; + } + + qp_attr->cap.max_inline_data = my_qp->sq_max_inline_data_size; + qp_attr->dest_qp_num = qpcb->dest_qp_nr; + + qp_attr->pkey_index = + EHCA_BMASK_GET(MQPCB_PRIM_P_KEY_IDX, qpcb->prim_p_key_idx); + + qp_attr->port_num = + EHCA_BMASK_GET(MQPCB_PRIM_PHYS_PORT, qpcb->prim_phys_port); + + qp_attr->timeout = qpcb->timeout; + qp_attr->retry_cnt = qpcb->retry_count; + qp_attr->rnr_retry = qpcb->rnr_retry_count; + + qp_attr->alt_pkey_index = + EHCA_BMASK_GET(MQPCB_PRIM_P_KEY_IDX, qpcb->alt_p_key_idx); + + qp_attr->alt_port_num = qpcb->alt_phys_port; + qp_attr->alt_timeout = qpcb->timeout_al; + + /* primary av */ + qp_attr->ah_attr.sl = qpcb->service_level; + + if (qpcb->send_grh_flag) { + qp_attr->ah_attr.ah_flags = IB_AH_GRH; + } + + qp_attr->ah_attr.static_rate = qpcb->max_static_rate; + qp_attr->ah_attr.dlid = qpcb->dlid; + qp_attr->ah_attr.src_path_bits = qpcb->source_path_bits; + qp_attr->ah_attr.port_num = qp_attr->port_num; + + /* primary GRH */ + qp_attr->ah_attr.grh.traffic_class = qpcb->traffic_class; + qp_attr->ah_attr.grh.hop_limit = qpcb->hop_limit; + qp_attr->ah_attr.grh.sgid_index = qpcb->source_gid_idx; + qp_attr->ah_attr.grh.flow_label = qpcb->flow_label; + + for (cnt = 0; cnt < 16; cnt++) + qp_attr->ah_attr.grh.dgid.raw[cnt] = + qpcb->dest_gid.byte[cnt]; + + /* alternate AV */ + qp_attr->alt_ah_attr.sl = qpcb->service_level_al; + if (qpcb->send_grh_flag_al) { + qp_attr->alt_ah_attr.ah_flags = IB_AH_GRH; + } + + qp_attr->alt_ah_attr.static_rate = qpcb->max_static_rate_al; + qp_attr->alt_ah_attr.dlid = qpcb->dlid_al; + 
qp_attr->alt_ah_attr.src_path_bits = qpcb->source_path_bits_al; + + /* alternate GRH */ + qp_attr->alt_ah_attr.grh.traffic_class = qpcb->traffic_class_al; + qp_attr->alt_ah_attr.grh.hop_limit = qpcb->hop_limit_al; + qp_attr->alt_ah_attr.grh.sgid_index = qpcb->source_gid_idx_al; + qp_attr->alt_ah_attr.grh.flow_label = qpcb->flow_label_al; + + for (cnt = 0; cnt < 16; cnt++) + qp_attr->alt_ah_attr.grh.dgid.raw[cnt] = + qpcb->dest_gid_al.byte[cnt]; + + /* return init attributes given in ehca_create_qp */ + if (qp_init_attr) + *qp_init_attr = my_qp->init_attr; + + EDEB(7, "ehca_qp=%p qp_number=%x dest_qp_number=%x " + "dlid=%x path_mtu=%x dest_gid=%lx_%lx " + "service_level=%x qp_state=%x", + my_qp, qpcb->qp_number, qpcb->dest_qp_nr, + qpcb->dlid, qpcb->path_mtu, + qpcb->dest_gid.dw[0], qpcb->dest_gid.dw[1], + qpcb->service_level, qpcb->qp_state); + + EDEB_DMP(7, qpcb, 4*70, "ehca_qp=%p qp_num=%x", my_qp, qp->qp_num); + +query_qp_exit1: + kfree(qpcb); + +query_qp_exit0: + EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x", + my_qp, qp->qp_num, ret); + return ret; +} + +int ehca_destroy_qp(struct ib_qp *ibqp) +{ + extern struct ehca_module ehca_module; + struct ehca_qp *my_qp = NULL; + struct ehca_shca *shca = NULL; + struct ehca_pfqp *qp_pf = NULL; + struct ehca_pd *my_pd = NULL; + u32 cur_pid = current->tgid; + u32 qp_num = 0; + int ret = 0; + u64 h_ret = H_SUCCESS; + u8 port_num = 0; + enum ib_qp_type qp_type; + unsigned long flags; + + EHCA_CHECK_ADR(ibqp); + + my_qp = container_of(ibqp, struct ehca_qp, ib_qp); + qp_num = ibqp->qp_num; + qp_pf = &my_qp->pf; + + shca = container_of(ibqp->device, struct ehca_shca, ib_device); + + EDEB_EN(7, "ehca_qp=%p qp_num=%x", my_qp, ibqp->qp_num); + + my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, ib_pd); + if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context && + my_pd->ownpid != cur_pid) { + EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x", + cur_pid, my_pd->ownpid); + return -EINVAL; + } + + if (my_qp->send_cq) { + ret = 
ehca_cq_unassign_qp(my_qp->send_cq, + my_qp->real_qp_num); + if (ret) { + EDEB_ERR(4, "Couldn't unassign qp from send_cq " + "ret=%x qp_num=%x cq_num=%x", + ret, my_qp->ib_qp.qp_num, + my_qp->send_cq->cq_number); + goto destroy_qp_exit0; + } + } + + spin_lock_irqsave(&ehca_qp_idr_lock, flags); + idr_remove(&ehca_qp_idr, my_qp->token); + spin_unlock_irqrestore(&ehca_qp_idr_lock, flags); + + /* un-mmap if vma alloc */ + if (my_qp->uspace_rqueue) { + ret = ehca_munmap(my_qp->uspace_rqueue, + my_qp->ipz_rqueue.queue_length); + ret = ehca_munmap(my_qp->uspace_squeue, + my_qp->ipz_squeue.queue_length); + ret = ehca_munmap(my_qp->uspace_fwh, 4096); + } + + h_ret = hipz_h_destroy_qp(shca->ipz_hca_handle, my_qp); + if (h_ret != H_SUCCESS) { + EDEB_ERR(4, "hipz_h_destroy_qp() failed " + "rc=%lx ehca_qp=%p qp_num=%x", + h_ret, qp_pf, qp_num); + goto destroy_qp_exit0; + } + + port_num = my_qp->init_attr.port_num; + qp_type = my_qp->init_attr.qp_type; + + /* no support for IB_QPT_SMI yet */ + if (qp_type == IB_QPT_GSI) { + struct ib_event event; + + EDEB(4, "device %s: port %x is inactive.", + shca->ib_device.name, port_num); + event.device = &shca->ib_device; + event.event = IB_EVENT_PORT_ERR; + event.element.port_num = port_num; + shca->sport[port_num - 1].port_state = IB_PORT_DOWN; + ib_dispatch_event(&event); + } + + ipz_queue_dtor(&my_qp->ipz_rqueue); + ipz_queue_dtor(&my_qp->ipz_squeue); + kmem_cache_free(ehca_module.cache_qp, my_qp); + +destroy_qp_exit0: + ret = ehca2ib_return_code(h_ret); + EDEB_EX(7,"ret=%x", ret); + return ret; +} --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_reqs.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_reqs.c 2006-05-15 08:09:49.000000000 +0200 @@ -0,0 +1,683 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * post_send/recv, poll_cq, req_notify + * + * Authors: Waleri Fomin + * Hoang-Nam Nguyen + * Reinhard Ernst + * + * Copyright (c) 2005 IBM Corporation + * 
+ * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + + +#define DEB_PREFIX "reqs" + +#include "ehca_classes.h" +#include "ehca_tools.h" +#include "ehca_qes.h" +#include "ehca_iverbs.h" +#include "hcp_if.h" +#include "hipz_fns.h" + +static inline int ehca_write_rwqe(struct ipz_queue *ipz_rqueue, + struct ehca_wqe *wqe_p, + struct ib_recv_wr *recv_wr) +{ + u8 cnt_ds; + if (unlikely((recv_wr->num_sge < 0) || + (recv_wr->num_sge > ipz_rqueue->act_nr_of_sg))) { + EDEB_ERR(4, "Invalid number of WQE SGE. 
" + "num_sqe=%x max_nr_of_sg=%x", + recv_wr->num_sge, ipz_rqueue->act_nr_of_sg); + return -EINVAL; /* invalid SG list length */ + } + + /* clear wqe header until sglist */ + memset(wqe_p, 0, offsetof(struct ehca_wqe, u.ud_av.sg_list)); + + wqe_p->work_request_id = be64_to_cpu(recv_wr->wr_id); + wqe_p->nr_of_data_seg = recv_wr->num_sge; + + for (cnt_ds = 0; cnt_ds < recv_wr->num_sge; cnt_ds++) { + wqe_p->u.all_rcv.sg_list[cnt_ds].vaddr = + be64_to_cpu(recv_wr->sg_list[cnt_ds].addr); + wqe_p->u.all_rcv.sg_list[cnt_ds].lkey = + ntohl(recv_wr->sg_list[cnt_ds].lkey); + wqe_p->u.all_rcv.sg_list[cnt_ds].length = + ntohl(recv_wr->sg_list[cnt_ds].length); + } + + if (IS_EDEB_ON(7)) { + EDEB(7, "RECEIVE WQE written into ipz_rqueue=%p", ipz_rqueue); + EDEB_DMP(7, wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "recv wqe"); + } + + return 0; +} + +#if defined(DEBUG_GSI_SEND_WR) + +/* need ib_mad struct */ +#include + +static void trace_send_wr_ud(const struct ib_send_wr *send_wr) +{ + int idx = 0; + int j = 0; + while (send_wr) { + struct ib_mad_hdr *mad_hdr = send_wr->wr.ud.mad_hdr; + struct ib_sge *sge = send_wr->sg_list; + EDEB(4, "send_wr#%x wr_id=%lx num_sge=%x " + "send_flags=%x opcode=%x",idx, send_wr->wr_id, + send_wr->num_sge, send_wr->send_flags, send_wr->opcode); + if (mad_hdr) { + EDEB(4, "send_wr#%x mad_hdr base_version=%x " + "mgmt_class=%x class_version=%x method=%x " + "status=%x class_specific=%x tid=%lx attr_id=%x " + "resv=%x attr_mod=%x", + idx, mad_hdr->base_version, mad_hdr->mgmt_class, + mad_hdr->class_version, mad_hdr->method, + mad_hdr->status, mad_hdr->class_specific, + mad_hdr->tid, mad_hdr->attr_id, mad_hdr->resv, + mad_hdr->attr_mod); + } + for (j = 0; j < send_wr->num_sge; j++) { + u8 *data = (u8 *) abs_to_virt(sge->addr); + EDEB(4, "send_wr#%x sge#%x addr=%p length=%x lkey=%x", + idx, j, data, sge->length, sge->lkey); + /* assume length is n*16 */ + EDEB_DMP(4, data, sge->length, "send_wr#%x sge#%x", + idx, j); + sge++; + } /* eof for j */ + idx++; + 
send_wr = send_wr->next; + } /* eof while send_wr */ +} + +#endif /* DEBUG_GSI_SEND_WR */ + +static inline int ehca_write_swqe(struct ehca_qp *qp, + struct ehca_wqe *wqe_p, + const struct ib_send_wr *send_wr) +{ + u32 idx; + u64 dma_length; + struct ehca_av *my_av; + u32 remote_qkey = send_wr->wr.ud.remote_qkey; + + if (unlikely((send_wr->num_sge < 0) || + (send_wr->num_sge > qp->ipz_squeue.act_nr_of_sg))) { + EDEB_ERR(4, "Invalid number of WQE SGE. " + "num_sge=%x max_nr_of_sg=%x", + send_wr->num_sge, qp->ipz_squeue.act_nr_of_sg); + return -EINVAL; /* invalid SG list length */ + } + + /* clear wqe header until sglist */ + memset(wqe_p, 0, offsetof(struct ehca_wqe, u.ud_av.sg_list)); + + wqe_p->work_request_id = be64_to_cpu(send_wr->wr_id); + + switch (send_wr->opcode) { + case IB_WR_SEND: + case IB_WR_SEND_WITH_IMM: + wqe_p->optype = WQE_OPTYPE_SEND; + break; + case IB_WR_RDMA_WRITE: + case IB_WR_RDMA_WRITE_WITH_IMM: + wqe_p->optype = WQE_OPTYPE_RDMAWRITE; + break; + case IB_WR_RDMA_READ: + wqe_p->optype = WQE_OPTYPE_RDMAREAD; + break; + default: + EDEB_ERR(4, "Invalid opcode=%x", send_wr->opcode); + return -EINVAL; /* invalid opcode */ + } + + wqe_p->wqef = (send_wr->opcode) & WQEF_HIGH_NIBBLE; + + wqe_p->wr_flag = 0; + + if (send_wr->send_flags & IB_SEND_SIGNALED) + wqe_p->wr_flag |= WQE_WRFLAG_REQ_SIGNAL_COM; + + if (send_wr->opcode == IB_WR_SEND_WITH_IMM || + send_wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) { + /* this might not work as long as HW does not support it */ + wqe_p->immediate_data = send_wr->imm_data; + wqe_p->wr_flag |= WQE_WRFLAG_IMM_DATA_PRESENT; + } + + wqe_p->nr_of_data_seg = send_wr->num_sge; + + switch (qp->qp_type) { + case IB_QPT_SMI: + case IB_QPT_GSI: + /* no break is intentional here */ + case IB_QPT_UD: + /* IB 1.2 spec C10-15 compliance */ + if (send_wr->wr.ud.remote_qkey & 0x80000000) + remote_qkey = qp->qkey; + + wqe_p->destination_qp_number = + ntohl(send_wr->wr.ud.remote_qpn << 8); + wqe_p->local_ee_context_qkey = ntohl(remote_qkey); 
+ if (!send_wr->wr.ud.ah) { + EDEB_ERR(4, "wr.ud.ah is NULL. qp=%p", qp); + return -EINVAL; + } + my_av = container_of(send_wr->wr.ud.ah, struct ehca_av, ib_ah); + wqe_p->u.ud_av.ud_av = my_av->av; + + /* omitted check of IB_SEND_INLINE + since HW does not support it */ + for (idx = 0; idx < send_wr->num_sge; idx++) { + wqe_p->u.ud_av.sg_list[idx].vaddr = + be64_to_cpu(send_wr->sg_list[idx].addr); + wqe_p->u.ud_av.sg_list[idx].lkey = + ntohl(send_wr->sg_list[idx].lkey); + wqe_p->u.ud_av.sg_list[idx].length = + ntohl(send_wr->sg_list[idx].length); + } /* eof for idx */ + if (qp->qp_type == IB_QPT_SMI || + qp->qp_type == IB_QPT_GSI) + wqe_p->u.ud_av.ud_av.pmtu = 1; + if (qp->qp_type == IB_QPT_GSI) { + wqe_p->pkeyi = + ntohs(send_wr->wr.ud.pkey_index); +#ifdef DEBUG_GSI_SEND_WR + trace_send_wr_ud(send_wr); +#endif /* DEBUG_GSI_SEND_WR */ + } + break; + + case IB_QPT_UC: + if (send_wr->send_flags & IB_SEND_FENCE) + wqe_p->wr_flag |= WQE_WRFLAG_FENCE; + /* no break is intentional here */ + case IB_QPT_RC: + /* TODO: atomic not implemented */ + wqe_p->u.nud.remote_virtual_adress = + be64_to_cpu(send_wr->wr.rdma.remote_addr); + wqe_p->u.nud.rkey = ntohl(send_wr->wr.rdma.rkey); + + /* omitted checking of IB_SEND_INLINE + since HW does not support it */ + dma_length = 0; + for (idx = 0; idx < send_wr->num_sge; idx++) { + wqe_p->u.nud.sg_list[idx].vaddr = + be64_to_cpu(send_wr->sg_list[idx].addr); + wqe_p->u.nud.sg_list[idx].lkey = + ntohl(send_wr->sg_list[idx].lkey); + wqe_p->u.nud.sg_list[idx].length = + ntohl(send_wr->sg_list[idx].length); + dma_length += send_wr->sg_list[idx].length; + } /* eof idx */ + wqe_p->u.nud.atomic_1st_op_dma_len = be64_to_cpu(dma_length); + + break; + + default: + EDEB_ERR(4, "Invalid qptype=%x", qp->qp_type); + return -EINVAL; + } + + if (IS_EDEB_ON(7)) { + EDEB(7, "SEND WQE written into queue qp=%p ", qp); + EDEB_DMP(7, wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "send wqe"); + } + return 0; +} + +/** map_ib_wc_status - convert raw cqe_status to 
ib_wc_status + */ +static inline void map_ib_wc_status(u32 cqe_status, + enum ib_wc_status *wc_status) +{ + if (unlikely(cqe_status & WC_STATUS_ERROR_BIT)) { + switch (cqe_status & 0x3F) { + case 0x01: + case 0x21: + *wc_status = IB_WC_LOC_LEN_ERR; + break; + case 0x02: + case 0x22: + *wc_status = IB_WC_LOC_QP_OP_ERR; + break; + case 0x03: + case 0x23: + *wc_status = IB_WC_LOC_EEC_OP_ERR; + break; + case 0x04: + case 0x24: + *wc_status = IB_WC_LOC_PROT_ERR; + break; + case 0x05: + case 0x25: + *wc_status = IB_WC_WR_FLUSH_ERR; + break; + case 0x06: + *wc_status = IB_WC_MW_BIND_ERR; + break; + case 0x07: /* remote error - look into bits 20:24 */ + switch ((cqe_status + & WC_STATUS_REMOTE_ERROR_FLAGS) >> 11) { + case 0x0: + /* PSN Sequence Error! + couldn't find a matching status! */ + *wc_status = IB_WC_GENERAL_ERR; + break; + case 0x1: + *wc_status = IB_WC_REM_INV_REQ_ERR; + break; + case 0x2: + *wc_status = IB_WC_REM_ACCESS_ERR; + break; + case 0x3: + *wc_status = IB_WC_REM_OP_ERR; + break; + case 0x4: + *wc_status = IB_WC_REM_INV_RD_REQ_ERR; + break; + } + break; + case 0x08: + *wc_status = IB_WC_RETRY_EXC_ERR; + break; + case 0x09: + *wc_status = IB_WC_RNR_RETRY_EXC_ERR; + break; + case 0x0A: + case 0x2D: + *wc_status = IB_WC_REM_ABORT_ERR; + break; + case 0x0B: + case 0x2E: + *wc_status = IB_WC_INV_EECN_ERR; + break; + case 0x0C: + case 0x2F: + *wc_status = IB_WC_INV_EEC_STATE_ERR; + break; + case 0x0D: + *wc_status = IB_WC_BAD_RESP_ERR; + break; + case 0x10: + /* WQE purged */ + *wc_status = IB_WC_WR_FLUSH_ERR; + break; + default: + *wc_status = IB_WC_FATAL_ERR; + + } + } else + *wc_status = IB_WC_SUCCESS; +} + +int ehca_post_send(struct ib_qp *qp, + struct ib_send_wr *send_wr, + struct ib_send_wr **bad_send_wr) +{ + struct ehca_qp *my_qp = NULL; + struct ib_send_wr *cur_send_wr = NULL; + struct ehca_wqe *wqe_p = NULL; + int wqe_cnt = 0; + int ret = 0; + unsigned long spl_flags = 0; + + EHCA_CHECK_ADR(qp); + my_qp = container_of(qp, struct ehca_qp, ib_qp); + 
EHCA_CHECK_QP(my_qp); + EHCA_CHECK_ADR(send_wr); + EDEB_EN(7, "ehca_qp=%p qp_num=%x send_wr=%p bad_send_wr=%p", + my_qp, qp->qp_num, send_wr, bad_send_wr); + + /* LOCK the QUEUE */ + spin_lock_irqsave(&my_qp->spinlock_s, spl_flags); + + /* loop processes list of send reqs */ + for (cur_send_wr = send_wr; cur_send_wr != NULL; + cur_send_wr = cur_send_wr->next) { + u64 start_offset = my_qp->ipz_squeue.current_q_offset; + /* get pointer next to free WQE */ + wqe_p = ipz_qeit_get_inc(&my_qp->ipz_squeue); + if (unlikely(!wqe_p)) { + /* too many posted work requests: queue overflow */ + if (bad_send_wr) + *bad_send_wr = cur_send_wr; + if (wqe_cnt == 0) { + ret = -ENOMEM; + EDEB_ERR(4, "Too many posted WQEs qp_num=%x", + qp->qp_num); + } + goto post_send_exit0; + } + /* write a SEND WQE into the QUEUE */ + ret = ehca_write_swqe(my_qp, wqe_p, cur_send_wr); + /* if something failed, + reset the free entry pointer to the start value + */ + if (unlikely(ret)) { + my_qp->ipz_squeue.current_q_offset = start_offset; + *bad_send_wr = cur_send_wr; + if (wqe_cnt == 0) { + ret = -EINVAL; + EDEB_ERR(4, "Could not write WQE qp_num=%x", + qp->qp_num); + } + goto post_send_exit0; + } + wqe_cnt++; + EDEB(7, "ehca_qp=%p qp_num=%x wqe_cnt=%d", + my_qp, qp->qp_num, wqe_cnt); + } /* eof for cur_send_wr */ + +post_send_exit0: + /* UNLOCK the QUEUE */ + spin_unlock_irqrestore(&my_qp->spinlock_s, spl_flags); + iosync(); /* serialize GAL register access */ + hipz_update_sqa(my_qp, wqe_cnt); + EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x wqe_cnt=%d", + my_qp, qp->qp_num, ret, wqe_cnt); + return ret; +} + +int ehca_post_recv(struct ib_qp *qp, + struct ib_recv_wr *recv_wr, + struct ib_recv_wr **bad_recv_wr) +{ + struct ehca_qp *my_qp = NULL; + struct ib_recv_wr *cur_recv_wr = NULL; + struct ehca_wqe *wqe_p = NULL; + int wqe_cnt = 0; + int ret = 0; + unsigned long spl_flags = 0; + + EHCA_CHECK_ADR(qp); + my_qp = container_of(qp, struct ehca_qp, ib_qp); + EHCA_CHECK_QP(my_qp); + EHCA_CHECK_ADR(recv_wr); + 
EDEB_EN(7, "ehca_qp=%p qp_num=%x recv_wr=%p bad_recv_wr=%p", + my_qp, qp->qp_num, recv_wr, bad_recv_wr); + + /* LOCK the QUEUE */ + spin_lock_irqsave(&my_qp->spinlock_r, spl_flags); + + /* loop processes list of recv reqs */ + for (cur_recv_wr = recv_wr; cur_recv_wr != NULL; + cur_recv_wr = cur_recv_wr->next) { + u64 start_offset = my_qp->ipz_rqueue.current_q_offset; + /* get pointer next to free WQE */ + wqe_p = ipz_qeit_get_inc(&my_qp->ipz_rqueue); + if (unlikely(!wqe_p)) { + /* too many posted work requests: queue overflow */ + if (bad_recv_wr) + *bad_recv_wr = cur_recv_wr; + if (wqe_cnt == 0) { + ret = -ENOMEM; + EDEB_ERR(4, "Too many posted WQEs qp_num=%x", + qp->qp_num); + } + goto post_recv_exit0; + } + /* write a RECV WQE into the QUEUE */ + ret = ehca_write_rwqe(&my_qp->ipz_rqueue, wqe_p, + cur_recv_wr); + /* if something failed, + reset the free entry pointer to the start value + */ + if (unlikely(ret)) { + my_qp->ipz_rqueue.current_q_offset = start_offset; + *bad_recv_wr = cur_recv_wr; + if (wqe_cnt == 0) { + ret = -EINVAL; + EDEB_ERR(4, "Could not write WQE qp_num=%x", + qp->qp_num); + } + goto post_recv_exit0; + } + wqe_cnt++; + EDEB(7, "ehca_qp=%p qp_num=%x wqe_cnt=%d", + my_qp, qp->qp_num, wqe_cnt); + } /* eof for cur_recv_wr */ + +post_recv_exit0: + spin_unlock_irqrestore(&my_qp->spinlock_r, spl_flags); + iosync(); /* serialize GAL register access */ + hipz_update_rqa(my_qp, wqe_cnt); + EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x wqe_cnt=%d", + my_qp, qp->qp_num, ret, wqe_cnt); + return ret; +} + +/** + * ib_wc_opcode - Table converts ehca wc opcode to ib + * Since we use zero to indicate invalid opcode, the actual ib opcode must + * be decremented!!! 
+ */ +static const u8 ib_wc_opcode[255] = { + [0x01] = IB_WC_RECV+1, + [0x02] = IB_WC_RECV_RDMA_WITH_IMM+1, + [0x04] = IB_WC_BIND_MW+1, + [0x08] = IB_WC_FETCH_ADD+1, + [0x10] = IB_WC_COMP_SWAP+1, + [0x20] = IB_WC_RDMA_WRITE+1, + [0x40] = IB_WC_RDMA_READ+1, + [0x80] = IB_WC_SEND+1 +}; + +/** + * internal function to poll one entry of cq + */ +static inline int ehca_poll_cq_one(struct ib_cq *cq, struct ib_wc *wc) +{ + int ret = 0; + struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); + struct ehca_cqe *cqe = NULL; + int cqe_count = 0; + + EDEB_EN(7, "ehca_cq=%p cq_num=%x wc=%p", my_cq, my_cq->cq_number, wc); + +poll_cq_one_read_cqe: + cqe = (struct ehca_cqe *) + ipz_qeit_get_inc_valid(&my_cq->ipz_queue); + if (!cqe) { + ret = -EAGAIN; + EDEB(7, "Completion queue is empty ehca_cq=%p cq_num=%x " + "ret=%x", my_cq, my_cq->cq_number, ret); + goto poll_cq_one_exit0; + } + cqe_count++; + if (unlikely(cqe->status & WC_STATUS_PURGE_BIT)) { + struct ehca_qp *qp=ehca_cq_get_qp(my_cq, cqe->local_qp_number); + int purgeflag = 0; + unsigned long spl_flags = 0; + if (!qp) { + EDEB_ERR(4, "cq_num=%x qp_num=%x " + "could not find qp -> ignore cqe", + my_cq->cq_number, cqe->local_qp_number); + EDEB_DMP(4, cqe, 64, "cq_num=%x qp_num=%x", + my_cq->cq_number, cqe->local_qp_number); + /* ignore this purged cqe */ + goto poll_cq_one_read_cqe; + } + spin_lock_irqsave(&qp->spinlock_s, spl_flags); + purgeflag = qp->sqerr_purgeflag; + spin_unlock_irqrestore(&qp->spinlock_s, spl_flags); + + if (purgeflag) { + EDEB(6, "Got CQE with purged bit qp_num=%x src_qp=%x", + cqe->local_qp_number, cqe->remote_qp_number); + EDEB_DMP(6, cqe, 64, "qp_num=%x src_qp=%x", + cqe->local_qp_number, cqe->remote_qp_number); + /* ignore this to avoid double cqes of bad wqe + that caused sqe and turn off purge flag */ + qp->sqerr_purgeflag = 0; + goto poll_cq_one_read_cqe; + } + } + + /* tracing cqe */ + if (IS_EDEB_ON(7)) { + EDEB(7, "Received COMPLETION ehca_cq=%p cq_num=%x -----", + my_cq, 
my_cq->cq_number); + EDEB_DMP(7, cqe, 64, "ehca_cq=%p cq_num=%x", + my_cq, my_cq->cq_number); + EDEB(7, "ehca_cq=%p cq_num=%x -------------------------", + my_cq, my_cq->cq_number); + } + + /* we got a completion! */ + wc->wr_id = cqe->work_request_id; + + /* eval ib_wc_opcode */ + wc->opcode = ib_wc_opcode[cqe->optype]-1; + if (unlikely(wc->opcode == -1)) { + EDEB_ERR(4, "Invalid cqe->OPType=%x cqe->status=%x " + "ehca_cq=%p cq_num=%x", + cqe->optype, cqe->status, my_cq, my_cq->cq_number); + /* dump cqe for other infos */ + EDEB_DMP(4, cqe, 64, "ehca_cq=%p cq_num=%x", + my_cq, my_cq->cq_number); + /* update also queue adder to throw away this entry!!! */ + goto poll_cq_one_exit0; + } + /* eval ib_wc_status */ + if (unlikely(cqe->status & WC_STATUS_ERROR_BIT)) { /* complete with errors */ + map_ib_wc_status(cqe->status, &wc->status); + wc->vendor_err = wc->status; + } else + wc->status = IB_WC_SUCCESS; + + wc->qp_num = cqe->local_qp_number; + wc->byte_len = ntohl(cqe->nr_bytes_transferred); + wc->pkey_index = cqe->pkey_index; + wc->slid = cqe->rlid; + wc->dlid_path_bits = cqe->dlid; + wc->src_qp = cqe->remote_qp_number; + wc->wc_flags = cqe->w_completion_flags; + wc->imm_data = cqe->immediate_data; + wc->sl = cqe->service_level; + + if (wc->status != IB_WC_SUCCESS) + EDEB(6, "ehca_cq=%p cq_num=%x WARNING unsuccessful cqe " + "OPType=%x status=%x qp_num=%x src_qp=%x wr_id=%lx cqe=%p", + my_cq, my_cq->cq_number, cqe->optype, cqe->status, + cqe->local_qp_number, cqe->remote_qp_number, + cqe->work_request_id, cqe); + +poll_cq_one_exit0: + if (cqe_count > 0) + hipz_update_feca(my_cq, cqe_count); + + EDEB_EX(7, "ret=%x ehca_cq=%p cq_number=%x wc=%p " + "status=%x opcode=%x qp_num=%x byte_len=%x", + ret, my_cq, my_cq->cq_number, wc, wc->status, + wc->opcode, wc->qp_num, wc->byte_len); + + return ret; +} + +int ehca_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc) +{ + struct ehca_cq *my_cq = NULL; + int nr = 0; + struct ib_wc *current_wc = NULL; + int ret = 0; 
+ unsigned long spl_flags = 0; + + EHCA_CHECK_CQ(cq); + EHCA_CHECK_ADR(wc); + + my_cq = container_of(cq, struct ehca_cq, ib_cq); + EHCA_CHECK_CQ(my_cq); + + EDEB_EN(7, "ehca_cq=%p cq_num=%x num_entries=%d wc=%p", + my_cq, my_cq->cq_number, num_entries, wc); + + if (num_entries < 1) { + EDEB_ERR(4, "Invalid num_entries=%d ehca_cq=%p cq_num=%x", + num_entries, my_cq, my_cq->cq_number); + ret = -EINVAL; + goto poll_cq_exit0; + } + + current_wc = wc; + spin_lock_irqsave(&my_cq->spinlock, spl_flags); + for (nr = 0; nr < num_entries; nr++) { + ret = ehca_poll_cq_one(cq, current_wc); + if (ret) + break; + current_wc++; + } /* eof for nr */ + spin_unlock_irqrestore(&my_cq->spinlock, spl_flags); + if (ret == -EAGAIN || !ret) + ret = nr; + +poll_cq_exit0: + EDEB_EX(7, "ehca_cq=%p cq_num=%x ret=%x wc=%p nr_entries=%d", + my_cq, my_cq->cq_number, ret, wc, nr); + + return ret; +} + +int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify) +{ + struct ehca_cq *my_cq = NULL; + int ret = 0; + + EHCA_CHECK_CQ(cq); + my_cq = container_of(cq, struct ehca_cq, ib_cq); + EHCA_CHECK_CQ(my_cq); + EDEB_EN(7, "ehca_cq=%p cq_num=%x cq_notif=%x", + my_cq, my_cq->cq_number, cq_notify); + + switch (cq_notify) { + case IB_CQ_SOLICITED: + hipz_set_cqx_n0(my_cq, 1); + break; + case IB_CQ_NEXT_COMP: + hipz_set_cqx_n1(my_cq, 1); + break; + default: + return -EINVAL; + } + + EDEB_EX(7, "ehca_cq=%p cq_num=%x ret=%x", + my_cq, my_cq->cq_number, ret); + + return ret; +} --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_sqp.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_sqp.c 2006-05-15 13:35:24.000000000 +0200 @@ -0,0 +1,123 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * SQP functions + * + * Authors: Khadija Souissi + * Heiko J Schick + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. 
+ * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + + +#define DEB_PREFIX "e_qp" + +#include +#include +#include "ehca_classes.h" +#include "ehca_tools.h" +#include "ehca_qes.h" +#include "ehca_iverbs.h" +#include "hcp_if.h" + + +extern int ehca_create_aqp1(struct ehca_shca *shca, struct ehca_sport *sport); +extern int ehca_destroy_aqp1(struct ehca_sport *sport); + +extern int ehca_port_act_time; + +/** + * ehca_define_sqp - Defines special queue pair 1 (GSI QP). When special queue + * pair is created successfully, the corresponding port gets active. + * + * Define Special Queue pair 0 (SMI QP) is still not supported. 
+ * + * @qp_init_attr: Queue pair init attributes with port and queue pair type + */ + +u64 ehca_define_sqp(struct ehca_shca *shca, + struct ehca_qp *ehca_qp, + struct ib_qp_init_attr *qp_init_attr) +{ + + u32 pma_qp_nr = 0; + u32 bma_qp_nr = 0; + u64 ret = H_SUCCESS; + u8 port = qp_init_attr->port_num; + int counter = 0; + + EDEB_EN(7, "port=%x qp_type=%x", + port, qp_init_attr->qp_type); + + shca->sport[port - 1].port_state = IB_PORT_DOWN; + + switch (qp_init_attr->qp_type) { + case IB_QPT_SMI: + /* function not supported yet */ + break; + case IB_QPT_GSI: + ret = hipz_h_define_aqp1(shca->ipz_hca_handle, + ehca_qp->ipz_qp_handle, + ehca_qp->galpas.kernel, + (u32) qp_init_attr->port_num, + &pma_qp_nr, &bma_qp_nr); + + if (ret != H_SUCCESS) { + EDEB_ERR(4, "Can't define AQP1 for port %x. rc=%lx", + port, ret); + goto ehca_define_aqp1; + } + break; + default: + ret = H_PARAMETER; + goto ehca_define_aqp1; + } + + while ((shca->sport[port - 1].port_state != IB_PORT_ACTIVE) && + (counter < ehca_port_act_time)) { + EDEB(6, "... 
wait until port %x is active", + port); + msleep_interruptible(1000); + counter++; + } + + if (counter == ehca_port_act_time) { + EDEB_ERR(4, "Port %x is not active.", port); + ret = H_HARDWARE; + } + +ehca_define_aqp1: + EDEB_EX(7, "ret=%lx", ret); + + return ret; +} From schihei at de.ibm.com Mon May 15 10:43:01 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:43:01 +0200 Subject: [openib-general] [PATCH 13/16] ehca: hardware interface Message-ID: <4468BDA5.7050705@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/hipz_fns.h | 68 ++++ drivers/infiniband/hw/ehca/hipz_fns_core.h | 122 ++++++++ drivers/infiniband/hw/ehca/hipz_hw.h | 395 +++++++++++++++++++++++++++++ 3 files changed, 585 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/hipz_fns.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/hipz_fns.h 2006-05-02 12:48:48.000000000 +0200 @@ -0,0 +1,68 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * HW abstraction register functions + * + * Authors: Christoph Raisch + * Reinhard Ernst + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef __HIPZ_FNS_H__ +#define __HIPZ_FNS_H__ + +#include "ehca_classes.h" +#include "hipz_hw.h" + +#include "hipz_fns_core.h" + +#define hipz_galpa_store_eq(gal, offset, value) \ + hipz_galpa_store(gal, EQTEMM_OFFSET(offset), value) + +#define hipz_galpa_load_eq(gal, offset) \ + hipz_galpa_load(gal, EQTEMM_OFFSET(offset)) + +#define hipz_galpa_store_qped(gal, offset, value) \ + hipz_galpa_store(gal, QPEDMM_OFFSET(offset), value) + +#define hipz_galpa_load_qped(gal, offset) \ + hipz_galpa_load(gal, QPEDMM_OFFSET(offset)) + +#define hipz_galpa_store_mrmw(gal, offset, value) \ + hipz_galpa_store(gal, MRMWMM_OFFSET(offset), value) + +#define hipz_galpa_load_mrmw(gal, offset) \ + hipz_galpa_load(gal, MRMWMM_OFFSET(offset)) + +#endif --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/hipz_fns_core.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/hipz_fns_core.h 2006-05-02 12:48:48.000000000 +0200 @@ -0,0 +1,122 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * HW abstraction register functions + * + * Authors: Christoph Raisch + * Heiko J Schick + * Hoang-Nam Nguyen + * Reinhard Ernst + * + * Copyright (c) 
2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef __HIPZ_FNS_CORE_H__ +#define __HIPZ_FNS_CORE_H__ + +#include "hcp_phyp.h" +#include "hipz_hw.h" + +#define hipz_galpa_store_cq(gal, offset, value) \ + hipz_galpa_store(gal, CQTEMM_OFFSET(offset), value) + +#define hipz_galpa_load_cq(gal, offset) \ + hipz_galpa_load(gal, CQTEMM_OFFSET(offset)) + +#define hipz_galpa_store_qp(gal,offset, value) \ + hipz_galpa_store(gal, QPTEMM_OFFSET(offset), value) +#define hipz_galpa_load_qp(gal, offset) \ + hipz_galpa_load(gal,QPTEMM_OFFSET(offset)) + +static inline void hipz_update_sqa(struct ehca_qp *qp, u16 nr_wqes) +{ + struct h_galpa gal; + + EDEB_EN(7, "qp=%p", qp); + gal = qp->galpas.kernel; + /* ringing doorbell :-) */ + hipz_galpa_store_qp(gal, qpx_sqa, EHCA_BMASK_SET(QPX_SQADDER, nr_wqes)); + EDEB_EX(7, "qp=%p QPx_SQA = %i", qp, nr_wqes); +} + +static inline void hipz_update_rqa(struct ehca_qp *qp, u16 nr_wqes) +{ + struct h_galpa gal; + + EDEB_EN(7, "qp=%p", qp); + gal = qp->galpas.kernel; + /* ringing doorbell :-) */ + hipz_galpa_store_qp(gal, qpx_rqa, EHCA_BMASK_SET(QPX_RQADDER, nr_wqes)); + EDEB_EX(7, "qp=%p QPx_RQA = %i", qp, nr_wqes); +} + +static inline void hipz_update_feca(struct ehca_cq *cq, u32 nr_cqes) +{ + struct h_galpa gal; + + EDEB_EN(7, "cq=%p", cq); + gal = cq->galpas.kernel; + hipz_galpa_store_cq(gal, cqx_feca, + EHCA_BMASK_SET(CQX_FECADDER, nr_cqes)); + EDEB_EX(7, "cq=%p CQx_FECA = %i", cq, nr_cqes); +} + +static inline void hipz_set_cqx_n0(struct ehca_cq *cq, u32 value) +{ + struct h_galpa gal; + u64 CQx_N0_reg = 0; + + EDEB_EN(7, "cq=%p event on solicited completion -- write CQx_N0", cq); + gal = cq->galpas.kernel; + hipz_galpa_store_cq(gal, cqx_n0, + EHCA_BMASK_SET(CQX_N0_GENERATE_SOLICITED_COMP_EVENT, + value)); + CQx_N0_reg = hipz_galpa_load_cq(gal, cqx_n0); + EDEB_EX(7, "cq=%p loaded CQx_N0=%lx", cq, (unsigned long)CQx_N0_reg); +} + +static inline void hipz_set_cqx_n1(struct ehca_cq *cq, u32 value) +{ + struct h_galpa gal; + u64 CQx_N1_reg = 0; + + EDEB_EN(7, "cq=%p event on 
completion -- write CQx_N1", + cq); + gal = cq->galpas.kernel; + hipz_galpa_store_cq(gal, cqx_n1, + EHCA_BMASK_SET(CQX_N1_GENERATE_COMP_EVENT, value)); + CQx_N1_reg = hipz_galpa_load_cq(gal, cqx_n1); + EDEB_EX(7, "cq=%p loaded CQx_N1=%lx", cq, (unsigned long)CQx_N1_reg); +} + +#endif /* __HIPZ_FNC_CORE_H__ */ --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/hipz_hw.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/hipz_hw.h 2006-05-02 10:55:26.000000000 +0200 @@ -0,0 +1,395 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * eHCA register definitions + * + * Authors: Waleri Fomin + * Christoph Raisch + * Reinhard Ernst + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef __HIPZ_HW_H__ +#define __HIPZ_HW_H__ + +#include "ehca_tools.h" + +/** QP Table Entry Memory Map + */ +struct hipz_qptemm { + u64 qpx_hcr; + u64 qpx_c; + u64 qpx_herr; + u64 qpx_aer; +/* 0x20*/ + u64 qpx_sqa; + u64 qpx_sqc; + u64 qpx_rqa; + u64 qpx_rqc; +/* 0x40*/ + u64 qpx_st; + u64 qpx_pmstate; + u64 qpx_pmfa; + u64 qpx_pkey; +/* 0x60*/ + u64 qpx_pkeya; + u64 qpx_pkeyb; + u64 qpx_pkeyc; + u64 qpx_pkeyd; +/* 0x80*/ + u64 qpx_qkey; + u64 qpx_dqp; + u64 qpx_dlidp; + u64 qpx_portp; +/* 0xa0*/ + u64 qpx_slidp; + u64 qpx_slidpp; + u64 qpx_dlida; + u64 qpx_porta; +/* 0xc0*/ + u64 qpx_slida; + u64 qpx_slidpa; + u64 qpx_slvl; + u64 qpx_ipd; +/* 0xe0*/ + u64 qpx_mtu; + u64 qpx_lato; + u64 qpx_rlimit; + u64 qpx_rnrlimit; +/* 0x100*/ + u64 qpx_t; + u64 qpx_sqhp; + u64 qpx_sqptp; + u64 qpx_nspsn; +/* 0x120*/ + u64 qpx_nspsnhwm; + u64 reserved1; + u64 qpx_sdsi; + u64 qpx_sdsbc; +/* 0x140*/ + u64 qpx_sqwsize; + u64 qpx_sqwts; + u64 qpx_lsn; + u64 qpx_nssn; +/* 0x160 */ + u64 qpx_mor; + u64 qpx_cor; + u64 qpx_sqsize; + u64 qpx_erc; +/* 0x180*/ + u64 qpx_rnrrc; + u64 qpx_ernrwt; + u64 qpx_rnrresp; + u64 qpx_lmsna; +/* 0x1a0 */ + u64 qpx_sqhpc; + u64 qpx_sqcptp; + u64 qpx_sigt; + u64 qpx_wqecnt; +/* 0x1c0*/ + + u64 qpx_rqhp; + u64 qpx_rqptp; + u64 qpx_rqsize; + u64 qpx_nrr; +/* 0x1e0*/ + u64 qpx_rdmac; + u64 qpx_nrpsn; + u64 qpx_lapsn; + u64 qpx_lcr; +/* 0x200*/ + u64 qpx_rwc; + u64 qpx_rwva; + u64 qpx_rdsi; + u64 qpx_rdsbc; +/* 
0x220*/ + u64 qpx_rqwsize; + u64 qpx_crmsn; + u64 qpx_rdd; + u64 qpx_larpsn; +/* 0x240*/ + u64 qpx_pd; + u64 qpx_scqn; + u64 qpx_rcqn; + u64 qpx_aeqn; +/* 0x260*/ + u64 qpx_aaelog; + u64 qpx_ram; + u64 qpx_rdmaqe0; + u64 qpx_rdmaqe1; +/* 0x280*/ + u64 qpx_rdmaqe2; + u64 qpx_rdmaqe3; + u64 qpx_nrpsnhwm; +/* 0x298*/ + u64 reserved[(0x400 - 0x298) / 8]; +/* 0x400 extended data */ + u64 reserved_ext[(0x500 - 0x400) / 8]; +/* 0x500 */ + u64 reserved2[(0x1000 - 0x500) / 8]; +/* 0x1000 */ +}; + +#define QPX_SQADDER EHCA_BMASK_IBM(48,63) +#define QPX_RQADDER EHCA_BMASK_IBM(48,63) + +#define QPTEMM_OFFSET(x) offsetof(struct hipz_qptemm,x) + +/** MRMWPT Entry Memory Map + */ +struct hipz_mrmwmm { + /* 0x00 */ + u64 mrx_hcr; + + u64 mrx_c; + u64 mrx_herr; + u64 mrx_aer; + /* 0x20 */ + u64 mrx_pp; + u64 reserved1; + u64 reserved2; + u64 reserved3; + /* 0x40 */ + u64 reserved4[(0x200 - 0x40) / 8]; + /* 0x200 */ + u64 mrx_ctl[64]; + +}; + +#define MRX_HCR_LPARID_VALID EHCA_BMASK_IBM(0,0) + +#define MRMWMM_OFFSET(x) offsetof(struct hipz_mrmwmm,x) + +struct hipz_qpedmm { + /* 0x00 */ + u64 reserved0[(0x400) / 8]; + /* 0x400 */ + u64 qpedx_phh; + u64 qpedx_ppsgp; + /* 0x410 */ + u64 qpedx_ppsgu; + u64 qpedx_ppdgp; + /* 0x420 */ + u64 qpedx_ppdgu; + u64 qpedx_aph; + /* 0x430 */ + u64 qpedx_apsgp; + u64 qpedx_apsgu; + /* 0x440 */ + u64 qpedx_apdgp; + u64 qpedx_apdgu; + /* 0x450 */ + u64 qpedx_apav; + u64 qpedx_apsav; + /* 0x460 */ + u64 qpedx_hcr; + u64 reserved1[4]; + /* 0x488 */ + u64 qpedx_rrl0; + /* 0x490 */ + u64 qpedx_rrrkey0; + u64 qpedx_rrva0; + /* 0x4a0 */ + u64 reserved2; + u64 qpedx_rrl1; + /* 0x4b0 */ + u64 qpedx_rrrkey1; + u64 qpedx_rrva1; + /* 0x4c0 */ + u64 reserved3; + u64 qpedx_rrl2; + /* 0x4d0 */ + u64 qpedx_rrrkey2; + u64 qpedx_rrva2; + /* 0x4e0 */ + u64 reserved4; + u64 qpedx_rrl3; + /* 0x4f0 */ + u64 qpedx_rrrkey3; + u64 qpedx_rrva3; +}; + +#define QPEDMM_OFFSET(x) offsetof(struct hipz_qpedmm,x) + +/** CQ Table Entry Memory Map + */ +struct hipz_cqtemm { + u64
cqx_hcr; + u64 cqx_c; + u64 cqx_herr; + u64 cqx_aer; +/* 0x20 */ + u64 cqx_ptp; + u64 cqx_tp; + u64 cqx_fec; + u64 cqx_feca; +/* 0x40 */ + u64 cqx_ep; + u64 cqx_eq; +/* 0x50 */ + u64 reserved1; + u64 cqx_n0; +/* 0x60 */ + u64 cqx_n1; + u64 reserved2[(0x1000 - 0x60) / 8]; +/* 0x1000 */ +}; + +#define CQX_FEC_CQE_CNT EHCA_BMASK_IBM(32,63) +#define CQX_FECADDER EHCA_BMASK_IBM(32,63) +#define CQX_N0_GENERATE_SOLICITED_COMP_EVENT EHCA_BMASK_IBM(0,0) +#define CQX_N1_GENERATE_COMP_EVENT EHCA_BMASK_IBM(0,0) + +#define CQTEMM_OFFSET(x) offsetof(struct hipz_cqtemm,x) + +/** EQ Table Entry Memory Map + */ +struct hipz_eqtemm { + u64 eqx_hcr; + u64 eqx_c; + + u64 eqx_herr; + u64 eqx_aer; +/* 0x20 */ + u64 eqx_ptp; + u64 eqx_tp; + u64 eqx_ssba; + u64 eqx_psba; + +/* 0x40 */ + u64 eqx_cec; + u64 eqx_meql; + u64 eqx_xisbi; + u64 eqx_xisc; +/* 0x60 */ + u64 eqx_it; + +}; + +#define EQTEMM_OFFSET(x) offsetof(struct hipz_eqtemm,x) + +/* access control defines for MR/MW */ +#define HIPZ_ACCESSCTRL_L_WRITE 0x00800000 +#define HIPZ_ACCESSCTRL_R_WRITE 0x00400000 +#define HIPZ_ACCESSCTRL_R_READ 0x00200000 +#define HIPZ_ACCESSCTRL_R_ATOMIC 0x00100000 +#define HIPZ_ACCESSCTRL_MW_BIND 0x00080000 + +/* query hca response block */ +struct hipz_query_hca { + u32 cur_reliable_dg; + u32 cur_qp; + u32 cur_cq; + u32 cur_eq; + u32 cur_mr; + u32 cur_mw; + u32 cur_ee_context; + u32 cur_mcast_grp; + u32 cur_qp_attached_mcast_grp; + u32 reserved1; + u32 cur_ipv6_qp; + u32 cur_eth_qp; + u32 cur_hp_mr; + u32 reserved2[3]; + u32 max_rd_domain; + u32 max_qp; + u32 max_cq; + u32 max_eq; + u32 max_mr; + u32 max_hp_mr; + u32 max_mw; + u32 max_mrwpte; + u32 max_special_mrwpte; + u32 max_rd_ee_context; + u32 max_mcast_grp; + u32 max_total_mcast_qp_attach; + u32 max_mcast_qp_attach; + u32 max_raw_ipv6_qp; + u32 max_raw_ethy_qp; + u32 internal_clock_frequency; + u32 max_pd; + u32 max_ah; + u32 max_cqe; + u32 max_wqes_wq; + u32 max_partitions; + u32 max_rr_ee_context; + u32 max_rr_qp; + u32 max_rr_hca; + u32 
max_act_wqs_ee_context; + u32 max_act_wqs_qp; + u32 max_sge; + u32 max_sge_rd; + u32 memory_page_size_supported; + u64 max_mr_size; + u32 local_ca_ack_delay; + u32 num_ports; + u32 vendor_id; + u32 vendor_part_id; + u32 hw_ver; + u64 node_guid; + u64 hca_cap_indicators; + u32 data_counter_register_size; + u32 max_shared_rq; + u32 max_isns_eq; + u32 max_neq; +} __attribute__ ((packed)); + +/* query port response block */ +struct hipz_query_port { + u32 state; + u32 bad_pkey_cntr; + u32 lmc; + u32 lid; + u32 subnet_timeout; + u32 qkey_viol_cntr; + u32 sm_sl; + u32 sm_lid; + u32 capability_mask; + u32 init_type_reply; + u32 pkey_tbl_len; + u32 gid_tbl_len; + u64 gid_prefix; + u32 port_nr; + u16 pkey_entries[16]; + u8 reserved1[32]; + u32 trent_size; + u32 trbuf_size; + u64 max_msg_sz; + u32 max_mtu; + u32 vl_cap; + u8 reserved2[1900]; + u64 guid_entries[255]; +} __attribute__ ((packed)); + +#endif From schihei at de.ibm.com Mon May 15 10:43:09 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:43:09 +0200 Subject: [openib-general] [PATCH 14/16] ehca: queue page table handling Message-ID: <4468BDAD.3020701@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/ipz_pt_fn.c | 177 ++++++++++++++++++++++ drivers/infiniband/hw/ehca/ipz_pt_fn.h | 254 +++++++++++++++++++++++++++++++++ 2 files changed, 431 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ipz_pt_fn.h 2006-05-12 12:25:43.000000000 +0200 @@ -0,0 +1,254 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * internal queue handling + * + * Authors: Waleri Fomin + * Reinhard Ernst + * Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. 
+ * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef __IPZ_PT_FN_H__ +#define __IPZ_PT_FN_H__ + +#include "ehca_qes.h" +#define EHCA_PAGESHIFT 12 +#define EHCA_PAGESIZE 4096UL +#define EHCA_PT_ENTRIES 512UL + +#include "ehca_tools.h" +#include "ehca_qes.h" + +/* struct generic ehca page + */ +struct ipz_page { + u8 entries[EHCA_PAGESIZE]; +}; + +/* struct generic queue in linux kernel virtual memory (kv) + */ +struct ipz_queue { + u64 current_q_offset; /* current queue entry */ + + struct ipz_page **queue_pages; /* array of pages belonging to queue */ + u32 qe_size; /* queue entry size */ + u32 act_nr_of_sg; + u32 queue_length; /* queue length allocated in bytes */ + u32 pagesize; + u32 toggle_state; /* toggle flag - per page */ + u32 dummy3; /* 64 bit alignment */ +}; + +/* return current Queue Entry for a certain q_offset + * returns address (kv) of Queue Entry + */ +static inline void *ipz_qeit_calc(struct ipz_queue *queue, u64 q_offset) +{ + struct ipz_page *current_page = NULL; + if (q_offset >= queue->queue_length) + return NULL; + current_page = (queue->queue_pages)[q_offset >> EHCA_PAGESHIFT]; + return &current_page->entries[q_offset & (EHCA_PAGESIZE - 1)]; +} + +/* return current Queue Entry + * returns address (kv) of Queue Entry + */ +static inline void *ipz_qeit_get(struct ipz_queue *queue) +{ + return ipz_qeit_calc(queue, queue->current_q_offset); +} + +/* return current Queue Page , increment Queue Page iterator from + * page to page in struct ipz_queue, last increment will return 0!
and + * NOT wrap + * returns address (kv) of Queue Page + * warning don't use in parallel with ipz_QE_get_inc() + */ +void *ipz_qpageit_get_inc(struct ipz_queue *queue); + +/* return current Queue Entry, increment Queue Entry iterator by one + * step in struct ipz_queue, will wrap in ringbuffer + * @returns address (kv) of Queue Entry BEFORE increment + * warning don't use in parallel with ipz_qpageit_get_inc() + * warning unpredictable results may occur if steps>act_nr_of_queue_entries + */ +static inline void *ipz_qeit_get_inc(struct ipz_queue *queue) +{ + void *ret = NULL; + + ret = ipz_qeit_get(queue); + queue->current_q_offset += queue->qe_size; + if (queue->current_q_offset >= queue->queue_length) { + queue->current_q_offset = 0; + /* toggle the valid flag */ + queue->toggle_state = (~queue->toggle_state) & 1; + } + + EDEB(7, "queue=%p ret=%p new current_q_addr=%lx qe_size=%x", + queue, ret, queue->current_q_offset, queue->qe_size); + + return ret; +} + +/* return current Queue Entry, increment Queue Entry iterator by one + * step in struct ipz_queue, will wrap in ringbuffer + * returns address (kv) of Queue Entry BEFORE increment + * returns 0 and does not increment, if wrong valid state + * warning don't use in parallel with ipz_qpageit_get_inc() + * warning unpredictable results may occur if steps>act_nr_of_queue_entries + */ +static inline void *ipz_qeit_get_inc_valid(struct ipz_queue *queue) +{ + struct ehca_cqe *cqe = ipz_qeit_get(queue); + u32 cqe_flags = cqe->cqe_flags; + + if ((cqe_flags >> 7) != (queue->toggle_state & 1)) + return NULL; + + ipz_qeit_get_inc(queue); + return cqe; +} + +/* returns and resets Queue Entry iterator + * returns address (kv) of first Queue Entry + */ +static inline void *ipz_qeit_reset(struct ipz_queue *queue) +{ + queue->current_q_offset = 0; + return ipz_qeit_get(queue); +} + +/** struct generic page table + */ +struct ipz_pt { + u64 entries[EHCA_PT_ENTRIES]; +}; + +/* struct page table for a queue, only to be used in pf 
+ */ +struct ipz_qpt { + /* queue page tables (kv), use u64 because we know the element length */ + u64 *qpts; + u32 allocated_qpts_entries; + u32 nr_of_PTEs; /* number of page table entries PTE iterators */ + u64 *current_pte_addr; +}; + +/* constructor for a ipz_queue_t, placement new for ipz_queue_t, + * new for all dependent datastructors + * + * all QP Tables are the same + * flow: + * allocate+pin queue + * see ipz_qpt_ctor() + * returns true if ok, false if out of memory + */ +int ipz_queue_ctor(struct ipz_queue *queue, const u32 nr_of_pages, + const u32 pagesize, const u32 qe_size, + const u32 nr_of_sg); + +/* destructor for a ipz_queue_t + * -# free queue + * see ipz_queue_ctor() + * returns true if ok, false if queue was NULL-ptr of free failed + */ +int ipz_queue_dtor(struct ipz_queue *queue); + +/* constructor for a ipz_qpt_t, + * placement new for struct ipz_queue, new for all dependent datastructors + * + * all QP Tables are the same, + * flow: + * -# allocate+pin queue + * -# initialise ptcb + * -# allocate+pin PTs + * -# link PTs to a ring, according to HCA Arch, set bit62 id needed + * -# the ring must have room for exactly nr_of_PTEs + * see ipz_qpt_ctor() + */ +void ipz_qpt_ctor(struct ipz_qpt *qpt, + const u32 nr_of_QEs, + const u32 pagesize, + const u32 qe_size, + const u8 lowbyte, const u8 toggle, + u32 * act_nr_of_QEs, u32 * act_nr_of_pages); + +/* return current Queue Entry, increment Queue Entry iterator by one + * step in struct ipz_queue, will wrap in ringbuffer + * returns address (kv) of Queue Entry BEFORE increment + * warning don't use in parallel with ipz_qpageit_get_inc() + * warning unpredictable results may occur if steps>act_nr_of_queue_entries + * + * fix EQ page problems + */ +void *ipz_qeit_eq_get_inc(struct ipz_queue *queue); + +/* return current Event Queue Entry, increment Queue Entry iterator + * by one step in struct ipz_queue if valid, will wrap in ringbuffer + * returns address (kv) of Queue Entry BEFORE increment + * 
returns 0 and does not increment, if wrong valid state + * warning don't use in parallel with ipz_queue_QPageit_get_inc() + * warning unpredictable results may occur if steps>act_nr_of_queue_entries + */ +static inline void *ipz_eqit_eq_get_inc_valid(struct ipz_queue *queue) +{ + void *ret = ipz_qeit_get(queue); + u32 qe = *(u8 *) ret; + EDEB(7, "ipz_QEit_EQ_get_inc_valid qe=%x", qe); + if ((qe >> 7) == (queue->toggle_state & 1)) + ipz_qeit_eq_get_inc(queue); /* this is a good one */ + else + ret = NULL; + return ret; +} + +/* + * returns address (GX) of first queue entry + */ +static inline u64 ipz_qpt_get_firstpage(struct ipz_qpt *qpt) +{ + return be64_to_cpu(qpt->qpts[0]); +} + +/* + * returns address (kv) of first page of queue page table + */ +static inline void *ipz_qpt_get_qpt(struct ipz_qpt *qpt) +{ + return qpt->qpts; +} + +#endif /* __IPZ_PT_FN_H__ */ --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ipz_pt_fn.c 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ipz_pt_fn.c 2006-05-15 15:43:31.000000000 +0200 @@ -0,0 +1,177 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * internal queue handling + * + * Authors: Waleri Fomin + * Reinhard Ernst + * Christoph Raisch + * + * Copyright (c) 2005 IBM Corporation + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#define DEB_PREFIX "iptz" + +#include "ehca_tools.h" +#include "ipz_pt_fn.h" + +extern int ehca_hwlevel; + +void *ipz_qpageit_get_inc(struct ipz_queue *queue) +{ + void *ret = ipz_qeit_get(queue); + queue->current_q_offset += queue->pagesize; + if (queue->current_q_offset > queue->queue_length) { + queue->current_q_offset -= queue->pagesize; + ret = NULL; + } + if (((u64)ret) % EHCA_PAGESIZE) { + EDEB(4, "ERROR!! 
not at PAGE-Boundary"); + return NULL; + } + EDEB(7, "queue=%p ret=%p", queue, ret); + return ret; +} + +void *ipz_qeit_eq_get_inc(struct ipz_queue *queue) +{ + void *ret = ipz_qeit_get(queue); + u64 last_entry_in_q = queue->queue_length - queue->qe_size; + queue->current_q_offset += queue->qe_size; + if (queue->current_q_offset > last_entry_in_q) { + queue->current_q_offset = 0; + queue->toggle_state = (~queue->toggle_state) & 1; + } + + EDEB(7, "queue=%p ret=%p new current_q_offset=%lx qe_size=%x", + queue, ret, queue->current_q_offset, queue->qe_size); + + return ret; +} + +int ipz_queue_ctor(struct ipz_queue *queue, + const u32 nr_of_pages, + const u32 pagesize, const u32 qe_size, const u32 nr_of_sg) +{ + int pages_per_kpage = PAGE_SIZE >> EHCA_PAGESHIFT; + int f; + + EDEB_EN(7, "nr_of_pages=%x pagesize=%x qe_size=%x", + nr_of_pages, pagesize, qe_size); + if (pagesize > PAGE_SIZE) { + EDEB_ERR(4, "FATAL ERROR: pagesize=%x is greater than " + "kernel page size", pagesize); + return 0; + } + if (!pages_per_kpage) { + EDEB_ERR(4, "FATAL ERROR: invalid kernel page size. " + "pages_per_kpage=%x", pages_per_kpage); + return 0; + } + queue->queue_length = nr_of_pages * pagesize; + queue->queue_pages = vmalloc(nr_of_pages * sizeof(void *)); + if (!queue->queue_pages) { + EDEB(4, "ERROR!! 
didn't get the memory"); + return 0; + } + memset(queue->queue_pages, 0, nr_of_pages * sizeof(void *)); + /* allocate pages for queue: + while loop allocates whole kernel pages + if cond allocates so much mem needed for the rest of queue pages, + which is nr_of_pages % pages_per_kpage + */ + f = 0; + while (f + pages_per_kpage <= nr_of_pages) { + u8 *kpage = kzalloc(PAGE_SIZE, GFP_KERNEL); + int k; + if (!kpage) + goto ipz_queue_ctor_exit0; /*NOMEM*/ + for (k = 0; k < pages_per_kpage; k++) { + (queue->queue_pages)[f] = (struct ipz_page *)kpage; + kpage += EHCA_PAGESIZE; + f++; + } + } + if (f < nr_of_pages) { + u8 *kpage = kzalloc((nr_of_pages - f) * EHCA_PAGESIZE, + GFP_KERNEL); + if (!kpage) + goto ipz_queue_ctor_exit0; /*NOMEM*/ + while (f < nr_of_pages) { + (queue->queue_pages)[f] = (struct ipz_page *)kpage; + kpage += EHCA_PAGESIZE; + f++; + } + } + + queue->current_q_offset = 0; + queue->qe_size = qe_size; + queue->act_nr_of_sg = nr_of_sg; + queue->pagesize = pagesize; + queue->toggle_state = 1; + EDEB_EX(7, "queue_length=%x queue_pages=%p qe_size=%x" + " act_nr_of_sg=%x", queue->queue_length, queue->queue_pages, + queue->qe_size, queue->act_nr_of_sg); + return 1; + + ipz_queue_ctor_exit0: + EDEB_ERR(4, "Couldn't get alloc pages queue=%p f=%x nr_of_pages=%x", + queue, f, nr_of_pages); + for (f = 0; f < nr_of_pages; f += pages_per_kpage) { + if (!(queue->queue_pages)[f]) + break; + kfree((queue->queue_pages)[f]); + } + return 0; +} + +int ipz_queue_dtor(struct ipz_queue *queue) +{ + int pages_per_kpage = PAGE_SIZE >> EHCA_PAGESHIFT; + int g; + int nr_pages; + + EDEB_EN(7, "ipz_queue pointer=%p", queue); + if (!queue || !queue->queue_pages) { + EDEB_ERR(4, "queue or queue_pages is NULL"); + return 0; + } + EDEB(7, "destructing a queue with the following " + "properties:\n nr_of_pages=%x pagesize=%x qe_size=%x", + queue->act_nr_of_sg, queue->pagesize, queue->qe_size); + nr_pages = queue->queue_length / queue->pagesize; + for (g = 0; g < nr_pages; g += 
pages_per_kpage) + kfree((queue->queue_pages)[g]); + vfree(queue->queue_pages); + + EDEB_EX(7, "queue freed!"); + return 1; +} From schihei at de.ibm.com Mon May 15 10:43:19 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:43:19 +0200 Subject: [openib-general] [PATCH 15/16] ehca: PHYP abstraction layer Message-ID: <4468BDB7.6060106@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/hcp_phyp.c | 92 ++++++++++++++++++++++++++++++++ drivers/infiniband/hw/ehca/hcp_phyp.h | 95 ++++++++++++++++++++++++++++++++++ 2 files changed, 187 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/hcp_phyp.h 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/hcp_phyp.h 2006-05-02 12:48:48.000000000 +0200 @@ -0,0 +1,95 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * Firmware calls + * + * Authors: Christoph Raisch + * Hoang-Nam Nguyen + * Waleri Fomin + * Gerd Bayer + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef __HCP_PHYP_H__ +#define __HCP_PHYP_H__ + + +/* eHCA page (mapped into memory) + resource to access eHCA register pages in CPU address space +*/ +struct h_galpa { + u64 fw_handle; + /* for pSeries this is a 64bit memory address where + I/O memory is mapped into CPU address space (kv) */ +}; + +/* + resource to access eHCA address space registers, all types +*/ +struct h_galpas { + u32 pid; /*PID of userspace galpa checking */ + struct h_galpa user; /* user space accessible resource, + set to 0 if unused */ + struct h_galpa kernel; /* kernel space accessible resource, + set to 0 if unused */ +}; + +static inline u64 hipz_galpa_load(struct h_galpa galpa, u32 offset) +{ + u64 addr = galpa.fw_handle + offset; + u64 out; + EDEB_EN(7, "addr=%lx offset=%x ", addr, offset); + out = *(u64 *) addr; + EDEB_EX(7, "addr=%lx value=%lx", addr, out); + return out; +} + +static inline void hipz_galpa_store(struct h_galpa galpa, u32 offset, u64 value) +{ + u64 addr = galpa.fw_handle + offset; + EDEB(7, "addr=%lx offset=%x value=%lx", addr, + offset, value); + *(u64 *) addr = value; +} + +int hcp_galpas_ctor(struct h_galpas *galpas, + u64 paddr_kernel, u64 paddr_user); + +int hcp_galpas_dtor(struct h_galpas *galpas); + +int hcall_map_page(u64 physaddr, u64 * mapaddr); + +int hcall_unmap_page(u64 mapaddr); + +#endif --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/hcp_phyp.c 1970-01-01 01:00:00.000000000 +0100 +++ 
linux-2.6.17-rc2/drivers/infiniband/hw/ehca/hcp_phyp.c 2006-05-03 13:44:00.000000000 +0200 @@ -0,0 +1,92 @@ +/* + * IBM eServer eHCA Infiniband device driver for Linux on POWER + * + * load store abstraction for ehca register access with tracing + * + * Authors: Christoph Raisch + * Hoang-Nam Nguyen + * + * Copyright (c) 2005 IBM Corporation + * + * All rights reserved. + * + * This source code is distributed under a dual license of GPL v2.0 and OpenIB + * BSD. + * + * OpenIB BSD License + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER + * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#define DEB_PREFIX "PHYP" + +#include "ehca_classes.h" +#include "hipz_hw.h" + +int hcall_map_page(u64 physaddr, u64 *mapaddr) +{ + *mapaddr = (u64)(ioremap(physaddr, EHCA_PAGESIZE)); + + EDEB(7, "ioremap physaddr=%lx mapaddr=%lx", physaddr, *mapaddr); + return 0; +} + +int hcall_unmap_page(u64 mapaddr) +{ + EDEB(7, "mapaddr=%lx", mapaddr); + iounmap((volatile void __iomem*)mapaddr); + return 0; +} + +int hcp_galpas_ctor(struct h_galpas *galpas, + u64 paddr_kernel, u64 paddr_user) +{ + int ret = hcall_map_page(paddr_kernel, &galpas->kernel.fw_handle); + if (ret) + return ret; + + galpas->user.fw_handle = paddr_user; + + EDEB(7, "paddr_kernel=%lx paddr_user=%lx galpas->kernel=%lx" + " galpas->user=%lx", + paddr_kernel, paddr_user, galpas->kernel.fw_handle, + galpas->user.fw_handle); + + return ret; +} + +int hcp_galpas_dtor(struct h_galpas *galpas) +{ + int ret = 0; + + if (galpas->kernel.fw_handle) + ret = hcall_unmap_page(galpas->kernel.fw_handle); + + if (ret) + return ret; + + galpas->user.fw_handle = galpas->kernel.fw_handle = 0; + + return ret; +} From schihei at de.ibm.com Mon May 15 10:43:27 2006 From: schihei at de.ibm.com (Heiko J Schick) Date: Mon, 15 May 2006 19:43:27 +0200 Subject: [openib-general] [PATCH 16/16] ehca: integration in Linux kernel build system Message-ID: <4468BDBF.6060703@de.ibm.com> Signed-off-by: Heiko J Schick drivers/infiniband/hw/ehca/Kconfig | 6 ++++++ drivers/infiniband/hw/ehca/Makefile | 16 ++++++++++++++++ 2 files changed, 22 insertions(+) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/Kconfig 1970-01-01 01:00:00.000000000 +0100 +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/Kconfig 2006-04-28 13:32:08.000000000 +0200 @@ -0,0 +1,6 @@ +config INFINIBAND_EHCA + tristate "eHCA support" + depends on IBMEBUS && INFINIBAND + ---help--- + This is a low level device driver for the IBM + GX based Host channel adapters (HCAs) --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/Makefile 1970-01-01 01:00:00.000000000 +0100 
+++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/Makefile 2006-04-28 13:31:25.000000000 +0200 @@ -0,0 +1,16 @@ +# Authors: Heiko J Schick +# Christoph Raisch +# +# Copyright (c) 2005 IBM Corporation +# +# All rights reserved. +# +# This source code is distributed under a dual license of GPL v2.0 and OpenIB BSD. + +obj-$(CONFIG_INFINIBAND_EHCA) += hcad_mod.o + +hcad_mod-objs = ehca_main.o ehca_hca.o ehca_mcast.o ehca_pd.o ehca_av.o ehca_eq.o \ + ehca_cq.o ehca_qp.o ehca_sqp.o ehca_mrmw.o ehca_reqs.o ehca_irq.o \ + ehca_uverbs.o hcp_if.o hcp_phyp.o ipz_pt_fn.o + +CFLAGS += -DEHCA_USE_HCALL -DEHCA_USE_HCALL_KERNEL From sean.hefty at intel.com Mon May 15 10:51:24 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 15 May 2006 10:51:24 -0700 Subject: [openib-general] CMA IPv6 support In-Reply-To: <7.0.1.0.2.20060515131807.041caef8@netapp.com> Message-ID: >Rdma_create_id() already takes a struct sockaddr *, which has an address >family selector (sa_family) to define the contained address format. Why is >that one not sufficient? Rdma_bind() and rdma_resolve_addr() take struct sockaddr *. Rdma_create_id() only has an event handler, context, and port space. - Sean From sean.hefty at intel.com Mon May 15 11:04:58 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 15 May 2006 11:04:58 -0700 Subject: [openib-general] CMA IPv6 support In-Reply-To: <7.0.1.0.2.20060515132921.041caef8@netapp.com> Message-ID: >Looking at rdma_listen(), the code I see checks for bound state before >proceeding to listen: > >int rdma_listen(struct rdma_cm_id *id, int backlog) >{ > struct rdma_id_private *id_priv; > int ret; > > id_priv = container_of(id, struct rdma_id_private, id); > if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN)) > return -EINVAL; > ... This is a slightly older version of the code. There's now a call to bind if the user hadn't previously called it. 
>This makes sense, because sockets work this way, and servers generally >want to listen on a port of their own choosing. > >So, I think it's already there. Right? Sockets allows calling listen without first calling bind. If we want to support this, then rdma_create_id() needs to know the address family. If we're okay forcing the user to call rdma_bind_addr() before calling rdma_listen(), then I think that we can avoid adding it. - Sean From rdreier at cisco.com Mon May 15 11:31:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 11:31:34 -0700 Subject: [openib-general] Re: slab error while removing ib_mad In-Reply-To: (Roland Dreier's message of "Mon, 15 May 2006 09:05:36 -0700") References: Message-ID: Yes, I am a great debugger ;) I was able to reproduce this, and stay tuned for a patch that fixes it for me. - R. From rdreier at cisco.com Mon May 15 11:41:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 15 May 2006 11:41:00 -0700 Subject: [openib-general] [PATCH] slab: Fix kmem_cache_destroy() on NUMA In-Reply-To: <1147715356.26686.87.camel@localhost.localdomain> (Alan Cox's message of "Mon, 15 May 2006 18:49:16 +0100") References: Message-ID: With CONFIG_NUMA set, kmem_cache_destroy() may fail and say "Can't free all objects." The problem is caused by sequences such as the following (suppose we are on a NUMA machine with two nodes, 0 and 1): * Allocate an object from cache on node 0. * Free the object on node 1. The object is put into node 1's alien array_cache for node 0. * Call kmem_cache_destroy(), which ultimately ends up in __cache_shrink(). * __cache_shrink() does drain_cpu_caches(), which loops through all nodes. For each node it drains the shared array_cache and then handles the alien array_cache for the other node. However this means that node 0's shared array_cache will be drained, and then node 1 will move the contents of its alien[0] array_cache into that same shared array_cache. 
node 0's shared array_cache is never looked at again, so the objects left there will appear to be in use when __cache_shrink() calls __node_shrink() for node 0. So __node_shrink() will return 1 and kmem_cache_destroy() will fail. This patch fixes this by having drain_cpu_caches() do drain_alien_cache() on every node before it does drain_array() on the nodes' shared array_caches. The problem was originally reported by Or Gerlitz . Cc: Christoph Lameter Cc: Pekka Enberg Signed-off-by: Roland Dreier --- I get a nervous feeling about touching NUMA slab code, because just the topic alone makes it sound hairy. But I think my diagnosis and fix are pretty clear, and this definitely fixes crashes seen when unloading IB modules. It's a regression from 2.6.16, and x86_64 machines with > 1 NUMA node are quite common, so this probably should go into 2.6.17. diff --git a/mm/slab.c b/mm/slab.c index c32af7e..cb747be 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2192,11 +2192,14 @@ static void drain_cpu_caches(struct kmem check_irq_on(); for_each_online_node(node) { l3 = cachep->nodelists[node]; - if (l3) { + if (l3 && l3->alien) + drain_alien_cache(cachep, l3->alien); + } + + for_each_online_node(node) { + l3 = cachep->nodelists[node]; + if (l3) drain_array(cachep, l3, l3->shared, 1, node); - if (l3->alien) - drain_alien_cache(cachep, l3->alien); - } } } From Thomas.Talpey at netapp.com Mon May 15 11:57:49 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Mon, 15 May 2006 14:57:49 -0400 Subject: [openib-general] CMA IPv6 support In-Reply-To: References: <7.0.1.0.2.20060515132921.041caef8@netapp.com> Message-ID: <7.0.1.0.2.20060515144906.041caef8@netapp.com> At 02:04 PM 5/15/2006, Sean Hefty wrote: >This is a slightly older version of the code. There's now a call to >bind if the >user hadn't previously called it. Ok, and sorry for not checking the top-of-tree. So I like the old code better (requiring the bind). 
Besides, if the user does bind, then the family argument would be completely redundant. I assume you'd continue to support rdma_bind_addr() letting the system choose a port by binding to port 0... Tom. From sashak at voltaire.com Mon May 15 13:09:47 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 15 May 2006 23:09:47 +0300 Subject: [openib-general] [PATCH] opensm: QoS configuration parameters for routers Message-ID: <20060515200946.21963.48513.stgit@sashak.voltaire.com> This defines QoS configuration parameters set for routers. Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_subnet.h | 4 ++++ osm/opensm/osm_qos.c | 6 +++++- osm/opensm/osm_subnet.c | 10 ++++++++++ 3 files changed, 19 insertions(+), 1 deletions(-) diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h index 33ab3f5..319c494 100644 --- a/osm/include/opensm/osm_subnet.h +++ b/osm/include/opensm/osm_subnet.h @@ -284,6 +284,7 @@ typedef struct _osm_subn_opt osm_qos_options_t qos_hca_options; osm_qos_options_t qos_sw0_options; osm_qos_options_t qos_swe_options; + osm_qos_options_t qos_rtr_options; } osm_subn_opt_t; /* * FIELDS @@ -448,6 +449,9 @@ typedef struct _osm_subn_opt * qos_swe_options * QoS options for switches' external ports * +* qos_rtr_options +* QoS options for router ports +* * SEE ALSO * Subnet object *********/ diff --git a/osm/opensm/osm_qos.c b/osm/opensm/osm_qos.c index ade601c..fbd7773 100644 --- a/osm/opensm/osm_qos.c +++ b/osm/opensm/osm_qos.c @@ -309,7 +309,7 @@ static ib_api_status_t qos_physp_setup(o osm_signal_t osm_qos_setup(osm_opensm_t * p_osm) { - struct qos_config hca_config, sw0_config, swe_config; + struct qos_config hca_config, sw0_config, swe_config, rtr_config; struct qos_config *cfg; osm_switch_t *p_sw; ib_switch_info_t *p_si; @@ -333,6 +333,8 @@ osm_signal_t osm_qos_setup(osm_opensm_t &p_osm->subn.opt.qos_options); qos_build_config(&swe_config, &p_osm->subn.opt.qos_swe_options, &p_osm->subn.opt.qos_options); + 
qos_build_config(&rtr_config, &p_osm->subn.opt.qos_rtr_options, + &p_osm->subn.opt.qos_options); cl_plock_excl_acquire(&p_osm->lock); @@ -362,6 +364,8 @@ osm_signal_t osm_qos_setup(osm_opensm_t cfg = &sw0_config; } + else if (node_type == IB_NODE_TYPE_ROUTER) + cfg = &rtr_config; else cfg = &hca_config; diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c index ab5d88e..48f0305 100644 --- a/osm/opensm/osm_subnet.c +++ b/osm/opensm/osm_subnet.c @@ -475,6 +475,7 @@ osm_subn_set_default_opt( subn_set_default_qos_options(&p_opt->qos_hca_options); subn_set_default_qos_options(&p_opt->qos_sw0_options); subn_set_default_qos_options(&p_opt->qos_swe_options); + subn_set_default_qos_options(&p_opt->qos_rtr_options); } /********************************************************************** @@ -722,6 +723,9 @@ osm_subn_rescan_conf_file( subn_parse_qos_options("qos_swe", p_key, p_val, &p_opts->qos_swe_options); + subn_parse_qos_options("qos_rtr", + p_key, p_val, &p_opts->qos_rtr_options); + } } fclose(opts_file); @@ -928,6 +932,9 @@ osm_subn_parse_conf_file( subn_parse_qos_options("qos_swe", p_key, p_val, &p_opts->qos_swe_options); + subn_parse_qos_options("qos_rtr", + p_key, p_val, &p_opts->qos_rtr_options); + } } fclose(opts_file); @@ -1141,6 +1148,9 @@ osm_subn_write_conf_file( fprintf(opts_file, "\n"); subn_dump_qos_options(opts_file, "QoS Switch external ports options", "qos_swe", &p_opts->qos_swe_options); + fprintf(opts_file, "\n"); + subn_dump_qos_options(opts_file, + "QoS router ports options", "qos_rtr", &p_opts->qos_rtr_options); /* optional string attributes ... */ From rheflin at atipa.com Mon May 15 13:11:16 2006 From: rheflin at atipa.com (Roger Heflin) Date: Mon, 15 May 2006 15:11:16 -0500 Subject: [openib-general] Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4 In-Reply-To: References: <4468A59C.2030400@atipa.com> Message-ID: <4468E064.9060504@atipa.com> Roland Dreier wrote: > Roger> What should these patches apply against? > > No idea. 
Bryan said they apply against Linus's current git, but I > didn't actually try. > > - R. > I checked the rc4 -> git patches (there is only 1 ipath patch in it), and I get a number of patch fails attempting to apply the patches, I have the older 5/12/06 patch that was sent and I also get a number of fails trying to apply that, though that may mean to be applied to rc3 and not rc4, but rc4 + older patch + newer patches fails, and rc4 + git + newer patches fails, looking through the code there are a few things that I cannot find where the code in the context diff came from. I did attempt to resolve some of the funniness but there were things that I appear to be missing (things in the context diff that I cannot find exist in rc4 and I cannot find being added in any patch), so I don't think I can even get everything to apply even with manual adjusting. Roger From amit_byron at yahoo.com Mon May 15 13:24:03 2006 From: amit_byron at yahoo.com (amit byron) Date: Mon, 15 May 2006 13:24:03 -0700 (PDT) Subject: [openib-general] sdp with kernel 2.6.16.14 Message-ID: <20060515202403.31925.qmail@web38509.mail.mud.yahoo.com> hi, i'm trying to get sdp work between point-to-point connected machines running kernel 2.6.16.24. i have configured ipoib and trying to run iperf using sdp. 
the client machine has an entry in its libsdp.conf: match destination 192.168.1.2 the server machine has na entry in its libsdp.conf: match listen *:5001 iperf is started on the server machine using command: LD_PRELOAD=/usr/local/lib/libsdp.so iperf -s iperf client is started on the client machine using command: LD_PRELOAD=/usr/local/lib/libsdp.so iperf -c 192.168.1.2 the server machine panics with following messages: oom-killer: gfp_mask=0xd0, order=0 [] oom-killer: gfp_mask=0xd0, order=0 [] out_of_memory+0x155/0x180 [] __alloc_pages+0x2a5/0x320 [] __get_free_pages+0x1e/0x40 [] __pollwait+0x80/0xd0 [] pipe_poll+0xcd/0xe0 [] do_select+0x212/0x480 [] cache_free_debugcheck+0x135/0x230 [] __pollwait+0x0/0xd0 [] core_sys_select+0x1ce/0x2e0 [] sys_select+0x51/0x1c0 [] sysenter_past_esp+0x54/0x75 DMA per-cpu: cpu 0 hot: high 0, batch 1 used:0 cpu 0 cold: high 0, batch 1 used:0 cpu 1 hot: high 0, batch 1 used:0 cpu 1 cold: high 0, batch 1 used:0 cpu 2 hot: high 0, batch 1 used:0 cpu 2 cold: high 0, batch 1 used:0 cpu 3 hot: high 0, batch 1 used:0 cpu 3 cold: high 0, batch 1 used:0 DMA32 per-cpu: empty Normal per-cpu: cpu 0 hot: high 186, batch 31 used:103 cpu 0 cold: high 62, batch 15 used:61 cpu 1 hot: high 186, batch 31 used:183 cpu 1 cold: high 62, batch 15 used:53 cpu 2 hot: high 186, batch 31 used:28 cpu 2 cold: high 62, batch 15 used:54 cpu 3 hot: high 186, batch 31 used:63 cpu 3 cold: high 62, batch 15 used:60 HighMem per-cpu: cpu 0 hot: high 186, batch 31 used:176 cpu 0 cold: high 62, batch 15 used:13 cpu 1 hot: high 186, batch 31 used:169 cpu 1 cold: high 62, batch 15 used:1 cpu 2 hot: high 186, batch 31 used:157 cpu 2 cold: high 62, batch 15 used:0 cpu 3 hot: high 186, batch 31 used:174 cpu 3 cold: high 62, batch 15 used:6 Free pages: 7366104kB (7358760kB HighMem) Active:5351 inactive:4885 dirty:0 writeback:0 unstable:0 free:1841526 slab:8970 mapped:4565 pagetables:238 DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB 
pages_scanned:8 all_unreclaimable? yes lowmem_reserve[]: 0 0 880 8623 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 880 8623 Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB inactive:0kB present:901120kB pages_scanned:314 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 61951 HighMem free:7358760kB min:512kB low:8780kB high:17052kB active:21172kB inactive:19540kB present:7929852kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB DMA32: empty Normal: 13*4kB 1*8kB 1*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB HighMem: 1672*4kB 1103*8kB 581*16kB 308*32kB 129*64kB 65*128kB 29*256kB 12*512kB 7*1024kB 2*2048kB 1778*4096kB = 7358760kB Swap cache: add 0, delete 0, find 0/0, race 0+0 Free swap = 0kB Total swap = 0kB Out of Memory: Kill process 2037 (mDNSResponder) score 2847 and children. Out of memory: Killed process 2037 (mDNSResponder).
oom-killer: gfp_mask=0xd0, order=0 [] out_of_memory+0x155/0x180 [] __alloc_pages+0x2a5/0x320 [] __get_free_pages+0x1e/0x40 [] __pollwait+0x80/0xd0 [] pipe_poll+0xcd/0xe0 [] do_select+0x212/0x480 [] cache_free_debugcheck+0x135/0x230 [] __pollwait+0x0/0xd0 [] core_sys_select+0x1ce/0x2e0 [] sys_select+0x51/0x1c0 [] sysenter_past_esp+0x54/0x75 DMA per-cpu: cpu 0 hot: high 0, batch 1 used:0 cpu 0 cold: high 0, batch 1 used:0 cpu 1 hot: high 0, batch 1 used:0 cpu 1 cold: high 0, batch 1 used:0 cpu 2 hot: high 0, batch 1 used:0 cpu 2 cold: high 0, batch 1 used:0 cpu 3 hot: high 0, batch 1 used:0 cpu 3 cold: high 0, batch 1 used:0 DMA32 per-cpu: empty Normal per-cpu: cpu 0 hot: high 186, batch 31 used:103 cpu 0 cold: high 62, batch 15 used:61 cpu 1 hot: high 186, batch 31 used:183 cpu 1 cold: high 62, batch 15 used:53 cpu 2 hot: high 186, batch 31 used:29 cpu 2 cold: high 62, batch 15 used:54 cpu 3 hot: high 186, batch 31 used:63 cpu 3 cold: high 62, batch 15 used:60 HighMem per-cpu: cpu 0 hot: high 186, batch 31 used:176 cpu 0 cold: high 62, batch 15 used:13 cpu 1 hot: high 186, batch 31 used:169 cpu 1 cold: high 62, batch 15 used:1 cpu 2 hot: high 186, batch 31 used:179 cpu 2 cold: high 62, batch 15 used:0 cpu 3 hot: high 186, batch 31 used:174 cpu 3 cold: high 62, batch 15 used:6 Free pages: 7366476kB (7359132kB HighMem) Active:5273 inactive:4855 dirty:0 writeback:0 unstable:0 free:1841619 slab:8970 mapped:4423 pagetables:232 DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:8 all_unreclaimable? yes lowmem_reserve[]: 0 0 880 8623 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 880 8623 Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB inactive:0kB present:901120kB pages_scanned:314 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 0 61951 HighMem free:7359132kB min:512kB low:8780kB high:17052kB active:20860kB inactive:19420kB present:7929852kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB DMA32: empty Normal: 13*4kB 1*8kB 1*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB HighMem: 1621*4kB 1073*8kB 562*16kB 303*32kB 137*64kB 69*128kB 30*256kB 12*512kB 7*1024kB 2*2048kB 1778*4096kB = 7359132kB Swap cache: add 0, delete 0, find 0/0, race 0+0 Free swap = 0kB Total swap = 0kB Out of Memory: Kill process 2134 (sendmail) score 1741 and children. Out of memory: Killed process 2134 (sendmail). oom-killer: gfp_mask=0xd0, order=0 [] out_of_memory+0x155/0x180 [] __alloc_pages+0x2a5/0x320 [] __get_free_pages+0x1e/0x40 [] __pollwait+0x80/0xd0 [] pipe_poll+0xcd/0xe0 [] do_select+0x212/0x480 [] cache_free_debugcheck+0x135/0x230 [] __pollwait+0x0/0xd0 [] core_sys_select+0x1ce/0x2e0 [] sys_select+0x51/0x1c0 [] sysenter_past_esp+0x54/0x75 openib source were retrieved with: svn co https://openib.org/svn/gen2/trunk anybody ran into similar problem, are there any sdp patches available? thanks, Amit From mst at mellanox.co.il Mon May 15 13:41:47 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 15 May 2006 23:41:47 +0300 Subject: [openib-general] Re: sdp with kernel 2.6.16.14 In-Reply-To: <20060515202403.31925.qmail@web38509.mail.mud.yahoo.com> References: <20060515202403.31925.qmail@web38509.mail.mud.yahoo.com> Message-ID: <20060515204147.GF19163@mellanox.co.il> Quoting r.
amit byron : > Subject: sdp with kernel 2.6.16.14 > > > hi, > > i'm trying to get sdp work between point-to-point connected > machines running kernel 2.6.16.24. i have configured ipoib > and trying to run iperf using sdp. > > the client machine has an entry in its libsdp.conf: > match destination 192.168.1.2 > > the server machine has na entry in its libsdp.conf: > match listen *:5001 > > iperf is started on the server machine using command: > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -s > > iperf client is started on the client machine using command: > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -c 192.168.1.2 > > the server machine panics with following messages: > > oom-killer: gfp_mask=0xd0, order=0 > [] oom-killer: gfp_mask=0xd0, order=0 > [] out_of_memory+0x155/0x180 > [] __alloc_pages+0x2a5/0x320 > [] __get_free_pages+0x1e/0x40 > [] __pollwait+0x80/0xd0 > [] pipe_poll+0xcd/0xe0 > [] do_select+0x212/0x480 > [] cache_free_debugcheck+0x135/0x230 > [] __pollwait+0x0/0xd0 > [] core_sys_select+0x1ce/0x2e0 > [] sys_select+0x51/0x1c0 > [] sysenter_past_esp+0x54/0x75 > DMA per-cpu: > cpu 0 hot: high 0, batch 1 used:0 > cpu 0 cold: high 0, batch 1 used:0 > cpu 1 hot: high 0, batch 1 used:0 > cpu 1 cold: high 0, batch 1 used:0 > cpu 2 hot: high 0, batch 1 used:0 > cpu 2 cold: high 0, batch 1 used:0 > cpu 3 hot: high 0, batch 1 used:0 > cpu 3 cold: high 0, batch 1 used:0 > DMA32 per-cpu: empty > Normal per-cpu: > cpu 0 hot: high 186, batch 31 used:103 > cpu 0 cold: high 62, batch 15 used:61 > cpu 1 hot: high 186, batch 31 used:183 > cpu 1 cold: high 62, batch 15 used:53 > cpu 2 hot: high 186, batch 31 used:28 > cpu 2 cold: high 62, batch 15 used:54 > cpu 3 hot: high 186, batch 31 used:63 > cpu 3 cold: high 62, batch 15 used:60 > HighMem per-cpu: > cpu 0 hot: high 186, batch 31 used:176 > cpu 0 cold: high 62, batch 15 used:13 > cpu 1 hot: high 186, batch 31 used:169 > cpu 1 cold: high 62, batch 15 used:1 > cpu 2 hot: high 186, batch 31 used:157 > cpu 2 cold: high 62, batch 
15 used:0 > cpu 3 hot: high 186, batch 31 used:174 > cpu 3 cold: high 62, batch 15 used:6 > Free pages: 7366104kB (7358760kB HighMem) > Active:5351 inactive:4885 dirty:0 writeback:0 unstable:0 free:1841526 slab:8970 mapped:4565 pagetables:238 > DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:8 all_unreclaimable? yes > lowmem_reserve[]: 0 0 880 8623 > DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 880 8623 > Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB inactive:0kB present:901 > 120kB pages_scanned:314 all_unreclaimable? yes > lowmem_reserve[]: 0 0 0 61951 > HighMem free:7358760kB min:512kB low:8780kB high:17052kB active:21172kB inactive:19540kB present:7929852kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 0 0 > DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB > DMA32: empty > Normal: 13*4kB 1*8kB 1*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB > HighMem: 1672*4kB 1103*8kB 581*16kB 308*32kB 129*64kB 65*128kB 29*256kB 12*512kB 7*1024kB 2*2048kB 1778*4096kB = 7358760kB > Swap cache: add 0, delete 0, find 0/0, race 0+0 > Free swap = 0kB > Total swap = 0kB > Out of Memory: Kill process 2037 (mDNSResponder) score 2847 and children. > Out of memory: Killed process 2037 (mDNSResponder). 
> oom-killer: gfp_mask=0xd0, order=0 > [] out_of_memory+0x155/0x180 > [] __alloc_pages+0x2a5/0x320 > [] __get_free_pages+0x1e/0x40 > [] __pollwait+0x80/0xd0 > [] pipe_poll+0xcd/0xe0 > [] do_select+0x212/0x480 > [] cache_free_debugcheck+0x135/0x230 > [] __pollwait+0x0/0xd0 > [] core_sys_select+0x1ce/0x2e0 > [] sys_select+0x51/0x1c0 > [] sysenter_past_esp+0x54/0x75 > DMA per-cpu: > cpu 0 hot: high 0, batch 1 used:0 > cpu 0 cold: high 0, batch 1 used:0 > cpu 1 hot: high 0, batch 1 used:0 > cpu 1 cold: high 0, batch 1 used:0 > cpu 2 hot: high 0, batch 1 used:0 > cpu 2 cold: high 0, batch 1 used:0 > cpu 3 hot: high 0, batch 1 used:0 > cpu 3 cold: high 0, batch 1 used:0 > DMA32 per-cpu: empty > Normal per-cpu: > cpu 0 hot: high 186, batch 31 used:103 > cpu 0 cold: high 62, batch 15 used:61 > cpu 1 hot: high 186, batch 31 used:183 > cpu 1 cold: high 62, batch 15 used:53 > cpu 2 hot: high 186, batch 31 used:29 > cpu 2 cold: high 62, batch 15 used:54 > cpu 3 hot: high 186, batch 31 used:63 > cpu 3 cold: high 62, batch 15 used:60 > HighMem per-cpu: > cpu 0 hot: high 186, batch 31 used:176 > cpu 0 cold: high 62, batch 15 used:13 > cpu 1 hot: high 186, batch 31 used:169 > cpu 1 cold: high 62, batch 15 used:1 > cpu 2 hot: high 186, batch 31 used:179 > cpu 2 cold: high 62, batch 15 used:0 > cpu 3 hot: high 186, batch 31 used:174 > cpu 3 cold: high 62, batch 15 used:6 > Free pages: 7366476kB (7359132kB HighMem) > Active:5273 inactive:4855 dirty:0 writeback:0 unstable:0 free:1841619 slab:8970 mapped:4423 pagetables:232 > DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:8 all_unreclaimable? yes > lowmem_reserve[]: 0 0 880 8623 > DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 880 8623 > Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB inactive:0kB present:901120kB pages_scanned:314 all_unreclaimable? 
yes > lowmem_reserve[]: 0 0 0 61951 > HighMem free:7359132kB min:512kB low:8780kB high:17052kB active:20860kB inactive:19420kB present:7929852kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 0 0 > DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB > DMA32: empty > Normal: 13*4kB 1*8kB 1*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB > HighMem: 1621*4kB 1073*8kB 562*16kB 303*32kB 137*64kB 69*128kB 30*256kB 12*512kB 7*1024kB 2*2048kB 1778*4096kB = 7359132kB > Swap cache: add 0, delete 0, find 0/0, race 0+0 > Free swap = 0kB > Total swap = 0kB > Out of Memory: Kill process 2134 (sendmail) score 1741 and children. > Out of memory: Killed process 2134 (sendmail). > oom-killer: gfp_mask=0xd0, order=0 > [] out_of_memory+0x155/0x180 > [] __alloc_pages+0x2a5/0x320 > [] __get_free_pages+0x1e/0x40 > [] __pollwait+0x80/0xd0 > [] pipe_poll+0xcd/0xe0 > [] do_select+0x212/0x480 > [] cache_free_debugcheck+0x135/0x230 > [] __pollwait+0x0/0xd0 > [] core_sys_select+0x1ce/0x2e0 > [] sys_select+0x51/0x1c0 > [] sysenter_past_esp+0x54/0x75 > > openib source were retrieved with: > svn co https://openib.org/svn/gen2/trunk > > anybody ran into similar problem, are there any sdp > patches available? > > thanks, > Amit Netperf is running here fine. Can you try that to verify its not a setup problem? I'll try iperf later. Hmm. Where did all the memory go? 
can you cat /proc/slabinfo -- MST From rdunlap at xenotime.net Mon May 15 13:47:37 2006 From: rdunlap at xenotime.net (Randy.Dunlap) Date: Mon, 15 May 2006 13:47:37 -0700 Subject: [openib-general] Re: [PATCH 12/16] ehca: firmware InfiniBand interface In-Reply-To: <4468BD99.5050505@de.ibm.com> References: <4468BD99.5050505@de.ibm.com> Message-ID: <20060515134737.c03e02d3.rdunlap@xenotime.net> On Mon, 15 May 2006 19:42:49 +0200 Heiko J Schick wrote: > Signed-off-by: Heiko J Schick > > > drivers/infiniband/hw/ehca/hcp_if.c | 1476 ++++++++++++++++++++++++++++++++++++ > drivers/infiniband/hw/ehca/hcp_if.h | 330 ++++++++ > 2 files changed, 1806 insertions(+) > > > > --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/hcp_if.h 1970-01-01 01:00:00.000000000 +0100 > +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/hcp_if.h 2006-05-12 12:48:21.000000000 +0200 > @@ -0,0 +1,330 @@ > +/** > + * hipz_h_alloc_resource_eq - Allocate EQ resources in HW and FW, initalize > + * resources, create the empty EQPT (ring). > + * > + * @eq_handle: eq handle for this queue > + * @act_nr_of_entries: actual number of queue entries > + * @act_pages: actual number of queue pages > + * @eq_ist: used by hcp_H_XIRR() call > + */ kernel-doc format needs: 1. a short function name + description on one line 2. no blank line between function and parameters 3. blank line (optional) before more detailed function description See Documentation/kernel-doc-nano-HOWTO.txt or other kernel source files for more info. And please test it with "make htmldocs" or "make mandocs". 
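The three kernel-doc rules listed above can be sketched as follows. This is only an illustration of the comment layout: the function name demo_alloc_resource_eq, its signature, and the values it stores are hypothetical stand-ins, since the quoted hunk shows the comment but not the real prototype from hcp_if.h.

```c
#include <stddef.h>
#include <stdint.h>

/**
 * demo_alloc_resource_eq - allocate EQ resources and create the empty EQPT
 * @eq_handle: eq handle for this queue
 * @act_nr_of_entries: actual number of queue entries
 * @act_pages: actual number of queue pages
 * @eq_ist: used by hcp_H_XIRR() call
 *
 * The longer description comes after one blank comment line. The name and
 * short description share the first line, and the parameter lines follow
 * directly, each parameter name ending in a colon.
 */
static int demo_alloc_resource_eq(uint64_t *eq_handle,
                                  uint32_t *act_nr_of_entries,
                                  uint32_t *act_pages,
                                  uint32_t *eq_ist)
{
    /* Hypothetical body: reject NULL outputs, then store fixed values. */
    if (!eq_handle || !act_nr_of_entries || !act_pages || !eq_ist)
        return -1;

    *eq_handle = 1;
    *act_nr_of_entries = 64;
    *act_pages = 1;
    *eq_ist = 0;
    return 0;
}
```

Only the comment layout matters here; the body exists so the fragment compiles on its own.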
--- ~Randy From rdunlap at xenotime.net Mon May 15 13:54:19 2006 From: rdunlap at xenotime.net (Randy.Dunlap) Date: Mon, 15 May 2006 13:54:19 -0700 Subject: [openib-general] Re: [PATCH 14/16] ehca: queue page table handling In-Reply-To: <4468BDAD.3020701@de.ibm.com> References: <4468BDAD.3020701@de.ibm.com> Message-ID: <20060515135419.f77a1d8b.rdunlap@xenotime.net> On Mon, 15 May 2006 19:43:09 +0200 Heiko J Schick wrote: > Signed-off-by: Heiko J Schick > > > drivers/infiniband/hw/ehca/ipz_pt_fn.c | 177 ++++++++++++++++++++++ > drivers/infiniband/hw/ehca/ipz_pt_fn.h | 254 +++++++++++++++++++++++++++++++++ > 2 files changed, 431 insertions(+) > > > > --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h 1970-01-01 01:00:00.000000000 +0100 > +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ipz_pt_fn.h 2006-05-12 12:25:43.000000000 +0200 > @@ -0,0 +1,254 @@ > +/* return current Queue Page , increment Queue Page iterator from > + * page to page in struct ipz_queue, last increment will return 0! and > + * NOT wrap > + * returns address (kv) of Queue Page > + * warning don't use in parallel with ipz_QE_get_inc() > + */ When not using kernel-doc format... the preferred multi-line comment format is: /* * foo foo foo ........ * bar bar bar ....... * blahz .......... */ repeat below. 
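Laid out with its line breaks, the preferred multi-line comment format looks roughly like the sketch below. The struct demo_page_iter and the function demo_page_get_inc are hypothetical stand-ins for struct ipz_queue and ipz_qpageit_get_inc(); the body only mirrors the "last increment returns 0 and does not wrap" wording of the quoted comment and is not the ehca implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ipz_queue; only what the sketch needs. */
struct demo_page_iter {
    void **pages;       /* array of queue pages */
    size_t nr_pages;    /* number of entries in pages[] */
    size_t cur;         /* index of the next page to hand out */
};

/*
 * Return the current queue page and advance the iterator by one page.
 * Once the last page has been handed out, return NULL; the iterator
 * does NOT wrap.
 */
static void *demo_page_get_inc(struct demo_page_iter *it)
{
    if (it->cur >= it->nr_pages)
        return NULL;
    return it->pages[it->cur++];
}
```

The opening and closing markers sit on their own lines, and every text line starts with an aligned asterisk.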
> +void *ipz_qpageit_get_inc(struct ipz_queue *queue); > + > +/* return current Queue Entry, increment Queue Entry iterator by one > + * step in struct ipz_queue, will wrap in ringbuffer > + * @returns address (kv) of Queue Entry BEFORE increment > + * warning don't use in parallel with ipz_qpageit_get_inc() > + * warning unpredictable results may occur if steps>act_nr_of_queue_entries > + */ > +static inline void *ipz_qeit_get_inc(struct ipz_queue *queue) > +{ > + void *ret = NULL; > + > + ret = ipz_qeit_get(queue); > + queue->current_q_offset += queue->qe_size; > + if (queue->current_q_offset >= queue->queue_length) { > + queue->current_q_offset = 0; > + /* toggle the valid flag */ > + queue->toggle_state = (~queue->toggle_state) & 1; > + } > + > + EDEB(7, "queue=%p ret=%p new current_q_addr=%lx qe_size=%x", > + queue, ret, queue->current_q_offset, queue->qe_size); > + > + return ret; > +} > + > +/* return current Queue Entry, increment Queue Entry iterator by one > + * step in struct ipz_queue, will wrap in ringbuffer > + * returns address (kv) of Queue Entry BEFORE increment > + * returns 0 and does not increment, if wrong valid state > + * warning don't use in parallel with ipz_qpageit_get_inc() > + * warning unpredictable results may occur if steps>act_nr_of_queue_entries > + */ > +static inline void *ipz_qeit_get_inc_valid(struct ipz_queue *queue) > +{ > + struct ehca_cqe *cqe = ipz_qeit_get(queue); > + u32 cqe_flags = cqe->cqe_flags; > + > + if ((cqe_flags >> 7) != (queue->toggle_state & 1)) > + return NULL; > + > + ipz_qeit_get_inc(queue); > + return cqe; > +} > + > +/* destructor for a ipz_queue_t > + * -# free queue > + * see ipz_queue_ctor() > + * returns true if ok, false if queue was NULL-ptr of free failed > + */ > +int ipz_queue_dtor(struct ipz_queue *queue); > + > +/* constructor for a ipz_qpt_t, > + * placement new for struct ipz_queue, new for all dependent datastructors > + * > + * all QP Tables are the same, > + * flow: > + * -# allocate+pin 
queue > + * -# initialise ptcb > + * -# allocate+pin PTs > + * -# link PTs to a ring, according to HCA Arch, set bit62 id needed > + * -# the ring must have room for exactly nr_of_PTEs > + * see ipz_qpt_ctor() > + */ > +void ipz_qpt_ctor(struct ipz_qpt *qpt, > + const u32 nr_of_QEs, > + const u32 pagesize, > + const u32 qe_size, > + const u8 lowbyte, const u8 toggle, > + u32 * act_nr_of_QEs, u32 * act_nr_of_pages); > + > +/* return current Queue Entry, increment Queue Entry iterator by one > + * step in struct ipz_queue, will wrap in ringbuffer > + * returns address (kv) of Queue Entry BEFORE increment > + * warning don't use in parallel with ipz_qpageit_get_inc() > + * warning unpredictable results may occur if steps>act_nr_of_queue_entries > + * > + * fix EQ page problems > + */ > +void *ipz_qeit_eq_get_inc(struct ipz_queue *queue); > + > +/* return current Event Queue Entry, increment Queue Entry iterator > + * by one step in struct ipz_queue if valid, will wrap in ringbuffer > + * returns address (kv) of Queue Entry BEFORE increment > + * returns 0 and does not increment, if wrong valid state > + * warning don't use in parallel with ipz_queue_QPageit_get_inc() > + * warning unpredictable results may occur if steps>act_nr_of_queue_entries > + */ > +static inline void *ipz_eqit_eq_get_inc_valid(struct ipz_queue *queue) > +{ > + void *ret = ipz_qeit_get(queue); > + u32 qe = *(u8 *) ret; > + EDEB(7, "ipz_QEit_EQ_get_inc_valid qe=%x", qe); > + if ((qe >> 7) == (queue->toggle_state & 1)) > + ipz_qeit_eq_get_inc(queue); /* this is a good one */ > + else > + ret = NULL; > + return ret; > +} --- ~Randy From rdunlap at xenotime.net Mon May 15 14:03:37 2006 From: rdunlap at xenotime.net (Randy.Dunlap) Date: Mon, 15 May 2006 14:03:37 -0700 Subject: [openib-general] Re: [PATCH 05/16] ehca: common include files In-Reply-To: <4468BD5B.1060406@de.ibm.com> References: <4468BD5B.1060406@de.ibm.com> Message-ID: <20060515140337.bad62cc9.rdunlap@xenotime.net> On Mon, 15 May 2006 
19:41:47 +0200 Heiko J Schick wrote: > Signed-off-by: Heiko J Schick > > > drivers/infiniband/hw/ehca/ehca_iverbs.h | 181 +++++++++++++ > drivers/infiniband/hw/ehca/ehca_tools.h | 411 +++++++++++++++++++++++++++++++ > 2 files changed, 592 insertions(+) > > > > --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_tools.h 1970-01-01 01:00:00.000000000 +0100 > +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_tools.h 2006-05-03 13:44:15.000000000 +0200 > @@ -0,0 +1,411 @@ > +static inline u64 ehca_edeb_filter(const u32 level, > + const u32 id, const u32 line) > +{ > + u64 ret = 0; > + u32 filenr = 0; > + u32 filter_level = 9; > + u32 dynamic_level = 0; > + > + /* This is code written for the gcc -O2 optimizer which should colapse collapse > + * to two single ints filter_level is the first level kicked out by > + * compiler means trace everythin below 6. */ everything plus make a real sentence of that, please. > +} > + > +#ifdef EHCA_USE_HCALL_KERNEL > +#ifdef CONFIG_PPC_PSERIES > + > +#include > + > +/** > + * IS_EDEB_ON - Checks if debug is on for the given level. > + */ > +#define IS_EDEB_ON(level) \ > + ((ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__) & 0x100000000L)==0) > + > +#elif REAL_HCALL > + > + > +#endif > +#else > + > +#endif > + > +/** > + * EDEB - Trace output macro. > + * @level tracelevel > + * @format optional format string, use "" if not desired > + * @args printf like arguments for trace, use %Lx for u64, %x for u32 > + * %p for pointer > + */ Use real kernel-doc format here, please. Parameters at least need a colon (':') after their names, like: * @format: optonal format string, use "" if not desired and test them... > +#define EDEB(level,format,args...) \ > + EDEB_P_GENERIC(level,"",format,##args) > +#define EDEB_ERR(level,format,args...) \ > + EDEB_P_GENERIC(level,"HCAD_ERROR ",format,##args) > +#define EDEB_EN(level,format,args...) \ > + EDEB_P_GENERIC(level,">>>",format,##args) > +#define EDEB_EX(level,format,args...) 
\ > + EDEB_P_GENERIC(level,"<<<",format,##args) > + > +/** > + * EDEB macro to dump a memory block, whose length is n*8 bytes. EDEB_DMP > + * Each line has the following layout: > + * adr=X ofs=Y <8 bytes hex> <8 bytes hex> > + */ > +#define EDEB_DMP(level,adr,len,format,args...) \ > + do { \ > + unsigned int x; \ > + unsigned int l = (unsigned int)(len); \ > + unsigned char *deb = (unsigned char*)(adr); \ > + for (x = 0; x < l; x += 16) { \ > + EDEB(level, format " adr=%p ofs=%04x %016lx %016lx", \ > + ##args, deb, x, *((u64 *)&deb[0]), *((u64 *)&deb[8])); \ > + deb += 16; \ > + } \ > + } while (0) > + > +/* define a bitmask, little endian version */ > +#define EHCA_BMASK(pos,length) (((pos)<<16)+(length)) > +/* define a bitmask, the ibm way... */ > +#define EHCA_BMASK_IBM(from,to) (((63-to)<<16)+((to)-(from)+1)) > +/* internal function, don't use */ > +#define EHCA_BMASK_SHIFTPOS(mask) (((mask)>>16)&0xffff) > +/* internal function, don't use */ > +#define EHCA_BMASK_MASK(mask) (0xffffffffffffffffULL >> ((64-(mask))&0xffff)) > +/* return value shifted and masked by mask\n > + * variable|=HCA_BMASK_SET(MY_MASK,0x4711) ORs the bits in variable\n > + * variable&=~HCA_BMASK_SET(MY_MASK,-1) clears the bits from the mask > + * in variable What are all of those "\n"s up there? and below? > +#define EHCA_BMASK_SET(mask,value) \ > + ((EHCA_BMASK_MASK(mask) & ((u64)(value)))< +/* extract a parameter from value by mask\n > + * param=EHCA_BMASK_GET(MY_MASK,value) > + */ > +#define EHCA_BMASK_GET(mask,value) \ > + ( EHCA_BMASK_MASK(mask)& (((u64)(value))>>EHCA_BMASK_SHIFTPOS(mask))) > + > + > +/** > + * ehca_adr_bad - Handle to be used for adress translation mechanisms, > + * currently a placeholder. > + */ Use proper kernel-doc format. > +static inline int ehca_adr_bad(void *adr) > +{ > + return !adr; > +} > + > +/** > + * ehca2ib_return_code - Returns ib return code corresponding to the given > + * ehca return code. > + */ Ditto. 
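For anyone trying to follow the EHCA_BMASK macros quoted in this review, here is a minimal user-space sketch of how the encoding appears to work: a mask packs the shift position into the upper 16 bits and the field width into the lower 16 bits. The DEMO_ names are my own. The left shift in DEMO_BMASK_SET is an assumption, because the quoted EHCA_BMASK_SET line is cut off in the archive text.

```c
#include <stdint.h>

/* A mask is (shift position << 16) + field width, as in the quoted EHCA_BMASK. */
#define DEMO_BMASK(pos, length)     (((pos) << 16) + (length))
/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word. */
#define DEMO_BMASK_IBM(from, to)    (((63 - (to)) << 16) + ((to) - (from) + 1))
/* Upper 16 bits of the mask: how far the field is shifted left. */
#define DEMO_BMASK_SHIFTPOS(mask)   (((mask) >> 16) & 0xffff)
/* Low 'width' bits set; only meaningful for widths 1..64. */
#define DEMO_BMASK_MASK(mask) \
    (0xffffffffffffffffULL >> ((64 - (mask)) & 0xffff))
/* ASSUMPTION: truncate value to the field width, then shift it into place. */
#define DEMO_BMASK_SET(mask, value) \
    ((DEMO_BMASK_MASK(mask) & ((uint64_t)(value))) << DEMO_BMASK_SHIFTPOS(mask))
/* Shift the field down to bit 0 and truncate it to the field width. */
#define DEMO_BMASK_GET(mask, value) \
    (DEMO_BMASK_MASK(mask) & (((uint64_t)(value)) >> DEMO_BMASK_SHIFTPOS(mask)))
```

With a 4-bit field at shift position 8, setting 0xA would give 0xA00 and reading 0xA00 back would give 0xA. In IBM numbering, bits 0 to 7 are the top byte of the word.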
--- ~Randy From amit_byron at yahoo.com Mon May 15 14:03:09 2006 From: amit_byron at yahoo.com (amit byron) Date: Mon, 15 May 2006 14:03:09 -0700 (PDT) Subject: [openib-general] Re: sdp with kernel 2.6.16.14 In-Reply-To: <20060515204147.GF19163@mellanox.co.il> Message-ID: <20060515210309.96302.qmail@web38515.mail.mud.yahoo.com> Michael, netperf works with sdp. slabinfo output: slabinfo - version: 2.1 (statistics) # name : tunables : slabdata : globalstat : cpustat SDP 0 0 1172 6 2 : tunables 24 12 8 : slabdata 0 0 0 : globalstat 104 18 11 11 0 0 0 0 : cpustat 3 17 20 0 fib6_nodes 7 92 40 92 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 32 21 1 0 0 0 0 0 : cpustat 5 2 0 0 ip6_dst_cache 9 17 228 17 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 36 17 1 0 0 0 0 0 : cpustat 14 3 8 0 ndisc_cache 2 22 180 22 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 32 17 1 0 0 0 0 0 : cpustat 3 2 3 0 RAWv6 7 11 712 11 2 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 11 11 1 0 0 0 0 0 : cpustat 6 1 0 0 UDPv6 1 11 684 11 2 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 52 22 2 1 0 0 0 0 : cpustat 10 5 14 0 Amit "Michael S. Tsirkin" wrote: Quoting r. amit byron : > Subject: sdp with kernel 2.6.16.14 > > > hi, > > i'm trying to get sdp work between point-to-point connected > machines running kernel 2.6.16.24. i have configured ipoib > and trying to run iperf using sdp. 
> > the client machine has an entry in its libsdp.conf: > match destination 192.168.1.2 > > the server machine has na entry in its libsdp.conf: > match listen *:5001 > > iperf is started on the server machine using command: > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -s > > iperf client is started on the client machine using command: > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -c 192.168.1.2 > > the server machine panics with following messages: > > oom-killer: gfp_mask=0xd0, order=0 > [] oom-killer: gfp_mask=0xd0, order=0 > [] out_of_memory+0x155/0x180 > [] __alloc_pages+0x2a5/0x320 > [] __get_free_pages+0x1e/0x40 > [] __pollwait+0x80/0xd0 > [] pipe_poll+0xcd/0xe0 > [] do_select+0x212/0x480 > [] cache_free_debugcheck+0x135/0x230 > [] __pollwait+0x0/0xd0 > [] core_sys_select+0x1ce/0x2e0 > [] sys_select+0x51/0x1c0 > [] sysenter_past_esp+0x54/0x75 > DMA per-cpu: > cpu 0 hot: high 0, batch 1 used:0 > cpu 0 cold: high 0, batch 1 used:0 > cpu 1 hot: high 0, batch 1 used:0 > cpu 1 cold: high 0, batch 1 used:0 > cpu 2 hot: high 0, batch 1 used:0 > cpu 2 cold: high 0, batch 1 used:0 > cpu 3 hot: high 0, batch 1 used:0 > cpu 3 cold: high 0, batch 1 used:0 > DMA32 per-cpu: empty > Normal per-cpu: > cpu 0 hot: high 186, batch 31 used:103 > cpu 0 cold: high 62, batch 15 used:61 > cpu 1 hot: high 186, batch 31 used:183 > cpu 1 cold: high 62, batch 15 used:53 > cpu 2 hot: high 186, batch 31 used:28 > cpu 2 cold: high 62, batch 15 used:54 > cpu 3 hot: high 186, batch 31 used:63 > cpu 3 cold: high 62, batch 15 used:60 > HighMem per-cpu: > cpu 0 hot: high 186, batch 31 used:176 > cpu 0 cold: high 62, batch 15 used:13 > cpu 1 hot: high 186, batch 31 used:169 > cpu 1 cold: high 62, batch 15 used:1 > cpu 2 hot: high 186, batch 31 used:157 > cpu 2 cold: high 62, batch 15 used:0 > cpu 3 hot: high 186, batch 31 used:174 > cpu 3 cold: high 62, batch 15 used:6 > Free pages: 7366104kB (7358760kB HighMem) > Active:5351 inactive:4885 dirty:0 writeback:0 unstable:0 free:1841526 slab:8970 
mapped:4565 pagetables:238 > DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:8 all_unreclaimable? yes > lowmem_reserve[]: 0 0 880 8623 > DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 880 8623 > Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB inactive:0kB present:901 > 120kB pages_scanned:314 all_unreclaimable? yes > lowmem_reserve[]: 0 0 0 61951 > HighMem free:7358760kB min:512kB low:8780kB high:17052kB active:21172kB inactive:19540kB present:7929852kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 0 0 > DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB > DMA32: empty > Normal: 13*4kB 1*8kB 1*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB > HighMem: 1672*4kB 1103*8kB 581*16kB 308*32kB 129*64kB 65*128kB 29*256kB 12*512kB 7*1024kB 2*2048kB 1778*4096kB = 7358760kB > Swap cache: add 0, delete 0, find 0/0, race 0+0 > Free swap = 0kB > Total swap = 0kB > Out of Memory: Kill process 2037 (mDNSResponder) score 2847 and children. > Out of memory: Killed process 2037 (mDNSResponder). 
> oom-killer: gfp_mask=0xd0, order=0 > [] out_of_memory+0x155/0x180 > [] __alloc_pages+0x2a5/0x320 > [] __get_free_pages+0x1e/0x40 > [] __pollwait+0x80/0xd0 > [] pipe_poll+0xcd/0xe0 > [] do_select+0x212/0x480 > [] cache_free_debugcheck+0x135/0x230 > [] __pollwait+0x0/0xd0 > [] core_sys_select+0x1ce/0x2e0 > [] sys_select+0x51/0x1c0 > [] sysenter_past_esp+0x54/0x75 > DMA per-cpu: > cpu 0 hot: high 0, batch 1 used:0 > cpu 0 cold: high 0, batch 1 used:0 > cpu 1 hot: high 0, batch 1 used:0 > cpu 1 cold: high 0, batch 1 used:0 > cpu 2 hot: high 0, batch 1 used:0 > cpu 2 cold: high 0, batch 1 used:0 > cpu 3 hot: high 0, batch 1 used:0 > cpu 3 cold: high 0, batch 1 used:0 > DMA32 per-cpu: empty > Normal per-cpu: > cpu 0 hot: high 186, batch 31 used:103 > cpu 0 cold: high 62, batch 15 used:61 > cpu 1 hot: high 186, batch 31 used:183 > cpu 1 cold: high 62, batch 15 used:53 > cpu 2 hot: high 186, batch 31 used:29 > cpu 2 cold: high 62, batch 15 used:54 > cpu 3 hot: high 186, batch 31 used:63 > cpu 3 cold: high 62, batch 15 used:60 > HighMem per-cpu: > cpu 0 hot: high 186, batch 31 used:176 > cpu 0 cold: high 62, batch 15 used:13 > cpu 1 hot: high 186, batch 31 used:169 > cpu 1 cold: high 62, batch 15 used:1 > cpu 2 hot: high 186, batch 31 used:179 > cpu 2 cold: high 62, batch 15 used:0 > cpu 3 hot: high 186, batch 31 used:174 > cpu 3 cold: high 62, batch 15 used:6 > Free pages: 7366476kB (7359132kB HighMem) > Active:5273 inactive:4855 dirty:0 writeback:0 unstable:0 free:1841619 slab:8970 mapped:4423 pagetables:232 > DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:8 all_unreclaimable? yes > lowmem_reserve[]: 0 0 880 8623 > DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 880 8623 > Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB inactive:0kB present:901120kB pages_scanned:314 all_unreclaimable? 
yes > lowmem_reserve[]: 0 0 0 61951 > HighMem free:7359132kB min:512kB low:8780kB high:17052kB active:20860kB inactive:19420kB present:7929852kB pages_scanned:0 all_unreclaimable? no > lowmem_reserve[]: 0 0 0 0 > DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB > DMA32: empty > Normal: 13*4kB 1*8kB 1*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB > HighMem: 1621*4kB 1073*8kB 562*16kB 303*32kB 137*64kB 69*128kB 30*256kB 12*512kB 7*1024kB 2*2048kB 1778*4096kB = 7359132kB > Swap cache: add 0, delete 0, find 0/0, race 0+0 > Free swap = 0kB > Total swap = 0kB > Out of Memory: Kill process 2134 (sendmail) score 1741 and children. > Out of memory: Killed process 2134 (sendmail). > oom-killer: gfp_mask=0xd0, order=0 > [] out_of_memory+0x155/0x180 > [] __alloc_pages+0x2a5/0x320 > [] __get_free_pages+0x1e/0x40 > [] __pollwait+0x80/0xd0 > [] pipe_poll+0xcd/0xe0 > [] do_select+0x212/0x480 > [] cache_free_debugcheck+0x135/0x230 > [] __pollwait+0x0/0xd0 > [] core_sys_select+0x1ce/0x2e0 > [] sys_select+0x51/0x1c0 > [] sysenter_past_esp+0x54/0x75 > > openib source were retrieved with: > svn co https://openib.org/svn/gen2/trunk > > anybody ran into similar problem, are there any sdp > patches available? > > thanks, > Amit Netperf is running here fine. Can you try that to verify its not a setup problem? I'll try iperf later. Hmm. Where did all the memory go? can you cat /proc/slabinfo -- MST -------------- next part -------------- An HTML attachment was scrubbed...
URL: From bos at pathscale.com Mon May 15 14:06:57 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 15 May 2006 14:06:57 -0700 Subject: [openib-general] Re: [PATCH 4 of 53] ipath - cap number of PDs that can be allocated In-Reply-To: References: <300f0aa6f034eec6a806.1147477369@eng-12.pathscale.com> Message-ID: <1147727217.2773.6.camel@chalcedony.pathscale.com> On Mon, 2006-05-15 at 08:45 -0700, Roland Dreier wrote: > Would it make more sense to fix the stress test? I don't think so. Without some kind of limits, it is simple for an unprivileged user process to cause the kernel to allocate huge wads of memory and thereby DoS or accidentally OOM the machine. The test in question should probably be fixed, but this is a much more fundamental problem. I don't have any specific opinions on what should be done about it, other than "something". References: Message-ID: <1147727259.2773.8.camel@chalcedony.pathscale.com> On Mon, 2006-05-15 at 08:44 -0700, Roland Dreier wrote: > Umm... dumping a 53 patch series into the kernel at this stage in the > release cycle isn't going to work. Fair enough. > Pretty much the only patches that should be going in > now are changes that fix crashes or other serious bugs. OK, I'll filter those out and send them separately. References: <4468A59C.2030400@atipa.com> <4468E064.9060504@atipa.com> Message-ID: <1147727358.2773.11.camel@chalcedony.pathscale.com> On Mon, 2006-05-15 at 15:11 -0500, Roger Heflin wrote: > I checked the rc4 -> git patches (there is only 1 ipath patch in it), > and I get a number of patch fails attempting to apply the patches, I've been using a Mercurial mirror of the git tree, but it should be basically identical to the git tree. 
> I did attempt to resolve some of the funniness but there were things > that I appear to be missing (things in the context diff that I cannot > find exist in rc4 and I cannot find being added in any patch), so > I don't think I can even get everything to apply even with manual > adjusting. Please send me some more information off-list, and I'll try to help. References: Message-ID: <1147727447.2773.14.camel@chalcedony.pathscale.com> On Mon, 2006-05-15 at 08:57 -0700, Roland Dreier wrote: > > static void i2c_wait_for_writes(struct ipath_devdata *dd) > > { > > + mb(); > > (void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); > > } > > This needs a comment explaining why it's needed. A memory barrier > before a readl() looks very strange since readl() should be ordered anyway. Yeah. It's actually working around what appears to be a gcc bug if the kernel is compiled with -Os. Ralph knows the details; he can give a more complete answer. References: <4468BD39.3010008@de.ibm.com> Message-ID: <20060515141428.03800e3e.rdunlap@xenotime.net> On Mon, 15 May 2006 19:41:13 +0200 Heiko J Schick wrote: > Signed-off-by: Heiko J Schick > > > drivers/infiniband/hw/ehca/ehca_main.c | 966 +++++++++++++++++++++++++++++++++ > 1 file changed, 966 insertions(+) > > > > --- linux-2.6.17-rc2-orig/drivers/infiniband/hw/ehca/ehca_main.c 1970-01-01 01:00:00.000000000 +0100 > +++ linux-2.6.17-rc2/drivers/infiniband/hw/ehca/ehca_main.c 2006-05-15 19:17:26.000000000 +0200 > @@ -0,0 +1,966 @@ > +int ehca_open_aqp1 = 0; > +int ehca_debug_level = -1; > +int ehca_hw_level = 0; > +int ehca_nr_ports = 2; > +int ehca_use_hp_mr = 0; > +int ehca_port_act_time = 30; > +int ehca_poll_all_eqs = 1; > +int ehca_static_rate = -1; Don't need to init globals to 0. 
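The point about globals can be shown with a short standalone sketch. File-scope objects in C have static storage duration and are zero-initialized when no initializer is given, so the explicit "= 0" is redundant. The demo_ names are hypothetical and only mirror the shape of the quoted ehca_main.c variables.

```c
/* No initializer: static storage duration means these start out as 0. */
int demo_open_aqp1;
int demo_hw_level;
int demo_use_hp_mr;

/* Non-zero defaults still need an explicit initializer. */
int demo_debug_level = -1;
int demo_nr_ports = 2;
int demo_port_act_time = 30;
```

Dropping the zero initializers changes nothing about the values the code sees at startup.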
--- ~Randy From bos at pathscale.com Mon May 15 14:17:27 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 15 May 2006 14:17:27 -0700 Subject: [openib-general] Re: [PATCH 14 of 53] ipath - forbid empty MRs In-Reply-To: References: <5d9fbba3222eeb941679.1147477379@eng-12.pathscale.com> Message-ID: <1147727847.2773.21.camel@chalcedony.pathscale.com> On Mon, 2006-05-15 at 08:46 -0700, Roland Dreier wrote: > > Don't allow zero-length regions to be created. > > Why are zero-length regions forbidden? One of the gen2 basic tests checks for zero-length regions and barfs if someone creates them. There's no language in IBNA that forbids zero-length regions (I'll take a look at the spec itself to be sure), so it's possible that the test is wrong. On the other hand, a zero-length region doesn't seem terribly useful. References: <20060515200946.21963.48513.stgit@sashak.voltaire.com> Message-ID: <1147727593.4485.192497.camel@hal.voltaire.com> On Mon, 2006-05-15 at 16:09, Sasha Khapyorsky wrote: > This defines QoS configuration parameters set for routers. > > Signed-off-by: Sasha Khapyorsky > --- > > osm/include/opensm/osm_subnet.h | 4 ++++ > osm/opensm/osm_qos.c | 6 +++++- > osm/opensm/osm_subnet.c | 10 ++++++++++ > 3 files changed, 19 insertions(+), 1 deletions(-) Thanks. Applied to trunk only. -- Hal From bos at pathscale.com Mon May 15 14:21:21 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Mon, 15 May 2006 14:21:21 -0700 Subject: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt In-Reply-To: References: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com> Message-ID: <1147728081.2773.25.camel@chalcedony.pathscale.com> On Mon, 2006-05-15 at 08:50 -0700, Roland Dreier wrote: > Actually I NAK'ed this patch. It compiles the same thing on x86_64 > but makes the source code wrong -- dma_map_single() returns a bus > address, not a physical address. 
As Segher mentioned, bus_to_virt is unportable, so it's definitely the wrong thing to use. I don't recall what you suggested instead, but I seem to recall that the discussion kind of went "oh, right, the layering is all broken". Any ideas? Should this turn from a one-liner into a big-refactor-for-2.6.18 patch? (Bryan O'Sullivan's message of "Mon, 15 May 2006 14:21:21 -0700") References: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com> <1147728081.2773.25.camel@chalcedony.pathscale.com> Message-ID: Bryan> As Segher mentioned, bus_to_virt is unportable, so it's Bryan> definitely the wrong thing to use. Yes, but at least it says what you're trying to do. asm-powerpc's io.h has this for phys_to_virt: * This function does not handle bus mappings for DMA transfers. In * almost all conceivable cases a device driver should not be using * this function so replacing bus_to_virt with that is not a step forward. Bryan> Any ideas? Should this turn from a one-liner into a Bryan> big-refactor-for-2.6.18 patch? I don't think there's a quick way to fix this. What you really want to do is override the DMA mapping functions for your device so that you can keep track of the kernel mapping. powerpc can already do this (cf the ehca driver), and I think patches to do it on x86-64 are floating around as part of the "Calgary IOMMU" work. - R. From clameter at sgi.com Mon May 15 14:47:27 2006 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 15 May 2006 14:47:27 -0700 (PDT) Subject: [openib-general] Re: [PATCH] slab: Fix kmem_cache_destroy() on NUMA In-Reply-To: References: Message-ID: On Mon, 15 May 2006, Roland Dreier wrote: > This patch fixes this by having drain_cpu_caches() do > drain_alien_cache() on every node before it does drain_array() on the > nodes' shared array_caches. Correct. That is the fix that I suggested earlier. The alien caches needs to be drained first. 
From sean.hefty at intel.com Mon May 15 15:32:35 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 15 May 2006 15:32:35 -0700 Subject: [openib-general] [PATCH] RDMA CM: updates to 2.6.18 branch In-Reply-To: Message-ID: I'm assuming that since the CMA isn't upstream yet, a single patch will work. The patch below should contain everything that makes sense to merge upstream for the CMA. Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 2c1386b..0003b87 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -2,7 +2,7 @@ * Copyright (c) 2005 Voltaire Inc. All rights reserved. * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved. - * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. * * This Software is licensed under one of the following licenses: * @@ -29,9 +29,15 @@ * */ +#include #include #include +#include #include +#include + +#include + #include #include #include @@ -57,12 +63,14 @@ static LIST_HEAD(dev_list); static LIST_HEAD(listen_any_list); static DEFINE_MUTEX(lock); static struct workqueue_struct *cma_wq; +static DEFINE_IDR(sdp_ps); +static DEFINE_IDR(tcp_ps); struct cma_device { struct list_head list; struct ib_device *device; __be64 node_guid; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; struct list_head id_list; }; @@ -80,6 +88,12 @@ enum cma_state { CMA_DESTROYING }; +struct rdma_bind_list { + struct idr *ps; + struct hlist_head owners; + unsigned short port; +}; + /* * Device removal can occur at anytime, so we need extra handling to * serialize notifying the user of device removal with other callbacks. 
@@ -89,13 +103,15 @@ enum cma_state { struct rdma_id_private { struct rdma_cm_id id; + struct rdma_bind_list *bind_list; + struct hlist_node node; struct list_head list; struct list_head listen_list; struct cma_device *cma_dev; enum cma_state state; spinlock_t lock; - wait_queue_head_t wait; + struct completion comp; atomic_t refcount; wait_queue_head_t wait_remove; atomic_t dev_remove; @@ -140,7 +156,7 @@ struct cma_hdr { struct sdp_hh { u8 bsdh[16]; - u8 sdp_version; + u8 sdp_version; /* Major version: 7:4 */ u8 ip_version; /* IP version: 7:4 */ u8 sdp_specific1[10]; __u16 port; @@ -149,8 +165,13 @@ struct sdp_hh { union cma_ip_addr dst_addr; }; +struct sdp_hah { + u8 bsdh[16]; + u8 sdp_version; +}; + #define CMA_VERSION 0x00 -#define SDP_VERSION 0x22 +#define SDP_MAJ_VERSION 0x2 static int cma_comp(struct rdma_id_private *id_priv, enum cma_state comp) { @@ -199,6 +220,11 @@ static inline void cma_set_ip_ver(struct hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF); } +static inline u8 sdp_get_majv(u8 sdp_version) +{ + return sdp_version >> 4; +} + static inline u8 sdp_get_ip_ver(struct sdp_hh *hh) { return hh->ip_version >> 4; @@ -218,11 +244,16 @@ static void cma_attach_to_dev(struct rdm list_add_tail(&id_priv->list, &cma_dev->id_list); } +static inline void cma_deref_dev(struct cma_device *cma_dev) +{ + if (atomic_dec_and_test(&cma_dev->refcount)) + complete(&cma_dev->comp); +} + static void cma_detach_from_dev(struct rdma_id_private *id_priv) { list_del(&id_priv->list); - if (atomic_dec_and_test(&id_priv->cma_dev->refcount)) - wake_up(&id_priv->cma_dev->wait); + cma_deref_dev(id_priv->cma_dev); id_priv->cma_dev = NULL; } @@ -260,7 +291,7 @@ static int cma_acquire_dev(struct rdma_i static void cma_deref_id(struct rdma_id_private *id_priv) { if (atomic_dec_and_test(&id_priv->refcount)) - wake_up(&id_priv->wait); + complete(&id_priv->comp); } static void cma_release_remove(struct rdma_id_private *id_priv) @@ -283,7 +314,7 @@ struct rdma_cm_id 
*rdma_create_id(rdma_c id_priv->id.event_handler = event_handler; id_priv->id.ps = ps; spin_lock_init(&id_priv->lock); - init_waitqueue_head(&id_priv->wait); + init_completion(&id_priv->comp); atomic_set(&id_priv->refcount, 1); init_waitqueue_head(&id_priv->wait_remove); atomic_set(&id_priv->dev_remove, 0); @@ -457,13 +488,19 @@ static inline int cma_any_addr(struct so return cma_zero_addr(addr) || cma_loopback_addr(addr); } +static inline int cma_any_port(struct sockaddr *addr) +{ + return !((struct sockaddr_in *) addr)->sin_port; +} + static int cma_get_net_info(void *hdr, enum rdma_port_space ps, u8 *ip_ver, __u16 *port, union cma_ip_addr **src, union cma_ip_addr **dst) { switch (ps) { case RDMA_PS_SDP: - if (((struct sdp_hh *) hdr)->sdp_version != SDP_VERSION) + if (sdp_get_majv(((struct sdp_hh *) hdr)->sdp_version) != + SDP_MAJ_VERSION) return -EINVAL; *ip_ver = sdp_get_ip_ver(hdr); @@ -481,6 +518,9 @@ static int cma_get_net_info(void *hdr, e *dst = &((struct cma_hdr *) hdr)->dst_addr; break; } + + if (*ip_ver != 4 && *ip_ver != 6) + return -EINVAL; return 0; } @@ -581,8 +621,8 @@ static void cma_destroy_listen(struct rd } list_del(&id_priv->listen_list); - atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, !atomic_read(&id_priv->refcount)); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv); } @@ -622,6 +662,22 @@ static void cma_cancel_operation(struct } } +static void cma_release_port(struct rdma_id_private *id_priv) +{ + struct rdma_bind_list *bind_list = id_priv->bind_list; + + if (!bind_list) + return; + + mutex_lock(&lock); + hlist_del(&id_priv->node); + if (hlist_empty(&bind_list->owners)) { + idr_remove(bind_list->ps, bind_list->port); + kfree(bind_list); + } + mutex_unlock(&lock); +} + void rdma_destroy_id(struct rdma_cm_id *id) { struct rdma_id_private *id_priv; @@ -645,8 +701,9 @@ void rdma_destroy_id(struct rdma_cm_id * mutex_unlock(&lock); } - atomic_dec(&id_priv->refcount); - wait_event(id_priv->wait, 
!atomic_read(&id_priv->refcount)); + cma_release_port(id_priv); + cma_deref_id(id_priv); + wait_for_completion(&id_priv->comp); kfree(id_priv->id.route.path_rec); kfree(id_priv); @@ -677,6 +734,16 @@ reject: return ret; } +static int cma_verify_rep(struct rdma_id_private *id_priv, void *data) +{ + if (id_priv->id.ps == RDMA_PS_SDP && + sdp_get_majv(((struct sdp_hah *) data)->sdp_version) != + SDP_MAJ_VERSION) + return -EINVAL; + + return 0; +} + static int cma_rtu_recv(struct rdma_id_private *id_priv) { int ret; @@ -711,7 +778,10 @@ static int cma_ib_handler(struct ib_cm_i status = -ETIMEDOUT; break; case IB_CM_REP_RECEIVED: - if (id_priv->id.qp) { + status = cma_verify_rep(id_priv, ib_event->private_data); + if (status) + event = RDMA_CM_EVENT_CONNECT_ERROR; + else if (id_priv->id.qp) { status = cma_rep_recv(id_priv); event = status ? RDMA_CM_EVENT_CONNECT_ERROR : RDMA_CM_EVENT_ESTABLISHED; @@ -915,21 +985,6 @@ static int cma_ib_listen(struct rdma_id_ return ret; } -static int cma_duplicate_listen(struct rdma_id_private *id_priv) -{ - struct rdma_id_private *cur_id_priv; - struct sockaddr_in *cur_addr, *new_addr; - - new_addr = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; - list_for_each_entry(cur_id_priv, &listen_any_list, listen_list) { - cur_addr = (struct sockaddr_in *) - &cur_id_priv->id.route.addr.src_addr; - if (cur_addr->sin_port == new_addr->sin_port) - return -EADDRINUSE; - } - return 0; -} - static int cma_listen_handler(struct rdma_cm_id *id, struct rdma_cm_event *event) { @@ -952,9 +1007,10 @@ static void cma_listen_on_dev(struct rdm return; dev_id_priv = container_of(id, struct rdma_id_private, id); - ret = rdma_bind_addr(id, &id_priv->id.route.addr.src_addr); - if (ret) - goto err; + + dev_id_priv->state = CMA_ADDR_BOUND; + memcpy(&id->route.addr.src_addr, &id_priv->id.route.addr.src_addr, + ip_addr_size(&id_priv->id.route.addr.src_addr)); cma_attach_to_dev(dev_id_priv, cma_dev); list_add_tail(&dev_id_priv->listen_list, 
&id_priv->listen_list); @@ -968,22 +1024,24 @@ err: cma_destroy_listen(dev_id_priv); } -static int cma_listen_on_all(struct rdma_id_private *id_priv) +static void cma_listen_on_all(struct rdma_id_private *id_priv) { struct cma_device *cma_dev; - int ret; mutex_lock(&lock); - ret = cma_duplicate_listen(id_priv); - if (ret) - goto out; - list_add_tail(&id_priv->list, &listen_any_list); list_for_each_entry(cma_dev, &dev_list, list) cma_listen_on_dev(id_priv, cma_dev); -out: mutex_unlock(&lock); - return ret; +} + +static int cma_bind_any(struct rdma_cm_id *id, sa_family_t af) +{ + struct sockaddr_in addr_in; + + memset(&addr_in, 0, sizeof addr_in); + addr_in.sin_family = af; + return rdma_bind_addr(id, (struct sockaddr *) &addr_in); } int rdma_listen(struct rdma_cm_id *id, int backlog) @@ -992,6 +1050,12 @@ int rdma_listen(struct rdma_cm_id *id, i int ret; id_priv = container_of(id, struct rdma_id_private, id); + if (id_priv->state == CMA_IDLE) { + ret = cma_bind_any(id, AF_INET); + if (ret) + return ret; + } + if (!cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_LISTEN)) return -EINVAL; @@ -999,23 +1063,22 @@ int rdma_listen(struct rdma_cm_id *id, i switch (id->device->node_type) { case IB_NODE_CA: ret = cma_ib_listen(id_priv); + if (ret) + goto err; break; default: ret = -ENOSYS; - break; + goto err; } } else - ret = cma_listen_on_all(id_priv); - - if (ret) - goto err; + cma_listen_on_all(id_priv); id_priv->backlog = backlog; return 0; err: cma_comp_exch(id_priv, CMA_LISTEN, CMA_ADDR_BOUND); return ret; -}; +} EXPORT_SYMBOL(rdma_listen); static void cma_query_handler(int status, struct ib_sa_path_rec *path_rec, @@ -1252,15 +1315,10 @@ err: static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, struct sockaddr *dst_addr) { - struct sockaddr_in addr_in; - if (src_addr && src_addr->sa_family) return rdma_bind_addr(id, src_addr); - else { - memset(&addr_in, 0, sizeof addr_in); - addr_in.sin_family = dst_addr->sa_family; - return rdma_bind_addr(id, (struct 
sockaddr *) &addr_in); - } + else + return cma_bind_any(id, dst_addr->sa_family); } int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, @@ -1281,7 +1339,7 @@ int rdma_resolve_addr(struct rdma_cm_id atomic_inc(&id_priv->refcount); memcpy(&id->route.addr.dst_addr, dst_addr, ip_addr_size(dst_addr)); - if (cma_loopback_addr(dst_addr)) + if (cma_any_addr(dst_addr)) ret = cma_resolve_loopback(id_priv); else ret = rdma_resolve_ip(&id->route.addr.src_addr, dst_addr, @@ -1298,32 +1356,140 @@ err: } EXPORT_SYMBOL(rdma_resolve_addr); +static void cma_bind_port(struct rdma_bind_list *bind_list, + struct rdma_id_private *id_priv) +{ + struct sockaddr_in *sin; + + sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; + sin->sin_port = htons(bind_list->port); + id_priv->bind_list = bind_list; + hlist_add_head(&id_priv->node, &bind_list->owners); +} + +static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv, + unsigned short snum) +{ + struct rdma_bind_list *bind_list; + int port, start, ret; + + bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL); + if (!bind_list) + return -ENOMEM; + + start = snum ? 
snum : sysctl_local_port_range[0]; + + do { + ret = idr_get_new_above(ps, bind_list, start, &port); + } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL)); + + if (ret) + goto err; + + if ((snum && port != snum) || + (!snum && port > sysctl_local_port_range[1])) { + idr_remove(ps, port); + ret = -EADDRNOTAVAIL; + goto err; + } + + bind_list->ps = ps; + bind_list->port = (unsigned short) port; + cma_bind_port(bind_list, id_priv); + return 0; +err: + kfree(bind_list); + return ret; +} + +static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv) +{ + struct rdma_id_private *cur_id; + struct sockaddr_in *sin, *cur_sin; + struct rdma_bind_list *bind_list; + struct hlist_node *node; + unsigned short snum; + + sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; + snum = ntohs(sin->sin_port); + if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE)) + return -EACCES; + + bind_list = idr_find(ps, snum); + if (!bind_list) + return cma_alloc_port(ps, id_priv, snum); + + /* + * We don't support binding to any address if anyone is bound to + * a specific address on the same port. 
+ */ + if (cma_any_addr(&id_priv->id.route.addr.src_addr)) + return -EADDRNOTAVAIL; + + hlist_for_each_entry(cur_id, node, &bind_list->owners, node) { + if (cma_any_addr(&cur_id->id.route.addr.src_addr)) + return -EADDRNOTAVAIL; + + cur_sin = (struct sockaddr_in *) &cur_id->id.route.addr.src_addr; + if (sin->sin_addr.s_addr == cur_sin->sin_addr.s_addr) + return -EADDRINUSE; + } + + cma_bind_port(bind_list, id_priv); + return 0; +} + +static int cma_get_port(struct rdma_id_private *id_priv) +{ + struct idr *ps; + int ret; + + switch (id_priv->id.ps) { + case RDMA_PS_SDP: + ps = &sdp_ps; + break; + case RDMA_PS_TCP: + ps = &tcp_ps; + break; + default: + return -EPROTONOSUPPORT; + } + + mutex_lock(&lock); + if (cma_any_port(&id_priv->id.route.addr.src_addr)) + ret = cma_alloc_port(ps, id_priv, 0); + else + ret = cma_use_port(ps, id_priv); + mutex_unlock(&lock); + + return ret; +} + int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; - struct rdma_dev_addr *dev_addr; int ret; if (addr->sa_family != AF_INET) - return -EINVAL; + return -EAFNOSUPPORT; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, CMA_IDLE, CMA_ADDR_BOUND)) return -EINVAL; - if (cma_any_addr(addr)) - ret = 0; - else { - dev_addr = &id->route.addr.dev_addr; - ret = rdma_translate_ip(addr, dev_addr); + if (!cma_any_addr(addr)) { + ret = rdma_translate_ip(addr, &id->route.addr.dev_addr); if (!ret) ret = cma_acquire_dev(id_priv); + if (ret) + goto err; } + memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); + ret = cma_get_port(id_priv); if (ret) goto err; - memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); return 0; err: cma_comp_exch(id_priv, CMA_ADDR_BOUND, CMA_IDLE); @@ -1331,8 +1497,8 @@ err: } EXPORT_SYMBOL(rdma_bind_addr); -static void cma_format_hdr(void *hdr, enum rdma_port_space ps, - struct rdma_route *route) +static int cma_format_hdr(void *hdr, enum rdma_port_space ps, + struct rdma_route 
*route) { struct sockaddr_in *src4, *dst4; struct cma_hdr *cma_hdr; @@ -1344,7 +1510,8 @@ static void cma_format_hdr(void *hdr, en switch (ps) { case RDMA_PS_SDP: sdp_hdr = hdr; - sdp_hdr->sdp_version = SDP_VERSION; + if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) + return -EINVAL; sdp_set_ip_ver(sdp_hdr, 4); sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; @@ -1359,6 +1526,7 @@ static void cma_format_hdr(void *hdr, en cma_hdr->port = src4->sin_port; break; } + return 0; } static int cma_connect_ib(struct rdma_id_private *id_priv, @@ -1388,7 +1556,9 @@ static int cma_connect_ib(struct rdma_id } route = &id_priv->id.route; - cma_format_hdr(private_data, id_priv->id.ps, route); + ret = cma_format_hdr(private_data, id_priv->id.ps, route); + if (ret) + goto out; req.private_data = private_data; req.primary_path = &route->path_rec[0]; @@ -1534,7 +1704,7 @@ int rdma_reject(struct rdma_cm_id *id, c break; } return ret; -}; +} EXPORT_SYMBOL(rdma_reject); int rdma_disconnect(struct rdma_cm_id *id) @@ -1578,7 +1748,7 @@ static void cma_add_one(struct ib_device if (!cma_dev->node_guid) goto err; - init_waitqueue_head(&cma_dev->wait); + init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); INIT_LIST_HEAD(&cma_dev->id_list); ib_set_client_data(device, &cma_client, cma_dev); @@ -1645,8 +1815,8 @@ static void cma_process_remove(struct cm } mutex_unlock(&lock); - atomic_dec(&cma_dev->refcount); - wait_event(cma_dev->wait, !atomic_read(&cma_dev->refcount)); + cma_deref_dev(cma_dev); + wait_for_completion(&cma_dev->comp); } static void cma_remove_one(struct ib_device *device) @@ -1687,6 +1857,8 @@ static void cma_cleanup(void) { ib_unregister_client(&cma_client); destroy_workqueue(cma_wq); + idr_destroy(&sdp_ps); + idr_destroy(&tcp_ps); } module_init(cma_init); diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h index 2e56f25..402c63d 100644 --- a/include/rdma/rdma_cm.h +++ 
b/include/rdma/rdma_cm.h
@@ -91,7 +91,7 @@ struct rdma_cm_id;
  *
  * Notes: Users may not call rdma_destroy_id from this callback to destroy
  *   the passed in id, or a corresponding listen id.  Returning a
- *   non-zero value from the callback will destroy the corresponding id.
+ *   non-zero value from the callback will destroy the passed in id.
  */
 typedef int (*rdma_cm_event_handler)(struct rdma_cm_id *id,
 				     struct rdma_cm_event *event);
@@ -241,13 +241,14 @@ int rdma_listen(struct rdma_cm_id *id, i
 int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param);
 
 /**
- * rdma_reject - Called on the passive side to reject a connection request.
+ * rdma_reject - Called to reject a connection request or response.
  */
 int rdma_reject(struct rdma_cm_id *id, const void *private_data,
 		u8 private_data_len);
 
 /**
- * rdma_disconnect - This function disconnects the associated QP.
+ * rdma_disconnect - This function disconnects the associated QP and
+ *   transitions it into the error state.
  */
 int rdma_disconnect(struct rdma_cm_id *id);

From rdreier at cisco.com Mon May 15 15:35:07 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 15 May 2006 15:35:07 -0700
Subject: [openib-general] Re: [PATCH] RDMA CM: updates to 2.6.18 branch
In-Reply-To: (Sean Hefty's message of "Mon, 15 May 2006 15:32:35 -0700")
References:
Message-ID:

    Sean> I'm assuming that since the CMA isn't upstream yet, a single
    Sean> patch will work.

Yes, that's perfect.  I'll drop it into my git tree today.

 - R.
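
[Reading aid, not part of the original thread.] The address/port sharing
rules that cma_use_port() introduces in the patch above can be sketched in
user-space C roughly as follows. The names struct owner and check_bind are
hypothetical and do not appear in the patch. As a simplification, this
sketch treats only address 0 as "any address", whereas the patch's
cma_any_addr() also counts the loopback address.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one id already bound to a given port. */
struct owner {
	uint32_t addr;	/* IPv4 address; 0 means "any address" in this sketch */
};

/*
 * Decide whether new_addr may join the existing owners of a port.
 * Follows the order of checks in the patch's cma_use_port():
 *   - port already has owners and the newcomer is a wildcard -> -EADDRNOTAVAIL
 *   - an existing owner is a wildcard                         -> -EADDRNOTAVAIL
 *   - an existing owner has the same address                  -> -EADDRINUSE
 */
static int check_bind(const struct owner *owners, size_t n, uint32_t new_addr)
{
	size_t i;

	if (n == 0)
		return 0;	/* first binder; the patch allocates a bind list here */
	if (new_addr == 0)
		return -EADDRNOTAVAIL;
	for (i = 0; i < n; i++) {
		if (owners[i].addr == 0)
			return -EADDRNOTAVAIL;
		if (owners[i].addr == new_addr)
			return -EADDRINUSE;
	}
	return 0;
}
```

Two ids bound to different specific addresses can therefore share a port,
while a wildcard bind needs the port to itself.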
From ralphc at pathscale.com Mon May 15 16:01:20 2006
From: ralphc at pathscale.com (ralphc at pathscale.com)
Date: Mon, 15 May 2006 16:01:20 -0700 (PDT)
Subject: [openib-general] Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes
In-Reply-To: <1147727447.2773.14.camel@chalcedony.pathscale.com>
References: <1147727447.2773.14.camel@chalcedony.pathscale.com>
Message-ID: <60844.71.131.57.117.1147734080.squirrel@rocky.pathscale.com>

> On Mon, 2006-05-15 at 08:57 -0700, Roland Dreier wrote:
>> > static void i2c_wait_for_writes(struct ipath_devdata *dd)
>> > {
>> > +	mb();
>> > 	(void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
>> > }
>>
>> This needs a comment explaining why it's needed. A memory barrier
>> before a readl() looks very strange since readl() should be ordered
>> anyway.
>
> Yeah. It's actually working around what appears to be a gcc bug if the
> kernel is compiled with -Os. Ralph knows the details; he can give a
> more complete answer.
>
>
(ralphc@pathscale.com's message of "Mon, 15 May 2006 16:01:20 -0700 (PDT)")
References: <1147727447.2773.14.camel@chalcedony.pathscale.com>
	<60844.71.131.57.117.1147734080.squirrel@rocky.pathscale.com>
Message-ID:

    ralphc> I don't have a lot to add to this other than I looked at
    ralphc> the assembly code output for -Os and -O3 and both looked
    ralphc> OK. I put the mb() in to be sure the writes were complete
    ralphc> and I found this to work by experimentation. Without it,
    ralphc> the driver fails to read the EEPROM correctly.

Hmm, that doesn't give me a warm fuzzy feeling. Basically on x86-64
you're adding an unneeded mfence instruction to work around
miscompilation?

Is i2c_wait_for_writes miscompiled without the mb() with -Os? What
does the bad assembly look like?

 - R.
From iod00d at hp.com Mon May 15 16:13:42 2006
From: iod00d at hp.com (Grant Grundler)
Date: Mon, 15 May 2006 16:13:42 -0700
Subject: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt
In-Reply-To:
References: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com>
	<1147728081.2773.25.camel@chalcedony.pathscale.com>
Message-ID: <20060515231342.GK29082@esmail.cup.hp.com>

On Mon, May 15, 2006 at 02:28:45PM -0700, Roland Dreier wrote:
>     Bryan> Any ideas? Should this turn from a one-liner into a
>     Bryan> big-refactor-for-2.6.18 patch?
>
> I don't think there's a quick way to fix this. What you really want
> to do is override the DMA mapping functions for your device so that
> you can keep track of the kernel mapping.

Or figure out which openib.org interface has to change so the original
virt addresses that were registered/handed to the ULP are passed down
to the low level interface driver too. Seems like a more obvious way
to fix the problem. Someone did suggest this already, right?

> (cf the ehca driver), and I think patches to do it on x86-64 are
> floating around as part of the "Calgary IOMMU" work.

parisc has been using dma_ops for several years.
I don't expect dma_ops to become part of generic code.
DMA support is inherently arch specific.
Because of that, I don't look forward to a low level
device driver that is mucking with dma_ops.

hth,
grant

From rdreier at cisco.com Mon May 15 16:16:45 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 15 May 2006 16:16:45 -0700
Subject: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt
In-Reply-To: <20060515231342.GK29082@esmail.cup.hp.com> (Grant Grundler's message of "Mon, 15 May 2006 16:13:42 -0700")
References: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com>
	<1147728081.2773.25.camel@chalcedony.pathscale.com>
	<20060515231342.GK29082@esmail.cup.hp.com>
Message-ID:

    Grant> Or figure out which openib.org interface has to change so
    Grant> the original virt addresses that were registered/handed to
    Grant> the ULP are passed down to the low level interface driver
    Grant> too. Seems like a more obvious way to fix the problem.
    Grant> Someone did suggest this already, right?

It's been suggested many times, but no one ever comes up with a way to
handle the fact that RDMA means that addresses come from remote
systems as well as being passed in through an API.

 - R.

From ralphc at pathscale.com Mon May 15 16:25:00 2006
From: ralphc at pathscale.com (ralphc at pathscale.com)
Date: Mon, 15 May 2006 16:25:00 -0700 (PDT)
Subject: [openib-general] Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes
In-Reply-To:
References: <1147727447.2773.14.camel@chalcedony.pathscale.com>
	<60844.71.131.57.117.1147734080.squirrel@rocky.pathscale.com>
Message-ID: <40771.71.131.57.117.1147735500.squirrel@rocky.pathscale.com>

> ralphc> I don't have a lot to add to this other than I looked at
> ralphc> the assembly code output for -Os and -O3 and both looked
> ralphc> OK. I put the mb() in to be sure the writes were complete
> ralphc> and I found this to work by experimentation. Without it,
> ralphc> the driver fails to read the EEPROM correctly.
>
> Hmm, that doesn't give me a warm fuzzy feeling. Basically on x86-64
> you're adding an unneeded mfence instruction to work around
> miscompilation?
>
> Is i2c_wait_for_writes miscompiled without the mb() with -Os? What
> does the bad assembly look like?
>
> - R.

We had a power failure here so I'm not able to reproduce the assembly
code at the moment. What I remember from looking at the code is that
the code for ipath_read_kreg32() was present in i2c_wait_for_writes()
when compiled -Os so it didn't look like a compiler bug.
I probably could put the mb() at the end of i2c_gpio_set() if that
makes you more comfortable. The mb() is definitely needed though.

From rdreier at cisco.com Mon May 15 16:28:12 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 15 May 2006 16:28:12 -0700
Subject: [openib-general] Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes
In-Reply-To: <40771.71.131.57.117.1147735500.squirrel@rocky.pathscale.com> (ralphc@pathscale.com's message of "Mon, 15 May 2006 16:25:00 -0700 (PDT)")
References: <1147727447.2773.14.camel@chalcedony.pathscale.com>
	<60844.71.131.57.117.1147734080.squirrel@rocky.pathscale.com>
	<40771.71.131.57.117.1147735500.squirrel@rocky.pathscale.com>
Message-ID:

    ralphc> We had a power failure here so I'm not able to reproduce
    ralphc> the assembly code at the moment. What I remember from
    ralphc> looking at the code is that the code for
    ralphc> ipath_read_kreg32() was present in i2c_wait_for_writes()
    ralphc> when compiled -Os so it didn't look like a compiler bug.
    ralphc> I probably could put the mb() at the end of i2c_gpio_set()
    ralphc> if that makes you more comfortable. The mb() is
    ralphc> definitely needed though.

Is it the mb()? Or is just a barrier() enough? In other words do you
really need the mfence, or do you just need to stop the compiler from
reordering things?

 - R.
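
[Reading aid, not part of the original thread.] The distinction Roland is
asking about, a compiler-only barrier versus a full memory barrier, can be
sketched in user-space C as below. This assumes GCC or a compatible
compiler. The macros are user-space analogues rather than the kernel's
barrier() and mb(), and fake_scratch and write_then_flush are hypothetical
names standing in for the kr_scratch register and the flush helper. Since
this runs on ordinary memory, it shows only the shape of the two variants
and cannot reproduce the EEPROM failure discussed above.

```c
#include <stdint.h>

/*
 * Compiler-only barrier: emits no instruction, but stops the compiler
 * from moving memory accesses across this point.
 */
#define compiler_barrier()	__asm__ __volatile__("" ::: "memory")

/*
 * Full barrier: a compiler barrier plus a hardware fence, via the GCC
 * builtin. User-space analogue of mb(), not the kernel macro itself.
 */
#define full_barrier()		__sync_synchronize()

/* Hypothetical stand-in for a device scratch register. */
static volatile uint32_t fake_scratch;

static uint32_t write_then_flush(uint32_t val, int use_full_barrier)
{
	fake_scratch = val;		/* stands in for the posted write */
	if (use_full_barrier)
		full_barrier();		/* orders CPU and compiler */
	else
		compiler_barrier();	/* orders the compiler only */
	return fake_scratch;		/* read back, like the kr_scratch read */
}
```

Comparing the generated assembly for the two branches (for example with
gcc -Os -S) is one way to see whether a fence instruction is emitted.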
From iod00d at hp.com Mon May 15 16:30:49 2006
From: iod00d at hp.com (Grant Grundler)
Date: Mon, 15 May 2006 16:30:49 -0700
Subject: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt
In-Reply-To:
References: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com>
	<1147728081.2773.25.camel@chalcedony.pathscale.com>
	<20060515231342.GK29082@esmail.cup.hp.com>
Message-ID: <20060515233049.GM29082@esmail.cup.hp.com>

On Mon, May 15, 2006 at 04:16:45PM -0700, Roland Dreier wrote:
>     Grant> Or figure out which openib.org interface has to change so
>     Grant> the original virt addresses that were registered/handed to
>     Grant> the ULP are passed down to the low level interface driver
>     Grant> too. Seems like a more obvious way to fix the problem.
>     Grant> Someone did suggest this already, right?
>
> It's been suggested many times, but no one ever comes up with a way to
> handle the fact that RDMA means that addresses come from remote
> systems as well as being passed in through an API.

Aren't remote addresses handled differently than local ones?
ULP has to map local addresses. We can't map remote ones (remote host
maps it). The ULP must know the difference and can tell the lower level
driver which is which.

Sorry, I hope my ignorance of RDMA isn't getting in the way again.

thanks,
grant

From rdreier at cisco.com Mon May 15 16:34:56 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 15 May 2006 16:34:56 -0700
Subject: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt
In-Reply-To: <20060515233049.GM29082@esmail.cup.hp.com> (Grant Grundler's message of "Mon, 15 May 2006 16:30:49 -0700")
References: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com>
	<1147728081.2773.25.camel@chalcedony.pathscale.com>
	<20060515231342.GK29082@esmail.cup.hp.com>
	<20060515233049.GM29082@esmail.cup.hp.com>
Message-ID:

    Grant> Aren't remote addresses handled differently than local
    Grant> ones? ULP has to map local addresses. We can't map remote
    Grant> ones (remote host maps it). The ULP must know the
    Grant> difference and can tell the lower level driver which is
    Grant> which.

The problem is that RDMA requests have to be handled by the low-level
driver (or hardware) without any ULP involvement. So every device has
to handle getting messages like "send me XXX bytes of data from
address YYY in the memory region corresponding to R_Key ZZZ."

 - R.

From ralphc at pathscale.com Mon May 15 16:38:01 2006
From: ralphc at pathscale.com (ralphc at pathscale.com)
Date: Mon, 15 May 2006 16:38:01 -0700 (PDT)
Subject: [openib-general] Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes
In-Reply-To:
References: <1147727447.2773.14.camel@chalcedony.pathscale.com>
	<60844.71.131.57.117.1147734080.squirrel@rocky.pathscale.com>
	<40771.71.131.57.117.1147735500.squirrel@rocky.pathscale.com>
Message-ID: <53739.71.131.57.117.1147736281.squirrel@rocky.pathscale.com>

> ralphc> We had a power failure here so I'm not able to reproduce
> ralphc> the assembly code at the moment. What I remember from
> ralphc> looking at the code is that the code for
> ralphc> ipath_read_kreg32() was present in i2c_wait_for_writes()
> ralphc> when compiled -Os so it didn't look like a compiler bug.
> ralphc> I probably could put the mb() at the end of i2c_gpio_set()
> ralphc> if that makes you more comfortable. The mb() is
> ralphc> definitely needed though.
>
> Is it the mb()? Or is just a barrier() enough? In other words do you
> really need the mfence, or do you just need to stop the compiler from
> reordering things?
>
> - R.

I didn't try calling barrier() so I don't know the answer.
When power is restored, I can try it. My guess is that it's
a timing issue and not a code reordering issue.
From caitlinb at broadcom.com Mon May 15 16:40:35 2006
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Mon, 15 May 2006 16:40:35 -0700
Subject: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt
Message-ID: <54AD0F12E08D1541B826BE97C98F99F149F34B@NT-SJCA-0751.brcm.ad.broadcom.com>

openib-general-bounces at openib.org wrote:
>     Grant> Aren't remote addresses handled differently than local
>     Grant> ones? ULP has to map local addresses. We can't map remote
>     Grant> ones (remote host maps it). The ULP must know the
>     Grant> difference and can tell the lower level driver which is
>     Grant> which.
>
> The problem is that RDMA requests have to be handled by the
> low-level driver (or hardware) without any ULP involvement.
> So every device has to handle getting messages like "send me
> XXX bytes of data from address YYY in the memory region
> corresponding to R_Key ZZZ."
>

True, but how does that constrain the local interfaces by
which the driver is informed of the set of pages that back
a given memory region? The driver must still ultimately provide
dma accessible addresses to the device. RDMA just changes the
timing of the steps, albeit radically, but not what the steps are.

From rdreier at cisco.com Mon May 15 16:47:41 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 15 May 2006 16:47:41 -0700
Subject: [openib-general] Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes
In-Reply-To: <53739.71.131.57.117.1147736281.squirrel@rocky.pathscale.com> (ralphc@pathscale.com's message of "Mon, 15 May 2006 16:38:01 -0700 (PDT)")
References: <1147727447.2773.14.camel@chalcedony.pathscale.com>
	<60844.71.131.57.117.1147734080.squirrel@rocky.pathscale.com>
	<40771.71.131.57.117.1147735500.squirrel@rocky.pathscale.com>
	<53739.71.131.57.117.1147736281.squirrel@rocky.pathscale.com>
Message-ID:

    ralphc> I didn't try calling barrier() so I don't know the answer.
    ralphc> When power is restored, I can try it. My guess is that
    ralphc> it's a timing issue and not a code reordering issue.

Hmm, then we really better understand what's going on, because
otherwise you're just going to have trouble again if someone makes a
CPU with a faster mfence instruction...

From rdreier at cisco.com Mon May 15 16:50:11 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 15 May 2006 16:50:11 -0700
Subject: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F149F34B@NT-SJCA-0751.brcm.ad.broadcom.com> (Caitlin Bestler's message of "Mon, 15 May 2006 16:40:35 -0700")
References: <54AD0F12E08D1541B826BE97C98F99F149F34B@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID:

    Caitlin> True, but how does that constrain the local interfaces by
    Caitlin> which the driver is informed of the set of pages that
    Caitlin> back a given memory region? The driver must still
    Caitlin> ultimately provide dma accessible addresses to the
    Caitlin> device. RDMA just changes the timing of the steps, albeit
    Caitlin> radically, but not what the steps are.

It's only a problem for "reserved L_Key" types of things, where the
device is supposed to just use the address given in a work request
without translating it. No translation means that work requests have
to contain "bus addresses" -- addresses that are what the device would
put on the bus to access memory. But if a device needs to simulate
DMA in software, then it really needs a kernel virtual address, not a
bus address. But it's pretty ugly to have to put that knowledge in
every consumer.

 - R.
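
[Reading aid, not part of the original thread.] The idea raised earlier in
this thread, intercepting DMA mapping so that a driver which simulates DMA
in software can get back from a "bus address" to a CPU pointer, can be
sketched as a small lookup table. Everything here (sw_track, sw_lookup,
sw_untrack, the fixed-size table) is hypothetical and runs in user space.
It is not the ipath driver's code or the kernel DMA API, only an
illustration of the bookkeeping such a hook would need.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16

/* One remembered mapping: the handle consumers see and the pointer behind it. */
struct sw_mapping {
	uint64_t bus_addr;
	char *cpu_addr;
	size_t len;
	int in_use;
};

static struct sw_mapping sw_table[MAX_MAPPINGS];

/* Record a mapping at "map" time. Returns 0, or -1 if the table is full. */
static int sw_track(uint64_t bus_addr, void *cpu_addr, size_t len)
{
	int i;

	for (i = 0; i < MAX_MAPPINGS; i++) {
		if (!sw_table[i].in_use) {
			sw_table[i].bus_addr = bus_addr;
			sw_table[i].cpu_addr = cpu_addr;
			sw_table[i].len = len;
			sw_table[i].in_use = 1;
			return 0;
		}
	}
	return -1;
}

/* Translate an address from a work request back to a CPU pointer, or NULL. */
static void *sw_lookup(uint64_t addr)
{
	int i;

	for (i = 0; i < MAX_MAPPINGS; i++) {
		if (sw_table[i].in_use && addr >= sw_table[i].bus_addr &&
		    addr - sw_table[i].bus_addr < sw_table[i].len)
			return sw_table[i].cpu_addr + (addr - sw_table[i].bus_addr);
	}
	return NULL;
}

/* Forget a mapping at "unmap" time. */
static void sw_untrack(uint64_t bus_addr)
{
	int i;

	for (i = 0; i < MAX_MAPPINGS; i++)
		if (sw_table[i].in_use && sw_table[i].bus_addr == bus_addr)
			sw_table[i].in_use = 0;
}
```

With bookkeeping like this inside the device's own mapping hooks, consumers
could keep passing bus addresses and would not need to know that the device
does its "DMA" with the CPU.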
From penberg at cs.helsinki.fi Mon May 15 21:22:07 2006
From: penberg at cs.helsinki.fi (Pekka Enberg)
Date: Tue, 16 May 2006 07:22:07 +0300
Subject: [openib-general] Re: [PATCH] slab: Fix kmem_cache_destroy() on NUMA
In-Reply-To:
References:
Message-ID: <1147753327.11271.0.camel@localhost>

On Mon, 15 May 2006, Roland Dreier wrote:
> > This patch fixes this by having drain_cpu_caches() do
> > drain_alien_cache() on every node before it does drain_array() on the
> > nodes' shared array_caches.

On Mon, 2006-05-15 at 14:47 -0700, Christoph Lameter wrote:
> Correct. That is the fix that I suggested earlier. The alien caches needs
> to be drained first.

Yeah, looks good to me too. Thanks Roland!

			Pekka

From k_mahesh85 at yahoo.co.in Mon May 15 22:07:28 2006
From: k_mahesh85 at yahoo.co.in (keshetti mahesh)
Date: Tue, 16 May 2006 06:07:28 +0100 (BST)
Subject: [openib-general] FTP over SDP is not working fine-[newbie]
Message-ID: <20060516050728.97544.qmail@web8322.mail.in.yahoo.com>

hi

i was trying to make some applications run using SDP
all the socket based applications developed by me are working fine but FTP is giving error at the time of connection setup itself

i am using openIB gen2 stack (with sdp_historic)
i have configured the libsdp.conf for all hosts
>export LD_PRELOAD=/usr/lib64/libsdp.so
>export LIBSDP_CONFIG_FILE=/etc/libsdp.conf
>service vsftpd restart

>ftp 192.168.3.242
>connect failed: network unreachable (but it is actually reachable - I have tried ping)

do i need to configure anything to make FTP work over SDP

regards
K.Mahesh

From bugzilla-daemon at openib.org Mon May 15 23:20:04 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Mon, 15 May 2006 23:20:04 -0700 (PDT)
Subject: [openib-general] [Bug 33] Ping fails on ib1 interface - IBED - RC3
Message-ID: <20060516062004.A8B272283D6@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=33

------- Additional Comments From ksharma at silverstorm.com  2006-05-15 23:20 -------
The bug still exists with OFED RC4.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From iod00d at hp.com Mon May 15 23:23:17 2006
From: iod00d at hp.com (Grant Grundler)
Date: Mon, 15 May 2006 23:23:17 -0700
Subject: [openib-general] FTP over SDP is not working fine-[newbie]
In-Reply-To: <20060516050728.97544.qmail@web8322.mail.in.yahoo.com>
References: <20060516050728.97544.qmail@web8322.mail.in.yahoo.com>
Message-ID: <20060516062317.GN29082@esmail.cup.hp.com>

On Tue, May 16, 2006 at 06:07:28AM +0100, keshetti mahesh wrote:
> hi
>
> i was trying to make some applications run using SDP
> all the socket based applications developed by me are working fine but FTP is giving error at the time of connection setup itself
>
> i am using openIB gen2 stack (with sdp_historic)
> i have configured the libsdp.conf for all hosts
> >export LD_PRELOAD=/usr/lib64/libsdp.so
> >export LIBSDP_CONFIG_FILE=/etc/libsdp.conf
> >service vsftpd restart
>
> >ftp 192.168.3.242
> >connect failed: network unreachable (but it is actually reachable - I have tried ping)
>
> do i need to configure anything to make FTP work over SDP

You might make sure LD_PRELOAD is set on the client side too.
It's clear you've done the server side.

Then check the libsdp.conf file on both sides to make sure SDP is
enabled respectively for both vsftpd server and ftp client on
the respective machines.

I prefer to just allow _all_ services in the libsdp.conf and then
selectively enable SDP for services by prefixing the command line
with LD_PRELOAD like this:

	LD_PRELOAD=/usr/lib/libsdp.so ftp

That's instead of exporting the LD_PRELOAD.
Makes the libsdp.conf file much simpler and works for the simple
things I do.

hth,
grant

From bugzilla-daemon at openib.org Tue May 16 04:44:28 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 16 May 2006 04:44:28 -0700 (PDT)
Subject: [openib-general] [Bug 33] OFED: Ping fails on ib1 interface - IBED - RC3
Message-ID: <20060516114428.5D8CA228555@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=33

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org      |vlad at mellanox.co.il
            Summary|Ping fails on ib1 interface |OFED: Ping fails on ib1
                   |- IBED - RC3                |interface - IBED - RC3

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From ogerlitz at voltaire.com Tue May 16 04:57:02 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 16 May 2006 14:57:02 +0300
Subject: [openib-general] Re: [PATCH] slab: Fix kmem_cache_destroy() on NUMA
In-Reply-To:
References:
Message-ID: <4469BE0E.9080205@voltaire.com>

Roland Dreier wrote:
> With CONFIG_NUMA set, kmem_cache_destroy() may fail and say "Can't
> free all objects." The problem is caused by sequences such as the
> following (suppose we are on a NUMA machine with two nodes, 0 and 1):
>
> * Allocate an object from cache on node 0.
> * Free the object on node 1. The object is put into node 1's alien
>   array_cache for node 0.
> * Call kmem_cache_destroy(), which ultimately ends up in __cache_shrink().
> * __cache_shrink() does drain_cpu_caches(), which loops through all nodes.
>   For each node it drains the shared array_cache and then handles the
>   alien array_cache for the other node.
>
> However this means that node 0's shared array_cache will be drained,
> and then node 1 will move the contents of its alien[0] array_cache
> into that same shared array_cache. node 0's shared array_cache is
> never looked at again, so the objects left there will appear to be in
> use when __cache_shrink() calls __node_shrink() for node 0. So
> __node_shrink() will return 1 and kmem_cache_destroy() will fail.
>
> This patch fixes this by having drain_cpu_caches() do
> drain_alien_cache() on every node before it does drain_array() on the
> nodes' shared array_caches.
>
> The problem was originally reported by Or Gerlitz .
>
> Cc: Christoph Lameter
> Cc: Pekka Enberg

OK, indeed I have CONFIG_NUMA and yes, the patch fixes my problem,
thanks a lot for working on that!

Or.

From bugzilla-daemon at openib.org Tue May 16 05:14:27 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 16 May 2006 05:14:27 -0700 (PDT)
Subject: [openib-general] [Bug 28] ipoib_mcast_sendonly_join_complete oops
Message-ID: <20060516121427.7D0E0228547@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=28

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Additional Comments From tziporet at mellanox.co.il  2006-05-16 05:14 -------
Fixed by Eli

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at openib.org Tue May 16 05:29:35 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 16 May 2006 05:29:35 -0700 (PDT)
Subject: [openib-general] [Bug 65] ib_ipoib refuses to unload when alias exists in modprobe.conf
Message-ID: <20060516122935.1B216228547@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=65

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at openib.org Tue May 16 05:30:41 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 16 May 2006 05:30:41 -0700 (PDT)
Subject: [openib-general] [Bug 31] ifconfig up/down while ssh connection alive cause oops
Message-ID: <20060516123041.6F7F4228547@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=31

tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Additional Comments From tziporet at mellanox.co.il 2006-05-16 05:30 -------
Resolved in RC4

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at openib.org Tue May 16 06:07:46 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue, 16 May 2006 06:07:46 -0700 (PDT)
Subject: [openib-general] [Bug 31] ifconfig up/down while ssh connection alive cause oops
Message-ID: <20060516130746.DF63A228547@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=31

amitk at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mst at mellanox.co.il Tue May 16 06:02:41 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 16 May 2006 16:02:41 +0300
Subject: [openib-general] Re: [PATCH] RE: compliancy issue?
In-Reply-To:
References: <20060508085301.GD20207@mellanox.co.il>
Message-ID: <20060516130241.GQ30211@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: [PATCH] RE: compliancy issue?
>
> >CA4-24.2.3: The connecting peer shall terminate the connection attempt
> >if ExtMaxAdverts of the HAH is set to zero.
> >
> >This means that SDP must examine the HAH before RTU is sent.
> >But, CMA currently sends RTU from cma_rep_recv, before notifying
> >the user.
>
> Can you try this simple patch and see if it fixes your problem? You will
> need to call rdma_accept() or rdma_reject() after receiving a CONNECT_RESPONSE
> event. The conn_param to rdma_accept() should be NULL.

OK, I just tested and this works for me. Here's the SDP patch to do what you described.

The code actually got cleaner now: it's convenient to get different events on the active versus the passive side. Previously I had to check a flag to figure out what ESTABLISHED means.

I still think it makes sense to do this for all ULPs and not just SDP, but oh well.

Signed-off-by: Michael S.
Tsirkin Index: linux-2.6.16/drivers/infiniband/ulp/sdp/sdp_cma.c =================================================================== --- linux-2.6.16.orig/drivers/infiniband/ulp/sdp/sdp_cma.c 2006-05-16 15:22:00.000000000 +0300 +++ linux-2.6.16/drivers/infiniband/ulp/sdp/sdp_cma.c 2006-05-16 15:25:03.000000000 +0300 @@ -237,9 +237,9 @@ int sdp_connect_handler(struct sock *sk, return 0; } -int sdp_connected_handler(struct sock *sk, struct rdma_cm_event *event) +static int sdp_response_handler(struct sock *sk, struct rdma_cm_event *event) { - struct sock *parent; + struct sdp_hah *h; sdp_dbg(sk, "%s\n", __func__); sk->sk_state = TCP_ESTABLISHED; @@ -250,23 +250,37 @@ int sdp_connected_handler(struct sock *s if (sock_flag(sk, SOCK_DEAD)) return 0; + h = event->private_data; + sdp_sk(sk)->bufs = ntohs(h->bsdh.bufs); + sdp_sk(sk)->xmit_size_goal = ntohl(h->actrcvsz) - + sizeof(struct sdp_bsdh); + + sdp_dbg(sk, "%s bufs %d xmit_size_goal %d\n", __func__, + sdp_sk(sk)->bufs, + sdp_sk(sk)->xmit_size_goal); + + ib_req_notify_cq(sdp_sk(sk)->qp->send_cq, IB_CQ_NEXT_COMP); + + sk->sk_state_change(sk); + sk_wake_async(sk, 0, POLL_OUT); + return 0; +} + +int sdp_connected_handler(struct sock *sk, struct rdma_cm_event *event) +{ + struct sock *parent; + sdp_dbg(sk, "%s\n", __func__); + parent = sdp_sk(sk)->parent; - if (!parent) { - struct sdp_hah *h = event->private_data; - sdp_sk(sk)->bufs = ntohs(h->bsdh.bufs); - sdp_sk(sk)->xmit_size_goal = ntohl(h->actrcvsz) - - sizeof(struct sdp_bsdh); - - sdp_dbg(sk, "%s bufs %d xmit_size_goal %d\n", __func__, - sdp_sk(sk)->bufs, - sdp_sk(sk)->xmit_size_goal); + BUG_ON(!parent); - ib_req_notify_cq(sdp_sk(sk)->qp->send_cq, IB_CQ_NEXT_COMP); + sk->sk_state = TCP_ESTABLISHED; + + /* TODO: If SOCK_KEEPOPEN set, need to reset and start + keepalive timer here */ - sk->sk_state_change(sk); - sk_wake_async(sk, 0, POLL_OUT); + if (sock_flag(sk, SOCK_DEAD)) return 0; - } lock_sock(parent); if (sk_acceptq_is_full(parent)) { @@ -292,11 +306,6 @@ void 
sdp_disconnected_handler(struct soc
 	sdp_dbg(sk, "%s\n", __func__);
 }
 
-void sdp_response_handler(struct sock *sk)
-{
-	sdp_dbg(sk, "%s\n", __func__);
-}
-
 int sdp_cma_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 {
 	struct rdma_conn_param conn_param;
@@ -388,7 +397,11 @@ int sdp_cma_handler(struct rdma_cm_id *i
 		break;
 	case RDMA_CM_EVENT_CONNECT_RESPONSE:
 		sdp_dbg(sk, "RDMA_CM_EVENT_CONNECT_RESPONSE\n");
-		sdp_response_handler(sk);
+		rc = sdp_response_handler(sk, event);
+		if (rc)
+			rdma_reject(id, NULL, 0);
+		else
+			rc = rdma_accept(id, NULL);
 		break;
 	case RDMA_CM_EVENT_CONNECT_ERROR:
 		sdp_dbg(sk, "RDMA_CM_EVENT_CONNECT_ERROR\n");

-- 
MST

From swise at opengridcomputing.com Tue May 16 06:47:35 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 16 May 2006 08:47:35 -0500
Subject: [openib-general] RDMA enabled NICs- newbie
In-Reply-To: <4468B6E6.2030201@atipa.com>
References: <4468AF33.5010205@ichips.intel.com> <4468B6E6.2030201@atipa.com>
Message-ID: <1147787255.25266.12.camel@stevo-desktop>

As has been said already, there are two rnics running with the Open Fabrics stack, the Ammasso 1100 and the Chelsio CXGB3 rnics. Tom Tucker and I are the maintainers of this code base (the iwarp branch).

I hear NetEffect also has a 10Gb iWARP NIC. As far as I know, they don't have any support for the Open Fabrics iwarp branch yet.

Steve.

On Mon, 2006-05-15 at 12:14 -0500, Roger Heflin wrote:
> Ian Brown wrote:
> > Thanks all.
> > I indeed found that
> > http://www.ammasso.com/ responds with
> > "There is no website configured at this address."
> > while
> > http://www.chelsio.com/
> > does exist.
> >
> > Is there a reason why manufacturers will refrain from
> > producing RDMA ? (I mean , are there better technologies
> > which are a substitute for RDMA for ethernet ?)
> > Regards,
> > IB
> >
> I kind of think that the market is too small to support
> a company making a card that is at best just slightly cheaper
> than things like Infiniband, and Myrinet, and is actually
> slower than the Infiniband and Myrinet.
>
> Consider how many cards one has to sell to pay a single
> engineer's salary when you are at best making $100-$150 a
> card over production costs. The numbers don't look that
> good to me, and consider that previous to Ammasso and Chelsio
> there has been a long string of companies producing accelerated
> niche network cards of various types (going back as far as the
> early 90's), and all of them have failed to get enough
> market share to stay in business. About the only thing
> that makes one of these companies viable is being bought
> out by someone large enough to support the needed funding.
>
> Level 5 is making accelerated ethernet cards; I believe most
> of the acceleration is in software in some manner (kernel bypass),
> and I don't know if their card could be made to do rdma.
>
> Roger

From swise at opengridcomputing.com Tue May 16 06:59:39 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 16 May 2006 08:59:39 -0500
Subject: [openib-general] PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM
Message-ID: <1147787979.25266.23.camel@stevo-desktop>

I don't know who maintains src/userspace/perftest, but here is a patch set that enables rdma_bw and rdma_lat to use the RDMA_CM with the addition of the -c or --cma flag. The rkey/addr info is exchanged in the private data, and SEND/RECVs are used to sync the client/server before and after execution.
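
As a rough illustration of the private-data exchange described above (a standalone sketch, not code from the patch): each side copies its rkey and buffer address into a small struct, hands it to the CM as private data, and the peer copies it out after checking the length. The struct layout and the helper names pack_dest() and unpack_dest() are assumptions made for this example; only the length check mirrors what the patch does with event->private_data_len.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: the diff only shows the field names (qpn, psn, rkey,
 * vaddr), not the struct definition in rdma_lat.c / rdma_bw.c. */
struct pingpong_dest {
	int qpn;
	int psn;
	unsigned rkey;
	unsigned long long vaddr;
};

/* Hypothetical helper: copy the local rkey/addr info into the buffer that
 * would be passed as private data. Returns the number of bytes written,
 * or 0 if the buffer is too small. */
size_t pack_dest(const struct pingpong_dest *my_dest, void *buf, size_t buf_len)
{
	assert(my_dest != NULL && buf != NULL);
	if (buf_len < sizeof(*my_dest))
		return 0;
	memcpy(buf, my_dest, sizeof(*my_dest));
	return sizeof(*my_dest);
}

/* Hypothetical helper: validate and copy out the peer's info, using the
 * same "pointer present and length large enough" test as the patch.
 * Returns 0 on success, -1 on missing or short private data. */
int unpack_dest(const void *private_data, size_t private_data_len,
		struct pingpong_dest *rem_dest)
{
	if (!private_data || private_data_len < sizeof(*rem_dest))
		return -1;
	memcpy(rem_dest, private_data, sizeof(*rem_dest));
	return 0;
}
```

In the tools themselves the sending side would correspond to conn_param.private_data and the receiving side to event->private_data. Copying a raw struct like this only works when both hosts agree on endianness and struct layout.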
Also, I added -P or --poll to rdma_bw to allow blocking for completion events when none are ready (if you omit -P, it will block when no completion is available, otherwise it will spin). Signed-off-by: Steve Wise Index: rdma_lat.c =================================================================== --- rdma_lat.c (revision 7050) +++ rdma_lat.c (working copy) @@ -53,6 +53,7 @@ #include #include +#include #include "get_clock.h" @@ -71,7 +72,8 @@ struct ibv_context *context; struct ibv_pd *pd; struct ibv_mr *mr; - struct ibv_cq *cq; + struct ibv_cq *scq; + struct ibv_cq *rcq; struct ibv_qp *qp; void *buf; volatile char *post_buf; @@ -80,6 +82,7 @@ int tx_depth; struct ibv_sge list; struct ibv_send_wr wr; + struct rdma_cm_id *cm_id; }; struct pingpong_dest { @@ -323,16 +326,22 @@ return NULL; } - ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); - if (!ctx->cq) { + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); + if (!ctx->rcq) { fprintf(stderr, "Couldn't create CQ\n"); return NULL; } + ctx->scq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); + if (!ctx->scq) { + fprintf(stderr, "Couldn't create CQ\n"); + return NULL; + } + { struct ibv_qp_init_attr attr = { - .send_cq = ctx->cq, - .recv_cq = ctx->cq, + .send_cq = ctx->scq, + .recv_cq = ctx->rcq, .cap = { .max_send_wr = tx_depth, /* Work around: driver doesnt support @@ -370,13 +379,6 @@ } } - ctx->wr.wr_id = PINGPONG_RDMA_WRID; - ctx->wr.sg_list = &ctx->list; - ctx->wr.num_sge = 1; - ctx->wr.opcode = IBV_WR_RDMA_WRITE; - ctx->wr.send_flags = IBV_SEND_SIGNALED | IBV_SEND_INLINE; - ctx->wr.next = NULL; - return ctx; } @@ -489,6 +491,467 @@ return 0; } +/* CMA STUFF */ + +static void pp_post_recv(struct pingpong_context *ctx) +{ + struct ibv_sge list; + struct ibv_recv_wr wr, *bad_wr; + int rc; + + list.addr = (uintptr_t) ctx->buf; + list.length = 1; + list.lkey = ctx->mr->lkey; + wr.next = NULL; + wr.wr_id = 0xdeadbeef; + wr.sg_list = &list; + wr.num_sge = 1; + + rc = 
ibv_post_recv(ctx->qp, &wr, &bad_wr); + if (rc) { + perror("ibv_post_recv"); + fprintf(stderr, "%s ibv_post_recv failed %d\n", __FUNCTION__, rc); + } +} + +static struct pingpong_context *pp_init_cma_ctx(struct rdma_cm_id *cm_id, + unsigned size, + int tx_depth, int port) +{ + struct pingpong_context *ctx; + + ctx = malloc(sizeof *ctx); + if (!ctx) + return NULL; + + ctx->size = size; + ctx->tx_depth = tx_depth; + + ctx->buf = memalign(page_size, size * 2); + if (!ctx->buf) { + fprintf(stderr, "Couldn't allocate work buf.\n"); + return NULL; + } + + memset(ctx->buf, 0, size * 2); + + ctx->post_buf = (char*)ctx->buf + (size - 1); + ctx->poll_buf = (char*)ctx->buf + (2 * size - 1); + + ctx->cm_id = cm_id; + ctx->context = cm_id->verbs; + if (!ctx->context) { + fprintf(stderr, "%s Unbound cm_id!!\n", __FUNCTION__); + return NULL; + } + + ctx->pd = ibv_alloc_pd(ctx->context); + if (!ctx->pd) { + fprintf(stderr, "Couldn't allocate PD\n"); + return NULL; + } + + /* We dont really want IBV_ACCESS_LOCAL_WRITE, but IB spec says: + * The Consumer is not allowed to assign Remote Write or Remote Atomic to + * a Memory Region that has not been assigned Local Write. 
*/ + ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size * 2, + IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_LOCAL_WRITE); + if (!ctx->mr) { + fprintf(stderr, "Couldn't allocate MR\n"); + return NULL; + } + + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); + if (!ctx->rcq) { + fprintf(stderr, "Couldn't create RCQ\n"); + return NULL; + } + + ctx->scq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); + if (!ctx->scq) { + fprintf(stderr, "Couldn't create SCQ\n"); + return NULL; + } + + { + struct ibv_qp_init_attr attr = { + .send_cq = ctx->scq, + .recv_cq = ctx->rcq, + .cap = { + .max_send_wr = tx_depth, + .max_recv_wr = 1, + .max_send_sge = 1, + .max_recv_sge = 1, + .max_inline_data = size + }, + .qp_type = IBV_QPT_RC + }; + + if (rdma_create_qp(ctx->cm_id, ctx->pd, &attr)) { + fprintf(stderr, "Couldn't create QP\n"); + return NULL; + } + ctx->qp = ctx->cm_id->qp; + } + + pp_post_recv(ctx); + + return ctx; +} + +static void pp_close_cma(struct pingpong_context *ctx, const char *servername) +{ + struct rdma_cm_event *event; + int rc; + + if (servername) { + rc = rdma_disconnect(ctx->cm_id); + if (rc) { + perror("rdma_disconnect"); + return; + } + } + + rdma_get_cm_event(&event); + if (event->event != RDMA_CM_EVENT_DISCONNECTED) + printf("unexpected event during disconnect %d\n", event->event); + rdma_ack_cm_event(event); + rdma_destroy_id(ctx->cm_id); +} + +static struct pingpong_context *pp_server_connect_cma(unsigned short port, int size, int tx_depth, + struct pingpong_dest *my_dest, + struct pingpong_dest *rem_dest) +{ + struct rdma_cm_id *listen_id; + struct rdma_cm_event *event; + struct rdma_conn_param conn_param; + int ret; + struct sockaddr_in sin; + struct rdma_cm_id *child_cm_id; + struct pingpong_context *ctx; + + printf("%s starting server\n", __FUNCTION__); + ret = rdma_create_id(&listen_id, NULL); + if (ret) { + fprintf(stderr, "%s rdma_create_id failed %d\n", __FUNCTION__, ret); + return NULL; + } + + sin.sin_addr.s_addr = 0; + sin.sin_family = 
PF_INET; + sin.sin_port = htons(port); + ret = rdma_bind_addr(listen_id, (struct sockaddr *)&sin); + if (ret) { + fprintf(stderr,"%s rdma_bind_addr failed %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_listen(listen_id, 0); + if (ret) { + fprintf(stderr,"%s rdma_listen failed %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_get_cm_event(&event); + if (ret) + goto err2; + + if (event->event != RDMA_CM_EVENT_CONNECT_REQUEST) { + fprintf(stderr,"%s bad event waiting for connect request %d\n", __FUNCTION__, + event->event); + goto err1; + } + + if (!event->private_data || (event->private_data_len < sizeof(*rem_dest))) { + ret = 1; + fprintf(stderr,"%s bad private data len %d\n", __FUNCTION__, + event->private_data_len); + goto err1; + } + + memcpy(rem_dest, event->private_data, sizeof(*rem_dest)); + child_cm_id = (struct rdma_cm_id *)event->id; + ctx = pp_init_cma_ctx(child_cm_id, size, tx_depth, port); + + if (!ctx) { + fprintf(stderr,"%s pp_init_cma_ctx failed\n", __FUNCTION__); + goto err0; + } + + my_dest->qpn = 0; + my_dest->psn = 0xbb; + my_dest->rkey = ctx->mr->rkey; + my_dest->vaddr = (uintptr_t)ctx->buf + ctx->size; + memset(&conn_param, 0, sizeof conn_param); + conn_param.responder_resources = 1; + conn_param.initiator_depth = 1; + conn_param.private_data = my_dest; + conn_param.private_data_len = sizeof(*my_dest); + ret = rdma_accept(ctx->cm_id, &conn_param); + if (ret) { + fprintf(stderr,"%s rdma_accept failed %d\n", __FUNCTION__, ret); + goto err0; + } + rdma_ack_cm_event(event); + ret = rdma_get_cm_event(&event); + if (ret) { + fprintf(stderr,"rdma_get_cm_event error %d\n", ret); + rdma_destroy_id(child_cm_id); + goto err2; + } + if (event->event != RDMA_CM_EVENT_ESTABLISHED) { + fprintf(stderr,"%s bad event waiting for established %d\n", __FUNCTION__, + event->event); + goto err0; + } + rdma_ack_cm_event(event); + fprintf(stderr,"%s connected!\n", __FUNCTION__); + return ctx; +err0: + rdma_destroy_id(child_cm_id); +err1: + 
rdma_ack_cm_event(event); +err2: + rdma_destroy_id(listen_id); + fprintf(stderr,"%s NOT connected!\n", __FUNCTION__); + return NULL; +} + +static unsigned get_dst_addr(const char *dst) +{ + struct addrinfo *res; + int ret; + unsigned addr; + + ret = getaddrinfo(dst, NULL, NULL, &res); + if (ret) { + fprintf(stderr, "%s getaddrinfo failed - invalid hostname or IP address\n", + __FUNCTION__); + return 0; + } + + if (res->ai_family != PF_INET) { + return 0; + } + + addr = ((struct sockaddr_in*)res->ai_addr)->sin_addr.s_addr; + freeaddrinfo(res); + return addr; +} + +static struct pingpong_context *pp_client_connect_cma(const char *servername, + unsigned short port, int size, int tx_depth, + struct pingpong_dest *my_dest, + struct pingpong_dest *rem_dest) +{ + struct rdma_cm_event *event; + struct rdma_conn_param conn_param; + int ret; + struct sockaddr_in sin; + struct rdma_cm_id *cm_id; + struct pingpong_context *ctx; + + fprintf(stderr,"%s starting client\n", __FUNCTION__); + sin.sin_addr.s_addr = get_dst_addr(servername); + if (!sin.sin_addr.s_addr) { + return NULL; + } + + ret = rdma_create_id(&cm_id, NULL); + if (ret) { + fprintf(stderr,"%s rdma_create_id failed %d\n", __FUNCTION__, ret); + return NULL; + } + + sin.sin_family = PF_INET; + sin.sin_port = htons(port); + ret = rdma_resolve_addr(cm_id, NULL, (struct sockaddr *)&sin, 2000); + if (ret) { + fprintf(stderr,"%s rdma_resolve_addr failed %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_get_cm_event(&event); + if (ret) + goto err2; + + if (event->event != RDMA_CM_EVENT_ADDR_RESOLVED) { + fprintf(stderr,"%s/%d unexpected CM event %d\n", __FUNCTION__, __LINE__, + event->event); + goto err1; + } + rdma_ack_cm_event(event); + + ret = rdma_resolve_route(cm_id, 2000); + if (ret) { + fprintf(stderr,"%s rdma_resolve_route failed %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_get_cm_event(&event); + if (ret) + goto err2; + + if (event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) { + 
fprintf(stderr,"%s/%d unexpected CM event %d\n", __FUNCTION__, __LINE__, + event->event); + goto err1; + } + rdma_ack_cm_event(event); + + ctx = pp_init_cma_ctx(cm_id, size, tx_depth, port); + + if (!ctx) { + fprintf(stderr,"%s pp_init_cma_ctx failed\n", __FUNCTION__); + goto err2; + } + + my_dest->qpn = 0; + my_dest->psn = 0xaa; + my_dest->rkey = ctx->mr->rkey; + my_dest->vaddr = (uintptr_t)ctx->buf + ctx->size; + memset(&conn_param, 0, sizeof conn_param); + conn_param.responder_resources = 1; + conn_param.initiator_depth = 1; + conn_param.retry_count = 5; + conn_param.private_data = my_dest; + conn_param.private_data_len = sizeof(*my_dest); + ret = rdma_connect(ctx->cm_id, &conn_param); + if (ret) { + fprintf(stderr,"%s rdma_connect failure %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_get_cm_event(&event); + if (ret) + goto err2; + + if (event->event != RDMA_CM_EVENT_ESTABLISHED) { + fprintf(stderr,"%s/%d unexpected CM event %d\n", __FUNCTION__, __LINE__, + event->event); + goto err1; + } + if (!event->private_data || (event->private_data_len < sizeof(*rem_dest))) { + fprintf(stderr,"%s bad private data ptr %p len %d\n", __FUNCTION__, + event->private_data, event->private_data_len); + goto err1; + } + + memcpy(rem_dest, event->private_data, sizeof(*rem_dest)); + rdma_ack_cm_event(event); + fprintf(stderr,"connected!\n"); + return ctx; +err1: + rdma_ack_cm_event(event); +err2: + fprintf(stderr,"NOT connected!\n"); + rdma_destroy_id(cm_id); + return NULL; +} + +static void pp_wait_for_done(struct pingpong_context *ctx) +{ + struct ibv_wc wc; + int ne; + + do { + usleep(1000); + ne = ibv_poll_cq(ctx->rcq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%s bad wc status %d\n", __FUNCTION__, wc.status); + if (!(wc.opcode & IBV_WC_RECV)) + fprintf(stderr, "%s bad wc opcode %d\n", __FUNCTION__, wc.opcode); + if (wc.wr_id != 0xdeadbeef) + fprintf(stderr, "%s bad wc wr_id 0x%x\n", __FUNCTION__, (int)wc.wr_id); +} + +static void 
pp_send_done(struct pingpong_context *ctx) +{ + struct ibv_send_wr *bad_wr; + int rc; + struct ibv_wc wc; + int ne; + + ctx->list.addr = (uintptr_t) ctx->buf; + ctx->list.length = 1; + ctx->list.lkey = ctx->mr->lkey; + ctx->wr.wr_id = 0xcafebabe; + ctx->wr.sg_list = &ctx->list; + ctx->wr.num_sge = 1; + ctx->wr.opcode = IBV_WR_SEND; + ctx->wr.send_flags = IBV_SEND_SIGNALED; + ctx->wr.next = NULL; + rc = ibv_post_send(ctx->qp, &ctx->wr, &bad_wr); + if (rc != 0) + fprintf(stderr, "%s ibv_post_send failed %d!\n", __FUNCTION__, rc); + do { + usleep(1000); + ne = ibv_poll_cq(ctx->scq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%s bad wc status %d\n", __FUNCTION__, wc.status); + if (wc.opcode != IBV_WC_SEND) + fprintf(stderr, "%s bad wc opcode %d\n", __FUNCTION__, wc.opcode); + if (wc.wr_id != 0xcafebabe) + fprintf(stderr, "%s bad wc wr_id 0x%x\n", __FUNCTION__, (int)wc.wr_id); + sleep(1); +} + +static void pp_wait_for_start(struct pingpong_context *ctx) +{ + struct ibv_wc wc; + int ne; + + do { + usleep(1000); + ne = ibv_poll_cq(ctx->rcq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%s bad wc status %d\n", __FUNCTION__, wc.status); + if (!(wc.opcode & IBV_WC_RECV)) + fprintf(stderr, "%s bad wc opcode %d\n", __FUNCTION__, wc.opcode); + if (wc.wr_id != 0xdeadbeef) + fprintf(stderr, "%s bad wc wr_id 0x%x\n", __FUNCTION__, (int)wc.wr_id); + pp_post_recv(ctx); +} + +static void pp_send_start(struct pingpong_context *ctx) +{ + struct ibv_send_wr *bad_wr; + int rc; + struct ibv_wc wc; + int ne; + + ctx->list.addr = (uintptr_t) ctx->buf; + ctx->list.length = 1; + ctx->list.lkey = ctx->mr->lkey; + ctx->wr.wr_id = 0xabbaabba; + ctx->wr.sg_list = &ctx->list; + ctx->wr.num_sge = 1; + ctx->wr.opcode = IBV_WR_SEND; + ctx->wr.send_flags = IBV_SEND_SIGNALED; + ctx->wr.next = NULL; + rc = ibv_post_send(ctx->qp, &ctx->wr, &bad_wr); + if (rc != 0) + fprintf(stderr, "%s ibv_post_send failed %d!\n", __FUNCTION__, rc); + do { + usleep(1000); + 
ne = ibv_poll_cq(ctx->scq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%s bad wc status %d\n", __FUNCTION__, wc.status); + if (wc.opcode != IBV_WC_SEND) + fprintf(stderr, "%s bad wc opcode %d\n", __FUNCTION__, wc.opcode); + if (wc.wr_id != 0xabbaabba) + fprintf(stderr, "%s bad wc wr_id 0x%x\n", __FUNCTION__, (int)wc.wr_id); +} + static void usage(const char *argv0) { printf("Usage:\n"); @@ -505,6 +968,7 @@ printf(" -C, --report-cycles report times in cpu cycle units (default microseconds)\n"); printf(" -H, --report-histogram print out all results (default print summary only)\n"); printf(" -U, --report-unsorted (implies -H) print out unsorted results (default sorted)\n"); + printf(" -c, --cma Use the RDMA CMA to setup the RDMA connection\n"); } /* @@ -599,6 +1063,7 @@ struct ibv_send_wr *wr; volatile char *poll_buf; volatile char *post_buf; + int use_cma=0; int scnt, rcnt, ccnt; @@ -618,14 +1083,19 @@ { .name = "report-cycles", .has_arg = 0, .val = 'C' }, { .name = "report-histogram",.has_arg = 0, .val = 'H' }, { .name = "report-unsorted",.has_arg = 0, .val = 'U' }, + { .name = "cma", .has_arg = 0, .val = 'c' }, { 0 } }; - c = getopt_long(argc, argv, "p:d:i:s:n:t:CHU", long_options, NULL); + c = getopt_long(argc, argv, "cp:d:i:s:n:t:CHU", long_options, NULL); if (c == -1) break; switch (c) { + case 'c': + use_cma = 1; + break; + case 'p': port = strtol(optarg, NULL, 0); if (port < 0 || port > 65535) { @@ -697,23 +1167,62 @@ srand48(getpid() * time(NULL)); page_size = sysconf(_SC_PAGESIZE); - ib_dev = pp_find_dev(ib_devname); - if (!ib_dev) - return 7; + if (use_cma) { + int rc; + struct pingpong_dest my_dest; + + memset(&rem_dest, 0, sizeof(rem_dest)); - ctx = pp_init_ctx(ib_dev, size, tx_depth, ib_port); - if (!ctx) - return 8; + if (servername) + ctx = pp_client_connect_cma(servername, port, size, tx_depth, &my_dest, + &rem_dest); + else + ctx = pp_server_connect_cma(port, size, tx_depth, &my_dest, &rem_dest); + if (!ctx) { + 
fprintf(stderr, "pp_connect_cma(%s,%d) failed!\n", servername, port); + return rc; + } - if (pp_open_port(ctx, servername, ib_port, port, &rem_dest)) - return 9; + /* + * Synch up and force the server to wait for the client to send + * the first message (MPA requirement). + */ + if (servername) { + sleep(1); + pp_send_start(ctx); + } else { + pp_wait_for_start(ctx); + } + printf(" local address: PSN %#06x RKey %#08x VAddr %#016Lx\n", + my_dest.psn, my_dest.rkey, my_dest.vaddr); + printf(" remote address: PSN %#06x RKey %#08x VAddr %#016Lx\n", + rem_dest.psn, rem_dest.rkey, rem_dest.vaddr); + } else { + ib_dev = pp_find_dev(ib_devname); + if (!ib_dev) + return 7; + + ctx = pp_init_ctx(ib_dev, size, tx_depth, ib_port); + if (!ctx) + return 8; + + if (pp_open_port(ctx, servername, ib_port, port, &rem_dest)) + return 9; + } + wr = &ctx->wr; ctx->list.addr = (uintptr_t) ctx->buf; ctx->list.length = ctx->size; ctx->list.lkey = ctx->mr->lkey; wr->wr.rdma.remote_addr = rem_dest.vaddr; wr->wr.rdma.rkey = rem_dest.rkey; + wr->wr_id = PINGPONG_RDMA_WRID; + wr->sg_list = &ctx->list; + wr->num_sge = 1; + wr->opcode = IBV_WR_RDMA_WRITE; + wr->send_flags = IBV_SEND_SIGNALED | IBV_SEND_INLINE; + wr->next = NULL; scnt = 0; rcnt = 0; @@ -757,9 +1266,10 @@ if (ccnt < iters) { struct ibv_wc wc; int ne; + ++ccnt; do { - ne = ibv_poll_cq(ctx->cq, 1, &wc); + ne = ibv_poll_cq(ctx->scq, 1, &wc); } while (ne == 0); if (ne < 0) { @@ -778,6 +1288,12 @@ } } + if (use_cma) { + pp_send_done(ctx); + pp_wait_for_done(ctx); + pp_close_cma(ctx, servername); + } + print_report(&report, iters, tstamp); return 0; } Index: rdma_bw.c =================================================================== --- rdma_bw.c (revision 7050) +++ rdma_bw.c (working copy) @@ -53,6 +53,7 @@ #include #include +#include #include "get_clock.h" @@ -64,13 +65,16 @@ struct ibv_context *context; struct ibv_pd *pd; struct ibv_mr *mr; - struct ibv_cq *cq; + struct ibv_cq *rcq; + struct ibv_cq *scq; + struct ibv_comp_channel 
*ch; struct ibv_qp *qp; void *buf; unsigned size; int tx_depth; struct ibv_sge list; struct ibv_send_wr wr; + struct rdma_cm_id *cm_id; }; struct pingpong_dest { @@ -308,16 +312,27 @@ return NULL; } - ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); - if (!ctx->cq) { + ctx->ch = ibv_create_comp_channel(ctx->context); + if (!ctx->ch) { + fprintf(stderr, "Couldn't create comp channel\n"); + return NULL; + } + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); + if (!ctx->rcq) { fprintf(stderr, "Couldn't create CQ\n"); return NULL; } + ctx->scq = ibv_create_cq(ctx->context, tx_depth, ctx, ctx->ch, 0); + if (!ctx->scq) { + fprintf(stderr, "Couldn't create CQ\n"); + return NULL; + } + { struct ibv_qp_init_attr attr = { - .send_cq = ctx->cq, - .recv_cq = ctx->cq, + .send_cq = ctx->scq, + .recv_cq = ctx->rcq, .cap = { .max_send_wr = tx_depth, /* Work around: driver doesnt support @@ -407,6 +422,469 @@ return 0; } +/* CMA STUFF */ + +static void pp_post_recv(struct pingpong_context *ctx) +{ + struct ibv_sge list; + struct ibv_recv_wr wr, *bad_wr; + int rc; + + list.addr = (uintptr_t) ctx->buf; + list.length = 1; + list.lkey = ctx->mr->lkey; + wr.next = NULL; + wr.wr_id = 0xdeadbeef; + wr.sg_list = &list; + wr.num_sge = 1; + + rc = ibv_post_recv(ctx->qp, &wr, &bad_wr); + if (rc) { + perror("ibv_post_recv"); + fprintf(stderr, "%s ibv_post_recv failed %d\n", __FUNCTION__, rc); + } +} + +static struct pingpong_context *pp_init_cma_ctx(struct rdma_cm_id *cm_id, + unsigned size, + int tx_depth, int port) +{ + struct pingpong_context *ctx; + + ctx = malloc(sizeof *ctx); + if (!ctx) + return NULL; + + ctx->size = size; + ctx->tx_depth = tx_depth; + + ctx->buf = memalign(page_size, size * 2); + if (!ctx->buf) { + fprintf(stderr, "Couldn't allocate work buf.\n"); + return NULL; + } + + memset(ctx->buf, 0, size * 2); + + ctx->cm_id = cm_id; + ctx->context = cm_id->verbs; + if (!ctx->context) { + fprintf(stderr, "%s Unbound cm_id!!\n", __FUNCTION__); + return 
NULL; + } + + ctx->pd = ibv_alloc_pd(ctx->context); + if (!ctx->pd) { + fprintf(stderr, "Couldn't allocate PD\n"); + return NULL; + } + + /* We dont really want IBV_ACCESS_LOCAL_WRITE, but IB spec says: + * The Consumer is not allowed to assign Remote Write or Remote Atomic to + * a Memory Region that has not been assigned Local Write. */ + ctx->mr = ibv_reg_mr(ctx->pd, ctx->buf, size * 2, + IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_LOCAL_WRITE); + if (!ctx->mr) { + fprintf(stderr, "Couldn't allocate MR\n"); + return NULL; + } + + ctx->ch = ibv_create_comp_channel(ctx->context); + if (!ctx->ch) { + fprintf(stderr, "Couldn't create comp channel\n"); + return NULL; + } + + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); + if (!ctx->rcq) { + fprintf(stderr, "Couldn't create RCQ\n"); + return NULL; + } + + ctx->scq = ibv_create_cq(ctx->context, tx_depth, ctx, ctx->ch, 0); + if (!ctx->scq) { + fprintf(stderr, "Couldn't create SCQ\n"); + return NULL; + } + + { + struct ibv_qp_init_attr attr = { + .send_cq = ctx->scq, + .recv_cq = ctx->rcq, + .cap = { + .max_send_wr = tx_depth, + .max_recv_wr = 1, + .max_send_sge = 1, + .max_recv_sge = 1, + .max_inline_data = 0 + }, + .qp_type = IBV_QPT_RC + }; + if (rdma_create_qp(ctx->cm_id, ctx->pd, &attr)) { + fprintf(stderr, "Couldn't create QP\n"); + return NULL; + } + ctx->qp = ctx->cm_id->qp; + } + + pp_post_recv(ctx); + + return ctx; +} + +static void pp_close_cma(struct pingpong_context *ctx, const char *servername) +{ + struct rdma_cm_event *event; + int rc; + + if (servername) { + rc = rdma_disconnect(ctx->cm_id); + if (rc) { + perror("rdma_disconnect"); + return; + } + } + + rdma_get_cm_event(&event); + if (event->event != RDMA_CM_EVENT_DISCONNECTED) + printf("unexpected event during disconnect %d\n", event->event); + rdma_ack_cm_event(event); + rdma_destroy_id(ctx->cm_id); +} + +static struct pingpong_context *pp_server_connect_cma(unsigned short port, int size, int tx_depth, + struct pingpong_dest *my_dest, + struct 
pingpong_dest *rem_dest) +{ + struct rdma_cm_id *listen_id; + struct rdma_cm_event *event; + struct rdma_conn_param conn_param; + int ret; + struct sockaddr_in sin; + struct rdma_cm_id *child_cm_id; + struct pingpong_context *ctx; + + printf("%s starting server\n", __FUNCTION__); + ret = rdma_create_id(&listen_id, NULL); + if (ret) { + fprintf(stderr, "%s rdma_create_id failed %d\n", __FUNCTION__, ret); + return NULL; + } + + sin.sin_addr.s_addr = 0; + sin.sin_family = PF_INET; + sin.sin_port = htons(port); + ret = rdma_bind_addr(listen_id, (struct sockaddr *)&sin); + if (ret) { + fprintf(stderr,"%s rdma_bind_addr failed %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_listen(listen_id, 0); + if (ret) { + fprintf(stderr,"%s rdma_listen failed %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_get_cm_event(&event); + if (ret) + goto err2; + + if (event->event != RDMA_CM_EVENT_CONNECT_REQUEST) { + fprintf(stderr,"%s bad event waiting for connect request %d\n", __FUNCTION__, + event->event); + goto err1; + } + + if (!event->private_data || (event->private_data_len < sizeof(*rem_dest))) { + ret = 1; + fprintf(stderr,"%s bad private data len %d\n", __FUNCTION__, + event->private_data_len); + goto err1; + } + + memcpy(rem_dest, event->private_data, sizeof(*rem_dest)); + child_cm_id = (struct rdma_cm_id *)event->id; + ctx = pp_init_cma_ctx(child_cm_id, size, tx_depth, port); + + if (!ctx) { + fprintf(stderr,"%s pp_init_cma_ctx failed\n", __FUNCTION__); + goto err0; + } + + my_dest->qpn = 0; + my_dest->psn = 0xbb; + my_dest->rkey = ctx->mr->rkey; + my_dest->vaddr = (uintptr_t)ctx->buf + ctx->size; + memset(&conn_param, 0, sizeof conn_param); + conn_param.responder_resources = 1; + conn_param.initiator_depth = 1; + conn_param.private_data = my_dest; + conn_param.private_data_len = sizeof(*my_dest); + ret = rdma_accept(ctx->cm_id, &conn_param); + if (ret) { + fprintf(stderr,"%s rdma_accept failed %d\n", __FUNCTION__, ret); + goto err0; + } + 
rdma_ack_cm_event(event); + ret = rdma_get_cm_event(&event); + if (ret) { + fprintf(stderr,"rdma_get_cm_event error %d\n", ret); + rdma_destroy_id(child_cm_id); + goto err2; + } + if (event->event != RDMA_CM_EVENT_ESTABLISHED) { + fprintf(stderr,"%s bad event waiting for established %d\n", __FUNCTION__, + event->event); + goto err0; + } + rdma_ack_cm_event(event); + fprintf(stderr,"%s connected!\n", __FUNCTION__); + return ctx; +err0: + rdma_destroy_id(child_cm_id); +err1: + rdma_ack_cm_event(event); +err2: + rdma_destroy_id(listen_id); + fprintf(stderr,"%s NOT connected!\n", __FUNCTION__); + return NULL; +} + +static unsigned get_dst_addr(const char *dst) +{ + struct addrinfo *res; + int ret; + unsigned addr; + + ret = getaddrinfo(dst, NULL, NULL, &res); + if (ret) { + fprintf(stderr, "%s getaddrinfo failed - invalid hostname or IP address\n", + __FUNCTION__); + return 0; + } + + if (res->ai_family != PF_INET) { + return 0; + } + + addr = ((struct sockaddr_in*)res->ai_addr)->sin_addr.s_addr; + freeaddrinfo(res); + return addr; +} + +static struct pingpong_context *pp_client_connect_cma(const char *servername, + unsigned short port, int size, int tx_depth, + struct pingpong_dest *my_dest, + struct pingpong_dest *rem_dest) +{ + struct rdma_cm_event *event; + struct rdma_conn_param conn_param; + int ret; + struct sockaddr_in sin; + struct rdma_cm_id *cm_id; + struct pingpong_context *ctx; + + fprintf(stderr,"%s starting client\n", __FUNCTION__); + sin.sin_addr.s_addr = get_dst_addr(servername); + if (!sin.sin_addr.s_addr) { + return NULL; + } + + ret = rdma_create_id(&cm_id, NULL); + if (ret) { + fprintf(stderr,"%s rdma_create_id failed %d\n", __FUNCTION__, ret); + return NULL; + } + + sin.sin_family = PF_INET; + sin.sin_port = htons(port); + ret = rdma_resolve_addr(cm_id, NULL, (struct sockaddr *)&sin, 2000); + if (ret) { + fprintf(stderr,"%s rdma_resolve_addr failed %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_get_cm_event(&event); + if (ret) + goto 
err2; + + if (event->event != RDMA_CM_EVENT_ADDR_RESOLVED) { + fprintf(stderr,"%s/%d unexpected CM event %d\n", __FUNCTION__, __LINE__, + event->event); + goto err1; + } + rdma_ack_cm_event(event); + + ret = rdma_resolve_route(cm_id, 2000); + if (ret) { + fprintf(stderr,"%s rdma_resolve_route failed %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_get_cm_event(&event); + if (ret) + goto err2; + + if (event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) { + fprintf(stderr,"%s/%d unexpected CM event %d\n", __FUNCTION__, __LINE__, + event->event); + goto err1; + } + rdma_ack_cm_event(event); + + ctx = pp_init_cma_ctx(cm_id, size, tx_depth, port); + + if (!ctx) { + fprintf(stderr,"%s pp_init_cma_ctx failed\n", __FUNCTION__); + goto err2; + } + + my_dest->qpn = 0; + my_dest->psn = 0xaa; + my_dest->rkey = ctx->mr->rkey; + my_dest->vaddr = (uintptr_t)ctx->buf + ctx->size; + memset(&conn_param, 0, sizeof conn_param); + conn_param.responder_resources = 1; + conn_param.initiator_depth = 1; + conn_param.retry_count = 5; + conn_param.private_data = my_dest; + conn_param.private_data_len = sizeof(*my_dest); + ret = rdma_connect(ctx->cm_id, &conn_param); + if (ret) { + fprintf(stderr,"%s rdma_connect failure %d\n", __FUNCTION__, ret); + goto err2; + } + + ret = rdma_get_cm_event(&event); + if (ret) + goto err2; + + if (event->event != RDMA_CM_EVENT_ESTABLISHED) { + fprintf(stderr,"%s/%d unexpected CM event %d\n", __FUNCTION__, __LINE__, + event->event); + goto err1; + } + if (!event->private_data || (event->private_data_len < sizeof(*rem_dest))) { + fprintf(stderr,"%s bad private data ptr %p len %d\n", __FUNCTION__, + event->private_data, event->private_data_len); + goto err1; + } + memcpy(rem_dest, event->private_data, sizeof(*rem_dest)); + rdma_ack_cm_event(event); + fprintf(stderr,"connected!\n"); + return ctx; +err1: + rdma_ack_cm_event(event); +err2: + fprintf(stderr,"NOT connected!\n"); + rdma_destroy_id(cm_id); + return NULL; +} + +static void pp_wait_for_done(struct 
pingpong_context *ctx) +{ + struct ibv_wc wc; + int ne; + + do { + usleep(1000); + ne = ibv_poll_cq(ctx->rcq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%s bad wc status %d\n", __FUNCTION__, wc.status); + if (!(wc.opcode & IBV_WC_RECV)) + fprintf(stderr, "%s bad wc opcode %d\n", __FUNCTION__, wc.opcode); + if (wc.wr_id != 0xdeadbeef) + fprintf(stderr, "%s bad wc wr_id 0x%x\n", __FUNCTION__, (int)wc.wr_id); +} + +static void pp_send_done(struct pingpong_context *ctx) +{ + struct ibv_send_wr *bad_wr; + int rc; + struct ibv_wc wc; + int ne; + + ctx->list.addr = (uintptr_t) ctx->buf; + ctx->list.length = 1; + ctx->list.lkey = ctx->mr->lkey; + ctx->wr.wr_id = 0xcafebabe; + ctx->wr.sg_list = &ctx->list; + ctx->wr.num_sge = 1; + ctx->wr.opcode = IBV_WR_SEND; + ctx->wr.send_flags = IBV_SEND_SIGNALED; + ctx->wr.next = NULL; + rc = ibv_post_send(ctx->qp, &ctx->wr, &bad_wr); + if (rc != 0) + fprintf(stderr, "%s ibv_post_send failed %d!\n", __FUNCTION__, rc); + do { + usleep(1000); + ne = ibv_poll_cq(ctx->scq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%s bad wc status %d\n", __FUNCTION__, wc.status); + if (wc.opcode != IBV_WC_SEND) + fprintf(stderr, "%s bad wc opcode %d\n", __FUNCTION__, wc.opcode); + if (wc.wr_id != 0xcafebabe) + fprintf(stderr, "%s bad wc wr_id 0x%x\n", __FUNCTION__, (int)wc.wr_id); + sleep(1); +} + +static void pp_wait_for_start(struct pingpong_context *ctx) +{ + struct ibv_wc wc; + int ne; + + do { + usleep(1000); + ne = ibv_poll_cq(ctx->rcq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%s bad wc status %d\n", __FUNCTION__, wc.status); + if (!(wc.opcode & IBV_WC_RECV)) + fprintf(stderr, "%s bad wc opcode %d\n", __FUNCTION__, wc.opcode); + if (wc.wr_id != 0xdeadbeef) + fprintf(stderr, "%s bad wc wr_id 0x%x\n", __FUNCTION__, (int)wc.wr_id); + pp_post_recv(ctx); +} + +static void pp_send_start(struct pingpong_context *ctx) +{ + struct ibv_send_wr *bad_wr; + int rc; + struct ibv_wc wc; + 
int ne; + + ctx->list.addr = (uintptr_t) ctx->buf; + ctx->list.length = 1; + ctx->list.lkey = ctx->mr->lkey; + ctx->wr.wr_id = 0xabbaabba; + ctx->wr.sg_list = &ctx->list; + ctx->wr.num_sge = 1; + ctx->wr.opcode = IBV_WR_SEND; + ctx->wr.send_flags = IBV_SEND_SIGNALED; + ctx->wr.next = NULL; + rc = ibv_post_send(ctx->qp, &ctx->wr, &bad_wr); + if (rc != 0) + fprintf(stderr, "%s ibv_post_send failed %d!\n", __FUNCTION__, rc); + do { + usleep(1000); + ne = ibv_poll_cq(ctx->scq, 1, &wc); + } while (ne == 0); + + if (wc.status) + fprintf(stderr, "%s bad wc status %d\n", __FUNCTION__, wc.status); + if (wc.opcode != IBV_WC_SEND) + fprintf(stderr, "%s bad wc opcode %d\n", __FUNCTION__, wc.opcode); + if (wc.wr_id != 0xabbaabba) + fprintf(stderr, "%s bad wc wr_id 0x%x\n", __FUNCTION__, (int)wc.wr_id); + printf("start sent\n"); +} + static void usage(const char *argv0) { printf("Usage:\n"); @@ -421,6 +899,7 @@ printf(" -t, --tx-depth= size of tx queue (default 100)\n"); printf(" -n, --iters= number of exchanges (at least 2, default 1000)\n"); printf(" -b, --bidirectional measure bidirectional bandwidth (default unidirectional)\n"); + printf(" -c, --cma Use the RDMA CMA to setup the RDMA connection\n"); } static void print_report(unsigned int iters, unsigned size, int duplex, @@ -438,14 +917,15 @@ /* Find the peak bandwidth */ for (i = 0; i < iters; ++i) - for (j = i; j < iters; ++j) { - t = (tcompleted[j] - tposted[i]) / (j - i + 1); - if (t < opt_delta) { - opt_delta = t; - opt_posted = i; - opt_completed = j; + if ((i+200) < iters) + for (j = i; j < i+200; ++j) { + t = (tcompleted[j] - tposted[i]) / (j - i + 1); + if (t < opt_delta) { + opt_delta = t; + opt_posted = i; + opt_completed = j; + } } - } cycles_to_units = get_cpu_mhz() * 1000000; @@ -486,10 +966,16 @@ int sockfd; int duplex = 0; struct ibv_qp *qp; + int use_cma=0; + int poll = 0; + cycles_t *tposted; + cycles_t *tcompleted; + struct ibv_wc wc; + int ne; + struct ibv_cq *ev_cq; + void *ev_ctx; + int blocks = 0; - 
cycles_t *tposted; - cycles_t *tcompleted; - /* Parameter parsing. */ while (1) { int c; @@ -502,14 +988,24 @@ { .name = "iters", .has_arg = 1, .val = 'n' }, { .name = "tx-depth", .has_arg = 1, .val = 't' }, { .name = "bidirectional", .has_arg = 0, .val = 'b' }, + { .name = "cma", .has_arg = 0, .val = 'c' }, + { .name = "poll", .has_arg = 0, .val = 'P' }, { 0 } }; - c = getopt_long(argc, argv, "p:d:i:s:n:t:b", long_options, NULL); + c = getopt_long(argc, argv, "p:d:i:s:n:t:bcP", long_options, NULL); if (c == -1) break; switch (c) { + case 'c': + use_cma = 1; + break; + + case 'P': + poll = 1; + break; + case 'p': port = strtol(optarg, NULL, 0); if (port < 0 || port > 65535) { @@ -552,15 +1048,6 @@ break; - case 'l': - tx_depth = strtol(optarg, NULL, 0); - if (tx_depth < 1) { - usage(argv[0]); - return 1; - } - - break; - case 'b': duplex = 1; break; @@ -585,82 +1072,125 @@ page_size = sysconf(_SC_PAGESIZE); - dev_list = ibv_get_device_list(NULL); - - if (!ib_devname) { - ib_dev = dev_list[0]; - if (!ib_dev) { - fprintf(stderr, "No IB devices found\n"); + if (use_cma) { + int rc; + rem_dest = malloc(sizeof *rem_dest); + if (!rem_dest) { + fprintf(stderr,"%s cannot malloc rem_dest!\n", __FUNCTION__); return 1; } + + memset(rem_dest, 0, sizeof(*rem_dest)); + + if (servername) + ctx = pp_client_connect_cma(servername, port, size, tx_depth, &my_dest, + rem_dest); + else + ctx = pp_server_connect_cma(port, size, tx_depth, &my_dest, rem_dest); + if (!ctx) { + fprintf(stderr, "pp_connect_cma(%s,%d) failed!\n", servername, port); + return rc; + } + + /* + * Synch up and force the server to wait for the client to send + * the first message (MPA requirement). 
+ */ + if (servername) { + sleep(1); + pp_send_start(ctx); + } else { + pp_wait_for_start(ctx); + } + + printf(" local address: PSN %#06x RKey %#08x VAddr %#016Lx\n", + my_dest.psn, my_dest.rkey, my_dest.vaddr); + printf(" remote address: PSN %#06x RKey %#08x VAddr %#016Lx\n", + rem_dest->psn, rem_dest->rkey, rem_dest->vaddr); } else { - for (; (ib_dev = *dev_list); ++dev_list) - if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) - break; - if (!ib_dev) { - fprintf(stderr, "IB device %s not found\n", ib_devname); - return 1; + dev_list = ibv_get_device_list(NULL); + + if (!ib_devname) { + ib_dev = dev_list[0]; + if (!ib_dev) { + fprintf(stderr, "No IB devices found\n"); + return 1; + } + } else { + for (; (ib_dev = *dev_list); ++dev_list) + if (!strcmp(ibv_get_device_name(ib_dev), ib_devname)) + break; + if (!ib_dev) { + fprintf(stderr, "IB device %s not found\n", ib_devname); + return 1; + } } - } - ctx = pp_init_ctx(ib_dev, size, tx_depth, ib_port); - if (!ctx) - return 1; + ctx = pp_init_ctx(ib_dev, size, tx_depth, ib_port); + if (!ctx) + return 1; - /* Create connection between client and server. - * We do it by exchanging data over a TCP socket connection. */ + /* Create connection between client and server. + * We do it by exchanging data over a TCP socket connection. */ - my_dest.lid = pp_get_local_lid(ctx, ib_port); - my_dest.qpn = ctx->qp->qp_num; - my_dest.psn = lrand48() & 0xffffff; - if (!my_dest.lid) { - fprintf(stderr, "Local lid 0x0 detected. Is an SM running?\n"); - return 1; - } - my_dest.rkey = ctx->mr->rkey; - my_dest.vaddr = (uintptr_t)ctx->buf + ctx->size; + my_dest.lid = pp_get_local_lid(ctx, ib_port); + my_dest.qpn = ctx->qp->qp_num; + my_dest.psn = lrand48() & 0xffffff; + if (!my_dest.lid) { + fprintf(stderr, "Local lid 0x0 detected. 
Is an SM running?\n"); + return 1; + } + my_dest.rkey = ctx->mr->rkey; + my_dest.vaddr = (uintptr_t)ctx->buf + ctx->size; - printf(" local address: LID %#04x, QPN %#06x, PSN %#06x " - "RKey %#08x VAddr %#016Lx\n", - my_dest.lid, my_dest.qpn, my_dest.psn, - my_dest.rkey, my_dest.vaddr); + printf(" local address: LID %#04x, QPN %#06x, PSN %#06x " + "RKey %#08x VAddr %#016Lx\n", + my_dest.lid, my_dest.qpn, my_dest.psn, + my_dest.rkey, my_dest.vaddr); - if (servername) { - sockfd = pp_client_connect(servername, port); - if (sockfd < 0) + if (servername) { + sockfd = pp_client_connect(servername, port); + if (sockfd < 0) + return 1; + rem_dest = pp_client_exch_dest(sockfd, &my_dest); + } else { + sockfd = pp_server_connect(port); + if (sockfd < 0) + return 1; + rem_dest = pp_server_exch_dest(sockfd, &my_dest); + } + + if (!rem_dest) return 1; - rem_dest = pp_client_exch_dest(sockfd, &my_dest); - } else { - sockfd = pp_server_connect(port); - if (sockfd < 0) - return 1; - rem_dest = pp_server_exch_dest(sockfd, &my_dest); - } - if (!rem_dest) - return 1; + printf(" remote address: LID %#04x, QPN %#06x, PSN %#06x, " + "RKey %#08x VAddr %#016Lx\n", + rem_dest->lid, rem_dest->qpn, rem_dest->psn, + rem_dest->rkey, rem_dest->vaddr); - printf(" remote address: LID %#04x, QPN %#06x, PSN %#06x, " - "RKey %#08x VAddr %#016Lx\n", - rem_dest->lid, rem_dest->qpn, rem_dest->psn, - rem_dest->rkey, rem_dest->vaddr); + if (pp_connect_ctx(ctx, ib_port, my_dest.psn, rem_dest)) + return 1; - if (pp_connect_ctx(ctx, ib_port, my_dest.psn, rem_dest)) - return 1; - - /* An additional handshake is required *after* moving qp to RTR. - Arbitrarily reuse exch_dest for this purpose. */ - if (servername) { - rem_dest = pp_client_exch_dest(sockfd, &my_dest); - } else { - rem_dest = pp_server_exch_dest(sockfd, &my_dest); + /* An additional handshake is required *after* moving qp to RTR. + Arbitrarily reuse exch_dest for this purpose. 
*/ + if (servername) { + rem_dest = pp_client_exch_dest(sockfd, &my_dest); + } else { + rem_dest = pp_server_exch_dest(sockfd, &my_dest); + } } /* For half duplex tests, server just waits for client to exit */ if (!servername && !duplex) { - rem_dest = pp_server_exch_dest(sockfd, &my_dest); - write(sockfd, "done", sizeof "done"); - close(sockfd); + if (use_cma) { + pp_wait_for_done(ctx); + pp_send_done(ctx); + pp_close_cma(ctx, servername); + } else { + rem_dest = pp_server_exch_dest(sockfd, &my_dest); + write(sockfd, "done", sizeof "done"); + close(sockfd); + } return 0; } @@ -695,6 +1225,11 @@ return 1; } + if (!poll && ibv_req_notify_cq(ctx->scq, 0)) { + fprintf(stderr, "ibv_req_notify failed!\n"); + return 1; + } + /* Done with setup. Start the test. */ while (scnt < iters || ccnt < iters) { @@ -712,10 +1247,22 @@ } if (ccnt < iters) { - struct ibv_wc wc; - int ne; + do { - ne = ibv_poll_cq(ctx->cq, 1, &wc); + ne = ibv_poll_cq(ctx->scq, 1, &wc); + if (!poll && ne == 0) { + blocks++; + if (ibv_get_cq_event(ctx->ch, &ev_cq, &ev_ctx)) { + fprintf(stderr, "Failed to get cq event!\n"); + return 1; + } + if (ev_cq != ctx->scq) { + fprintf(stderr, "Unkown CQ!\n"); + return 1; + } + ibv_ack_cq_events(ctx->scq, 1); + ibv_req_notify_cq(ctx->scq, 0); + } } while (ne == 0); tcompleted[ccnt] = get_cycles(); @@ -737,15 +1284,20 @@ } } - if (servername) { - rem_dest = pp_client_exch_dest(sockfd, &my_dest); + if (use_cma) { + pp_send_done(ctx); + pp_wait_for_done(ctx); + pp_close_cma(ctx, servername); } else { - rem_dest = pp_server_exch_dest(sockfd, &my_dest); + if (servername) { + rem_dest = pp_client_exch_dest(sockfd, &my_dest); + } else { + rem_dest = pp_server_exch_dest(sockfd, &my_dest); + } + write(sockfd, "done", sizeof "done"); + close(sockfd); } - write(sockfd, "done", sizeof "done"); - close(sockfd); - print_report(iters, size, duplex, tposted, tcompleted); free(tposted); Index: Makefile =================================================================== --- 
Makefile (revision 7050) +++ Makefile (working copy) @@ -10,7 +10,7 @@ LOADLIBES += LDFLAGS += -${TESTS}: LOADLIBES += -libverbs +${TESTS}: LOADLIBES += -libverbs -lrdmacm ${TESTS} ${UTILS}: %: %.c ${EXTRA_FILES} ${EXTRA_HEADERS} $(CC) $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) $< ${EXTRA_FILES} $(LOADLIBES) $(LDLIBS) -o $@ From mst at mellanox.co.il Tue May 16 07:41:19 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 16 May 2006 17:41:19 +0300 Subject: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM In-Reply-To: <1147787979.25266.23.camel@stevo-desktop> References: <1147787979.25266.23.camel@stevo-desktop> Message-ID: <20060516144118.GR30211@mellanox.co.il> Quoting r. Steve Wise : > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM > > I don't know who maintains src/userspace/perftest, but here is a patch > set that enables rdma_bw and rdma_lat to use the RDMA_CM with the > addition of the -c or --cma flag. > I'm worried that this makes the program too big. Maybe this should be another test rather than an option? > The rkey/addr info is exchanged in the private data, and SEND/RECV's are used > to sync the client/server before and after execution. Do we really need SEND/RECV messages for this? I think I get completion with error once the remote side has disconnected. No? > Also, I added -P or --poll to rdma_bw to allow blocking for completion > events when none are ready (if you omit -P, it will block when no > completion is available, otherwise it will spin). Needs to be a separate patch. 
> Signed-off-by: Steve Wise > Index: rdma_lat.c > =================================================================== > --- rdma_lat.c (revision 7050) > +++ rdma_lat.c (working copy) > @@ -53,6 +53,7 @@ > #include > > #include > +#include > > #include "get_clock.h" > > @@ -71,7 +72,8 @@ > struct ibv_context *context; > struct ibv_pd *pd; > struct ibv_mr *mr; > - struct ibv_cq *cq; > + struct ibv_cq *scq; > + struct ibv_cq *rcq; Why are you adding another CQ? > struct ibv_qp *qp; > void *buf; > volatile char *post_buf; > @@ -80,6 +82,7 @@ > int tx_depth; > struct ibv_sge list; > struct ibv_send_wr wr; > + struct rdma_cm_id *cm_id; > }; > > struct pingpong_dest { > @@ -323,16 +326,22 @@ > return NULL; > } > > - ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); > - if (!ctx->cq) { > + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); > + if (!ctx->rcq) { > fprintf(stderr, "Couldn't create CQ\n"); > return NULL; > } CQ of depth 1? -- MST From swise at opengridcomputing.com Tue May 16 07:46:52 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 16 May 2006 09:46:52 -0500 Subject: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM In-Reply-To: <20060516144118.GR30211@mellanox.co.il> References: <1147787979.25266.23.camel@stevo-desktop> <20060516144118.GR30211@mellanox.co.il> Message-ID: <1147790812.25266.32.camel@stevo-desktop> On Tue, 2006-05-16 at 17:41 +0300, Michael S. Tsirkin wrote: > Quoting r. Steve Wise : > > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM > > > > I don't know who maintains src/userspace/perftest, but here is a patch > > set that enables rdma_bw and rdma_lat to use the RDMA_CM with the > > addition of the -c or --cma flag. > > > > I'm worried that this makes the program too big. Maybe this should be > another test rather than an option? > ok. You want it as a separate pair of programs? 
> > The rkey/addr info is exchanged in the private data, and SEND/RECV's are used > > to sync the client/server before and after execution. > > Do we really need SEND/RECV messages for this? > I think I get completion with error once the remote side has disconnected. No? > perhaps. I just thought it was cleaner to synch up at the end. Just like the non-cma version does over the TCP socket (see pp_client_exch_dest() / pp_server_exch_dest() at the end of the test). > > Also, I added -P or --poll to rdma_bw to allow blocking for completion > > events when none are ready (if you omit -P, it will block when no > > completion is available, otherwise it will spin). > > Needs to be a separate patch. ok. > > > Signed-off-by: Steve Wise > > > > Index: rdma_lat.c > > =================================================================== > > --- rdma_lat.c (revision 7050) > > +++ rdma_lat.c (working copy) > > @@ -53,6 +53,7 @@ > > #include > > > > #include > > +#include > > > > #include "get_clock.h" > > > > @@ -71,7 +72,8 @@ > > struct ibv_context *context; > > struct ibv_pd *pd; > > struct ibv_mr *mr; > > - struct ibv_cq *cq; > > + struct ibv_cq *scq; > > + struct ibv_cq *rcq; > > Why are you adding another CQ? > It makes waiting for a recv completion easier since you won't get a send completion when the CQ is only for receives... > > struct ibv_qp *qp; > > void *buf; > > volatile char *post_buf; > > @@ -80,6 +82,7 @@ > > int tx_depth; > > struct ibv_sge list; > > struct ibv_send_wr wr; > > + struct rdma_cm_id *cm_id; > > }; > > > > struct pingpong_dest { > > @@ -323,16 +326,22 @@ > > return NULL; > > } > > > > - ctx->cq = ibv_create_cq(ctx->context, tx_depth, NULL, NULL, 0); > > - if (!ctx->cq) { > > + ctx->rcq = ibv_create_cq(ctx->context, 1, NULL, NULL, 0); > > + if (!ctx->rcq) { > > fprintf(stderr, "Couldn't create CQ\n"); > > return NULL; > > } > > CQ of depth 1? > Yes, there is only ever one outstanding send/recv exchange... 
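The reason given above for splitting the completion queue into scq and rcq can be illustrated with a small toy model in C. This is only a sketch under stated assumptions: it does not use libibverbs, and toy_cq, toy_wc, toy_cq_push, toy_cq_poll and the two wait_recv_* helpers are hypothetical names made up for the illustration. toy_cq_poll merely imitates the "1 if a completion was returned, 0 if the queue is empty" convention of ibv_poll_cq(cq, 1, &wc), and it returns instead of spinning when nothing is queued. The wr_id values are the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a work completion and a completion queue.
 * Hypothetical types for illustration only, NOT the libibverbs ones. */
enum toy_opcode { TOY_WC_SEND, TOY_WC_RECV };

struct toy_wc {
    enum toy_opcode opcode;
    unsigned long wr_id;
};

struct toy_cq {
    struct toy_wc entries[8];
    size_t head; /* next entry to hand out */
    size_t tail; /* next free slot */
};

/* Queue a completion; returns 0 on success, -1 if the toy queue is full. */
static int toy_cq_push(struct toy_cq *cq, enum toy_opcode op, unsigned long wr_id)
{
    if (cq->tail == sizeof cq->entries / sizeof cq->entries[0])
        return -1;
    cq->entries[cq->tail].opcode = op;
    cq->entries[cq->tail].wr_id = wr_id;
    cq->tail++;
    return 0;
}

/* Imitates polling for one completion: returns 1 and fills *wc,
 * or returns 0 when the queue is empty. */
static int toy_cq_poll(struct toy_cq *cq, struct toy_wc *wc)
{
    assert(cq->head <= cq->tail);
    if (cq->head == cq->tail)
        return 0;
    *wc = cq->entries[cq->head++];
    return 1;
}

/* Shared CQ: send completions may sit in front of the receive, so the
 * waiter has to look at every entry and skip the sends.
 * Returns the number of entries polled, or -1 if no receive was queued. */
static int wait_recv_shared(struct toy_cq *cq, struct toy_wc *out)
{
    struct toy_wc wc;
    int polled = 0;

    while (toy_cq_poll(cq, &wc)) {
        polled++;
        if (wc.opcode == TOY_WC_RECV) {
            *out = wc;
            return polled;
        }
        /* A real program would also have to account for the send
         * completion it just consumed. */
    }
    return -1;
}

/* Dedicated receive CQ: whatever is polled is a receive completion.
 * Returns 1 if one was available, -1 otherwise. */
static int wait_recv_dedicated(struct toy_cq *rcq, struct toy_wc *out)
{
    return toy_cq_poll(rcq, out) ? 1 : -1;
}
```

For example, if a shared queue holds two send completions followed by one receive completion, wait_recv_shared has to poll three entries before it can return, whereas wait_recv_dedicated on a receive-only queue returns on the first poll and never sees a send completion at all.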
From mst at mellanox.co.il Tue May 16 07:57:50 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 16 May 2006 17:57:50 +0300 Subject: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM In-Reply-To: <1147790812.25266.32.camel@stevo-desktop> References: <1147787979.25266.23.camel@stevo-desktop> <20060516144118.GR30211@mellanox.co.il> <1147790812.25266.32.camel@stevo-desktop> Message-ID: <20060516145750.GS30211@mellanox.co.il> Quoting r. Steve Wise : > Subject: Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM > > On Tue, 2006-05-16 at 17:41 +0300, Michael S. Tsirkin wrote: > > Quoting r. Steve Wise : > > > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM > > > > > > I don't know who maintains src/userspace/perftest, but here is a patch > > > set that enables rdma_bw and rdma_lat to use the RDMA_CM with the > > > addition of the -c or --cma flag. > > > > > > > I'm worried that this makes the program too big. Maybe this should be > > another test rather than an option? > > > > ok. You want it as a separate pair of programs? I guess we'll see once there's the minimum patch that only affects the connection setup. If the changes can be localised to just the pp routines, then I think it still fits as part of the same test. > > > The rkey/addr info is exchanged in the private data, and SEND/RECV's are used > > > to sync the client/server before and after execution. > > > > Do we really need SEND/RECV messages for this? > > I think I get completion with error once the remote side has disconnected. No? > > > > perhaps. I just thought it was cleaner to synch up at the end. Just > like the non-cma version does over the TCP socket (see > pp_client_exch_dest() / pp_server_exch_dest() at the end of the test). Yes, using pp_client_exch_dest/pp_server_exch_dest now looks like not a good idea. Need to think back to why do we need this at all. 
-- MST From Sagir at mellanox.co.il Tue May 16 08:05:31 2006 From: Sagir at mellanox.co.il (Sagi Rotem) Date: Tue, 16 May 2006 18:05:31 +0300 Subject: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat toutilize the RDMA CM Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30228F18D@mtlexch01.mtl.com> If we will have a patch just to the pp routine as MST suggested it would be nice , I could apply it to all other performance tests. Sagi -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Michael S. Tsirkin Sent: Tuesday, May 16, 2006 5:58 PM To: Steve Wise Cc: openib-general Subject: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat toutilize the RDMA CM Quoting r. Steve Wise : > Subject: Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the > RDMA CM > > On Tue, 2006-05-16 at 17:41 +0300, Michael S. Tsirkin wrote: > > Quoting r. Steve Wise : > > > Subject: PATCH] enhancement to rdma_bw and rdma_lat to utilize the > > > RDMA CM > > > > > > I don't know who maintains src/userspace/perftest, but here is a > > > patch set that enables rdma_bw and rdma_lat to use the RDMA_CM > > > with the addition of the -c or --cma flag. > > > > > > > I'm worried that this makes the program too big. Maybe this should > > be another test rather than an option? > > > > ok. You want it as a separate pair of programs? I guess we'll see once there's the minimum patch that only affects the connection setup. If the changes can be localised to just the pp routines, then I think it still fits as part of the same test. > > > The rkey/addr info is exchanged in the private data, and > > > SEND/RECV's are used to sync the client/server before and after execution. > > > > Do we really need SEND/RECV messages for this? > > I think I get completion with error once the remote side has disconnected. No? > > > > perhaps. I just thought it was cleaner to synch up at the end. 
Just > like the non-cma version does over the TCP socket (see > pp_client_exch_dest() / pp_server_exch_dest() at the end of the test).

Yes, using pp_client_exch_dest/pp_server_exch_dest now looks like not a good idea. Need to think back to why do we need this at all.

-- MST

From mst at mellanox.co.il Tue May 16 08:08:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 16 May 2006 18:08:55 +0300
Subject: [openib-general] Re: sdp with kernel 2.6.16.14
In-Reply-To: <20060515210309.96302.qmail@web38515.mail.mud.yahoo.com>
References: <20060515204147.GF19163@mellanox.co.il> <20060515210309.96302.qmail@web38515.mail.mud.yahoo.com>
Message-ID: <20060516150855.GT30211@mellanox.co.il>

Quoting r. amit byron :
> Subject: Re: sdp with kernel 2.6.16.14
>
>
> Michael,
>
> netperf works with sdp.
>
> slabinfo output:
> slabinfo - version: 2.1 (statistics)
> # name : tunables : slabdata : globalstat : cpustat
> SDP 0 0 1172 6 2 : tunables 24 12 8 : slabdata 0 0 0 : globalstat 104 18 11 11 0 0 0 0 : cpustat 3 17 20 0
> fib6_nodes 7 92 40 92 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 32 21 1 0 0 0 0 0 : cpustat 5 2 0 0
> ip6_dst_cache 9 17 228 17 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 36 17 1 0 0 0 0 0 : cpustat 14 3 8 0
> ndisc_cache 2 22 180 22 1 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 32 17 1 0 0 0 0 0 : cpustat 3 2 3 0
> RAWv6 7 11 712 11 2 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 11 11 1 0 0 0 0 0 : cpustat 6 1 0 0
> UDPv6 1 11 684 11 2 : tunables 32 16 8 : slabdata 1 1 0 : globalstat 52 22 2 1 0 0 0 0 : cpustat 10 5 14 0
>
> Amit
>
> "Michael S. Tsirkin" wrote:
> > Quoting r.
amit byron : > > Subject: sdp with kernel 2.6.16.14 > > > > > > hi, > > > > i'm trying to get sdp work between point-to-point connected > > machines running kernel 2.6.16.24. i have configured ipoib > > and trying to run iperf using sdp. > > > > the client machine has an entry in its libsdp.conf: > > match destination 192.168.1.2 > > > > the server machine has na entry in its libsdp.conf: > > match listen *:5001 > > > > iperf is started on the server machine using command: > > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -s > > > > iperf client is started on the client machine using command: > > LD_PRELOAD=/usr/local/lib/libsdp.so iperf -c 192.168.1.2 > > > > the server machine panics with following messages: > > > > oom-killer: gfp_mask=0xd0, order=0 > > [] oom-killer: gfp_mask=0xd0, order=0 > > [] out_of_memory+0x155/0x180 > > [] __alloc_pages+0x2a5/0x320 > > [] __get_free_pages+0x1e/0x40 > > [] __pollwait+0x80/0xd0 > > [] pipe_poll+0xcd/0xe0 > > [] do_select+0x212/0x480 > > [] cache_free_debugcheck+0x135/0x230 > > [] __pollwait+0x0/0xd0 > > [] core_sys_select+0x1ce/0x2e0 > > [] sys_select+0x51/0x1c0 > > [] sysenter_past_esp+0x54/0x75 > > DMA per-cpu: > > cpu 0 hot: high 0, batch 1 used:0 > > cpu 0 cold: high 0, batch 1 used:0 > > cpu 1 hot: high 0, batch 1 used:0 > > cpu 1 cold: high 0, batch 1 used:0 > > cpu 2 hot: high 0, batch 1 used:0 > > cpu 2 cold: high 0, batch 1 used:0 > > cpu 3 hot: high 0, batch 1 used:0 > > cpu 3 cold: high 0, batch 1 used:0 > > DMA32 per-cpu: empty > > Normal per-cpu: > > cpu 0 hot: high 186, batch 31 used:103 > > cpu 0 cold: high 62, batch 15 used:61 > > cpu 1 hot: high 186, batch 31 used:183 > > cpu 1 cold: high 62, batch 15 used:53 > > cpu 2 hot: high 186, batch 31 used:28 > > cpu 2 cold: high 62, batch 15 used:54 > > cpu 3 hot: high 186, batch 31 used:63 > > cpu 3 cold: high 62, batch 15 used:60 > > HighMem per-cpu: > > cpu 0 hot: high 186, batch 31 used:176 > > cpu 0 cold: high 62, batch 15 used:13 > > cpu 1 hot: high 186, batch 31 
used:169 > > cpu 1 cold: high 62, batch 15 used:1 > > cpu 2 hot: high 186, batch 31 used:157 > > cpu 2 cold: high 62, batch 15 used:0 > > cpu 3 hot: high 186, batch 31 used:174 > > cpu 3 cold: high 62, batch 15 used:6 > > Free pages: 7366104kB (7358760kB HighMem) > > Active:5351 inactive:4885 dirty:0 writeback:0 unstable:0 free:1841526 slab:8970 mapped:4565 pagetables:238 > > DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:8 all_unreclaimable? yes > > lowmem_reserve[]: 0 0 880 8623 > > DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no > > lowmem_reserve[]: 0 0 880 8623 > > Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB inactive:0kB present:901 > > 120kB pages_scanned:314 all_unreclaimable? yes > > lowmem_reserve[]: 0 0 0 61951 > > HighMem free:7358760kB min:512kB low:8780kB high:17052kB active:21172kB inactive:19540kB present:7929852kB pages_scanned:0 all_unreclaimable? no > > lowmem_reserve[]: 0 0 0 0 > > DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB > > DMA32: empty > > Normal: 13*4kB 1*8kB 1*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB > > HighMem: 1672*4kB 1103*8kB 581*16kB 308*32kB 129*64kB 65*128kB 29*256kB 12*512kB 7*1024kB 2*2048kB 1778*4096kB = 7358760kB > > Swap cache: add 0, delete 0, find 0/0, race 0+0 > > Free swap = 0kB > > Total swap = 0kB > > Out of Memory: Kill process 2037 (mDNSResponder) score 2847 and children. > > Out of memory: Killed process 2037 (mDNSResponder). 
> > oom-killer: gfp_mask=0xd0, order=0 > > [] out_of_memory+0x155/0x180 > > [] __alloc_pages+0x2a5/0x320 > > [] __get_free_pages+0x1e/0x40 > > [] __pollwait+0x80/0xd0 > > [] pipe_poll+0xcd/0xe0 > > [] do_select+0x212/0x480 > > [] cache_free_debugcheck+0x135/0x230 > > [] __pollwait+0x0/0xd0 > > [] core_sys_select+0x1ce/0x2e0 > > [] sys_select+0x51/0x1c0 > > [] sysenter_past_esp+0x54/0x75 > > DMA per-cpu: > > cpu 0 hot: high 0, batch 1 used:0 > > cpu 0 cold: high 0, batch 1 used:0 > > cpu 1 hot: high 0, batch 1 used:0 > > cpu 1 cold: high 0, batch 1 used:0 > > cpu 2 hot: high 0, batch 1 used:0 > > cpu 2 cold: high 0, batch 1 used:0 > > cpu 3 hot: high 0, batch 1 used:0 > > cpu 3 cold: high 0, batch 1 used:0 > > DMA32 per-cpu: empty > > Normal per-cpu: > > cpu 0 hot: high 186, batch 31 used:103 > > cpu 0 cold: high 62, batch 15 used:61 > > cpu 1 hot: high 186, batch 31 used:183 > > cpu 1 cold: high 62, batch 15 used:53 > > cpu 2 hot: high 186, batch 31 used:29 > > cpu 2 cold: high 62, batch 15 used:54 > > cpu 3 hot: high 186, batch 31 used:63 > > cpu 3 cold: high 62, batch 15 used:60 > > HighMem per-cpu: > > cpu 0 hot: high 186, batch 31 used:176 > > cpu 0 cold: high 62, batch 15 used:13 > > cpu 1 hot: high 186, batch 31 used:169 > > cpu 1 cold: high 62, batch 15 used:1 > > cpu 2 hot: high 186, batch 31 used:179 > > cpu 2 cold: high 62, batch 15 used:0 > > cpu 3 hot: high 186, batch 31 used:174 > > cpu 3 cold: high 62, batch 15 used:6 > > Free pages: 7366476kB (7359132kB HighMem) > > Active:5273 inactive:4855 dirty:0 writeback:0 unstable:0 free:1841619 slab:8970 mapped:4423 pagetables:232 > > DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:8 all_unreclaimable? yes > > lowmem_reserve[]: 0 0 880 8623 > > DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? 
no > > lowmem_reserve[]: 0 0 880 8623 > > Normal free:3756kB min:3756kB low:4692kB high:5632kB active:232kB inactive:0kB present:901120kB pages_scanned:314 all_unreclaimable? yes > > lowmem_reserve[]: 0 0 0 61951 > > HighMem free:7359132kB min:512kB low:8780kB high:17052kB active:20860kB inactive:19420kB present:7929852kB pages_scanned:0 all_unreclaimable? no > > lowmem_reserve[]: 0 0 0 0 > > DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB > > DMA32: empty > > Normal: 13*4kB 1*8kB 1*16kB 3*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB > > HighMem: 1621*4kB 1073*8kB 562*16kB 303*32kB 137*64kB 69*128kB 30*256kB 12*512kB 7*1024kB 2*2048kB 1778*4096kB = 7359132kB > > Swap cache: add 0, delete 0, find 0/0, race 0+0 > > Free swap = 0kB > > Total swap = 0kB > > Out of Memory: Kill process 2134 (sendmail) score 1741 and children. > > Out of memory: Killed process 2134 (sendmail). > > oom-killer: gfp_mask=0xd0, order=0 > > [] out_of_memory+0x155/0x180 > > [] __alloc_pages+0x2a5/0x320 > > [] __get_free_pages+0x1e/0x40 > > [] __pollwait+0x80/0xd0 > > [] pipe_poll+0xcd/0xe0 > > [] do_select+0x212/0x480 > > [] cache_free_debugcheck+0x135/0x230 > > [] __pollwait+0x0/0xd0 > > [] core_sys_select+0x1ce/0x2e0 > > [] sys_select+0x51/0x1c0 > > [] sysenter_past_esp+0x54/0x75 > > > > openib source were retrieved with: > > svn co https://openib.org/svn/gen2/trunk > > > > anybody ran into similar problem, are there any sdp > > patches available? > > > > thanks, > > Amit > > Netperf is running here fine. Can you try that to verify its not > a setup problem? I'll try iperf later. > Hmm. Where did all the memory go? can you > cat /proc/slabinfo > > -- > MST OK, I think I see a problem. Stay tuned for a patch. -- MST From mshefty at ichips.intel.com Tue May 16 09:14:02 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 16 May 2006 09:14:02 -0700 Subject: [openib-general] Re: [PATCH] RE: compliancy issue? 
In-Reply-To: <20060516130241.GQ30211@mellanox.co.il> References: <20060508085301.GD20207@mellanox.co.il> <20060516130241.GQ30211@mellanox.co.il> Message-ID: <4469FA4A.9080909@ichips.intel.com> > OK, I just tested and this works for me. Here's the SDP patch to do what you > described. The code actually got cleaner now: its convenient to get > different events on active versus passive side - previously I had > to check a flag to figure out what does ESTABLISHED mean. I committed the CMA patch. - Sean From sean.hefty at intel.com Tue May 16 09:21:43 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 16 May 2006 09:21:43 -0700 Subject: [openib-general] RE: need help regarding IB core software In-Reply-To: <20060516140831.41633.qmail@web8316.mail.in.yahoo.com> Message-ID: Please post generic questions to the openib mailing list. i have started working over infiniband recently i want to develop a sample utility that would perform simple RDMA (read/write) operations There are some test applications that can be used as a base. Are you wanting a userspace or kernel application? i will be thankful to you if u can refer me some documents,atticles or books where i can get this information The best documentation is the IB architecture specification. However, I think that the test apps are simple enough to help you here. - Sean
From swise at opengridcomputing.com Tue May 16 09:43:11 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 16 May 2006 11:43:11 -0500 Subject: [openib-general] Re: PATCH] enhancement to rdma_bw and rdma_lat to utilize the RDMA CM In-Reply-To: <20060516145750.GS30211@mellanox.co.il> References: <1147787979.25266.23.camel@stevo-desktop> <20060516144118.GR30211@mellanox.co.il> <1147790812.25266.32.camel@stevo-desktop> <20060516145750.GS30211@mellanox.co.il> Message-ID: <1147797791.25266.41.camel@stevo-desktop> > Yes, using pp_client_exch_dest/pp_server_exch_dest now looks like > not a good idea. Need to think back to why do we need this at all. > You need it to keep the connection alive until both client and server have finished running the test, in the case of full duplex tests... From iod00d at hp.com Tue May 16 09:44:25 2006 From: iod00d at hp.com (Grant Grundler) Date: Tue, 16 May 2006 09:44:25 -0700 Subject: [openib-general] FTP over SDP is not working fine-[newbie] In-Reply-To: <20060516071531.6929.qmail@web8321.mail.in.yahoo.com> References: <20060516062317.GN29082@esmail.cup.hp.com> <20060516071531.6929.qmail@web8321.mail.in.yahoo.com> Message-ID: <20060516164425.GC1592@esmail.cup.hp.com> On Tue, May 16, 2006 at 08:15:31AM +0100, keshetti mahesh wrote: > my lisdp.conf file is: > > match listen *:* > match destination *:* > match program * > > so that all services can be allowed both on the client and server sides Both client and server have a libsdp.conf file. Do they both have the above content? (It should, but your comment above suggests only one libsdp.conf file is being used.) > > after exporting that file and LD_PRELOAD=/usr/lib/lib64/libsdp.so, i have restarted all services (vsftpd, xinetd etc....) > > again the same problem with FTP .....network unreachable I'll assume "ping" does work.
My next suggestion is to stop the FTP server and manually invoke it to listen on a different port (proftpd takes -p parameter): /etc/init.d/proftpd stop LD_PRELOAD=/usr/lib/lib64/libsdp.so proftpd -p 20022 Using another login, confirm the ftp server is listening on port 20022 (netstat -a) and is using SDP (cat /proc//maps or something like that). Then from the client, try to talk to that server with LD_PRELOAD=/usr/lib/lib64/libsdp.so ftp 192.168.2.99 20022 > but the i can't understand y only this is giving problem (the other applications are not giving any problem) Sorry - I don't understand that either. If my above suggestion doesn't work, perhaps try a different ftp server or different ftp client? grant From rheflin at atipa.com Tue May 16 10:19:18 2006 From: rheflin at atipa.com (Roger Heflin) Date: Tue, 16 May 2006 12:19:18 -0500 Subject: [openib-general] Heads-up for anyone using certain thunderbird message filter features Message-ID: <446A0996.50709@atipa.com> Off topic - but probably very important to people using thunderbird as an email client. I am using certain thunderbird message filter features - mainly move to folder and then delete from pop server (this is done as a single step). I am on 2 mailing lists that received the same patch set, of the 54 patch emails, 15 patches went to one folder (kernel), 41 patches went to the other folder (openib), and 3 went to both. So anyone using this should watch, as the filter does act oddly. I suspect that, since the message id is the same, that may be what it is using to delete the message and may cause it to get both messages. The emails that I got both copies of were delayed by quite a bit and very likely came in on different email downloads, so the other emails were not there to delete. Roger From mst at mellanox.co.il Tue May 16 10:20:46 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 16 May 2006 20:20:46 +0300 Subject: [openib-general] Re: Re: [PATCH] RE: compliancy issue?
In-Reply-To: <4469FA4A.9080909@ichips.intel.com> References: <20060508085301.GD20207@mellanox.co.il> <20060516130241.GQ30211@mellanox.co.il> <4469FA4A.9080909@ichips.intel.com> Message-ID: <20060516172046.GC24838@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: Re: [PATCH] RE: compliancy issue? > > >OK, I just tested and this works for me. Here's the SDP patch to do what > >you > >described. The code actually got cleaner now: its convenient to get > >different events on active versus passive side - previously I had > >to check a flag to figure out what does ESTABLISHED mean. > > I committed the CMA patch. > Ditto for the SDP update. -- MST From sweitzen at cisco.com Tue May 16 10:47:49 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 16 May 2006 10:47:49 -0700 Subject: [openib-general] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel Message-ID: After running "yum update", I was able to compile OFED 1.0 rc4 on 2.6.16-1.2111_FC5 kernel. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Thursday, May 11, 2006 9:37 AM > To: Scott Weitzenkamp (sweitzen) > Cc: openfabrics-ewg at openib.org; openib-general at openib.org > Subject: Re: OFED 1.0 rc4 won't compile on orig FC5 kernel > > Quoting r. Scott Weitzenkamp (sweitzen) : > > Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel > > > > Is this a useful kernel to try, or should get latest FC5 > kernel or 2.6.16 from kernel.org? > > I think you should go to latest update. > > -- > MST > From sweitzen at cisco.com Tue May 16 10:52:17 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 16 May 2006 10:52:17 -0700 Subject: [openib-general] RE: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel Message-ID: Actually, I spoke too soon. Kernel components compiled, but MVAPICH did not: Compiling MVAPICH ... 
2 mpirun_rsh.c: In function 'read_hostfile': mpirun_rsh.c:1197: warning: incompatible implicit declaration of built-in functi on 'strndup' mpirun_rsh.c:1205: warning: incompatible implicit declaration of built-in functi on 'strndup' mpirun_rsh.c:1220: warning: incompatible implicit declaration of built-in functi on 'strndup' mpirun_rsh.c:1220: error: too few arguments to function 'strndup' make[3]: *** [mpirun_rsh] Error 1 Exit status from make was 2 make[2]: *** [mpilib] Error 1 make[1]: *** [mpi-modules] Error 2 make: *** [mpi] Error 2 Error in compiling MVAPICH. Check the log file: make.mvapich.log Exiting .... Mvapich installation failed Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems > -----Original Message----- > From: openfabrics-ewg-bounces at openib.org > [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of > Scott Weitzenkamp (sweitzen) > Sent: Tuesday, May 16, 2006 10:48 AM > To: Michael S. Tsirkin > Cc: openfabrics-ewg at openib.org; openib-general at openib.org > Subject: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on > orig FC5 kernel > > After running "yum update", I was able to compile OFED 1.0 rc4 on > 2.6.16-1.2111_FC5 kernel. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > > -----Original Message----- > > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > > Sent: Thursday, May 11, 2006 9:37 AM > > To: Scott Weitzenkamp (sweitzen) > > Cc: openfabrics-ewg at openib.org; openib-general at openib.org > > Subject: Re: OFED 1.0 rc4 won't compile on orig FC5 kernel > > > > Quoting r. Scott Weitzenkamp (sweitzen) : > > > Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel > > > > > > Is this a useful kernel to try, or should get latest FC5 > > kernel or 2.6.16 from kernel.org? > > > > I think you should go to latest update. 
> > > > -- > > MST > > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From rdreier at cisco.com Tue May 16 11:27:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 16 May 2006 11:27:33 -0700 Subject: [openib-general] Re: [PATCH] RDMA CM: updates to 2.6.18 branch In-Reply-To: (Sean Hefty's message of "Mon, 15 May 2006 15:32:35 -0700") References: Message-ID: OK, the for-2.6.18 branch is updated with all of this. From troy at scl.ameslab.gov Tue May 16 12:41:08 2006 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Tue, 16 May 2006 14:41:08 -0500 Subject: [openib-general] opensm segfault? Message-ID: <20060516194108.GB18223@scl.ameslab.gov> I got this after an indeterminate amount of time running opensm.. (gdb) bt #0 0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, p_src=0x0, count=64) at cl_memory_osd.c:87 #1 0x0000000000415053 in osm_pkey_tbl_sync_new_blocks ( p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127 #2 0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40) at osm_pkey_mgr.c:407 #3 0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, signal=3) at osm_state_mgr.c:2243 #4 0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback ( context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70 #5 0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0) at cl_dispatcher.c:108 #6 0x00002b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268) at cl_threadpool.c:78 #7 0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at cl_thread.c:61 #8 0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0 #9 0x00002b90b12c8273 in clone () from /lib/libc.so.6 And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This just seems like excessive uneeded abstraction. I'm running opensm from subversion rev 7091.. 
May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn 6251:7091M the only local changes are as follows: troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff Index: osm/opensm/osm_port_info_rcv.c =================================================================== --- osm/opensm/osm_port_info_rcv.c (revision 7091) +++ osm/opensm/osm_port_info_rcv.c (working copy) @@ -469,9 +469,14 @@ goto Exit; } +#if 0 /* Check for IBM eHCA firmware defect in reporting partition * enforcement cap */ if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == IBM_VENDOR_ID) p_switch->switch_info.enforce_cap = 0; +#endif + /* Check for busted divergenet switch on ameslab network */ + if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e0000000152) + p_switch->switch_info.enforce_cap = 0; /* Bail out if this is a switch with no partition enforcement * capability */ if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0) From pasha at mellanox.co.il Tue May 16 12:53:42 2006 From: pasha at mellanox.co.il (Pavel Shamis (Pasha)) Date: Tue, 16 May 2006 22:53:42 +0300 Subject: [openib-general] Re: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel In-Reply-To: References: Message-ID: <446A2DC6.9060107@mellanox.co.il> This issue already fixed in rc5. Regards, Pasha. Scott Weitzenkamp (sweitzen) wrote: > Actually, I spoke too soon. Kernel components compiled, but MVAPICH > did not: > > Compiling MVAPICH ... 
> 2 > mpirun_rsh.c: In function 'read_hostfile': > mpirun_rsh.c:1197: warning: incompatible implicit declaration of > built-in functi > on 'strndup' > mpirun_rsh.c:1205: warning: incompatible implicit declaration of > built-in functi > on 'strndup' > mpirun_rsh.c:1220: warning: incompatible implicit declaration of > built-in functi > on 'strndup' > mpirun_rsh.c:1220: error: too few arguments to function 'strndup' > make[3]: *** [mpirun_rsh] Error 1 > Exit status from make was 2 > make[2]: *** [mpilib] Error 1 > make[1]: *** [mpi-modules] Error 2 > make: *** [mpi] Error 2 > Error in compiling MVAPICH. Check the log file: make.mvapich.log > Exiting .... > Mvapich installation failed > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > >> -----Original Message----- >> From: openfabrics-ewg-bounces at openib.org >> [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of >> Scott Weitzenkamp (sweitzen) >> Sent: Tuesday, May 16, 2006 10:48 AM >> To: Michael S. Tsirkin >> Cc: openfabrics-ewg at openib.org; openib-general at openib.org >> Subject: [openfabrics-ewg] RE: OFED 1.0 rc4 won't compile on >> orig FC5 kernel >> >> After running "yum update", I was able to compile OFED 1.0 rc4 on >> 2.6.16-1.2111_FC5 kernel. >> >> Scott Weitzenkamp >> SQA and Release Manager >> Server Virtualization Business Unit >> Cisco Systems >> >> >>> -----Original Message----- >>> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] >>> Sent: Thursday, May 11, 2006 9:37 AM >>> To: Scott Weitzenkamp (sweitzen) >>> Cc: openfabrics-ewg at openib.org; openib-general at openib.org >>> Subject: Re: OFED 1.0 rc4 won't compile on orig FC5 kernel >>> >>> Quoting r. Scott Weitzenkamp (sweitzen) : >>>> Subject: OFED 1.0 rc4 won't compile on orig FC5 kernel >>>> >>>> Is this a useful kernel to try, or should get latest FC5 >>> kernel or 2.6.16 from kernel.org? >>> >>> I think you should go to latest update. 
>>> >>> -- >>> MST >>> >> _______________________________________________ >> openfabrics-ewg mailing list >> openfabrics-ewg at openib.org >> http://openib.org/mailman/listinfo/openfabrics-ewg >> > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg > From hch at infradead.org Tue May 16 13:05:01 2006 From: hch at infradead.org (Christoph Hellwig) Date: Tue, 16 May 2006 21:05:01 +0100 Subject: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt In-Reply-To: <1147728081.2773.25.camel@chalcedony.pathscale.com> References: <4e0a07d20868c6c4f038.1147477386@eng-12.pathscale.com> <1147728081.2773.25.camel@chalcedony.pathscale.com> Message-ID: <20060516200501.GA5060@infradead.org> On Mon, May 15, 2006 at 02:21:21PM -0700, Bryan O'Sullivan wrote: > On Mon, 2006-05-15 at 08:50 -0700, Roland Dreier wrote: > > > Actually I NAK'ed this patch. It compiles the same thing on x86_64 > > but makes the source code wrong -- dma_map_single() returns a bus > > address, not a physical address. > > As Segher mentioned, bus_to_virt is unportable, so it's definitely the > wrong thing to use. phys_to_virt is as bad. please fix your code to do the right thing, that is to stop pretending to be able to map back from a bus to a virtual address. The only way to get at the virtual address from a bus one is to store it away at the time you call the dma mapping function. From halr at voltaire.com Tue May 16 13:00:32 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 16 May 2006 16:00:32 -0400 Subject: [openib-general] opensm segfault? In-Reply-To: <20060516194108.GB18223@scl.ameslab.gov> References: <20060516194108.GB18223@scl.ameslab.gov> Message-ID: <1147809202.18971.27254.camel@hal.voltaire.com> Hi Troy, On Tue, 2006-05-16 at 15:41, Troy Benjegerdes wrote: > I got this after an indeterminate amount of time running opensm.. 
> > > (gdb) bt > #0 0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, p_src=0x0, ^^^^^^^^^ This is the problem. Not sure why yet. > count=64) at cl_memory_osd.c:87 > #1 0x0000000000415053 in osm_pkey_tbl_sync_new_blocks ( > p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127 > #2 0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40) > at osm_pkey_mgr.c:407 > #3 0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, > signal=3) > at osm_state_mgr.c:2243 > #4 0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback ( > context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70 > #5 0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0) > at cl_dispatcher.c:108 > #6 0x00002b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268) > at cl_threadpool.c:78 > #7 0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at > cl_thread.c:61 > #8 0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0 > #9 0x00002b90b12c8273 in clone () from /lib/libc.so.6 > > > > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This > just seems like excessive uneeded abstraction. It's part of the component library, which is an OS abstraction layer. > I'm running opensm from subversion rev 7091.. 
> > May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn > 6251:7091M > > the only local changes are as follows: > > troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff > Index: osm/opensm/osm_port_info_rcv.c > =================================================================== > --- osm/opensm/osm_port_info_rcv.c (revision 7091) > +++ osm/opensm/osm_port_info_rcv.c (working copy) > @@ -469,9 +469,14 @@ > goto Exit; > } > > +#if 0 > /* Check for IBM eHCA firmware defect in reporting partition > * enforcement cap */ > if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == > IBM_VENDOR_ID) > p_switch->switch_info.enforce_cap = 0; > +#endif > + /* Check for busted divergenet switch on ameslab network */ > + if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e0000000152) > + p_switch->switch_info.enforce_cap = 0; > > /* Bail out if this is a switch with no partition enforcement > * capability */ > if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0) Yes, that's fine. -- Hal From rdreier at cisco.com Tue May 16 13:10:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 16 May 2006 13:10:24 -0700 Subject: [openib-general] opensm segfault? In-Reply-To: <1147809202.18971.27254.camel@hal.voltaire.com> (Hal Rosenstock's message of "16 May 2006 16:00:32 -0400") References: <20060516194108.GB18223@scl.ameslab.gov> <1147809202.18971.27254.camel@hal.voltaire.com> Message-ID: Troy> And why the heck is "cl_memcpy" just a call to 'memcpy' Troy> anyway? This just seems like excessive uneeded abstraction. Hal> It's part of the component library, which is an OS Hal> abstraction layer. memcpy() is specified by the ISO C standard, so it seems pretty silly to abstract this. Is there any platform that opensm could conceivably run on that doesn't supply memcpy()? - R. From halr at voltaire.com Tue May 16 13:38:59 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 16 May 2006 16:38:59 -0400 Subject: [openib-general] opensm segfault? 
In-Reply-To: References: <20060516194108.GB18223@scl.ameslab.gov> <1147809202.18971.27254.camel@hal.voltaire.com> Message-ID: <1147811937.18971.28011.camel@hal.voltaire.com> On Tue, 2006-05-16 at 16:10, Roland Dreier wrote: > Troy> And why the heck is "cl_memcpy" just a call to 'memcpy' > Troy> anyway? This just seems like excessive uneeded abstraction. > > Hal> It's part of the component library, which is an OS > Hal> abstraction layer. > > memcpy() is specified by the ISO C standard, so it seems pretty silly > to abstract this. Is there any platform that opensm could conceivably > run on that doesn't supply memcpy()? OK. I'll work up a patch to eliminate this if there are no objections. -- Hal From friedman at ucla.edu Tue May 16 14:50:10 2006 From: friedman at ucla.edu (Scott A. Friedman) Date: Tue, 16 May 2006 14:50:10 -0700 Subject: [openib-general] RE: OFED 1.0 rc4 won't compile on orig FC5 kernel Message-ID: <446A4912.4010509@ucla.edu> Hi I have been trying to build OFED-1.0-rc4 on FC5 as well. MVAPICH builds if you fix the error - strndup should probably be strdup. Simple fix. We have found that only iser, open-iscsi, mpitests and ibutils do not build right now for us. We do not need iser or open-iscsi so are not going to spend time on those - mpitests and ibutils would be nice. Scott From ardavis at ichips.intel.com Tue May 16 15:44:33 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Tue, 16 May 2006 15:44:33 -0700 Subject: [openib-general] [PATCH 15 of 53] ipath - make some maximum values more sane In-Reply-To: <480ceff18a886d7504a5.1147477380@eng-12.pathscale.com> References: <480ceff18a886d7504a5.1147477380@eng-12.pathscale.com> Message-ID: <446A55D1.3090100@ichips.intel.com> Bryan O'Sullivan wrote: >Increase the limits on some maximum values. > > > I noticed a rdma/message max size limitation of 4096 the last time I ran some dapl tests. Are there plans to increase or did I miss it somewhere in all the patches? 
Here are the max values returned from the ipath ibv_query_device: query_hca: (ver=20401) ep 65535 ep_q 65535 evd 65535 evd_q 65535 query_hca: msg 4096 rdma 4096 iov 255 lmr 65535 rmr 0 query_hca: dto 65535 iov 255 rdma i1,o1 Thanks, -arlin From sashak at voltaire.com Tue May 16 16:10:38 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 17 May 2006 02:10:38 +0300 Subject: [openib-general] opensm segfault? In-Reply-To: <20060516194108.GB18223@scl.ameslab.gov> References: <20060516194108.GB18223@scl.ameslab.gov> Message-ID: <20060516231038.GE24906@sashak.voltaire.com> Hi Troy, On 14:41 Tue 16 May , Troy Benjegerdes wrote: > I got this after an indeterminate amount of time running opensm.. May this be reproducible? Or it is completely random failure? > (gdb) bt > #0 0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, p_src=0x0, > count=64) at cl_memory_osd.c:87 > #1 0x0000000000415053 in osm_pkey_tbl_sync_new_blocks ( > p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127 > #2 0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40) > at osm_pkey_mgr.c:407 > #3 0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, > signal=3) > at osm_state_mgr.c:2243 > #4 0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback ( > context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70 > #5 0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0) > at cl_dispatcher.c:108 > #6 0x00002b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268) > at cl_threadpool.c:78 > #7 0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at > cl_thread.c:61 > #8 0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0 > #9 0x00002b90b12c8273 in clone () from /lib/libc.so.6 > > > > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This > just seems like excessive uneeded abstraction. Absolutely agree with you. Sasha. > I'm running opensm from subversion rev 7091.. 
> > May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn > 6251:7091M > > the only local changes are as follows: > > troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff > Index: osm/opensm/osm_port_info_rcv.c > =================================================================== > --- osm/opensm/osm_port_info_rcv.c (revision 7091) > +++ osm/opensm/osm_port_info_rcv.c (working copy) > @@ -469,9 +469,14 @@ > goto Exit; > } > > +#if 0 > /* Check for IBM eHCA firmware defect in reporting partition > * enforcement cap */ > if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == > IBM_VENDOR_ID) > p_switch->switch_info.enforce_cap = 0; > +#endif > + /* Check for busted divergenet switch on ameslab network */ > + if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e0000000152) > + p_switch->switch_info.enforce_cap = 0; > > /* Bail out if this is a switch with no partition enforcement > * capability */ > if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0) > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Tue May 16 16:55:57 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 16 May 2006 16:55:57 -0700 Subject: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading In-Reply-To: <20060514130903.GA24687@mellanox.co.il> (Ishai Rabinovitz's message of "Sun, 14 May 2006 16:09:03 +0300") References: <20060514110201.GA23308@mellanox.co.il> <20060514130903.GA24687@mellanox.co.il> Message-ID: > + /* > + * We need 2 scsi_host_put becuase there are two get: > + * in scsi_host_alloc and in scsi_add_host > + */ > + scsi_host_put(target->scsi_host); > scsi_host_put(target->scsi_host); Hmm, this doesn't seem right to me. 
If I try this, then I get a crash because the scsi_host is already gone after the first put. I verified that the reference count is 1 before these puts, and with the unmodified module I don't see anything left in /sys/class/scsi_host after unloading the module. What kernel are you seeing problems with? I'm testing with an up-to-date git kernel, although I doubt it makes a difference (did SCSI reference counting change recently??). I do think there are some extra scsi_host_put() calls in srp_remove_work() -- I think the double scsi_host_put() dates back to a version (which I may never even have checked in) where there was a scsi_host_get() to avoid the scsi_host going away between the schedule_work() and srp_remove_work() actually running. So the patch below seems correct to me. What do you think? --- linux-kernel/infiniband/ulp/srp/ib_srp.c (revision 7245) +++ linux-kernel/infiniband/ulp/srp/ib_srp.c (working copy) @@ -353,7 +356,6 @@ static void srp_remove_work(void *target spin_lock_irq(target->scsi_host->host_lock); if (target->state != SRP_TARGET_DEAD) { spin_unlock_irq(target->scsi_host->host_lock); - scsi_host_put(target->scsi_host); return; } target->state = SRP_TARGET_REMOVED; @@ -367,8 +369,6 @@ static void srp_remove_work(void *target ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); scsi_host_put(target->scsi_host); - /* And another put to really free the target port... */ - scsi_host_put(target->scsi_host); } static int srp_connect_target(struct srp_target_port *target) From rdreier at cisco.com Tue May 16 16:56:58 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 16 May 2006 16:56:58 -0700 Subject: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading In-Reply-To: <20060514130903.GA24687@mellanox.co.il> (Ishai Rabinovitz's message of "Sun, 14 May 2006 16:09:03 +0300") References: <20060514110201.GA23308@mellanox.co.il> <20060514130903.GA24687@mellanox.co.il> Message-ID: BTW, I think the patch below is correct as well. 
This avoids problems where the SRP driver waits forever for a completion, for example if sending the DREQ fails because the connection has already been disconnected by the target. Does this scenario seem like the deadlock you thought you saw? --- linux-kernel/infiniband/ulp/srp/ib_srp.c (revision 7245) +++ linux-kernel/infiniband/ulp/srp/ib_srp.c (working copy) @@ -342,7 +342,10 @@ static void srp_disconnect_target(struct /* XXX should send SRP_I_LOGOUT request */ init_completion(&target->done); - ib_send_cm_dreq(target->cm_id, NULL, 0); + if (ib_send_cm_dreq(target->cm_id, NULL, 0)) { + printk(KERN_DEBUG PFX "Sending CM DREQ failed\n"); + return; + } wait_for_completion(&target->done); } From christopherx.b.kasten at intel.com Tue May 16 16:57:55 2006 From: christopherx.b.kasten at intel.com (Kasten, ChristopherX B) Date: Tue, 16 May 2006 16:57:55 -0700 Subject: [openib-general] AMSO1100 + uverbs: ping and opensm errors after installation Message-ID: Hello, I am having trouble getting the AMSO1100 Ethernet card to work with uverbs. I have installed uverbs from the Installation Cheat Sheet https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet, substituting amso where mthca is listed (for the most part). I also have updated the AMMASSO firmware using the file from http://www.opengridcomputing.com/downloads/ogc_amso_kit_20060308.tgz. My linux kernel is 2.6.16.15, and the ib... & iw_c2 modules are loading successfully at boot. When I try to ping another AMMASSO machine I get the following output: ping: sndmsg: Network is down accompanied by a dmesg report: Virtual device iw1 asks to queue packet! 
The following is printed when running ibv_devinfo: hca_id: amso0 fw_ver: 1.1.1 node_guid: 000d:b200:0845:0000 sys_image_guid: 000d:b200:0844:0000 vendor_id: 0x0000 vendor_part_id: 0 hw_ver: 0x0 board_id: AMSO1100 Board ID phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: 512 (2) sm_lid: 0 port_lid: 0 port_lmc: 0x00 It looks like some values are not being initialized. Lastly, running opensm prints: ------------------------------------------------- OpenSM Rev:openib-1.2.0 Command Line Arguments: Log File: /var/log/osm.log ------------------------------------------------- OpenSM Rev:openib-1.2.0 Using default guid 0x0 Error: Could not get port guid Exiting SM Does anyone know which step(s) I've missed in correctly setting up my network? Thanks for the help, Chris From ken at novell.com Tue May 16 18:49:14 2006 From: ken at novell.com (Ken L Johnson) Date: Tue, 16 May 2006 19:49:14 -0600 Subject: [openib-general] ib_mthca fails to load with old firmware Message-ID: <200605161949.14995.ken@novell.com> I'm running into a problem when I try to use the OFED RC4 release on some blade systems that have TopSpin HCA daughter cards installed (actually Mellanox). I'm trying to figure out how to update the firmware to the latest [ http://mellanox.com/support/firmware_table.php ] but it seems I must know the PSID so I can grab the right firmware image. Can anyone point me in the right direction here? ---8<--- [query device using flint] blade9:~ # flint -d /dev/mst/mt25208_pci_cr0 q Image type: Failsafe I.S.
Version: 1 Chip Revision: A0 GUID Des: Node Port1 Port2 Sys image GUIDs: 0005ad000002ad1d 0005ad000002ad1e 0005ad000002ad1f 0005ad000100d050 Board ID: 1 VSD: 1 PSID: --->8--- ---8<--- [dmesg output showing ib_mthca load failure] <6>ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) <6>ib_mthca: Initializing 0000:02:00.0 <6>ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 169 <7>PCI: Setting latency timer of device 0000:02:00.0 to 64 <6>e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex <4>ib_mthca 0000:02:00.0: HCA FW version 4.6.0 is old (4.7.400 is current). <4>ib_mthca 0000:02:00.0: If you have problems, try updating your HCA FW. <3>ib_mthca 0000:02:00.0: NOP command failed to generate interrupt (IRQ 169), aborting. <3>ib_mthca 0000:02:00.0: BIOS or ACPI interrupt routing problem? <6>ACPI: PCI interrupt for device 0000:02:00.0 disabled <4>ib_mthca: probe of 0000:02:00.0 failed with error -16 --->8--- ---8<--- [hwinfo & lspci output for HCA] blade9:~ # hwinfo [...] 24: PCI 200.0: 0c06 InfiniBand [Created at pci.277] Unique ID: B35A.guWNc33i6_3 Parent ID: 8otl.l6V0RupyGX6 SysFS ID: /devices/pci0000:00/0000:00:04.0/0000:02:00.0 SysFS BusID: 0000:02:00.0 Hardware Class: unknown Model: "Mellanox MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)" Vendor: pci 0x15b3 "Mellanox Technologies" Device: pci 0x6278 "MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)" SubVendor: pci 0x15b3 "Mellanox Technologies" SubDevice: pci 0x6278 Revision: 0xa0 Memory Range: 0xfe900000-0xfe9fffff (rw,non-prefetchable) Memory Range: 0xdf800000-0xdfffffff (rw,prefetchable) Memory Range: 0xd0000000-0xd7ffffff (rw,prefetchable) IRQ: 169 (no events) Module Alias: "pci:v000015B3d00006278sv000015B3sd00006278bc0Csc06i00" Driver Info #0: Driver Status: ib_mthca is active Driver Activation Cmd: "modprobe ib_mthca" Config Status: cfg=new, avail=yes, need=no, active=unknown Attached to: #17 (PCI bridge) blade9:~ # lspci -vv [...] 
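02:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode) (rev a0)
        Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
        Interrupt: pin A routed to IRQ 169
        Region 0: Memory at 00000000fe900000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at 00000000df800000 (64-bit, prefetchable) [size=8M]
        Region 4: Memory at 00000000d0000000 (64-bit, prefetchable) [size=128M]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Vital Product Data
        Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
                Address: 0000000000000000 Data: 0000
        Capabilities: [84] MSI-X: Enable- Mask- TabSize=32
                Vector table: BAR=0 offset=00082000
                PBA: BAR=0 offset=00082200
        Capabilities: [60] Express Endpoint IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <64ns, L1 unlimited
                Device: AtnBtn- AtnInd- PwrInd-
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
                Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
                Link: Latency L0s unlimited, L1 unlimited
                Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
                Link: Speed 2.5Gb/s, Width x8

--->8---

Regards,
--
Ken L Johnson

From Liang.Peng at Sun.COM Tue May 16 21:44:26 2006
From: Liang.Peng at Sun.COM (Liang Peng)
Date: Wed, 17 May 2006 12:44:26 +0800
Subject: [openib-general] mpirun_mpd crashing
Message-ID: <446AAA2A.10500@Sun.COM>

Hi there,

Not sure whether this is the proper place to post, but we encounter some
mpirun_mpd crashing problems in testing Voltaire MPI (based on MVAPICH)
with Sun studio 11 compilers on SuSE Linux 9 SP3 (Opteron). Hope someone
can provide some hints:

MVAPICH version: 0.9.4 with Voltaire's modifications
Compiler used: Sun Studio 11
Problem: When using the mpd version of MVAPICH, mpirun crashes with
the following:

> mpirun_mpd -np 2 /usr/voltaire/mpi.cc.mpd/bin/cpi
[man_0]: [cli_0]: client_bnr_get failed
[cli_1]: MPD_Man_msg_handler received unexpected msg :cmd=client_bnr_get_output val=apstc-g4:00024400:
: handle_lhs_msgs_input: failed for bnr_get: buf=:cmd=bnr_get src=man_0 dest=man_0 bcast=true attr=MVAPICH_0001\^ gid=0 :
[man_0]: application program exited abnormally with status 0
[man_0]: application program signaled with signal 11 (: Segmentation fault)

The "rsh" version is working properly, and the gcc compiled version of
mpd is working on the same machine.

Thanks!

Regards,
Liang Peng

--
Research Scientist
Large Scale Computing
Asia Pacific Science & Technology Center
Sun Microsystems, Inc.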
and Nanyang Technological University, Singapore From rdreier at cisco.com Tue May 16 22:05:42 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 16 May 2006 22:05:42 -0700 Subject: [openib-general] ib_mthca fails to load with old firmware In-Reply-To: <200605161949.14995.ken@novell.com> (Ken L. Johnson's message of "Tue, 16 May 2006 19:49:14 -0600") References: <200605161949.14995.ken@novell.com> Message-ID: Ken> I'm running into a problem when I try to use the OFED RC4 Ken> release on some blade systems that have TopSpin HCA daughter Ken> cards installed (actually Mellanox). I'm trying to figure out Ken> how to update the firmware to the latest [ Ken> http://mellanox.com/support/firmware_table.php ] but it seems Ken> I must know the PSID so I can grab the right firmware Ken> image. Can anyone point me in the right direction here? For blade HCAs you should contact the HCA vendor for firmware updates. You could try passing the module option "fw_cmd_doorbell=0" to ib_mthca. That may work around things. - R. From mst at mellanox.co.il Tue May 16 22:22:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 17 May 2006 08:22:17 +0300 Subject: [openib-general] Re: ib_mthca fails to load with old firmware In-Reply-To: <200605161949.14995.ken@novell.com> References: <200605161949.14995.ken@novell.com> Message-ID: <20060517052216.GA8006@mellanox.co.il> Quoting r. Ken L Johnson : > Subject: ib_mthca fails to load with old firmware > > I'm running into a problem when I try to use the OFED RC4 release on some > blade systems that have TopSpin HCA daughter cards installed (actually > Mellanox). I'm trying to figure out how to update the firmware to the latest > [ http://mellanox.com/support/firmware_table.php ] but it seems I must know > the PSID so I can grab the right firmware image. Can anyone point me in the > right direction here? > > ---8<--- [query device using flint] > > blade9:~ # flint -d /dev/mst/mt25208_pci_cr0 q > Image type: Failsafe > I.S. 
Version: 1 > Chip Revision: A0 > GUID Des: Node Port1 Port2 Sys image > GUIDs: 0005ad000002ad1d 0005ad000002ad1e 0005ad000002ad1f > 0005ad000100d050 > Board ID: 1 > VSD: 1 > PSID: > > --->8--- > > ---8<--- [dmesg output showing ib_mthca load failure] > > <6>ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) > <6>ib_mthca: Initializing 0000:02:00.0 > <6>ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 169 > <7>PCI: Setting latency timer of device 0000:02:00.0 to 64 > <6>e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex > <4>ib_mthca 0000:02:00.0: HCA FW version 4.6.0 is old (4.7.400 is current). > <4>ib_mthca 0000:02:00.0: If you have problems, try updating your HCA FW. > <3>ib_mthca 0000:02:00.0: NOP command failed to generate interrupt (IRQ > 169), aborting. > <3>ib_mthca 0000:02:00.0: BIOS or ACPI interrupt routing problem? > <6>ACPI: PCI interrupt for device 0000:02:00.0 disabled > <4>ib_mthca: probe of 0000:02:00.0 failed with error -16 > > --->8--- > > > ---8<--- [hwinfo & lspci output for HCA] > > blade9:~ # hwinfo > [...] 
> 24: PCI 200.0: 0c06 InfiniBand > [Created at pci.277] > Unique ID: B35A.guWNc33i6_3 > Parent ID: 8otl.l6V0RupyGX6 > SysFS ID: /devices/pci0000:00/0000:00:04.0/0000:02:00.0 > SysFS BusID: 0000:02:00.0 > Hardware Class: unknown > Model: "Mellanox MT25208 InfiniHost III Ex HCA (Tavor compatibility mode)" > Vendor: pci 0x15b3 "Mellanox Technologies" > Device: pci 0x6278 "MT25208 InfiniHost III Ex HCA (Tavor compatibility > mode)" > SubVendor: pci 0x15b3 "Mellanox Technologies" > SubDevice: pci 0x6278 > Revision: 0xa0 > Memory Range: 0xfe900000-0xfe9fffff (rw,non-prefetchable) > Memory Range: 0xdf800000-0xdfffffff (rw,prefetchable) > Memory Range: 0xd0000000-0xd7ffffff (rw,prefetchable) > IRQ: 169 (no events) > Module Alias: "pci:v000015B3d00006278sv000015B3sd00006278bc0Csc06i00" > Driver Info #0: > Driver Status: ib_mthca is active > Driver Activation Cmd: "modprobe ib_mthca" > Config Status: cfg=new, avail=yes, need=no, active=unknown > Attached to: #17 (PCI bridge) > > blade9:~ # lspci -vv > [...] 
> 02:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor > compatibility mode) (rev a0) > Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex HCA (Tavor > compatibility mode) > Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- > Stepping- SERR- FastB2B- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > SERR- Interrupt: pin A routed to IRQ 169 > Region 0: Memory at 00000000fe900000 (64-bit, non-prefetchable) > [size=1M] > Region 2: Memory at 00000000df800000 (64-bit, prefetchable) [size=8M] > Region 4: Memory at 00000000d0000000 (64-bit, prefetchable) > [size=128M] > Capabilities: [40] Power Management version 2 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA > PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [48] Vital Product Data > Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 > Enable- > Address: 0000000000000000 Data: 0000 > Capabilities: [84] MSI-X: Enable- Mask- TabSize=32 > Vector table: BAR=0 offset=00082000 > PBA: BAR=0 offset=00082200 > Capabilities: [60] Express Endpoint IRQ 0 > Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag- > Device: Latency L0s <64ns, L1 unlimited > Device: AtnBtn- AtnInd- PwrInd- > Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported- > Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- > Device: MaxPayload 128 bytes, MaxReadReq 512 bytes > Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8 > Link: Latency L0s unlimited, L1 unlimited > Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch- > Link: Speed 2.5Gb/s, Width x8 > --->8--- Can you try with fw_cmd_doorbell? 
-- MST From ken at novell.com Tue May 16 23:01:21 2006 From: ken at novell.com (Ken L Johnson) Date: Wed, 17 May 2006 00:01:21 -0600 Subject: [openib-general] ib_mthca fails to load with old firmware In-Reply-To: References: <200605161949.14995.ken@novell.com> Message-ID: <200605170001.21953.ken@novell.com> On Tue, 16 May 2006 at 22:05:42 -0700, Roland Dreier wrote: > You could try passing the module option "fw_cmd_doorbell=0" to > ib_mthca. That may work around things. Thanks Roland and Michael, that did it. Just added the following to the /etc/modprobe.conf.local: options ib_mthca fw_cmd_doorbell=0 Regards, -- Ken L Johnson From eitan at mellanox.co.il Tue May 16 23:10:11 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 17 May 2006 09:10:11 +0300 Subject: [openib-general] opensm segfault? Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB8E@mtlexch01.mtl.com> cl_memcpy should have some debug capabilities on top of memcpy ... cl memory management provide means to track all memory allocations, etc. Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Sasha Khapyorsky > Sent: Wednesday, May 17, 2006 2:11 AM > To: Troy Benjegerdes > Cc: openib-general at openib.org > Subject: Re: [openib-general] opensm segfault? > > Hi Troy, > > On 14:41 Tue 16 May , Troy Benjegerdes wrote: > > I got this after an indeterminate amount of time running opensm.. > > May this be reproducible? Or it is completely random failure? 
> > > (gdb) bt > > #0 0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, p_src=0x0, > > count=64) at cl_memory_osd.c:87 > > #1 0x0000000000415053 in osm_pkey_tbl_sync_new_blocks ( > > p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127 > > #2 0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40) > > at osm_pkey_mgr.c:407 > > #3 0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, > > signal=3) > > at osm_state_mgr.c:2243 > > #4 0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback ( > > context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70 > > #5 0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0) > > at cl_dispatcher.c:108 > > #6 0x00002b90b0dc1ca3 in __cl_thread_pool_routine (context=0x583268) > > at cl_threadpool.c:78 > > #7 0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at > > cl_thread.c:61 > > #8 0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0 > > #9 0x00002b90b12c8273 in clone () from /lib/libc.so.6 > > > > > > > > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This > > just seems like excessive uneeded abstraction. > > Absolutely agree with you. > > Sasha. > > > I'm running opensm from subversion rev 7091.. 
> > > > May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn > > 6251:7091M > > > > the only local changes are as follows: > > > > troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff > > Index: osm/opensm/osm_port_info_rcv.c > > =================================================================== > > --- osm/opensm/osm_port_info_rcv.c (revision 7091) > > +++ osm/opensm/osm_port_info_rcv.c (working copy) > > @@ -469,9 +469,14 @@ > > goto Exit; > > } > > > > +#if 0 > > /* Check for IBM eHCA firmware defect in reporting partition > > * enforcement cap */ > > if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == > > IBM_VENDOR_ID) > > p_switch->switch_info.enforce_cap = 0; > > +#endif > > + /* Check for busted divergenet switch on ameslab network */ > > + if (cl_ntoh64(p_node->node_info.node_guid) == 0x00084e0000000152) > > + p_switch->switch_info.enforce_cap = 0; > > > > /* Bail out if this is a switch with no partition enforcement > > * capability */ > > if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0) > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From jgunthorpe at obsidianresearch.com Tue May 16 23:32:12 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 17 May 2006 00:32:12 -0600 Subject: [openib-general] opensm segfault? 
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB8E@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB8E@mtlexch01.mtl.com>
Message-ID: <20060517063212.GC13072@obsidianresearch.com>

On Wed, May 17, 2006 at 09:10:11AM +0300, Eitan Zahavi wrote:
> cl_memcpy should have some debug capabilities on top of memcpy ...
> cl memory management provide means to track all memory allocations, etc.

There are a huge number of canned solutions that provide a way to
debug memory problems without polluting the code with wrapper
functions... You can even fairly easily take your particular tracking
functions and build them into a canned linkable solution.

Wrapping ISO C (and IMHO, SUSv3) functions is almost always a bad
idea. It creates a maintenance pain because people will inevitably add
new code that doesn't use the wrappers. Debugging hooks can always be
integrated in with linker tricks and portability is _always_ better
served by just providing missing ISO and SUSv3 functions on deficient
platforms (using autoconf, libraries and #include_next this can be
made totally seamless)

Jason

From mst at mellanox.co.il Wed May 17 00:00:23 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 17 May 2006 10:00:23 +0300
Subject: [openib-general] Re: ib_mthca fails to load with old firmware
In-Reply-To: <200605170001.21953.ken@novell.com>
References: <200605161949.14995.ken@novell.com> <200605170001.21953.ken@novell.com>
Message-ID: <20060517070023.GU30211@mellanox.co.il>

Quoting r. Ken L Johnson :
> Subject: Re: ib_mthca fails to load with old firmware
>
> On Tue, 16 May 2006 at 22:05:42 -0700, Roland Dreier wrote:
>
> > You could try passing the module option "fw_cmd_doorbell=0" to
> > ib_mthca. That may work around things.
>
> Thanks Roland and Michael, that did it. Just added the following to
> the /etc/modprobe.conf.local:
>
> options ib_mthca fw_cmd_doorbell=0

Hmm.
There have been recent reports on configurations which have trouble working with fw_cmd_doorbell=1, and not all of them old FW. I never saw this in the lab. Roland, should we change fw_cmd_doorbell to 0 by default, until we figure out what is going on? -- MST From zhushisongzhu at yahoo.com Wed May 17 00:59:58 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Wed, 17 May 2006 00:59:58 -0700 (PDT) Subject: [openib-general] OFED-1.0-rc4 need db-devel Message-ID: <20060517075958.29141.qmail@web36910.mail.mud.yahoo.com> I have downloaded OFED-1.0-rc4 for my RHEL 4.3. But I can't build all modules because it needs db-devel. RHEL 4.3 just have db4-devel there is no db-devel. Is there anything I don't know? tks zhu __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From vlad at mellanox.co.il Wed May 17 02:03:34 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 17 May 2006 12:03:34 +0300 Subject: [openib-general] OFED-1.0-rc4 need db-devel In-Reply-To: <20060517075958.29141.qmail@web36910.mail.mud.yahoo.com> References: <20060517075958.29141.qmail@web36910.mail.mud.yahoo.com> Message-ID: <446AE6E6.4010905@mellanox.co.il> Zhu, db-devel package is required to build open_iscsi package RPM. This package is not relevant for RHEL 4.3. There are two options to install OFED-1.0-rc4 on RHEL 4.3 without open_iscsi: 1. Select "Custom installation" and don't choose to install open_iscsi. 2. Edit ofed.conf (created automatically under OFED-1.0-rc4 directory when you run install.sh or build.sh) and set *open_iscsi=n*. Then run: ./install.sh -c ofed.conf Regards, Vladimir zhu shi song wrote: > I have downloaded OFED-1.0-rc4 for my RHEL 4.3. But I > can't build all modules because it needs db-devel. > RHEL 4.3 just have db4-devel there is no db-devel. Is > there anything I don't know? 
> > tks > zhu > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From halr at voltaire.com Wed May 17 03:49:56 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 06:49:56 -0400 Subject: [openib-general] opensm segfault? In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB8E@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB8E@mtlexch01.mtl.com> Message-ID: <1147862568.18971.42909.camel@hal.voltaire.com> On Wed, 2006-05-17 at 02:10, Eitan Zahavi wrote: > cl_memcpy should have some debug capabilities on top of memcpy . I don't see any. Did I miss something ? .. > cl memory management provide means to track all memory allocations, etc. Yes, there is extra memory tracking code for malloc and free. This is a separable item in my mind right now. -- Hal > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: openib-general-bounces at openib.org [mailto:openib-general- > > bounces at openib.org] On Behalf Of Sasha Khapyorsky > > Sent: Wednesday, May 17, 2006 2:11 AM > > To: Troy Benjegerdes > > Cc: openib-general at openib.org > > Subject: Re: [openib-general] opensm segfault? > > > > Hi Troy, > > > > On 14:41 Tue 16 May , Troy Benjegerdes wrote: > > > I got this after an indeterminate amount of time running opensm.. > > > > May this be reproducible? Or it is completely random failure? 
> > > > > (gdb) bt > > > #0 0x00002b90b0dbebf3 in cl_memcpy (p_dest=0x2aaaaac88850, > p_src=0x0, > > > count=64) at cl_memory_osd.c:87 > > > #1 0x0000000000415053 in osm_pkey_tbl_sync_new_blocks ( > > > p_pkey_tbl=0x2aaaaad99228) at osm_pkey.c:127 > > > #2 0x0000000000416687 in osm_pkey_mgr_process (p_osm=0x580e40) > > > at osm_pkey_mgr.c:407 > > > #3 0x000000000043bb22 in osm_state_mgr_process (p_mgr=0x581ad8, > > > signal=3) > > > at osm_state_mgr.c:2243 > > > #4 0x000000000043c88f in __osm_state_mgr_ctrl_disp_callback ( > > > context=0x5819e8, p_data=0x3) at osm_state_mgr_ctrl.c:70 > > > #5 0x00002b90b0db9437 in __cl_disp_worker (context=0x5831f0) > > > at cl_dispatcher.c:108 > > > #6 0x00002b90b0dc1ca3 in __cl_thread_pool_routine > (context=0x583268) > > > at cl_threadpool.c:78 > > > #7 0x00002b90b0dc1ae2 in __cl_thread_wrapper (arg=0x584750) at > > > cl_thread.c:61 > > > #8 0x00002b90b0fe3b1c in start_thread () from /lib/libpthread.so.0 > > > #9 0x00002b90b12c8273 in clone () from /lib/libc.so.6 > > > > > > > > > > > > And why the heck is "cl_memcpy" just a call to 'memcpy' anyway? This > > > just seems like excessive uneeded abstraction. > > > > Absolutely agree with you. > > > > Sasha. > > > > > I'm running opensm from subversion rev 7091.. 
> > > > > > May 10 16:27:53 145969 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn > > > 6251:7091M > > > > > > the only local changes are as follows: > > > > > > troy at opteron1:/usr/src/openib-src/userspace/management$ svn diff > > > Index: osm/opensm/osm_port_info_rcv.c > > > =================================================================== > > > --- osm/opensm/osm_port_info_rcv.c (revision 7091) > > > +++ osm/opensm/osm_port_info_rcv.c (working copy) > > > @@ -469,9 +469,14 @@ > > > goto Exit; > > > } > > > > > > +#if 0 > > > /* Check for IBM eHCA firmware defect in reporting partition > > > * enforcement cap */ > > > if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) > == > > > IBM_VENDOR_ID) > > > p_switch->switch_info.enforce_cap = 0; > > > +#endif > > > + /* Check for busted divergenet switch on ameslab network */ > > > + if (cl_ntoh64(p_node->node_info.node_guid) == > 0x00084e0000000152) > > > + p_switch->switch_info.enforce_cap = 0; > > > > > > /* Bail out if this is a switch with no partition enforcement > > > * capability */ > > > if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0) > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Wed May 17 04:13:57 2006 From: halr at voltaire.com (Hal Rosenstock) 
Date: 17 May 2006 07:13:57 -0400 Subject: [openib-general] [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out Message-ID: <1147864436.18971.43577.camel@hal.voltaire.com> OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out Signed-off-by: Hal Rosenstock Index: opensm/osm_sm_mad_ctrl.c =================================================================== --- opensm/osm_sm_mad_ctrl.c (revision 7202) +++ opensm/osm_sm_mad_ctrl.c (working copy) @@ -803,7 +803,9 @@ __osm_sm_mad_ctrl_send_err_cb( IN osm_madw_t *p_madw ) { osm_sm_mad_ctrl_t* p_ctrl = (osm_sm_mad_ctrl_t*)bind_context; +#if 0 osm_physp_t* p_physp; +#endif ib_api_status_t status; ib_smp_t* p_smp; @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb( lid. */ /* For now - do not add the alternate dr path to the release */ - if (0) - if ( p_madw->mad_addr.dest_lid != 0xFFFF ) +#if 0 + if ( p_madw->mad_addr.dest_lid != 0xFFFF ) + { + p_physp = + osm_get_physp_by_mad_addr(p_ctrl->p_log, + p_ctrl->p_subn, + &(p_madw->mad_addr)); + if (!p_physp) { - p_physp = - osm_get_physp_by_mad_addr(p_ctrl->p_log, - p_ctrl->p_subn, - &(p_madw->mad_addr)); - if (! p_physp) - { - osm_log( p_ctrl->p_log, OSM_LOG_ERROR, - "__osm_sm_mad_ctrl_send_err_cb: ERR 3114: " - "Failed to find the corresponding phys port\n"); - } - else - { - osm_physp_replace_dr_path_with_alternate_dr_path( - p_ctrl->p_log, p_ctrl->p_subn, p_physp, p_madw->h_bind ); - } + osm_log( p_ctrl->p_log, OSM_LOG_ERROR, + "__osm_sm_mad_ctrl_send_err_cb: ERR 3114: " + "Failed to find the corresponding phys port\n"); } + else + { + osm_physp_replace_dr_path_with_alternate_dr_path( + p_ctrl->p_log, p_ctrl->p_subn, p_physp, p_madw->h_bind ); + } + } +#endif /* An error occurred. No response was received to a request MAD. From mst at mellanox.co.il Wed May 17 04:41:33 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 17 May 2006 14:41:33 +0300 Subject: [openib-general] Re: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out In-Reply-To: <1147864436.18971.43577.camel@hal.voltaire.com> References: <1147864436.18971.43577.camel@hal.voltaire.com> Message-ID: <20060517114133.GX30211@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out > > OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out > > Signed-off-by: Hal Rosenstock > @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb( > lid. > */ > /* For now - do not add the alternate dr path to the release */ > - if (0) > - if ( p_madw->mad_addr.dest_lid != 0xFFFF ) > +#if 0 > + if ( p_madw->mad_addr.dest_lid != 0xFFFF ) In my experience, if you compile with -O, gcc does a good enough job of dead code elimination. -- MST From halr at voltaire.com Wed May 17 05:11:40 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 08:11:40 -0400 Subject: [openib-general] Re: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out In-Reply-To: <20060517114133.GX30211@mellanox.co.il> References: <1147864436.18971.43577.camel@hal.voltaire.com> <20060517114133.GX30211@mellanox.co.il> Message-ID: <1147867898.18971.44877.camel@hal.voltaire.com> On Wed, 2006-05-17 at 07:41, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out > > > > OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out > > > > Signed-off-by: Hal Rosenstock > > > > > @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb( > > lid. 
> > */ > > /* For now - do not add the alternate dr path to the release */ > > - if (0) > > - if ( p_madw->mad_addr.dest_lid != 0xFFFF ) > > +#if 0 > > + if ( p_madw->mad_addr.dest_lid != 0xFFFF ) > > In my experience, if you compile with -O, gcc does a good enough job of > dead code elimination. But not all builds are that way though. -- Hal From zhushisongzhu at yahoo.com Wed May 17 05:17:25 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Wed, 17 May 2006 05:17:25 -0700 (PDT) Subject: [openib-general] OFED RC4 also can't support >2000 connections Message-ID: <20060517121725.61335.qmail@web36911.mail.mud.yahoo.com> I have installed OFED RC4 on my RHEL 4.3(2.6.9-34 kernel). I use the same method I told in previous mail. When increasing concurrent sdp connection to 2000. sdp refuse connection in server side. And client can't connect to server through sdp connection forever. OS: RHEL 4.3 (2.6.9-34) IB: OFED RC4 Test Method: Server: LD_PRELOAD=libsdp.so squid -d 10 -f squid.conf( sdp listening on IB0: 193.12.10.14:3129) Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X 193.12.10.14:3129 http://www.google.com/index.html ( IB0: 193.12.10.24) Who know what's wrong with sdp many concurrent connections? I have bought the cards for about 3 weeks, but I can't make them work correctly. Urgent! tks zhu __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! 
Mail has the best spam protection around http://mail.yahoo.com From ogerlitz at voltaire.com Wed May 17 06:00:52 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 17 May 2006 16:00:52 +0300 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> Message-ID: <446B1E84.9020505@voltaire.com> Roland Dreier wrote: > Or> I don't see the niether of the two iscsi updates for 2.6.18 > Or> (both sent by Mike Christie) in your git tree, i was looking > Or> for it all over (in the for-2.6.18 , for-mm, master, for-linus > Or> branches ...). Do i missing anything or you were waiting for > Or> my repost of the patches to pull the iscsi updates? > > Yeah, I haven't pushed it out yet. > > I will be putting iSER into an iser branch of my tree, which I'll ask > Linus to pull once the SCSI changes are in his tree. OK, i have tested iSCSI/iSER with the kernel being built from the for-mm branch of your git tree and it works fine! Can you spare few words whats the difference between the for-2.6.18 and for-mm branches of your git tree? > Or> OK, thanks. Let me know when you have the branch, so i will be > Or> able to test it with this exact code configuration. > > It's there and pushed to master.kernel.org When you say the code is pushed into master.kernel.org are you referring to the mm tree of Andrew Morton? i don't see he has one under kernel.org/git? Or. From mst at mellanox.co.il Wed May 17 06:06:16 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin)
Date: Wed, 17 May 2006 16:06:16 +0300
Subject: [openib-general] Re: Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator
In-Reply-To: <446B1E84.9020505@voltaire.com>
References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <446B1E84.9020505@voltaire.com>
Message-ID: <20060517130616.GA30211@mellanox.co.il>

Quoting r. Or Gerlitz :
> When you say the code is pushed into master.kernel.org are you referring
> to the mm tree of Andrew Morton? i don't see he has one under
> kernel.org/git?

Andrew does not use git for development.

--
MST

From eitan at mellanox.co.il Wed May 17 06:11:11 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Wed, 17 May 2006 16:11:11 +0300
Subject: [openib-general] OFED RC4 also can't support >2000 connections
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB9B@mtlexch01.mtl.com>

Hi Zhu,

If you are using libsdp.conf to select which ports should map to SDP
and which to TCP you might run out of resources for tracking the opened
sockets. Try increasing the following constant in libsdp:

libsdp/src/port.c line 48:
#define MAPPED_SOCKET_MAX 1024

to something like:
#define MAPPED_SOCKET_MAX 10000

Or, if you can use SDP sockets only (your config file is empty anyway):

SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so squid -d 10 -f squid.conf
SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X 193.12.10.14:3129

Hope this fixes the issue you see

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O.
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of zhu shi song > Sent: Wednesday, May 17, 2006 3:17 PM > To: openib-general at openib.org > Subject: [openib-general] OFED RC4 also can't support >2000 connections > > I have installed OFED RC4 on my RHEL 4.3(2.6.9-34 > kernel). I use the same method I told in previous > mail. When increasing concurrent sdp connection to > 2000. sdp refuse connection in server side. And client > can't connect to server through sdp connection > forever. > > OS: RHEL 4.3 (2.6.9-34) > IB: OFED RC4 > Test Method: > Server: LD_PRELOAD=libsdp.so squid -d 10 -f > squid.conf( sdp listening on IB0: 193.12.10.14:3129) > Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n > 2000 -X 193.12.10.14:3129 > http://www.google.com/index.html ( IB0: 193.12.10.24) > > > Who know what's wrong with sdp many concurrent > connections? I have bought the cards for about 3 > weeks, but I can't make them work correctly. Urgent! > > tks > zhu > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam > protection around > http://mail.yahoo.com >
> _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From john.blackwood at ccur.com Wed May 17 06:15:38 2006 From: john.blackwood at ccur.com (John Blackwood) Date: Wed, 17 May 2006 09:15:38 -0400 Subject: [openib-general] The AF_INET_RDS value Message-ID: <446B21FA.1040203@ccur.com> Hello, I noticed that in ulp/rds/rds_inet.h, the value for AF_INET_RDS is: #define AF_INET_RDS 30 But in include/linux/socket.h, there is already a AF_TIPC with the same value: #define AF_WANPIPE 25 /* Wanpipe API Sockets */ #define AF_LLC 26 /* Linux LLC */ #define AF_TIPC 30 /* TIPC sockets */ #define AF_BLUETOOTH 31 /* Bluetooth sockets */ #define AF_MAX 32 /* For now.. */ Just wondering if the AF_INET_RDS value should be changed? Thanks. From ogerlitz at voltaire.com Wed May 17 06:20:25 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 17 May 2006 16:20:25 +0300 Subject: [openib-general] (was: slab error while removing ib_mad) testing IB of a kernel before its release In-Reply-To: References: Message-ID: <446B2319.9030204@voltaire.com> Roland Dreier wrote: > Or> I think you were on vacation when i posted this, there were > Or> two responses saying they were not able to reproduce it, but > Or> no one was trying 2.6.17-X > > Not sure why you expect me to solve this -- other than the fact that I > am a great debugger ;) Let me clarify a little: The test case for itself (probing out a module loaded by the pci hotplug subsystem) is kind of rare and its not the issue (I am doing it when replacing the ib stack with newer code) When i posted the original report, i got responses from two people both saying they have tried it with this or that flavor of the current stable kernel (2.6.16) and that the problem does not reproduce
(sure...). The impression i was getting from those responses and the lack of others is that (say) almost no one of the openib maintainers test infiniband with the "next" kernel which is not released yet. And if we don't test it, sure we can't expect it to work, no less. That's the point i wanted to make later in that thread. I opted for deferring this discussion for a time you are around, so this is why i write it only now. Or. From rdreier at cisco.com Wed May 17 07:38:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 07:38:17 -0700 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: <446B1E84.9020505@voltaire.com> (Or Gerlitz's message of "Wed, 17 May 2006 16:00:52 +0300") References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <446B1E84.9020505@voltaire.com> Message-ID: Or> Can you spare few words whats the difference between the Or> for-2.6.18 and for-mm branches of your git tree? for-mm is what Andrew pulls to get patches for -mm. It has things that I think should be seen in -mm, but which I am not ready to queue in for-2.6.18. You can use git show-branch or gitk to visualize exactly how the branches relate. Or> When you say the code is pushed into master.kernel.org are you Or> referring to the mm tree of Andrew Morton? i don't see he has Or> one under kernel.org/git? No, I mean it's in my tree on master.kernel.org, rather than just sitting on my local hard disk. - R.
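[Editorial sketch, not part of the original thread.] The git show-branch comparison mentioned above can be tried on a throwaway repository. Only the two branch names come from the thread; the temporary directory, the file name and the commit messages are made up for illustration, and this does not reflect the actual contents of Roland's tree.

```shell
#!/bin/sh
# Build a scratch repository with two branches named like the ones in the
# thread, then ask git how they relate. All commits here are hypothetical.
set -e

tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.email "example@example.invalid"
git config user.name "Example"

# One shared commit that both branches contain.
echo base > file
git add file
git commit -q -m "base commit"

# for-2.6.18 stays at the shared commit; for-mm gets one extra commit,
# standing in for work that is not yet queued for 2.6.18.
git branch for-2.6.18
git checkout -q -b for-mm
echo mm >> file
git commit -q -a -m "mm-only change"

# List the commits on each branch and mark which branch contains which.
git show-branch for-2.6.18 for-mm
```

In the listing, a commit marked in only one column is present on only that branch, so the "mm-only change" commit shows up for for-mm but not for for-2.6.18.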
From rdreier at cisco.com Wed May 17 07:40:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 07:40:07 -0700 Subject: [openib-general] Re: In-Reply-To: <446B2319.9030204@voltaire.com> (Or Gerlitz's message of "Wed, 17 May 2006 16:20:25 +0300") References: <446B2319.9030204@voltaire.com> Message-ID: Or> The impression i was getting from those responses and the lack Or> of others is that (say) almost no one of the openib Or> maintainers test infiniband with the "next" kernel which is Or> not released yet. Yes, I agree. That's why I think we should get rid of the "linux-kernel" part of the svn tree entirely. Because everyone who wants to test new code seems to run last stable kernel + svn drivers instead of the new development kernel. - R. From rdreier at cisco.com Wed May 17 07:41:03 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 07:41:03 -0700 Subject: [openib-general] Re: ib_mthca fails to load with old firmware In-Reply-To: <20060517070023.GU30211@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 17 May 2006 10:00:23 +0300") References: <200605161949.14995.ken@novell.com> <200605170001.21953.ken@novell.com> <20060517070023.GU30211@mellanox.co.il> Message-ID: Michael> Hmm. There have been recent reports on configurations Michael> which have trouble working with fw_cmd_doorbell=1, and Michael> not all of them old FW. I never saw this in the lab. Michael> Roland, should we change fw_cmd_doorbell to 0 by default, Michael> until we figure out what is going on? Yes, it's looking like that option is causing problems. I will put a patch changing the default to 0 into 2.6.17. - R. From rdreier at cisco.com Wed May 17 07:44:42 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 07:44:42 -0700 Subject: [openib-general] [PATCH] IB: Make needlessly global ib_mad_cache static Message-ID: Signed-off-by: Roland Dreier --- Any reason not to apply this?
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 5ad41a6..92c7362 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -45,8 +45,7 @@ MODULE_DESCRIPTION("kernel IB MAD API"); MODULE_AUTHOR("Hal Rosenstock"); MODULE_AUTHOR("Sean Hefty"); - -kmem_cache_t *ib_mad_cache; +static kmem_cache_t *ib_mad_cache; static struct list_head ib_mad_port_list; static u32 ib_mad_client_id = 0; diff --git a/drivers/infiniband/core/mad_priv.h b/drivers/infiniband/core/mad_priv.h index b4fa28d..d147f3b 100644 --- a/drivers/infiniband/core/mad_priv.h +++ b/drivers/infiniband/core/mad_priv.h @@ -212,8 +212,6 @@ struct ib_mad_port_private { struct ib_mad_qp_info qp_info[IB_MAD_QPS_CORE]; }; -extern kmem_cache_t *ib_mad_cache; - int ib_send_mad(struct ib_mad_send_wr_private *mad_send_wr); struct ib_mad_send_wr_private * From mst at mellanox.co.il Wed May 17 07:48:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 17 May 2006 17:48:30 +0300 Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4 In-Reply-To: References: <20060509150426.GI21036@mellanox.co.il> Message-ID: <20060517144830.GE30211@mellanox.co.il> Quoting r. Don.Albert at bull.com : > Subject: Re: NOP problem in ib_mthca on OFED RC4 > > > Michael, > > > > > Which FW revision do you have? > > > > > > > The "ibstat" command shows: > > > > > > CA type: MT25204 > > > Number of ports: 1 > > > Firmware version: 1.0.800 > > > Hardware version: a0 > > > Node GUID: 0x0002c90200216dc4 > > > System image GUID: 0x0002c90200216dc7 > > > > > > -Don Albert- > > > > > > > Yes, that's the latest revision. Hmm. > > > > What about the other thing I mentioned in my first message: the "lspci" command complains about the board slot that the HCA is plugged into: > > pcilib: Resource 2 in /sys/bus/pci/devices/0000:03:00.0/resource has a 64-bit address, ignoring > .... 
> 03:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20) > > I also found out that on this machine the HCA is plugged into a 16X PCI-e slot, which is different than the other machine which is working, where the slot is 8X. > > Bear in mind, however, that both machines were previously working with the 2.6.9-34 kernel with the backport patches and the OpenIB svn 6500 code. Did something happen in 2.6.16, or am I missing a patch? > > -Don Albert- > Could you please give more detail on the exact system that had/has this problem? Model, chipset revision, full lspci -v output, etc. Also, is there some way to login to such a system there remotely? Thanks a bunch, -- MST From rdreier at cisco.com Wed May 17 07:49:04 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 07:49:04 -0700 Subject: [openib-general] Re: ib_mthca fails to load with old firmware In-Reply-To: (Roland Dreier's message of "Wed, 17 May 2006 07:41:03 -0700") References: <200605161949.14995.ken@novell.com> <200605170001.21953.ken@novell.com> <20060517070023.GU30211@mellanox.co.il> Message-ID: OK, I put this into my 2.6.17 branch: diff-tree 1db76c14d215c8b26024dd532de3dcaf66ea30f7 (from 032ebf2620ef99a4fedaa0f77dc2272095ac5863) Author: Roland Dreier Date: Wed May 17 07:48:07 2006 -0700 IB/mthca: Make fw_cmd_doorbell default to 0 Setting fw_cmd_doorbell allows FW command to be queued using posted writes instead of requiring polling on a "go" bit, so it should be a performance boost. However, the option causes problems with at least some device/firmware combinations, so set the default to 0 until we understand what's going on better. 
Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 1985b5d..798e13e 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -182,7 +182,7 @@ struct mthca_cmd_context { u8 status; }; -static int fw_cmd_doorbell = 1; +static int fw_cmd_doorbell = 0; module_param(fw_cmd_doorbell, int, 0644); MODULE_PARM_DESC(fw_cmd_doorbell, "post FW commands through doorbell page if nonzero " "(and supported by FW)"); From ishai at mellanox.co.il Wed May 17 07:55:20 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Wed, 17 May 2006 17:55:20 +0300 Subject: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading In-Reply-To: References: Message-ID: <20060517145520.GA4691@mellanox.co.il> On Wed, May 17, 2006 at 02:56:58AM +0300, Roland Dreier wrote: > BTW, I think the patch below is correct as well. This avoids problems > where the SRP driver waits forever for a completion, for example if > sending the DREQ fails because the connection has already been > disconnected by the target. > > Does this scenario seem like the deadlock you thought you saw? > > --- linux-kernel/infiniband/ulp/srp/ib_srp.c (revision 7245) > +++ linux-kernel/infiniband/ulp/srp/ib_srp.c (working copy) > @@ -342,7 +342,10 @@ static void srp_disconnect_target(struct > /* XXX should send SRP_I_LOGOUT request */ > > init_completion(&target->done); > - ib_send_cm_dreq(target->cm_id, NULL, 0); > + if (ib_send_cm_dreq(target->cm_id, NULL, 0)) { > + printk(KERN_DEBUG PFX "Sending CM DREQ failed\n"); > + return; > + } > wait_for_completion(&target->done); > } > I don't think this caused the deadlock I had. Still it looks like an important patch. 
-- Ishai Rabinovitz From ishai at mellanox.co.il Wed May 17 07:57:45 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Wed, 17 May 2006 17:57:45 +0300 Subject: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading In-Reply-To: References: Message-ID: <20060517145745.GB4691@mellanox.co.il> On Wed, May 17, 2006 at 02:55:57AM +0300, Roland Dreier wrote: > Hmm, this doesn't seem right to me. If I try this, then I get a crash > because the scsi_host is already gone after the first put. I verified > that the reference count is 1 before these puts, and with the > unmodified module I don't see anything left in /sys/class/scsi_host > after unloading the module. > > What kernel are you seeing problems with? I'm testing with an > up-to-date git kernel, although I doubt it makes a difference (did > SCSI reference counting change recently??). > > I do think there are some extra scsi_host_put() calls in > srp_remove_work() -- I think the double scsi_host_put() dates back to > a version (which I may never even have checked in) where there was a > scsi_host_get() to avoid the scsi_host going away between the > schedule_work() and srp_remove_work() actually running. > > So the patch below seems correct to me. > > What do you think? I could not reproduce the problem again, so this patch works for me. -- Ishai Rabinovitz From sweitzen at cisco.com Wed May 17 08:35:40 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 17 May 2006 08:35:40 -0700 Subject: [openib-general] OFED-1.0-rc4 need db-devel Message-ID: > db-devel package is required to build open_iscsi package RPM. > This package is not relevant for RHEL 4.3. > There are two options to install OFED-1.0-rc4 on RHEL 4.3 without > open_iscsi: > 1. Select "Custom installation" and don't choose to install > open_iscsi. > 2. Edit ofed.conf (created automatically under OFED-1.0-rc4 directory > when you run install.sh or build.sh) and set *open_iscsi=n*. 
> Then run: > ./install.sh -c ofed.conf Why don't we ignore these packages on RHEL4 U3, just like we ignore uDAPL on ppc64? Scott From dotanb at mellanox.co.il Wed May 17 08:35:44 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 17 May 2006 18:35:44 +0300 Subject: [openib-general] [librdmacm] changes to cmatose to return a value different than 0 when there is a failure Message-ID: <200605171835.44079.dotanb@mellanox.co.il> Added checks to the return values of all of the functions that may fail (in order to add this test to the regression system). Signed-off-by: Dotan Barak Index: last_stable/src/userspace/librdmacm/examples/cmatose.c =================================================================== --- last_stable.orig/src/userspace/librdmacm/examples/cmatose.c 2006-05-17 18:30:35.000000000 +0300 +++ last_stable/src/userspace/librdmacm/examples/cmatose.c 2006-05-17 18:31:35.000000000 +0300 @@ -219,7 +219,7 @@ static void connect_error(void) test.connects_left--; } -static void addr_handler(struct cmatest_node *node) +static int addr_handler(struct cmatest_node *node) { int ret; @@ -228,9 +228,10 @@ static void addr_handler(struct cmatest_ printf("cmatose: resolve route failed: %d\n", ret); connect_error(); } + return ret; } -static void route_handler(struct cmatest_node *node) +static int route_handler(struct cmatest_node *node) { struct rdma_conn_param conn_param; int ret; @@ -252,9 +253,10 @@ static void route_handler(struct cmatest printf("cmatose: failure connecting: %d\n", ret); goto err; } - return; + return 0; err: connect_error(); + return ret; } static int connect_handler(struct rdma_cm_id *cma_id) @@ -305,10 +307,10 @@ static int cma_handler(struct rdma_cm_id switch (event->event) { case RDMA_CM_EVENT_ADDR_RESOLVED: - addr_handler(cma_id->context); + ret = addr_handler(cma_id->context); break; case RDMA_CM_EVENT_ROUTE_RESOLVED: - route_handler(cma_id->context); + ret = route_handler(cma_id->context); break; case 
RDMA_CM_EVENT_CONNECT_REQUEST: ret = connect_handler(cma_id); @@ -420,35 +422,45 @@ static int poll_cqs(void) return 0; } -static void connect_events(void) +static int connect_events(void) { struct rdma_cm_event *event; - int err = 0; + int err = 0, ret = 0; while (test.connects_left && !err) { err = rdma_get_cm_event(test.channel, &event); if (!err) { cma_handler(event->id, event); rdma_ack_cm_event(event); + } else { + printf("cmatose: failure in rdma_get_cm_event in connect events\n"); + ret = err; } } + + return ret; } -static void disconnect_events(void) +static int disconnect_events(void) { struct rdma_cm_event *event; - int err = 0; + int err = 0, ret = 0; while (test.disconnects_left && !err) { err = rdma_get_cm_event(test.channel, &event); if (!err) { cma_handler(event->id, event); rdma_ack_cm_event(event); + } else { + printf("cmatose: failure in rdma_get_cm_event in disconnect events\n"); + ret = err; } } + + return ret; } -static void run_server(void) +static int run_server(void) { struct rdma_cm_id *listen_id; int i, ret; @@ -457,7 +469,7 @@ static void run_server(void) ret = rdma_create_id(test.channel, &listen_id, &test); if (ret) { printf("cmatose: listen request failed\n"); - return; + return ret; } test.src_in.sin_family = PF_INET; @@ -465,7 +477,7 @@ static void run_server(void) ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { printf("cmatose: bind address failed: %d\n", ret); - return; + return ret; } ret = rdma_listen(listen_id, 0); @@ -474,16 +486,21 @@ static void run_server(void) goto out; } - connect_events(); + ret = connect_events(); + if (ret) + goto out; if (message_count) { printf("initiating data transfers\n"); - for (i = 0; i < connections; i++) - if (post_sends(&test.nodes[i])) + for (i = 0; i < connections; i++) { + ret = post_sends(&test.nodes[i]); + if (ret) goto out; + } printf("receiving data transfers\n"); - if (poll_cqs()) + ret = poll_cqs(); + if (ret) goto out; printf("data transfers complete\n"); @@ -497,10 
+514,13 @@ static void run_server(void) rdma_disconnect(test.nodes[i].cma_id); } - disconnect_events(); + ret = disconnect_events(); + printf("disconnected\n"); + out: rdma_destroy_id(listen_id); + return ret; } static int get_addr(char *dst, struct sockaddr_in *addr) @@ -525,20 +545,20 @@ out: return ret; } -static void run_client(char *dst, char *src) +static int run_client(char *dst, char *src) { - int i, ret; + int i, ret, ret2; printf("cmatose: starting client\n"); if (src) { ret = get_addr(src, &test.src_in); if (ret) - return; + return ret; } ret = get_addr(dst, &test.dst_in); if (ret) - return; + return ret; test.dst_in.sin_port = 7471; @@ -550,30 +570,44 @@ static void run_client(char *dst, char * if (ret) { printf("cmatose: failure getting addr: %d\n", ret); connect_error(); + return ret; } } - connect_events(); + ret = connect_events(); + if (ret) + goto out; if (message_count) { printf("receiving data transfers\n"); - if (poll_cqs()) + ret = poll_cqs(); + if (ret) goto out; printf("sending replies\n"); - for (i = 0; i < connections; i++) - if (post_sends(&test.nodes[i])) + for (i = 0; i < connections; i++) { + ret = post_sends(&test.nodes[i]); + if (ret) goto out; + } printf("data transfers complete\n"); } + + ret = 0; out: - disconnect_events(); + ret2 = disconnect_events(); + if (ret2) + ret = ret2; + + return ret; } int main(int argc, char **argv) { + int rc; + if (argc > 3) { printf("usage: %s [server_addr [src_addr]]\n", argv[0]); exit(1); @@ -595,12 +629,14 @@ int main(int argc, char **argv) exit(1); if (is_server) - run_server(); + rc = run_server(); else - run_client(argv[1], (argc == 3) ? argv[2] : NULL); + rc = run_client(argv[1], (argc == 3) ? 
argv[2] : NULL); printf("test complete\n"); destroy_nodes(); rdma_destroy_event_channel(test.channel); - return 0; + + printf("return status %d\n", rc); + return rc; } From vlad at mellanox.co.il Wed May 17 08:40:42 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 17 May 2006 18:40:42 +0300 Subject: [openib-general] OFED-1.0-rc4 need db-devel In-Reply-To: References: Message-ID: <446B43FA.7000709@mellanox.co.il> Scott Weitzenkamp (sweitzen) wrote: >> db-devel package is required to build open_iscsi package RPM. >> This package is not relevant for RHEL 4.3. >> There are two options to install OFED-1.0-rc4 on RHEL 4.3 without >> open_iscsi: >> 1. Select "Custom installation" and don't choose to install >> open_iscsi. >> 2. Edit ofed.conf (created automatically under OFED-1.0-rc4 directory >> when you run install.sh or build.sh) and set *open_iscsi=n*. >> Then run: >> ./install.sh -c ofed.conf >> > > Why don't we ignore these packages on RHEL4 U3, just like we ignore > uDAPL on ppc64? > > Scott > > We will do this in OFED-1.0-rc5. Vladimir From sweitzen at cisco.com Wed May 17 08:40:50 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 17 May 2006 08:40:50 -0700 Subject: [openib-general] ib_mthca fails to load with old firmware Message-ID: > Ken> I'm running into a problem when I try to use the OFED RC4 > Ken> release on some blade systems that have TopSpin HCA daughter > Ken> cards installed (actually Mellanox). I'm trying to figure out > Ken> how to update the firmware to the latest [ > Ken> http://mellanox.com/support/firmware_table.php ] but it seems > Ken> I must know the PSID so I can grab the right firmware > Ken> image. Can anyone point me in the right direction here? > > For blade HCAs you should contact the HCA vendor for firmware updates. > > You could try passing the module option "fw_cmd_doorbell=0" to > ib_mthca. That may work around things. > > - R. What kind of blade systems are these? 
For some blade systems, Cisco provides HCA firmware that has been configured to provide better signal integrity. If you run /usr/local/ofed/sbin/tvflash -i, I can then tell which firmware you need. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems From ishai at mellanox.co.il Wed May 17 08:40:04 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Wed, 17 May 2006 18:40:04 +0300 Subject: [openib-general] SRP [PATCH] Looks like a potantial bug Message-ID: <20060517154004.GA5091@mellanox.co.il> Hi, While doing a code review I found a potential bug. I did not manage to execute a test to check this code. Please take a look: Signed-off-by: Ishai Rabinovitz -------------------------------------- Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 16:24:24.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 17:13:47.000000000 +0300 @@ -1326,7 +1326,7 @@ static int srp_reset_device(struct scsi_ list_for_each_entry_safe(req, tmp, &target->req_queue, list) if (req->scmnd->device == scmnd->device) { req->scmnd->result = DID_RESET << 16; - scmnd->scsi_done(scmnd); + req->scmnd->scsi_done(scmnd); srp_remove_req(target, req); } -- Ishai Rabinovitz From mst at mellanox.co.il Wed May 17 08:46:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 17 May 2006 18:46:43 +0300 Subject: [openib-general] Re: In-Reply-To: References: <446B2319.9030204@voltaire.com> Message-ID: <20060517154643.GG30211@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: > That's why I think we should get rid of the > "linux-kernel" part of the svn tree entirely. Because everyone who > wants to test new code seems to run last stable kernel + svn drivers > instead of the new development kernel. > > - R. Yea, we are going that way. 
Soon all we'll need will be a git tree that we can use for development. BTW, how easy is it to get an account at kernel.org? But, I think it's still useful to make it possible for people to test development snapshots on stable kernels simply because we'll get more testing and feedback this way. One way would be to put snapshots under https://openib.org/downloads/ -- MST From dotanb at mellanox.co.il Wed May 17 08:47:27 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 17 May 2006 18:47:27 +0300 Subject: [openib-general] compilation warning in diags tools Message-ID: <200605171847.27448.dotanb@mellanox.co.il> Hi. Here is a compilation warning when using gcc 3.4.5: src/grouping.c: In function `get_router_slot': src/grouping.c:213: warning: implicit declaration of function `calloc' /bin/sh ./libtool --tag=CC --mode=link gcc -m64 -L../libibcommon -libcommon -L../libibumad -libumad -L../osm/opensm/.libs -lopensm -L../osm/libvendor/.libs -losmvendor -L../osm/complib/.libs -losmcomp -o src/ibnetdiscover src_ibnetdiscover-ibnetdiscover.o src_ibnetdiscover-grouping.o ../libibcommon/libibcommon.la ../libibumad/libibumad.la ../libibmad/libibmad.la (i think that stdlib.h should be included to prevent this warning) thanks Dotan From mst at mellanox.co.il Wed May 17 08:51:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 17 May 2006 18:51:25 +0300 Subject: [openib-general] multcast join failed Message-ID: <20060517155125.GH30211@mellanox.co.il> Hi, Roland! With svn trunk, I started getting the following on one machine: ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 and I can't ping this machine over ipoib. Any idea?
-- MST From ishai at mellanox.co.il Wed May 17 08:50:44 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Wed, 17 May 2006 18:50:44 +0300 Subject: [openib-general] SRP [PATCH] Looks like a potantial bug In-Reply-To: <20060517154004.GA5091@mellanox.co.il> References: <20060517154004.GA5091@mellanox.co.il> Message-ID: <20060517155044.GA5319@mellanox.co.il> On Wed, May 17, 2006 at 06:40:04PM +0300, Ishai Rabinovitz wrote: > Hi, > > While doing a code review I found a potential bug. > I did not manage to execute a test to check this code. > Please take a look: Sorry, I made a mistake in the patch. Please look at this one. In srp_reconnect_target it uses req->scmnd->scsi_done(req->scmnd); (like in the patch) Ishai > Signed-off-by: Ishai Rabinovitz > -------------------------------------- > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > =================================================================== > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 16:24:24.000000000 +0300 > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 17:13:47.000000000 +0300 > @@ -1326,7 +1326,7 @@ static int srp_reset_device(struct scsi_ > list_for_each_entry_safe(req, tmp, &target->req_queue, list) > if (req->scmnd->device == scmnd->device) { > req->scmnd->result = DID_RESET << 16; > - scmnd->scsi_done(scmnd); > + req->scmnd->scsi_done(req->scmnd); > srp_remove_req(target, req); > } > > -- > Ishai Rabinovitz -- Ishai Rabinovitz From rdreier at cisco.com Wed May 17 09:04:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 09:04:39 -0700 Subject: [openib-general] Re: multcast join failed In-Reply-To: <20060517155125.GH30211@mellanox.co.il> (Michael S.
Tsirkin's message of "Wed, 17 May 2006 18:51:25 +0300") References: <20060517155125.GH30211@mellanox.co.il> Message-ID: > With svn trunk, I started getting the following on one machine: > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > and I can't ping this machine over ipoib. > Any idea? No, nothing of significance has changed in ipoib for a while. - R. From rdreier at cisco.com Wed May 17 09:07:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 09:07:09 -0700 Subject: [openib-general] Re: In-Reply-To: <20060517154643.GG30211@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 17 May 2006 18:46:43 +0300") References: <446B2319.9030204@voltaire.com> <20060517154643.GG30211@mellanox.co.il> Message-ID: Michael> Yea, we are going that way. Soon all we'll need will be Michael> a git tree that we can used for development. BTW, how Michael> easy is it to get an account at kernel.org? It's not hard if you have some history as a kernel developer. Of course hosting a git tree is pretty easy as well. Michael> But, I think it's still useful to make it possible for Michael> people to test development snapshots on stable kernels Michael> simply because we'll get more testing and feedback this Michael> way. It's fine except when API changes force us to diverge from upstream. Then it becomes a hassle. - R. 
From halr at voltaire.com Wed May 17 09:14:09 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 12:14:09 -0400 Subject: [openib-general] [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines Message-ID: <1147882436.18971.50423.camel@hal.voltaire.com> OpenSM: Use memory routines directly and eliminate cl_mem* routines as these routines are part of ISO C Signed-off-by: Hal Rosenstock Index: osm/include/opensm/osm_port.h =================================================================== --- osm/include/opensm/osm_port.h (revision 7286) +++ osm/include/opensm/osm_port.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -1296,7 +1296,7 @@ static inline void osm_port_construct( IN osm_port_t* const p_port ) { - cl_memclr( p_port, sizeof(*p_port) ); + memset( p_port, 0, sizeof(*p_port) ); cl_qlist_init( &p_port->mcm_list ); } /* Index: osm/include/opensm/osm_madw.h =================================================================== --- osm/include/opensm/osm_madw.h (revision 7286) +++ osm/include/opensm/osm_madw.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -489,8 +489,8 @@ osm_madw_construct( Don't touch the pool_item since that is an opaque object. Clear all other objects in the mad wrapper. 
*/ - cl_memclr( ((uint8_t *)p_madw) + sizeof( cl_pool_item_t ), - sizeof(*p_madw) - sizeof( cl_pool_item_t ) ); + memset( ((uint8_t *)p_madw) + sizeof( cl_pool_item_t ), 0, + sizeof(*p_madw) - sizeof( cl_pool_item_t ) ); } /* * PARAMETERS Index: osm/include/opensm/osm_mcm_info.h =================================================================== --- osm/include/opensm/osm_mcm_info.h (revision 7286) +++ osm/include/opensm/osm_mcm_info.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -108,7 +108,7 @@ static inline void osm_mcm_info_construct( IN osm_mcm_info_t* const p_mcm ) { - cl_memclr( p_mcm, sizeof(*p_mcm) ); + memset( p_mcm, 0, sizeof(*p_mcm) ); } /* * PARAMETERS Index: osm/include/opensm/osm_path.h =================================================================== --- osm/include/opensm/osm_path.h (revision 7286) +++ osm/include/opensm/osm_path.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -121,7 +121,7 @@ osm_dr_path_construct( IN osm_dr_path_t* const p_path ) { /* The first location in the path array is reserved. 
*/ - cl_memclr( p_path, sizeof(*p_path) ); + memset( p_path, 0, sizeof(*p_path) ); p_path->h_bind = OSM_BIND_INVALID_HANDLE; } @@ -168,7 +168,7 @@ osm_dr_path_init( CL_ASSERT( hop_count < IB_SUBNET_PATH_HOPS_MAX ); p_path->h_bind = h_bind; p_path->hop_count = hop_count; - cl_memcpy( p_path->path, path, IB_SUBNET_PATH_HOPS_MAX ); + memcpy( p_path->path, path, IB_SUBNET_PATH_HOPS_MAX ); } /* Index: osm/include/opensm/osm_port_profile.h =================================================================== --- osm/include/opensm/osm_port_profile.h (revision 7286) +++ osm/include/opensm/osm_port_profile.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -125,7 +125,7 @@ osm_port_prof_construct( IN osm_port_profile_t* const p_prof ) { CL_ASSERT( p_prof ); - cl_memclr( p_prof, sizeof(*p_prof) ); + memset( p_prof, 0, sizeof(*p_prof) ); } /* * PARAMETERS Index: osm/include/opensm/osm_mtree.h =================================================================== --- osm/include/opensm/osm_mtree.h (revision 7286) +++ osm/include/opensm/osm_mtree.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -154,7 +154,7 @@ static inline void osm_mtree_node_construct( IN osm_mtree_node_t* const p_mtn ) { - cl_memclr( p_mtn, sizeof(*p_mtn) ); + memset( p_mtn, 0, sizeof(*p_mtn) ); } /* * PARAMETERS Index: osm/include/opensm/osm_lin_fwd_tbl.h =================================================================== --- osm/include/opensm/osm_lin_fwd_tbl.h (revision 7286) +++ osm/include/opensm/osm_lin_fwd_tbl.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -351,7 +351,7 @@ osm_lin_fwd_tbl_set_block( if( lid_start + num_lids > p_tbl->size ) return( IB_INVALID_PARAMETER ); - cl_memcpy( &p_tbl->port_tbl[lid_start], p_block, num_lids ); + memcpy( &p_tbl->port_tbl[lid_start], p_block, num_lids ); return( IB_SUCCESS ); } /* Index: osm/include/complib/cl_memory.h =================================================================== --- osm/include/complib/cl_memory.h (revision 7286) +++ osm/include/complib/cl_memory.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -437,7 +437,7 @@ cl_malloc( * environments. * * SEE ALSO -* Memory Management, cl_free, cl_zalloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp +* Memory Management, cl_free, cl_zalloc **********/ @@ -468,7 +468,7 @@ cl_zalloc( * environments. 
* * SEE ALSO -* Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp +* Memory Management, cl_free, cl_malloc **********/ @@ -503,142 +503,6 @@ cl_free( **********/ -/****f* Public: Memory Management/cl_memset -* NAME -* cl_memset -* -* DESCRIPTION -* The cl_memset function sets every byte in a memory range to a given value. -* -* SYNOPSIS -*/ -void -cl_memset( - IN void* const p_memory, - IN const uint8_t fill, - IN const size_t count ); -/* -* PARAMETERS -* p_memory -* [in] Pointer to a memory block. -* -* fill -* [in] Byte value with which to fill the memory. -* -* count -* [in] Number of bytes to set. -* -* RETURN VALUE -* This function does not return a value. -* -* SEE ALSO -* Memory Management, cl_memclr, cl_memcpy, cl_memcmp -**********/ - - -/****f* Public: Memory Management/cl_memclr -* NAME -* cl_memclr -* -* DESCRIPTION -* The cl_memclr function sets every byte in a memory range to zero. -* -* SYNOPSIS -*/ -static inline void -cl_memclr( - IN void* const p_memory, - IN const size_t count ) -{ - cl_memset( p_memory, 0, count ); -} -/* -* PARAMETERS -* p_memory -* [in] Pointer to a memory block. -* -* count -* [in] Number of bytes to set. -* -* RETURN VALUE -* This function does not return a value. -* -* SEE ALSO -* Memory Management, cl_memset, cl_memcpy, cl_memcmp -**********/ - - -/****f* Public: Memory Management/cl_memcpy -* NAME -* cl_memcpy -* -* DESCRIPTION -* The cl_memcpy function copies a given number of bytes from -* one buffer to another. -* -* SYNOPSIS -*/ -void* -cl_memcpy( - IN void* const p_dest, - IN const void* const p_src, - IN const size_t count ); -/* -* PARAMETERS -* p_dest -* [in] Pointer to the buffer being copied to. -* -* p_src -* [in] Pointer to the buffer being copied from. -* -* count -* [in] Number of bytes to copy from the source buffer to the -* destination buffer. -* -* RETURN VALUE -* This function does not return a value. 
-* -* SEE ALSO -* Memory Management, cl_memset, cl_memclr, cl_memcmp -**********/ - - -/****f* Public: Memory Management/cl_memcmp -* NAME -* cl_memcmp -* -* DESCRIPTION -* The cl_memcmp function compares two memory buffers. -* -* SYNOPSIS -*/ -int32_t -cl_memcmp( - IN const void* const p_mem, - IN const void* const p_ref, - IN const size_t count ); -/* -* PARAMETERS -* p_mem -* [in] Pointer to a memory block being compared. -* -* p_ref -* [in] Pointer to the reference memory block to compare against. -* -* count -* [in] Number of bytes to compare. -* -* RETURN VALUES -* Returns less than zero if p_mem is less than p_ref. -* -* Returns greater than zero if p_mem is greater than p_ref. -* -* Returns zero if the two memory regions are the identical. -* -* SEE ALSO -* Memory Management, cl_memset, cl_memclr, cl_memcpy -**********/ - /****f* Public: Memory Management/cl_get_pagesize * NAME * cl_get_pagesize Index: osm/include/complib/cl_byteswap.h =================================================================== --- osm/include/complib/cl_byteswap.h (revision 7286) +++ osm/include/complib/cl_byteswap.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -528,7 +528,7 @@ cl_ntoh( * the destination. 
*/ if( p_src != p_dest ) - cl_memcpy( p_dest, p_src, size ); + memcpy( p_dest, p_src, size ); #endif } /* Index: osm/include/iba/ib_types.h =================================================================== --- osm/include/iba/ib_types.h (revision 7286) +++ osm/include/iba/ib_types.h (working copy) @@ -3659,15 +3659,15 @@ ib_smp_init_new( p_smp->dr_slid = dr_slid; p_smp->dr_dlid = dr_dlid; - cl_memclr( p_smp->resv1, - sizeof(p_smp->resv1) + - sizeof(p_smp->data) + - sizeof(p_smp->initial_path) + - sizeof(p_smp->return_path) ); + memset( p_smp->resv1, 0, + sizeof(p_smp->resv1) + + sizeof(p_smp->data) + + sizeof(p_smp->initial_path) + + sizeof(p_smp->return_path) ); /* copy the path */ - cl_memcpy( &p_smp->initial_path, path_out, - sizeof( p_smp->initial_path ) ); + memcpy( &p_smp->initial_path, path_out, + sizeof( p_smp->initial_path ) ); } /* * PARAMETERS Index: osm/include/vendor/osm_vendor_mlx_svc.h =================================================================== --- osm/include/vendor/osm_vendor_mlx_svc.h (revision 7286) +++ osm/include/vendor/osm_vendor_mlx_svc.h (working copy) @@ -192,8 +192,7 @@ osmv_mad_copy(IN const ib_mad_t *p_mad) p_copy = cl_zalloc(MAD_BLOCK_SIZE); if (NULL != p_copy) { - - cl_memcpy(p_copy, p_mad, MAD_BLOCK_SIZE); + memcpy(p_copy, p_mad, MAD_BLOCK_SIZE); } return p_copy; Index: osm/libvendor/osm_vendor_mlx_dispatcher.c =================================================================== --- osm/libvendor/osm_vendor_mlx_dispatcher.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_dispatcher.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -268,7 +268,7 @@ __osmv_dispatch_simple_mad(IN osm_bind_h p_mad_buf = osm_madw_get_mad_ptr(p_madw); /* Copy the payload to the MAD buffer */ - cl_memcpy((void*)p_mad_buf, (void*)p_mad, MAD_BLOCK_SIZE); + memcpy((void*)p_mad_buf, (void*)p_mad, MAD_BLOCK_SIZE); if (NULL != p_txn) { Index: osm/libvendor/osm_vendor_mlx_ts.c =================================================================== --- osm/libvendor/osm_vendor_mlx_ts.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_ts.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -292,7 +292,7 @@ osmv_transport_mad_send(IN const osm_bin OSM_LOG_ENTER( p_vend->p_log, osmv_transport_mad_send); - cl_memclr(&ts_mad,sizeof(ts_mad)); + memset(&ts_mad, 0, sizeof(ts_mad)); /* Make sure the p_bo object is still relevant */ if (( p_bo->magic_ptr != p_bo) || p_bo->is_closing ) @@ -301,7 +301,7 @@ osmv_transport_mad_send(IN const osm_bin /* * Copy the MAD over to the sent mad */ - cl_memcpy(&ts_mad, p_mad_hdr, MAD_BLOCK_SIZE); + memcpy(&ts_mad, p_mad_hdr, MAD_BLOCK_SIZE); /* * For all sends other than directed route SM MADs, @@ -376,7 +376,7 @@ __osm_transport_gen_dummy_mad(osmv_bind_ int ts_ioctl_ret; /* prepare the mad fields following the stored filter on the bind */ - cl_memclr(&ts_mad, sizeof(ts_mad)); + memset(&ts_mad, 0, sizeof(ts_mad)); ts_mad.format_version = 1; ts_mad.mgmt_class = p_mgr->filter.mgmt_class; ts_mad.attribute_id = 0x2; @@ -482,9 +482,9 @@ __osmv_TOPSPIN_mad_addr_to_osm_addr( p_rcv_desc->grh.traffic_class, p_rcv_desc->grh.flow_label); p_mad_addr->addr_type.gsi.grh_info.hop_limit = p_rcv_desc->grh.hop_limit; - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, + memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, &p_rcv_desc->grh.sgid, 
sizeof(ib_net64_t)); - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, + memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, p_rcv_desc->grh.dgid, sizeof(ib_net64_t)); } */ @@ -512,7 +512,7 @@ osm_vendor_set_sm( OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm ); - cl_memclr(&set_port_data, sizeof(set_port_data)); + memset(&set_port_data, 0, sizeof(set_port_data)); set_port_data.port = p_bo->port_num; set_port_data.port_info.valid_fields = IB_PORT_IS_SM; Index: osm/libvendor/osm_pkt_randomizer.c =================================================================== --- osm/libvendor/osm_pkt_randomizer.c (revision 7286) +++ osm/libvendor/osm_pkt_randomizer.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -196,8 +196,8 @@ __osm_pkt_randomizer_process_path( buf, rand_value, p_pkt_rand->osm_pkt_unstable_link_rate ); /* update the path in the fault paths */ - cl_memcpy( &(p_pkt_rand->fault_dr_paths[p_pkt_rand->num_paths_initialized]), - p_dr_path, sizeof(osm_dr_path_t) ); + memcpy( &(p_pkt_rand->fault_dr_paths[p_pkt_rand->num_paths_initialized]), + p_dr_path, sizeof(osm_dr_path_t) ); p_pkt_rand->num_paths_initialized++; in_fault_paths = TRUE; } @@ -288,7 +288,7 @@ osm_pkt_randomizer_init( res = IB_INSUFFICIENT_MEMORY; goto Exit; } - cl_memclr( *pp_pkt_randomizer, sizeof(osm_pkt_randomizer_t) ); + memset( *pp_pkt_randomizer, 0, sizeof(osm_pkt_randomizer_t) ); (*pp_pkt_randomizer)->num_paths_initialized = 0; tmp = atol( getenv("OSM_PKT_DROP_RATE") ); Index: osm/libvendor/osm_vendor_mlx_hca.c =================================================================== --- osm/libvendor/osm_vendor_mlx_hca.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_hca.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 
2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -71,7 +71,7 @@ ib_api_status_t __osm_vendor_gid_to_guid( IN u_int8_t * gid, OUT VAPI_gid_t * guid ) { - cl_memcpy( guid, gid + 8, 8 ); + memcpy( guid, gid + 8, 8 ); return ( IB_SUCCESS ); } Index: osm/libvendor/osm_vendor_mlx_sa.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sa.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_sa.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -536,7 +536,7 @@ __osmv_send_sa_req( cl_atomic_inc( &trans_id ); /* Cleanup the MAD from any residue */ - cl_memclr(p_sa_mad, MAD_BLOCK_SIZE); + memset(p_sa_mad, 0, MAD_BLOCK_SIZE); /* Initialize the standard MAD header. 
*/ ib_mad_init_new( @@ -555,8 +555,8 @@ __osmv_send_sa_req( p_sa_mad->comp_mask = p_sa_mad_data->comp_mask; if( p_sa_mad->comp_mask ) { - cl_memcpy( p_sa_mad->data, p_sa_mad_data->p_attr, - ib_get_attr_size(p_sa_mad_data->attr_offset)); + memcpy( p_sa_mad->data, p_sa_mad_data->p_attr, + ib_get_attr_size(p_sa_mad_data->attr_offset)); } /* @@ -674,8 +674,8 @@ osmv_query_sa( sa_mad_data.attr_offset = ib_get_attr_offset( sizeof( ib_service_record_t ) ); sa_mad_data.p_attr = &svc_rec; - cl_memcpy( svc_rec.service_name, p_query_req->p_query_input, - sizeof( ib_svc_name_t ) ); + memcpy( svc_rec.service_name, p_query_req->p_query_input, + sizeof( ib_svc_name_t ) ); break; case OSMV_QUERY_SVC_REC_BY_ID: @@ -765,7 +765,7 @@ osmv_query_sa( case OSMV_QUERY_PATH_REC_BY_PORT_GUIDS: osm_log( p_log, OSM_LOG_DEBUG, "osmv_query_sa DBG:001 %s","PATH_REC_BY_PORT_GUIDS\n" ); - cl_memclr(&path_rec, sizeof(ib_path_rec_t )); + memset(&path_rec, 0, sizeof(ib_path_rec_t )); sa_mad_data.attr_id = IB_MAD_ATTR_PATH_RECORD; sa_mad_data.attr_offset = ib_get_attr_offset( sizeof( ib_path_rec_t ) ); @@ -784,24 +784,24 @@ osmv_query_sa( case OSMV_QUERY_PATH_REC_BY_GIDS: osm_log( p_log, OSM_LOG_DEBUG, "osmv_query_sa DBG:001 %s","PATH_REC_BY_GIDS\n" ); - cl_memclr(&path_rec, sizeof(ib_path_rec_t )); + memset(&path_rec, 0, sizeof(ib_path_rec_t )); sa_mad_data.attr_id = IB_MAD_ATTR_PATH_RECORD; sa_mad_data.attr_offset = ib_get_attr_offset( sizeof( ib_path_rec_t ) ); sa_mad_data.comp_mask = ( IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID ); sa_mad_data.p_attr = &path_rec; - cl_memcpy( &path_rec.dgid, - &( ( osmv_gid_pair_t * ) ( p_query_req->p_query_input ) )-> - dest_gid, sizeof( ib_gid_t ) ); - cl_memcpy( &path_rec.sgid, - &( ( osmv_gid_pair_t * ) ( p_query_req->p_query_input ) )-> - src_gid, sizeof( ib_gid_t ) ); + memcpy( &path_rec.dgid, + &( ( osmv_gid_pair_t * ) ( p_query_req->p_query_input ) )-> + dest_gid, sizeof( ib_gid_t ) ); + memcpy( &path_rec.sgid, + &( ( osmv_gid_pair_t * ) ( 
p_query_req->p_query_input ) )-> + src_gid, sizeof( ib_gid_t ) ); break; case OSMV_QUERY_PATH_REC_BY_LIDS: osm_log( p_log, OSM_LOG_DEBUG, "osmv_query_sa DBG:001 %s","PATH_REC_BY_LIDS\n" ); - cl_memclr(&path_rec, sizeof(ib_path_rec_t )); + memset(&path_rec, 0, sizeof(ib_path_rec_t )); sa_mad_data.method = IB_MAD_METHOD_GET; sa_mad_data.attr_id = IB_MAD_ATTR_PATH_RECORD; sa_mad_data.attr_offset = Index: osm/libvendor/osm_vendor_ibumad_sa.c =================================================================== --- osm/libvendor/osm_vendor_ibumad_sa.c (revision 7286) +++ osm/libvendor/osm_vendor_ibumad_sa.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -528,7 +528,7 @@ __osmv_send_sa_req( cl_atomic_inc( &trans_id ); /* Cleanup the MAD from any residue */ - cl_memclr(p_sa_mad, MAD_BLOCK_SIZE); + memset(p_sa_mad, 0, MAD_BLOCK_SIZE); /* Initialize the standard MAD header. 
*/ ib_mad_init_new( @@ -551,8 +551,8 @@ __osmv_send_sa_req( #endif if( p_sa_mad->comp_mask ) { - cl_memcpy( p_sa_mad->data, p_sa_mad_data->p_attr, - ib_get_attr_size(p_sa_mad_data->attr_offset)); + memcpy( p_sa_mad->data, p_sa_mad_data->p_attr, + ib_get_attr_size(p_sa_mad_data->attr_offset)); } /* @@ -670,8 +670,8 @@ osmv_query_sa( sa_mad_data.attr_offset = ib_get_attr_offset( sizeof( ib_service_record_t ) ); sa_mad_data.p_attr = &svc_rec; - cl_memcpy( svc_rec.service_name, p_query_req->p_query_input, - sizeof( ib_svc_name_t ) ); + memcpy( svc_rec.service_name, p_query_req->p_query_input, + sizeof( ib_svc_name_t ) ); break; case OSMV_QUERY_SVC_REC_BY_ID: @@ -759,7 +759,7 @@ osmv_query_sa( case OSMV_QUERY_PATH_REC_BY_PORT_GUIDS: osm_log( p_log, OSM_LOG_DEBUG, "osmv_query_sa DBG:001 %s","PATH_REC_BY_PORT_GUIDS\n" ); - cl_memclr(&path_rec, sizeof(ib_path_rec_t )); + memset(&path_rec, 0, sizeof(ib_path_rec_t )); sa_mad_data.attr_id = IB_MAD_ATTR_PATH_RECORD; sa_mad_data.attr_offset = ib_get_attr_offset( sizeof( ib_path_rec_t ) ); @@ -778,24 +778,24 @@ osmv_query_sa( case OSMV_QUERY_PATH_REC_BY_GIDS: osm_log( p_log, OSM_LOG_DEBUG, "osmv_query_sa DBG:001 %s","PATH_REC_BY_GIDS\n" ); - cl_memclr(&path_rec, sizeof(ib_path_rec_t )); + memset(&path_rec, 0, sizeof(ib_path_rec_t )); sa_mad_data.attr_id = IB_MAD_ATTR_PATH_RECORD; sa_mad_data.attr_offset = ib_get_attr_offset( sizeof( ib_path_rec_t ) ); sa_mad_data.comp_mask = ( IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID ); sa_mad_data.p_attr = &path_rec; - cl_memcpy( &path_rec.dgid, - &( ( osmv_gid_pair_t * ) ( p_query_req->p_query_input ) )-> - dest_gid, sizeof( ib_gid_t ) ); - cl_memcpy( &path_rec.sgid, - &( ( osmv_gid_pair_t * ) ( p_query_req->p_query_input ) )-> - src_gid, sizeof( ib_gid_t ) ); + memcpy( &path_rec.dgid, + &( ( osmv_gid_pair_t * ) ( p_query_req->p_query_input ) )-> + dest_gid, sizeof( ib_gid_t ) ); + memcpy( &path_rec.sgid, + &( ( osmv_gid_pair_t * ) ( p_query_req->p_query_input ) )-> + src_gid, sizeof( 
ib_gid_t ) ); break; case OSMV_QUERY_PATH_REC_BY_LIDS: osm_log( p_log, OSM_LOG_DEBUG, "osmv_query_sa DBG:001 %s","PATH_REC_BY_LIDS\n" ); - cl_memclr(&path_rec, sizeof(ib_path_rec_t )); + memset(&path_rec, 0, sizeof(ib_path_rec_t )); sa_mad_data.method = IB_MAD_METHOD_GET; sa_mad_data.attr_id = IB_MAD_ATTR_PATH_RECORD; sa_mad_data.attr_offset = @@ -848,7 +848,7 @@ osmv_query_sa( CL_ASSERT( 0 ); return IB_ERROR; } - cl_memclr(&multipath_rec, sizeof(ib_multipath_rec_t )); + memset(&multipath_rec, 0, sizeof(ib_multipath_rec_t )); sa_mad_data.method = IB_MAD_METHOD_GETMULTI; sa_mad_data.attr_id = IB_MAD_ATTR_MULTIPATH_RECORD; sa_mad_data.attr_offset = Index: osm/libvendor/osm_vendor_mlx_sender.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sender.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_sender.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -77,8 +77,8 @@ osmv_simple_send_madw(IN osm_bind_handle CL_ASSERT( p_madw->mad_size <= MAD_BLOCK_SIZE ); - cl_memclr(p_mad, MAD_BLOCK_SIZE); - cl_memcpy(p_mad, osm_madw_get_mad_ptr(p_madw), p_madw->mad_size); + memset(p_mad, 0, MAD_BLOCK_SIZE); + memcpy(p_mad, osm_madw_get_mad_ptr(p_madw), p_madw->mad_size); if (NULL != p_txn) { @@ -275,7 +275,7 @@ osmv_rmpp_send_ack(IN osm_bind_handle_t } #endif - cl_memcpy(p_resp_mad, p_req_mad, MAD_BLOCK_SIZE); + memcpy(p_resp_mad, p_req_mad, MAD_BLOCK_SIZE); p_resp_mad->common_hdr.method = osmv_invert_method(p_req_mad->method); p_resp_mad->rmpp_type = IB_RMPP_TYPE_ACK; @@ -302,7 +302,7 @@ osmv_rmpp_send_nak(IN osm_bind_handle_t uint8_t resp_mad[MAD_BLOCK_SIZE]; ib_rmpp_mad_t *p_resp_mad = (ib_rmpp_mad_t*)resp_mad; - cl_memcpy(p_resp_mad, p_req_mad, MAD_BLOCK_SIZE); + memcpy(p_resp_mad, p_req_mad, MAD_BLOCK_SIZE); p_resp_mad->common_hdr.method = osmv_invert_method(p_req_mad->method); p_resp_mad->rmpp_type = nak_type; Index: osm/libvendor/osm_vendor_mlx_sar.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sar.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_sar.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -95,10 +95,10 @@ osmv_rmpp_sar_get_mad_seg( } /* cleanup */ - cl_memclr(p_buf, MAD_BLOCK_SIZE); + memset(p_buf, 0, MAD_BLOCK_SIZE); /* attach header */ - cl_memcpy(p_buf,p_sar->p_arbt_mad,p_sar->hdr_sz); + memcpy(p_buf,p_sar->p_arbt_mad,p_sar->hdr_sz); /* fill data */ @@ -106,10 +106,10 @@ osmv_rmpp_sar_get_mad_seg( sz_left = p_sar->data_len - ((seg_idx -1) * p_sar->data_sz); if (sz_left > p_sar->data_sz) { - cl_memcpy((char*)p_buf+p_sar->hdr_sz,(char*)p_seg,p_sar->data_sz); + memcpy((char*)p_buf+p_sar->hdr_sz,(char*)p_seg,p_sar->data_sz); } else - cl_memcpy((char*)p_buf+ p_sar->hdr_sz, (char*)p_seg, sz_left); + memcpy((char*)p_buf+ p_sar->hdr_sz, (char*)p_seg, sz_left); return IB_SUCCESS; @@ -135,7 +135,7 @@ osmv_rmpp_sar_reassemble_arbt_mad(osmv_r p_item = cl_qlist_head(p_bufs); p_obj = PARENT_STRUCT(p_item, cl_list_obj_t, list_item); buf_tmp = cl_qlist_obj(p_obj); - cl_memcpy(p_mad,buf_tmp,p_sar->hdr_sz); + memcpy(p_mad,buf_tmp,p_sar->hdr_sz); p_mad = (char*)p_mad + p_sar->hdr_sz; space_left-= p_sar->hdr_sz; @@ -148,14 +148,14 @@ osmv_rmpp_sar_reassemble_arbt_mad(osmv_r if (FALSE == cl_is_qlist_empty(p_bufs)) { - cl_memcpy((char*)p_mad,(char*)buf_tmp+p_sar->hdr_sz,p_sar->data_sz); + memcpy((char*)p_mad,(char*)buf_tmp+p_sar->hdr_sz,p_sar->data_sz); p_mad = (char*)p_mad + p_sar->data_sz; space_left-= p_sar->data_sz; } else { /* the last mad on the list */ - cl_memcpy((char*)p_mad,(char*)buf_tmp+p_sar->hdr_sz,space_left); + memcpy((char*)p_mad,(char*)buf_tmp+p_sar->hdr_sz,space_left); p_mad= (char*)p_mad+space_left; } Index: osm/libvendor/osm_vendor_mlx_sim.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sim.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_sim.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. 
* Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -257,7 +257,7 @@ osmv_transport_mad_send(IN const osm_bin OSM_LOG_ENTER( p_vend->p_log, osmv_transport_mad_send); - cl_memclr(&mad_msg,sizeof(mad_msg)); + memset(&mad_msg, 0, sizeof(mad_msg)); /* Make sure the p_bo object is still relevant */ if (( p_bo->magic_ptr != p_bo) || p_bo->is_closing ) @@ -266,7 +266,7 @@ osmv_transport_mad_send(IN const osm_bin /* * Copy the MAD over to the sent mad */ - cl_memcpy(&mad_msg.header, p_mad_hdr, MAD_BLOCK_SIZE); + memcpy(&mad_msg.header, p_mad_hdr, MAD_BLOCK_SIZE); /* * For all sends other than directed route SM MADs, @@ -411,9 +411,9 @@ __osmv_ibms_mad_addr_to_osm_addr( p_rcv_desc->grh.traffic_class, p_rcv_desc->grh.flow_label); p_osm_addr->addr_type.gsi.grh_info.hop_limit = p_rcv_desc->grh.hop_limit; - cl_memcpy(&p_osm_addr->addr_type.gsi.grh_info.src_gid.raw, + memcpy(&p_osm_addr->addr_type.gsi.grh_info.src_gid.raw, &p_rcv_desc->grh.sgid, sizeof(ib_net64_t)); - cl_memcpy(&p_osm_addr->addr_type.gsi.grh_info.dest_gid.raw, + memcpy(&p_osm_addr->addr_type.gsi.grh_info.dest_gid.raw, p_rcv_desc->grh.dgid, sizeof(ib_net64_t)); } */ Index: osm/libvendor/osm_vendor_umadt.c =================================================================== --- osm/libvendor/osm_vendor_umadt.c (revision 7286) +++ osm/libvendor/osm_vendor_umadt.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -325,7 +325,7 @@ osm_vendor_get_ports( /* query each ca & copy its info into callers buffer */ for ( caCount = 0; caCount < caGuidCount; caCount++ ) { - cl_memclr ( &caAttributes,sizeof ( IB_CA_ATTRIBUTES ) ); + memset ( &caAttributes, 0, sizeof ( IB_CA_ATTRIBUTES ) ); /* Open the CA */ Status = p_umadt_obj->IbtInterface.Vpi.OpenCA ( CaGuidArray[caCount], NULL, /* CACompletionCallback */ @@ -348,7 +348,7 @@ osm_vendor_get_ports( if ( caAttributes.Ports > free_guids) { *p_num_guids = 0; - cl_memclr (p_guids, (*p_num_guids)*sizeof(uint64_t)); + memset (p_guids, 0, (*p_num_guids)*sizeof(uint64_t)); return IB_INSUFFICIENT_MEMORY; } @@ -360,12 +360,12 @@ osm_vendor_get_ports( { p_umadt_obj->IbtInterface.Vpi.CloseCA ( caHandle ); *p_num_guids = 0; - cl_memclr (p_guids, (*p_num_guids)*sizeof(uint64_t)); + memset (p_guids, 0, (*p_num_guids)*sizeof(uint64_t)); return IB_INSUFFICIENT_MEMORY; } - cl_memclr ( pPortAttributesList, - caAttributes.PortAttributesListSize ); + memset ( pPortAttributesList, 0, + caAttributes.PortAttributesListSize ); caAttributes.PortAttributesList = pPortAttributesList; @@ -376,7 +376,7 @@ osm_vendor_get_ports( { p_umadt_obj->IbtInterface.Vpi.CloseCA ( caHandle ); *p_num_guids = 0; - cl_memclr (p_guids, (*p_num_guids)*sizeof(uint64_t)); + memset (p_guids, 0, (*p_num_guids)*sizeof(uint64_t)); return IB_ERROR; } @@ -616,7 +616,7 @@ osm_vendor_send( } /* No Segmentation required */ - cl_memcpy(&p_madt_struct->IBMad, p_mad, MAD_BLOCK_SIZE); + memcpy(&p_madt_struct->IBMad, p_mad, MAD_BLOCK_SIZE); /* Post the MAD */ @@ -668,7 +668,7 @@ osm_vendor_send( if ( i == 0) /* First Packet */ { /* Since this is the first MAD, copy the entire MAD_SIZE */ - cl_memcpy(&p_madt_struct->IBMad, p_mad, MAD_BLOCK_SIZE); + memcpy(&p_madt_struct->IBMad, p_mad, MAD_BLOCK_SIZE); p_frag_data = (uint8_t*)p_mad + MAD_BLOCK_SIZE; @@ -691,21 +691,21 @@ osm_vendor_send( else if ( i < num_mads -1) /* Not last packet */ { /* First copy only the header */ - 
cl_memcpy(&p_madt_struct->IBMad, p_mad, IB_SA_MAD_HDR_SIZE); + memcpy(&p_madt_struct->IBMad, p_mad, IB_SA_MAD_HDR_SIZE); /* Set the relevant fields in the SA_MAD_HEADER */ p_sa_mad = (ib_sa_mad_t_vM3*)&p_madt_struct->IBMad; p_sa_mad->payload_len = cl_ntoh32(IB_SA_DATA_SIZE); p_sa_mad->seg_num = cl_ntoh32(seg_num++); p_sa_mad->frag_flag = 0; /* Now copy the fragmented data */ - cl_memcpy(((uint8_t*)&p_madt_struct->IBMad) + IB_SA_MAD_HDR_SIZE, p_frag_data, IB_SA_DATA_SIZE); + memcpy(((uint8_t*)&p_madt_struct->IBMad) + IB_SA_MAD_HDR_SIZE, p_frag_data, IB_SA_DATA_SIZE); p_frag_data = p_frag_data + IB_SA_DATA_SIZE; } else if ( i == num_mads - 1) /* Last packet */ { /* First copy only the header */ - cl_memcpy(&p_madt_struct->IBMad, p_mad, IB_SA_MAD_HDR_SIZE); + memcpy(&p_madt_struct->IBMad, p_mad, IB_SA_MAD_HDR_SIZE); /* Set the relevant fields in the SA_MAD_HEADER */ p_sa_mad = (ib_sa_mad_t_vM3*)&p_madt_struct->IBMad; p_sa_mad->seg_num = cl_ntoh32(seg_num++); @@ -713,7 +713,7 @@ osm_vendor_send( p_sa_mad->payload_len = cl_ntoh32(cl_ntoh32(((ib_sa_mad_t_vM3*)p_mad)->payload_len) % IB_SA_DATA_SIZE); /* Now copy the fragmented data */ - cl_memcpy((((uint8_t*)&p_madt_struct->IBMad)) + IB_SA_MAD_HDR_SIZE, + memcpy((((uint8_t*)&p_madt_struct->IBMad)) + IB_SA_MAD_HDR_SIZE, p_frag_data, cl_ntoh32(p_sa_mad->payload_len)); p_frag_data = p_frag_data + IB_SA_DATA_SIZE; Index: osm/libvendor/osm_vendor_mlx_rmpp_ctx.c =================================================================== --- osm/libvendor/osm_vendor_mlx_rmpp_ctx.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_rmpp_ctx.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -264,7 +264,7 @@ osmv_rmpp_recv_ctx_store_mad_seg(IN osmv { return IB_INSUFFICIENT_MEMORY; } - cl_memcpy(p_list_mad,p_mad,MAD_BLOCK_SIZE); + memcpy(p_list_mad,p_mad,MAD_BLOCK_SIZE); p_obj = cl_zalloc(sizeof(cl_list_obj_t)); if (NULL == p_obj) Index: osm/libvendor/osm_vendor_test.c =================================================================== --- osm/libvendor/osm_vendor_test.c (revision 7286) +++ osm/libvendor/osm_vendor_test.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -67,7 +67,7 @@ void osm_vendor_construct( IN osm_vendor_t* const p_vend ) { - cl_memclr( p_vend, sizeof(*p_vend) ); + memset( p_vend, 0, sizeof(*p_vend) ); } /********************************************************************** Index: osm/libvendor/osm_vendor_mlx_ibmgt.c =================================================================== --- osm/libvendor/osm_vendor_mlx_ibmgt.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_ibmgt.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -387,7 +387,7 @@ osmv_transport_mad_send(IN const osm_bin else { /* is a directed route - we need to construct a permissive address */ - cl_memclr( &av, sizeof( av ) ); + memset( &av, 0, sizeof( av ) ); /* we do not need port number since it is part of the mad_hndl */ av.dlid = IB_LID_PERMISSIVE; } @@ -711,7 +711,7 @@ __osmv_IBMGT_rcv_desc_to_osm_addr( p_mad_addr->static_rate = 0; /* HACK - we do not know the rate ! 
*/ p_mad_addr->path_bits = p_rcv_desc->local_path_bits; /* Clear the grh any way to avoid unset fields */ - cl_memclr(&p_mad_addr->addr_type.gsi.grh_info, sizeof(p_mad_addr->addr_type.gsi.grh_info)); + memset(&p_mad_addr->addr_type.gsi.grh_info, 0, sizeof(p_mad_addr->addr_type.gsi.grh_info)); if (is_smi) { @@ -746,10 +746,10 @@ __osmv_IBMGT_rcv_desc_to_osm_addr( p_rcv_desc->grh.traffic_class, p_rcv_desc->grh.flow_label); p_mad_addr->addr_type.gsi.grh_info.hop_limit = p_rcv_desc->grh.hop_limit; - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, - &p_rcv_desc->grh.sgid, sizeof(ib_net64_t)); - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, - p_rcv_desc->grh.dgid, sizeof(ib_net64_t)); + memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, + &p_rcv_desc->grh.sgid, sizeof(ib_net64_t)); + memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, + p_rcv_desc->grh.dgid, sizeof(ib_net64_t)); } } } @@ -766,7 +766,7 @@ __osmv_IBMGT_osm_addr_to_ibmgt_addr( /* For global destination or Multicast address:*/ u_int8_t ver; - cl_memclr( p_av, sizeof( IB_ud_av_t ) ); + memset( p_av, 0, sizeof( IB_ud_av_t ) ); p_av->src_path_bits = p_mad_addr->path_bits; p_av->static_rate = p_mad_addr->static_rate; @@ -789,10 +789,9 @@ __osmv_IBMGT_osm_addr_to_ibmgt_addr( &p_av->flow_label); p_av->hop_limit = p_mad_addr->addr_type.gsi.grh_info.hop_limit; p_av->sgid_index = 0; /* we always use source GID 0 */ - cl_memcpy(&p_av->dgid, - &p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, - sizeof(ib_net64_t)); - + memcpy(&p_av->dgid, + &p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, + sizeof(ib_net64_t)); } } Index: osm/libvendor/osm_vendor_ts.c =================================================================== --- osm/libvendor/osm_vendor_ts.c (revision 7286) +++ osm/libvendor/osm_vendor_ts.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. 
* Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -60,7 +60,7 @@ __osm_set_wrid_by_p_madw(IN osm_madw_t * CL_ASSERT(p_madw->p_mad); - cl_memcpy( &wrid, &p_madw, sizeof( osm_madw_t * ) ); + memcpy( &wrid, &p_madw, sizeof( osm_madw_t * ) ); wrid = (wrid << 1) | ib_mad_is_response(p_madw->p_mad) | (p_madw->p_mad->method == IB_MAD_METHOD_TRAP_REPRESS); @@ -74,7 +74,7 @@ __osm_set_p_madw_and_resp_by_wrid( OUT osm_madw_t **pp_madw) { *is_resp = wrid & 0x0000000000000001; wrid = wrid >> 1; - cl_memcpy( pp_madw, &wrid, sizeof( osm_madw_t * ) ); + memcpy( pp_madw, &wrid, sizeof( osm_madw_t * ) ); } /********************************************************************** @@ -120,9 +120,9 @@ __osm_ts_conv_mad_rcv_desc_to_osm_addr( p_rcv_desc->grh.traffic_class, p_rcv_desc->grh.flow_label); p_mad_addr->addr_type.gsi.grh_info.hop_limit = p_rcv_desc->grh.hop_limit; - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, + memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, &p_rcv_desc->grh.sgid, sizeof(ib_net64_t)); - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, + memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, p_rcv_desc->grh.dgid, sizeof(ib_net64_t)); } */ @@ -167,8 +167,8 @@ __osm_vendor_clear_sm( IN osm_bind_handl OSM_LOG_ENTER( p_vend->p_log, __osm_vendor_clear_sm ); - cl_memclr( &attr_mod, sizeof( attr_mod ) ); - cl_memclr( &attr_mask, sizeof( attr_mask ) ); + memset( &attr_mod, 0, sizeof( attr_mod ) ); + memset( &attr_mask, 0, sizeof( attr_mask ) ); attr_mod.is_sm = FALSE; attr_mask = HCA_ATTR_IS_SM; @@ -193,7 +193,7 @@ __osm_vendor_clear_sm( IN osm_bind_handl void osm_vendor_construct( IN osm_vendor_t * const p_vend ) { - cl_memclr( p_vend, sizeof( *p_vend ) ); + memset( p_vend, 0, sizeof( *p_vend ) ); cl_thread_construct( &(p_vend->smi_bind.poller) ); cl_thread_construct( &(p_vend->gsi_bind.poller) ); } @@ -386,7 +386,7 @@ __osm_ts_rcv_callback( 
p_new_vw->p_resp_madw = NULL; p_new_vw->p_mad_buf = p_mad_buf; - cl_memcpy( p_new_vw->p_mad_buf, p_mad, mad_size ); + memcpy( p_new_vw->p_mad_buf, p_mad, mad_size ); /* attach the buffer to the wrapper */ p_madw->p_mad = p_mad_buf; @@ -724,7 +724,7 @@ osm_vendor_get( IN osm_bind_handle_t h_b goto Exit; } - cl_memclr( p_mad, p_vw->size ); + memset( p_mad, 0, p_vw->size ); /* track locally */ p_vw->p_mad_buf = p_mad; @@ -804,7 +804,7 @@ osm_ts_send_mad( /* * Copy the MAD over to the sent mad */ - cl_memcpy(&ts_mad, p_mad, 256); + memcpy(&ts_mad, p_mad, 256); /* * For all sends other than directed route SM MADs, @@ -958,8 +958,8 @@ osm_vendor_set_sm( OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm ); - cl_memclr( &attr_mod, sizeof( attr_mod ) ); - cl_memclr( &attr_mask, sizeof( attr_mask ) ); + memset( &attr_mod, 0, sizeof( attr_mod ) ); + memset( &attr_mask, 0, sizeof( attr_mask ) ); attr_mod.is_sm = is_sm_val; attr_mask = HCA_ATTR_IS_SM; Index: osm/libvendor/osm_vendor_mtl.c =================================================================== --- osm/libvendor/osm_vendor_mtl.c (revision 7286) +++ osm/libvendor/osm_vendor_mtl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -67,7 +67,7 @@ __osm_set_wrid_by_p_madw(IN osm_madw_t * CL_ASSERT(p_madw->p_mad); - cl_memcpy( &wrid, &p_madw, sizeof( osm_madw_t * ) ); + memcpy( &wrid, &p_madw, sizeof( osm_madw_t * ) ); wrid = (wrid << 1) | ib_mad_is_response(p_madw->p_mad) | (p_madw->p_mad->method == IB_MAD_METHOD_TRAP_REPRESS); @@ -81,7 +81,7 @@ __osm_set_p_madw_and_resp_by_wrid( OUT osm_madw_t **pp_madw) { *is_resp = wrid & 0x0000000000000001; wrid = wrid >> 1; - cl_memcpy( pp_madw, &wrid, sizeof( osm_madw_t * ) ); + memcpy( pp_madw, &wrid, sizeof( osm_madw_t * ) ); } /********************************************************************** @@ -111,7 +111,6 @@ __osm_mtl_conv_ibmgt_rcv_desc_to_osm_add /* Does IBMGT supposed to provide the QPN is network or HOST ? */ p_mad_addr->addr_type.gsi.remote_qp = cl_hton32(p_rcv_desc->qp); - p_mad_addr->addr_type.gsi.remote_qkey = IB_QP1_WELL_KNOWN_Q_KEY; /* we do have the p_mad_addr->pkey_ix but how to get the PKey by index ? */ /* the only way seems to be to use VAPI_query_hca_pkey_tbl and obtain */ @@ -131,10 +130,10 @@ __osm_mtl_conv_ibmgt_rcv_desc_to_osm_add p_rcv_desc->grh.traffic_class, p_rcv_desc->grh.flow_label); p_mad_addr->addr_type.gsi.grh_info.hop_limit = p_rcv_desc->grh.hop_limit; - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, - &p_rcv_desc->grh.sgid, sizeof(ib_net64_t)); - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, - p_rcv_desc->grh.dgid, sizeof(ib_net64_t)); + memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, + &p_rcv_desc->grh.sgid, sizeof(ib_net64_t)); + memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, + p_rcv_desc->grh.dgid, sizeof(ib_net64_t)); } } } @@ -152,7 +151,7 @@ __osm_mtl_conv_osm_addr_to_ibmgt_addr( /* For global destination or Multicast address:*/ u_int8_t ver; - cl_memclr( p_av, sizeof( IB_ud_av_t ) ); + memset( p_av, 0, sizeof( IB_ud_av_t ) ); p_av->src_path_bits = p_mad_addr->path_bits; p_av->static_rate = p_mad_addr->static_rate; @@ -175,9 +174,9 @@ 
__osm_mtl_conv_osm_addr_to_ibmgt_addr( &p_av->flow_label); p_av->hop_limit = p_mad_addr->addr_type.gsi.grh_info.hop_limit; p_av->sgid_index = 0; /* we always use source GID 0 */ - cl_memcpy(&p_av->dgid, - &p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, - sizeof(ib_net64_t)); + memcpy(&p_av->dgid, + &p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, + sizeof(ib_net64_t)); } } @@ -196,8 +195,8 @@ __osm_vendor_clear_sm( IN osm_bind_handl OSM_LOG_ENTER( p_vend->p_log, __osm_vendor_clear_sm ); - cl_memclr( &attr_mod, sizeof( attr_mod ) ); - cl_memclr( &attr_mask, sizeof( attr_mask ) ); + memset( &attr_mod, 0, sizeof( attr_mod ) ); + memset( &attr_mask, 0, sizeof( attr_mask ) ); attr_mod.is_sm = FALSE; attr_mask = HCA_ATTR_IS_SM; @@ -216,14 +215,13 @@ __osm_vendor_clear_sm( IN osm_bind_handl OSM_LOG_EXIT( p_vend->p_log ); } - /********************************************************************** * ANY CONSTRUCTION OF THE osm_vendor_t OBJECT **********************************************************************/ void osm_vendor_construct( IN osm_vendor_t * const p_vend ) { - cl_memclr( p_vend, sizeof( *p_vend ) ); + memset( p_vend, 0, sizeof( *p_vend ) ); } /********************************************************************** @@ -519,7 +517,7 @@ __osm_mtl_rcv_callback( IN IB_MGT_mad_hn p_new_vw->mad_buf_p = mad_buf_p; /* HACK: We do not support RMPP in receiving MADS */ - cl_memcpy( p_new_vw->mad_buf_p, payload_p, MAD_BLOCK_SIZE ); + memcpy( p_new_vw->mad_buf_p, payload_p, MAD_BLOCK_SIZE ); /* attach the buffer to the wrapper */ madw_p->p_mad = mad_buf_p; @@ -676,7 +674,7 @@ osm_vendor_bind( IN osm_vendor_t * const /* create the bind object tracking this binding */ p_bind = (osm_mtl_bind_info_t *)cl_malloc( sizeof(osm_mtl_bind_info_t) ); - cl_memclr(p_bind, sizeof(osm_mtl_bind_info_t)); + memset(p_bind, 0, sizeof(osm_mtl_bind_info_t)); if( p_bind == NULL ) { osm_log( p_vend->p_log, OSM_LOG_ERROR, @@ -686,7 +684,7 @@ osm_vendor_bind( IN osm_vendor_t * const } /* track 
this bind request info */ - cl_memcpy( p_bind->hca_id, hca_id, sizeof( VAPI_hca_id_t ) ); + memcpy( p_bind->hca_id, hca_id, sizeof( VAPI_hca_id_t ) ); p_bind->port_num = port_num; p_bind->p_vend = p_vend; p_bind->client_context = context; @@ -884,7 +882,7 @@ osm_vendor_get( IN osm_bind_handle_t h_b goto Exit; } - cl_memclr( mad_p, p_vw->size ); + memset( mad_p, 0, p_vw->size ); /* track locally */ p_vw->mad_buf_p = mad_p; @@ -975,7 +973,7 @@ osm_mtl_send_mad( else { /* is a directed route - we need to construct a permissive address */ - cl_memclr( &av, sizeof( av ) ); + memset( &av, 0, sizeof( av ) ); /* we do not need port number since it is part of the mad_hndl */ av.dlid = IB_LID_PERMISSIVE; } @@ -1149,8 +1147,8 @@ osm_vendor_set_sm( OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm ); - cl_memclr( &attr_mod, sizeof( attr_mod ) ); - cl_memclr( &attr_mask, sizeof( attr_mask ) ); + memset( &attr_mod, 0, sizeof( attr_mod ) ); + memset( &attr_mask, 0, sizeof( attr_mask ) ); attr_mod.is_sm = is_sm_val; attr_mask = HCA_ATTR_IS_SM; Index: osm/libvendor/osm_vendor_al.c =================================================================== --- osm/libvendor/osm_vendor_al.c (revision 7286) +++ osm/libvendor/osm_vendor_al.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -948,7 +948,7 @@ __osm_vendor_init_av( IN const osm_al_bind_info_t* p_bind, IN ib_av_attr_t* p_av ) { - cl_memclr( p_av, sizeof(*p_av) ); + memset( p_av, 0, sizeof(*p_av) ); p_av->port_num = p_bind->port_num; p_av->dlid = IB_LID_PERMISSIVE; } @@ -1023,7 +1023,7 @@ osm_vendor_bind( /* Get the proper QP. 
*/ - cl_memclr( &qp_create, sizeof(qp_create) ); + memset( &qp_create, 0, sizeof(qp_create) ); switch( p_user_bind->mad_class ) { @@ -1065,7 +1065,7 @@ osm_vendor_bind( CL_ASSERT( p_bind->h_qp ); CL_ASSERT( p_bind->pool_key ); - cl_memclr( &mad_svc, sizeof(mad_svc) ); + memset( &mad_svc, 0, sizeof(mad_svc) ); mad_svc.mad_svc_context = p_bind; mad_svc.pfn_mad_send_cb = __osm_al_send_callback; @@ -1259,7 +1259,7 @@ osm_vendor_send( */ if( p_mad->mgmt_class != IB_MCLASS_SUBN_DIR ) { - cl_memclr( &av, sizeof(av) ); + memset( &av, 0, sizeof(av) ); av.port_num = p_bind->port_num; av.dlid = p_mad_addr->dest_lid; av.static_rate = p_mad_addr->static_rate; @@ -1427,7 +1427,7 @@ osm_vendor_set_sm( OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm ); - cl_memclr( &attr_mod, sizeof(attr_mod) ); + memset( &attr_mod, 0, sizeof(attr_mod) ); attr_mod.cap.sm = is_sm_val; Index: osm/libvendor/osm_vendor_mlx_ts_anafa.c =================================================================== --- osm/libvendor/osm_vendor_mlx_ts_anafa.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_ts_anafa.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -283,7 +283,7 @@ osmv_transport_mad_send (IN const osm_bi /* * Copy the MAD over to the sent mad */ - cl_memcpy (&ts_mad, p_mad_hdr, MAD_BLOCK_SIZE); + memcpy (&ts_mad, p_mad_hdr, MAD_BLOCK_SIZE); /* * For all sends other than directed route SM MADs, @@ -415,9 +415,9 @@ __osmv_TOPSPIN_ANAFA_mad_addr_to_osm_add p_rcv_desc->grh.traffic_class, p_rcv_desc->grh.flow_label); p_mad_addr->addr_type.gsi.grh_info.hop_limit = p_rcv_desc->grh.hop_limit; - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, + memcpy(&p_mad_addr->addr_type.gsi.grh_info.src_gid.raw, &p_rcv_desc->grh.sgid, sizeof(ib_net64_t)); - cl_memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, + memcpy(&p_mad_addr->addr_type.gsi.grh_info.dest_gid.raw, p_rcv_desc->grh.dgid, sizeof(ib_net64_t)); } */ Index: osm/libvendor/osm_vendor_mlx.c =================================================================== --- osm/libvendor/osm_vendor_mlx.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -632,8 +632,8 @@ osm_vendor_set_sm( OSM_LOG_ENTER( p_vend->p_log, osm_vendor_set_sm ); - cl_memclr( &attr_mod, sizeof( attr_mod ) ); - cl_memclr( &attr_mask, sizeof( attr_mask ) ); + memset( &attr_mod, 0, sizeof( attr_mod ) ); + memset( &attr_mask, 0, sizeof( attr_mask ) ); attr_mod.is_sm = is_sm_val; attr_mask = HCA_ATTR_IS_SM; Index: osm/libvendor/osm_vendor_mlx_hca_anafa.c =================================================================== --- osm/libvendor/osm_vendor_mlx_hca_anafa.c (revision 7286) +++ osm/libvendor/osm_vendor_mlx_hca_anafa.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. 
* Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -71,7 +71,7 @@ typedef struct _osm_ca_info { ib_api_status_t __osm_vendor_gid_to_guid (IN tTS_IB_GID gid, OUT ib_net64_t * p_guid) { - cl_memcpy (p_guid, gid + 8, 8); + memcpy (p_guid, gid + 8, 8); return (IB_SUCCESS); } Index: osm/complib/cl_timer.c =================================================================== --- osm/complib/cl_timer.c (revision 7286) +++ osm/complib/cl_timer.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -221,7 +221,7 @@ void cl_timer_construct( IN cl_timer_t* const p_timer ) { - cl_memclr( p_timer, sizeof(cl_timer_t) ); + memset( p_timer, 0, sizeof(cl_timer_t) ); p_timer->state = CL_UNINITIALIZED; } Index: osm/complib/cl_ptr_vector.c =================================================================== --- osm/complib/cl_ptr_vector.c (revision 7286) +++ osm/complib/cl_ptr_vector.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -61,7 +61,7 @@ cl_ptr_vector_construct( { CL_ASSERT( p_vector ); - cl_memclr( p_vector, sizeof(cl_ptr_vector_t) ); + memset( p_vector, 0, sizeof(cl_ptr_vector_t) ); p_vector->state = CL_UNINITIALIZED; } @@ -220,7 +220,7 @@ cl_ptr_vector_set_capacity( if( p_vector->p_ptr_array ) { /* Copy the old pointer array into the new. 
*/ - cl_memcpy( p_new_ptr_array, p_vector->p_ptr_array, + memcpy( p_new_ptr_array, p_vector->p_ptr_array, p_vector->capacity * sizeof(void*) ); /* Free the old pointer array. */ Index: osm/complib/cl_perf.c =================================================================== --- osm/complib/cl_perf.c (revision 7286) +++ osm/complib/cl_perf.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -78,7 +78,7 @@ void __cl_perf_construct( IN cl_perf_t* const p_perf ) { - cl_memclr( p_perf, sizeof(cl_perf_t) ); + memset( p_perf, 0, sizeof(cl_perf_t) ); p_perf->state = CL_UNINITIALIZED; } Index: osm/complib/cl_threadpool.c =================================================================== --- osm/complib/cl_threadpool.c (revision 7286) +++ osm/complib/cl_threadpool.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -93,7 +93,7 @@ cl_thread_pool_construct( { CL_ASSERT( p_thread_pool); - cl_memclr( p_thread_pool, sizeof(cl_thread_pool_t) ); + memset( p_thread_pool, 0, sizeof(cl_thread_pool_t) ); cl_event_construct( &p_thread_pool->wakeup_event ); cl_event_construct( &p_thread_pool->destroy_event ); cl_list_construct( &p_thread_pool->thread_list ); Index: osm/complib/cl_vector.c =================================================================== --- osm/complib/cl_vector.c (revision 7286) +++ osm/complib/cl_vector.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. 
All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -86,7 +86,7 @@ cl_vector_copy_general( IN const void* const p_src, IN const size_t size ) { - cl_memcpy( p_dest, p_src, size ); + memcpy( p_dest, p_src, size ); } @@ -212,7 +212,7 @@ cl_vector_construct( { CL_ASSERT( p_vector ); - cl_memclr( p_vector, sizeof(cl_vector_t) ); + memset( p_vector, 0, sizeof(cl_vector_t) ); p_vector->state = CL_UNINITIALIZED; } @@ -412,7 +412,7 @@ cl_vector_set_capacity( if( p_vector->p_ptr_array ) { /* Copy the old pointer array into the new. */ - cl_memcpy( p_new_ptr_array, p_vector->p_ptr_array, + memcpy( p_new_ptr_array, p_vector->p_ptr_array, p_vector->capacity * sizeof(void*) ); /* Free the old pointer array. */ Index: osm/complib/cl_memory.c =================================================================== --- osm/complib/cl_memory.c (revision 7286) +++ osm/complib/cl_memory.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -259,7 +259,7 @@ cl_mem_check( void ) else { /* obtain the size from the header */ - cl_memcpy(&size, (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, sizeof(size)); + memcpy(&size, (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, sizeof(size)); if (memcmp((char*)p_mem + sizeof(size) + _MEM_DEBUG_MAGIC_SIZE_ + size, &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_)) @@ -322,12 +322,12 @@ __cl_malloc_trk( if( !p_mem ) return( NULL ); /* now poisen */ - cl_memset(p_mem, 0xA5, size + _MEM_DEBUG_EXTRA_SIZE_); + memset(p_mem, 0xA5, size + _MEM_DEBUG_EXTRA_SIZE_); /* special layout */ - cl_memcpy(p_mem, &_MEM_DEBUG_MAGIC_START_, _MEM_DEBUG_MAGIC_SIZE_); - cl_memcpy((char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, &size, sizeof(size)); - cl_memcpy((char*)p_mem + sizeof(size) + size + _MEM_DEBUG_MAGIC_SIZE_, - &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_); + memcpy(p_mem, &_MEM_DEBUG_MAGIC_START_, _MEM_DEBUG_MAGIC_SIZE_); + memcpy((char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, &size, sizeof(size)); + memcpy((char*)p_mem + sizeof(size) + size + _MEM_DEBUG_MAGIC_SIZE_, + &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_); p_mem = (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_ + sizeof(size); #endif @@ -363,7 +363,7 @@ __cl_malloc_trk( return( p_mem ); } } - cl_memcpy( p_hdr->file_name, temp_buf, FILE_NAME_LENGTH ); + memcpy( p_hdr->file_name, temp_buf, FILE_NAME_LENGTH ); p_hdr->line_num = temp_line; /* * We store the pointer to the memory returned to the user. 
This allows @@ -401,7 +401,7 @@ __cl_zalloc_trk( p_buffer = __cl_malloc_trk( p_file_name, line_num, size ); if( p_buffer ) - cl_memclr( p_buffer, size ); + memset( p_buffer, 0, size ); return( p_buffer ); } @@ -415,7 +415,7 @@ __cl_zalloc_ntrk( p_buffer = __cl_malloc_priv( size ); if( p_buffer ) - cl_memclr( p_buffer, size ); + memset( p_buffer, 0, size ); return( p_buffer ); } @@ -504,7 +504,7 @@ __cl_free_trk( else { /* obtain the size from the header */ - cl_memcpy(&size, (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, sizeof(size)); + memcpy(&size, (char*)p_mem + _MEM_DEBUG_MAGIC_SIZE_, sizeof(size)); if (memcmp((char*)p_mem + sizeof(size) + _MEM_DEBUG_MAGIC_SIZE_ + size, &_MEM_DEBUG_MAGIC_END_, _MEM_DEBUG_MAGIC_SIZE_)) @@ -514,7 +514,7 @@ __cl_free_trk( ); } /* now poisen */ - cl_memset(p_mem, 0x5A, size + _MEM_DEBUG_EXTRA_SIZE_); + memset(p_mem, 0x5A, size + _MEM_DEBUG_EXTRA_SIZE_); } __cl_free_priv( p_mem ); } Index: osm/complib/cl_pool.c =================================================================== --- osm/complib/cl_pool.c (revision 7286) +++ osm/complib/cl_pool.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -70,7 +70,7 @@ cl_qcpool_construct( { CL_ASSERT( p_pool ); - cl_memclr( p_pool, sizeof(cl_qcpool_t) ); + memset( p_pool, 0, sizeof(cl_qcpool_t) ); p_pool->state = CL_UNINITIALIZED; } @@ -126,7 +126,7 @@ cl_qcpool_init( (void**)(p_pool->component_sizes + num_components); /* Copy the user's sizes into our array for future use. */ - cl_memcpy( p_pool->component_sizes, component_sizes, + memcpy( p_pool->component_sizes, component_sizes, sizeof(uint32_t) * num_components ); /* Store the number of components per object. 
*/ @@ -452,7 +452,7 @@ void cl_qpool_construct( IN cl_qpool_t* const p_pool ) { - cl_memclr( p_pool, sizeof(cl_qpool_t) ); + memset( p_pool, 0, sizeof(cl_qpool_t) ); cl_qcpool_construct( &p_pool->qcpool ); } @@ -562,7 +562,7 @@ cl_cpool_construct( { CL_ASSERT( p_pool ); - cl_memclr( p_pool, sizeof(cl_cpool_t) ); + memset( p_pool, 0, sizeof(cl_cpool_t) ); cl_qcpool_construct( &p_pool->qcpool ); } @@ -680,7 +680,7 @@ cl_pool_construct( { CL_ASSERT( p_pool ); - cl_memclr( p_pool, sizeof(cl_pool_t) ); + memset( p_pool, 0, sizeof(cl_pool_t) ); cl_qcpool_construct( &p_pool->qcpool ); } Index: osm/complib/cl_map.c =================================================================== --- osm/complib/cl_map.c (revision 7286) +++ osm/complib/cl_map.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -233,7 +233,7 @@ cl_qmap_init( { CL_ASSERT( p_map ); - cl_memclr( p_map, sizeof(cl_qmap_t) ); + memset( p_map, 0, sizeof(cl_qmap_t) ); /* special setup for the root node */ p_map->root.p_up = &p_map->root; @@ -1268,7 +1268,7 @@ cl_fmap_init( CL_ASSERT( p_map ); CL_ASSERT( pfn_compare ); - cl_memclr( p_map, sizeof(cl_fmap_t) ); + memset( p_map, 0, sizeof(cl_fmap_t) ); /* special setup for the root node */ p_map->root.p_up = &p_map->root; Index: osm/complib/cl_memory_osd.c =================================================================== --- osm/complib/cl_memory_osd.c (revision 7286) +++ osm/complib/cl_memory_osd.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -69,30 +69,3 @@ __cl_free_priv( free( p_memory ); } -void -cl_memset( - IN void* const p_memory, - IN const uint8_t fill, - IN const size_t count ) -{ - memset( p_memory, fill, count ); -} - -void* -cl_memcpy( - IN void* const p_dest, - IN const void* const p_src, - IN const size_t count ) -{ - return( memcpy( p_dest, p_src, count ) ); -} - -int32_t -cl_memcmp( - IN const void* const p_mem, - IN const void* const p_ref, - IN const size_t count ) -{ - return( memcmp( p_mem, p_ref, count ) ); -} - Index: osm/osmtest/osmtest.c =================================================================== --- osm/osmtest/osmtest.c (revision 7286) +++ osm/osmtest/osmtest.c (working copy) @@ -435,7 +435,7 @@ subnet_init( IN subnet_t * const p_subn void osmtest_construct( IN osmtest_t * const p_osmt ) { - cl_memclr( p_osmt, sizeof( *p_osmt ) ); + memset( p_osmt, 0, sizeof( *p_osmt ) ); osm_log_construct( &p_osmt->log ); subnet_construct( &p_osmt->exp_subn ); } @@ -616,8 +616,8 @@ osmtest_get_all_recs( IN osmtest_t * con * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); p_context->p_osmt = p_osmt; user.attr_id = attr_id; @@ -693,7 +693,7 @@ osmtest_validate_sa_class_port_info( IN * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); + memset( &req, 0, sizeof( req ) ); p_context->p_osmt = p_osmt; req.query_type = OSMV_QUERY_CLASS_PORT_INFO; @@ -787,9 +787,9 @@ osmtest_get_node_rec( IN osmtest_t * con * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &record, sizeof( record ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &record, 0, sizeof( record ) ); record.node_info.node_guid = node_guid; @@ -874,9 +874,9 @@ osmtest_get_node_rec_by_lid( IN osmtest_ * * The query structures are locals. 
*/ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &record, sizeof( record ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &record, 0, sizeof( record ) ); record.lid = lid; @@ -946,7 +946,7 @@ osmtest_get_path_rec_by_guid_pair( IN os OSM_LOG_ENTER( &p_osmt->log, osmtest_get_path_rec_by_guid_pair); - cl_memclr( p_context, sizeof( *p_context ) ); + memset( p_context, 0, sizeof( *p_context ) ); p_context->p_osmt = p_osmt; req.timeout_ms = p_osmt->opt.transaction_timeout; @@ -1022,7 +1022,7 @@ osmtest_get_multipath_rec( IN osmtest_t * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); + memset( &req, 0, sizeof( req ) ); p_context->p_osmt = p_osmt; req.timeout_ms = p_osmt->opt.transaction_timeout; @@ -1099,9 +1099,9 @@ osmtest_get_port_rec( IN osmtest_t * con * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &record, sizeof( record ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &record, 0, sizeof( record ) ); record.lid = lid; @@ -1183,9 +1183,9 @@ osmtest_get_port_rec_by_num( IN osmtest_ * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &record, sizeof( record ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &record, 0, sizeof( record ) ); record.lid = lid; record.port_num = port_num; @@ -1254,7 +1254,7 @@ osmtest_stress_port_recs_large( IN osmte OSM_LOG_ENTER( &p_osmt->log, osmtest_stress_port_recs_large ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all NodeRecords in the subnet. 
*/ @@ -1319,7 +1319,7 @@ osmtest_stress_node_recs_large( IN osmte OSM_LOG_ENTER( &p_osmt->log, osmtest_stress_node_recs_large ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all NodeRecords in the subnet. @@ -1385,7 +1385,7 @@ osmtest_stress_path_recs_large( IN osmte OSM_LOG_ENTER( &p_osmt->log, osmtest_stress_path_recs_large ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all Records in the subnet. @@ -1452,7 +1452,7 @@ osmtest_stress_path_recs_by_guid ( IN os OSM_LOG_ENTER( &p_osmt->log, osmtest_stress_path_recs_by_guid ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); context.p_osmt = p_osmt; @@ -1557,7 +1557,7 @@ osmtest_stress_port_recs_small( IN osmte OSM_LOG_ENTER( &p_osmt->log, osmtest_stress_port_recs_small ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for our own PortRecord in the subnet. @@ -1642,9 +1642,9 @@ osmtest_wrong_sm_key_ignored( IN osmtest * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &record, sizeof( record ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &record, 0, sizeof( record ) ); record.lid = p_osmt->local_port.sm_lid; record.port_num = port_num; @@ -1947,7 +1947,7 @@ osmtest_write_all_link_recs( IN osmtest_ OSM_LOG_ENTER( &p_osmt->log, osmtest_write_all_link_recs ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all NodeRecords in the subnet. 
@@ -2023,7 +2023,7 @@ osmtest_get_path_rec_by_lid_pair( IN osm OSM_LOG_ENTER( &p_osmt->log, osmtest_get_path_rec_by_lid_pair); - cl_memclr( p_context, sizeof( *p_context ) ); + memset( p_context, 0, sizeof( *p_context ) ); p_context->p_osmt = p_osmt; req.timeout_ms = p_osmt->opt.transaction_timeout; @@ -2094,7 +2094,7 @@ osmtest_write_all_node_recs( IN osmtest_ OSM_LOG_ENTER( &p_osmt->log, osmtest_write_all_node_recs ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all NodeRecords in the subnet. @@ -2169,7 +2169,7 @@ osmtest_write_all_port_recs( IN osmtest_ OSM_LOG_ENTER( &p_osmt->log, osmtest_write_all_port_recs ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all NodeRecords in the subnet. @@ -2244,7 +2244,7 @@ osmtest_write_all_path_recs( OSM_LOG_ENTER( &p_osmt->log, osmtest_write_all_path_recs ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all records in the subnet. 
@@ -2340,7 +2340,7 @@ osmtest_write_all_node_recs( for (lid = 1; lid <= p_osmt->max_lid; lid++) { /* prepare the qury context */ - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); status = osmtest_get_node_rec_by_lid( p_osmt, cl_ntoh16( lid ), &context ); if( status != IB_SUCCESS ) @@ -2430,7 +2430,7 @@ osmtest_write_all_port_recs( IN osmtest_ OSM_LOG_ENTER( &p_osmt->log, osmtest_write_all_port_recs ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* print header */ result = fprintf( fh, "#\n" "# PortInfo Records\n" "#\n" ); @@ -2459,7 +2459,7 @@ osmtest_write_all_port_recs( IN osmtest_ for (port_num = 0; port_num <= p_node_rec->node_info.num_ports; port_num++) { /* prepare the qury context */ - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); status = osmtest_get_port_rec_by_num( p_osmt, p_node_rec->lid, @@ -2545,7 +2545,7 @@ osmtest_write_all_path_recs( IN osmtest_ OSM_LOG_ENTER( &p_osmt->log, osmtest_write_all_path_recs ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Go over all nodes that exist in the subnet @@ -4006,7 +4006,7 @@ osmtest_validate_all_node_recs( IN osmte OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_all_node_recs ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all NodeRecords in the subnet. @@ -4090,7 +4090,7 @@ osmtest_validate_all_guidinfo_recs( IN o OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_all_guidinfo_recs ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all GuidInfoRecords in the subnet. 
@@ -4146,7 +4146,7 @@ osmtest_validate_all_path_recs( IN osmte OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_all_path_recs ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all PathRecords in the subnet. @@ -4230,7 +4230,7 @@ osmtest_validate_single_path_rec_lid_pai OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_single_path_rec_lid_pair ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); status = osmtest_get_path_rec_by_lid_pair( p_osmt, p_path->rec.slid, @@ -4307,10 +4307,10 @@ osmtest_validate_single_node_rec_lid( IN cl_ntoh16( lid ) ); } - cl_memclr( &context, sizeof( context ) ); - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &record, sizeof( record ) ); + memset( &context, 0, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &record, 0, sizeof( record ) ); record.lid = lid; @@ -4404,7 +4404,7 @@ osmtest_validate_single_port_rec_lid( IN OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_single_port_rec_lid ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); context.p_osmt = p_osmt; osmtest_get_port_rec_by_num( p_osmt, @@ -4461,7 +4461,7 @@ osmtest_validate_single_path_rec_guid_pa OSM_LOG_ENTER( &p_osmt->log, osmtest_validate_single_path_rec_guid_pair ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); if( osm_log_is_active( &p_osmt->log, OSM_LOG_DEBUG ) ) { @@ -4814,8 +4814,8 @@ osmtest_validate_against_db( IN osmtest_ goto Exit; #if defined (VENDOR_RMPP_SUPPORT) && defined (DUAL_SIDED_RMPP) - cl_memclr( &context, sizeof( context ) ); - cl_memclr( &request, sizeof( request ) ); + memset( &context, 0, sizeof( context ) ); + memset( &request, 0, sizeof( request ) ); request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT | IB_MPR_COMPMASK_DGIDCOUNT; request.sgid_count = 1; request.dgid_count = 1; @@ 
-4825,8 +4825,8 @@ osmtest_validate_against_db( IN osmtest_ if( status != IB_SUCCESS ) goto Exit; - cl_memclr( &context, sizeof( context ) ); - cl_memclr( &request, sizeof( request ) ); + memset( &context, 0, sizeof( context ) ); + memset( &request, 0, sizeof( request ) ); status = osmtest_get_multipath_rec( p_osmt, &request, &context ); if( status == IB_SUCCESS ) goto Exit; @@ -4837,8 +4837,8 @@ osmtest_validate_against_db( IN osmtest_ "IS EXPECTED ERROR ^^^^\n"); } - cl_memclr( &context, sizeof( context ) ); - cl_memclr( &request, sizeof( request ) ); + memset( &context, 0, sizeof( context ) ); + memset( &request, 0, sizeof( request ) ); request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT; request.sgid_count = 1; ib_gid_set_default( &request.gids[0], portguid ); @@ -4852,8 +4852,8 @@ osmtest_validate_against_db( IN osmtest_ "IS EXPECTED ERROR ^^^^\n"); } - cl_memclr( &context, sizeof( context ) ); - cl_memclr( &request, sizeof( request ) ); + memset( &context, 0, sizeof( context ) ); + memset( &request, 0, sizeof( request ) ); request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT | IB_MPR_COMPMASK_DGIDCOUNT; request.sgid_count = 1; request.dgid_count = 1; @@ -4871,7 +4871,7 @@ osmtest_validate_against_db( IN osmtest_ "IS EXPECTED ERROR ^^^^\n"); } - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT | IB_MPR_COMPMASK_DGIDCOUNT; request.sgid_count = 1; request.dgid_count = 1; @@ -4889,8 +4889,8 @@ osmtest_validate_against_db( IN osmtest_ "IS EXPECTED ERROR ^^^^\n"); } - cl_memclr( &context, sizeof( context ) ); - cl_memclr( &request, sizeof( request ) ); + memset( &context, 0, sizeof( context ) ); + memset( &request, 0, sizeof( request ) ); request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT | IB_MPR_COMPMASK_DGIDCOUNT | IB_MPR_COMPMASK_NUMBPATH; request.sgid_count = 2; @@ -6216,8 +6216,8 @@ osmtest_bind( IN osmtest_t * p_osmt, /* * Copy the port info for the selected port. 
*/ - cl_memcpy( &p_osmt->local_port, &attr_array[port_index], - sizeof( p_osmt->local_port ) ); + memcpy( &p_osmt->local_port, &attr_array[port_index], + sizeof( p_osmt->local_port ) ); /* bind to the SA */ osm_log( &p_osmt->log, OSM_LOG_DEBUG, Index: osm/osmtest/osmt_service.c =================================================================== --- osm/osmtest/osmt_service.c (revision 7286) +++ osm/osmtest/osmt_service.c (working copy) @@ -1,4 +1,5 @@ /* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2006 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -83,10 +84,10 @@ osmt_register_service( IN osmtest_t * co "Registering service: name: %s id: 0x%" PRIx64 "\n", service_name, cl_ntoh64(service_id)); - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &context, sizeof( context ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &svc_rec, sizeof( svc_rec ) ); + memset( &req, 0, sizeof( req ) ); + memset( &context, 0, sizeof( context ) ); + memset( &user, 0, sizeof( user ) ); + memset( &svc_rec, 0, sizeof( svc_rec ) ); /* set the new service record fields */ svc_rec.service_id = service_id; @@ -94,10 +95,10 @@ osmt_register_service( IN osmtest_t * co svc_rec.service_gid.unicast.prefix = 0; svc_rec.service_gid.unicast.interface_id = p_osmt->local_port.port_guid; svc_rec.service_lease = service_lease; - cl_memclr(&svc_rec.service_key,16*sizeof(uint8_t)); + memset(&svc_rec.service_key, 0, 16*sizeof(uint8_t)); svc_rec.service_key[0] = service_key_lsb; - cl_memclr(svc_rec.service_name, sizeof(svc_rec.service_name)); - cl_memcpy(svc_rec.service_name, service_name, + memset(svc_rec.service_name, 0, sizeof(svc_rec.service_name)); + memcpy(svc_rec.service_name, service_name, (strlen(service_name)+1)*sizeof(char)); /* prepare the data used for this query */ @@ -198,10 +199,10 @@ osmt_register_service_with_full_key ( IN "Registering service: name: %s id: 0x%" PRIx64 "\n", 
service_name, cl_ntoh64(service_id)); - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &context, sizeof( context ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &svc_rec, sizeof( svc_rec ) ); + memset( &req, 0, sizeof( req ) ); + memset( &context, 0, sizeof( context ) ); + memset( &user, 0, sizeof( user ) ); + memset( &svc_rec, 0, sizeof( svc_rec ) ); /* set the new service record fields */ svc_rec.service_id = service_id; @@ -209,12 +210,12 @@ osmt_register_service_with_full_key ( IN svc_rec.service_gid.unicast.prefix = 0; svc_rec.service_gid.unicast.interface_id = p_osmt->local_port.port_guid; svc_rec.service_lease = service_lease; - cl_memclr(&svc_rec.service_key,16*sizeof(uint8_t)); - cl_memcpy(svc_rec.service_key,service_key,16*sizeof(uint8_t)); - cl_memclr(svc_rec.service_name, sizeof(svc_rec.service_name)); - cl_memclr(skey, 16*sizeof(uint8_t)); - cl_memcpy(svc_rec.service_name, service_name, - (strlen(service_name)+1)*sizeof(char)); + memset(&svc_rec.service_key, 0, 16*sizeof(uint8_t)); + memcpy(svc_rec.service_key,service_key, 16*sizeof(uint8_t)); + memset(svc_rec.service_name, 0, sizeof(svc_rec.service_name)); + memset(skey, 0, 16*sizeof(uint8_t)); + memcpy(svc_rec.service_name, service_name, + (strlen(service_name)+1)*sizeof(char)); /* prepare the data used for this query */ /* sa_mad_data.method = IB_MAD_METHOD_SET; */ @@ -275,7 +276,7 @@ osmt_register_service_with_full_key ( IN i,service_key[i],i,p_rec->service_key[i]); } /* since c15-0.1.14 not supported all key association queries should bring in return zero in service key */ - if (cl_memcmp(skey,p_rec->service_key,16*sizeof(uint8_t)) != 0) + if (memcmp(skey,p_rec->service_key,16*sizeof(uint8_t)) != 0) { status = IB_REMOTE_ERROR; osm_log( &p_osmt->log, OSM_LOG_ERROR, @@ -340,10 +341,10 @@ osmt_register_service_with_data( IN osmt "Registering service: name: %s id: 0x%" PRIx64 "\n", service_name, cl_ntoh64(service_id)); - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &context, sizeof( context 
) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &svc_rec, sizeof( svc_rec ) ); + memset( &req, 0, sizeof( req ) ); + memset( &context, 0, sizeof( context ) ); + memset( &user, 0, sizeof( user ) ); + memset( &svc_rec, 0, sizeof( svc_rec ) ); /* set the new service record fields */ svc_rec.service_id = service_id; @@ -351,18 +352,18 @@ osmt_register_service_with_data( IN osmt svc_rec.service_gid.unicast.prefix = 0; svc_rec.service_gid.unicast.interface_id = p_osmt->local_port.port_guid; svc_rec.service_lease = service_lease; - cl_memclr(&svc_rec.service_key,16*sizeof(uint8_t)); + memset(&svc_rec.service_key, 0, 16*sizeof(uint8_t)); svc_rec.service_key[0] = service_key_lsb; /* Copy data to service_data arrays */ - cl_memcpy(svc_rec.service_data8,service_data8,16*sizeof(uint8_t)); - cl_memcpy(svc_rec.service_data16,service_data16,8*sizeof(ib_net16_t)); - cl_memcpy(svc_rec.service_data32,service_data32,4*sizeof(ib_net32_t)); - cl_memcpy(svc_rec.service_data64,service_data64,2*sizeof(ib_net64_t)); - - cl_memclr(svc_rec.service_name, sizeof(svc_rec.service_name)); - cl_memcpy(svc_rec.service_name, service_name, - (strlen(service_name)+1)*sizeof(char)); + memcpy(svc_rec.service_data8, service_data8, 16*sizeof(uint8_t)); + memcpy(svc_rec.service_data16, service_data16, 8*sizeof(ib_net16_t)); + memcpy(svc_rec.service_data32, service_data32, 4*sizeof(ib_net32_t)); + memcpy(svc_rec.service_data64, service_data64, 2*sizeof(ib_net64_t)); + + memset(svc_rec.service_name, 0, sizeof(svc_rec.service_name)); + memcpy(svc_rec.service_name, service_name, + (strlen(service_name)+1)*sizeof(char)); /* prepare the data used for this query */ /* sa_mad_data.method = IB_MAD_METHOD_SET; */ @@ -455,10 +456,10 @@ osmt_register_service_with_data( IN osmt p_rec = osmv_get_query_svc_rec( context.result.p_result_madw, 0 ); osm_log( &p_osmt->log, OSM_LOG_VERBOSE, "Comparing service data...\n"); - if (cl_memcmp(service_data8,p_rec->service_data8,16*sizeof(uint8_t)) != 0 || - 
cl_memcmp(service_data16,p_rec->service_data16,8*sizeof(uint16_t)) != 0 || - cl_memcmp(service_data32,p_rec->service_data32,4*sizeof(uint32_t)) != 0 || - cl_memcmp(service_data64,p_rec->service_data64,2*sizeof(uint64_t)) != 0 + if (memcmp(service_data8,p_rec->service_data8,16*sizeof(uint8_t)) != 0 || + memcmp(service_data16,p_rec->service_data16,8*sizeof(uint16_t)) != 0 || + memcmp(service_data32,p_rec->service_data32,4*sizeof(uint32_t)) != 0 || + memcmp(service_data64,p_rec->service_data64,2*sizeof(uint64_t)) != 0 ) { status = IB_REMOTE_ERROR; @@ -509,8 +510,8 @@ osmt_get_service_by_id_and_name ( IN osm * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &context, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &context, 0, sizeof( context ) ); context.p_osmt = p_osmt; @@ -523,12 +524,12 @@ osmt_get_service_by_id_and_name ( IN osm req.pfn_query_cb = osmtest_query_res_cb; req.sm_key = 0; - cl_memclr( &svc_rec, sizeof( svc_rec ) ); - cl_memclr( &user, sizeof( user ) ); + memset( &svc_rec, 0, sizeof( svc_rec ) ); + memset( &user, 0, sizeof( user ) ); /* set the new service record fields */ - cl_memclr(svc_rec.service_name, sizeof(svc_rec.service_name)); - cl_memcpy(svc_rec.service_name, sr_name, - (strlen(sr_name)+1)*sizeof(char)); + memset(svc_rec.service_name, 0, sizeof(svc_rec.service_name)); + memcpy(svc_rec.service_name, sr_name, + (strlen(sr_name)+1)*sizeof(char)); svc_rec.service_id = sid; req.p_query_input = &user; @@ -648,8 +649,8 @@ osmt_get_service_by_id ( IN osmtest_t * * * The query structures are locals. 
*/ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &context, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &context, 0, sizeof( context ) ); context.p_osmt = p_osmt; @@ -662,8 +663,8 @@ osmt_get_service_by_id ( IN osmtest_t * req.pfn_query_cb = osmtest_query_res_cb; req.sm_key = 0; - cl_memclr( &svc_rec, sizeof( svc_rec ) ); - cl_memclr( &user, sizeof( user ) ); + memset( &svc_rec, 0, sizeof( svc_rec ) ); + memset( &user, 0, sizeof( user ) ); /* set the new service record fields */ svc_rec.service_id = sid; req.p_query_input = &user; @@ -796,8 +797,8 @@ osmt_get_service_by_name_and_key ( IN os * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &context, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &context, 0, sizeof( context ) ); context.p_osmt = p_osmt; @@ -810,12 +811,12 @@ osmt_get_service_by_name_and_key ( IN os req.pfn_query_cb = osmtest_query_res_cb; req.sm_key = 0; - cl_memclr( &svc_rec, sizeof( svc_rec ) ); - cl_memclr( &user, sizeof( user ) ); + memset( &svc_rec, 0, sizeof( svc_rec ) ); + memset( &user, 0, sizeof( user ) ); /* set the new service record fields */ - cl_memclr(svc_rec.service_name, sizeof(svc_rec.service_name)); - cl_memcpy(svc_rec.service_name, sr_name, - (strlen(sr_name)+1)*sizeof(char)); + memset(svc_rec.service_name, 0, sizeof(svc_rec.service_name)); + memcpy(svc_rec.service_name, sr_name, + (strlen(sr_name)+1)*sizeof(char)); for (i = 0; i <= 15; i++) svc_rec.service_key[i] = skey[i]; @@ -937,8 +938,8 @@ osmt_get_service_by_name( IN osmtest_t * * * The query structures are locals. 
*/ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &context, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &context, 0, sizeof( context ) ); context.p_osmt = p_osmt; @@ -951,8 +952,8 @@ osmt_get_service_by_name( IN osmtest_t * req.pfn_query_cb = osmtest_query_res_cb; req.sm_key = 0; - cl_memclr(service_name, sizeof(service_name)); - cl_memcpy(service_name, sr_name, (strlen(sr_name)+1)*sizeof(char)); + memset(service_name, 0, sizeof(service_name)); + memcpy(service_name, sr_name, (strlen(sr_name)+1)*sizeof(char)); req.p_query_input = service_name; status = osmv_query_sa( p_osmt->h_bind, &req ); @@ -1073,8 +1074,8 @@ osmt_get_all_services_and_check_names( I * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &context, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &context, 0, sizeof( context ) ); context.p_osmt = p_osmt; @@ -1189,7 +1190,7 @@ osmt_delete_service_by_name(IN osmtest_t "Trying to Delete service name: %s\n", sr_name); - cl_memclr( &svc_rec, sizeof( svc_rec ) ); + memset( &svc_rec, 0, sizeof( svc_rec ) ); status = osmt_get_service_by_name(p_osmt, sr_name,rec_num, &svc_rec); if (status != IB_SUCCESS) @@ -1201,14 +1202,14 @@ osmt_delete_service_by_name(IN osmtest_t goto ExitNoDel; } - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &context, sizeof( context ) ); - cl_memclr( &user, sizeof( user ) ); + memset( &req, 0, sizeof( req ) ); + memset( &context, 0, sizeof( context ) ); + memset( &user, 0, sizeof( user ) ); /* set the new service record fields */ - cl_memclr(svc_rec.service_name, sizeof(svc_rec.service_name)); - cl_memcpy(svc_rec.service_name, sr_name, - (strlen(sr_name)+1)*sizeof(char)); + memset(svc_rec.service_name, 0, sizeof(svc_rec.service_name)); + memcpy(svc_rec.service_name, sr_name, + (strlen(sr_name)+1)*sizeof(char)); /* prepare the data used for this query */ context.p_osmt = p_osmt; @@ -1384,14 +1385,14 @@ osmt_run_service_records_flow( IN osmtes /* Generate 2 
instances of service record with consecutive data */ for (instance = 0 ; instance < 2 ; instance++) { /* First, clear all arrays */ - cl_memclr (service_data8,16*sizeof(uint8_t)); - cl_memclr (service_data16,8*sizeof(uint16_t)); - cl_memclr (service_data32,4*sizeof(uint32_t)); - cl_memclr (service_data64,2*sizeof(uint64_t)); - service_data8[instance]=instance+1; - service_data16[instance]=cl_hton16(instance+2); - service_data32[instance]=cl_hton32(instance+3); - service_data64[instance]=cl_hton64(instance+4); + memset (service_data8, 0, 16*sizeof(uint8_t)); + memset (service_data16, 0, 8*sizeof(uint16_t)); + memset (service_data32, 0, 4*sizeof(uint32_t)); + memset (service_data64, 0, 2*sizeof(uint64_t)); + service_data8[instance] = instance+1; + service_data16[instance] = cl_hton16(instance+2); + service_data32[instance] = cl_hton32(instance+3); + service_data64[instance] = cl_hton64(instance+4); status = osmt_register_service_with_data( p_osmt, cl_ntoh64(id[3]), /* IN ib_net64_t service_id, */ @@ -1410,7 +1411,7 @@ osmt_run_service_records_flow( IN osmtes } /* Trying to create service with zero key */ - cl_memclr (service_key,16*sizeof(uint8_t)); + memset (service_key, 0, 16*sizeof(uint8_t)); status = osmt_register_service_with_full_key( p_osmt, cl_ntoh64(id[5]), /* IN ib_net64_t service_id, */ @@ -1642,7 +1643,7 @@ osmt_run_service_records_flow( IN osmtes } /* Test Service Key */ - cl_memclr(service_key,16*sizeof(uint8_t)); + memset(service_key, 0, 16*sizeof(uint8_t)); /* Check for service_name[5] with service_key=0 - the service shouldn't exist with this name. 
*/ @@ -1695,9 +1696,9 @@ osmt_run_service_records_flow( IN osmtes #ifdef VENDOR_RMPP_SUPPORT /* These ar the only service_names which are valid */ - cl_memcpy(&service_valid_names[0],&service_name[0],sizeof(uint8_t)*64); - cl_memcpy(&service_valid_names[1],&service_name[2],sizeof(uint8_t)*64); - cl_memcpy(&service_valid_names[2],&service_name[6],sizeof(uint8_t)*64); + memcpy(&service_valid_names[0], &service_name[0], sizeof(uint8_t)*64); + memcpy(&service_valid_names[1], &service_name[2], sizeof(uint8_t)*64); + memcpy(&service_valid_names[2], &service_name[6], sizeof(uint8_t)*64); status = osmt_get_all_services_and_check_names(p_osmt,service_valid_names, 3, &num_recs); if (status != IB_SUCCESS) Index: osm/osmtest/osmt_multicast.c =================================================================== --- osm/osmtest/osmt_multicast.c (revision 7286) +++ osm/osmtest/osmt_multicast.c (working copy) @@ -1,4 +1,5 @@ /* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -66,7 +67,7 @@ __match_mgids( ib_gid_t* p_mgid_list_item = (ib_gid_t*)p_object; int32_t count; - count = cl_memcmp( + count = memcmp( p_mgid_context, p_mgid_list_item, sizeof(ib_gid_t)); @@ -102,8 +103,8 @@ osmt_query_mcast( IN osmtest_t * const p * The query structures are locals. 
*/ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); context.p_osmt = p_osmt; user.attr_id = IB_MAD_ATTR_MCMEMBER_RECORD; @@ -205,7 +206,7 @@ osmt_query_mcast( IN osmtest_t * const p status = IB_ERROR; goto Exit; } - cl_memcpy(&p_mgrp->mcmember_rec,p_rec,sizeof(p_mgrp->mcmember_rec)); + memcpy(&p_mgrp->mcmember_rec,p_rec,sizeof(p_mgrp->mcmember_rec)); cl_qmap_insert(&p_osmt->exp_subn.mgrp_mlid_tbl, cl_ntoh16(p_rec->mlid),&p_mgrp->map_item); } @@ -241,10 +242,10 @@ osmt_send_mcast_request( IN osmtest_t * * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &context, sizeof( context ) ); - cl_memclr( p_res, sizeof(ib_sa_mad_t ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &context, 0, sizeof( context ) ); + memset( p_res, 0, sizeof(ib_sa_mad_t ) ); context.p_osmt = p_osmt; @@ -303,9 +304,9 @@ osmt_send_mcast_request( IN osmtest_t * } /* ok it worked */ - cl_memcpy(p_res, - osm_madw_get_mad_ptr(context.result.p_result_madw), - sizeof(ib_sa_mad_t)); + memcpy(p_res, + osm_madw_get_mad_ptr(context.result.p_result_madw), + sizeof(ib_sa_mad_t)); status = context.result.status; @@ -340,13 +341,13 @@ void osmt_init_mc_query_rec(IN osmtest_t * const p_osmt, IN OUT ib_member_rec_t *p_mc_req) { /* use default values so we can change only what we want later */ - cl_memclr(p_mc_req,sizeof(ib_member_rec_t)); + memset(p_mc_req, 0, sizeof(ib_member_rec_t)); /* we leave the MGID to the user */ - cl_memcpy(&p_mc_req->port_gid.unicast.interface_id, - &p_osmt->local_port.port_guid, - sizeof(p_osmt->local_port.port_guid) - ); + memcpy(&p_mc_req->port_gid.unicast.interface_id, + &p_osmt->local_port.port_guid, + sizeof(p_osmt->local_port.port_guid) + ); /* use our own subnet prefix: */ p_mc_req->port_gid.unicast.prefix = CL_HTON64(0xFE80000000000000ULL); @@ -527,8 +528,8 @@ 
osmt_run_mcast_flow( IN osmtest_t * cons while( p_mgrp != (osmtest_mgrp_t*)cl_qmap_end( p_mgrp_mlid_tbl ) ) { /* search for ipoib mgid */ - if (!cl_memcmp(&osm_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ipoib_good_mgid)) || - !cl_memcmp(&osm_ts_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ts_ipoib_good_mgid))) + if (!memcmp(&osm_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ipoib_good_mgid)) || + !memcmp(&osm_ts_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ts_ipoib_good_mgid))) { IPoIBIsFound=1; } @@ -545,7 +546,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "osmt_run_mcast_flow: " "Found IPoIB MC Group, so we run SilverStorm Bug Flow...\n"); /* Try to join first like IPoIB of SilverStorm */ - cl_memcpy(&mc_req_rec.mgid,&osm_ipoib_good_mgid,sizeof(ib_gid_t)); + memcpy(&mc_req_rec.mgid,&osm_ipoib_good_mgid,sizeof(ib_gid_t)); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); comp_mask = @@ -593,7 +594,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons mtu_phys = p_mc_res->mtu ; rate_phys = p_mc_res->rate ; - cl_memcpy(&mc_req_rec.mgid,&osm_ipoib_good_mgid,sizeof(ib_gid_t)); + memcpy(&mc_req_rec.mgid,&osm_ipoib_good_mgid,sizeof(ib_gid_t)); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); comp_mask = @@ -676,7 +677,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* Request Get */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); - cl_memclr(&mc_req_rec.port_gid.unicast.interface_id,sizeof(ib_net64_t)); + memset(&mc_req_rec.port_gid.unicast.interface_id, 0, sizeof(ib_net64_t)); comp_mask = IB_MCR_COMPMASK_GID; @@ -710,7 +711,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" ); /* no MGID */ - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); + memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -757,7 +758,7 @@ 
osmt_run_mcast_flow( IN osmtest_t * cons " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" ); /* no MGID */ - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); + memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* Request Join */ ib_member_set_join_state(&mc_req_rec,IB_MC_REC_STATE_FULL_MEMBER ); @@ -796,7 +797,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons osmt_init_mc_query_rec(p_osmt, &mc_req_rec); /* no MGID */ - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); + memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); mc_req_rec.mgid.raw[15] = 0x01; @@ -913,7 +914,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" ); /* no MGID */ - /* cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); */ + /* memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); */ /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -1241,7 +1242,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons p_mc_res = ib_sa_mad_get_payload_ptr(&res_sa_mad); /* no MGID */ - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); + memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -1324,7 +1325,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons p_mc_res = ib_sa_mad_get_payload_ptr(&res_sa_mad); /* no MGID */ - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); + memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -1381,7 +1382,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons p_mc_res = ib_sa_mad_get_payload_ptr(&res_sa_mad); /* no MGID */ - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); + memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -1779,7 +1780,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons p_mc_res = ib_sa_mad_get_payload_ptr(&res_sa_mad); /* no MGID */ - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); 
+ memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); @@ -2212,7 +2213,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons mc_req_rec.mgid = good_mgid; mc_req_rec.mgid.raw[12] = 0xFB; - cl_memcpy(&special_mgid, &mc_req_rec.mgid, sizeof(ib_gid_t)); + memcpy(&special_mgid, &mc_req_rec.mgid, sizeof(ib_gid_t)); mc_req_rec.scope_state = 0x2F; /* link-local scope, Full member with all other bits turned on */ status = osmt_send_mcast_request( p_osmt, 1, @@ -2251,7 +2252,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "osmt_run_mcast_flow: " "Check o15-0.2.4 statement...\n"); /* Try to join */ - cl_memcpy(&mc_req_rec.mgid,&p_mc_res->mgid,sizeof(ib_gid_t)); + memcpy(&mc_req_rec.mgid,&p_mc_res->mgid,sizeof(ib_gid_t)); /* Request Join */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_NON_MEMBER); comp_mask = @@ -2868,7 +2869,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons /* First create new mgrp */ ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); mc_req_rec.mtu = IB_MTU_LEN_1024 | IB_PATH_SELECTOR_EXACTLY << 6; - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); + memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); comp_mask = IB_MCR_COMPMASK_MGID | IB_MCR_COMPMASK_PORT_GID | @@ -2893,7 +2894,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Failed to create new mgrp\n"); goto Exit; } - cl_memcpy(&tmp_mgid,&p_mc_res->mgid,sizeof(ib_gid_t)); + memcpy(&tmp_mgid,&p_mc_res->mgid,sizeof(ib_gid_t)); osm_dump_mc_record( &p_osmt->log, p_mc_res, OSM_LOG_INFO ); /* tmp_mtu = p_mc_res->mtu & 0x3F; */ @@ -2904,7 +2905,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n" ); mc_req_rec.mtu = IB_MTU_LEN_4096 | IB_PATH_SELECTOR_GREATER_THAN << 6; - cl_memcpy(&mc_req_rec.mgid,&tmp_mgid,sizeof(ib_gid_t)); + memcpy(&mc_req_rec.mgid,&tmp_mgid,sizeof(ib_gid_t)); ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); comp_mask = 
IB_MCR_COMPMASK_GID | @@ -2947,7 +2948,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons "Checking Proxy Join...\n" ); osmt_init_mc_query_rec(p_osmt, &mc_req_rec); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); /* * Do a blocking query for all NodeRecords in the subnet. @@ -2991,7 +2992,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons if (remote_port_guid != 0x0) { ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER); - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); + memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); mc_req_rec.port_gid.unicast.interface_id = remote_port_guid; comp_mask = IB_MCR_COMPMASK_MGID | @@ -3022,7 +3023,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons } p_mc_res = ib_sa_mad_get_payload_ptr(&res_sa_mad); - cl_memcpy(&proxy_mgid,&p_mc_res->mgid,sizeof(ib_gid_t)); + memcpy(&proxy_mgid,&p_mc_res->mgid,sizeof(ib_gid_t)); /* First try a bad deletion then good one */ osmt_init_mc_query_rec(p_osmt, &mc_req_rec); @@ -3105,7 +3106,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons IB_LINK_WIDTH_ACTIVE_1X | IB_PATH_SELECTOR_GREATER_THAN << 6; mc_req_rec.mlid = max_mlid; - cl_memclr(&mc_req_rec.mgid,sizeof(ib_gid_t)); + memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); comp_mask = IB_MCR_COMPMASK_MGID | IB_MCR_COMPMASK_PORT_GID | @@ -3182,14 +3183,14 @@ osmt_run_mcast_flow( IN osmtest_t * cons while( p_mgrp != (osmtest_mgrp_t*)cl_qmap_end( p_mgrp_mlid_tbl ) ) { /* Only if different from IPoIB Mgid try to delete */ - if (cl_memcmp(&osm_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ipoib_good_mgid)) && - cl_memcmp(&osm_ts_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ts_ipoib_good_mgid))) + if (memcmp(&osm_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ipoib_good_mgid)) && + memcmp(&osm_ts_ipoib_good_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(osm_ts_ipoib_good_mgid))) { osmt_init_mc_query_rec(p_osmt, &mc_req_rec); mc_req_rec.mgid = p_mgrp->mcmember_rec.mgid; /* o15-0.1.4 - need to specify the oppsite 
state for a valid delete */ - if (!cl_memcmp(&special_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(special_mgid))) + if (!memcmp(&special_mgid,&p_mgrp->mcmember_rec.mgid,sizeof(special_mgid))) { mc_req_rec.scope_state = 0x2F; } Index: osm/osmtest/osmt_slvl_vl_arb.c =================================================================== --- osm/osmtest/osmt_slvl_vl_arb.c (revision 7286) +++ osm/osmtest/osmt_slvl_vl_arb.c (working copy) @@ -1,4 +1,5 @@ /* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -130,9 +131,9 @@ osmt_query_vl_arb( * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &context, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &context, 0, sizeof( context ) ); context.p_osmt = p_osmt; @@ -348,9 +349,9 @@ osmt_query_slvl_map( * * The query structures are locals. */ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &context, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &context, 0, sizeof( context ) ); context.p_osmt = p_osmt; Index: osm/osmtest/osmt_inform.c =================================================================== --- osm/osmtest/osmt_inform.c (revision 7286) +++ osm/osmtest/osmt_inform.c (working copy) @@ -1,4 +1,5 @@ /* + * Copyright (c) 2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -136,7 +137,7 @@ osmt_bind_inform_qp( IN osmtest_t * cons p_qp_ctx->p_recv_buf = (uint8_t *)p_qp_ctx->qp_bind_hndl.buf_ptr + 2 * (GRH_LEN + MAD_BLOCK_SIZE); /* Need to clear assigned memory of p_send_buf - before using it to send any data */ - cl_memclr(p_qp_ctx->p_send_buf, MAD_BLOCK_SIZE); + memset(p_qp_ctx->p_send_buf, 0, MAD_BLOCK_SIZE); status = IB_SUCCESS; osm_log( p_log, OSM_LOG_DEBUG, @@ -222,7 +223,7 @@ osmt_reg_unreg_inform_info( IN osmtest_t p_sa_mad->attr_id = IB_MAD_ATTR_INFORM_INFO; /* copy the reference inform info */ - cl_memcpy(p_ii, p_inform_info, sizeof(ib_inform_info_t)); + memcpy(p_ii, p_inform_info, sizeof(ib_inform_info_t)); if (reg_flag) { @@ -449,7 +450,7 @@ osmt_send_trap_wait_for_forward( IN osmt p_osmt->local_port.sm_lid); /* init the MAD */ - cl_memclr(p_smp, sizeof(ib_smp_t)); + memset(p_smp, 0, sizeof(ib_smp_t)); ib_mad_init_new( (ib_mad_t*)p_smp, IB_MCLASS_SUBN_LID, ( uint8_t ) 2, @@ -692,7 +693,7 @@ ib_api_status_t osmt_init_inform_info(IN osmtest_t * const p_osmt, OUT ib_inform_info_t* p_ii) { - cl_memclr(p_ii, sizeof(ib_inform_info_t)); + memset(p_ii, 0, sizeof(ib_inform_info_t)); /* p_ii->lid_range_begin = cl_hton16(1); */ p_ii->lid_range_begin = 0xFFFF; p_ii->lid_range_end = cl_hton16(p_osmt->max_lid); @@ -709,7 +710,7 @@ osmt_init_inform_info_by_trap (IN osmtes IN ib_net16_t trap_num, OUT ib_inform_info_t* p_ii) { - cl_memclr(p_ii, sizeof(ib_inform_info_t)); + memset(p_ii, 0, sizeof(ib_inform_info_t)); /* p_ii->lid_range_begin = cl_hton16(1); */ p_ii->lid_range_begin = 0xFFFF; p_ii->lid_range_end = cl_hton16(p_osmt->max_lid); Index: osm/opensm/osm_lin_fwd_rcv_ctrl.c =================================================================== --- osm/opensm/osm_lin_fwd_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_lin_fwd_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. 
* Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -73,7 +73,7 @@ void osm_lft_rcv_ctrl_construct( IN osm_lft_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_pkey_rcv.c =================================================================== --- osm/opensm/osm_pkey_rcv.c (revision 7286) +++ osm/opensm/osm_pkey_rcv.c (working copy) @@ -62,7 +62,7 @@ void osm_pkey_rcv_construct( IN osm_pkey_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** Index: osm/opensm/osm_sm_state_mgr.c =================================================================== --- osm/opensm/osm_sm_state_mgr.c (revision 7286) +++ osm/opensm/osm_sm_state_mgr.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -166,7 +166,7 @@ __osm_sm_state_mgr_send_local_port_info_ * Send a query of SubnGet(PortInfo) to our own port, in order to * update the master_sm_base_lid of the subnet. 
*/ - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); p_port = ( osm_port_t * ) cl_qmap_get( &p_sm_mgr->p_subn->port_guid_tbl, port_guid ); if( p_port == @@ -220,7 +220,7 @@ __osm_sm_state_mgr_send_master_sm_info_r OSM_LOG_ENTER( p_sm_mgr->p_log, __osm_sm_state_mgr_send_master_sm_info_req ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); if( p_sm_mgr->p_subn->sm_state == IB_SMINFO_STATE_STANDBY ) { /* @@ -395,7 +395,7 @@ void osm_sm_state_mgr_construct( IN osm_sm_state_mgr_t * const p_sm_mgr ) { - cl_memclr( p_sm_mgr, sizeof( *p_sm_mgr ) ); + memset( p_sm_mgr, 0, sizeof( *p_sm_mgr ) ); cl_spinlock_construct( &p_sm_mgr->state_lock ); cl_timer_construct( &p_sm_mgr->polling_timer ); } Index: osm/opensm/osm_port.c =================================================================== --- osm/opensm/osm_port.c (revision 7286) +++ osm/opensm/osm_port.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -67,7 +67,7 @@ void osm_physp_construct( IN osm_physp_t* const p_physp ) { - cl_memclr( p_physp, sizeof(*p_physp) ); + memset( p_physp, 0, sizeof(*p_physp) ); osm_dr_path_construct( &p_physp->dr_path ); cl_ptr_vector_construct( &p_physp->slvl_by_port ); osm_pkey_tbl_construct( &p_physp->pkeys ); @@ -93,7 +93,7 @@ osm_physp_destroy( /* free the P_Key Tables */ osm_pkey_tbl_destroy( &p_physp->pkeys ); - cl_memclr( p_physp, sizeof(*p_physp) ); + memset( p_physp, 0, sizeof(*p_physp) ); osm_dr_path_construct( &p_physp->dr_path ); /* clear dr_path */ } } @@ -516,7 +516,8 @@ inline uint64_t __osm_ptr_to_key(void const *p) { uint64_t k = 0; - cl_memcpy(&k, p, sizeof(void *)); + + memcpy(&k, p, sizeof(void *)); return k; } @@ -524,7 +525,8 @@ inline void * __osm_key_to_ptr(uint64_t k) { void *p = 0; - cl_memcpy(&p, &k, sizeof(void *)); + + memcpy(&p, &k, sizeof(void *)); return p; } @@ -649,7 +651,7 @@ __osm_physp_update_new_dr_path( __osm_ptr_to_key(p_physp) ); } - cl_memclr( path_array, sizeof(path_array) ); + memset( path_array, 0, sizeof(path_array) ); p_physp = (osm_physp_t*)cl_list_remove_head( &tmpPortsList ); while ( p_physp != NULL ) { Index: osm/opensm/osm_sa_guidinfo_record.c =================================================================== --- osm/opensm/osm_sa_guidinfo_record.c (revision 7286) +++ osm/opensm/osm_sa_guidinfo_record.c (working copy) @@ -93,7 +93,7 @@ void osm_gir_rcv_construct( IN osm_gir_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pool ); } @@ -177,7 +177,7 @@ __osm_gir_rcv_new_gir( cl_ntoh16( match_lid ), block_num ); } - cl_memclr( &p_rec_item->rec, sizeof( p_rec_item->rec ) ); + memset( &p_rec_item->rec, 0, sizeof( p_rec_item->rec ) ); p_rec_item->rec.lid = match_lid; p_rec_item->rec.block_num = block_num; @@ -570,7 +570,7 @@ osm_gir_rcv_process( Then copy all records from the list into the response payload. 
*/ - cl_memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method = (uint8_t)(p_resp_sa_mad->method | 0x80); /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_state_mgr.c =================================================================== --- osm/opensm/osm_state_mgr.c (revision 7286) +++ osm/opensm/osm_state_mgr.c (working copy) @@ -79,7 +79,7 @@ void osm_state_mgr_construct( IN osm_state_mgr_t * const p_mgr ) { - cl_memclr( p_mgr, sizeof( *p_mgr ) ); + memset( p_mgr, 0, sizeof( *p_mgr ) ); cl_spinlock_construct( &p_mgr->state_lock ); cl_spinlock_construct( &p_mgr->idle_lock ); p_mgr->state = OSM_SM_STATE_INIT; @@ -587,7 +587,7 @@ __osm_state_mgr_get_sw_info( p_node = osm_switch_get_node_ptr( p_sw ); p_dr_path = osm_node_get_any_dr_path_ptr( p_node ); - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); mad_context.si_context.node_guid = osm_node_get_node_guid( p_node ); mad_context.si_context.set_method = FALSE; @@ -625,10 +625,10 @@ __osm_state_mgr_get_remote_port_info( /* generate a dr path leaving on the physp to the remote node */ p_dr_path = osm_physp_get_dr_path_ptr( p_physp ); - cl_memcpy( &rem_node_dr_path, p_dr_path, sizeof( osm_dr_path_t ) ); + memcpy( &rem_node_dr_path, p_dr_path, sizeof( osm_dr_path_t ) ); osm_dr_path_extend( &rem_node_dr_path, osm_physp_get_port_num( p_physp ) ); - cl_memclr( &mad_context, sizeof( mad_context ) ); + memset( &mad_context, 0, sizeof( mad_context ) ); mad_context.pi_context.node_guid = osm_node_get_node_guid( osm_physp_get_node_ptr( p_physp ) ); @@ -672,7 +672,7 @@ __osm_state_mgr_sweep_hop_0( OSM_LOG_ENTER( p_mgr->p_log, __osm_state_mgr_sweep_hop_0 ); - cl_memclr( path_array, sizeof( path_array ) ); + memset( path_array, 0, sizeof( path_array ) ); /* * First, get the bind handle. 
@@ -707,7 +707,7 @@ __osm_state_mgr_sweep_hop_0( CL_PLOCK_RELEASE( p_mgr->p_lock ); - cl_memclr( &ni_context, sizeof( ni_context ) ); + memset( &ni_context, 0, sizeof( ni_context ) ); osm_dr_path_init( &dr_path, h_bind, 0, path_array ); status = osm_req_get( p_mgr->p_req, &dr_path, @@ -933,7 +933,7 @@ __osm_state_mgr_sweep_hop_1( CL_ASSERT( h_bind != OSM_BIND_INVALID_HANDLE ); - cl_memclr( path_array, sizeof( path_array ) ); + memset( path_array, 0, sizeof( path_array ) ); /* the hop_1 operations depend on the type of our node. * Currently - legal nodes that can host SM are SW and CA */ switch ( osm_node_get_type( p_node ) ) @@ -1029,7 +1029,7 @@ __osm_state_mgr_light_sweep_start( p_sw_tbl = &p_mgr->p_subn->sw_guid_tbl; - cl_memclr( path_array, sizeof( path_array ) ); + memset( path_array, 0, sizeof( path_array ) ); /* * First, get the bind handle. @@ -1622,7 +1622,7 @@ __osm_state_mgr_send_handover( * Send a query of SubnSet(SMInfo) HANDOVER to the remote sm given. */ - cl_memclr( &context, sizeof( context ) ); + memset( &context, 0, sizeof( context ) ); p_port = p_sm->p_port; if( p_port == NULL ) { @@ -1722,8 +1722,8 @@ __osm_state_mgr_report_new_ports( /* we need to provide the GID */ port_gid.unicast.prefix = p_mgr->p_subn->opt.subnet_prefix; port_gid.unicast.interface_id = port_guid; - cl_memcpy( &( notice.data_details.ntc_64_67.gid ), - &( port_gid ), sizeof( ib_gid_t ) ); + memcpy( &( notice.data_details.ntc_64_67.gid ), + &( port_gid ), sizeof( ib_gid_t ) ); /* According to page 653 - the issuer gid in this case of trap * is the SM gid, since the SM is the initiator of this trap. */ Index: osm/opensm/osm_sa_vlarb_record.c =================================================================== --- osm/opensm/osm_sa_vlarb_record.c (revision 7286) +++ osm/opensm/osm_sa_vlarb_record.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. 
* Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -95,7 +95,7 @@ void osm_vlarb_rec_rcv_construct( IN osm_vlarb_rec_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pool ); } @@ -191,7 +191,7 @@ __osm_sa_vl_arb_create( ); } - cl_memclr( &p_rec_item->rec, sizeof( p_rec_item->rec ) ); + memset( &p_rec_item->rec, 0, sizeof( p_rec_item->rec ) ); p_rec_item->rec.lid = lid; p_rec_item->rec.port_num = osm_physp_get_port_num( p_physp ); @@ -521,7 +521,7 @@ osm_vlarb_rec_rcv_process( Then copy all records from the list into the response payload. */ - cl_memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method = (uint8_t)(p_resp_sa_mad->method | 0x80); /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_sa_multipath_record.c =================================================================== --- osm/opensm/osm_sa_multipath_record.c (revision 7286) +++ osm/opensm/osm_sa_multipath_record.c (working copy) @@ -98,7 +98,7 @@ void osm_mpr_rcv_construct( IN osm_mpr_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pr_pool ); } @@ -1350,7 +1350,7 @@ __osm_mpr_rcv_respond( p_resp_sa_mad = osm_madw_get_sa_mad_ptr( p_resp_madw ); - cl_memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method |= IB_MAD_METHOD_RESP_MASK; /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_subnet.c =================================================================== --- osm/opensm/osm_subnet.c (revision 7286) +++ osm/opensm/osm_subnet.c (working copy) @@ -71,7 +71,7 @@ void 
osm_subn_construct( IN osm_subn_t* const p_subn ) { - cl_memclr( p_subn, sizeof(*p_subn) ); + memset( p_subn, 0, sizeof(*p_subn) ); cl_ptr_vector_construct( &p_subn->node_lid_tbl ); cl_ptr_vector_construct( &p_subn->port_lid_tbl ); cl_qmap_init( &p_subn->sw_guid_tbl ); @@ -415,7 +415,7 @@ void osm_subn_set_default_opt( IN osm_subn_opt_t* const p_opt ) { - cl_memclr(p_opt, sizeof(osm_subn_opt_t)); + memset(p_opt, 0, sizeof(osm_subn_opt_t)); p_opt->guid = 0; p_opt->m_key = OSM_DEFAULT_M_KEY; p_opt->sm_key = OSM_DEFAULT_SM_KEY; Index: osm/opensm/osm_sa_sminfo_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_sminfo_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_sminfo_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_smir_ctrl_construct( IN osm_smir_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sweep_fail_ctrl.c =================================================================== --- osm/opensm/osm_sweep_fail_ctrl.c (revision 7286) +++ osm/opensm/osm_sweep_fail_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -80,7 +80,7 @@ void osm_sweep_fail_ctrl_construct( IN osm_sweep_fail_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_matrix.c =================================================================== --- osm/opensm/osm_matrix.c (revision 7286) +++ osm/opensm/osm_matrix.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -72,7 +72,7 @@ __osm_lid_matrix_vec_init( { osm_lid_matrix_t* const p_lmx = (osm_lid_matrix_t*)context; - cl_memset( p_elem, OSM_NO_PATH, p_lmx->num_ports + 1); + memset( p_elem, OSM_NO_PATH, p_lmx->num_ports + 1); return( CL_SUCCESS ); } @@ -88,7 +88,7 @@ __osm_lid_matrix_vec_clear( osm_lid_matrix_t* const p_lmx = (osm_lid_matrix_t*)context; UNUSED_PARAM( index ); - cl_memset( p_elem, OSM_NO_PATH, p_lmx->num_ports + 1); + memset( p_elem, OSM_NO_PATH, p_lmx->num_ports + 1); } /********************************************************************** Index: osm/opensm/osm_trap_rcv_ctrl.c =================================================================== --- osm/opensm/osm_trap_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_trap_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -73,7 +73,7 @@ void osm_trap_rcv_ctrl_construct( IN osm_trap_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_portinfo_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_portinfo_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_portinfo_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_pir_rcv_ctrl_construct( IN osm_pir_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_service_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_service_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_service_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -73,7 +73,7 @@ void osm_sr_rcv_ctrl_construct( IN osm_sr_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_lft_record.c =================================================================== --- osm/opensm/osm_sa_lft_record.c (revision 7286) +++ osm/opensm/osm_sa_lft_record.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. 
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -89,7 +89,7 @@ void osm_lftr_rcv_construct( IN osm_lftr_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pool ); } @@ -174,7 +174,7 @@ __osm_lftr_rcv_new_lftr( ); } - cl_memclr( &p_rec_item->rec, sizeof(ib_lft_record_t) ); + memset( &p_rec_item->rec, 0, sizeof(ib_lft_record_t) ); p_rec_item->rec.lid = lid; p_rec_item->rec.block_num = block; @@ -465,7 +465,7 @@ osm_lftr_rcv_process( Then copy all records from the list into the response payload. */ - cl_memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method = (uint8_t)(p_resp_sa_mad->method | 0x80); /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_pkey_rcv_ctrl.c =================================================================== --- osm/opensm/osm_pkey_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_pkey_rcv_ctrl.c (working copy) @@ -65,7 +65,7 @@ void osm_pkey_rcv_ctrl_construct( IN osm_pkey_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_db_files.c =================================================================== --- osm/opensm/osm_db_files.c (revision 7286) +++ osm/opensm/osm_db_files.c (working copy) @@ -126,7 +126,7 @@ void osm_db_construct( IN osm_db_t* const p_db ) { - cl_memclr(p_db, sizeof(osm_db_t)); + memset(p_db, 0, sizeof(osm_db_t)); cl_list_construct( &p_db->domains ); } Index: osm/opensm/osm_resp.c =================================================================== --- osm/opensm/osm_resp.c (revision 7286) +++ osm/opensm/osm_resp.c (working copy) @@ 
-1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -70,7 +70,7 @@ void osm_resp_construct( IN osm_resp_t* const p_resp ) { - cl_memclr( p_resp, sizeof(*p_resp) ); + memset( p_resp, 0, sizeof(*p_resp) ); } /********************************************************************** @@ -145,7 +145,7 @@ osm_resp_make_resp_smp( p_dest_smp->dr_dlid = p_dest_smp->dr_slid; p_dest_smp->dr_slid = p_dest_smp->dr_dlid; - cl_memcpy( &p_dest_smp->data, p_payload, IB_SMP_DATA_SIZE ); + memcpy( &p_dest_smp->data, p_payload, IB_SMP_DATA_SIZE ); Exit: OSM_LOG_EXIT( p_resp->p_log ); Index: osm/opensm/osm_slvl_map_rcv_ctrl.c =================================================================== --- osm/opensm/osm_slvl_map_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_slvl_map_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_slvl_rcv_ctrl_construct( IN osm_slvl_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_pkey_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_pkey_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_pkey_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. 
* Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -65,7 +65,7 @@ void osm_pkey_rec_rcv_ctrl_construct( IN osm_pkey_rec_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_helper.c =================================================================== --- osm/opensm/osm_helper.c (revision 7286) +++ osm/opensm/osm_helper.c (working copy) @@ -963,7 +963,7 @@ osm_dump_multipath_record( if( osm_log_is_active( p_log, log_level ) ) { - cl_memclr(buf_line, sizeof(buf_line)); + memset(buf_line, 0, sizeof(buf_line)); p_gid = p_mpr->gids; if ( p_mpr->sgid_count ) { Index: osm/opensm/osm_sa_service_record.c =================================================================== --- osm/opensm/osm_sa_service_record.c (revision 7286) +++ osm/opensm/osm_sa_service_record.c (working copy) @@ -98,7 +98,7 @@ void osm_sr_rcv_construct( IN osm_sr_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->sr_pool ); cl_timer_construct(&p_rcv->sr_timer ); } @@ -399,7 +399,7 @@ __osm_sr_rcv_respond( p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); p_resp_sa_mad = osm_madw_get_sa_mad_ptr( p_resp_madw ); - cl_memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); /* but what if it was a SET ? 
setting the response bit is not enough */ if (p_rcvd_mad->method == IB_MAD_METHOD_SET) @@ -434,7 +434,7 @@ __osm_sr_rcv_respond( (num_rec == 0)) { p_resp_sa_mad->status = IB_SA_MAD_STATUS_NO_RECORDS; - cl_memclr( p_resp_sr, sizeof(*p_resp_sr) ); + memset( p_resp_sr, 0, sizeof(*p_resp_sr) ); } else { @@ -461,7 +461,7 @@ __osm_sr_rcv_respond( { *p_resp_sr = p_sr_item->service_rec; if (trusted_req == FALSE) - cl_memclr(p_resp_sr->service_key, sizeof(p_resp_sr->service_key)); + memset(p_resp_sr->service_key, 0, sizeof(p_resp_sr->service_key)); num_copied++; } @@ -510,9 +510,9 @@ __get_matching_sr( if((comp_mask & IB_SR_COMPMASK_SGID) == IB_SR_COMPMASK_SGID) { if( - cl_memcmp(&p_sr_item->p_service_rec->service_gid, - &p_svcr->service_record.service_gid, - sizeof(p_svcr->service_record.service_gid)) != 0) + memcmp(&p_sr_item->p_service_rec->service_gid, + &p_svcr->service_record.service_gid, + sizeof(p_svcr->service_record.service_gid)) != 0) return; } if((comp_mask & IB_SR_COMPMASK_SPKEY) == IB_SR_COMPMASK_SPKEY ) @@ -524,17 +524,17 @@ __get_matching_sr( if((comp_mask & IB_SR_COMPMASK_SKEY) == IB_SR_COMPMASK_SKEY) { - if(cl_memcmp(p_sr_item->p_service_rec->service_key , - p_svcr->service_record.service_key, - 16*sizeof(uint8_t))) + if(memcmp(p_sr_item->p_service_rec->service_key , + p_svcr->service_record.service_key, + 16*sizeof(uint8_t))) return; } if((comp_mask & IB_SR_COMPMASK_SNAME) == IB_SR_COMPMASK_SNAME) { if( - cl_memcmp(p_sr_item->p_service_rec->service_name, - p_svcr->service_record.service_name, - sizeof(p_svcr->service_record.service_name)) != 0 + memcmp(p_sr_item->p_service_rec->service_name, + p_svcr->service_record.service_name, + sizeof(p_svcr->service_record.service_name)) != 0 ) return; } Index: osm/opensm/osm_sa_response.c =================================================================== --- osm/opensm/osm_sa_response.c (revision 7286) +++ osm/opensm/osm_sa_response.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. 
All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -69,7 +69,7 @@ void osm_sa_resp_construct( IN osm_sa_resp_t* const p_resp ) { - cl_memclr( p_resp, sizeof(*p_resp) ); + memset( p_resp, 0, sizeof(*p_resp) ); } /********************************************************************** Index: osm/opensm/osm_sa_portinfo_record.c =================================================================== --- osm/opensm/osm_sa_portinfo_record.c (revision 7286) +++ osm/opensm/osm_sa_portinfo_record.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -93,7 +93,7 @@ void osm_pir_rcv_construct( IN osm_pir_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pool ); } @@ -176,7 +176,7 @@ __osm_pir_rcv_new_pir( cl_ntoh16( lid ), osm_physp_get_port_num( p_physp ) ); } - cl_memclr( &p_rec_item->rec, sizeof( p_rec_item->rec ) ); + memset( &p_rec_item->rec, 0, sizeof( p_rec_item->rec ) ); p_rec_item->rec.lid = lid; p_rec_item->rec.port_info = *osm_physp_get_port_info_ptr( p_physp ); @@ -796,7 +796,7 @@ osm_pir_rcv_process( Then copy all records from the list into the response payload. 
*/ - cl_memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method = (uint8_t)(p_resp_sa_mad->method | 0x80); /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_sa_slvl_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_slvl_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_slvl_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_slvl_rec_rcv_ctrl_construct( IN osm_slvl_rec_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_req.c =================================================================== --- osm/opensm/osm_req.c (revision 7286) +++ osm/opensm/osm_req.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -76,7 +76,7 @@ osm_req_construct( { CL_ASSERT( p_req ); - cl_memclr( p_req, sizeof(*p_req) ); + memset( p_req, 0, sizeof(*p_req) ); } /********************************************************************** @@ -289,8 +289,8 @@ osm_req_set( if( p_context ) p_madw->context = *p_context; - cl_memcpy( osm_madw_get_smp_ptr( p_madw )->data, - p_payload, payload_size ); + memcpy( osm_madw_get_smp_ptr( p_madw )->data, + p_payload, payload_size ); osm_vl15_post( p_req->p_vl15, p_madw ); Index: osm/opensm/osm_sa_pkey_record.c =================================================================== --- osm/opensm/osm_sa_pkey_record.c (revision 7286) +++ osm/opensm/osm_sa_pkey_record.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -83,7 +83,7 @@ void osm_pkey_rec_rcv_construct( IN osm_pkey_rec_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pool ); } @@ -179,7 +179,7 @@ __osm_sa_pkey_create( ); } - cl_memclr( &p_rec_item->rec, sizeof( p_rec_item->rec ) ); + memset( &p_rec_item->rec, 0, sizeof( p_rec_item->rec ) ); p_rec_item->rec.lid = lid; p_rec_item->rec.block_num = block; @@ -535,7 +535,7 @@ osm_pkey_rec_rcv_process( Then copy all records from the list into the response payload. 
*/ - cl_memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method = (uint8_t)(p_resp_sa_mad->method | 0x80); /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_sminfo_rcv_ctrl.c =================================================================== --- osm/opensm/osm_sminfo_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_sminfo_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -73,7 +73,7 @@ void osm_sminfo_rcv_ctrl_construct( IN osm_sminfo_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_lft_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_lft_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_lft_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_lftr_rcv_ctrl_construct( IN osm_lftr_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_inform.c =================================================================== --- osm/opensm/osm_inform.c (revision 7286) +++ osm/opensm/osm_inform.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. 
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -73,7 +73,7 @@ void osm_infr_construct( IN osm_infr_t* const p_infr ) { - cl_memclr( p_infr, sizeof(osm_infr_t) ); + memset( p_infr, 0, sizeof(osm_infr_t) ); } /********************************************************************** **********************************************************************/ @@ -97,7 +97,7 @@ osm_infr_init( /* what else do we need in the inform_record ??? */ /* copy the contents of the provided informinfo */ - cl_memcpy(p_infr,p_infr_rec, sizeof(osm_infr_t)); + memcpy(p_infr,p_infr_rec, sizeof(osm_infr_t)); } @@ -134,7 +134,7 @@ __match_rid_of_inf_rec( osm_infr_t* p_infr = (osm_infr_t*)p_list_item; int32_t count; - count = cl_memcmp( + count = memcmp( &p_infr->inform_record, p_infr_rec, sizeof(p_infr_rec->subscriber_gid) + @@ -209,13 +209,13 @@ __match_inf_rec( int32_t count1, count2; OSM_LOG_ENTER( p_log, __match_inf_rec); - count1 = cl_memcmp(&p_infr->report_addr, &p_infr_rec->report_addr, - sizeof(p_infr_rec->report_addr)); + count1 = memcmp(&p_infr->report_addr, &p_infr_rec->report_addr, + sizeof(p_infr_rec->report_addr)); if (count1) osm_log(p_log, OSM_LOG_DEBUG, "__match_inf_rec : " "Differ by Address\n"); - count2 = cl_memcmp( + count2 = memcmp( &p_infr->inform_record.inform_info, &p_infr_rec->inform_record.inform_info, sizeof(p_infr->inform_record.inform_info)); @@ -457,7 +457,7 @@ __match_notice_to_inf_rec( if (p_ii->gid.unicast.prefix != 0 || p_ii->gid.unicast.interface_id != 0 ) { /* macth by GID */ - if (cl_memcmp(&(p_ii->gid), &(p_ntc->issuer_gid), sizeof(ib_gid_t))) + if (memcmp(&(p_ii->gid), &(p_ntc->issuer_gid), sizeof(ib_gid_t))) { osm_log(p_log, OSM_LOG_DEBUG, "__match_notice_to_inf_rec: " Index: osm/opensm/osm_lin_fwd_rcv.c =================================================================== --- 
osm/opensm/osm_lin_fwd_rcv.c (revision 7286) +++ osm/opensm/osm_lin_fwd_rcv.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -66,7 +66,7 @@ void osm_lft_rcv_construct( IN osm_lft_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** Index: osm/opensm/osm_service.c =================================================================== --- osm/opensm/osm_service.c (revision 7286) +++ osm/opensm/osm_service.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -60,7 +60,7 @@ void osm_svcr_construct( IN osm_svcr_t* const p_svcr ) { - cl_memclr( p_svcr, sizeof(*p_svcr) ); + memset( p_svcr, 0, sizeof(*p_svcr) ); } /********************************************************************** @@ -124,7 +124,7 @@ __match_rid_of_svc_rec( osm_svcr_t* p_svcr = (osm_svcr_t*)p_list_item; int32_t count; - count = cl_memcmp( + count = memcmp( &p_svcr->service_record, p_svc_rec, sizeof(p_svc_rec->service_id) + Index: osm/opensm/osm_sa_slvl_record.c =================================================================== --- osm/opensm/osm_sa_slvl_record.c (revision 7286) +++ osm/opensm/osm_sa_slvl_record.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. 
* Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -95,7 +95,7 @@ void osm_slvl_rec_rcv_construct( IN osm_slvl_rec_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pool ); } @@ -191,7 +191,7 @@ __osm_sa_slvl_create( ); } - cl_memclr( &p_rec_item->rec, sizeof( p_rec_item->rec ) ); + memset( &p_rec_item->rec, 0, sizeof( p_rec_item->rec ) ); p_rec_item->rec.lid = lid; p_rec_item->rec.out_port_num = osm_physp_get_port_num( p_physp ); @@ -499,7 +499,7 @@ osm_slvl_rec_rcv_process( Then copy all records from the list into the response payload. */ - cl_memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method = (uint8_t)(p_resp_sa_mad->method | 0x80); /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_switch.c =================================================================== --- osm/opensm/osm_switch.c (revision 7286) +++ osm/opensm/osm_switch.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -64,7 +64,7 @@ osm_switch_construct( IN osm_switch_t* const p_sw ) { CL_ASSERT( p_sw ); - cl_memclr( p_sw, sizeof(*p_sw) ); + memset( p_sw, 0, sizeof(*p_sw) ); osm_lid_matrix_construct( &p_sw->lmx ); } @@ -193,7 +193,7 @@ osm_switch_get_fwd_tbl_block( if( base_lid_ho <= max_lid_ho ) { - cl_memclr( p_block, IB_SMP_DATA_SIZE ); + memset( p_block, 0, IB_SMP_DATA_SIZE ); /* Determine the range of LIDs we can return with this block. */ @@ -376,9 +376,9 @@ osm_switch_recommend_path( /* Is the sys guid already used ? 
*/ sys_used = FALSE; for (i = 0; !sys_used && (i < *p_num_used_sys); i++) - if (!cl_memcmp(&p_rem_node->node_info.sys_guid, - &remote_sys_guids[i], - sizeof(uint64_t))) + if (!memcmp(&p_rem_node->node_info.sys_guid, + &remote_sys_guids[i], + sizeof(uint64_t))) sys_used = TRUE; /* If not update the least hops for this case */ @@ -396,9 +396,9 @@ osm_switch_recommend_path( /* Else Is the node guid already used ? */ node_used = FALSE; for (i = 0; !node_used && (i < *p_num_used_nodes); i++) - if (!cl_memcmp(&p_rem_node->node_info.node_guid, - &remote_node_guids[i], - sizeof(uint64_t))) + if (!memcmp(&p_rem_node->node_info.node_guid, + &remote_node_guids[i], + sizeof(uint64_t))) node_used = TRUE; @@ -448,13 +448,13 @@ osm_switch_recommend_path( p_physp = osm_node_get_physp_ptr(p_sw->p_node, best_port); p_rem_physp = osm_physp_get_remote(p_physp); p_rem_node = osm_physp_get_node_ptr(p_rem_physp); - cl_memcpy(&remote_node_guids[*p_num_used_nodes], - &(p_rem_node->node_info.node_guid), - sizeof(uint64_t)); + memcpy(&remote_node_guids[*p_num_used_nodes], + &(p_rem_node->node_info.node_guid), + sizeof(uint64_t)); (*p_num_used_nodes)++; - cl_memcpy(&remote_sys_guids[*p_num_used_sys], - &(p_rem_node->node_info.sys_guid), - sizeof(uint64_t)); + memcpy(&remote_sys_guids[*p_num_used_sys], + &(p_rem_node->node_info.sys_guid), + sizeof(uint64_t)); (*p_num_used_sys)++; } Index: osm/opensm/osm_opensm.c =================================================================== --- osm/opensm/osm_opensm.c (revision 7286) +++ osm/opensm/osm_opensm.c (working copy) @@ -74,7 +74,7 @@ void osm_opensm_construct( IN osm_opensm_t * const p_osm ) { - cl_memclr( p_osm, sizeof( *p_osm ) ); + memset( p_osm, 0, sizeof( *p_osm ) ); osm_subn_construct( &p_osm->subn ); osm_sm_construct( &p_osm->sm ); osm_sa_construct( &p_osm->sa ); Index: osm/opensm/osm_sa.c =================================================================== --- osm/opensm/osm_sa.c (revision 7286) +++ osm/opensm/osm_sa.c (working copy) @@ 
-76,7 +76,7 @@ void osm_sa_construct( IN osm_sa_t* const p_sa ) { - cl_memclr( p_sa, sizeof(*p_sa) ); + memset( p_sa, 0, sizeof(*p_sa) ); p_sa->state = OSM_SA_STATE_INIT; p_sa->sa_trans_id = OSM_SA_INITIAL_TID_VALUE; Index: osm/opensm/osm_vl_arb_rcv_ctrl.c =================================================================== --- osm/opensm/osm_vl_arb_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_vl_arb_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_vla_rcv_ctrl_construct( IN osm_vla_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sminfo_rcv.c =================================================================== --- osm/opensm/osm_sminfo_rcv.c (revision 7286) +++ osm/opensm/osm_sminfo_rcv.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -71,7 +71,7 @@ void osm_sminfo_rcv_construct( IN osm_sminfo_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** @@ -157,7 +157,7 @@ __osm_sminfo_rcv_process_get_request( /* No real need to grab the lock for this function. 
*/ - cl_memclr( payload, sizeof( payload ) ); + memset( payload, 0, sizeof( payload ) ); p_smp = osm_madw_get_smp_ptr( p_madw ); @@ -263,7 +263,7 @@ __osm_sminfo_rcv_process_set_request( /* No real need to grab the lock for this function. */ - cl_memclr( payload, sizeof( payload ) ); + memset( payload, 0, sizeof( payload ) ); /* get the lock */ CL_PLOCK_EXCL_ACQUIRE( p_rcv->p_lock ); Index: osm/opensm/osm_multicast.c =================================================================== --- osm/opensm/osm_multicast.c (revision 7286) +++ osm/opensm/osm_multicast.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -81,7 +81,7 @@ void osm_mgrp_construct( IN osm_mgrp_t* const p_mgrp ) { - cl_memclr( p_mgrp, sizeof(*p_mgrp) ); + memset( p_mgrp, 0, sizeof(*p_mgrp) ); cl_qmap_init( &p_mgrp->mcm_port_tbl ); } @@ -332,9 +332,9 @@ osm_mgrp_send_delete_notice( notice.issuer_lid = p_subn->sm_base_lid; /* following o14-12.1.11 and table 120 p726 */ /* we need to provide the MGID */ - cl_memcpy(&(notice.data_details.ntc_64_67.gid), - &(p_mgrp->mcmember_rec.mgid), - sizeof(ib_gid_t)); + memcpy(&(notice.data_details.ntc_64_67.gid), + &(p_mgrp->mcmember_rec.mgid), + sizeof(ib_gid_t)); /* According to page 653 - the issuer gid in this case of trap is the SM gid, since the SM is the initiator of this trap. 
*/ @@ -379,9 +379,9 @@ osm_mgrp_send_create_notice( notice.issuer_lid = p_subn->sm_base_lid; /* following o14-12.1.11 and table 120 p726 */ /* we need to provide the MGID */ - cl_memcpy(&(notice.data_details.ntc_64_67.gid), - &(p_mgrp->mcmember_rec.mgid), - sizeof(ib_gid_t)); + memcpy(&(notice.data_details.ntc_64_67.gid), + &(p_mgrp->mcmember_rec.mgid), + sizeof(ib_gid_t)); /* According to page 653 - the issuer gid in this case of trap is the SM gid, since the SM is the initiator of this trap. */ Index: osm/opensm/osm_sa_class_port_info.c =================================================================== --- osm/opensm/osm_sa_class_port_info.c (revision 7286) +++ osm/opensm/osm_sa_class_port_info.c (working copy) @@ -83,7 +83,7 @@ void osm_cpi_rcv_construct( IN osm_cpi_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** @@ -140,7 +140,7 @@ __osm_cpi_rcv_respond( OSM_LOG_ENTER( p_rcv->p_log, __osm_cpi_rcv_respond ); - cl_memclr(&zero_gid, sizeof(ib_gid_t)); + memset(&zero_gid, 0, sizeof(ib_gid_t)); /* Get a MAD to reply. Address of Mad is in the received mad_wrapper @@ -160,7 +160,7 @@ __osm_cpi_rcv_respond( p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); p_resp_sa_mad = osm_madw_get_sa_mad_ptr( p_resp_madw ); - cl_memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method |= IB_MAD_METHOD_RESP_MASK; /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_node_info_rcv.c =================================================================== --- osm/opensm/osm_node_info_rcv.c (revision 7286) +++ osm/opensm/osm_node_info_rcv.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. 
* Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -978,7 +978,7 @@ void osm_ni_rcv_construct( IN osm_ni_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** Index: osm/opensm/osm_sa_vlarb_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_vlarb_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_vlarb_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_vlarb_rec_rcv_ctrl_construct( IN osm_vlarb_rec_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_mcast_mgr.c =================================================================== --- osm/opensm/osm_mcast_mgr.c (revision 7286) +++ osm/opensm/osm_mcast_mgr.c (working copy) @@ -385,7 +385,7 @@ void osm_mcast_mgr_construct( IN osm_mcast_mgr_t* const p_mgr ) { - cl_memclr( p_mgr, sizeof(*p_mgr) ); + memset( p_mgr, 0, sizeof(*p_mgr) ); } /********************************************************************** @@ -1636,7 +1636,7 @@ osm_mcast_mgr_process_mgrp_cb( OSM_LOG_ENTER( p_mgr->p_log, osm_mcast_mgr_process_mgrp_cb ); /* nice copy no warning on size diff */ - cl_memcpy(&mlid, &p_ctxt->mlid, sizeof(mlid)); + memcpy(&mlid, &p_ctxt->mlid, sizeof(mlid)); /* we can destroy the context now */ cl_free(p_ctxt); Index: osm/opensm/osm_sa_sminfo_record.c =================================================================== --- osm/opensm/osm_sa_sminfo_record.c 
(revision 7286) +++ osm/opensm/osm_sa_sminfo_record.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -80,7 +80,7 @@ void osm_smir_rcv_construct( IN osm_smir_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** @@ -228,7 +228,7 @@ osm_smir_rcv_process( p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); /* Copy the MAD header back into the response mad */ - cl_memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method = (uint8_t)(p_resp_sa_mad->method | 0x80); /* Fill in the offset (paylen will be done by the rmpp SAR) */ p_resp_sa_mad->attr_offset = Index: osm/opensm/osm_sa_informinfo_ctrl.c =================================================================== --- osm/opensm/osm_sa_informinfo_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_informinfo_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -77,7 +77,7 @@ void osm_infr_rcv_ctrl_construct( IN osm_infr_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sm.c =================================================================== --- osm/opensm/osm_sm.c (revision 7286) +++ osm/opensm/osm_sm.c (working copy) @@ -135,7 +135,7 @@ void osm_sm_construct( IN osm_sm_t * const p_sm ) { - cl_memclr( p_sm, sizeof( *p_sm ) ); + memset( p_sm, 0, sizeof( *p_sm ) ); p_sm->thread_state = OSM_THREAD_STATE_NONE; p_sm->sm_trans_id = OSM_SM_INITIAL_TID_VALUE; cl_event_construct( &p_sm->signal ); @@ -596,7 +596,7 @@ __osm_sm_mgrp_connect( */ ctx2 = ( osm_mcast_mgr_ctxt_t * ) cl_malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); - cl_memcpy( &ctx2->mlid, &p_mgrp->mlid, sizeof( p_mgrp->mlid ) ); + memcpy( &ctx2->mlid, &p_mgrp->mlid, sizeof( p_mgrp->mlid ) ); ctx2->req_type = req_type; ctx2->port_guid = port_guid; @@ -629,7 +629,7 @@ __osm_sm_mgrp_disconnect( */ ctx2 = ( osm_mcast_mgr_ctxt_t * ) cl_malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); - cl_memcpy( &ctx2->mlid, &p_mgrp->mlid, sizeof( p_mgrp->mlid ) ); + memcpy( &ctx2->mlid, &p_mgrp->mlid, sizeof( p_mgrp->mlid ) ); ctx2->req_type = OSM_MCAST_REQ_TYPE_LEAVE; ctx2->port_guid = port_guid; Index: osm/opensm/osm_trap_rcv.c =================================================================== --- osm/opensm/osm_trap_rcv.c (revision 7286) +++ osm/opensm/osm_trap_rcv.c (working copy) @@ -178,7 +178,7 @@ void osm_trap_rcv_construct( IN osm_trap_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_event_wheel_construct( &p_rcv->trap_aging_tracker ); } @@ -344,8 +344,8 @@ __osm_trap_rcv_process_request( /* No real need to grab the lock for this function. 
*/ - cl_memclr( payload, sizeof( payload ) ); - cl_memclr( &tmp_madw, sizeof( tmp_madw )); + memset( payload, 0, sizeof( payload ) ); + memset( &tmp_madw, 0, sizeof( tmp_madw )); p_smp = osm_madw_get_smp_ptr( p_madw ); @@ -364,8 +364,8 @@ __osm_trap_rcv_process_request( * payload. */ - cl_memcpy(payload, &(p_smp->data), IB_SMP_DATA_SIZE); - cl_memcpy(&tmp_madw, p_madw, sizeof( tmp_madw )); + memcpy(payload, &(p_smp->data), IB_SMP_DATA_SIZE); + memcpy(&tmp_madw, p_madw, sizeof( tmp_madw )); if (is_gsi == FALSE) { @@ -606,9 +606,9 @@ __osm_trap_rcv_process_request( { if (tmp_madw.mad_addr.addr_type.gsi.global_route) { - cl_memcpy(&(p_ntci->issuer_gid), - &(tmp_madw.mad_addr.addr_type.gsi.grh_info.src_gid), - sizeof(ib_gid_t)); + memcpy(&(p_ntci->issuer_gid), + &(tmp_madw.mad_addr.addr_type.gsi.grh_info.src_gid), + sizeof(ib_gid_t)); } else { Index: osm/opensm/osm_lin_fwd_tbl.c =================================================================== --- osm/opensm/osm_lin_fwd_tbl.c (revision 7286) +++ osm/opensm/osm_lin_fwd_tbl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -83,7 +83,7 @@ osm_lin_tbl_new( /* Initialize the table to OSM_NO_PATH, which means "invalid port" */ - cl_memset( p_tbl, OSM_NO_PATH, __osm_lin_tbl_compute_obj_size( size ) ); + memset( p_tbl, OSM_NO_PATH, __osm_lin_tbl_compute_obj_size( size ) ); if( p_tbl != NULL ) { p_tbl->size = (uint16_t)size; Index: osm/opensm/osm_prtn.c =================================================================== --- osm/opensm/osm_prtn.c (revision 7286) +++ osm/opensm/osm_prtn.c (working copy) @@ -206,10 +206,10 @@ ib_api_status_t osm_prtn_add_mcgroup(osm pkey = cl_hton16(cl_ntoh16(p->pkey) | 0x8000); - cl_memclr(&mc_rec, sizeof(mc_rec)); + memset(&mc_rec, 0, sizeof(mc_rec)); mc_rec.mgid = osm_ipoib_mgid; /* this is ipv4 broadcast */ - cl_memcpy(&mc_rec.mgid.raw[4], &pkey, sizeof(pkey)); + memcpy(&mc_rec.mgid.raw[4], &pkey, sizeof(pkey)); mc_rec.qkey = CL_HTON32(0x0b1b); mc_rec.mtu = mtu ? mtu : 4; /* 2048 Bytes */ @@ -235,7 +235,7 @@ ib_api_status_t osm_prtn_add_mcgroup(osm /* workaround for TS */ /* FIXME: remove this upon TS fixes */ mc_rec.mgid = osm_ts_ipoib_mgid; - cl_memcpy(&mc_rec.mgid.raw[4], &pkey, sizeof(pkey)); + memcpy(&mc_rec.mgid.raw[4], &pkey, sizeof(pkey)); status = osm_mcmr_rcv_find_or_create_new_mgrp(&p_sa->mcmr_rcv, comp_mask, &mc_rec, &p_mgrp); if (p_mgrp) Index: osm/opensm/osm_ucast_mgr.c =================================================================== --- osm/opensm/osm_ucast_mgr.c (revision 7286) +++ osm/opensm/osm_ucast_mgr.c (working copy) @@ -81,7 +81,7 @@ void osm_ucast_mgr_construct( IN osm_ucast_mgr_t* const p_mgr ) { - cl_memclr( p_mgr, sizeof(*p_mgr) ); + memset( p_mgr, 0, sizeof(*p_mgr) ); } /********************************************************************** Index: osm/opensm/osm_sa_informinfo.c =================================================================== --- osm/opensm/osm_sa_informinfo.c (revision 7286) +++ osm/opensm/osm_sa_informinfo.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. 
All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -74,7 +74,7 @@ void osm_infr_rcv_construct( IN osm_infr_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** @@ -147,8 +147,8 @@ __validate_ports_access_rights( p_rcv->p_subn, &p_infr_rec->report_addr ); - cl_memclr( &zero_gid, sizeof(zero_gid) ); - if ( cl_memcmp (&(p_infr_rec->inform_record.inform_info.gid), + memset( &zero_gid, 0, sizeof(zero_gid) ); + if ( memcmp (&(p_infr_rec->inform_record.inform_info.gid), &zero_gid, sizeof(ib_gid_t) ) ) { /* a gid is defined */ @@ -305,7 +305,7 @@ __osm_infr_rcv_respond( p_resp_sa_mad = osm_madw_get_sa_mad_ptr( p_resp_madw ); /* copy the request InformInfo */ - cl_memcpy( p_resp_sa_mad, p_sa_mad, MAD_BLOCK_SIZE ); + memcpy( p_resp_sa_mad, p_sa_mad, MAD_BLOCK_SIZE ); p_resp_sa_mad->method = IB_MAD_METHOD_GET_RESP; /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_remote_sm.c =================================================================== --- osm/opensm/osm_remote_sm.c (revision 7286) +++ osm/opensm/osm_remote_sm.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -64,7 +64,7 @@ void osm_remote_sm_construct( IN osm_remote_sm_t* const p_sm ) { - cl_memclr( p_sm, sizeof(*p_sm) ); + memset( p_sm, 0, sizeof(*p_sm) ); } /********************************************************************** @@ -73,7 +73,7 @@ void osm_remote_sm_destroy( IN osm_remote_sm_t* const p_sm ) { - cl_memclr( p_sm, sizeof(*p_sm) ); + memset( p_sm, 0, sizeof(*p_sm) ); } /********************************************************************** Index: osm/opensm/osm_mad_pool.c =================================================================== --- osm/opensm/osm_mad_pool.c (revision 7286) +++ osm/opensm/osm_mad_pool.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -86,7 +86,7 @@ osm_mad_pool_construct( { CL_ASSERT( p_pool ); - cl_memclr( p_pool, sizeof(*p_pool) ); + memset( p_pool, 0, sizeof(*p_pool) ); cl_qlock_pool_construct( &p_pool->madw_pool ); } Index: osm/opensm/osm_sa_mcmember_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_mcmember_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_mcmember_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -79,7 +79,7 @@ void osm_mcmr_rcv_ctrl_construct( IN osm_mcmr_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_node_info_rcv_ctrl.c =================================================================== --- osm/opensm/osm_node_info_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_node_info_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_ni_rcv_ctrl_construct( IN osm_ni_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_link_mgr.c =================================================================== --- osm/opensm/osm_link_mgr.c (revision 7286) +++ osm/opensm/osm_link_mgr.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -66,7 +66,7 @@ void osm_link_mgr_construct( IN osm_link_mgr_t* const p_mgr ) { - cl_memclr( p_mgr, sizeof(*p_mgr) ); + memset( p_mgr, 0, sizeof(*p_mgr) ); } /********************************************************************** @@ -180,10 +180,10 @@ osm_link_mgr_set_physp_pi( p_node = osm_physp_get_node_ptr( p_physp ); p_old_pi = osm_physp_get_port_info_ptr( p_physp ); - cl_memclr( payload, IB_SMP_DATA_SIZE ); + memset( payload, 0, IB_SMP_DATA_SIZE ); /* Correction by FUJITSU */ - cl_memcpy( payload, p_old_pi, sizeof(ib_port_info_t) ); + memcpy( payload, p_old_pi, sizeof(ib_port_info_t) ); /* Correction following a bug injected by the previous @@ -211,32 +211,32 @@ osm_link_mgr_set_physp_pi( port_num == 0 ) { p_pi->m_key = p_mgr->p_subn->opt.m_key; - if (cl_memcmp( &p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key) )) + if (memcmp( &p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key) )) send_set = TRUE; p_pi->subnet_prefix = p_mgr->p_subn->opt.subnet_prefix; - if (cl_memcmp( &p_pi->subnet_prefix, &p_old_pi->subnet_prefix, - sizeof(p_pi->subnet_prefix) )) + if (memcmp( &p_pi->subnet_prefix, &p_old_pi->subnet_prefix, + sizeof(p_pi->subnet_prefix) )) send_set = TRUE; p_pi->base_lid = osm_physp_get_base_lid( p_physp ); - if (cl_memcmp( &p_pi->base_lid, &p_old_pi->base_lid, - sizeof(p_pi->base_lid) )) + if (memcmp( &p_pi->base_lid, &p_old_pi->base_lid, + sizeof(p_pi->base_lid) )) send_set = TRUE; /* we are initializing the ports with our local sm_base_lid */ p_pi->master_sm_base_lid = p_mgr->p_subn->sm_base_lid; - if (cl_memcmp( &p_pi->master_sm_base_lid, &p_old_pi->master_sm_base_lid, - sizeof(p_pi->master_sm_base_lid) )) + if (memcmp( &p_pi->master_sm_base_lid, &p_old_pi->master_sm_base_lid, + sizeof(p_pi->master_sm_base_lid) )) send_set = TRUE; p_pi->m_key_lease_period = p_mgr->p_subn->opt.m_key_lease_period; - if (cl_memcmp( &p_pi->m_key_lease_period, &p_old_pi->m_key_lease_period, - sizeof(p_pi->m_key_lease_period) )) + if (memcmp( &p_pi->m_key_lease_period, 
&p_old_pi->m_key_lease_period, + sizeof(p_pi->m_key_lease_period) )) send_set = TRUE; p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc; - if (cl_memcmp( &p_pi->mkey_lmc, &p_old_pi->mkey_lmc, sizeof(p_pi->mkey_lmc) )) + if (memcmp( &p_pi->mkey_lmc, &p_old_pi->mkey_lmc, sizeof(p_pi->mkey_lmc) )) send_set = TRUE; ib_port_info_set_timeout( p_pi, p_mgr->p_subn->opt.subnet_timeout ); @@ -274,8 +274,8 @@ osm_link_mgr_set_physp_pi( p_pi, p_mgr->p_subn->opt.local_phy_errors_threshold, p_mgr->p_subn->opt.overrun_errors_threshold); - if (cl_memcmp( &p_pi->error_threshold, &p_old_pi->error_threshold, - sizeof(p_pi->error_threshold) )) + if (memcmp( &p_pi->error_threshold, &p_old_pi->error_threshold, + sizeof(p_pi->error_threshold) )) send_set = TRUE; /* @@ -283,8 +283,8 @@ osm_link_mgr_set_physp_pi( then determine the neighbor MTU. */ p_pi->link_width_enabled = p_old_pi->link_width_supported; - if (cl_memcmp( &p_pi->link_width_enabled, &p_old_pi->link_width_enabled, - sizeof(p_pi->link_width_enabled) )) + if (memcmp( &p_pi->link_width_enabled, &p_old_pi->link_width_enabled, + sizeof(p_pi->link_width_enabled) )) send_set = TRUE; /* calc new op_vls and mtu */ Index: osm/opensm/osm_mcast_fwd_rcv_ctrl.c =================================================================== --- osm/opensm/osm_mcast_fwd_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_mcast_fwd_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -73,7 +73,7 @@ void osm_mft_rcv_ctrl_construct( IN osm_mft_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_node_record.c =================================================================== --- osm/opensm/osm_sa_node_record.c (revision 7286) +++ osm/opensm/osm_sa_node_record.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -86,7 +86,7 @@ void osm_nr_rcv_construct( IN osm_nr_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pool ); } @@ -171,14 +171,14 @@ __osm_nr_rcv_new_nr( ); } - cl_memclr( &p_rec_item->rec, sizeof(ib_node_record_t) ); + memset( &p_rec_item->rec, 0, sizeof(ib_node_record_t) ); p_rec_item->rec.lid = lid; p_rec_item->rec.node_info = p_node->node_info; p_rec_item->rec.node_info.port_guid = port_guid; - cl_memcpy(&(p_rec_item->rec.node_desc), &(p_node->node_desc), - IB_NODE_DESCRIPTION_SIZE); + memcpy(&(p_rec_item->rec.node_desc), &(p_node->node_desc), + IB_NODE_DESCRIPTION_SIZE); cl_qlist_insert_tail( p_list, (cl_list_item_t*)&p_rec_item->pool_item ); Exit: @@ -572,7 +572,7 @@ osm_nr_rcv_process( Then copy all records from the list into the response payload. 
*/ - cl_memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method = (uint8_t)(p_resp_sa_mad->method | 0x80); /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_mcast_tbl.c =================================================================== --- osm/opensm/osm_mcast_tbl.c (revision 7286) +++ osm/opensm/osm_mcast_tbl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -68,7 +68,7 @@ osm_mcast_tbl_init( CL_ASSERT( p_tbl ); CL_ASSERT( num_ports ); - cl_memclr( p_tbl, sizeof(*p_tbl) ); + memset( p_tbl, 0, sizeof(*p_tbl) ); p_tbl->max_block_in_use = -1; @@ -285,7 +285,7 @@ osm_mcast_tbl_get_block( /* Caller shouldn't do this for efficiency's sake... 
*/ - cl_memclr( p_block, IB_SMP_DATA_SIZE ); + memset( p_block, 0, IB_SMP_DATA_SIZE ); return( TRUE ); } Index: osm/opensm/osm_sa_mad_ctrl.c =================================================================== --- osm/opensm/osm_sa_mad_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_mad_ctrl.c (working copy) @@ -499,7 +499,7 @@ osm_sa_mad_ctrl_construct( IN osm_sa_mad_ctrl_t* const p_ctrl ) { CL_ASSERT( p_ctrl ); - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_pkey.c =================================================================== --- osm/opensm/osm_pkey.c (revision 7286) +++ osm/opensm/osm_pkey.c (working copy) @@ -124,7 +124,7 @@ void osm_pkey_tbl_sync_new_blocks( break; cl_ptr_vector_set(&((osm_pkey_tbl_t *)p_pkey_tbl)->new_blocks, b, p_new_block); } - cl_memcpy(p_new_block, p_block, sizeof(*p_new_block)); + memcpy(p_new_block, p_block, sizeof(*p_new_block)); } } @@ -154,7 +154,7 @@ int osm_pkey_tbl_set( } /* sets the block values */ - cl_memcpy( p_pkey_block, p_tbl, sizeof(ib_pkey_table_t) ); + memcpy( p_pkey_block, p_tbl, sizeof(ib_pkey_table_t) ); /* NOTE: as the spec does not require uniqueness of PKeys in Index: osm/opensm/osm_req_ctrl.c =================================================================== --- osm/opensm/osm_req_ctrl.c (revision 7286) +++ osm/opensm/osm_req_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -86,7 +86,7 @@ void osm_req_ctrl_construct( IN osm_req_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_multipath_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_multipath_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_multipath_record_ctrl.c (working copy) @@ -78,7 +78,7 @@ void osm_mpr_rcv_ctrl_construct( IN osm_mpr_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_link_record.c =================================================================== --- osm/opensm/osm_sa_link_record.c (revision 7286) +++ osm/opensm/osm_sa_link_record.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -82,7 +82,7 @@ void osm_lr_rcv_construct( IN osm_lr_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->lr_pool ); } @@ -669,7 +669,7 @@ __osm_lr_rcv_respond( p_resp_sa_mad = osm_madw_get_sa_mad_ptr( p_resp_madw ); /* Copy the header from the request to response */ - cl_memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method |= IB_MAD_METHOD_RESP_MASK; p_resp_sa_mad->attr_offset = ib_get_attr_offset( sizeof(ib_link_record_t) ); @@ -695,7 +695,7 @@ __osm_lr_rcv_respond( if ((p_rcvd_mad->method == IB_MAD_METHOD_GET) && (num_rec == 0)) { p_resp_sa_mad->status = IB_SA_MAD_STATUS_NO_RECORDS; - cl_memclr( p_resp_lr, sizeof(*p_resp_lr) ); + memset( p_resp_lr, 0, sizeof(*p_resp_lr) ); } else { Index: osm/opensm/osm_sw_info_rcv.c =================================================================== --- osm/opensm/osm_sw_info_rcv.c (revision 7286) +++ osm/opensm/osm_sw_info_rcv.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -535,7 +535,7 @@ void osm_si_rcv_construct( IN osm_si_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** Index: osm/opensm/osm_mcm_port.c =================================================================== --- osm/opensm/osm_mcm_port.c (revision 7286) +++ osm/opensm/osm_mcm_port.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. 
* Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -59,7 +59,7 @@ void osm_mcm_port_construct( IN osm_mcm_port_t* const p_mcm ) { - cl_memclr( p_mcm, sizeof(*p_mcm) ); + memset( p_mcm, 0, sizeof(*p_mcm) ); } /********************************************************************** Index: osm/opensm/osm_mcast_fwd_rcv.c =================================================================== --- osm/opensm/osm_mcast_fwd_rcv.c (revision 7286) +++ osm/opensm/osm_mcast_fwd_rcv.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -68,7 +68,7 @@ void osm_mft_rcv_construct( IN osm_mft_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** Index: osm/opensm/osm_node_desc_rcv_ctrl.c =================================================================== --- osm/opensm/osm_node_desc_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_node_desc_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -77,7 +77,7 @@ void osm_nd_rcv_ctrl_construct( IN osm_nd_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sm_mad_ctrl.c =================================================================== --- osm/opensm/osm_sm_mad_ctrl.c (revision 7286) +++ osm/opensm/osm_sm_mad_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -927,7 +927,7 @@ osm_sm_mad_ctrl_construct( IN osm_sm_mad_ctrl_t* const p_ctrl ) { CL_ASSERT( p_ctrl ); - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_slvl_map_rcv.c =================================================================== --- osm/opensm/osm_slvl_map_rcv.c (revision 7286) +++ osm/opensm/osm_slvl_map_rcv.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -74,7 +74,7 @@ void osm_slvl_rcv_construct( IN osm_slvl_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** Index: osm/opensm/osm_sa_node_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_node_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_node_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. 
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -73,7 +73,7 @@ void osm_nr_rcv_ctrl_construct( IN osm_nr_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_class_port_info_ctrl.c =================================================================== --- osm/opensm/osm_sa_class_port_info_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_class_port_info_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_cpi_rcv_ctrl_construct( IN osm_cpi_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_port_info_rcv_ctrl.c =================================================================== --- osm/opensm/osm_port_info_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_port_info_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -77,7 +77,7 @@ void osm_pi_rcv_ctrl_construct( IN osm_pi_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_node_desc_rcv.c =================================================================== --- osm/opensm/osm_node_desc_rcv.c (revision 7286) +++ osm/opensm/osm_node_desc_rcv.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -79,7 +79,7 @@ __osm_nd_rcv_process_nd( if( osm_log_is_active( p_rcv->p_log, OSM_LOG_VERBOSE ) ) { - cl_memcpy( desc, p_nd, sizeof(*p_nd) ); + memcpy( desc, p_nd, sizeof(*p_nd) ); /* Guarantee null termination before printing. */ desc[IB_NODE_DESCRIPTION_SIZE] = '\0'; @@ -89,7 +89,7 @@ __osm_nd_rcv_process_nd( cl_ntoh64( osm_node_get_node_guid( p_node )), desc ); } - cl_memcpy( &p_node->node_desc.description, p_nd, sizeof(*p_nd) ); + memcpy( &p_node->node_desc.description, p_nd, sizeof(*p_nd) ); OSM_LOG_EXIT( p_rcv->p_log ); } @@ -100,7 +100,7 @@ void osm_nd_rcv_construct( IN osm_nd_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** Index: osm/opensm/osm_sa_path_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_path_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_path_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -77,7 +77,7 @@ void osm_pr_rcv_ctrl_construct( IN osm_pr_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sw_info_rcv_ctrl.c =================================================================== --- osm/opensm/osm_sw_info_rcv_ctrl.c (revision 7286) +++ osm/opensm/osm_sw_info_rcv_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_si_rcv_ctrl_construct( IN osm_si_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_link_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_link_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_link_record_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -73,7 +73,7 @@ void osm_lr_rcv_ctrl_construct( IN osm_lr_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_qos.c =================================================================== --- osm/opensm/osm_qos.c (revision 7286) +++ osm/opensm/osm_qos.c (working copy) @@ -88,14 +88,14 @@ static ib_api_status_t vlarb_update_tabl vl_mask = (1 << (ib_port_info_get_op_vls(p_pi) - 1)) - 1; - cl_memset(&block, 0, sizeof(block)); - cl_memcpy(&block, table_block, - block_length * sizeof(block.vl_entry[0])); + memset(&block, 0, sizeof(block)); + memcpy(&block, table_block, + block_length * sizeof(block.vl_entry[0])); for (i = 0; i < block_length; i++) block.vl_entry[i].vl &= vl_mask; - if (!cl_memcmp(&p->vl_arb[block_num], &block, - block_length * sizeof(block.vl_entry[0]))) + if (!memcmp(&p->vl_arb[block_num], &block, + block_length * sizeof(block.vl_entry[0]))) return IB_SUCCESS; context.vla_context.node_guid = @@ -185,7 +185,7 @@ static ib_api_status_t sl2vl_update_tabl } p_tbl = osm_physp_get_slvl_tbl(p, in_port); - if (p_tbl && !cl_memcmp(p_tbl, &tbl, sizeof(tbl))) + if (p_tbl && !memcmp(p_tbl, &tbl, sizeof(tbl))) return IB_SUCCESS; context.slvl_context.node_guid = osm_node_get_node_guid(p_node); @@ -243,8 +243,8 @@ static ib_api_status_t vl_high_limit_upd if (p_pi->vl_high_limit == qcfg->vl_high_limit) return IB_SUCCESS; - cl_memclr(payload, IB_SMP_DATA_SIZE); - cl_memcpy(payload, p_pi, sizeof(ib_port_info_t)); + memset(payload, 0, IB_SMP_DATA_SIZE); + memcpy(payload, p_pi, sizeof(ib_port_info_t)); p_pi = (ib_port_info_t *) payload; p_pi->state_info2 = 0; Index: osm/opensm/osm_sa_mcmember_record.c =================================================================== --- osm/opensm/osm_sa_mcmember_record.c (revision 7286) +++ osm/opensm/osm_sa_mcmember_record.c (working copy) @@ -100,7 +100,7 @@ void osm_mcmr_rcv_construct( IN osm_mcmr_recv_t* const 
p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pool ); } @@ -191,9 +191,9 @@ __search_mgrp_by_mgid( /* compare entire MGID so different scope will not sneak in for the same MGID */ - if (cl_memcmp(&p_mgrp->mcmember_rec.mgid, - &p_recvd_mcmember_rec->mgid, - sizeof(ib_gid_t))) + if (memcmp(&p_mgrp->mcmember_rec.mgid, + &p_recvd_mcmember_rec->mgid, + sizeof(ib_gid_t))) return; if(p_ctxt->p_mgrp) @@ -440,8 +440,8 @@ __add_new_mgrp_port( p_rcv->p_subn, p_mad_addr ); - if (! cl_memcmp(&p_recvd_mcmember_rec->port_gid, &requester_gid, - sizeof(ib_gid_t))) + if (!memcmp(&p_recvd_mcmember_rec->port_gid, &requester_gid, + sizeof(ib_gid_t))) { proxy_join = FALSE; osm_log( p_rcv->p_log, OSM_LOG_DEBUG, @@ -524,7 +524,7 @@ __osm_mcmr_rcv_respond( p_resp_sa_mad = (ib_sa_mad_t*)p_resp_madw->p_mad; p_sa_mad = (ib_sa_mad_t*)p_madw->p_mad; /* Copy the MAD header back into the response mad */ - cl_memcpy(p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE); + memcpy(p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE); /* based on the current method decide about the response: */ if ((p_resp_sa_mad->method == IB_MAD_METHOD_GET) || (p_resp_sa_mad->method == IB_MAD_METHOD_SET)) { @@ -776,7 +776,7 @@ __validate_modify(IN osm_mcmr_recv_t* co p_rcv->p_subn, p_mad_addr ); - if (cl_memcmp(&((*pp_mcm_port)->port_gid), &request_gid, sizeof(ib_gid_t))) + if (memcmp(&((*pp_mcm_port)->port_gid), &request_gid, sizeof(ib_gid_t))) { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__validate_modify: " @@ -959,7 +959,7 @@ __validate_requested_mgid(IN osm_mcmr_re } /* the MGID signature can mark IPoIB or SA assigned MGIDs */ - cl_memcpy(&signature, &(p_mcm_rec->mgid.multicast.raw_group_id), sizeof(signature)); + memcpy(&signature, &(p_mcm_rec->mgid.multicast.raw_group_id), sizeof(signature)); signature = cl_ntoh16(signature); osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__validate_requested_mgid: " @@ -1268,12 +1268,12 @@ osm_mcmr_rcv_create_new_mgrp( 
p_mgid->raw[3] = 0x1B; /* HACK: I will be using the SA port gid for making it globally unique */ - cl_memcpy((&p_mgid->raw[4]), - &p_rcv->p_subn->opt.subnet_prefix, sizeof(uint64_t)); + memcpy((&p_mgid->raw[4]), + &p_rcv->p_subn->opt.subnet_prefix, sizeof(uint64_t)); /* HACK how do we get a unique number - use the mlid twice */ - cl_memcpy(&p_mgid->raw[10], &mlid, sizeof(uint16_t)); - cl_memcpy(&p_mgid->raw[12], &mlid, sizeof(uint16_t)); + memcpy(&p_mgid->raw[10], &mlid, sizeof(uint16_t)); + memcpy(&p_mgid->raw[12], &mlid, sizeof(uint16_t)); osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "osm_mcmr_rcv_create_new_mgrp: " "Allocated new MGID:0x%016" PRIx64 " : " @@ -1841,7 +1841,7 @@ __osm_mcmr_rcv_new_mcmr( goto Exit; } - cl_memclr( &p_rec_item->rec, sizeof( p_rec_item->rec ) ); + memset( &p_rec_item->rec, 0, sizeof( p_rec_item->rec ) ); /* HACK: Not trusted requesters should result with 0 Join State, Port Guid, and Proxy */ @@ -1899,17 +1899,17 @@ __osm_sa_mcm_by_comp_mask_cb( /* first try to eliminate the group by MGID, MLID, or P_Key */ if ((IB_MCR_COMPMASK_MGID & comp_mask) && - cl_memcmp(&p_rcvd_rec->mgid, &p_mgrp->mcmember_rec.mgid, sizeof(ib_gid_t))) + memcmp(&p_rcvd_rec->mgid, &p_mgrp->mcmember_rec.mgid, sizeof(ib_gid_t))) goto Exit; if ((IB_MCR_COMPMASK_MLID & comp_mask) && - cl_memcmp(&p_rcvd_rec->mlid, &p_mgrp->mcmember_rec.mlid, sizeof(uint16_t))) + memcmp(&p_rcvd_rec->mlid, &p_mgrp->mcmember_rec.mlid, sizeof(uint16_t))) goto Exit; /* if the requester physical port doesn't have the pkey that is defined for the group - exit. */ - if (! 
osm_physp_has_pkey( p_rcv->p_log, p_mgrp->mcmember_rec.pkey, - p_req_physp )) + if (!osm_physp_has_pkey( p_rcv->p_log, p_mgrp->mcmember_rec.pkey, + p_req_physp )) goto Exit; /* now do the rest of the match */ @@ -1975,7 +1975,7 @@ __osm_sa_mcm_by_comp_mask_cb( if (osm_mgrp_is_port_present(p_mgrp, portguid, &p_mcm_port)) { scope_state = p_mcm_port->scope_state; - cl_memcpy(&port_gid, &(p_mcm_port->port_gid), sizeof(ib_gid_t)); + memcpy(&port_gid, &(p_mcm_port->port_gid), sizeof(ib_gid_t)); proxy_join = p_mcm_port->proxy_join; } else @@ -2009,7 +2009,7 @@ __osm_sa_mcm_by_comp_mask_cb( /* add to the list */ match_rec = p_mgrp->mcmember_rec; match_rec.scope_state = p_mcm_port->scope_state; - cl_memcpy( &(match_rec.port_gid), &(p_mcm_port->port_gid), + memcpy( &(match_rec.port_gid), &(p_mcm_port->port_gid), sizeof(ib_gid_t)); osm_log(p_rcv->p_log, OSM_LOG_DEBUG, "__osm_sa_mcm_by_comp_mask_cb: " @@ -2037,7 +2037,7 @@ __osm_sa_mcm_by_comp_mask_cb( /* add to the list */ match_rec = p_mgrp->mcmember_rec; match_rec.scope_state = scope_state; - cl_memcpy(&(match_rec.port_gid), &port_gid, sizeof(ib_gid_t)); + memcpy(&(match_rec.port_gid), &port_gid, sizeof(ib_gid_t)); match_rec.proxy_join = (uint8_t)proxy_join; __osm_mcmr_rcv_new_mcmr(p_rcv, &match_rec, p_ctxt->p_list); @@ -2203,7 +2203,7 @@ osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* Then copy all records from the list into the response payload. 
*/ - cl_memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_rcvd_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method = (uint8_t)(p_resp_sa_mad->method | 0x80); /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; @@ -2245,7 +2245,7 @@ osm_mcmr_query_mgrp(IN osm_mcmr_recv_t* *p_resp_rec = p_rec_item->rec; if (trusted_req == FALSE) { - cl_memclr(&p_resp_rec->port_gid, sizeof(ib_gid_t)); + memset(&p_resp_rec->port_gid, 0, sizeof(ib_gid_t)); ib_member_set_join_state(p_resp_rec, 0); p_resp_rec->proxy_join = 0; } Index: osm/opensm/osm_vl15intf.c =================================================================== --- osm/opensm/osm_vl15intf.c (revision 7286) +++ osm/opensm/osm_vl15intf.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -282,7 +282,7 @@ void osm_vl15_construct( IN osm_vl15_t* const p_vl ) { - cl_memclr( p_vl, sizeof(*p_vl) ); + memset( p_vl, 0, sizeof(*p_vl) ); p_vl->state = OSM_VL15_STATE_INIT; p_vl->thread_state = OSM_THREAD_STATE_NONE; cl_event_construct( &p_vl->signal ); Index: osm/opensm/osm_port_info_rcv.c =================================================================== --- osm/opensm/osm_port_info_rcv.c (revision 7286) +++ osm/opensm/osm_port_info_rcv.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -198,7 +198,7 @@ __osm_pi_rcv_process_endport( This port indicates it's an SM and it's not our own port. Acquire the SMInfo Attribute. 
*/ - cl_memclr( &context, sizeof(context) ); + memset( &context, 0, sizeof(context) ); context.smi_context.set_method = FALSE; status = osm_req_get( p_rcv->p_req, osm_physp_get_dr_path_ptr( p_physp ), @@ -530,7 +530,7 @@ void osm_pi_rcv_construct( IN osm_pi_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** Index: osm/opensm/osm_drop_mgr.c =================================================================== --- osm/opensm/osm_drop_mgr.c (revision 7286) +++ osm/opensm/osm_drop_mgr.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -72,7 +72,7 @@ osm_drop_mgr_construct( IN osm_drop_mgr_t* const p_mgr ) { CL_ASSERT( p_mgr ); - cl_memclr( p_mgr, sizeof(*p_mgr) ); + memset( p_mgr, 0, sizeof(*p_mgr) ); } /********************************************************************** @@ -322,9 +322,9 @@ __osm_drop_mgr_remove_port( /* we need to provide the GID */ port_gid.unicast.prefix = p_mgr->p_subn->opt.subnet_prefix; port_gid.unicast.interface_id = port_guid; - cl_memcpy(&(notice.data_details.ntc_64_67.gid), - &(port_gid), - sizeof(ib_gid_t)); + memcpy(&(notice.data_details.ntc_64_67.gid), + &(port_gid), + sizeof(ib_gid_t)); /* According to page 653 - the issuer gid in this case of trap is the SM gid, since the SM is the initiator of this trap. */ Index: osm/opensm/osm_state_mgr_ctrl.c =================================================================== --- osm/opensm/osm_state_mgr_ctrl.c (revision 7286) +++ osm/opensm/osm_state_mgr_ctrl.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. 
* Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -77,7 +77,7 @@ void osm_state_mgr_ctrl_construct( IN osm_state_mgr_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_guidinfo_record_ctrl.c =================================================================== --- osm/opensm/osm_sa_guidinfo_record_ctrl.c (revision 7286) +++ osm/opensm/osm_sa_guidinfo_record_ctrl.c (working copy) @@ -76,7 +76,7 @@ void osm_gir_rcv_ctrl_construct( IN osm_gir_rcv_ctrl_t* const p_ctrl ) { - cl_memclr( p_ctrl, sizeof(*p_ctrl) ); + memset( p_ctrl, 0, sizeof(*p_ctrl) ); p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; } Index: osm/opensm/osm_sa_path_record.c =================================================================== --- osm/opensm/osm_sa_path_record.c (revision 7286) +++ osm/opensm/osm_sa_path_record.c (working copy) @@ -102,7 +102,7 @@ void osm_pr_rcv_construct( IN osm_pr_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); cl_qlock_pool_construct( &p_rcv->pr_pool ); } @@ -1245,9 +1245,9 @@ __search_mgrp_by_mgid( /* compare entire MGID so different scope will not sneak in for the same MGID */ - if ( cl_memcmp( &p_mgrp->mcmember_rec.mgid, - p_recvd_mgid, - sizeof(ib_gid_t) ) ) + if ( memcmp( &p_mgrp->mcmember_rec.mgid, + p_recvd_mgid, + sizeof(ib_gid_t) ) ) return; #if 0 @@ -1591,7 +1591,7 @@ __osm_pr_rcv_respond( p_sa_mad = osm_madw_get_sa_mad_ptr( p_madw ); p_resp_sa_mad = osm_madw_get_sa_mad_ptr( p_resp_madw ); - cl_memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); + memcpy( p_resp_sa_mad, p_sa_mad, IB_SA_MAD_HDR_SIZE ); p_resp_sa_mad->method |= IB_MAD_METHOD_RESP_MASK; /* C15-0.1.5 - always return SM_Key = 0 (table 185 p 884) */ p_resp_sa_mad->sm_key = 0; Index: osm/opensm/osm_lid_mgr.c 
=================================================================== --- osm/opensm/osm_lid_mgr.c (revision 7286) +++ osm/opensm/osm_lid_mgr.c (working copy) @@ -118,7 +118,7 @@ void osm_lid_mgr_construct( IN osm_lid_mgr_t* const p_mgr ) { - cl_memclr( p_mgr, sizeof(*p_mgr) ); + memset( p_mgr, 0, sizeof(*p_mgr) ); cl_ptr_vector_construct( &p_mgr->used_lids ); } @@ -1008,12 +1008,12 @@ __osm_lid_mgr_set_physp_pi( Third, send the SMP to this physical port. */ - cl_memclr( payload, IB_SMP_DATA_SIZE ); + memset( payload, 0, IB_SMP_DATA_SIZE ); /* Correction by FUJITSU */ if( port_num != 0 ) { - cl_memcpy( payload, p_old_pi, sizeof(ib_port_info_t) ); + memcpy( payload, p_old_pi, sizeof(ib_port_info_t) ); } /* @@ -1047,36 +1047,36 @@ __osm_lid_mgr_set_physp_pi( p_pi->m_key = p_mgr->p_subn->opt.m_key; /* Check to see if the value we are setting is different than the value in the port_info. If it is, turn on send_set flag */ - if (cl_memcmp( &p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key) )) + if (memcmp( &p_pi->m_key, &p_old_pi->m_key, sizeof(p_pi->m_key) )) send_set = TRUE; p_pi->subnet_prefix = p_mgr->p_subn->opt.subnet_prefix; /* Check to see if the value we are setting is different than the value in the port_info. If it is, turn on send_set flag */ - if (cl_memcmp( &p_pi->subnet_prefix, &p_old_pi->subnet_prefix, - sizeof(p_pi->subnet_prefix) )) + if (memcmp( &p_pi->subnet_prefix, &p_old_pi->subnet_prefix, + sizeof(p_pi->subnet_prefix) )) send_set = TRUE; p_pi->base_lid = lid; /* Check to see if the value we are setting is different than the value in the port_info. 
If it is, turn on send_set flag */ - if (cl_memcmp( &p_pi->base_lid, &p_old_pi->base_lid, - sizeof(p_pi->base_lid) )) + if (memcmp( &p_pi->base_lid, &p_old_pi->base_lid, + sizeof(p_pi->base_lid) )) send_set = TRUE; /* we are updating the ports with our local sm_base_lid */ p_pi->master_sm_base_lid = p_mgr->p_subn->sm_base_lid; /* Check to see if the value we are setting is different than the value in the port_info. If it is, turn on send_set flag */ - if (cl_memcmp( &p_pi->master_sm_base_lid, &p_old_pi->master_sm_base_lid, - sizeof(p_pi->master_sm_base_lid) )) + if (memcmp( &p_pi->master_sm_base_lid, &p_old_pi->master_sm_base_lid, + sizeof(p_pi->master_sm_base_lid) )) send_set = TRUE; p_pi->m_key_lease_period = p_mgr->p_subn->opt.m_key_lease_period; /* Check to see if the value we are setting is different than the value in the port_info. If it is, turn on send_set flag */ - if (cl_memcmp( &p_pi->m_key_lease_period, &p_old_pi->m_key_lease_period, - sizeof(p_pi->m_key_lease_period) )) + if (memcmp( &p_pi->m_key_lease_period, &p_old_pi->m_key_lease_period, + sizeof(p_pi->m_key_lease_period) )) send_set = TRUE; /* @@ -1099,15 +1099,15 @@ __osm_lid_mgr_set_physp_pi( p_pi->link_width_enabled = p_old_pi->link_width_supported; /* Check to see if the value we are setting is different than the value in the port_info. If it is, turn on send_set flag */ - if (cl_memcmp( &p_pi->link_width_enabled, &p_old_pi->link_width_enabled, - sizeof(p_pi->link_width_enabled) )) + if (memcmp( &p_pi->link_width_enabled, &p_old_pi->link_width_enabled, + sizeof(p_pi->link_width_enabled) )) send_set = TRUE; /* M_KeyProtectBits are always zero */ p_pi->mkey_lmc = p_mgr->p_subn->opt.lmc; /* Check to see if the value we are setting is different than the value in the port_info. 
If it is, turn on send_set flag */ - if (cl_memcmp( &p_pi->mkey_lmc, &p_old_pi->mkey_lmc, sizeof(p_pi->mkey_lmc) )) + if (memcmp( &p_pi->mkey_lmc, &p_old_pi->mkey_lmc, sizeof(p_pi->mkey_lmc) )) send_set = TRUE; /* calc new op_vls and mtu */ @@ -1134,8 +1134,8 @@ __osm_lid_mgr_set_physp_pi( p_mgr->p_subn->opt.local_phy_errors_threshold, p_mgr->p_subn->opt.overrun_errors_threshold); - if (cl_memcmp( &p_pi->error_threshold, &p_old_pi->error_threshold, - sizeof(p_pi->error_threshold) )) + if (memcmp( &p_pi->error_threshold, &p_old_pi->error_threshold, + sizeof(p_pi->error_threshold) )) send_set = TRUE; /* Index: osm/opensm/osm_pkey_mgr.c =================================================================== --- osm/opensm/osm_pkey_mgr.c (revision 7286) +++ osm/opensm/osm_pkey_mgr.c (working copy) @@ -102,8 +102,8 @@ pkey_mgr_enforce_partition( if ((p_pi->vl_enforce & 0xc) == (0xc)*(enforce == TRUE)) return IB_SUCCESS; - cl_memclr( payload, IB_SMP_DATA_SIZE ); - cl_memcpy( payload, p_pi, sizeof(ib_port_info_t) ); + memset( payload, 0, IB_SMP_DATA_SIZE ); + memcpy( payload, p_pi, sizeof(ib_port_info_t) ); p_pi = (ib_port_info_t*)payload; if (enforce == TRUE) @@ -263,7 +263,7 @@ pkey_mgr_update_peer_port( { block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index ); peer_block = osm_pkey_tbl_block_get( p_peer_pkey_tbl, block_index ); - if ( cl_memcmp( peer_block, block, sizeof( *peer_block ) ) ) + if ( memcmp( peer_block, block, sizeof( *peer_block ) ) ) { status = pkey_mgr_update_pkey_entry( p_req, peer, block, block_index ); if ( status == IB_SUCCESS ) @@ -322,7 +322,7 @@ static boolean_t pkey_mgr_update_port( block = osm_pkey_tbl_block_get( p_pkey_tbl, block_index ); new_block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index ); - if (!new_block || !cl_memcmp( new_block, block, sizeof( *block ) ) ) + if (!new_block || !memcmp( new_block, block, sizeof( *block ) ) ) continue; status = pkey_mgr_update_pkey_entry( p_req, p, new_block, block_index ); Index: 
osm/opensm/osm_vl_arb_rcv.c =================================================================== --- osm/opensm/osm_vl_arb_rcv.c (revision 7286) +++ osm/opensm/osm_vl_arb_rcv.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -74,7 +74,7 @@ void osm_vla_rcv_construct( IN osm_vla_rcv_t* const p_rcv ) { - cl_memclr( p_rcv, sizeof(*p_rcv) ); + memset( p_rcv, 0, sizeof(*p_rcv) ); } /********************************************************************** From rdreier at cisco.com Wed May 17 09:21:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 09:21:15 -0700 Subject: [openib-general] SRP [PATCH] Looks like a potantial bug In-Reply-To: <20060517155044.GA5319@mellanox.co.il> (Ishai Rabinovitz's message of "Wed, 17 May 2006 18:50:44 +0300") References: <20060517154004.GA5091@mellanox.co.il> <20060517155044.GA5319@mellanox.co.il> Message-ID: Yes, good catch. Thanks, applied. From mst at mellanox.co.il Wed May 17 09:23:12 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 17 May 2006 19:23:12 +0300 Subject: [openib-general] Re: In-Reply-To: References: <446B2319.9030204@voltaire.com> <20060517154643.GG30211@mellanox.co.il> Message-ID: <20060517162312.GI30211@mellanox.co.il> Quoting r. Roland Dreier : > Michael> But, I think it's still useful to make it possible for > Michael> people to test development snapshots on stable kernels > Michael> simply because we'll get more testing and feedback this > Michael> way. > > It's fine except when API changes force us to diverge from upstream. > Then it becomes a hassle. Yes. Still, its mostly easy. 
-- MST From ishai at mellanox.co.il Wed May 17 09:21:41 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Wed, 17 May 2006 19:21:41 +0300 Subject: [openib-general] SRP [PATCH] Cleaning in srp_remove_one Message-ID: <20060517162141.GA5396@mellanox.co.il> 3 changes in the same place: 1) The if statement is redundant. 2) There is no need to save the flags - it is inside a mutex_lock. 3) We hold the mutex for the list and we are not deleting from the list so there is no need for list_for_each_entry_safe. Signed-off-by: Ishai Rabinovitz Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 14:22:12.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 14:26:54.000000000 +0300 @@ -1750,7 +1750,6 @@ static void srp_remove_one(struct ib_dev struct srp_host *host, *tmp_host; LIST_HEAD(target_list); struct srp_target_port *target, *tmp_target; - unsigned long flags; dev_list = ib_get_client_data(device, &srp_client); @@ -1767,12 +1766,10 @@ static void srp_remove_one(struct ib_dev * commands and don't try to reconnect. 
*/ mutex_lock(&host->target_mutex); - list_for_each_entry_safe(target, tmp_target, - &host->target_list, list) { - spin_lock_irqsave(target->scsi_host->host_lock, flags); - if (target->state != SRP_TARGET_REMOVED) - target->state = SRP_TARGET_REMOVED; - spin_unlock_irqrestore(target->scsi_host->host_lock, flags); + list_for_each_entry(target, &host->target_list, list) { + spin_lock_irq(target->scsi_host->host_lock); + target->state = SRP_TARGET_REMOVED; + spin_unlock_irq(target->scsi_host->host_lock); } mutex_unlock(&host->target_mutex); -- Ishai Rabinovitz From sean.hefty at intel.com Wed May 17 09:28:14 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 17 May 2006 09:28:14 -0700 Subject: [openib-general] [PATCH] IB: Make needlessly global ib_mad_cachestatic In-Reply-To: Message-ID: >Any reason not to apply this? Looks fine to apply be me. - Sean From halr at voltaire.com Wed May 17 09:24:11 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 12:24:11 -0400 Subject: [openib-general] Re: compilation warning in diags tools In-Reply-To: <200605171847.27448.dotanb@mellanox.co.il> References: <200605171847.27448.dotanb@mellanox.co.il> Message-ID: <1147883048.18971.50642.camel@hal.voltaire.com> On Wed, 2006-05-17 at 11:47, Dotan Barak wrote: > Hi. > > Here is a compilation warning when using gcc 3.4.5: > > src/grouping.c: In function `get_router_slot': > src/grouping.c:213: warning: implicit declaration of function `calloc' > /bin/sh ./libtool --tag=CC --mode=link gcc -m64 -L../libibcommon -libcommon -L../libibumad -libumad -L../osm/opensm/.libs -lopensm -L../os > m/libvendor/.libs -losmvendor -L../osm/complib/.libs -losmcomp -o src/ibnetdiscover src_ibnetdiscover-ibnetdiscover.o src_ibnetdiscover-gro > uping.o ../libibcommon/libibcommon.la ../libibumad/libibumad.la ../libibmad/libibmad.la > > (i think that stdlib.h should be included to prevent this warning) Fixed in r7290. Can you update and try to be sure ? Thanks. 
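
[Illustration, not part of the original message.] The warning quoted above is what gcc emits when calloc() is called with no prototype in scope; including <stdlib.h> supplies one. A minimal sketch, assuming a made-up record type (struct slot_entry and alloc_slots() are hypothetical and do not come from grouping.c):

```c
#include <stddef.h>
#include <stdlib.h>   /* declares calloc() and free(); omitting it triggers the warning */

/* Hypothetical record, standing in for whatever grouping.c allocates. */
struct slot_entry {
    int chassis;
    int slot;
};

/* Allocate n zero-initialized entries; returns NULL on failure. */
struct slot_entry *alloc_slots(size_t n)
{
    return calloc(n, sizeof(struct slot_entry));
}

/* Release an array obtained from alloc_slots(). */
void free_slots(struct slot_entry *s)
{
    free(s);
}
```

Without the prototype, a C89 compiler assumes calloc() returns int, which can truncate the pointer on 64-bit targets, so the warning is worth fixing rather than silencing.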
-- Hal > > thanks > Dotan From ken at novell.com Wed May 17 09:34:35 2006 From: ken at novell.com (Ken L Johnson) Date: Wed, 17 May 2006 10:34:35 -0600 Subject: [openib-general] ib_mthca fails to load with old firmware In-Reply-To: References: Message-ID: <200605171034.36013.ken@novell.com> Hi Scott - On Wed, 17 May 2006 at 08:40:50 -0700, Scott Weitzenkamp wrote: > What kind of blade systems are these? For some blade systems, Cisco > provides HCA firmware that has been configured to provide better signal > integrity. > > If you run /usr/local/ofed/sbin/tvflash -i, I can then tell which > firmware you need. The blade systems are all Dell 1855's. Here's the output you requested: ---8<--- blade9:/usr/local/ofed/sbin # ./tvflash -i HCA #0: MT25208 Tavor Compat, DLGL, revision A0 Primary image is v4.6.000 build 3.0.0.160, with label 'HCA.DLGL.A0' Secondary image is v4.6.000 build 3.0.0.160, with label 'HCA.DLGL.A0' Vital Product Data Product Name: DLGL P/N: 99-00063-03 E/C: Rev: A8 S/N: 57O1771 Freq/Power: PW=10W;PCIe 8X Date Code: 3105 Checksum: Ok --->8--- Regards, -- Ken L Johnson From sashak at voltaire.com Wed May 17 09:44:33 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 17 May 2006 19:44:33 +0300 Subject: [openib-general] [PATCH] opensm: make more statics Message-ID: <20060517164432.7959.78926.stgit@sashak.voltaire.com> This makes local functions to be static in osm_link_mgr.c. 
Signed-off-by: Sasha Khapyorsky --- osm/opensm/osm_link_mgr.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/osm/opensm/osm_link_mgr.c b/osm/opensm/osm_link_mgr.c index 5d9ab7d..2b0d2de 100644 --- a/osm/opensm/osm_link_mgr.c +++ b/osm/opensm/osm_link_mgr.c @@ -111,8 +111,8 @@ osm_link_mgr_init( /********************************************************************** **********************************************************************/ -void -osm_link_mgr_set_physp_pi( +static void +__osm_link_mgr_set_physp_pi( IN osm_link_mgr_t* const p_mgr, IN osm_physp_t* const p_physp, IN uint8_t const port_state ) @@ -129,7 +129,7 @@ osm_link_mgr_set_physp_pi( boolean_t send_set = FALSE; osm_physp_t *p_remote_physp; - OSM_LOG_ENTER( p_mgr->p_log, osm_link_mgr_set_physp_pi ); + OSM_LOG_ENTER( p_mgr->p_log, __osm_link_mgr_set_physp_pi ); CL_ASSERT( p_physp ); CL_ASSERT( osm_physp_is_valid( p_physp ) ); @@ -151,7 +151,7 @@ osm_link_mgr_set_physp_pi( if (! p_switch ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "osm_link_mgr_set_physp_pi: ERR 4201: " + "__osm_link_mgr_set_physp_pi: ERR 4201: " "Cannot find switch by guid: 0x%" PRIx64 "\n", cl_ntoh64( p_node->node_info.node_guid ) ); goto Exit; @@ -165,7 +165,7 @@ osm_link_mgr_set_physp_pi( if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) { osm_log( p_mgr->p_log, OSM_LOG_DEBUG, - "osm_link_mgr_set_physp_pi: " + "__osm_link_mgr_set_physp_pi: " "Skipping port 0, GUID = 0x%016" PRIx64 "\n", cl_ntoh64( osm_physp_get_port_guid( p_physp ) ) ); } @@ -366,7 +366,7 @@ osm_link_mgr_set_physp_pi( /********************************************************************** **********************************************************************/ -osm_signal_t +static osm_signal_t __osm_link_mgr_process_port( IN osm_link_mgr_t* const p_mgr, IN osm_port_t* const p_port, @@ -419,7 +419,7 @@ __osm_link_mgr_process_port( (current_state < link_state) ) { p_mgr->send_set_reqs = FALSE; - osm_link_mgr_set_physp_pi( + 
__osm_link_mgr_set_physp_pi( p_mgr, p_physp, link_state ); From mst at mellanox.co.il Wed May 17 09:45:29 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 17 May 2006 19:45:29 +0300 Subject: [openib-general] Re: multcast join failed In-Reply-To: References: <20060517155125.GH30211@mellanox.co.il> Message-ID: <20060517164529.GA12290@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: multcast join failed > > > With svn trunk, I started getting the following on one machine: > > > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > > > and I can't ping this machine over ipoib. > > Any idea? > > No, nothing of significance has changed in ipoib for a while. When can we get EINVAL from multicast join? -- MST From olson at unixfolk.com Wed May 17 09:47:59 2006 From: olson at unixfolk.com (Dave Olson) Date: Wed, 17 May 2006 09:47:59 -0700 (PDT) Subject: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes In-Reply-To: References: Message-ID: On Mon, 15 May 2006, Roland Dreier wrote: | This looks like a pastiche of several patches. Why can't it be split | up into logical pieces? | | > Call dma_free_coherent without ipath_mutex held. | | Why? Doesn't freeing work with the mutex held? Sure, that's the way the previous code worked. We are seeing a bug (with both our driver native MPI processes and mthca mvapic), where when 8 processes using "simultaneously exit", we get watchdogs and/or hangs in the close routines. Moving the freeing outside the mutex was an attempt to see if we were running into some VM issues by doing lots of page unlocking and freeing with the mutex held. It seemed to help somewhat, but not to solve the problem. It also allows other processes to open and close in a somewhat more timely fashion. 
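
[Illustration, not part of the original message.] The idea described above, doing the expensive freeing after the mutex is dropped, can be sketched in user space. This is only an analogy, assuming pthreads and malloc/free in place of the kernel mutex and dma_free_coherent; every name below (struct buf, buf_add, buf_release_all) is hypothetical and none of it is ipath driver code:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical buffer record; stands in for the driver's allocations. */
struct buf {
    struct buf *next;
    void *mem;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct buf *buf_list;   /* protected by list_lock */

/* Add one buffer of the given size to the shared list; 0 on success. */
int buf_add(size_t size)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return -1;
    b->mem = malloc(size);
    if (!b->mem) {
        free(b);
        return -1;
    }
    pthread_mutex_lock(&list_lock);
    b->next = buf_list;
    buf_list = b;
    pthread_mutex_unlock(&list_lock);
    return 0;
}

/*
 * Detach the whole list while holding the lock, then free the
 * entries with the lock released. Returns the number freed.
 */
int buf_release_all(void)
{
    struct buf *head, *next;
    int freed = 0;

    pthread_mutex_lock(&list_lock);
    head = buf_list;       /* take ownership of the list */
    buf_list = NULL;
    pthread_mutex_unlock(&list_lock);

    for (; head; head = next) {
        next = head->next;
        free(head->mem);   /* slow work happens outside the lock */
        free(head);
        freed++;
    }
    return freed;
}
```

The lock is held only for the pointer swap, so other openers and closers are not stalled behind the freeing loop.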
Dave Olson olson at unixfolk.com http://www.unixfolk.com/dave From sashak at voltaire.com Wed May 17 10:03:48 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 17 May 2006 20:03:48 +0300 Subject: [openib-general] Re: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out In-Reply-To: <1147867898.18971.44877.camel@hal.voltaire.com> References: <1147864436.18971.43577.camel@hal.voltaire.com> <20060517114133.GX30211@mellanox.co.il> <1147867898.18971.44877.camel@hal.voltaire.com> Message-ID: <20060517170348.GL24906@sashak.voltaire.com> On 08:11 Wed 17 May , Hal Rosenstock wrote: > On Wed, 2006-05-17 at 07:41, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Subject: [PATCH] [TRIVIAL] OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out > > > > > > OpenSM/osm_sm_mad_ctrl.c: Make some dead code be compiled out > > > > > > Signed-off-by: Hal Rosenstock > > > > > > > > > @@ -844,25 +846,26 @@ __osm_sm_mad_ctrl_send_err_cb( > > > lid. > > > */ > > > /* For now - do not add the alternate dr path to the release */ > > > - if (0) > > > - if ( p_madw->mad_addr.dest_lid != 0xFFFF ) > > > +#if 0 > > > + if ( p_madw->mad_addr.dest_lid != 0xFFFF ) > > > > In my experience, if you compile with -O, gcc does a good enough job of > > dead code elimination. > > But not all builds are that way though. Also "#if 0" makes temporary disabled code more "visible" (for future improvements). Sasha. From mst at mellanox.co.il Wed May 17 10:11:46 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 17 May 2006 20:11:46 +0300 Subject: [openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines In-Reply-To: <1147882436.18971.50423.camel@hal.voltaire.com> References: <1147882436.18971.50423.camel@hal.voltaire.com> Message-ID: <20060517171146.GB12290@mellanox.co.il> Quoting r. 
Hal Rosenstock : > * > * SEE ALSO > -* Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp > +* Memory Management, cl_free, cl_malloc > **********/ Next: cl_malloc/cl_free? -- MST From iod00d at hp.com Wed May 17 10:16:17 2006 From: iod00d at hp.com (Grant Grundler) Date: Wed, 17 May 2006 10:16:17 -0700 Subject: [openib-general] Re: In-Reply-To: References: <446B2319.9030204@voltaire.com> Message-ID: <20060517171617.GA6719@esmail.cup.hp.com> On Wed, May 17, 2006 at 07:40:07AM -0700, Roland Dreier wrote: > Yes, I agree. That's why I think we should get rid of the > "linux-kernel" part of the svn tree entirely. Because everyone who > wants to test new code seems to run last stable kernel + svn drivers > instead of the new development kernel. That's because openib guarantees that SVN drivers will build with the last stable kernel. Change that policy and document the steps that folks should follow. I'd be willing to occasionally try newer kernels if you think that's what we should be doing. thanks, grant From halr at voltaire.com Wed May 17 10:22:36 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 13:22:36 -0400 Subject: [openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines In-Reply-To: <20060517171146.GB12290@mellanox.co.il> References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517171146.GB12290@mellanox.co.il> Message-ID: <1147886554.18971.52002.camel@hal.voltaire.com> On Wed, 2006-05-17 at 13:11, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > * > > * SEE ALSO > > -* Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp > > +* Memory Management, cl_free, cl_malloc > > **********/ > > Next: cl_malloc/cl_free? Yes, I didn't want to hold this part up for that. There will be a separate patch for that, but I'm not sure when. Right now, there is memory tracking code implemented. 
-- Hal From eitan at mellanox.co.il Wed May 17 10:38:53 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 17 May 2006 20:38:53 +0300 Subject: [openib-general] Re: multcast join failed Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBA3@mtlexch01.mtl.com> You are probably running with no partition policy file. I think you need one to get the default partition setup for IPoIB. Hal, is this correct? Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Michael S. Tsirkin > Sent: Wednesday, May 17, 2006 7:45 PM > To: Roland Dreier > Cc: openib-general at openib.org > Subject: [openib-general] Re: multcast join failed > > Quoting r. Roland Dreier : > > Subject: Re: multcast join failed > > > > > With svn trunk, I started getting the following on one machine: > > > > > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > > > > > and I can't ping this machine over ipoib. > > > Any idea? > > > > No, nothing of significance has changed in ipoib for a while. > > When can we get EINVAL from multicast join? > > -- > MST > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From zhushisongzhu at yahoo.com Wed May 17 10:35:59 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Wed, 17 May 2006 10:35:59 -0700 (PDT) Subject: [openib-general] OFED-1.0-rc4 need db-devel In-Reply-To: <446B43FA.7000709@mellanox.co.il> Message-ID: <20060517173559.31299.qmail@web36915.mail.mud.yahoo.com> Why not use db4-devel? db-devel seems obsolete. 
zhu --- Vladimir Sokolovsky wrote: > Scott Weitzenkamp (sweitzen) wrote: > >> db-devel package is required to build open_iscsi > package RPM. > >> This package is not relevant for RHEL 4.3. > >> There are two options to install OFED-1.0-rc4 on > RHEL 4.3 without > >> open_iscsi: > >> 1. Select "Custom installation" and don't choose > to install > >> open_iscsi. > >> 2. Edit ofed.conf (created automatically under > OFED-1.0-rc4 directory > >> when you run install.sh or build.sh) and set > *open_iscsi=n*. > >> Then run: > >> ./install.sh -c ofed.conf > >> > > > > Why don't we ignore these packages on RHEL4 U3, > just like we ignore > > uDAPL on ppc64? > > > > Scott > > > > > We will do this in OFED-1.0-rc5. > > Vladimir > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From halr at voltaire.com Wed May 17 10:36:26 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 13:36:26 -0400 Subject: [openib-general] Re: [PATCH] opensm: make more statics In-Reply-To: <20060517164432.7959.78926.stgit@sashak.voltaire.com> References: <20060517164432.7959.78926.stgit@sashak.voltaire.com> Message-ID: <1147887385.18971.52332.camel@hal.voltaire.com> On Wed, 2006-05-17 at 12:44, Sasha Khapyorsky wrote: > This makes local functions to be static in osm_link_mgr.c. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to both trunk and 1.0 branch. -- Hal From halr at voltaire.com Wed May 17 10:39:46 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 13:39:46 -0400 Subject: [openib-general] Re: multcast join failed In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBA3@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBA3@mtlexch01.mtl.com> Message-ID: <1147887557.18971.52412.camel@hal.voltaire.com> On Wed, 2006-05-17 at 13:38, Eitan Zahavi wrote: > You are probably running with no partition policy file. 
Assuming he is running with OpenSM from the trunk > I think you need one to get the default partition setup for IPoIB. > > Hal, is this correct? Assuming the above is true, he either needs to run: opensm -N or create a partition configuration file /etc/osm-partitions.conf Default=0x7fff,ipoib:ALL=full; and run: opensm assuming all he cares about is the default partition -- Hal > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: openib-general-bounces at openib.org [mailto:openib-general- > > bounces at openib.org] On Behalf Of Michael S. Tsirkin > > Sent: Wednesday, May 17, 2006 7:45 PM > > To: Roland Dreier > > Cc: openib-general at openib.org > > Subject: [openib-general] Re: multcast join failed > > > > Quoting r. Roland Dreier : > > > Subject: Re: multcast join failed > > > > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, > status -22 > > > > > > > > and I can't ping this machine over ipoib. > > > > Any idea? > > > > > > No, nothing of significance has changed in ipoib for a while. > > > > When can we get EINVAL from multicast join? > > > > -- > > MST > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Wed May 17 10:49:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 10:49:05 -0700 Subject: [openib-general] Re: multcast join failed In-Reply-To: <20060517164529.GA12290@mellanox.co.il> (Michael S. 
Tsirkin's message of "Wed, 17 May 2006 19:45:29 +0300") References: <20060517155125.GH30211@mellanox.co.il> <20060517164529.GA12290@mellanox.co.il> Message-ID: Michael> When can we get EINVAL from multicast join? If the SM returned a bad status I think. From halr at voltaire.com Wed May 17 10:57:56 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 13:57:56 -0400 Subject: [openib-general] multcast join failed In-Reply-To: <20060517155125.GH30211@mellanox.co.il> References: <20060517155125.GH30211@mellanox.co.il> Message-ID: <1147888298.18971.52686.camel@hal.voltaire.com> On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > Hi, Roland! > With svn trunk, I started getting the following on one machine: > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > and I can't ping this machine over ipoib. > Any idea? What SM are you using ? If OpenSM, are there any errors in the osm.log ? -- Hal From rdreier at cisco.com Wed May 17 11:06:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 11:06:39 -0700 Subject: [openib-general] Re: SRP [PATCH] Cleaning in srp_remove_one In-Reply-To: <20060517162141.GA5396@mellanox.co.il> (Ishai Rabinovitz's message of "Wed, 17 May 2006 19:21:41 +0300") References: <20060517162141.GA5396@mellanox.co.il> Message-ID: Thanks. I had already merged some changes from Matthew Wilcox that clean up that loop a little bit, but I merged the rest of your patch too. - R. 
From rdreier at cisco.com Wed May 17 11:08:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 11:08:09 -0700 Subject: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes In-Reply-To: (Dave Olson's message of "Wed, 17 May 2006 09:47:59 -0700 (PDT)") References: Message-ID: Dave> We are seeing a bug (with both our driver native MPI Dave> processes and mthca mvapic), where when 8 processes using Dave> "simultaneously exit", we get watchdogs and/or hangs in the Dave> close routines. Moving the freeing outside the mutex was an Dave> attempt to see if we were running into some VM issues by Dave> doing lots of page unlocking and freeing with the mutex Dave> held. It seemed to help somewhat, but not to solve the Dave> problem. Am I understanding correctly that you see a hang or watchdog timeout even with the mthca driver? Is there any possibility of posting the test case to reproduce this? It doesn't seem likely that ipath changes are going to fix a generic bug like this... - R. From sashak at voltaire.com Wed May 17 11:13:58 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 17 May 2006 21:13:58 +0300 Subject: [openib-general] Re: multcast join failed In-Reply-To: <1147887557.18971.52412.camel@hal.voltaire.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBA3@mtlexch01.mtl.com> <1147887557.18971.52412.camel@hal.voltaire.com> Message-ID: <20060517181358.GB18370@sashak.voltaire.com> On 13:39 Wed 17 May , Hal Rosenstock wrote: > On Wed, 2006-05-17 at 13:38, Eitan Zahavi wrote: > > You are probably running with no partition policy file. > > Assuming he is running with OpenSM from the trunk > > > I think you need one to get the default partition setup for IPoIB. > > > > Hal, is this correct? 
> > Assuming the above is true, he either needs to run: > > opensm -N > > or create a partition configuration file /etc/osm-partitions.conf > > Default=0x7fff,ipoib:ALL=full; > > and run: > > opensm > > assuming all he cares about is the default partition Without partition policy file OpenSM should configure Default partition with full membership (pkey=0xffff) for all ports and precreate IPoIB MCG (it is equivalent to "Default=0x7fff,ipoib:ALL=full;" as in Hal's example). Actual pkey tables content could be checked with: $ smpquery pkeys [port number] Sasha From rdreier at cisco.com Wed May 17 11:14:18 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 11:14:18 -0700 Subject: [openib-general][PATCH] srp: throttle command per lun, In-Reply-To: <44622531.6020902@mellanox.com> (Vu Pham's message of "Wed, 10 May 2006 10:38:57 -0700") References: <443E8325.2000502@mellanox.com> <446209EF.7040207@mellanox.com> <44622531.6020902@mellanox.com> Message-ID: Thanks, applied. From mst at mellanox.co.il Wed May 17 11:23:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 17 May 2006 21:23:25 +0300 Subject: [openib-general] Re: multcast join failed In-Reply-To: <1147888298.18971.52686.camel@hal.voltaire.com> References: <20060517155125.GH30211@mellanox.co.il> <1147888298.18971.52686.camel@hal.voltaire.com> Message-ID: <20060517182325.GA12742@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: multcast join failed > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > Hi, Roland! > > With svn trunk, I started getting the following on one machine: > > > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > > > and I can't ping this machine over ipoib. > > Any idea? > > What SM are you using ? > > If OpenSM, are there any errors in the osm.log ? 
> > -- Hal > opensm I see these May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, sending IB_SA_MAD_STATUS_REQ_INVALID -- MST From mshefty at ichips.intel.com Wed May 17 11:25:24 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 17 May 2006 11:25:24 -0700 Subject: [openib-general] [librdmacm] changes to cmatose to return a value different than 0 when there is a failure In-Reply-To: <200605171835.44079.dotanb@mellanox.co.il> References: <200605171835.44079.dotanb@mellanox.co.il> Message-ID: <446B6A94.7020306@ichips.intel.com> Dotan Barak wrote: > Added checks to the return values of all of the functions that may fail > (in order to add this test to the regression system). Thanks - applied with one minor change. > + int rc; Changed 'rc' to 'ret' to match the rest of the code. 
- Sean From sashak at voltaire.com Wed May 17 11:31:23 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 17 May 2006 21:31:23 +0300 Subject: [openib-general] Re: multcast join failed In-Reply-To: <20060517182325.GA12742@mellanox.co.il> References: <20060517155125.GH30211@mellanox.co.il> <1147888298.18971.52686.camel@hal.voltaire.com> <20060517182325.GA12742@mellanox.co.il> Message-ID: <20060517183123.GC18370@sashak.voltaire.com> On 21:23 Wed 17 May , Michael S. Tsirkin wrote: > > opensm > I see these > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID Is it 1x port? Sasha From halr at voltaire.com Wed May 17 11:23:50 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 14:23:50 -0400 Subject: [openib-general] Re: multcast join failed In-Reply-To: <20060517182325.GA12742@mellanox.co.il> References: <20060517155125.GH30211@mellanox.co.il> <1147888298.18971.52686.camel@hal.voltaire.com> <20060517182325.GA12742@mellanox.co.il> Message-ID: <1147890227.18971.53438.camel@hal.voltaire.com> On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > Quoting r. 
Hal Rosenstock : > > Subject: Re: multcast join failed > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > Hi, Roland! > > > With svn trunk, I started getting the following on one machine: > > > > > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > > > > > and I can't ping this machine over ipoib. > > > Any idea? > > > > What SM are you using ? > > > > If OpenSM, are there any errors in the osm.log ? > > > > -- Hal > > > > > opensm > I see these > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > sending IB_SA_MAD_STATUS_REQ_INVALID Are you attempting to join a 4x group from a 1x port (or perhaps there is a MTU mismatch between the port and the group) ? -- Hal From mst at mellanox.co.il Wed May 17 12:15:41 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 17 May 2006 22:15:41 +0300 Subject: [openib-general] Re: multcast join failed In-Reply-To: <1147890227.18971.53438.camel@hal.voltaire.com> References: <20060517155125.GH30211@mellanox.co.il> <1147888298.18971.52686.camel@hal.voltaire.com> <20060517182325.GA12742@mellanox.co.il> <1147890227.18971.53438.camel@hal.voltaire.com> Message-ID: <20060517191541.GC12742@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: multcast join failed > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Subject: Re: multcast join failed > > > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > > Hi, Roland! > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > > > > > > > and I can't ping this machine over ipoib. > > > > Any idea? > > > > > > What SM are you using ? > > > > > > If OpenSM, are there any errors in the osm.log ? 
> > > > > > -- Hal > > > > > > > > > opensm > > I see these > > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > sending IB_SA_MAD_STATUS_REQ_INVALID > > Are you attempting to join a 4x group from a 1x port (or perhaps there > is a MTU mismatch between the port and the group) ? > > -- Hal > Yes, for some reason it came up 1x. but why? -- MST From arlin.r.davis at intel.com Wed May 17 12:17:03 2006 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 17 May 2006 12:17:03 -0700 Subject: [openib-general] [PATCH] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug Message-ID: James, Fix for uCMA provider to return the correct event as a result of rejects. Also, ran into a segv bug with dapl_ep_create when creating without a conn_evd. Thanks, -arlin Signed-off by: Arlin Davis Index: dapl/common/dapl_ep_create.c =================================================================== --- dapl/common/dapl_ep_create.c (revision 7140) +++ dapl/common/dapl_ep_create.c (working copy) @@ -310,7 +310,10 @@ dapl_ep_create ( * * N.B. This should really be done by a util routine. 
*/ - dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); + if (connect_evd_handle != DAT_HANDLE_NULL) + { + dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); + } /* Optional handles */ if (recv_evd_handle != DAT_HANDLE_NULL) { Index: dapl/openib_cma/dapl_ib_cm.c =================================================================== --- dapl/openib_cma/dapl_ib_cm.c (revision 7140) +++ dapl/openib_cma/dapl_ib_cm.c (working copy) @@ -285,14 +285,24 @@ static void dapli_cm_active_cb(struct da NULL, conn->ep); break; case RDMA_CM_EVENT_REJECTED: + { + ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; + dapl_dbg_log( DAPL_DBG_TYPE_WARN, " dapli_cm_active_handler: REJECTED reason=%d\n", event->status); - dapl_evd_connection_callback(conn, IB_CME_DESTINATION_REJECT, - NULL, conn->ep); + + dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep); + break; - + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM, @@ -381,6 +391,14 @@ static void dapli_cm_passive_cb(struct d break; case RDMA_CM_EVENT_REJECTED: + { + ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; dapl_dbg_log( DAPL_DBG_TYPE_WARN, @@ -395,10 +413,11 @@ static void dapli_cm_passive_cb(struct d &ipaddr->dst_addr)->sin_addr.s_addr), ntohs(((struct sockaddr_in *) &ipaddr->dst_addr)->sin_port)); - - dapls_cr_callback(conn, IB_CME_DESTINATION_REJECT, - NULL, conn->sp); + + dapl_cr_callback(conn, cm_event, NULL, conn->sp); + break; + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM, From halr at voltaire.com Wed May 17 12:18:00 2006 From: halr at voltaire.com (Hal 
Rosenstock) Date: 17 May 2006 15:18:00 -0400 Subject: [openib-general] Re: multcast join failed In-Reply-To: <20060517191541.GC12742@mellanox.co.il> References: <20060517155125.GH30211@mellanox.co.il> <1147888298.18971.52686.camel@hal.voltaire.com> <20060517182325.GA12742@mellanox.co.il> <1147890227.18971.53438.camel@hal.voltaire.com> <20060517191541.GC12742@mellanox.co.il> Message-ID: <1147893478.18971.54629.camel@hal.voltaire.com> On Wed, 2006-05-17 at 15:15, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: multcast join failed > > > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > Subject: Re: multcast join failed > > > > > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > > > Hi, Roland! > > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > > > > > > > > > and I can't ping this machine over ipoib. > > > > > Any idea? > > > > > > > > What SM are you using ? > > > > > > > > If OpenSM, are there any errors in the osm.log ? 
> > > > > > > > -- Hal > > > > > > > > > > > > > opensm > > > I see these > > > > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > Are you attempting to join a 4x group from a 1x port (or perhaps there > > is a MTU mismatch between the port and the group) ? > > > > -- Hal > > > > Yes, for some reason it came up 1x. but why? If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it is an autonegotiation thing. Perhaps you have a bad cable ? -- Hal From mst at mellanox.co.il Wed May 17 12:28:23 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 17 May 2006 22:28:23 +0300 Subject: [openib-general] Re: multcast join failed In-Reply-To: <1147893478.18971.54629.camel@hal.voltaire.com> References: <20060517155125.GH30211@mellanox.co.il> <1147888298.18971.52686.camel@hal.voltaire.com> <20060517182325.GA12742@mellanox.co.il> <1147890227.18971.53438.camel@hal.voltaire.com> <20060517191541.GC12742@mellanox.co.il> <1147893478.18971.54629.camel@hal.voltaire.com> Message-ID: <20060517192823.GE12742@mellanox.co.il> Quoting r. Hal Rosenstock : > > Yes, for some reason it came up 1x. but why? > > If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it > is an autonegotiation thing. Perhaps you have a bad cable ? Hmm. -- MST From mst at mellanox.co.il Wed May 17 12:29:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 17 May 2006 22:29:43 +0300 Subject: [openib-general] Re: multcast join failed In-Reply-To: <1147893478.18971.54629.camel@hal.voltaire.com> References: <20060517155125.GH30211@mellanox.co.il> <1147888298.18971.52686.camel@hal.voltaire.com> <20060517182325.GA12742@mellanox.co.il> <1147890227.18971.53438.camel@hal.voltaire.com> <20060517191541.GC12742@mellanox.co.il> <1147893478.18971.54629.camel@hal.voltaire.com> Message-ID: <20060517192943.GF12742@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: multcast join failed > > On Wed, 2006-05-17 at 15:15, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Subject: Re: multcast join failed > > > > > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > > > > Quoting r. Hal Rosenstock : > > > > > Subject: Re: multcast join failed > > > > > > > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > > > > Hi, Roland! > > > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > > > > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > > > > > > > > > > > and I can't ping this machine over ipoib. > > > > > > Any idea? 
> > > > > > > > > > What SM are you using ? > > > > > > > > > > If OpenSM, are there any errors in the osm.log ? > > > > > > > > > > -- Hal > > > > > > > > > > > > > > > > > opensm > > > > I see these > > > > > > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > > Are you attempting to join a 4x group from a 1x port (or perhaps there > > > is a MTU mismatch between the port and the group) ? > > > > > > -- Hal > > > > > > > Yes, for some reason it came up 1x. but why? > > If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it > is an autonegotiation thing. Perhaps you have a bad cable ? Okay, I'll check, but why isn't ipoib working? Why is the mcast group 4x? It's a back-to-back configuration ... 
-- MST From halr at voltaire.com Wed May 17 12:46:43 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 15:46:43 -0400 Subject: [openib-general] Re: multcast join failed In-Reply-To: <20060517192943.GF12742@mellanox.co.il> References: <20060517155125.GH30211@mellanox.co.il> <1147888298.18971.52686.camel@hal.voltaire.com> <20060517182325.GA12742@mellanox.co.il> <1147890227.18971.53438.camel@hal.voltaire.com> <20060517191541.GC12742@mellanox.co.il> <1147893478.18971.54629.camel@hal.voltaire.com> <20060517192943.GF12742@mellanox.co.il> Message-ID: <1147895193.18971.55203.camel@hal.voltaire.com> On Wed, 2006-05-17 at 15:29, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: multcast join failed > > > > On Wed, 2006-05-17 at 15:15, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > Subject: Re: multcast join failed > > > > > > > > On Wed, 2006-05-17 at 14:23, Michael S. Tsirkin wrote: > > > > > Quoting r. Hal Rosenstock : > > > > > > Subject: Re: multcast join failed > > > > > > > > > > > > On Wed, 2006-05-17 at 11:51, Michael S. Tsirkin wrote: > > > > > > > Hi, Roland! 
> > > > > > > With svn trunk, I started getting the following on one machine: > > > > > > > > > > > > > > ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22 > > > > > > > > > > > > > > and I can't ping this machine over ipoib. > > > > > > > Any idea? > > > > > > > > > > > > What SM are you using ? > > > > > > > > > > > > If OpenSM, are there any errors in the osm.log ? > > > > > > > > > > > > -- Hal > > > > > > > > > > > > > > > > > > > > > opensm > > > > > I see these > > > > > > > > > > May 17 21:21:05 762211 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > May 17 21:21:21 641961 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > May 17 21:21:21 763080 [42804960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > May 17 21:21:37 642611 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > May 17 21:21:37 763955 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B12: > > > > > __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed, > > > > > sending IB_SA_MAD_STATUS_REQ_INVALID > > > > > > > > Are you attempting to join a 4x group from a 1x port (or perhaps there > > > > is a MTU mismatch between the port and the group) ? > > > > > > > > -- Hal > > > > > > > > > > Yes, for some reason it came up 1x. but why? > > > > If LinkWidthEnabled on both the HCA and switch port indicate 1x/4x, it > > is an autonegotiation thing. Perhaps you have a bad cable ? 
> > OKay, I'll check, but why isn't ipoib working? Why is the mcast group 4x? It defaults to 4x. If you want the group to be 1x, do something like the following in /etc/osm-partitions.conf Default=0x7fff,ipoib,rate=2:ALL=full; You can check osm/doc/partition-config.txt for more config info. > ITs a back-to-back configuration ... OK. That shouldn't matter. -- Hal
From arlin.r.davis at intel.com Wed May 17 14:16:02 2006 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 17 May 2006 14:16:02 -0700 Subject: [openib-general] RE: [PATCH2] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug Message-ID: >-----Original Message----- >From: Arlin Davis [mailto:arlin.r.davis at intel.com] >Sent: Wednesday, May 17, 2006 12:17 PM >To: 'James Lentini' >Cc: openib-general >Subject: [PATCH] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug > >James, > >Fix for uCMA provider to return the correct event as a result of rejects. Also, ran into a segv bug >with dapl_ep_create when creating without a conn_evd. > >Thanks, > >-arlin > > Signed-off by: Arlin Davis Sorry, the last patch was wrong. Try again... -arlin Signed-off by: Arlin Davis Index: dapl/common/dapl_ep_create.c =================================================================== --- dapl/common/dapl_ep_create.c (revision 7299) +++ dapl/common/dapl_ep_create.c (working copy) @@ -310,7 +310,10 @@ dapl_ep_create ( * * N.B. This should really be done by a util routine.
*/ - dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); + if (connect_evd_handle != DAT_HANDLE_NULL) + { + dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); + } /* Optional handles */ if (recv_evd_handle != DAT_HANDLE_NULL) { Index: dapl/openib_cma/dapl_ib_cm.c =================================================================== --- dapl/openib_cma/dapl_ib_cm.c (revision 7299) +++ dapl/openib_cma/dapl_ib_cm.c (working copy) @@ -287,14 +287,24 @@ static void dapli_cm_active_cb(struct da NULL, conn->ep); break; case RDMA_CM_EVENT_REJECTED: + { + ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; + dapl_dbg_log( DAPL_DBG_TYPE_WARN, " dapli_cm_active_handler: REJECTED reason=%d\n", event->status); - dapl_evd_connection_callback(conn, IB_CME_DESTINATION_REJECT, - NULL, conn->ep); + + dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep); + break; - + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM, @@ -383,6 +393,14 @@ static void dapli_cm_passive_cb(struct d break; case RDMA_CM_EVENT_REJECTED: + { + ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; dapl_dbg_log( DAPL_DBG_TYPE_WARN, @@ -397,10 +415,11 @@ static void dapli_cm_passive_cb(struct d &ipaddr->dst_addr)->sin_addr.s_addr), ntohs(((struct sockaddr_in *) &ipaddr->dst_addr)->sin_port)); - - dapls_cr_callback(conn, IB_CME_DESTINATION_REJECT, - NULL, conn->sp); + + dapls_cr_callback(conn, cm_event, NULL, conn->sp); + break; + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM, From sashak at voltaire.com Wed May 17 15:02:49 2006 From: sashak at voltaire.com (Sasha 
Khapyorsky) Date: Thu, 18 May 2006 01:02:49 +0300 Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h [was: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines] In-Reply-To: <1147882436.18971.50423.camel@hal.voltaire.com> References: <1147882436.18971.50423.camel@hal.voltaire.com> Message-ID: <20060517220249.GC28485@sashak.voltaire.com> On 12:14 Wed 17 May , Hal Rosenstock wrote: > OpenSM: Use memory routines directly and eliminate cl_mem* routines > as these routines are part of ISO C > > Signed-off-by: Hal Rosenstock Following Hal's cleanup this includes string.h header file for proper mem*() functions prototype definitions where necessary, removes/includes cl_memory.h as needed. Also couple of unistd.h additions for close(), sleep() and unlink() calls. Signed-off-by: Sasha Khapyorsky --- osm/complib/cl_event_wheel.c | 1 + osm/complib/cl_map.c | 2 +- osm/complib/cl_memory.c | 1 + osm/complib/cl_perf.c | 2 ++ osm/complib/cl_pool.c | 1 + osm/complib/cl_ptr_vector.c | 1 + osm/complib/cl_threadpool.c | 1 + osm/complib/cl_timer.c | 1 + osm/complib/cl_vector.c | 1 + osm/complib/libosmcomp.map | 3 --- osm/include/complib/cl_byteswap.h | 3 +-- osm/include/complib/cl_memory.h | 1 - osm/include/iba/ib_types.h | 2 +- osm/include/opensm/osm_lin_fwd_tbl.h | 1 + osm/include/opensm/osm_madw.h | 1 + osm/include/opensm/osm_mcm_info.h | 1 + osm/include/opensm/osm_mtree.h | 1 + osm/include/opensm/osm_path.h | 1 + osm/include/opensm/osm_port.h | 1 + osm/include/opensm/osm_port_profile.h | 1 + osm/include/opensm/osm_rand_fwd_tbl.h | 1 + osm/include/vendor/osm_vendor_mlx_svc.h | 2 ++ osm/include/vendor/osm_vendor_mtl.h | 2 -- .../vendor/osm_vendor_mtl_transaction_mgr.h | 1 - osm/include/vendor/osm_vendor_ts.h | 1 - osm/libvendor/osm_pkt_randomizer.c | 2 ++ osm/libvendor/osm_vendor_al.c | 1 + osm/libvendor/osm_vendor_ibumad.c | 10 ++++++---- osm/libvendor/osm_vendor_ibumad_sa.c | 3 +++ osm/libvendor/osm_vendor_mlx.c | 2 ++ 
osm/libvendor/osm_vendor_mlx_anafa.c | 1 + osm/libvendor/osm_vendor_mlx_dispatcher.c | 1 + osm/libvendor/osm_vendor_mlx_hca.c | 1 + osm/libvendor/osm_vendor_mlx_hca_anafa.c | 1 + osm/libvendor/osm_vendor_mlx_ibmgt.c | 2 ++ osm/libvendor/osm_vendor_mlx_rmpp_ctx.c | 1 + osm/libvendor/osm_vendor_mlx_sa.c | 2 ++ osm/libvendor/osm_vendor_mlx_sar.c | 4 +++- osm/libvendor/osm_vendor_mlx_sender.c | 1 + osm/libvendor/osm_vendor_mlx_sim.c | 2 ++ osm/libvendor/osm_vendor_mlx_ts.c | 2 ++ osm/libvendor/osm_vendor_mlx_ts_anafa.c | 2 ++ osm/libvendor/osm_vendor_mtl.c | 2 ++ osm/libvendor/osm_vendor_mtl_transaction_mgr.c | 1 + osm/libvendor/osm_vendor_test.c | 1 + osm/libvendor/osm_vendor_ts.c | 2 ++ osm/libvendor/osm_vendor_umadt.c | 1 + osm/opensm/osm_db_files.c | 6 ++++-- osm/opensm/osm_db_pack.c | 1 + osm/opensm/osm_drop_mgr.c | 2 ++ osm/opensm/osm_fwd_tbl.c | 1 - osm/opensm/osm_helper.c | 2 +- osm/opensm/osm_inform.c | 1 + osm/opensm/osm_lid_mgr.c | 1 + osm/opensm/osm_lin_fwd_rcv.c | 2 +- osm/opensm/osm_lin_fwd_rcv_ctrl.c | 2 +- osm/opensm/osm_lin_fwd_tbl.c | 1 + osm/opensm/osm_link_mgr.c | 2 +- osm/opensm/osm_mad_pool.c | 1 + osm/opensm/osm_matrix.c | 1 + osm/opensm/osm_mcast_fwd_rcv.c | 2 +- osm/opensm/osm_mcast_fwd_rcv_ctrl.c | 2 +- osm/opensm/osm_mcast_mgr.c | 2 ++ osm/opensm/osm_mcast_tbl.c | 1 + osm/opensm/osm_mcm_info.c | 1 + osm/opensm/osm_mcm_port.c | 2 ++ osm/opensm/osm_mtree.c | 1 + osm/opensm/osm_multicast.c | 1 + osm/opensm/osm_node_desc_rcv.c | 2 +- osm/opensm/osm_node_desc_rcv_ctrl.c | 2 +- osm/opensm/osm_node_info_rcv.c | 2 +- osm/opensm/osm_node_info_rcv_ctrl.c | 2 +- osm/opensm/osm_opensm.c | 4 +--- osm/opensm/osm_pkey.c | 1 + osm/opensm/osm_pkey_mgr.c | 1 + osm/opensm/osm_pkey_rcv.c | 2 +- osm/opensm/osm_pkey_rcv_ctrl.c | 2 +- osm/opensm/osm_port.c | 1 + osm/opensm/osm_port_info_rcv.c | 2 +- osm/opensm/osm_port_info_rcv_ctrl.c | 2 +- osm/opensm/osm_prtn.c | 1 + osm/opensm/osm_qos.c | 1 + osm/opensm/osm_remote_sm.c | 2 +- osm/opensm/osm_req.c | 2 +- 
osm/opensm/osm_req_ctrl.c | 2 +- osm/opensm/osm_resp.c | 2 +- osm/opensm/osm_sa.c | 2 +- osm/opensm/osm_sa_class_port_info.c | 2 +- osm/opensm/osm_sa_class_port_info_ctrl.c | 2 +- osm/opensm/osm_sa_guidinfo_record.c | 2 +- osm/opensm/osm_sa_guidinfo_record_ctrl.c | 2 +- osm/opensm/osm_sa_informinfo.c | 2 +- osm/opensm/osm_sa_informinfo_ctrl.c | 2 +- osm/opensm/osm_sa_lft_record.c | 1 + osm/opensm/osm_sa_lft_record_ctrl.c | 2 +- osm/opensm/osm_sa_link_record.c | 2 +- osm/opensm/osm_sa_link_record_ctrl.c | 2 +- osm/opensm/osm_sa_mad_ctrl.c | 2 +- osm/opensm/osm_sa_mcmember_record.c | 1 + osm/opensm/osm_sa_mcmember_record_ctrl.c | 2 +- osm/opensm/osm_sa_multipath_record.c | 2 +- osm/opensm/osm_sa_multipath_record_ctrl.c | 2 +- osm/opensm/osm_sa_node_record.c | 1 + osm/opensm/osm_sa_node_record_ctrl.c | 2 +- osm/opensm/osm_sa_path_record.c | 2 +- osm/opensm/osm_sa_path_record_ctrl.c | 2 +- osm/opensm/osm_sa_pkey_record.c | 2 +- osm/opensm/osm_sa_pkey_record_ctrl.c | 2 +- osm/opensm/osm_sa_portinfo_record.c | 2 +- osm/opensm/osm_sa_portinfo_record_ctrl.c | 2 +- osm/opensm/osm_sa_response.c | 2 +- osm/opensm/osm_sa_service_record.c | 2 +- osm/opensm/osm_sa_service_record_ctrl.c | 2 +- osm/opensm/osm_sa_slvl_record.c | 2 +- osm/opensm/osm_sa_slvl_record_ctrl.c | 2 +- osm/opensm/osm_sa_sminfo_record.c | 2 +- osm/opensm/osm_sa_sminfo_record_ctrl.c | 2 +- osm/opensm/osm_sa_vlarb_record.c | 2 +- osm/opensm/osm_sa_vlarb_record_ctrl.c | 2 +- osm/opensm/osm_service.c | 1 + osm/opensm/osm_slvl_map_rcv.c | 2 +- osm/opensm/osm_slvl_map_rcv_ctrl.c | 2 +- osm/opensm/osm_sm.c | 1 + osm/opensm/osm_sm_mad_ctrl.c | 2 +- osm/opensm/osm_sm_state_mgr.c | 2 +- osm/opensm/osm_sminfo_rcv.c | 1 + osm/opensm/osm_sminfo_rcv_ctrl.c | 2 +- osm/opensm/osm_state_mgr.c | 2 ++ osm/opensm/osm_state_mgr_ctrl.c | 2 +- osm/opensm/osm_subnet.c | 2 ++ osm/opensm/osm_sw_info_rcv.c | 2 +- osm/opensm/osm_sw_info_rcv_ctrl.c | 2 +- osm/opensm/osm_sweep_fail_ctrl.c | 2 +- osm/opensm/osm_switch.c | 1 + 
osm/opensm/osm_trap_rcv.c | 2 +- osm/opensm/osm_trap_rcv_ctrl.c | 2 +- osm/opensm/osm_ucast_mgr.c | 2 ++ osm/opensm/osm_ucast_updn.c | 1 + osm/opensm/osm_vl15intf.c | 2 +- osm/opensm/osm_vl_arb_rcv.c | 2 +- osm/opensm/osm_vl_arb_rcv_ctrl.c | 2 +- osm/osmtest/include/osmtest_subnet.h | 1 + osm/osmtest/osmt_inform.c | 1 - osm/osmtest/osmt_slvl_vl_arb.c | 1 - osm/osmtest/osmtest.c | 2 +- 145 files changed, 166 insertions(+), 88 deletions(-) e117de15a67314817a58b6300b432ec9ffa6a0a5 diff --git a/osm/complib/cl_event_wheel.c b/osm/complib/cl_event_wheel.c index cf04df7..aaaa53d 100644 --- a/osm/complib/cl_event_wheel.c +++ b/osm/complib/cl_event_wheel.c @@ -40,6 +40,7 @@ # include #endif /* HAVE_CONFIG_H */ #include +#include #include #include diff --git a/osm/complib/cl_map.c b/osm/complib/cl_map.c index 974b0d3..8962e9a 100644 --- a/osm/complib/cl_map.c +++ b/osm/complib/cl_map.c @@ -70,10 +70,10 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include -#include /****************************************************************************** diff --git a/osm/complib/cl_memory.c b/osm/complib/cl_memory.c index 49ff45d..a9ae948 100644 --- a/osm/complib/cl_memory.c +++ b/osm/complib/cl_memory.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #define _MEM_DEBUG_MODE_ 0 #ifdef _MEM_DEBUG_MODE_ diff --git a/osm/complib/cl_perf.c b/osm/complib/cl_perf.c index 753eba3..0c8ead2 100644 --- a/osm/complib/cl_perf.c +++ b/osm/complib/cl_perf.c @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include + /* * Always turn on performance tracking when building this file to allow the * performance counter functions to be built into the component library. 
diff --git a/osm/complib/cl_pool.c b/osm/complib/cl_pool.c index cfd2774..3fe07a8 100644 --- a/osm/complib/cl_pool.c +++ b/osm/complib/cl_pool.c @@ -52,6 +52,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/complib/cl_ptr_vector.c b/osm/complib/cl_ptr_vector.c index bddce00..5ab74c3 100644 --- a/osm/complib/cl_ptr_vector.c +++ b/osm/complib/cl_ptr_vector.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include diff --git a/osm/complib/cl_threadpool.c b/osm/complib/cl_threadpool.c index a2f620d..a2a4848 100644 --- a/osm/complib/cl_threadpool.c +++ b/osm/complib/cl_threadpool.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/complib/cl_timer.c b/osm/complib/cl_timer.c index 847545f..b3cc3e9 100644 --- a/osm/complib/cl_timer.c +++ b/osm/complib/cl_timer.c @@ -48,6 +48,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/complib/cl_vector.c b/osm/complib/cl_vector.c index 3e1a757..bcda8e0 100644 --- a/osm/complib/cl_vector.c +++ b/osm/complib/cl_vector.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map index 7a7ee1d..73fb242 100644 --- a/osm/complib/libosmcomp.map +++ b/osm/complib/libosmcomp.map @@ -87,9 +87,6 @@ OSMCOMP_1.0 { __cl_find_mem; __cl_free_trk; __cl_free_ntrk; - cl_memset; - cl_memcpy; - cl_memcmp; __cl_perf_run_calibration; __cl_perf_construct; __cl_perf_init; diff --git a/osm/include/complib/cl_byteswap.h b/osm/include/complib/cl_byteswap.h index 932d564..d144ea3 100644 --- a/osm/include/complib/cl_byteswap.h +++ b/osm/include/complib/cl_byteswap.h @@ -51,8 +51,7 @@ #ifndef _CL_BYTESWAP_H_ #define _CL_BYTESWAP_H_ - -#include +#include #include #ifdef __cplusplus diff --git 
a/osm/include/complib/cl_memory.h b/osm/include/complib/cl_memory.h index 9f558ac..4bbf7a2 100644 --- a/osm/include/complib/cl_memory.h +++ b/osm/include/complib/cl_memory.h @@ -52,7 +52,6 @@ #define _CL_MEMORY_H_ #include -#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { diff --git a/osm/include/iba/ib_types.h b/osm/include/iba/ib_types.h index 811d836..b72e810 100644 --- a/osm/include/iba/ib_types.h +++ b/osm/include/iba/ib_types.h @@ -38,9 +38,9 @@ #if !defined(__IB_TYPES_H__) #define __IB_TYPES_H__ +#include #include #include -#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { diff --git a/osm/include/opensm/osm_lin_fwd_tbl.h b/osm/include/opensm/osm_lin_fwd_tbl.h index dee01a9..ca378a8 100644 --- a/osm/include/opensm/osm_lin_fwd_tbl.h +++ b/osm/include/opensm/osm_lin_fwd_tbl.h @@ -50,6 +50,7 @@ #ifndef _OSM_LIN_FWD_TBL_H_ #define _OSM_LIN_FWD_TBL_H_ +#include #include #include diff --git a/osm/include/opensm/osm_madw.h b/osm/include/opensm/osm_madw.h index 2173957..4fde04c 100644 --- a/osm/include/opensm/osm_madw.h +++ b/osm/include/opensm/osm_madw.h @@ -51,6 +51,7 @@ #ifndef _OSM_MADW_H_ #define _OSM_MADW_H_ +#include #include #include #include diff --git a/osm/include/opensm/osm_mcm_info.h b/osm/include/opensm/osm_mcm_info.h index c4d5443..1f325b1 100644 --- a/osm/include/opensm/osm_mcm_info.h +++ b/osm/include/opensm/osm_mcm_info.h @@ -50,6 +50,7 @@ #ifndef _OSM_MCM_INFO_H_ #define _OSM_MCM_INFO_H_ +#include #include #include #include diff --git a/osm/include/opensm/osm_mtree.h b/osm/include/opensm/osm_mtree.h index 57c894b..013112d 100644 --- a/osm/include/opensm/osm_mtree.h +++ b/osm/include/opensm/osm_mtree.h @@ -51,6 +51,7 @@ #ifndef _OSM_MTREE_H_ #define _OSM_MTREE_H_ +#include #include #include #include diff --git a/osm/include/opensm/osm_path.h b/osm/include/opensm/osm_path.h index bf1cc67..cb3bb8e 100644 --- a/osm/include/opensm/osm_path.h +++ b/osm/include/opensm/osm_path.h @@ -38,6 +38,7 @@ #ifndef _OSM_PATH_H_ #define 
_OSM_PATH_H_ +#include #include #include diff --git a/osm/include/opensm/osm_port.h b/osm/include/opensm/osm_port.h index 46a0064..cf3f6f2 100644 --- a/osm/include/opensm/osm_port.h +++ b/osm/include/opensm/osm_port.h @@ -50,6 +50,7 @@ #ifndef _OSM_PORT_H_ #define _OSM_PORT_H_ +#include #include #include #include diff --git a/osm/include/opensm/osm_port_profile.h b/osm/include/opensm/osm_port_profile.h index 9a58115..9c0f7f7 100644 --- a/osm/include/opensm/osm_port_profile.h +++ b/osm/include/opensm/osm_port_profile.h @@ -50,6 +50,7 @@ #ifndef _OSM_PORT_PROFILE_H_ #define _OSM_PORT_PROFILE_H_ +#include #include #include #include diff --git a/osm/include/opensm/osm_rand_fwd_tbl.h b/osm/include/opensm/osm_rand_fwd_tbl.h index 1d293e5..fac9ffd 100644 --- a/osm/include/opensm/osm_rand_fwd_tbl.h +++ b/osm/include/opensm/osm_rand_fwd_tbl.h @@ -51,6 +51,7 @@ #ifndef _OSM_RAND_FWD_TBL_H_ #define _OSM_RAND_FWD_TBL_H_ #include +#include #include #ifdef __cplusplus diff --git a/osm/include/vendor/osm_vendor_mlx_svc.h b/osm/include/vendor/osm_vendor_mlx_svc.h index 69d379c..e4897d4 100644 --- a/osm/include/vendor/osm_vendor_mlx_svc.h +++ b/osm/include/vendor/osm_vendor_mlx_svc.h @@ -38,7 +38,9 @@ #ifndef _OSMV_SVC_H_ #define _OSMV_SVC_H_ #include +#include #include +#include #include #ifdef __cplusplus diff --git a/osm/include/vendor/osm_vendor_mtl.h b/osm/include/vendor/osm_vendor_mtl.h index 5837867..218bdf7 100644 --- a/osm/include/vendor/osm_vendor_mtl.h +++ b/osm/include/vendor/osm_vendor_mtl.h @@ -60,10 +60,8 @@ #define OUT #include "iba/ib_types.h" #include "iba/ib_al.h" #include -#include #include #include -#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { diff --git a/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h b/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h index 7bf938d..82d2cc2 100644 --- a/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h +++ b/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h @@ -61,7 +61,6 @@ #include #include 
#include #include -#include #include #include #include diff --git a/osm/include/vendor/osm_vendor_ts.h b/osm/include/vendor/osm_vendor_ts.h index b4c2f21..4414cba 100644 --- a/osm/include/vendor/osm_vendor_ts.h +++ b/osm/include/vendor/osm_vendor_ts.h @@ -59,7 +59,6 @@ #define OUT #include "iba/ib_types.h" #include "iba/ib_al.h" #include -#include #include #include #include diff --git a/osm/libvendor/osm_pkt_randomizer.c b/osm/libvendor/osm_pkt_randomizer.c index 2fa7621..29df135 100644 --- a/osm/libvendor/osm_pkt_randomizer.c +++ b/osm/libvendor/osm_pkt_randomizer.c @@ -51,12 +51,14 @@ #endif /* HAVE_CONFIG_H */ #include #include +#include #ifndef WIN32 #include #include #endif +#include /********************************************************************** * Return TRUE if the path is in a fault path, and FALSE otherwise. diff --git a/osm/libvendor/osm_vendor_al.c b/osm/libvendor/osm_vendor_al.c index d26d6d8..3240625 100644 --- a/osm/libvendor/osm_vendor_al.c +++ b/osm/libvendor/osm_vendor_al.c @@ -59,6 +59,7 @@ #include #ifdef OSM_VENDOR_INTF_AL +#include #include #include #include diff --git a/osm/libvendor/osm_vendor_ibumad.c b/osm/libvendor/osm_vendor_ibumad.c index 0a7fbe3..a3041d0 100644 --- a/osm/libvendor/osm_vendor_ibumad.c +++ b/osm/libvendor/osm_vendor_ibumad.c @@ -57,20 +57,22 @@ #include #ifdef OSM_VENDOR_INTF_OPENIB +#include +#include +#include +#include + +#include #include #include #include #include #include -#include #include #include #include #include -#include -#include -#include /****s* OpenSM: Vendor AL/osm_umad_bind_info_t * NAME diff --git a/osm/libvendor/osm_vendor_ibumad_sa.c b/osm/libvendor/osm_vendor_ibumad_sa.c index 6eae887..568d39c 100644 --- a/osm/libvendor/osm_vendor_ibumad_sa.c +++ b/osm/libvendor/osm_vendor_ibumad_sa.c @@ -38,10 +38,13 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include +#include + #define MAX_PORTS 64 
/***************************************************************************** diff --git a/osm/libvendor/osm_vendor_mlx.c b/osm/libvendor/osm_vendor_mlx.c index 4c75d41..4a4be06 100644 --- a/osm/libvendor/osm_vendor_mlx.c +++ b/osm/libvendor/osm_vendor_mlx.c @@ -38,12 +38,14 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include #include #include #include +#include /** * FORWARD REFERENCES diff --git a/osm/libvendor/osm_vendor_mlx_anafa.c b/osm/libvendor/osm_vendor_mlx_anafa.c index 32af9bb..3cd917f 100644 --- a/osm/libvendor/osm_vendor_mlx_anafa.c +++ b/osm/libvendor/osm_vendor_mlx_anafa.c @@ -55,6 +55,7 @@ #include #include #include +#include #include /** diff --git a/osm/libvendor/osm_vendor_mlx_dispatcher.c b/osm/libvendor/osm_vendor_mlx_dispatcher.c index 341e784..afa1473 100644 --- a/osm/libvendor/osm_vendor_mlx_dispatcher.c +++ b/osm/libvendor/osm_vendor_mlx_dispatcher.c @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/libvendor/osm_vendor_mlx_hca.c b/osm/libvendor/osm_vendor_mlx_hca.c index bb120ac..c0dca86 100644 --- a/osm/libvendor/osm_vendor_mlx_hca.c +++ b/osm/libvendor/osm_vendor_mlx_hca.c @@ -39,6 +39,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #if defined(OSM_VENDOR_INTF_MTL) | defined(OSM_VENDOR_INTF_TS) #undef IN #undef OUT diff --git a/osm/libvendor/osm_vendor_mlx_hca_anafa.c b/osm/libvendor/osm_vendor_mlx_hca_anafa.c index 5045563..8f87225 100644 --- a/osm/libvendor/osm_vendor_mlx_hca_anafa.c +++ b/osm/libvendor/osm_vendor_mlx_hca_anafa.c @@ -44,6 +44,7 @@ #undef IN #undef OUT #include +#include #include #include diff --git a/osm/libvendor/osm_vendor_mlx_ibmgt.c b/osm/libvendor/osm_vendor_mlx_ibmgt.c index 117ad12..ace790b 100644 --- a/osm/libvendor/osm_vendor_mlx_ibmgt.c +++ b/osm/libvendor/osm_vendor_mlx_ibmgt.c @@ -46,7 +46,9 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include 
+#include #include #include #include diff --git a/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c b/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c index 69708c9..df250e2 100644 --- a/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c +++ b/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/libvendor/osm_vendor_mlx_sa.c b/osm/libvendor/osm_vendor_mlx_sa.c index 85fd810..212344a 100644 --- a/osm/libvendor/osm_vendor_mlx_sa.c +++ b/osm/libvendor/osm_vendor_mlx_sa.c @@ -40,6 +40,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include +#include #include #include #include diff --git a/osm/libvendor/osm_vendor_mlx_sar.c b/osm/libvendor/osm_vendor_mlx_sar.c index 5b0bd70..f6b6405 100644 --- a/osm/libvendor/osm_vendor_mlx_sar.c +++ b/osm/libvendor/osm_vendor_mlx_sar.c @@ -38,8 +38,10 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include +#include +#include ib_api_status_t osmv_rmpp_sar_init(osmv_rmpp_sar_t* p_sar, void* p_arbt_mad, diff --git a/osm/libvendor/osm_vendor_mlx_sender.c b/osm/libvendor/osm_vendor_mlx_sender.c index 3317702..e1ed0a0 100644 --- a/osm/libvendor/osm_vendor_mlx_sender.c +++ b/osm/libvendor/osm_vendor_mlx_sender.c @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/libvendor/osm_vendor_mlx_sim.c b/osm/libvendor/osm_vendor_mlx_sim.c index b927f2f..ba81e03 100644 --- a/osm/libvendor/osm_vendor_mlx_sim.c +++ b/osm/libvendor/osm_vendor_mlx_sim.c @@ -51,12 +51,14 @@ #include #include #include #include +#include #include #include #include #include +#include /* the simulator messages definition */ #include diff --git a/osm/libvendor/osm_vendor_mlx_ts.c b/osm/libvendor/osm_vendor_mlx_ts.c index 483b69b..a32173e 100644 --- a/osm/libvendor/osm_vendor_mlx_ts.c +++ b/osm/libvendor/osm_vendor_mlx_ts.c @@ -51,12 +51,14 @@ #include #include #include #include 
+#include #include #include #include #include +#include #include typedef struct _osmv_TOPSPIN_transport_mgr_ { diff --git a/osm/libvendor/osm_vendor_mlx_ts_anafa.c b/osm/libvendor/osm_vendor_mlx_ts_anafa.c index dd3c462..a9395df 100644 --- a/osm/libvendor/osm_vendor_mlx_ts_anafa.c +++ b/osm/libvendor/osm_vendor_mlx_ts_anafa.c @@ -52,6 +52,7 @@ #include #include #include #include +#include #include #include @@ -59,6 +60,7 @@ #include #include +#include #include static void diff --git a/osm/libvendor/osm_vendor_mtl.c b/osm/libvendor/osm_vendor_mtl.c index f9b2284..82a68de 100644 --- a/osm/libvendor/osm_vendor_mtl.c +++ b/osm/libvendor/osm_vendor_mtl.c @@ -43,6 +43,8 @@ #include #ifdef OSM_VENDOR_INTF_MTL +#include +#include #include #include /* HACK - I do not know how to prevent complib from loading kernel H files */ diff --git a/osm/libvendor/osm_vendor_mtl_transaction_mgr.c b/osm/libvendor/osm_vendor_mtl_transaction_mgr.c index 997eb37..2b1c960 100644 --- a/osm/libvendor/osm_vendor_mtl_transaction_mgr.c +++ b/osm/libvendor/osm_vendor_mtl_transaction_mgr.c @@ -40,6 +40,7 @@ # include #endif /* HAVE_CONFIG_H */ #include +#include #include #include #include diff --git a/osm/libvendor/osm_vendor_test.c b/osm/libvendor/osm_vendor_test.c index ecacc67..013262e 100644 --- a/osm/libvendor/osm_vendor_test.c +++ b/osm/libvendor/osm_vendor_test.c @@ -56,6 +56,7 @@ #include #ifdef OSM_VENDOR_INTF_TEST +#include #include #include #include diff --git a/osm/libvendor/osm_vendor_ts.c b/osm/libvendor/osm_vendor_ts.c index 16d52e2..fa51382 100644 --- a/osm/libvendor/osm_vendor_ts.c +++ b/osm/libvendor/osm_vendor_ts.c @@ -40,8 +40,10 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include +#include #include #include diff --git a/osm/libvendor/osm_vendor_umadt.c b/osm/libvendor/osm_vendor_umadt.c index 01d9b10..e27801a 100644 --- a/osm/libvendor/osm_vendor_umadt.c +++ b/osm/libvendor/osm_vendor_umadt.c @@ -61,6 +61,7 @@ #ifdef OSM_VENDOR_INTF_UMADT 
#include #include +#include #include #include diff --git a/osm/opensm/osm_db_files.c b/osm/opensm/osm_db_files.c index a8e82a7..930aaef 100644 --- a/osm/opensm/osm_db_files.c +++ b/osm/opensm/osm_db_files.c @@ -46,11 +46,13 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include -#include #include #include #include +#include +#include +#include +#include /****d* Database/OSM_DB_MAX_LINE_LEN * NAME diff --git a/osm/opensm/osm_db_pack.c b/osm/opensm/osm_db_pack.c index 3f90397..b93ac84 100644 --- a/osm/opensm/osm_db_pack.c +++ b/osm/opensm/osm_db_pack.c @@ -40,6 +40,7 @@ # include #endif /* HAVE_CONFIG_H */ #include +#include #include #include static inline void diff --git a/osm/opensm/osm_drop_mgr.c b/osm/opensm/osm_drop_mgr.c index 470e5df..929088a 100644 --- a/osm/opensm/osm_drop_mgr.c +++ b/osm/opensm/osm_drop_mgr.c @@ -51,7 +51,9 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include +#include #include #include #include diff --git a/osm/opensm/osm_fwd_tbl.c b/osm/opensm/osm_fwd_tbl.c index 852e048..ee32194 100644 --- a/osm/opensm/osm_fwd_tbl.c +++ b/osm/opensm/osm_fwd_tbl.c @@ -51,7 +51,6 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include #include #include #include diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c index e54644b..3886609 100644 --- a/osm/opensm/osm_helper.c +++ b/osm/opensm/osm_helper.c @@ -51,7 +51,7 @@ #endif /* HAVE_CONFIG_H */ #include #include -#include +#include #include #include #include diff --git a/osm/opensm/osm_inform.c b/osm/opensm/osm_inform.c index f20b068..172190c 100644 --- a/osm/opensm/osm_inform.c +++ b/osm/opensm/osm_inform.c @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_lid_mgr.c b/osm/opensm/osm_lid_mgr.c index 31d0be4..a33a420 100644 --- a/osm/opensm/osm_lid_mgr.c +++ b/osm/opensm/osm_lid_mgr.c @@ -90,6 +90,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ 
+#include #include #include #include diff --git a/osm/opensm/osm_lin_fwd_rcv.c b/osm/opensm/osm_lin_fwd_rcv.c index 8ae7da8..339fe11 100644 --- a/osm/opensm/osm_lin_fwd_rcv.c +++ b/osm/opensm/osm_lin_fwd_rcv.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include #include diff --git a/osm/opensm/osm_lin_fwd_rcv_ctrl.c b/osm/opensm/osm_lin_fwd_rcv_ctrl.c index 4e915e7..987440d 100644 --- a/osm/opensm/osm_lin_fwd_rcv_ctrl.c +++ b/osm/opensm/osm_lin_fwd_rcv_ctrl.c @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_lin_fwd_tbl.c b/osm/opensm/osm_lin_fwd_tbl.c index f8a6b87..3b4895f 100644 --- a/osm/opensm/osm_lin_fwd_tbl.c +++ b/osm/opensm/osm_lin_fwd_tbl.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_link_mgr.c b/osm/opensm/osm_link_mgr.c index c8307d3..87e9e46 100644 --- a/osm/opensm/osm_link_mgr.c +++ b/osm/opensm/osm_link_mgr.c @@ -50,8 +50,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_mad_pool.c b/osm/opensm/osm_mad_pool.c index 72f9db8..12ecabf 100644 --- a/osm/opensm/osm_mad_pool.c +++ b/osm/opensm/osm_mad_pool.c @@ -52,6 +52,7 @@ # include #endif /* HAVE_CONFIG_H */ #include +#include #include #include #include diff --git a/osm/opensm/osm_matrix.c b/osm/opensm/osm_matrix.c index 3efb0bd..073d9b8 100644 --- a/osm/opensm/osm_matrix.c +++ b/osm/opensm/osm_matrix.c @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include diff --git a/osm/opensm/osm_mcast_fwd_rcv.c b/osm/opensm/osm_mcast_fwd_rcv.c index 73763f5..d0ffa59 100644 --- a/osm/opensm/osm_mcast_fwd_rcv.c +++ b/osm/opensm/osm_mcast_fwd_rcv.c @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include 
#include #include #include diff --git a/osm/opensm/osm_mcast_fwd_rcv_ctrl.c b/osm/opensm/osm_mcast_fwd_rcv_ctrl.c index a6f46fd..9201ecf 100644 --- a/osm/opensm/osm_mcast_fwd_rcv_ctrl.c +++ b/osm/opensm/osm_mcast_fwd_rcv_ctrl.c @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c index f729c61..96d3b0f 100644 --- a/osm/opensm/osm_mcast_mgr.c +++ b/osm/opensm/osm_mcast_mgr.c @@ -50,6 +50,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include +#include #include #include #include diff --git a/osm/opensm/osm_mcast_tbl.c b/osm/opensm/osm_mcast_tbl.c index 401d97c..b8fa325 100644 --- a/osm/opensm/osm_mcast_tbl.c +++ b/osm/opensm/osm_mcast_tbl.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_mcm_info.c b/osm/opensm/osm_mcm_info.c index 08c0d12..a5ac7f3 100644 --- a/osm/opensm/osm_mcm_info.c +++ b/osm/opensm/osm_mcm_info.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include /********************************************************************** diff --git a/osm/opensm/osm_mcm_port.c b/osm/opensm/osm_mcm_port.c index e92ad76..16ed84e 100644 --- a/osm/opensm/osm_mcm_port.c +++ b/osm/opensm/osm_mcm_port.c @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include +#include #include /********************************************************************** diff --git a/osm/opensm/osm_mtree.c b/osm/opensm/osm_mtree.c index f9d82d6..421e39e 100644 --- a/osm/opensm/osm_mtree.c +++ b/osm/opensm/osm_mtree.c @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include diff --git a/osm/opensm/osm_multicast.c b/osm/opensm/osm_multicast.c index 2256741..690f7df 100644 --- a/osm/opensm/osm_multicast.c +++ b/osm/opensm/osm_multicast.c @@ -49,6 +49,7 @@ #if 
HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_node_desc_rcv.c b/osm/opensm/osm_node_desc_rcv.c index 62fe034..f9fa22d 100644 --- a/osm/opensm/osm_node_desc_rcv.c +++ b/osm/opensm/osm_node_desc_rcv.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_node_desc_rcv_ctrl.c b/osm/opensm/osm_node_desc_rcv_ctrl.c index 9f689e2..3f26b83 100644 --- a/osm/opensm/osm_node_desc_rcv_ctrl.c +++ b/osm/opensm/osm_node_desc_rcv_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_node_info_rcv.c b/osm/opensm/osm_node_info_rcv.c index c35e2b7..59257a0 100644 --- a/osm/opensm/osm_node_info_rcv.c +++ b/osm/opensm/osm_node_info_rcv.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_node_info_rcv_ctrl.c b/osm/opensm/osm_node_info_rcv_ctrl.c index 478f9c4..cbff6ce 100644 --- a/osm/opensm/osm_node_info_rcv_ctrl.c +++ b/osm/opensm/osm_node_info_rcv_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_opensm.c b/osm/opensm/osm_opensm.c index 2a8e0f8..8c422b5 100644 --- a/osm/opensm/osm_opensm.c +++ b/osm/opensm/osm_opensm.c @@ -53,7 +53,7 @@ #endif /* HAVE_CONFIG_H */ #include #include -#include +#include #include #include #include @@ -130,8 +130,6 @@ osm_opensm_destroy( cl_plock_destroy( &p_osm->lock ); - cl_mem_display( ); - osm_log_destroy( &p_osm->log ); } diff --git a/osm/opensm/osm_pkey.c b/osm/opensm/osm_pkey.c index b0cb869..5ecfdd9 100644 --- a/osm/opensm/osm_pkey.c +++ b/osm/opensm/osm_pkey.c @@ -51,6 +51,7 @@ #endif /* HAVE_CONFIG_H */ #include #include +#include #include #include #include diff --git 
a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c index f98d13b..e08b7cc 100644 --- a/osm/opensm/osm_pkey_mgr.c +++ b/osm/opensm/osm_pkey_mgr.c @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_pkey_rcv.c b/osm/opensm/osm_pkey_rcv.c index 8696dc4..5262a6b 100644 --- a/osm/opensm/osm_pkey_rcv.c +++ b/osm/opensm/osm_pkey_rcv.c @@ -39,8 +39,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_pkey_rcv_ctrl.c b/osm/opensm/osm_pkey_rcv_ctrl.c index 77ebab2..cd4367a 100644 --- a/osm/opensm/osm_pkey_rcv_ctrl.c +++ b/osm/opensm/osm_pkey_rcv_ctrl.c @@ -43,7 +43,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_port.c b/osm/opensm/osm_port.c index f8c51e8..53ab006 100644 --- a/osm/opensm/osm_port.c +++ b/osm/opensm/osm_port.c @@ -52,6 +52,7 @@ # include #endif /* HAVE_CONFIG_H */ #include +#include #include #include #include diff --git a/osm/opensm/osm_port_info_rcv.c b/osm/opensm/osm_port_info_rcv.c index 119bcbd..a08c57c 100644 --- a/osm/opensm/osm_port_info_rcv.c +++ b/osm/opensm/osm_port_info_rcv.c @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_port_info_rcv_ctrl.c b/osm/opensm/osm_port_info_rcv_ctrl.c index 9f6001f..303bedb 100644 --- a/osm/opensm/osm_port_info_rcv_ctrl.c +++ b/osm/opensm/osm_port_info_rcv_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_prtn.c b/osm/opensm/osm_prtn.c index 26790b4..8b748c4 100644 --- a/osm/opensm/osm_prtn.c +++ b/osm/opensm/osm_prtn.c @@ -54,6 +54,7 @@ #include #include #include +#include #include #include #include diff --git a/osm/opensm/osm_qos.c b/osm/opensm/osm_qos.c index 
cd5c26a..c23ef87 100644 --- a/osm/opensm/osm_qos.c +++ b/osm/opensm/osm_qos.c @@ -46,6 +46,7 @@ # include #endif /* HAVE_CONFIG_H */ #include +#include #include #include diff --git a/osm/opensm/osm_remote_sm.c b/osm/opensm/osm_remote_sm.c index eb65d22..b91264e 100644 --- a/osm/opensm/osm_remote_sm.c +++ b/osm/opensm/osm_remote_sm.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include /********************************************************************** **********************************************************************/ diff --git a/osm/opensm/osm_req.c b/osm/opensm/osm_req.c index 9ddc9e9..534694b 100644 --- a/osm/opensm/osm_req.c +++ b/osm/opensm/osm_req.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_req_ctrl.c b/osm/opensm/osm_req_ctrl.c index 708e7c9..2d0e7e0 100644 --- a/osm/opensm/osm_req_ctrl.c +++ b/osm/opensm/osm_req_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include #include diff --git a/osm/opensm/osm_resp.c b/osm/opensm/osm_resp.c index 9b5079a..aa60bf2 100644 --- a/osm/opensm/osm_resp.c +++ b/osm/opensm/osm_resp.c @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa.c b/osm/opensm/osm_sa.c index b33431c..fa7dad8 100644 --- a/osm/opensm/osm_sa.c +++ b/osm/opensm/osm_sa.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include #include diff --git a/osm/opensm/osm_sa_class_port_info.c b/osm/opensm/osm_sa_class_port_info.c index 389bc9c..cfad739 100644 --- a/osm/opensm/osm_sa_class_port_info.c +++ b/osm/opensm/osm_sa_class_port_info.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff 
--git a/osm/opensm/osm_sa_class_port_info_ctrl.c b/osm/opensm/osm_sa_class_port_info_ctrl.c index 219a837..c71af4c 100644 --- a/osm/opensm/osm_sa_class_port_info_ctrl.c +++ b/osm/opensm/osm_sa_class_port_info_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_guidinfo_record.c b/osm/opensm/osm_sa_guidinfo_record.c index 7d1eebf..601c809 100644 --- a/osm/opensm/osm_sa_guidinfo_record.c +++ b/osm/opensm/osm_sa_guidinfo_record.c @@ -54,8 +54,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_guidinfo_record_ctrl.c b/osm/opensm/osm_sa_guidinfo_record_ctrl.c index b252b20..f2211b1 100644 --- a/osm/opensm/osm_sa_guidinfo_record_ctrl.c +++ b/osm/opensm/osm_sa_guidinfo_record_ctrl.c @@ -54,7 +54,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_informinfo.c b/osm/opensm/osm_sa_informinfo.c index 149e609..a820dea 100644 --- a/osm/opensm/osm_sa_informinfo.c +++ b/osm/opensm/osm_sa_informinfo.c @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_informinfo_ctrl.c b/osm/opensm/osm_sa_informinfo_ctrl.c index 75edabc..31644af 100644 --- a/osm/opensm/osm_sa_informinfo_ctrl.c +++ b/osm/opensm/osm_sa_informinfo_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_lft_record.c b/osm/opensm/osm_sa_lft_record.c index b9b903e..2d17dbe 100644 --- a/osm/opensm/osm_sa_lft_record.c +++ b/osm/opensm/osm_sa_lft_record.c @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_sa_lft_record_ctrl.c b/osm/opensm/osm_sa_lft_record_ctrl.c index 
0682438..1cc2544 100644 --- a/osm/opensm/osm_sa_lft_record_ctrl.c +++ b/osm/opensm/osm_sa_lft_record_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_link_record.c b/osm/opensm/osm_sa_link_record.c index 1a407e1..a525002 100644 --- a/osm/opensm/osm_sa_link_record.c +++ b/osm/opensm/osm_sa_link_record.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_link_record_ctrl.c b/osm/opensm/osm_sa_link_record_ctrl.c index 707c184..01db21d 100644 --- a/osm/opensm/osm_sa_link_record_ctrl.c +++ b/osm/opensm/osm_sa_link_record_ctrl.c @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_mad_ctrl.c b/osm/opensm/osm_sa_mad_ctrl.c index 1f87ea2..81584ce 100644 --- a/osm/opensm/osm_sa_mad_ctrl.c +++ b/osm/opensm/osm_sa_mad_ctrl.c @@ -50,7 +50,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include #include diff --git a/osm/opensm/osm_sa_mcmember_record.c b/osm/opensm/osm_sa_mcmember_record.c index 291fbf5..5129231 100644 --- a/osm/opensm/osm_sa_mcmember_record.c +++ b/osm/opensm/osm_sa_mcmember_record.c @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_sa_mcmember_record_ctrl.c b/osm/opensm/osm_sa_mcmember_record_ctrl.c index 99a779a..a583979 100644 --- a/osm/opensm/osm_sa_mcmember_record_ctrl.c +++ b/osm/opensm/osm_sa_mcmember_record_ctrl.c @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include #include diff --git a/osm/opensm/osm_sa_multipath_record.c b/osm/opensm/osm_sa_multipath_record.c index bdf53a3..c8efdb4 100644 --- a/osm/opensm/osm_sa_multipath_record.c +++ b/osm/opensm/osm_sa_multipath_record.c 
@@ -52,8 +52,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_multipath_record_ctrl.c b/osm/opensm/osm_sa_multipath_record_ctrl.c index 7c0337c..e330bb8 100644 --- a/osm/opensm/osm_sa_multipath_record_ctrl.c +++ b/osm/opensm/osm_sa_multipath_record_ctrl.c @@ -56,7 +56,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_node_record.c b/osm/opensm/osm_sa_node_record.c index ecaa048..ac9be22 100644 --- a/osm/opensm/osm_sa_node_record.c +++ b/osm/opensm/osm_sa_node_record.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_sa_node_record_ctrl.c b/osm/opensm/osm_sa_node_record_ctrl.c index dcf5944..61b363a 100644 --- a/osm/opensm/osm_sa_node_record_ctrl.c +++ b/osm/opensm/osm_sa_node_record_ctrl.c @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_path_record.c b/osm/opensm/osm_sa_path_record.c index 1e4a137..7da6d70 100644 --- a/osm/opensm/osm_sa_path_record.c +++ b/osm/opensm/osm_sa_path_record.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_path_record_ctrl.c b/osm/opensm/osm_sa_path_record_ctrl.c index eab7171..9495785 100644 --- a/osm/opensm/osm_sa_path_record_ctrl.c +++ b/osm/opensm/osm_sa_path_record_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_pkey_record.c b/osm/opensm/osm_sa_pkey_record.c index e60466b..0eeb0c0 100644 --- a/osm/opensm/osm_sa_pkey_record.c +++ b/osm/opensm/osm_sa_pkey_record.c @@ -43,8 +43,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include 
#include #include #include diff --git a/osm/opensm/osm_sa_pkey_record_ctrl.c b/osm/opensm/osm_sa_pkey_record_ctrl.c index 01cdc0f..a9d8a8d 100644 --- a/osm/opensm/osm_sa_pkey_record_ctrl.c +++ b/osm/opensm/osm_sa_pkey_record_ctrl.c @@ -43,7 +43,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_portinfo_record.c b/osm/opensm/osm_sa_portinfo_record.c index 3acb8c9..e1ca873 100644 --- a/osm/opensm/osm_sa_portinfo_record.c +++ b/osm/opensm/osm_sa_portinfo_record.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_portinfo_record_ctrl.c b/osm/opensm/osm_sa_portinfo_record_ctrl.c index 831843b..4f53f04 100644 --- a/osm/opensm/osm_sa_portinfo_record_ctrl.c +++ b/osm/opensm/osm_sa_portinfo_record_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_response.c b/osm/opensm/osm_sa_response.c index 30f561f..03c94f7 100644 --- a/osm/opensm/osm_sa_response.c +++ b/osm/opensm/osm_sa_response.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_service_record.c b/osm/opensm/osm_sa_service_record.c index 38ee80b..a65e41d 100644 --- a/osm/opensm/osm_sa_service_record.c +++ b/osm/opensm/osm_sa_service_record.c @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_service_record_ctrl.c b/osm/opensm/osm_sa_service_record_ctrl.c index 5f8c936..8af9cd7 100644 --- a/osm/opensm/osm_sa_service_record_ctrl.c +++ b/osm/opensm/osm_sa_service_record_ctrl.c @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git 
a/osm/opensm/osm_sa_slvl_record.c b/osm/opensm/osm_sa_slvl_record.c index 237b99c..5d1928e 100644 --- a/osm/opensm/osm_sa_slvl_record.c +++ b/osm/opensm/osm_sa_slvl_record.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_slvl_record_ctrl.c b/osm/opensm/osm_sa_slvl_record_ctrl.c index d156bf1..7801508 100644 --- a/osm/opensm/osm_sa_slvl_record_ctrl.c +++ b/osm/opensm/osm_sa_slvl_record_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_sminfo_record.c b/osm/opensm/osm_sa_sminfo_record.c index 9c3f436..b9dee38 100644 --- a/osm/opensm/osm_sa_sminfo_record.c +++ b/osm/opensm/osm_sa_sminfo_record.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_sminfo_record_ctrl.c b/osm/opensm/osm_sa_sminfo_record_ctrl.c index 72c2fad..3b07920 100644 --- a/osm/opensm/osm_sa_sminfo_record_ctrl.c +++ b/osm/opensm/osm_sa_sminfo_record_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sa_vlarb_record.c b/osm/opensm/osm_sa_vlarb_record.c index ddbef9c..059e5a9 100644 --- a/osm/opensm/osm_sa_vlarb_record.c +++ b/osm/opensm/osm_sa_vlarb_record.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sa_vlarb_record_ctrl.c b/osm/opensm/osm_sa_vlarb_record_ctrl.c index f7ad3ed..a243e08 100644 --- a/osm/opensm/osm_sa_vlarb_record_ctrl.c +++ b/osm/opensm/osm_sa_vlarb_record_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_service.c b/osm/opensm/osm_service.c index 723e117..a1309d3 100644 
--- a/osm/opensm/osm_service.c +++ b/osm/opensm/osm_service.c @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_slvl_map_rcv.c b/osm/opensm/osm_slvl_map_rcv.c index 9a6acf5..33c3d45 100644 --- a/osm/opensm/osm_slvl_map_rcv.c +++ b/osm/opensm/osm_slvl_map_rcv.c @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_slvl_map_rcv_ctrl.c b/osm/opensm/osm_slvl_map_rcv_ctrl.c index ee357da..4da0eff 100644 --- a/osm/opensm/osm_slvl_map_rcv_ctrl.c +++ b/osm/opensm/osm_slvl_map_rcv_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c index f6e33c5..0e09f26 100644 --- a/osm/opensm/osm_sm.c +++ b/osm/opensm/osm_sm.c @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_sm_mad_ctrl.c b/osm/opensm/osm_sm_mad_ctrl.c index 1b90335..9dceef2 100644 --- a/osm/opensm/osm_sm_mad_ctrl.c +++ b/osm/opensm/osm_sm_mad_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include #include diff --git a/osm/opensm/osm_sm_state_mgr.c b/osm/opensm/osm_sm_state_mgr.c index a881f7f..8ae9889 100644 --- a/osm/opensm/osm_sm_state_mgr.c +++ b/osm/opensm/osm_sm_state_mgr.c @@ -50,8 +50,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sminfo_rcv.c b/osm/opensm/osm_sminfo_rcv.c index e5c4bbb..5914984 100644 --- a/osm/opensm/osm_sminfo_rcv.c +++ b/osm/opensm/osm_sminfo_rcv.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_sminfo_rcv_ctrl.c b/osm/opensm/osm_sminfo_rcv_ctrl.c index 
76ae65c..327d7eb 100644 --- a/osm/opensm/osm_sminfo_rcv_ctrl.c +++ b/osm/opensm/osm_sminfo_rcv_ctrl.c @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index c97875c..97b017d 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -50,7 +50,9 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include +#include #include #include #include diff --git a/osm/opensm/osm_state_mgr_ctrl.c b/osm/opensm/osm_state_mgr_ctrl.c index a7afc46..0bde333 100644 --- a/osm/opensm/osm_state_mgr_ctrl.c +++ b/osm/opensm/osm_state_mgr_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c index 9b4bcfe..c251411 100644 --- a/osm/opensm/osm_subnet.c +++ b/osm/opensm/osm_subnet.c @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include +#include #include #include #include diff --git a/osm/opensm/osm_sw_info_rcv.c b/osm/opensm/osm_sw_info_rcv.c index 7a1f72f..6bbd73a 100644 --- a/osm/opensm/osm_sw_info_rcv.c +++ b/osm/opensm/osm_sw_info_rcv.c @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_sw_info_rcv_ctrl.c b/osm/opensm/osm_sw_info_rcv_ctrl.c index a97a7dc..fb8fe50 100644 --- a/osm/opensm/osm_sw_info_rcv_ctrl.c +++ b/osm/opensm/osm_sw_info_rcv_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_sweep_fail_ctrl.c b/osm/opensm/osm_sweep_fail_ctrl.c index 022988a..e27a540 100644 --- a/osm/opensm/osm_sweep_fail_ctrl.c +++ b/osm/opensm/osm_sweep_fail_ctrl.c @@ -49,7 +49,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git 
a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c index fa726c6..7e89475 100644 --- a/osm/opensm/osm_switch.c +++ b/osm/opensm/osm_switch.c @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_trap_rcv.c b/osm/opensm/osm_trap_rcv.c index 7e39832..9865f53 100644 --- a/osm/opensm/osm_trap_rcv.c +++ b/osm/opensm/osm_trap_rcv.c @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_trap_rcv_ctrl.c b/osm/opensm/osm_trap_rcv_ctrl.c index 1e6bf45..ee5a1a4 100644 --- a/osm/opensm/osm_trap_rcv_ctrl.c +++ b/osm/opensm/osm_trap_rcv_ctrl.c @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c index 4492c1a..95f4d04 100644 --- a/osm/opensm/osm_ucast_mgr.c +++ b/osm/opensm/osm_ucast_mgr.c @@ -54,6 +54,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include +#include #include #include #include diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c index b70cf21..44e1993 100644 --- a/osm/opensm/osm_ucast_updn.c +++ b/osm/opensm/osm_ucast_updn.c @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include diff --git a/osm/opensm/osm_vl15intf.c b/osm/opensm/osm_vl15intf.c index f72620b..68f17c5 100644 --- a/osm/opensm/osm_vl15intf.c +++ b/osm/opensm/osm_vl15intf.c @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git a/osm/opensm/osm_vl_arb_rcv.c b/osm/opensm/osm_vl_arb_rcv.c index 70fd5ed..e33a2f9 100644 --- a/osm/opensm/osm_vl_arb_rcv.c +++ b/osm/opensm/osm_vl_arb_rcv.c @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include diff --git 
a/osm/opensm/osm_vl_arb_rcv_ctrl.c b/osm/opensm/osm_vl_arb_rcv_ctrl.c index 9113985..f1f22c7 100644 --- a/osm/opensm/osm_vl_arb_rcv_ctrl.c +++ b/osm/opensm/osm_vl_arb_rcv_ctrl.c @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include diff --git a/osm/osmtest/include/osmtest_subnet.h b/osm/osmtest/include/osmtest_subnet.h index 0e7cf3e..277a2aa 100644 --- a/osm/osmtest/include/osmtest_subnet.h +++ b/osm/osmtest/include/osmtest_subnet.h @@ -47,6 +47,7 @@ #ifndef _OSMTEST_SUBNET_H_ #define _OSMTEST_SUBNET_H_ +#include #include #include #include diff --git a/osm/osmtest/osmt_inform.c b/osm/osmtest/osmt_inform.c index b24ae30..e1562db 100644 --- a/osm/osmtest/osmt_inform.c +++ b/osm/osmtest/osmt_inform.c @@ -56,7 +56,6 @@ #include #include #include #include -#include #include #include "osmtest.h" diff --git a/osm/osmtest/osmt_slvl_vl_arb.c b/osm/osmtest/osmt_slvl_vl_arb.c index 6cb8377..9fc84f6 100644 --- a/osm/osmtest/osmt_slvl_vl_arb.c +++ b/osm/osmtest/osmt_slvl_vl_arb.c @@ -54,7 +54,6 @@ #include #include #include #include -#include #include "osmtest.h" /********************************************************************** diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c index 78aff53..5eb5482 100644 --- a/osm/osmtest/osmtest.c +++ b/osm/osmtest/osmtest.c @@ -56,8 +56,8 @@ #endif #include #include -#ifdef __WIN__ #include +#ifdef __WIN__ #include #else #include -- 1.3.2 From rdreier at cisco.com Wed May 17 15:05:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 15:05:00 -0700 Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h In-Reply-To: <20060517220249.GC28485@sashak.voltaire.com> (Sasha Khapyorsky's message of "Thu, 18 May 2006 01:02:49 +0300") References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517220249.GC28485@sashak.voltaire.com> Message-ID: Just curious -- what's the reason behind changes like: > --- a/osm/complib/cl_event_wheel.c > +++ 
b/osm/complib/cl_event_wheel.c
> @@ -40,6 +40,7 @@ # include
> #endif /* HAVE_CONFIG_H */
>
> #include
> +#include
> #include
> #include

It seems including cl_memory.h in more places is a step backwards, or
am I missing the point here?

 - R.

From mst at mellanox.co.il Wed May 17 15:07:34 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 18 May 2006 01:07:34 +0300
Subject: [openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines
In-Reply-To: <1147886554.18971.52002.camel@hal.voltaire.com>
References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517171146.GB12290@mellanox.co.il> <1147886554.18971.52002.camel@hal.voltaire.com>
Message-ID: <20060517220733.GD14211@mellanox.co.il>

Quoting r. Hal Rosenstock :
> Right now, there is memory
> tracking code implemented.

Doesn't MALLOC_CHECK_ do what you want?

-- 
MST

From sashak at voltaire.com Wed May 17 15:20:37 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 18 May 2006 01:20:37 +0300
Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h
In-Reply-To:
References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517220249.GC28485@sashak.voltaire.com>
Message-ID: <20060517222037.GD28485@sashak.voltaire.com>

On 15:05 Wed 17 May , Roland Dreier wrote:
> Just curious -- what's the reason behind changes like:
>
> > --- a/osm/complib/cl_event_wheel.c
> > +++ b/osm/complib/cl_event_wheel.c
> > @@ -40,6 +40,7 @@ # include
> > #endif /* HAVE_CONFIG_H */
> >
> > #include
> > +#include
> > #include
> > #include
>
> It seems including cl_memory.h in more places is a step backwards, or
> am I missing the point here?

It is needed to provide explicit prototypes for the still-used
cl_malloc() and cl_free(). I guess this will be removed with the next
wave of Hal's cleanup.
Sasha

From rdreier at cisco.com Wed May 17 15:27:25 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 17 May 2006 15:27:25 -0700
Subject: [openib-general][PATCH] srp: param sg_tablesize,
In-Reply-To: <44622634.1070705@mellanox.com> (Vu Pham's message of "Wed, 10 May 2006 10:43:16 -0700")
References: <443E8325.2000502@mellanox.com> <44622634.1070705@mellanox.com>
Message-ID:

Thanks, applied in slightly tweaked form as below:

diff-tree 7c0543697efa99b2f1d308c415b0b2f3c0810f74 (from fbd15762bd05491db039ecd0ea57ee5f848759b0)
Author: Vu Pham
Date:   Wed May 17 15:21:41 2006 -0700

    IB/srp: Allow sg_tablesize to be adjusted

    Make the sg_tablesize used by SRP adjustable at module load time via a
    module parameter. Calculate the corresponding IU length required to
    support this.

    Signed-off-by: Vu Pham
    Signed-off-by: Roland Dreier

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 72b61cd..4dd6e6a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -62,6 +62,13 @@ MODULE_DESCRIPTION("InfiniBand SCSI RDMA
 		   "v" DRV_VERSION " (" DRV_RELDATE ")");
 MODULE_LICENSE("Dual BSD/GPL");
 
+static int srp_sg_tablesize = SRP_DEF_SG_TABLESIZE;
+static int srp_max_iu_len;
+
+module_param(srp_sg_tablesize, int, 0444);
+MODULE_PARM_DESC(srp_sg_tablesize,
+		 "Max number of gather/scatter entries per I/O (default is 12)");
+
 static int topspin_workarounds = 1;
 
 module_param(topspin_workarounds, int, 0444);
@@ -311,7 +318,7 @@ static int srp_send_req(struct srp_targe
 
 	req->priv.opcode = SRP_LOGIN_REQ;
 	req->priv.tag = 0;
-	req->priv.req_it_iu_len = cpu_to_be32(SRP_MAX_IU_LEN);
+	req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len);
 	req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT |
 					    SRP_BUF_FORMAT_INDIRECT);
 	memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id, 16);
@@ -953,7 +960,7 @@ static int srp_queuecommand(struct scsi_
 		goto err;
 
 	dma_sync_single_for_cpu(target->srp_host->dev->dev->dma_device, iu->dma,
-				SRP_MAX_IU_LEN, DMA_TO_DEVICE);
+				srp_max_iu_len, DMA_TO_DEVICE);
 
 	req = list_entry(target->free_reqs.next, struct srp_request, list);
 
@@ -986,7 +993,7 @@ static int srp_queuecommand(struct scsi_
 	}
 
 	dma_sync_single_for_device(target->srp_host->dev->dev->dma_device, iu->dma,
-				   SRP_MAX_IU_LEN, DMA_TO_DEVICE);
+				   srp_max_iu_len, DMA_TO_DEVICE);
 
 	if (__srp_post_send(target, iu, len)) {
 		printk(KERN_ERR PFX "Send failed\n");
@@ -1018,7 +1025,7 @@ static int srp_alloc_iu_bufs(struct srp_
 
 	for (i = 0; i < SRP_SQ_SIZE + 1; ++i) {
 		target->tx_ring[i] = srp_alloc_iu(target->srp_host,
-						  SRP_MAX_IU_LEN,
+						  srp_max_iu_len,
 						  GFP_KERNEL, DMA_TO_DEVICE);
 		if (!target->tx_ring[i])
 			goto err;
@@ -1436,7 +1443,6 @@ static struct scsi_host_template srp_tem
 	.eh_host_reset_handler = srp_reset_host,
 	.can_queue = SRP_SQ_SIZE,
 	.this_id = -1,
-	.sg_tablesize = SRP_MAX_INDIRECT,
 	.cmd_per_lun = SRP_SQ_SIZE,
 	.use_clustering = ENABLE_CLUSTERING,
 	.shost_attrs = srp_host_attrs
@@ -1914,6 +1920,11 @@ static int __init srp_init_module(void)
 {
 	int ret;
 
+	srp_template.sg_tablesize = srp_sg_tablesize;
+	srp_max_iu_len = (sizeof (struct srp_cmd) +
+			  sizeof (struct srp_indirect_buf) +
+			  srp_sg_tablesize * 16);
+
 	ret = class_register(&srp_class);
 	if (ret) {
 		printk(KERN_ERR PFX "couldn't register class infiniband_srp\n");
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index c071c30..033a447 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -56,7 +56,7 @@ enum {
 	SRP_DLID_REDIRECT = 2,
 
 	SRP_MAX_LUN = 512,
-	SRP_MAX_IU_LEN = 256,
+	SRP_DEF_SG_TABLESIZE = 12,
 
 	SRP_RQ_SHIFT = 6,
 	SRP_RQ_SIZE = 1 << SRP_RQ_SHIFT,
@@ -71,9 +71,6 @@ enum {
 };
 
 #define SRP_OP_RECV (1 << 31)
-#define SRP_MAX_INDIRECT ((SRP_MAX_IU_LEN - \
-			   sizeof (struct srp_cmd) - \
-			   sizeof (struct srp_indirect_buf)) / 16)
 
 enum srp_target_state {
 	SRP_TARGET_LIVE,

From sashak at voltaire.com Wed May 17
15:50:54 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 18 May 2006 01:50:54 +0300 Subject: [openib-general] [PATCH] opensm: remove unused cl_memory_osd.h [was: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines] In-Reply-To: <20060517220249.GC28485@sashak.voltaire.com> References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517220249.GC28485@sashak.voltaire.com> Message-ID: <20060517225054.GF28485@sashak.voltaire.com> On 01:02 Thu 18 May , Sasha Khapyorsky wrote: > On 12:14 Wed 17 May , Hal Rosenstock wrote: > > OpenSM: Use memory routines directly and eliminate cl_mem* routines > > as these routines are part of ISO C > > > > Signed-off-by: Hal Rosenstock > > Following Hal's cleanup And more: This removes unused cl_memory_osd.h file from complib Signed-off-by: Sasha Khapyorsky --- osm/complib/Makefile.am | 1 osm/include/Makefile.am | 1 osm/include/complib/cl_memory_osd.h | 79 ----------------------------------- 3 files changed, 0 insertions(+), 81 deletions(-) delete mode 100644 osm/include/complib/cl_memory_osd.h 95ce6332a6531ae1c7dab4060bfa5800e1b8f4ec diff --git a/osm/complib/Makefile.am b/osm/complib/Makefile.am index ecbd8e2..809a404 100644 --- a/osm/complib/Makefile.am +++ b/osm/complib/Makefile.am @@ -51,7 +51,6 @@ libosmcompinclude_HEADERS = $(srcdir)/.. 
$(srcdir)/../include/complib/cl_map.h \ $(srcdir)/../include/complib/cl_math.h \ $(srcdir)/../include/complib/cl_memory.h \ - $(srcdir)/../include/complib/cl_memory_osd.h \ $(srcdir)/../include/complib/cl_memtrack.h \ $(srcdir)/../include/complib/cl_packoff.h \ $(srcdir)/../include/complib/cl_packon.h \ diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am index c7054ad..b23b1de 100644 --- a/osm/include/Makefile.am +++ b/osm/include/Makefile.am @@ -124,7 +124,6 @@ EXTRA_DIST = \ $(srcdir)/opensm/osm_state_mgr_ctrl.h \ $(srcdir)/complib/cl_thread_osd.h \ $(srcdir)/complib/cl_packon.h \ - $(srcdir)/complib/cl_memory_osd.h \ $(srcdir)/complib/cl_atomic_osd.h \ $(srcdir)/complib/cl_spinlock.h \ $(srcdir)/complib/cl_passivelock.h \ diff --git a/osm/include/complib/cl_memory_osd.h b/osm/include/complib/cl_memory_osd.h deleted file mode 100644 index 9ef17e0..0000000 --- a/osm/include/complib/cl_memory_osd.h +++ /dev/null @@ -1,79 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - - -/* - * Abstract: - * Defines sized datatypes for Linux Kernel and User mode - * exported sizes are int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t - * int64_t, uint64_t. uintn_t is a polymorphic type, size is native size and - * also size of the pointer. - * - * Environment: - * Linux User and Kernel Mode - * - * $Revision: 1.2 $ - */ - -#ifndef _CL_MEMORY_OSD_H_ -#define _CL_MEMORY_OSD_H_ - -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -#ifndef __WIN__ - -static inline uint32_t -cl_get_pagesize( void ) -{ - return getpagesize(); -} - -#endif - -END_C_DECLS - -#endif /* _CL_MEMORY_OSD_H_ */ -- 1.3.2 From halr at voltaire.com Wed May 17 15:44:11 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 17 May 2006 18:44:11 -0400 Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h In-Reply-To: <20060517222037.GD28485@sashak.voltaire.com> References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517220249.GC28485@sashak.voltaire.com> <20060517222037.GD28485@sashak.voltaire.com> Message-ID: <1147905655.18971.58534.camel@hal.voltaire.com> On Wed, 2006-05-17 at 18:20, Sasha Khapyorsky wrote: > On 15:05 Wed 17 May , Roland Dreier wrote: > > Just curious -- what's the reason behind changes like: > > > > > --- a/osm/complib/cl_event_wheel.c > > > +++ 
b/osm/complib/cl_event_wheel.c > > > @@ -40,6 +40,7 @@ # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > +#include > > > #include > > > #include > > > > It seems including cl_memory.h in more places is a step backwards, or > > am I missing the point here? > > It is necessary for explicit prototyping yet used cl_malloc(), > cl_free(). I guess this will be removed with next wave of Hal's cleanup. Yes, that's what I expect too. When cl_malloc*/cl_free get removed, this will go away... -- Hal > Sasha From sashak at voltaire.com Wed May 17 15:53:27 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 18 May 2006 01:53:27 +0300 Subject: [openib-general] [PATCH] opensm: remove cl_mem* stuff from diags [was: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines] In-Reply-To: <20060517220249.GC28485@sashak.voltaire.com> References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517220249.GC28485@sashak.voltaire.com> Message-ID: <20060517225327.GG28485@sashak.voltaire.com> On 01:02 Thu 18 May , Sasha Khapyorsky wrote: > On 12:14 Wed 17 May , Hal Rosenstock wrote: > > OpenSM: Use memory routines directly and eliminate cl_mem* routines > > as these routines are part of ISO C > > > > Signed-off-by: Hal Rosenstock > > Following Hal's cleanup And even more: This cleans cl_mem*() wrappers from diags sources Signed-off-by: Sasha Khapyorsky --- diags/src/saquery.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) d1950d51d8a6ada9b69ed194cd8cc4b2e9aa7902 diff --git a/diags/src/saquery.c b/diags/src/saquery.c index 5526bff..7c07253 100644 --- a/diags/src/saquery.c +++ b/diags/src/saquery.c @@ -42,6 +42,7 @@ #include #include #include #include +#include #define _GNU_SOURCE #include @@ -203,8 +204,8 @@ get_all_records(osm_bind_handle_t bind_h osmv_query_req_t req; osmv_user_query_t user; - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, 
sizeof( user ) ); user.attr_id = query_id; user.attr_offset = attr_offset; -- 1.3.2 From vuhuong at mellanox.com Wed May 17 17:04:42 2006 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 17 May 2006 17:04:42 -0700 Subject: [openib-general][PATCH] srp: param sg_tablesize, In-Reply-To: References: <443E8325.2000502@mellanox.com> <44622634.1070705@mellanox.com> Message-ID: <446BBA1A.6060500@mellanox.com> > @@ -1914,6 +1920,11 @@ static int __init srp_init_module(void) > { > int ret; > Thanks, should we do a check and put some cap on srp_sg_tablesize value ie. + srp_sg_tablesize = max(1, srp_sg_tablesize); + srp_sg_tablesize = min(srp_sg_tablesize, SRP_MAX_SG_TABLESIZE); > + srp_template.sg_tablesize = srp_sg_tablesize; > + srp_max_iu_len = (sizeof (struct srp_cmd) + > + sizeof (struct srp_indirect_buf) + > + srp_sg_tablesize * 16); > + > > SRP_MAX_LUN = 512, > - SRP_MAX_IU_LEN = 256, > + SRP_DEF_SG_TABLESIZE = 12, + SRP_MAX_SG_TABLESIZE = 128, From katiyar.mohit at gmail.com Wed May 17 18:22:11 2006 From: katiyar.mohit at gmail.com (Mohit Katiyar) Date: Thu, 18 May 2006 10:22:11 +0900 Subject: [openib-general] iSER Status Message-ID: <46465bb30605171822i5879915em79134a293043fe96@mail.gmail.com> Hi all, Can anyone tell me whether the latest stable release of Linux(2.6.16.16) contains both iSER intiator and target code or only the initiator code? The open-iser target code available at https://openfabrics.org/svn/gen2/ulps/open-iser-target/ is stable or not? Thanks Mohit From danb at voltaire.com Wed May 17 20:46:55 2006 From: danb at voltaire.com (Dan Bar Dov) Date: Thu, 18 May 2006 06:46:55 +0300 Subject: [openib-general] iSER Status Message-ID: Hi Mohit, Linux kernel 2.6.16 does not include ISER. ISER (initiator only) is scheduled for kernel 2.6.18. The ISER initiator code from openIB trunk is stable, and works with the open-iscsi initiator. The ISER target code is the seed of a project aimed to provide an iSCSI/ISER target. It is in early development. 
The code itself is stable, but there is no iSCSI target you can interface it with. We plan to interface it with the stgt project. Dan > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Mohit Katiyar > Sent: Thursday, May 18, 2006 4:22 AM > To: openib-general at openib.org > Subject: [openib-general] iSER Status > > Hi all, > Can anyone tell me whether the latest stable release of > Linux(2.6.16.16) contains both iSER intiator and target code or only > the initiator code? The open-iser target code available at > https://openfabrics.org/svn/gen2/ulps/open-iser-target/ is stable or > not? > > Thanks > Mohit > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From btmiller at helix.nih.gov Wed May 17 21:11:22 2006 From: btmiller at helix.nih.gov (Tim Miller) Date: Thu, 18 May 2006 00:11:22 -0400 Subject: [openib-general] OpenIB 1.0 RC + PathScale problem Message-ID: Hi All, I'm trying to test the 1.0 RC branch from subversion with PathScale InfiniPath HT-460 (I've used previous versions with some success). The code compiles successfully, and I can run ibv_rc_pingpong and even a simple MPI program. But when I try to run my main application, I get the following error: libibverbs: Warning: couldn't load driver /usr/local/lib/infiniband/libipathverbs.so: /usr/local/lib/infiniband/libipathverbs.so: undefined symbol: ibv_cmd_poll_cq Does anyone know what might cause this error? I ran an nm on libipathverbs.so and saw ibv_cmd_poll_cq and I found it in the libibverbs source, too, so I'm a bit confused about what the root cause of this is. My apologies if this has already been raised. I took a quick look in the archives and did not see anything off hand that matches this. Thanks, Tim M. 
-- Tim Miller System Administrator -- Laboratory of Computational Biology National Institutes of Health -- Bldg. 50 Rm. 3309 -- 301-402-0618 From olson at unixfolk.com Wed May 17 21:13:59 2006 From: olson at unixfolk.com (Dave Olson) Date: Wed, 17 May 2006 21:13:59 -0700 (PDT) Subject: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes In-Reply-To: References: Message-ID: On Wed, 17 May 2006, Roland Dreier wrote: | Dave> We are seeing a bug (with both our driver native MPI | Dave> processes and mthca mvapic), where when 8 processes using | Dave> "simultaneously exit", we get watchdogs and/or hangs in the | Dave> close routines. Moving the freeing outside the mutex was an | Dave> attempt to see if we were running into some VM issues by | Dave> doing lots of page unlocking and freeing with the mutex | Dave> held. It seemed to help somewhat, but not to solve the | Dave> problem. | | Am I understanding correctly that you see a hang or watchdog timeout | even with the mthca driver? Yes. That is, the symptoms are the same, although the cause may be different. | Is there any possibility of posting the test case to reproduce this? It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed to do messaging rate), running 8 copies per dual-core 4-socket opteron, both on InfiniPath MPI, and MVAPICH (built for gen2). We ship the source with our upcoming release, and will probably make it available outside our release. We did discover one possible problem today, which is shared between our device code and the core openib code, and that's doing some memory freeing and accounting from a work thread (updating mm->locked_vm and cleaning up from earlier get_user_pages); the code in our driver was copied from the openib core code, it's not literally shared. I have a strong suspicion that at least sometimes, it's executing after the current->mm has gone away. I'm looking at that more right now. 
| It doesn't seem likely that ipath changes are going to fix a generic | bug like this... It wasn't an attempt to fix it, so much as to work around it, while I worked on other higher priority stuff. As I mentioned, it also helps a bit in allowing multiple processes to be in the open and close code simultaneously, when you have multiple cpus, so even on that basis, I'd probably leave it as it now is. Dave Olson olson at unixfolk.com http://www.unixfolk.com/dave From mst at mellanox.co.il Wed May 17 21:24:27 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 18 May 2006 07:24:27 +0300 Subject: [openib-general] Re: [resend][RFC][PATCH] adding call to madvise In-Reply-To: <20060514134240.GZ5319@minantech.com> References: <20060511134217.GW5319@minantech.com> <20060511185926.GA1561@minantech.com> <20060514134240.GZ5319@minantech.com> Message-ID: <20060518042427.GA11533@mellanox.co.il> Quoting r. Gleb Natapov : > @@ -187,8 +194,8 @@ int ibv_lock_range(void *base, size_t si > > > if (node->refcnt++ == 0) { > - ret = mlock((void *) node->start, > - node->end - node->start + 1); > + ret = madvise((void *) node->start, > + node->end - node->start + 1, MADV_DONTFORK); > if (ret) > goto out; > } Will this break libibverbs on older kernels that don't have madvise? Maybe test MADV_DONTFORK during library startup and set a flag? -- MST From bos at pathscale.com Wed May 17 21:52:44 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 17 May 2006 21:52:44 -0700 Subject: [openib-general] OpenIB 1.0 RC + PathScale problem In-Reply-To: References: Message-ID: <1147927964.3094.8.camel@localhost.localdomain> On Thu, 2006-05-18 at 00:11 -0400, Tim Miller wrote: > But when I try to run my main application, I get the > following error: > > libibverbs: Warning: couldn't load driver > /usr/local/lib/infiniband/libipathverbs.so: > /usr/local/lib/infiniband/libipathverbs.so: undefined symbol: > ibv_cmd_poll_cq > > Does anyone know what might cause this error? No. 
We don't see this problem here. Can you provide some more information,
please? Running ldd on /usr/local/lib/infiniband/libipathverbs.so would be
a good place to start, so you can see exactly which libibverbs.so is being
linked against. Also, if you could post the relevant nm output for both
libraries, that would be good.

Thanks,

(Dave Olson's message of "Wed, 17 May 2006 21:13:59 -0700 (PDT)") References: Message-ID:

    Dave> We did discover one possible problem today, which is shared
    Dave> between our device code and the core openib code, and that's
    Dave> doing some memory freeing and accounting from a work thread
    Dave> (updating mm->locked_vm and cleaning up from earlier
    Dave> get_user_pages); the code in our driver was copied from the
    Dave> openib core code, it's not literally shared.

    Dave> I have a strong suspicion that at least sometimes, it's
    Dave> executing after the current->mm has gone away. I'm looking
    Dave> at that more right now.

It doesn't seem likely to me. In uverbs_mem.c,
ib_umem_release_on_close() does get_task_mm() and gives up if it can't
take a reference to the task's mm. The mmput() doesn't happen until
ib_umem_account() runs in the work thread.

I do see obvious bugs in ipath_user_pages.c, though. In
ipath_release_user_pages_on_close(), you have:

        mm = get_task_mm(current);
        if (!mm)
                goto bail;

        work = kmalloc(sizeof(*work), GFP_KERNEL);
        if (!work)
                goto bail_mm;

        goto bail;

        INIT_WORK(&work->work, user_pages_account, work);
        work->mm = mm;
        work->num_pages = num_pages;

bail_mm:
        mmput(mm);

bail:
        return;

So with the "goto bail" you skip the code which does something with
the work you allocate, which means that you leak not only the work
structure but also the reference to the task's mm that you took.

Even without the "goto bail" the code still wouldn't actually schedule
the work, so the work structure would be leaked, although you would do
mmput(). I'm not sure what you were trying to do here.

 - R.
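The ownership rule discussed in this thread can be sketched in user space: once the work item is handed off, the handler owns both the work structure and the mm reference, and every early exit must drop whatever was taken so far. This is only a model under stated assumptions, not the ipath or uverbs code. `fake_mm`, `fake_work`, `mm_get()`, `mm_put()`, `run_work()` and the `fail_alloc` switch are hypothetical stand-ins for `struct mm_struct`, the work item, `get_task_mm()`, `mmput()`, `schedule_work()` and a failing `kmalloc()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mm_struct: a refcount plus locked_vm. */
struct fake_mm {
    int refs;
    long locked_vm;
};

/* Hypothetical stand-in for the deferred accounting work item. */
struct fake_work {
    struct fake_mm *mm;
    long num_pages;
};

/* Number of work items allocated but not yet freed (leak detector). */
static int live_work_items;

/* Models get_task_mm(): take a reference, or fail if there is no mm. */
static struct fake_mm *mm_get(struct fake_mm *mm)
{
    if (!mm)
        return NULL;
    mm->refs++;
    return mm;
}

/* Models mmput(): drop one reference. */
static void mm_put(struct fake_mm *mm)
{
    assert(mm->refs > 0);
    mm->refs--;
}

/* Work handler: does the accounting, then releases what it was given. */
static void user_pages_account(struct fake_work *work)
{
    work->mm->locked_vm -= work->num_pages;
    mm_put(work->mm);
    free(work);
    live_work_items--;
}

/* Models schedule_work() by running the handler immediately. */
static void run_work(struct fake_work *work)
{
    user_pages_account(work);
}

/* Close path without the stray "goto bail" and with the work handed off. */
static void release_user_pages_on_close(struct fake_mm *task_mm,
                                        long num_pages, int fail_alloc)
{
    struct fake_mm *mm;
    struct fake_work *work;

    mm = mm_get(task_mm);
    if (!mm)
        goto bail;

    work = fail_alloc ? NULL : malloc(sizeof(*work));
    if (!work)
        goto bail_mm;          /* nothing handed off: drop the mm ref here */
    live_work_items++;

    work->mm = mm;
    work->num_pages = num_pages;
    run_work(work);            /* handler now owns the work and the mm ref */
    return;

bail_mm:
    mm_put(mm);
bail:
    return;
}
```

Calling `release_user_pages_on_close(&mm, 4, 0)` on an mm that starts with one reference should leave it with one reference and `live_work_items` at zero; passing `fail_alloc = 1` exercises the allocation-failure path, which must also leave the reference count unchanged.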
From zhushisongzhu at yahoo.com Wed May 17 21:57:31 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Wed, 17 May 2006 21:57:31 -0700 (PDT)
Subject: [openib-general] OFED RC4 also can't support >2000 connections
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB9B@mtlexch01.mtl.com>
Message-ID: <20060518045731.39795.qmail@web36908.mail.mud.yahoo.com>

After executing the command
'SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X 193.12.10.14:3129',
there are still problems:
(1) sometimes the server (running squid) hits a kernel panic.
(2) the client never connects to the server successfully.

When testing with ttcp.aio, the error on the client is:

[root at ibtest1 ~]# ./ttcp.aio -t 193.12.10.14
ttcp-t: buflen = 8192 nbuf = 2048 align = 16384/0 port = 5001 193.12.10.14
ttcp-t: socket
ttcp-t: connect: Cannot allocate memory errno=12
[root at ibtest1 ~]#

How can this problem be solved?
tks
zhu

--- Eitan Zahavi wrote:

> Hi Zhu,
>
> If you are using libsdp.conf to select which ports should map to SDP and
> which to TCP you might run out of resources for tracking the opened
> sockets.
>
> Try increasing the following constant in libsdp:
> libsdp/src/port.c line 48:
> #define MAPPED_SOCKET_MAX 1024
> to something like:
> #define MAPPED_SOCKET_MAX 10000
>
> Or, if you can use SDP sockets only (your config file is empty anyway):
> SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so squid -d 10 -f squid.conf
> SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 2000 -n 2000 -X
> 193.12.10.14:3129
>
> Hope this fixes the issue you see
>
> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O.
Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: openib-general-bounces at openib.org > [mailto:openib-general- > > bounces at openib.org] On Behalf Of zhu shi song > > Sent: Wednesday, May 17, 2006 3:17 PM > > To: openib-general at openib.org > > Subject: [openib-general] OFED RC4 also can't > support >2000 > connections > > > > I have installed OFED RC4 on my RHEL 4.3(2.6.9-34 > > kernel). I use the same method I told in previous > > mail. When increasing concurrent sdp connection > to > > 2000. sdp refuse connection in server side. And > client > > can't connect to server through sdp connection > > forever. > > > > OS: RHEL 4.3 (2.6.9-34) > > IB: OFED RC4 > > Test Method: > > Server: LD_PRELOAD=libsdp.so squid -d 10 -f > > squid.conf( sdp listening on IB0: > 193.12.10.14:3129) > > Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 > -n > > 2000 -X 193.12.10.14:3129 > > http://www.google.com/index.html ( IB0: > 193.12.10.24) > > > > > > Who know what's wrong with sdp many concurrent > > connections? I have bought the cards for about 3 > > weeks, but I can't make them work correctly. > Urgent! > > > > tks > > zhu > > > > > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > > protection around > > http://mail.yahoo.com > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com __________________________________________________ Do You Yahoo!? 
Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From ogerlitz at voltaire.com Wed May 17 22:13:42 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 18 May 2006 08:13:42 +0300 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <446B1E84.9020505@voltaire.com> Message-ID: <446C0286.70708@voltaire.com> Roland Dreier wrote: > Or> Can you spare few words whats the difference between the > Or> for-2.6.18 and for-mm branches of your git tree? > > for-mm is what Andrew pulls to get patches for -mm. It has things > that I think should be seen in -mm, but which I am not ready to queue > in for-2.6.18. You can use git show-branch or gitk to visualize > exactly how the branches relate. Just to make sure... does "not ready to queue in for-2.6.18" wrt iSER relates to the dependency on the 2.6.18 iSCSI updates (as the CMA can [should ?] be pushed with iSER), or you see any further issues which needs to be fixed before the code is ready? I will try git show-branch, thanks. > Or> When you say the code is pushed into master.kernel.org are you > Or> referring to the mm tree of Andrew Morton? i don't see he has > Or> one under kernel.org/git? > > No, I mean it's in my tree on master.kernel.org, rather than just > sitting on my local hard disk. OK, thanks for the clarification. Or. 
From bos at pathscale.com Wed May 17 22:15:11 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Wed, 17 May 2006 22:15:11 -0700
Subject: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
In-Reply-To:
References:
Message-ID: <1147929311.3094.16.camel@localhost.localdomain>

On Wed, 2006-05-17 at 21:55 -0700, Roland Dreier wrote:

> So with the "goto bail" you skip the code which does something with
> the work you allocate, which means that you leak not only the work
> structure but also the reference to the task's mm that you took.

Wow. I have no idea where that extra "goto bail" came from. It's not
supposed to be there.

(Or Gerlitz's message of "Thu, 18 May 2006 08:13:42 +0300")
References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <446B1E84.9020505@voltaire.com> <446C0286.70708@voltaire.com>
Message-ID:

    Or> Just to make sure... does "not ready to queue in for-2.6.18"
    Or> wrt iSER relates to the dependency on the 2.6.18 iSCSI updates
    Or> (as the CMA can [should ?] be pushed with iSER), or you see
    Or> any further issues which needs to be fixed before the code is
    Or> ready?

The only issue is that iser can't be merged until both James Bottomley
and I have merged other stuff upstream first.

 - R.

From rdreier at cisco.com Wed May 17 22:17:50 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 17 May 2006 22:17:50 -0700
Subject: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes
In-Reply-To: <1147929311.3094.16.camel@localhost.localdomain> (Bryan O'Sullivan's message of "Wed, 17 May 2006 22:15:11 -0700")
References: <1147929311.3094.16.camel@localhost.localdomain>
Message-ID:

    Bryan> Wow. I have no idea where that extra "goto bail" came
    Bryan> from. It's not supposed to be there.

Even without it you still leak the work structure, because there's no
schedule_work().
Now that I look at it, in uverbs_mem.c, the mm will be leaked if the kmalloc fails... - R. From dotanb at mellanox.co.il Wed May 17 22:25:54 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Thu, 18 May 2006 08:25:54 +0300 Subject: [openib-general] [librdmacm] changes to cmatose to return a value different than 0 when there is a failure In-Reply-To: <446B6A94.7020306@ichips.intel.com> References: <200605171835.44079.dotanb@mellanox.co.il> <446B6A94.7020306@ichips.intel.com> Message-ID: <200605180825.54823.dotanb@mellanox.co.il> On Wednesday 17 May 2006 21:25, Sean Hefty wrote: > Dotan Barak wrote: > > Added checks to the return values of all of the functions that may fail > > (in order to add this test to the regression system). > > Thanks - applied with one minor change. > > > + int rc; > > Changed 'rc' to 'ret' to match the rest of the code. > > - Sean > great, thanks (next time i will pay attention to this issue). Dotan From olson at pathscale.com Wed May 17 22:26:32 2006 From: olson at pathscale.com (Dave Olson) Date: Wed, 17 May 2006 22:26:32 -0700 (PDT) Subject: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes In-Reply-To: References: Message-ID: On Wed, 17 May 2006, Dave Olson wrote: | On Wed, 17 May 2006, Roland Dreier wrote: | | | Am I understanding correctly that you see a hang or watchdog timeout | | even with the mthca driver? | | Yes. That is, the symptoms are the same, although the cause | may be different. | | | Is there any possibility of posting the test case to reproduce this? | | It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed | to do messaging rate), running 8 copies per dual-core 4-socket opteron, | both on InfiniPath MPI, and MVAPICH (built for gen2). 
Here's the typical case where the watchdog fires (with infinipath MPI), on FC4 2.6.16 2108 (without kprobes, with kprobes things are slightly different, but not much; I'm running without since we were often in the kprobes code from the exit code, but I think that's just a red-herring). The sysrq p was some seconds prior to the watchdog. It's almost as though something is looping far too many times during the close cleanup. The other 7 exitting processes are typically in sys_exit_group -> do_exit -> __up_red --> __spin_lock_irqsave -> __up_read (or __down_read) (from what sysrq t prints). They are all runnable on the other 7 processors. The infinipath driver does mmap both memory and device pages for each of these processes. SysRq : Show Regs CPU 0: Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U) Pid: 23788, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1 RIP: 0010:[] {__do_softirq+81} RSP: 0018:ffffffff8048d368 EFLAGS: 00000206 RAX: 0000000000000022 RBX: 0000000000000022 RCX: 0000000000000080 RDX: 0000000000000000 RSI: 00000000000000c0 RDI: ffff81007f1fd0c0 RBP: ffffffff80528f80 R08: 0000000000000200 R09: 0000000000000002 R10: ffffffff804a6a38 R11: 0000000000000000 R12: ffffffff80577c80 R13: 0000000000000000 R14: 000000000000000a R15: 00002aaabba6c000 FS: 00002aaaab32ffa0(0000) GS:ffffffff80511000(0000) knlGS:00000000f7fc86c0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 000055555565ebe8 CR3: 000000007ac6d000 CR4: 00000000000006e0 Call Trace: {call_softirq+30} {do_softirq+44} {apic_timer_interrupt+132} {_write_unlock_irq+14} {__set_page_dirty_nobuffers+183} 
{unmap_vmas+1042} {exit_mmap+124} {mmput+37} {do_exit+584} {__dequeue_signal+459} {sys_exit_group+0} {get_signal_to_deliver+1568} {do_signal+116} {__pollwait+0} {sys_select+934} {sysret_signal+28} {ptregscall_common+103} [ perhaps 20 or 30 seconds later, NMI fires; we had already been sort of stuck for 60 seconds or so when I did the sysrq p above ] NMI Watchdog detected LOCKUP on CPU 1 CPU 1 Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U) Pid: 23789, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1 RIP: 0010:[] {_raw_write_lock+161} RSP: 0018:ffff81007c5b5c18 EFLAGS: 00000086 RAX: 000000008f02e600 RBX: ffff810037cec680 RCX: 00000000002c2671 RDX: 0000000000927190 RSI: 0000000000000001 RDI: ffff810037cec680 RBP: ffff810037cec668 R08: ffff810002d6b500 R09: 00000000fffffffa R10: 0000000000000003 R11: ffffffff80165922 R12: ffff810037cec680 R13: 00002aaaac200000 R14: ffff810002d6b540 R15: 00002aaabba6c000 FS: 00002aaaaaae6080(0000) GS:ffff81011fc466c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000033f38bdaf0 CR3: 000000007c296000 CR4: 00000000000006e0 Process mpi_multibw (pid: 23789, threadinfo ffff81007c5b4000, task ffff8100030557a0) Stack: ffff810002d6b540 ffffffff8016596b 0000000075ad5067 00002aaaac1b4000 ffff81007d451da0 ffffffff8016cc80 0000000000000000 ffff81007c5b5d38 ffffffffffffffff 0000000000000000 Call Trace: {__set_page_dirty_nobuffers+73} {unmap_vmas+1042} {exit_mmap+124} {mmput+37} {do_exit+584} {__dequeue_signal+459} {sys_exit_group+0} {get_signal_to_deliver+1568} {do_signal+116} {__pollwait+0} {sys_select+934} 
{sysret_signal+28} {ptregscall_common+103} Code: 84 c0 75 7f f0 81 03 00 00 00 01 f3 90 48 83 c1 01 48 8b 15 Kernel panic - not syncing: nmi watchdog From ogerlitz at voltaire.com Wed May 17 22:30:22 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 18 May 2006 08:30:22 +0300 Subject: [openib-general] Re: testing IB with unreleased kernels In-Reply-To: <20060517171617.GA6719@esmail.cup.hp.com> References: <446B2319.9030204@voltaire.com> <20060517171617.GA6719@esmail.cup.hp.com> Message-ID: <446C066E.2060209@voltaire.com> Grant Grundler wrote: > On Wed, May 17, 2006 at 07:40:07AM -0700, Roland Dreier wrote: >> Yes, I agree. That's why I think we should get rid of the >> "linux-kernel" part of the svn tree entirely. Because everyone who >> wants to test new code seems to run last stable kernel + svn drivers >> instead of the new development kernel. > That's because openib guarantee SVN drivers will build with last > stable kernel. Change that policy and document the steps > that folks should follow. I'd be willing to occasionally try > newer kernels if you think that's what we should be doing. Please note that both approaches suggested above will not force to test latest IB code with the under-development kernel... This is b/c most of the code (specifically the already in-tree) has zero backport to the latest stable kernel, eg the kernel portion of OFED which is targeted for 2.6.16 is based on the for-2.6.17 branch of Roland's GIT tree (expect for the components not there yet, which are co from the SVN), but OFED is not tested with (does not support) 2.6.17-rcX The same "trick" would work also with Grant's approach. So there's no replacement for testing done at least by the openib maintainers (and distros!!! when they start moving to IB...) for: +1 next-kernel-RC-versions downloaded from kernel.org (eg 2.6.17-RCX) +2 next-next-kernel-branches of infiniband.git (Roland's tree) Ofcourse people are busy, and testing is derived from needs.. 
for example the iSER maintainers (...) are testing now with what's closest to 2.6.18 and I guess the ipath maintainers are testing with 2.6.17-rc4. But at some point of the cycle, it's a must that each maintainer test his/her code with next-kernel-RC-versions from kernel.org Or. From ogerlitz at voltaire.com Wed May 17 22:33:04 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 18 May 2006 08:33:04 +0300 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <446B1E84.9020505@voltaire.com> <446C0286.70708@voltaire.com> Message-ID: <446C0710.8090803@voltaire.com> Roland Dreier wrote: > Or> Just to make sure... does "not ready to queue in for-2.6.18" > Or> wrt iSER relates to the dependency on the 2.6.18 iSCSI updates > Or> (as the CMA can [should ?] be pushed with iSER), or you see > Or> any further issues which needs to be fixed before the code is > Or> ready? > > The only issue is that iser can't be merged until both James Bottomly > and I have merged other stuff upstream first. Sure, thanks for the clarification. As for the CMA merge, you prefer to be on the safe side and do it **before** and not **with** iSER? Do you have any estimate when the 2.6.18 merge window opens? Or. From rdreier at cisco.com Wed May 17 22:35:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 17 May 2006 22:35:37 -0700 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: <446C0710.8090803@voltaire.com> (Or Gerlitz's message of "Thu, 18 May 2006 08:33:04 +0300") References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <446B1E84.9020505@voltaire.com> <446C0286.70708@voltaire.com> <446C0710.8090803@voltaire.com> Message-ID: Or> Sure, thanks for the clarification.
As for the CMA merge, you Or> prefer to be on the safe side and do it **before** and not Or> **with** iSER? Yes, that's what I'm planning on. Or> Do you have any estimate when the 2.6.18 merge window opens? Right after 2.6.17 is released. - R. From glebn at voltaire.com Wed May 17 22:45:58 2006 From: glebn at voltaire.com (Gleb Natapov) Date: Thu, 18 May 2006 08:45:58 +0300 Subject: [openib-general] Re: [resend][RFC][PATCH] adding call to madvise In-Reply-To: <20060518042427.GA11533@mellanox.co.il> References: <20060511134217.GW5319@minantech.com> <20060511185926.GA1561@minantech.com> <20060514134240.GZ5319@minantech.com> <20060518042427.GA11533@mellanox.co.il> Message-ID: <20060518054558.GB16303@minantech.com> On Thu, May 18, 2006 at 07:24:27AM +0300, Michael S. Tsirkin wrote: > Quoting r. Gleb Natapov : > > @@ -187,8 +194,8 @@ int ibv_lock_range(void *base, size_t si > > > > > > if (node->refcnt++ == 0) { > > - ret = mlock((void *) node->start, > > - node->end - node->start + 1); > > + ret = madvise((void *) node->start, > > + node->end - node->start + 1, MADV_DONTFORK); > > if (ret) > > goto out; > > } > > Will this break libibverbs on older kernels that don't have madvise? > Maybe test MADV_DONTFORK during library startup and set a flag? > madvise is always there, but older kernels will return EINVAL, and we don't check the return value of ibv_lock_range() in ibv_reg_mr(), so no harm is done. It is possible to test for MADV_DONTFORK support during libibverbs init and disable all madvise paths if it is not available, but then we will have two different configurations to test, for not much gain. -- Gleb.
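The startup probe discussed in the message above (try MADV_DONTFORK once and remember whether the kernel accepts it) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not libibverbs code: the helper name probe_madv_dontfork() is hypothetical, and it assumes a Linux system whose headers define MADV_DONTFORK and MADV_DOFORK.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANONYMOUS and MADV_DONTFORK in glibc headers */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libibverbs).
 * Returns  1 if the running kernel accepts MADV_DONTFORK,
 *          0 if madvise() rejects the advice with EINVAL (older kernel),
 *         -1 if the probe itself could not be carried out.
 */
static int probe_madv_dontfork(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf;
    int ret;

    if (page <= 0)
        return -1;

    /* A private anonymous page is page-aligned, as madvise() requires. */
    buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    if (madvise(buf, (size_t)page, MADV_DONTFORK) == 0) {
        /* Supported: undo the advice before the scratch page is released. */
        madvise(buf, (size_t)page, MADV_DOFORK);
        ret = 1;
    } else if (errno == EINVAL) {
        ret = 0;                 /* the kernel does not know this advice value */
    } else {
        ret = -1;                /* some other failure, treat as unknown */
    }

    munmap(buf, (size_t)page);
    return ret;
}
```

A library could call such a helper once at init and skip the madvise() calls when it returns 0. As noted in the message above, that leaves two configurations to test, which is the argument against doing it.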
From ogerlitz at voltaire.com Wed May 17 23:44:00 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 18 May 2006 09:44:00 +0300 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <446B1E84.9020505@voltaire.com> <446C0286.70708@voltaire.com> <446C0710.8090803@voltaire.com> Message-ID: <446C17B0.4010500@voltaire.com> Roland Dreier wrote: > Or> Sure, thanks for the clarification. As for the CMA merge, you > Or> prefer to be on the safe side and do it **before** and not > Or> **with** iSER? > > Yes, that's what I'm planning on. Sure, better safe than sorry is a good habit! It's just this short two-week time frame for three serialized pushes (iscsi && cma -> iser) that worries me a little; I guess there's nothing we can do about it. Or. From olson at unixfolk.com Thu May 18 00:04:28 2006 From: olson at unixfolk.com (Dave Olson) Date: Thu, 18 May 2006 00:04:28 -0700 (PDT) Subject: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes In-Reply-To: References: Message-ID: On Wed, 17 May 2006, Roland Dreier wrote: | I do see obvious bugs in ipath_user_pages.c, though. In | ipath_release_user_pages_on_close(), you have: | | mm = get_task_mm(current); | if (!mm) | goto bail; It turns out that since this is called from ipath_close(), mm will always be NULL, so what we do is leak memory, and possibly leave some locked pages. I've been looking at this code this evening; fixing it is clearly needed, but doesn't help the long delays, hangs, and watchdogs, so far.
Dave Olson olson at unixfolk.com http://www.unixfolk.com/dave From jackm at mellanox.co.il Thu May 18 00:14:57 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 18 May 2006 10:14:57 +0300 Subject: [openib-general] OpenIB 1.0 RC + PathScale problem In-Reply-To: <1147927964.3094.8.camel@localhost.localdomain> References: <1147927964.3094.8.camel@localhost.localdomain> Message-ID: <200605181014.58567.jackm@mellanox.co.il> On Thursday 18 May 2006 07:52, Bryan O'Sullivan wrote: > On Thu, 2006-05-18 at 00:11 -0400, Tim Miller wrote: > > But when I try to run my main application, I get the > > following error: > > > > libibverbs: Warning: couldn't load driver > > /usr/local/lib/infiniband/libipathverbs.so: > > /usr/local/lib/infiniband/libipathverbs.so: undefined symbol: > > ibv_cmd_poll_cq > > > > Does anyone know what might cause this error? > in openib 1.0 RC4, the function name was still ib_cmd_poll_cq. In the upcoming RC5, both names are supported. - Jack From eitan at mellanox.co.il Thu May 18 02:28:28 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 18 May 2006 12:28:28 +0300 Subject: [openib-general] [libsdp] RFC: Configuration file enhancements Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB0@mtlexch01.mtl.com> Hi All, I would like to get your comments for the following libsdp enhancements: 1. Today: If libsdp.config file does not exist or is not specifying any match rule no SDP socket is used Change: Under the above conditions all sockets will be SDP sockets (as if SIMPLE_LIBSDP was env set) 2. Today: No way to match all programs or ports and exclude a few Change: New "unmatch" directive will allow exclusion from SDP sockets. Will support same syntax as "match". Example simple config to map all programs to SDP excluding a single server by its name: unmatch program myLowPriorityServerProgName match program * 3. 
Today: "match_both" is not clearly described as applying to the passive side only, even though it has no meaning for the "active" side (since a connection is either on INET or SDP) Change: Error on cases where the user specified match_both destination ... 4. Today: If connect over SDP fails, an automatic fallback to an INET socket is performed Change: "match_fallback" should be used for active side rules when fallback is required. Moreover "match" will not fall back - i.e. if an SDP socket is required and fails - connect will return an error. Thanks Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL From zhushisongzhu at yahoo.com Thu May 18 03:53:06 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Thu, 18 May 2006 03:53:06 -0700 (PDT) Subject: [openib-general] OFED RC4 also can't support >2000 connections In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB9B@mtlexch01.mtl.com> Message-ID: <20060518105306.25457.qmail@web36905.mail.mud.yahoo.com> I modified squid so that it listens on an sdp socket directly. Client Side: SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 1000 -n 1000 -X 193.12.10.14:3129 http://www.google.com/index.html Server Side: ./squid -d 10 -f squid.conf (listening on http_port 193.12.10.14:3129, socket is opened in sdp way) The test results are: (1) no kernel panic (2) when the number of concurrent connections is 100, performance is not stable: sometimes it is OK, sometimes it is slower than TCP. (3) when connections are set to 1000, there are always 100~300 failed requests. (4) when connections are set to 2000, there are many "ib_sdp WARN: Failed to allocate buffer page" warnings (5) finally, when connections are set to 3000, a "Cannot allocate Memory(12)" error occurred and no connection to the server could be established any more. After that, SDP does not work again. 
I think there is some bugs, now I'm looking up the source code, But I'm not so familiar about the code,so I think I can't solve the problem fast. I hope someone who is responsible sdp development to test sdp intensively and correct the bug, then sdp will be some product grade and not just stay conceptual stage. tks zhu --- Eitan Zahavi wrote: > Hi Zhu, > > If you are using libsdp.conf to select which ports > should map to SDP and > which to TCP you might run out of resources for > tracking the opened > sockets. > > Try increasing the following constant in libsdp: > libsdp/src/port.c line 48: > #define MAPPED_SOCKET_MAX 1024 > to something like: > #define MAPPED_SOCKET_MAX 10000 > > Or, if you can use SDP sockets only (your config > file is empty anyway): > SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so squid -d 10 -f > squid.conf > SIMPLE_LIBSDP=1 LD_PRELOAD=libsdp.so ./lt-ab -c 2000 > -n 2000 -X > 193.12.10.14:3129 > > Hope this fixes the issue you see > > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: openib-general-bounces at openib.org > [mailto:openib-general- > > bounces at openib.org] On Behalf Of zhu shi song > > Sent: Wednesday, May 17, 2006 3:17 PM > > To: openib-general at openib.org > > Subject: [openib-general] OFED RC4 also can't > support >2000 > connections > > > > I have installed OFED RC4 on my RHEL 4.3(2.6.9-34 > > kernel). I use the same method I told in previous > > mail. When increasing concurrent sdp connection > to > > 2000. sdp refuse connection in server side. And > client > > can't connect to server through sdp connection > > forever. 
> > > > OS: RHEL 4.3 (2.6.9-34) > > IB: OFED RC4 > > Test Method: > > Server: LD_PRELOAD=libsdp.so squid -d 10 -f > > squid.conf( sdp listening on IB0: > 193.12.10.14:3129) > > Client: LD_PRELOAD=libsdp.so ./lt-ab -c 2000 > -n > > 2000 -X 193.12.10.14:3129 > > http://www.google.com/index.html ( IB0: > 193.12.10.24) > > > > > > Who know what's wrong with sdp many concurrent > > connections? I have bought the cards for about 3 > > weeks, but I can't make them work correctly. > Urgent! > > > > tks > > zhu > > > > > > > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > > protection around > > http://mail.yahoo.com > > > > __________________________________________________ > > Do You Yahoo!? > > Tired of spam? Yahoo! Mail has the best spam > protection around > > http://mail.yahoo.com > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From mst at mellanox.co.il Thu May 18 04:02:19 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 18 May 2006 14:02:19 +0300 Subject: [openib-general] Re: [libsdp] RFC: Configuration file enhancements In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB0@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB0@mtlexch01.mtl.com> Message-ID: <20060518110219.GP30211@mellanox.co.il> Quoting r. Eitan Zahavi : > 3. 
Today: "match_both" is not clearly described as applying to passive side only, even though it does > > not have a meaning for "active" side (since connection is either on INET or SDP) > > Change: Error on cases where the user specified match_both destination ... > > 4. Today: If connect over SDP fails an automatic fall back to INET socket is performed > > Change: "match_fallback" should be used for active side rules when fallback is required. Moreover > > "match" will not fallback - i.e. if SDP socket is required and fail - connect will return an error. > > Thanks IMO, unmatch, match_both, match_fallback are misleading names: you still do matching in the same way, you supply a modifier affecting SDP/TCP selection. How about we have an extra parameter to the match directive? It could be sdp, tcp, or both. Thus: match sdp listen *:12865 match tcp destination 192.169.2.0/24 # tcp only to this destination match both destination 192.168.1.0/24 # sdp with fallback -- MST From mst at mellanox.co.il Thu May 18 04:08:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 18 May 2006 14:08:56 +0300 Subject: [openib-general] Re: OFED RC4 also can't support >2000 connections In-Reply-To: <20060518105306.25457.qmail@web36905.mail.mud.yahoo.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BB9B@mtlexch01.mtl.com> <20060518105306.25457.qmail@web36905.mail.mud.yahoo.com> Message-ID: <20060518110856.GQ30211@mellanox.co.il> Quoting r. zhu shi song : > I hope someone who is responsible sdp development to test sdp intensively and > correct the bug, then sdp will be some product grade and not just stay > conceptual stage. Correct. SDP is not a product yet. I am working at it though so it will get there, stay tuned.
-- MST From halr at voltaire.com Thu May 18 04:21:19 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 May 2006 07:21:19 -0400 Subject: [openib-general] [PATCH] MVAPICH: Use memory routines directly rather than cl_mem* routines Message-ID: <1147951268.18971.73459.camel@hal.voltaire.com> MVAPICH: Use memory routines directly (cl_mem* routines were eliminated in OpenSM/complib) Use memory routines directly as these routines are part of ISO C Signed-off-by: Hal Rosenstock This patch is only currently required on the trunk only Index: mvapich-gen2/mpid/ch_gen2/ibmcgrp/ibmcgrp.c =================================================================== --- mvapich-gen2/mpid/ch_gen2/ibmcgrp/ibmcgrp.c (revision 7319) +++ mvapich-gen2/mpid/ch_gen2/ibmcgrp/ibmcgrp.c (working copy) @@ -100,7 +100,7 @@ ibmcgrp_init( IN ibmcgrp_t * const p_ibm ib_api_status_t status; /* just making sure - cleanup the static global obj */ - cl_memclr( p_ibmcgrp, sizeof( *p_ibmcgrp ) ); + memset( p_ibmcgrp, 0, sizeof( *p_ibmcgrp ) ); /* construct and init the log */ p_ibmcgrp->p_log = (osm_log_t *)cl_malloc(sizeof(osm_log_t)); @@ -311,20 +311,20 @@ ibmcgrp_run( IN ibmcgrp_t * p_ibmcgrp ) * * The query structures are locals. 
*/ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &context, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &context, 0, sizeof( context ) ); /* initialize some defaults on the MC grp request */ /* use default values so we can change only what we want later */ - cl_memclr(&mc_rec,sizeof(ib_member_rec_t)); + memset(&mc_rec, 0, sizeof(ib_member_rec_t)); /* Use the MGID provided */ - cl_memcpy(&mc_rec.mgid, &(p_ibmcgrp->p_opt->mgid), sizeof(ib_gid_t)); + memcpy(&mc_rec.mgid, &(p_ibmcgrp->p_opt->mgid), sizeof(ib_gid_t)); /* our own port gid - as stored in the main object */ - cl_memcpy(&mc_rec.port_gid.unicast.interface_id , + memcpy(&mc_rec.port_gid.unicast.interface_id, &p_ibmcgrp->port_guid, sizeof(p_ibmcgrp->port_guid) ); Index: mvapich-gen2/mpid/ch_gen2/ibmcgrp.c =================================================================== --- mvapich-gen2/mpid/ch_gen2/ibmcgrp.c (revision 7319) +++ mvapich-gen2/mpid/ch_gen2/ibmcgrp.c (working copy) @@ -104,7 +104,7 @@ ibmcgrp_init( IN ibmcgrp_t * const p_ibm ib_api_status_t status; /* just making sure - cleanup the static global obj */ - cl_memclr( p_ibmcgrp, sizeof( *p_ibmcgrp ) ); + memset( p_ibmcgrp, 0, sizeof( *p_ibmcgrp ) ); /* construct and init the log */ p_ibmcgrp->p_log = (osm_log_t *)cl_malloc(sizeof(osm_log_t)); @@ -315,20 +315,20 @@ ibmcgrp_run( IN ibmcgrp_t * p_ibmcgrp ) * * The query structures are locals. 
*/ - cl_memclr( &req, sizeof( req ) ); - cl_memclr( &user, sizeof( user ) ); - cl_memclr( &context, sizeof( context ) ); + memset( &req, 0, sizeof( req ) ); + memset( &user, 0, sizeof( user ) ); + memset( &context, 0, sizeof( context ) ); /* initialize some defaults on the MC grp request */ /* use default values so we can change only what we want later */ - cl_memclr(&mc_rec,sizeof(ib_member_rec_t)); + memset(&mc_rec, 0, sizeof(ib_member_rec_t)); /* Use the MGID provided */ - cl_memcpy(&mc_rec.mgid, &(p_ibmcgrp->p_opt->mgid), sizeof(ib_gid_t)); + memcpy(&mc_rec.mgid, &(p_ibmcgrp->p_opt->mgid), sizeof(ib_gid_t)); /* our own port gid - as stored in the main object */ - cl_memcpy(&mc_rec.port_gid.unicast.interface_id , + memcpy(&mc_rec.port_gid.unicast.interface_id, &p_ibmcgrp->port_guid, sizeof(p_ibmcgrp->port_guid) ); From dotanb at mellanox.co.il Thu May 18 04:42:11 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Thu, 18 May 2006 14:42:11 +0300 Subject: [openib-general] Re: compilation warning in diags tools In-Reply-To: <1147883048.18971.50642.camel@hal.voltaire.com> References: <200605171847.27448.dotanb@mellanox.co.il> <1147883048.18971.50642.camel@hal.voltaire.com> Message-ID: <200605181442.11489.dotanb@mellanox.co.il> On Wednesday 17 May 2006 19:24, Hal Rosenstock wrote: > On Wed, 2006-05-17 at 11:47, Dotan Barak wrote: > > Hi. 
> > > > Here is a compilation warning when using gcc 3.4.5: > > > > src/grouping.c: In function `get_router_slot': > > src/grouping.c:213: warning: implicit declaration of function `calloc' > > /bin/sh ./libtool --tag=CC --mode=link gcc -m64 -L../libibcommon -libcommon -L../libibumad -libumad -L../osm/opensm/.libs -lopensm -L../os > > m/libvendor/.libs -losmvendor -L../osm/complib/.libs -losmcomp -o src/ibnetdiscover src_ibnetdiscover-ibnetdiscover.o src_ibnetdiscover-gro > > uping.o ../libibcommon/libibcommon.la ../libibumad/libibumad.la ../libibmad/libibmad.la > > > > (i think that stdlib.h should be included to prevent this warning) > > Fixed in r7290. Can you update and try to be sure ? Thanks. it seems that this issue was solved. thanks Dotan From halr at voltaire.com Thu May 18 04:37:29 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 May 2006 07:37:29 -0400 Subject: [openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines In-Reply-To: <20060517220733.GD14211@mellanox.co.il> References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517171146.GB12290@mellanox.co.il> <1147886554.18971.52002.camel@hal.voltaire.com> <20060517220733.GD14211@mellanox.co.il> Message-ID: <1147952022.18971.73688.camel@hal.voltaire.com> On Wed, 2006-05-17 at 18:07, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Right now, there is memory > > tracking code implemented. > > Doesn't MALLOC_CHECK_ do what you want? Yes, this looks good to me. Just a few questions: Is this in all glibc's that we would care about ? I'm also not sure about the Windows implications here. Maybe Eitan can comment. -- Hal From mst at mellanox.co.il Thu May 18 04:50:12 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 18 May 2006 14:50:12 +0300 Subject: [openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines In-Reply-To: <1147952022.18971.73688.camel@hal.voltaire.com> References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517171146.GB12290@mellanox.co.il> <1147886554.18971.52002.camel@hal.voltaire.com> <20060517220733.GD14211@mellanox.co.il> <1147952022.18971.73688.camel@hal.voltaire.com> Message-ID: <20060518115012.GT30211@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines > > On Wed, 2006-05-17 at 18:07, Michael S. Tsirkin wrote: > > Quoting r. Hal Rosenstock : > > > Right now, there is memory > > > tracking code implemented. > > > > Doesn't MALLOC_CHECK_ do what you want? > > Yes, this looks good to me. Just a few questions: > > Is this in all glibc's that we would care about ? I think it's been there since the dawn of time :) > I'm also not sure about the Windows implications here. Maybe Eitan can > comment. MSDN says: "When the application is linked with a debug version of the C run-time libraries, malloc resolves to _malloc_dbg. For more information about how the heap is managed during the debugging process, see Using C Run-Time Library Debugging Support." -- MST From halr at voltaire.com Thu May 18 04:50:00 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 May 2006 07:50:00 -0400 Subject: [openib-general] Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines In-Reply-To: <20060518115012.GT30211@mellanox.co.il> References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517171146.GB12290@mellanox.co.il> <1147886554.18971.52002.camel@hal.voltaire.com> <20060517220733.GD14211@mellanox.co.il> <1147952022.18971.73688.camel@hal.voltaire.com> <20060518115012.GT30211@mellanox.co.il> Message-ID: <1147952996.18971.73976.camel@hal.voltaire.com> On Thu, 2006-05-18 at 07:50, Michael S. 
Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines > > > > On Wed, 2006-05-17 at 18:07, Michael S. Tsirkin wrote: > > > Quoting r. Hal Rosenstock : > > > > Right now, there is memory > > > > tracking code implemented. > > > > > > Doesn't MALLOC_CHECK_ do what you want? > > > > Yes, this looks good to me. Just a few questions: > > > > Is this in all glibc's that we would care about ? > > I think it's been there since the dawn of time :) > > > I'm also not sure about the Windows implications here. Maybe Eitan can > > comment. > > MSDN says: > > "When the application is linked with a debug version of the C run-time > libraries, malloc resolves to _malloc_dbg. For more information about how the > heap is managed during the debugging process, see Using C Run-Time Library > Debugging Support." Sounds good to me. So any final objections to eliminating cl_malloc/free and all the memory tracking code in OpenSM ? Speak now... -- Hal From svenar at simula.no Thu May 18 05:11:07 2006 From: svenar at simula.no (Sven-Arne Reinemo) Date: Thu, 18 May 2006 14:11:07 +0200 Subject: [openib-general] Problems running IBMgtSim with OpenSM Message-ID: <446C645B.3030206@simula.no> Hi, I am trying to get the IBMgtSim running with OpenSM, but I only get an "Unable to find requested CA guid 0x2c90000000009" error (complete dump included below). Any suggestions to what might be wrong? I don't have any IBA hardware in the machine I'm running the simulator on as I thought it was unnecessary (?). The code is a recent checkout from . 
Best regards, Sven-Arne svenar at bricoleur:~/tmp$ sudo RunSimTest -o /usr/local/bin/opensm -t /home/svenar/openib/utils/src/linux-user/ibmgtsim/tests/Gnu16NodeOsmTest.topo Password: -I- Using random seed:53879 -I- Simulation directory is: /tmp/ibmgtsim.26852 -I- Calling IBMgtSim -s 53879 -V 0xA3 -t /home/svenar/openib/utils/src/linux-user/ibmgtsim/tests/Gnu16NodeOsmTest.topo -l /tmp/ibmgtsim.26852/sim.log -I- Simulator Ready -I- Connecting to the simulator control server:bricoleur.simula.no port:12450 -I- Connected to the simulator control server -I- Defined 51 guids -I- Node H-1 data: 0x0002c90000000008 {0x0002c90000000009 1} {0x0002c9000000000a 2} -I- Starting: /usr/local/bin/opensm -D 0x43 -g 0x0002c90000000009 ... -I- Waiting for OpenSM subnet up ... -I- OpenSM Event:ERR May 18 13:57:45 161191 [B7E606C0] -> osm_vendor_open_port: ERR 5422: Unable to find requested CA guid 0x2c90000000009 -I- New 1 events of /tmp/ibmgtsim.26852/osm.log -I- OpenSM Event:ERR May 18 13:57:45 161261 [B7E606C0] -> osm_vendor_bind: ERR 5424: Unable to Open Port 0x2c90000000009 -I- New 1 events of /tmp/ibmgtsim.26852/osm.log -I- OpenSM Event:ERR May 18 13:57:45 161289 [B7E606C0] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed -I- New 1 events of /tmp/ibmgtsim.26852/osm.log -I- OpenSM Event:ERR May 18 13:57:45 161320 [B7E606C0] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR) -I- New 1 events of /tmp/ibmgtsim.26852/osm.log -I- OpenSM Event:ERR May 18 13:57:45 161370 [B7E606C0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind -I- New 1 events of /tmp/ibmgtsim.26852/osm.log -I- OpenSM with log:/tmp/ibmgtsim.26852/osm.log died -I- New 1 events of /tmp/ibmgtsim.26852/osm.log -I- Closing SIM ... -I- Closing Sub Processes ... -I- Closing Pipes ... 
-I- Status = 1 -I- Simulation dir left intact:/tmp/ibmgtsim.26852 svenar at bricoleur:~/tmp$ -- SAR ---- GnuPG public key - http://home.ifi.uio.no/~svenar/gpg.asc ---- "There are only 10 kinds of people in this world; those who know binary and those who don't." -- Unknown From halr at voltaire.com Thu May 18 05:52:51 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 May 2006 08:52:51 -0400 Subject: [openib-general] Problems running IBMgtSim with OpenSM In-Reply-To: <446C645B.3030206@simula.no> References: <446C645B.3030206@simula.no> Message-ID: <1147956768.18971.75147.camel@hal.voltaire.com> On Thu, 2006-05-18 at 08:11, Sven-Arne Reinemo wrote: > Hi, > > I am trying to get the IBMgtSim running with OpenSM, I haven't used the simulator but I know OpenSM needs to be built specially for use with the simulator. Was that done (How was the OpenSM build invoked) ? -- Hal > but I only get an > "Unable to find requested CA guid 0x2c90000000009" error (complete dump > included below). Any suggestions to what might be wrong? 
> > I don't have any IBA hardware in the machine I'm running the simulator > on as I thought it was unnecessary (?). The code is a recent checkout > from . > > Best regards, > Sven-Arne > > svenar at bricoleur:~/tmp$ sudo RunSimTest -o /usr/local/bin/opensm -t > /home/svenar/openib/utils/src/linux-user/ibmgtsim/tests/Gnu16NodeOsmTest.topo > Password: > -I- Using random seed:53879 > -I- Simulation directory is: /tmp/ibmgtsim.26852 > -I- Calling IBMgtSim -s 53879 -V 0xA3 -t > /home/svenar/openib/utils/src/linux-user/ibmgtsim/tests/Gnu16NodeOsmTest.topo > -l /tmp/ibmgtsim.26852/sim.log > -I- Simulator Ready > -I- Connecting to the simulator control server:bricoleur.simula.no > port:12450 > -I- Connected to the simulator control server > -I- Defined 51 guids > -I- Node H-1 data: 0x0002c90000000008 {0x0002c90000000009 1} > {0x0002c9000000000a 2} > -I- Starting: /usr/local/bin/opensm -D 0x43 -g 0x0002c90000000009 ... > -I- Waiting for OpenSM subnet up ... > -I- OpenSM Event:ERR May 18 13:57:45 161191 [B7E606C0] -> > osm_vendor_open_port: ERR 5422: Unable to find requested CA guid > 0x2c90000000009 > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM Event:ERR May 18 13:57:45 161261 [B7E606C0] -> > osm_vendor_bind: ERR 5424: Unable to Open Port 0x2c90000000009 > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM Event:ERR May 18 13:57:45 161289 [B7E606C0] -> > osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM Event:ERR May 18 13:57:45 161320 [B7E606C0] -> osm_sm_bind: > ERR 2E10: SM MAD Controller bind failed (IB_ERROR) > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM Event:ERR May 18 13:57:45 161370 [B7E606C0] -> > osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM with log:/tmp/ibmgtsim.26852/osm.log died > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- Closing SIM ... 
> -I- Closing Sub Processes ... > -I- Closing Pipes ... > -I- Status = 1 > -I- Simulation dir left intact:/tmp/ibmgtsim.26852 > svenar at bricoleur:~/tmp$ From mdidomenico at silverstorm.com Thu May 18 06:11:24 2006 From: mdidomenico at silverstorm.com (Di Domenico, Michael) Date: Thu, 18 May 2006 09:11:24 -0400 Subject: [openib-general] OpenIB 1.0 RC + PathScale problem Message-ID: Tim, I'm certainly no expert, but I came across different but similar issues, where my applications where picking up another set of libraries, that I wasn't aware were on the system... I was getting the same 'undefined symbols' errors. You might want to check for ib libs that might be in your path. Just my 2 cents... > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Tim Miller > Sent: Thursday, May 18, 2006 12:11 AM > To: openib-general at openib.org > Subject: [openib-general] OpenIB 1.0 RC + PathScale problem > > Hi All, > > I'm trying to test the 1.0 RC branch from subversion with PathScale > InfiniPath HT-460 (I've used previous versions with some success). The > code compiles successfully, and I can run ibv_rc_pingpong and even a > simple MPI program. But when I try to run my main application, I get the > following error: > > libibverbs: Warning: couldn't load driver > /usr/local/lib/infiniband/libipathverbs.so: > /usr/local/lib/infiniband/libipathverbs.so: undefined symbol: > ibv_cmd_poll_cq > > Does anyone know what might cause this error? I ran an nm on > libipathverbs.so and saw ibv_cmd_poll_cq and I found it in the libibverbs > source, too, so I'm a bit confused about what the root cause of this is. > > My apologies if this has already been raised. I took a quick look in the > archives and did not see anything off hand that matches this. > > Thanks, > Tim M. > > -- > Tim Miller > System Administrator -- Laboratory of Computational Biology > National Institutes of Health -- Bldg. 50 Rm. 
3309 -- 301-402- > 0618 > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general From halr at voltaire.com Thu May 18 06:12:15 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 May 2006 09:12:15 -0400 Subject: [openib-general] Re: [PATCH] Replace cl_memory.h by string.h [was: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines] In-Reply-To: <20060517220249.GC28485@sashak.voltaire.com> References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517220249.GC28485@sashak.voltaire.com> Message-ID: <1147956950.18971.75189.camel@hal.voltaire.com> On Wed, 2006-05-17 at 18:02, Sasha Khapyorsky wrote: > On 12:14 Wed 17 May , Hal Rosenstock wrote: > > OpenSM: Use memory routines directly and eliminate cl_mem* routines > > as these routines are part of ISO C > > > > Signed-off-by: Hal Rosenstock > > Following Hal's cleanup this includes string.h header file for proper > mem*() functions prototype definitions where necessary, removes/includes > cl_memory.h as needed. Also couple of unistd.h additions for close(), > sleep() and unlink() calls. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. -- Hal From zhushisongzhu at yahoo.com Thu May 18 06:33:25 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Thu, 18 May 2006 06:33:25 -0700 (PDT) Subject: [openib-general] Re: OFED RC4 also can't support >2000 connections In-Reply-To: <20060518110856.GQ30211@mellanox.co.il> Message-ID: <20060518133325.58723.qmail@web36910.mail.mud.yahoo.com> Do you have any time table for sdp? My project is urgent to use sdp. If it's too late, I'll think another method. Because SDP has reliable connection semantics, it's very useful for such as web, ftp, https general applications. 
So infiband can easily extend to new application area except cluster computing customized applications. But unfortunately it can't work. I hope you can speed your development process. I'll try my best to help you. I'm studying the source code now. tks zhu --- "Michael S. Tsirkin" wrote: > Quoting r. zhu shi song : > > I hope someone who is responsible sdp development > to test sdp intensively and > > correct the bug, then sdp will be some product > grade and not just stay > > conceptual stage. > > Correct. SDP is not a product yet. I am working at > it though so it will get > there, stay tuned. > > -- > MST > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From mst at mellanox.co.il Thu May 18 06:36:27 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 18 May 2006 16:36:27 +0300 Subject: [openib-general] [PATCH] cma: fix bind to ip Message-ID: <20060518133627.GW30211@mellanox.co.il> Fix private data format for bind to specific IP for SDP. Further, CMA format mask for IPv6 was set incorrectly (hint - memset(foo, 1, bar) does not set memory to all-ones) so fix that. Signed-off-by: Ali Ayoub Signed-off-by: Michael S.
Tsirkin Index: linux-2.6.16/drivers/infiniband/core/cma.c =================================================================== --- linux-2.6.16.orig/drivers/infiniband/core/cma.c 2006-05-18 13:36:23.000000000 +0300 +++ linux-2.6.16/drivers/infiniband/core/cma.c 2006-05-18 13:47:59.000000000 +0300 @@ -930,29 +930,50 @@ static __be64 cma_get_service_id(enum rd be16_to_cpu(((struct sockaddr_in *) addr)->sin_port)); } -static void cma_set_compare_data(struct sockaddr *addr, +static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr, struct ib_cm_compare_data *compare) { - struct cma_hdr *data, *mask; + struct cma_hdr *cma_data, *cma_mask; + struct sdp_hh *sdp_data, *sdp_mask; + __u32 ip4_addr; + struct in6_addr ip6_addr; memset(compare, 0, sizeof *compare); - data = (void *) compare->data; - mask = (void *) compare->mask; + cma_data = (void *)compare->data; + cma_mask = (void *)compare->mask; + sdp_data = (void *)compare->data; + sdp_mask = (void *)compare->mask; switch (addr->sa_family) { case AF_INET: - cma_set_ip_ver(data, 4); - cma_set_ip_ver(mask, 0xF); - data->dst_addr.ip4.addr = ((struct sockaddr_in *) addr)-> - sin_addr.s_addr; - mask->dst_addr.ip4.addr = ~0; + ip4_addr = ((struct sockaddr_in *)addr)->sin_addr.s_addr; + if (ps == RDMA_PS_SDP) { + sdp_set_ip_ver(sdp_data, 4); + sdp_set_ip_ver(sdp_mask, 0xF); + sdp_data->dst_addr.ip4.addr = ip4_addr; + sdp_mask->dst_addr.ip4.addr = ~0; + } else { + cma_set_ip_ver(cma_data, 4); + cma_set_ip_ver(cma_mask, 0xF); + cma_data->dst_addr.ip4.addr = ip4_addr; + cma_mask->dst_addr.ip4.addr = ~0; + } break; case AF_INET6: - cma_set_ip_ver(data, 6); - cma_set_ip_ver(mask, 0xF); - data->dst_addr.ip6 = ((struct sockaddr_in6 *) addr)-> - sin6_addr; - memset(&mask->dst_addr.ip6, 1, sizeof mask->dst_addr.ip6); + ip6_addr = ((struct sockaddr_in6 *)addr)->sin6_addr; + if (ps == RDMA_PS_SDP) { + sdp_set_ip_ver(sdp_data, 6); + sdp_set_ip_ver(sdp_mask, 0xF); + sdp_data->dst_addr.ip6 = ip6_addr; + 
memset(&sdp_mask->dst_addr.ip6, 0xFF, + sizeof sdp_mask->dst_addr.ip6); + } else { + cma_set_ip_ver(cma_data, 6); + cma_set_ip_ver(cma_mask, 0xF); + cma_data->dst_addr.ip6 = ip6_addr; + memset(&cma_mask->dst_addr.ip6, 0xFF, + sizeof cma_mask->dst_addr.ip6); + } break; default: break; @@ -976,7 +997,7 @@ static int cma_ib_listen(struct rdma_id_ if (cma_any_addr(addr)) ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL); else { - cma_set_compare_data(addr, &compare_data); + cma_set_compare_data(id_priv->id.ps, addr, &compare_data); ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, &compare_data); } -- MST From zhushisongzhu at yahoo.com Thu May 18 06:38:33 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Thu, 18 May 2006 06:38:33 -0700 (PDT) Subject: [openib-general] Re: OFED RC4 also can't support >2000 connections In-Reply-To: <20060518110856.GQ30211@mellanox.co.il> Message-ID: <20060518133833.10118.qmail@web36903.mail.mud.yahoo.com> Also I found sdp has memory leak problem. After Client connect Server for several thousands times, just 2.5% of system memory is free. SDP exhausted the whole memory at all. tks zhu --- "Michael S. Tsirkin" wrote: > Quoting r. zhu shi song : > > I hope someone who is responsible sdp development > to test sdp intensively and > > correct the bug, then sdp will be some product > grade and not just stay > > conceptual stage. > > Correct. SDP is not a product yet. I am working at > it though so it will get > there, stay tuned. > > -- > MST > From mst at mellanox.co.il Thu May 18 06:42:31 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 18 May 2006 16:42:31 +0300 Subject: [openib-general] Re: OFED RC4 also can't support >2000 connections In-Reply-To: <20060518133833.10118.qmail@web36903.mail.mud.yahoo.com> References: <20060518110856.GQ30211@mellanox.co.il> <20060518133833.10118.qmail@web36903.mail.mud.yahoo.com> Message-ID: <20060518134231.GY30211@mellanox.co.il> Quoting r. zhu shi song : > Subject: Re: OFED RC4 also can't support >2000 connections > > Also I found sdp has memory leak problem. After > Client connect Server for several thousands times, > just 2.5% of system memory is free. SDP exhausted the > whole memory at all. > > tks > zhu this isn't svn trunk, is it? -- MST From halr at voltaire.com Thu May 18 06:42:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 May 2006 09:42:07 -0400 Subject: [openib-general] Re: [PATCH] opensm: remove unused cl_memory_osd.h [was: [PATCH] OpenSM: Use memory routines directly and eliminate cl_mem* routines] In-Reply-To: <20060517225054.GF28485@sashak.voltaire.com> References: <1147882436.18971.50423.camel@hal.voltaire.com> <20060517220249.GC28485@sashak.voltaire.com> <20060517225054.GF28485@sashak.voltaire.com> Message-ID: <1147959723.18971.76054.camel@hal.voltaire.com> On Wed, 2006-05-17 at 18:50, Sasha Khapyorsky wrote: > On 01:02 Thu 18 May , Sasha Khapyorsky wrote: > > On 12:14 Wed 17 May , Hal Rosenstock wrote: > > > OpenSM: Use memory routines directly and eliminate cl_mem* routines > > > as these routines are part of ISO C > > > > > > Signed-off-by: Hal Rosenstock > > > > Following Hal's cleanup > > And more: > > This removes unused cl_memory_osd.h file from complib > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. -- Hal From jlentini at netapp.com Thu May 18 07:37:26 2006 From: jlentini at netapp.com (James Lentini) Date: Thu, 18 May 2006 10:37:26 -0400 (EDT) Subject: [openib-general] remove mutex-backport.h? 
Message-ID: I just noticed that svn/gen2/trunk/src/linux-kernel/infiniband/include/linux/mutex-backport.h is still hanging around. Any reason it hasn't been deleted yet? james From tziporet at mellanox.co.il Thu May 18 07:44:23 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 18 May 2006 17:44:23 +0300 Subject: [openib-general] Re: OFED RC4 also can't support >2000 connections In-Reply-To: <20060518133325.58723.qmail@web36910.mail.mud.yahoo.com> References: <20060518133325.58723.qmail@web36910.mail.mud.yahoo.com> Message-ID: <446C8847.4050601@mellanox.co.il> zhu shi song wrote: > Do you have any time table for sdp? My project is > urgent to use sdp. If it's too late, I'll think > another method. Because SDP has reliable connection > semantics, it's very useful for such as web, ftp, > https general applications. So infiband can easily > extend to new application area except cluster > computing customized applications. But unfortunately > it can't work. I hope you can speed your development > process. I'll try my best to help you. I'm studying > the source code now. > tks > zhu > > We expect it to be robust next month (June) Tziporet From eitan at mellanox.co.il Thu May 18 07:53:34 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 18 May 2006 17:53:34 +0300 Subject: [openib-general] Problems running IBMgtSim with OpenSM Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB2@mtlexch01.mtl.com> Hi Sven, I'm glad to hear you are interested in the simulator. In order to connect OpenSM to the simulator you need to build it with special flags. The best way to do it is: 1. Build the simulator code 2. Checkout the management/osm tree (lets say to ~/osm); then cd ~/osm; ./autogen.sh 3. Go somewhere (like /tmp/osm_sim) ~/osm/configure --with-osmv=sim --prefix=/tmp/osm_sim 4. make ; make install 5. run the simulator code as you did. But you do not need any sudo - the simulation code can be run on the user account. 
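Steps 2 to 4 above can be sketched as a small shell script. This is only an illustration under stated assumptions: the paths `~/osm` and `/tmp/osm_sim` are the examples given in the message, the `mkdir -p` call and the `DRY_RUN`, `OSM_SRC` and `BUILD_DIR` variables are hypothetical conveniences that do not appear in the original instructions, and by default the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of steps 2-4: build OpenSM against the simulator vendor layer.
# OSM_SRC and BUILD_DIR default to the example paths from the message.
# DRY_RUN=1 (the default, an assumption of this sketch) only prints commands.
OSM_SRC="${OSM_SRC:-$HOME/osm}"
BUILD_DIR="${BUILD_DIR:-/tmp/osm_sim}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it in this shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_osm_for_sim() {
    # Step 2: generate the configure script in the checked-out osm tree.
    run cd "$OSM_SRC" && run ./autogen.sh || return 1
    # Step 3: configure from a separate build directory with the sim vendor.
    run mkdir -p "$BUILD_DIR" || return 1
    run cd "$BUILD_DIR" || return 1
    run "$OSM_SRC/configure" --with-osmv=sim --prefix="$BUILD_DIR" || return 1
    # Step 4: build and install under the chosen prefix.
    run make && run make install
}

build_osm_for_sim
```

Setting `DRY_RUN=0` would execute the commands for real. Step 1 (building the simulator) and step 5 (running it) are left out because the message does not spell out their exact commands.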
Eitan Zahavi > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Sven-Arne Reinemo > Sent: Thursday, May 18, 2006 3:11 PM > To: openib-general at openib.org > Subject: [openib-general] Problems running IBMgtSim with OpenSM > > Hi, > > I am trying to get the IBMgtSim running with OpenSM, but I only get an > "Unable to find requested CA guid 0x2c90000000009" error (complete dump > included below). Any suggestions to what might be wrong? > > I don't have any IBA hardware in the machine I'm running the simulator > on as I thought it was unnecessary (?). The code is a recent checkout > from . > > Best regards, > Sven-Arne > > svenar at bricoleur:~/tmp$ sudo RunSimTest -o /usr/local/bin/opensm -t > /home/svenar/openib/utils/src/linux-user/ibmgtsim/tests/Gnu16NodeOsmTest .topo > Password: > -I- Using random seed:53879 > -I- Simulation directory is: /tmp/ibmgtsim.26852 > -I- Calling IBMgtSim -s 53879 -V 0xA3 -t > /home/svenar/openib/utils/src/linux-user/ibmgtsim/tests/Gnu16NodeOsmTest .topo > -l /tmp/ibmgtsim.26852/sim.log > -I- Simulator Ready > -I- Connecting to the simulator control server:bricoleur.simula.no > port:12450 > -I- Connected to the simulator control server > -I- Defined 51 guids > -I- Node H-1 data: 0x0002c90000000008 {0x0002c90000000009 1} > {0x0002c9000000000a 2} > -I- Starting: /usr/local/bin/opensm -D 0x43 -g 0x0002c90000000009 ... > -I- Waiting for OpenSM subnet up ... 
> -I- OpenSM Event:ERR May 18 13:57:45 161191 [B7E606C0] -> > osm_vendor_open_port: ERR 5422: Unable to find requested CA guid > 0x2c90000000009 > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM Event:ERR May 18 13:57:45 161261 [B7E606C0] -> > osm_vendor_bind: ERR 5424: Unable to Open Port 0x2c90000000009 > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM Event:ERR May 18 13:57:45 161289 [B7E606C0] -> > osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM Event:ERR May 18 13:57:45 161320 [B7E606C0] -> osm_sm_bind: > ERR 2E10: SM MAD Controller bind failed (IB_ERROR) > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM Event:ERR May 18 13:57:45 161370 [B7E606C0] -> > osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- OpenSM with log:/tmp/ibmgtsim.26852/osm.log died > -I- New 1 events of /tmp/ibmgtsim.26852/osm.log > -I- Closing SIM ... > -I- Closing Sub Processes ... > -I- Closing Pipes ... > -I- Status = 1 > -I- Simulation dir left intact:/tmp/ibmgtsim.26852 > svenar at bricoleur:~/tmp$ > > -- > SAR > ---- GnuPG public key - http://home.ifi.uio.no/~svenar/gpg.asc ---- > "There are only 10 kinds of people in this world; those who know > binary and those who don't." 
> -- Unknown > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Thu May 18 07:50:31 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 18 May 2006 07:50:31 -0700 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <446B1E84.9020505@voltaire.com> <446C0286.70708@voltaire.com> <446C0710.8090803@voltaire.com> <446C17B0.4010500@voltaire.com> Message-ID: Or> Sure, better safe than sorry is good habit! its just this two Or> weeks short time frame for three (iscsi && cma -> iser) Or> serialized pushes which worries me a little, i guess there's Or> nothing we can do about it. It's not really a short window at all -- I would be surprised if it even took more than 3 days to merge everything. All the maintainers with git trees generally send Linus a pull request on the first day of the merge window. Anyway the only thing that iser is really serialized against is the iscsi merge. I will send Linus a request to pull iser as soon as he has pulled James's tree. If Linus has not pulled my for-2.6.18 tree yet, then he'll just get a bigger merge including the CMA etc. - R. From leonida at voltaire.com Thu May 18 07:48:10 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Thu, 18 May 2006 17:48:10 +0300 Subject: [openib-general][RFC][PATCH] mthca: HCA initialization parameters Message-ID: <20060518144810.GA9756@voltaire.com> Hello, we need a capability to change the HCA parameters, in order to tune its resources. 
There is a special structure 'mthca_profile' in the mthca driver, used during HCA initialization, that determines HCA initialization parameters such as the maximum number of QPs, CQs, address vectors etc. Unfortunately, the parameters cannot be defined outside the driver. The attached patch implements a number of module parameters that allow the 'mthca_profile' values to be defined. Comments, corrections and suggestions are welcome. Signed-off-by: Leonid Arsh --- openib-1.0/src/linux-kernel/infiniband/hw/mthca/mthca_main.c.orig 2006-05-15 21:27:59.000000000 +0300 +++ openib-1.0/src/linux-kernel/infiniband/hw/mthca/mthca_main.c 2006-05-17 22:10:35.000000000 +0300 @@ -81,6 +81,53 @@ module_param(tune_pci, int, 0444); MODULE_PARM_DESC(tune_pci, "increase PCI burst from the default set by BIOS if nonzero"); +static int num_qp = 0; +module_param(num_qp, int, 0444); +MODULE_PARM_DESC(num_qp, "Maximum number of QPs available per HCA"); + +static int rdb_per_qp = 0; +module_param(rdb_per_qp, int, 0444); +MODULE_PARM_DESC(rdb_per_qp, "Number of RDB buffers per QP"); + +static int num_srq = 0; +module_param(num_srq, int, 0444); +MODULE_PARM_DESC(num_srq, "Maximum number of Shared Receive Queues per HCA "); + +static int num_cq = 0; +module_param(num_cq, int, 0444); +MODULE_PARM_DESC(num_cq, "Maximum number of CQs per HCA"); + +static int num_mcg = 0; +module_param(num_mcg, int, 0444); +MODULE_PARM_DESC(num_mcg, "Maximum number of Multicast groups per HCA"); + +static int num_mpt = 0; +module_param(num_mpt, int, 0444); +MODULE_PARM_DESC(num_mpt, + "Maximum number of Memory Protection Table entries per HCA"); + +static int num_mtt = 0; +module_param(num_mtt, int, 0444); +MODULE_PARM_DESC(num_mtt, + "Maximum number of Memory Translation table segments per HCA"); + +static int num_udav = 0; +module_param(num_udav, int, 0444); +MODULE_PARM_DESC(num_udav, "Maximum number of UD Address Vectors per HCA"); + +static int num_uar = 0; +module_param(num_uar, int, 0444);
+MODULE_PARM_DESC(num_uar, "Maximum number of User Access Regions per HCA"); + +static int fmr_reserved_mtts = 0; +module_param(fmr_reserved_mtts, int, 0444); +MODULE_PARM_DESC(fmr_reserved_mtts, + "Number of Memory Translation table segments reserved for FMR"); + +static int uarc_size = 0; +module_param(uarc_size, int, 0444); +MODULE_PARM_DESC(uarc_size, "User Access Region Context size"); + static const char mthca_version[] __devinitdata = DRV_NAME ": Mellanox InfiniBand HCA driver v" DRV_VERSION " (" DRV_RELDATE ")\n"; @@ -97,6 +144,22 @@ .uarc_size = 1 << 18, /* Arbel only */ }; +static void __devinit mthca_setup_profile(struct mthca_profile *profile) +{ + + if(num_qp) profile->num_qp = num_qp; + if(rdb_per_qp) profile->rdb_per_qp = rdb_per_qp; + if(num_srq) profile->num_srq = num_srq; + if(num_cq) profile->num_cq = num_cq; + if(num_mcg) profile->num_mcg = num_mcg; + if(num_mpt) profile->num_mpt = num_mpt; + if(num_mtt) profile->num_mtt = num_mtt; + if(num_udav) profile->num_udav = num_udav; + if(num_uar) profile->num_uar = num_uar; + if(uarc_size) profile->uarc_size = uarc_size; + if(fmr_reserved_mtts) profile->fmr_reserved_mtts = fmr_reserved_mtts; + +} static int __devinit mthca_tune_pci(struct mthca_dev *mdev) { int cap; @@ -994,6 +1057,8 @@ printk(KERN_INFO PFX "Initializing %s\n", pci_name(pdev)); + mthca_setup_profile(&default_profile); + if (id->driver_data >= ARRAY_SIZE(mthca_hca_table)) { printk(KERN_ERR PFX "%s has invalid driver data %lx\n", pci_name(pdev), id->driver_data); From eitan at mellanox.co.il Thu May 18 08:00:15 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 18 May 2006 18:00:15 +0300 Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h [was:[PATCH] OpenSM: Use memory routines directly and eliminatecl_mem* routines] Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB4@mtlexch01.mtl.com> Hi Sasha, Hal, There are several applications (ibis and ibmgtsim) that depend on complib. The changes of cleaning up the cl_memory API
affect these utilities. Can you please provide the list of APIs removed and their replacements ? Also if we eventually converge on a single complib for windows and linux then the Windows stack is going to be affected by these changes too. EZ Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Sasha Khapyorsky > Sent: Thursday, May 18, 2006 1:03 AM > To: Hal Rosenstock > Cc: openib-general at openib.org > Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h [was:[PATCH] > OpenSM: Use memory routines directly and eliminatecl_mem* routines] > > On 12:14 Wed 17 May , Hal Rosenstock wrote: > > OpenSM: Use memory routines directly and eliminate cl_mem* routines > > as these routines are part of ISO C > > > > Signed-off-by: Hal Rosenstock > > Following Hal's cleanup this includes string.h header file for proper > mem*() functions prototype definitions where necessary, removes/includes > cl_memory.h as needed. Also couple of unistd.h additions for close(), > sleep() and unlink() calls. 
> > Signed-off-by: Sasha Khapyorsky > > > --- > > osm/complib/cl_event_wheel.c | 1 + > osm/complib/cl_map.c | 2 +- > osm/complib/cl_memory.c | 1 + > osm/complib/cl_perf.c | 2 ++ > osm/complib/cl_pool.c | 1 + > osm/complib/cl_ptr_vector.c | 1 + > osm/complib/cl_threadpool.c | 1 + > osm/complib/cl_timer.c | 1 + > osm/complib/cl_vector.c | 1 + > osm/complib/libosmcomp.map | 3 --- > osm/include/complib/cl_byteswap.h | 3 +-- > osm/include/complib/cl_memory.h | 1 - > osm/include/iba/ib_types.h | 2 +- > osm/include/opensm/osm_lin_fwd_tbl.h | 1 + > osm/include/opensm/osm_madw.h | 1 + > osm/include/opensm/osm_mcm_info.h | 1 + > osm/include/opensm/osm_mtree.h | 1 + > osm/include/opensm/osm_path.h | 1 + > osm/include/opensm/osm_port.h | 1 + > osm/include/opensm/osm_port_profile.h | 1 + > osm/include/opensm/osm_rand_fwd_tbl.h | 1 + > osm/include/vendor/osm_vendor_mlx_svc.h | 2 ++ > osm/include/vendor/osm_vendor_mtl.h | 2 -- > .../vendor/osm_vendor_mtl_transaction_mgr.h | 1 - > osm/include/vendor/osm_vendor_ts.h | 1 - > osm/libvendor/osm_pkt_randomizer.c | 2 ++ > osm/libvendor/osm_vendor_al.c | 1 + > osm/libvendor/osm_vendor_ibumad.c | 10 ++++++---- > osm/libvendor/osm_vendor_ibumad_sa.c | 3 +++ > osm/libvendor/osm_vendor_mlx.c | 2 ++ > osm/libvendor/osm_vendor_mlx_anafa.c | 1 + > osm/libvendor/osm_vendor_mlx_dispatcher.c | 1 + > osm/libvendor/osm_vendor_mlx_hca.c | 1 + > osm/libvendor/osm_vendor_mlx_hca_anafa.c | 1 + > osm/libvendor/osm_vendor_mlx_ibmgt.c | 2 ++ > osm/libvendor/osm_vendor_mlx_rmpp_ctx.c | 1 + > osm/libvendor/osm_vendor_mlx_sa.c | 2 ++ > osm/libvendor/osm_vendor_mlx_sar.c | 4 +++- > osm/libvendor/osm_vendor_mlx_sender.c | 1 + > osm/libvendor/osm_vendor_mlx_sim.c | 2 ++ > osm/libvendor/osm_vendor_mlx_ts.c | 2 ++ > osm/libvendor/osm_vendor_mlx_ts_anafa.c | 2 ++ > osm/libvendor/osm_vendor_mtl.c | 2 ++ > osm/libvendor/osm_vendor_mtl_transaction_mgr.c | 1 + > osm/libvendor/osm_vendor_test.c | 1 + > osm/libvendor/osm_vendor_ts.c | 2 ++ > 
osm/libvendor/osm_vendor_umadt.c | 1 + > osm/opensm/osm_db_files.c | 6 ++++-- > osm/opensm/osm_db_pack.c | 1 + > osm/opensm/osm_drop_mgr.c | 2 ++ > osm/opensm/osm_fwd_tbl.c | 1 - > osm/opensm/osm_helper.c | 2 +- > osm/opensm/osm_inform.c | 1 + > osm/opensm/osm_lid_mgr.c | 1 + > osm/opensm/osm_lin_fwd_rcv.c | 2 +- > osm/opensm/osm_lin_fwd_rcv_ctrl.c | 2 +- > osm/opensm/osm_lin_fwd_tbl.c | 1 + > osm/opensm/osm_link_mgr.c | 2 +- > osm/opensm/osm_mad_pool.c | 1 + > osm/opensm/osm_matrix.c | 1 + > osm/opensm/osm_mcast_fwd_rcv.c | 2 +- > osm/opensm/osm_mcast_fwd_rcv_ctrl.c | 2 +- > osm/opensm/osm_mcast_mgr.c | 2 ++ > osm/opensm/osm_mcast_tbl.c | 1 + > osm/opensm/osm_mcm_info.c | 1 + > osm/opensm/osm_mcm_port.c | 2 ++ > osm/opensm/osm_mtree.c | 1 + > osm/opensm/osm_multicast.c | 1 + > osm/opensm/osm_node_desc_rcv.c | 2 +- > osm/opensm/osm_node_desc_rcv_ctrl.c | 2 +- > osm/opensm/osm_node_info_rcv.c | 2 +- > osm/opensm/osm_node_info_rcv_ctrl.c | 2 +- > osm/opensm/osm_opensm.c | 4 +--- > osm/opensm/osm_pkey.c | 1 + > osm/opensm/osm_pkey_mgr.c | 1 + > osm/opensm/osm_pkey_rcv.c | 2 +- > osm/opensm/osm_pkey_rcv_ctrl.c | 2 +- > osm/opensm/osm_port.c | 1 + > osm/opensm/osm_port_info_rcv.c | 2 +- > osm/opensm/osm_port_info_rcv_ctrl.c | 2 +- > osm/opensm/osm_prtn.c | 1 + > osm/opensm/osm_qos.c | 1 + > osm/opensm/osm_remote_sm.c | 2 +- > osm/opensm/osm_req.c | 2 +- > osm/opensm/osm_req_ctrl.c | 2 +- > osm/opensm/osm_resp.c | 2 +- > osm/opensm/osm_sa.c | 2 +- > osm/opensm/osm_sa_class_port_info.c | 2 +- > osm/opensm/osm_sa_class_port_info_ctrl.c | 2 +- > osm/opensm/osm_sa_guidinfo_record.c | 2 +- > osm/opensm/osm_sa_guidinfo_record_ctrl.c | 2 +- > osm/opensm/osm_sa_informinfo.c | 2 +- > osm/opensm/osm_sa_informinfo_ctrl.c | 2 +- > osm/opensm/osm_sa_lft_record.c | 1 + > osm/opensm/osm_sa_lft_record_ctrl.c | 2 +- > osm/opensm/osm_sa_link_record.c | 2 +- > osm/opensm/osm_sa_link_record_ctrl.c | 2 +- > osm/opensm/osm_sa_mad_ctrl.c | 2 +- > osm/opensm/osm_sa_mcmember_record.c | 1 + > 
osm/opensm/osm_sa_mcmember_record_ctrl.c | 2 +- > osm/opensm/osm_sa_multipath_record.c | 2 +- > osm/opensm/osm_sa_multipath_record_ctrl.c | 2 +- > osm/opensm/osm_sa_node_record.c | 1 + > osm/opensm/osm_sa_node_record_ctrl.c | 2 +- > osm/opensm/osm_sa_path_record.c | 2 +- > osm/opensm/osm_sa_path_record_ctrl.c | 2 +- > osm/opensm/osm_sa_pkey_record.c | 2 +- > osm/opensm/osm_sa_pkey_record_ctrl.c | 2 +- > osm/opensm/osm_sa_portinfo_record.c | 2 +- > osm/opensm/osm_sa_portinfo_record_ctrl.c | 2 +- > osm/opensm/osm_sa_response.c | 2 +- > osm/opensm/osm_sa_service_record.c | 2 +- > osm/opensm/osm_sa_service_record_ctrl.c | 2 +- > osm/opensm/osm_sa_slvl_record.c | 2 +- > osm/opensm/osm_sa_slvl_record_ctrl.c | 2 +- > osm/opensm/osm_sa_sminfo_record.c | 2 +- > osm/opensm/osm_sa_sminfo_record_ctrl.c | 2 +- > osm/opensm/osm_sa_vlarb_record.c | 2 +- > osm/opensm/osm_sa_vlarb_record_ctrl.c | 2 +- > osm/opensm/osm_service.c | 1 + > osm/opensm/osm_slvl_map_rcv.c | 2 +- > osm/opensm/osm_slvl_map_rcv_ctrl.c | 2 +- > osm/opensm/osm_sm.c | 1 + > osm/opensm/osm_sm_mad_ctrl.c | 2 +- > osm/opensm/osm_sm_state_mgr.c | 2 +- > osm/opensm/osm_sminfo_rcv.c | 1 + > osm/opensm/osm_sminfo_rcv_ctrl.c | 2 +- > osm/opensm/osm_state_mgr.c | 2 ++ > osm/opensm/osm_state_mgr_ctrl.c | 2 +- > osm/opensm/osm_subnet.c | 2 ++ > osm/opensm/osm_sw_info_rcv.c | 2 +- > osm/opensm/osm_sw_info_rcv_ctrl.c | 2 +- > osm/opensm/osm_sweep_fail_ctrl.c | 2 +- > osm/opensm/osm_switch.c | 1 + > osm/opensm/osm_trap_rcv.c | 2 +- > osm/opensm/osm_trap_rcv_ctrl.c | 2 +- > osm/opensm/osm_ucast_mgr.c | 2 ++ > osm/opensm/osm_ucast_updn.c | 1 + > osm/opensm/osm_vl15intf.c | 2 +- > osm/opensm/osm_vl_arb_rcv.c | 2 +- > osm/opensm/osm_vl_arb_rcv_ctrl.c | 2 +- > osm/osmtest/include/osmtest_subnet.h | 1 + > osm/osmtest/osmt_inform.c | 1 - > osm/osmtest/osmt_slvl_vl_arb.c | 1 - > osm/osmtest/osmtest.c | 2 +- > 145 files changed, 166 insertions(+), 88 deletions(-) > > e117de15a67314817a58b6300b432ec9ffa6a0a5 > diff --git 
a/osm/complib/cl_event_wheel.c b/osm/complib/cl_event_wheel.c > index cf04df7..aaaa53d 100644 > --- a/osm/complib/cl_event_wheel.c > +++ b/osm/complib/cl_event_wheel.c > @@ -40,6 +40,7 @@ # include > #endif /* HAVE_CONFIG_H */ > > #include > +#include > #include > #include > > diff --git a/osm/complib/cl_map.c b/osm/complib/cl_map.c > index 974b0d3..8962e9a 100644 > --- a/osm/complib/cl_map.c > +++ b/osm/complib/cl_map.c > @@ -70,10 +70,10 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > -#include > > > /*********************************************************************** ******* > diff --git a/osm/complib/cl_memory.c b/osm/complib/cl_memory.c > index 49ff45d..a9ae948 100644 > --- a/osm/complib/cl_memory.c > +++ b/osm/complib/cl_memory.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #define _MEM_DEBUG_MODE_ 0 > #ifdef _MEM_DEBUG_MODE_ > diff --git a/osm/complib/cl_perf.c b/osm/complib/cl_perf.c > index 753eba3..0c8ead2 100644 > --- a/osm/complib/cl_perf.c > +++ b/osm/complib/cl_perf.c > @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > + > /* > * Always turn on performance tracking when building this file to allow the > * performance counter functions to be built into the component library. 
> diff --git a/osm/complib/cl_pool.c b/osm/complib/cl_pool.c > index cfd2774..3fe07a8 100644 > --- a/osm/complib/cl_pool.c > +++ b/osm/complib/cl_pool.c > @@ -52,6 +52,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/complib/cl_ptr_vector.c b/osm/complib/cl_ptr_vector.c > index bddce00..5ab74c3 100644 > --- a/osm/complib/cl_ptr_vector.c > +++ b/osm/complib/cl_ptr_vector.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > > diff --git a/osm/complib/cl_threadpool.c b/osm/complib/cl_threadpool.c > index a2f620d..a2a4848 100644 > --- a/osm/complib/cl_threadpool.c > +++ b/osm/complib/cl_threadpool.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/complib/cl_timer.c b/osm/complib/cl_timer.c > index 847545f..b3cc3e9 100644 > --- a/osm/complib/cl_timer.c > +++ b/osm/complib/cl_timer.c > @@ -48,6 +48,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/complib/cl_vector.c b/osm/complib/cl_vector.c > index 3e1a757..bcda8e0 100644 > --- a/osm/complib/cl_vector.c > +++ b/osm/complib/cl_vector.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > > diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map > index 7a7ee1d..73fb242 100644 > --- a/osm/complib/libosmcomp.map > +++ b/osm/complib/libosmcomp.map > @@ -87,9 +87,6 @@ OSMCOMP_1.0 { > __cl_find_mem; > __cl_free_trk; > __cl_free_ntrk; > - cl_memset; > - cl_memcpy; > - cl_memcmp; > __cl_perf_run_calibration; > __cl_perf_construct; > __cl_perf_init; > diff --git a/osm/include/complib/cl_byteswap.h b/osm/include/complib/cl_byteswap.h > index 932d564..d144ea3 100644 > --- a/osm/include/complib/cl_byteswap.h > +++ 
b/osm/include/complib/cl_byteswap.h > @@ -51,8 +51,7 @@ > #ifndef _CL_BYTESWAP_H_ > #define _CL_BYTESWAP_H_ > > - > -#include > +#include > #include > > #ifdef __cplusplus > diff --git a/osm/include/complib/cl_memory.h b/osm/include/complib/cl_memory.h > index 9f558ac..4bbf7a2 100644 > --- a/osm/include/complib/cl_memory.h > +++ b/osm/include/complib/cl_memory.h > @@ -52,7 +52,6 @@ #define _CL_MEMORY_H_ > > > #include > -#include > > #ifdef __cplusplus > # define BEGIN_C_DECLS extern "C" { > diff --git a/osm/include/iba/ib_types.h b/osm/include/iba/ib_types.h > index 811d836..b72e810 100644 > --- a/osm/include/iba/ib_types.h > +++ b/osm/include/iba/ib_types.h > @@ -38,9 +38,9 @@ > #if !defined(__IB_TYPES_H__) > #define __IB_TYPES_H__ > > +#include > #include > #include > -#include > > #ifdef __cplusplus > # define BEGIN_C_DECLS extern "C" { > diff --git a/osm/include/opensm/osm_lin_fwd_tbl.h > b/osm/include/opensm/osm_lin_fwd_tbl.h > index dee01a9..ca378a8 100644 > --- a/osm/include/opensm/osm_lin_fwd_tbl.h > +++ b/osm/include/opensm/osm_lin_fwd_tbl.h > @@ -50,6 +50,7 @@ > #ifndef _OSM_LIN_FWD_TBL_H_ > #define _OSM_LIN_FWD_TBL_H_ > > +#include > #include > #include > > diff --git a/osm/include/opensm/osm_madw.h b/osm/include/opensm/osm_madw.h > index 2173957..4fde04c 100644 > --- a/osm/include/opensm/osm_madw.h > +++ b/osm/include/opensm/osm_madw.h > @@ -51,6 +51,7 @@ > #ifndef _OSM_MADW_H_ > #define _OSM_MADW_H_ > > +#include > #include > #include > #include > diff --git a/osm/include/opensm/osm_mcm_info.h > b/osm/include/opensm/osm_mcm_info.h > index c4d5443..1f325b1 100644 > --- a/osm/include/opensm/osm_mcm_info.h > +++ b/osm/include/opensm/osm_mcm_info.h > @@ -50,6 +50,7 @@ > #ifndef _OSM_MCM_INFO_H_ > #define _OSM_MCM_INFO_H_ > > +#include > #include > #include > #include > diff --git a/osm/include/opensm/osm_mtree.h b/osm/include/opensm/osm_mtree.h > index 57c894b..013112d 100644 > --- a/osm/include/opensm/osm_mtree.h > +++ b/osm/include/opensm/osm_mtree.h > 
@@ -51,6 +51,7 @@ > #ifndef _OSM_MTREE_H_ > #define _OSM_MTREE_H_ > > +#include > #include > #include > #include > diff --git a/osm/include/opensm/osm_path.h b/osm/include/opensm/osm_path.h > index bf1cc67..cb3bb8e 100644 > --- a/osm/include/opensm/osm_path.h > +++ b/osm/include/opensm/osm_path.h > @@ -38,6 +38,7 @@ > #ifndef _OSM_PATH_H_ > #define _OSM_PATH_H_ > > +#include > #include > #include > > diff --git a/osm/include/opensm/osm_port.h b/osm/include/opensm/osm_port.h > index 46a0064..cf3f6f2 100644 > --- a/osm/include/opensm/osm_port.h > +++ b/osm/include/opensm/osm_port.h > @@ -50,6 +50,7 @@ > #ifndef _OSM_PORT_H_ > #define _OSM_PORT_H_ > > +#include > #include > #include > #include > diff --git a/osm/include/opensm/osm_port_profile.h > b/osm/include/opensm/osm_port_profile.h > index 9a58115..9c0f7f7 100644 > --- a/osm/include/opensm/osm_port_profile.h > +++ b/osm/include/opensm/osm_port_profile.h > @@ -50,6 +50,7 @@ > #ifndef _OSM_PORT_PROFILE_H_ > #define _OSM_PORT_PROFILE_H_ > > +#include > #include > #include > #include > diff --git a/osm/include/opensm/osm_rand_fwd_tbl.h > b/osm/include/opensm/osm_rand_fwd_tbl.h > index 1d293e5..fac9ffd 100644 > --- a/osm/include/opensm/osm_rand_fwd_tbl.h > +++ b/osm/include/opensm/osm_rand_fwd_tbl.h > @@ -51,6 +51,7 @@ #ifndef _OSM_RAND_FWD_TBL_H_ > #define _OSM_RAND_FWD_TBL_H_ > > #include > +#include > #include > > #ifdef __cplusplus > diff --git a/osm/include/vendor/osm_vendor_mlx_svc.h > b/osm/include/vendor/osm_vendor_mlx_svc.h > index 69d379c..e4897d4 100644 > --- a/osm/include/vendor/osm_vendor_mlx_svc.h > +++ b/osm/include/vendor/osm_vendor_mlx_svc.h > @@ -38,7 +38,9 @@ #ifndef _OSMV_SVC_H_ > #define _OSMV_SVC_H_ > > #include > +#include > #include > +#include > #include > > #ifdef __cplusplus > diff --git a/osm/include/vendor/osm_vendor_mtl.h > b/osm/include/vendor/osm_vendor_mtl.h > index 5837867..218bdf7 100644 > --- a/osm/include/vendor/osm_vendor_mtl.h > +++ b/osm/include/vendor/osm_vendor_mtl.h > @@ 
-60,10 +60,8 @@ #define OUT > #include "iba/ib_types.h" > #include "iba/ib_al.h" > #include > -#include > #include > #include > -#include > > #ifdef __cplusplus > # define BEGIN_C_DECLS extern "C" { > diff --git a/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > b/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > index 7bf938d..82d2cc2 100644 > --- a/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > +++ b/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > @@ -61,7 +61,6 @@ #include > #include > #include > #include > -#include > #include > #include > #include > diff --git a/osm/include/vendor/osm_vendor_ts.h > b/osm/include/vendor/osm_vendor_ts.h > index b4c2f21..4414cba 100644 > --- a/osm/include/vendor/osm_vendor_ts.h > +++ b/osm/include/vendor/osm_vendor_ts.h > @@ -59,7 +59,6 @@ #define OUT > #include "iba/ib_types.h" > #include "iba/ib_al.h" > #include > -#include > #include > #include > #include > diff --git a/osm/libvendor/osm_pkt_randomizer.c > b/osm/libvendor/osm_pkt_randomizer.c > index 2fa7621..29df135 100644 > --- a/osm/libvendor/osm_pkt_randomizer.c > +++ b/osm/libvendor/osm_pkt_randomizer.c > @@ -51,12 +51,14 @@ #endif /* HAVE_CONFIG_H */ > > #include > #include > +#include > > #ifndef WIN32 > #include > #include > #endif > > +#include > > /********************************************************************** > * Return TRUE if the path is in a fault path, and FALSE otherwise. 
> diff --git a/osm/libvendor/osm_vendor_al.c b/osm/libvendor/osm_vendor_al.c > index d26d6d8..3240625 100644 > --- a/osm/libvendor/osm_vendor_al.c > +++ b/osm/libvendor/osm_vendor_al.c > @@ -59,6 +59,7 @@ #include > > #ifdef OSM_VENDOR_INTF_AL > > +#include > #include > #include > #include > diff --git a/osm/libvendor/osm_vendor_ibumad.c > b/osm/libvendor/osm_vendor_ibumad.c > index 0a7fbe3..a3041d0 100644 > --- a/osm/libvendor/osm_vendor_ibumad.c > +++ b/osm/libvendor/osm_vendor_ibumad.c > @@ -57,20 +57,22 @@ #include > > #ifdef OSM_VENDOR_INTF_OPENIB > > +#include > +#include > +#include > +#include > + > +#include > #include > #include > #include > #include > #include > -#include > #include > #include > #include > #include > > -#include > -#include > -#include > > /****s* OpenSM: Vendor AL/osm_umad_bind_info_t > * NAME > diff --git a/osm/libvendor/osm_vendor_ibumad_sa.c > b/osm/libvendor/osm_vendor_ibumad_sa.c > index 6eae887..568d39c 100644 > --- a/osm/libvendor/osm_vendor_ibumad_sa.c > +++ b/osm/libvendor/osm_vendor_ibumad_sa.c > @@ -38,10 +38,13 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > > +#include > + > #define MAX_PORTS 64 > > /*********************************************************************** ****** > diff --git a/osm/libvendor/osm_vendor_mlx.c b/osm/libvendor/osm_vendor_mlx.c > index 4c75d41..4a4be06 100644 > --- a/osm/libvendor/osm_vendor_mlx.c > +++ b/osm/libvendor/osm_vendor_mlx.c > @@ -38,12 +38,14 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > #include > #include > #include > +#include > > /** > * FORWARD REFERENCES > diff --git a/osm/libvendor/osm_vendor_mlx_anafa.c > b/osm/libvendor/osm_vendor_mlx_anafa.c > index 32af9bb..3cd917f 100644 > --- a/osm/libvendor/osm_vendor_mlx_anafa.c > +++ b/osm/libvendor/osm_vendor_mlx_anafa.c > @@ -55,6 +55,7 @@ #include > #include > #include > > +#include > #include > > /** > 
diff --git a/osm/libvendor/osm_vendor_mlx_dispatcher.c > b/osm/libvendor/osm_vendor_mlx_dispatcher.c > index 341e784..afa1473 100644 > --- a/osm/libvendor/osm_vendor_mlx_dispatcher.c > +++ b/osm/libvendor/osm_vendor_mlx_dispatcher.c > @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/libvendor/osm_vendor_mlx_hca.c > b/osm/libvendor/osm_vendor_mlx_hca.c > index bb120ac..c0dca86 100644 > --- a/osm/libvendor/osm_vendor_mlx_hca.c > +++ b/osm/libvendor/osm_vendor_mlx_hca.c > @@ -39,6 +39,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #if defined(OSM_VENDOR_INTF_MTL) | defined(OSM_VENDOR_INTF_TS) > #undef IN > #undef OUT > diff --git a/osm/libvendor/osm_vendor_mlx_hca_anafa.c > b/osm/libvendor/osm_vendor_mlx_hca_anafa.c > index 5045563..8f87225 100644 > --- a/osm/libvendor/osm_vendor_mlx_hca_anafa.c > +++ b/osm/libvendor/osm_vendor_mlx_hca_anafa.c > @@ -44,6 +44,7 @@ #undef IN > #undef OUT > > #include > +#include > > #include > #include > diff --git a/osm/libvendor/osm_vendor_mlx_ibmgt.c > b/osm/libvendor/osm_vendor_mlx_ibmgt.c > index 117ad12..ace790b 100644 > --- a/osm/libvendor/osm_vendor_mlx_ibmgt.c > +++ b/osm/libvendor/osm_vendor_mlx_ibmgt.c > @@ -46,7 +46,9 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > +#include > #include > #include > #include > diff --git a/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > b/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > index 69708c9..df250e2 100644 > --- a/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > +++ b/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/libvendor/osm_vendor_mlx_sa.c > b/osm/libvendor/osm_vendor_mlx_sa.c > index 85fd810..212344a 100644 > --- a/osm/libvendor/osm_vendor_mlx_sa.c > +++ b/osm/libvendor/osm_vendor_mlx_sa.c > @@ 
-40,6 +40,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > +#include > #include > #include > #include > diff --git a/osm/libvendor/osm_vendor_mlx_sar.c > b/osm/libvendor/osm_vendor_mlx_sar.c > index 5b0bd70..f6b6405 100644 > --- a/osm/libvendor/osm_vendor_mlx_sar.c > +++ b/osm/libvendor/osm_vendor_mlx_sar.c > @@ -38,8 +38,10 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > +#include > +#include > > ib_api_status_t > osmv_rmpp_sar_init(osmv_rmpp_sar_t* p_sar, void* p_arbt_mad, > diff --git a/osm/libvendor/osm_vendor_mlx_sender.c > b/osm/libvendor/osm_vendor_mlx_sender.c > index 3317702..e1ed0a0 100644 > --- a/osm/libvendor/osm_vendor_mlx_sender.c > +++ b/osm/libvendor/osm_vendor_mlx_sender.c > @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/libvendor/osm_vendor_mlx_sim.c > b/osm/libvendor/osm_vendor_mlx_sim.c > index b927f2f..ba81e03 100644 > --- a/osm/libvendor/osm_vendor_mlx_sim.c > +++ b/osm/libvendor/osm_vendor_mlx_sim.c > @@ -51,12 +51,14 @@ #include > #include > #include > #include > +#include > > #include > #include > #include > #include > > +#include > /* the simulator messages definition */ > #include > > diff --git a/osm/libvendor/osm_vendor_mlx_ts.c > b/osm/libvendor/osm_vendor_mlx_ts.c > index 483b69b..a32173e 100644 > --- a/osm/libvendor/osm_vendor_mlx_ts.c > +++ b/osm/libvendor/osm_vendor_mlx_ts.c > @@ -51,12 +51,14 @@ #include > #include > #include > #include > +#include > > #include > #include > #include > #include > > +#include > #include > > typedef struct _osmv_TOPSPIN_transport_mgr_ { > diff --git a/osm/libvendor/osm_vendor_mlx_ts_anafa.c > b/osm/libvendor/osm_vendor_mlx_ts_anafa.c > index dd3c462..a9395df 100644 > --- a/osm/libvendor/osm_vendor_mlx_ts_anafa.c > +++ b/osm/libvendor/osm_vendor_mlx_ts_anafa.c > @@ -52,6 +52,7 @@ #include > #include > #include > #include > 
+#include > > #include > #include > @@ -59,6 +60,7 @@ #include #include > #include > > +#include > #include > > static void > diff --git a/osm/libvendor/osm_vendor_mtl.c b/osm/libvendor/osm_vendor_mtl.c > index f9b2284..82a68de 100644 > --- a/osm/libvendor/osm_vendor_mtl.c > +++ b/osm/libvendor/osm_vendor_mtl.c > @@ -43,6 +43,8 @@ #include > > #ifdef OSM_VENDOR_INTF_MTL > > +#include > +#include > #include > #include > /* HACK - I do not know how to prevent complib from loading kernel H files */ > diff --git a/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > b/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > index 997eb37..2b1c960 100644 > --- a/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > +++ b/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > @@ -40,6 +40,7 @@ # include > #endif /* HAVE_CONFIG_H */ > > #include > +#include > #include > #include > #include > diff --git a/osm/libvendor/osm_vendor_test.c b/osm/libvendor/osm_vendor_test.c > index ecacc67..013262e 100644 > --- a/osm/libvendor/osm_vendor_test.c > +++ b/osm/libvendor/osm_vendor_test.c > @@ -56,6 +56,7 @@ #include > > #ifdef OSM_VENDOR_INTF_TEST > > +#include > #include > #include > #include > diff --git a/osm/libvendor/osm_vendor_ts.c b/osm/libvendor/osm_vendor_ts.c > index 16d52e2..fa51382 100644 > --- a/osm/libvendor/osm_vendor_ts.c > +++ b/osm/libvendor/osm_vendor_ts.c > @@ -40,8 +40,10 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > +#include > #include > #include > > diff --git a/osm/libvendor/osm_vendor_umadt.c > b/osm/libvendor/osm_vendor_umadt.c > index 01d9b10..e27801a 100644 > --- a/osm/libvendor/osm_vendor_umadt.c > +++ b/osm/libvendor/osm_vendor_umadt.c > @@ -61,6 +61,7 @@ #ifdef OSM_VENDOR_INTF_UMADT > > #include > #include > +#include > > #include > #include > diff --git a/osm/opensm/osm_db_files.c b/osm/opensm/osm_db_files.c > index a8e82a7..930aaef 100644 > --- a/osm/opensm/osm_db_files.c > +++ b/osm/opensm/osm_db_files.c > @@ -46,11 
+46,13 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > -#include > #include > #include > #include > +#include > +#include > +#include > +#include > > /****d* Database/OSM_DB_MAX_LINE_LEN > * NAME > diff --git a/osm/opensm/osm_db_pack.c b/osm/opensm/osm_db_pack.c > index 3f90397..b93ac84 100644 > --- a/osm/opensm/osm_db_pack.c > +++ b/osm/opensm/osm_db_pack.c > @@ -40,6 +40,7 @@ # include > #endif /* HAVE_CONFIG_H */ > > #include > +#include > #include > #include > static inline void > diff --git a/osm/opensm/osm_drop_mgr.c b/osm/opensm/osm_drop_mgr.c > index 470e5df..929088a 100644 > --- a/osm/opensm/osm_drop_mgr.c > +++ b/osm/opensm/osm_drop_mgr.c > @@ -51,7 +51,9 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_fwd_tbl.c b/osm/opensm/osm_fwd_tbl.c > index 852e048..ee32194 100644 > --- a/osm/opensm/osm_fwd_tbl.c > +++ b/osm/opensm/osm_fwd_tbl.c > @@ -51,7 +51,6 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c > index e54644b..3886609 100644 > --- a/osm/opensm/osm_helper.c > +++ b/osm/opensm/osm_helper.c > @@ -51,7 +51,7 @@ #endif /* HAVE_CONFIG_H */ > > #include > #include > -#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_inform.c b/osm/opensm/osm_inform.c > index f20b068..172190c 100644 > --- a/osm/opensm/osm_inform.c > +++ b/osm/opensm/osm_inform.c > @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_lid_mgr.c b/osm/opensm/osm_lid_mgr.c > index 31d0be4..a33a420 100644 > --- a/osm/opensm/osm_lid_mgr.c > +++ b/osm/opensm/osm_lid_mgr.c > @@ -90,6 +90,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > 
diff --git a/osm/opensm/osm_lin_fwd_rcv.c b/osm/opensm/osm_lin_fwd_rcv.c > index 8ae7da8..339fe11 100644 > --- a/osm/opensm/osm_lin_fwd_rcv.c > +++ b/osm/opensm/osm_lin_fwd_rcv.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_lin_fwd_rcv_ctrl.c > b/osm/opensm/osm_lin_fwd_rcv_ctrl.c > index 4e915e7..987440d 100644 > --- a/osm/opensm/osm_lin_fwd_rcv_ctrl.c > +++ b/osm/opensm/osm_lin_fwd_rcv_ctrl.c > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_lin_fwd_tbl.c b/osm/opensm/osm_lin_fwd_tbl.c > index f8a6b87..3b4895f 100644 > --- a/osm/opensm/osm_lin_fwd_tbl.c > +++ b/osm/opensm/osm_lin_fwd_tbl.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_link_mgr.c b/osm/opensm/osm_link_mgr.c > index c8307d3..87e9e46 100644 > --- a/osm/opensm/osm_link_mgr.c > +++ b/osm/opensm/osm_link_mgr.c > @@ -50,8 +50,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_mad_pool.c b/osm/opensm/osm_mad_pool.c > index 72f9db8..12ecabf 100644 > --- a/osm/opensm/osm_mad_pool.c > +++ b/osm/opensm/osm_mad_pool.c > @@ -52,6 +52,7 @@ # include > #endif /* HAVE_CONFIG_H */ > > #include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_matrix.c b/osm/opensm/osm_matrix.c > index 3efb0bd..073d9b8 100644 > --- a/osm/opensm/osm_matrix.c > +++ b/osm/opensm/osm_matrix.c > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > > > diff --git a/osm/opensm/osm_mcast_fwd_rcv.c b/osm/opensm/osm_mcast_fwd_rcv.c > index 73763f5..d0ffa59 100644 > --- a/osm/opensm/osm_mcast_fwd_rcv.c > +++ 
b/osm/opensm/osm_mcast_fwd_rcv.c > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > b/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > index a6f46fd..9201ecf 100644 > --- a/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > +++ b/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c > index f729c61..96d3b0f 100644 > --- a/osm/opensm/osm_mcast_mgr.c > +++ b/osm/opensm/osm_mcast_mgr.c > @@ -50,6 +50,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_mcast_tbl.c b/osm/opensm/osm_mcast_tbl.c > index 401d97c..b8fa325 100644 > --- a/osm/opensm/osm_mcast_tbl.c > +++ b/osm/opensm/osm_mcast_tbl.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_mcm_info.c b/osm/opensm/osm_mcm_info.c > index 08c0d12..a5ac7f3 100644 > --- a/osm/opensm/osm_mcm_info.c > +++ b/osm/opensm/osm_mcm_info.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > > /********************************************************************** > diff --git a/osm/opensm/osm_mcm_port.c b/osm/opensm/osm_mcm_port.c > index e92ad76..16ed84e 100644 > --- a/osm/opensm/osm_mcm_port.c > +++ b/osm/opensm/osm_mcm_port.c > @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > +#include > #include > > /********************************************************************** > diff --git a/osm/opensm/osm_mtree.c b/osm/opensm/osm_mtree.c > index f9d82d6..421e39e 100644 > --- a/osm/opensm/osm_mtree.c > +++ b/osm/opensm/osm_mtree.c 
> @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > > diff --git a/osm/opensm/osm_multicast.c b/osm/opensm/osm_multicast.c > index 2256741..690f7df 100644 > --- a/osm/opensm/osm_multicast.c > +++ b/osm/opensm/osm_multicast.c > @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_node_desc_rcv.c b/osm/opensm/osm_node_desc_rcv.c > index 62fe034..f9fa22d 100644 > --- a/osm/opensm/osm_node_desc_rcv.c > +++ b/osm/opensm/osm_node_desc_rcv.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_node_desc_rcv_ctrl.c > b/osm/opensm/osm_node_desc_rcv_ctrl.c > index 9f689e2..3f26b83 100644 > --- a/osm/opensm/osm_node_desc_rcv_ctrl.c > +++ b/osm/opensm/osm_node_desc_rcv_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_node_info_rcv.c b/osm/opensm/osm_node_info_rcv.c > index c35e2b7..59257a0 100644 > --- a/osm/opensm/osm_node_info_rcv.c > +++ b/osm/opensm/osm_node_info_rcv.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_node_info_rcv_ctrl.c > b/osm/opensm/osm_node_info_rcv_ctrl.c > index 478f9c4..cbff6ce 100644 > --- a/osm/opensm/osm_node_info_rcv_ctrl.c > +++ b/osm/opensm/osm_node_info_rcv_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_opensm.c b/osm/opensm/osm_opensm.c > index 2a8e0f8..8c422b5 100644 > --- a/osm/opensm/osm_opensm.c > +++ b/osm/opensm/osm_opensm.c > @@ -53,7 +53,7 @@ #endif /* HAVE_CONFIG_H */ > > #include > 
#include > -#include > +#include > #include > #include > #include > @@ -130,8 +130,6 @@ osm_opensm_destroy( > > cl_plock_destroy( &p_osm->lock ); > > - cl_mem_display( ); > - > osm_log_destroy( &p_osm->log ); > } > > diff --git a/osm/opensm/osm_pkey.c b/osm/opensm/osm_pkey.c > index b0cb869..5ecfdd9 100644 > --- a/osm/opensm/osm_pkey.c > +++ b/osm/opensm/osm_pkey.c > @@ -51,6 +51,7 @@ #endif /* HAVE_CONFIG_H */ > > #include > #include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > index f98d13b..e08b7cc 100644 > --- a/osm/opensm/osm_pkey_mgr.c > +++ b/osm/opensm/osm_pkey_mgr.c > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_pkey_rcv.c b/osm/opensm/osm_pkey_rcv.c > index 8696dc4..5262a6b 100644 > --- a/osm/opensm/osm_pkey_rcv.c > +++ b/osm/opensm/osm_pkey_rcv.c > @@ -39,8 +39,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_pkey_rcv_ctrl.c b/osm/opensm/osm_pkey_rcv_ctrl.c > index 77ebab2..cd4367a 100644 > --- a/osm/opensm/osm_pkey_rcv_ctrl.c > +++ b/osm/opensm/osm_pkey_rcv_ctrl.c > @@ -43,7 +43,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_port.c b/osm/opensm/osm_port.c > index f8c51e8..53ab006 100644 > --- a/osm/opensm/osm_port.c > +++ b/osm/opensm/osm_port.c > @@ -52,6 +52,7 @@ # include > #endif /* HAVE_CONFIG_H */ > > #include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_port_info_rcv.c b/osm/opensm/osm_port_info_rcv.c > index 119bcbd..a08c57c 100644 > --- a/osm/opensm/osm_port_info_rcv.c > +++ b/osm/opensm/osm_port_info_rcv.c > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > 
#include > #include > #include > diff --git a/osm/opensm/osm_port_info_rcv_ctrl.c > b/osm/opensm/osm_port_info_rcv_ctrl.c > index 9f6001f..303bedb 100644 > --- a/osm/opensm/osm_port_info_rcv_ctrl.c > +++ b/osm/opensm/osm_port_info_rcv_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_prtn.c b/osm/opensm/osm_prtn.c > index 26790b4..8b748c4 100644 > --- a/osm/opensm/osm_prtn.c > +++ b/osm/opensm/osm_prtn.c > @@ -54,6 +54,7 @@ #include > #include > > #include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_qos.c b/osm/opensm/osm_qos.c > index cd5c26a..c23ef87 100644 > --- a/osm/opensm/osm_qos.c > +++ b/osm/opensm/osm_qos.c > @@ -46,6 +46,7 @@ # include > #endif /* HAVE_CONFIG_H */ > > #include > +#include > > #include > #include > diff --git a/osm/opensm/osm_remote_sm.c b/osm/opensm/osm_remote_sm.c > index eb65d22..b91264e 100644 > --- a/osm/opensm/osm_remote_sm.c > +++ b/osm/opensm/osm_remote_sm.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > > /********************************************************************** > **********************************************************************/ > diff --git a/osm/opensm/osm_req.c b/osm/opensm/osm_req.c > index 9ddc9e9..534694b 100644 > --- a/osm/opensm/osm_req.c > +++ b/osm/opensm/osm_req.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_req_ctrl.c b/osm/opensm/osm_req_ctrl.c > index 708e7c9..2d0e7e0 100644 > --- a/osm/opensm/osm_req_ctrl.c > +++ b/osm/opensm/osm_req_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_resp.c b/osm/opensm/osm_resp.c > index 
9b5079a..aa60bf2 100644 > --- a/osm/opensm/osm_resp.c > +++ b/osm/opensm/osm_resp.c > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa.c b/osm/opensm/osm_sa.c > index b33431c..fa7dad8 100644 > --- a/osm/opensm/osm_sa.c > +++ b/osm/opensm/osm_sa.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_class_port_info.c > b/osm/opensm/osm_sa_class_port_info.c > index 389bc9c..cfad739 100644 > --- a/osm/opensm/osm_sa_class_port_info.c > +++ b/osm/opensm/osm_sa_class_port_info.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_class_port_info_ctrl.c > b/osm/opensm/osm_sa_class_port_info_ctrl.c > index 219a837..c71af4c 100644 > --- a/osm/opensm/osm_sa_class_port_info_ctrl.c > +++ b/osm/opensm/osm_sa_class_port_info_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_guidinfo_record.c > b/osm/opensm/osm_sa_guidinfo_record.c > index 7d1eebf..601c809 100644 > --- a/osm/opensm/osm_sa_guidinfo_record.c > +++ b/osm/opensm/osm_sa_guidinfo_record.c > @@ -54,8 +54,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_guidinfo_record_ctrl.c > b/osm/opensm/osm_sa_guidinfo_record_ctrl.c > index b252b20..f2211b1 100644 > --- a/osm/opensm/osm_sa_guidinfo_record_ctrl.c > +++ b/osm/opensm/osm_sa_guidinfo_record_ctrl.c > @@ -54,7 +54,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git 
a/osm/opensm/osm_sa_informinfo.c b/osm/opensm/osm_sa_informinfo.c > index 149e609..a820dea 100644 > --- a/osm/opensm/osm_sa_informinfo.c > +++ b/osm/opensm/osm_sa_informinfo.c > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_informinfo_ctrl.c > b/osm/opensm/osm_sa_informinfo_ctrl.c > index 75edabc..31644af 100644 > --- a/osm/opensm/osm_sa_informinfo_ctrl.c > +++ b/osm/opensm/osm_sa_informinfo_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_lft_record.c b/osm/opensm/osm_sa_lft_record.c > index b9b903e..2d17dbe 100644 > --- a/osm/opensm/osm_sa_lft_record.c > +++ b/osm/opensm/osm_sa_lft_record.c > @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_lft_record_ctrl.c > b/osm/opensm/osm_sa_lft_record_ctrl.c > index 0682438..1cc2544 100644 > --- a/osm/opensm/osm_sa_lft_record_ctrl.c > +++ b/osm/opensm/osm_sa_lft_record_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_link_record.c b/osm/opensm/osm_sa_link_record.c > index 1a407e1..a525002 100644 > --- a/osm/opensm/osm_sa_link_record.c > +++ b/osm/opensm/osm_sa_link_record.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_link_record_ctrl.c > b/osm/opensm/osm_sa_link_record_ctrl.c > index 707c184..01db21d 100644 > --- a/osm/opensm/osm_sa_link_record_ctrl.c > +++ b/osm/opensm/osm_sa_link_record_ctrl.c > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > 
#include > > diff --git a/osm/opensm/osm_sa_mad_ctrl.c b/osm/opensm/osm_sa_mad_ctrl.c > index 1f87ea2..81584ce 100644 > --- a/osm/opensm/osm_sa_mad_ctrl.c > +++ b/osm/opensm/osm_sa_mad_ctrl.c > @@ -50,7 +50,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_mcmember_record.c > b/osm/opensm/osm_sa_mcmember_record.c > index 291fbf5..5129231 100644 > --- a/osm/opensm/osm_sa_mcmember_record.c > +++ b/osm/opensm/osm_sa_mcmember_record.c > @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_mcmember_record_ctrl.c > b/osm/opensm/osm_sa_mcmember_record_ctrl.c > index 99a779a..a583979 100644 > --- a/osm/opensm/osm_sa_mcmember_record_ctrl.c > +++ b/osm/opensm/osm_sa_mcmember_record_ctrl.c > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_multipath_record.c > b/osm/opensm/osm_sa_multipath_record.c > index bdf53a3..c8efdb4 100644 > --- a/osm/opensm/osm_sa_multipath_record.c > +++ b/osm/opensm/osm_sa_multipath_record.c > @@ -52,8 +52,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_multipath_record_ctrl.c > b/osm/opensm/osm_sa_multipath_record_ctrl.c > index 7c0337c..e330bb8 100644 > --- a/osm/opensm/osm_sa_multipath_record_ctrl.c > +++ b/osm/opensm/osm_sa_multipath_record_ctrl.c > @@ -56,7 +56,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_node_record.c b/osm/opensm/osm_sa_node_record.c > index ecaa048..ac9be22 100644 > --- a/osm/opensm/osm_sa_node_record.c > +++ b/osm/opensm/osm_sa_node_record.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # 
include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_node_record_ctrl.c > b/osm/opensm/osm_sa_node_record_ctrl.c > index dcf5944..61b363a 100644 > --- a/osm/opensm/osm_sa_node_record_ctrl.c > +++ b/osm/opensm/osm_sa_node_record_ctrl.c > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_path_record.c b/osm/opensm/osm_sa_path_record.c > index 1e4a137..7da6d70 100644 > --- a/osm/opensm/osm_sa_path_record.c > +++ b/osm/opensm/osm_sa_path_record.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_path_record_ctrl.c > b/osm/opensm/osm_sa_path_record_ctrl.c > index eab7171..9495785 100644 > --- a/osm/opensm/osm_sa_path_record_ctrl.c > +++ b/osm/opensm/osm_sa_path_record_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_pkey_record.c b/osm/opensm/osm_sa_pkey_record.c > index e60466b..0eeb0c0 100644 > --- a/osm/opensm/osm_sa_pkey_record.c > +++ b/osm/opensm/osm_sa_pkey_record.c > @@ -43,8 +43,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_pkey_record_ctrl.c > b/osm/opensm/osm_sa_pkey_record_ctrl.c > index 01cdc0f..a9d8a8d 100644 > --- a/osm/opensm/osm_sa_pkey_record_ctrl.c > +++ b/osm/opensm/osm_sa_pkey_record_ctrl.c > @@ -43,7 +43,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_portinfo_record.c > b/osm/opensm/osm_sa_portinfo_record.c > index 3acb8c9..e1ca873 100644 > --- a/osm/opensm/osm_sa_portinfo_record.c > +++ 
b/osm/opensm/osm_sa_portinfo_record.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_portinfo_record_ctrl.c > b/osm/opensm/osm_sa_portinfo_record_ctrl.c > index 831843b..4f53f04 100644 > --- a/osm/opensm/osm_sa_portinfo_record_ctrl.c > +++ b/osm/opensm/osm_sa_portinfo_record_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_response.c b/osm/opensm/osm_sa_response.c > index 30f561f..03c94f7 100644 > --- a/osm/opensm/osm_sa_response.c > +++ b/osm/opensm/osm_sa_response.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_service_record.c > b/osm/opensm/osm_sa_service_record.c > index 38ee80b..a65e41d 100644 > --- a/osm/opensm/osm_sa_service_record.c > +++ b/osm/opensm/osm_sa_service_record.c > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_service_record_ctrl.c > b/osm/opensm/osm_sa_service_record_ctrl.c > index 5f8c936..8af9cd7 100644 > --- a/osm/opensm/osm_sa_service_record_ctrl.c > +++ b/osm/opensm/osm_sa_service_record_ctrl.c > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_slvl_record.c b/osm/opensm/osm_sa_slvl_record.c > index 237b99c..5d1928e 100644 > --- a/osm/opensm/osm_sa_slvl_record.c > +++ b/osm/opensm/osm_sa_slvl_record.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_slvl_record_ctrl.c > 
b/osm/opensm/osm_sa_slvl_record_ctrl.c > index d156bf1..7801508 100644 > --- a/osm/opensm/osm_sa_slvl_record_ctrl.c > +++ b/osm/opensm/osm_sa_slvl_record_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_sminfo_record.c > b/osm/opensm/osm_sa_sminfo_record.c > index 9c3f436..b9dee38 100644 > --- a/osm/opensm/osm_sa_sminfo_record.c > +++ b/osm/opensm/osm_sa_sminfo_record.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_sminfo_record_ctrl.c > b/osm/opensm/osm_sa_sminfo_record_ctrl.c > index 72c2fad..3b07920 100644 > --- a/osm/opensm/osm_sa_sminfo_record_ctrl.c > +++ b/osm/opensm/osm_sa_sminfo_record_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sa_vlarb_record.c b/osm/opensm/osm_sa_vlarb_record.c > index ddbef9c..059e5a9 100644 > --- a/osm/opensm/osm_sa_vlarb_record.c > +++ b/osm/opensm/osm_sa_vlarb_record.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sa_vlarb_record_ctrl.c > b/osm/opensm/osm_sa_vlarb_record_ctrl.c > index f7ad3ed..a243e08 100644 > --- a/osm/opensm/osm_sa_vlarb_record_ctrl.c > +++ b/osm/opensm/osm_sa_vlarb_record_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_service.c b/osm/opensm/osm_service.c > index 723e117..a1309d3 100644 > --- a/osm/opensm/osm_service.c > +++ b/osm/opensm/osm_service.c > @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git 
a/osm/opensm/osm_slvl_map_rcv.c b/osm/opensm/osm_slvl_map_rcv.c > index 9a6acf5..33c3d45 100644 > --- a/osm/opensm/osm_slvl_map_rcv.c > +++ b/osm/opensm/osm_slvl_map_rcv.c > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_slvl_map_rcv_ctrl.c > b/osm/opensm/osm_slvl_map_rcv_ctrl.c > index ee357da..4da0eff 100644 > --- a/osm/opensm/osm_slvl_map_rcv_ctrl.c > +++ b/osm/opensm/osm_slvl_map_rcv_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c > index f6e33c5..0e09f26 100644 > --- a/osm/opensm/osm_sm.c > +++ b/osm/opensm/osm_sm.c > @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sm_mad_ctrl.c b/osm/opensm/osm_sm_mad_ctrl.c > index 1b90335..9dceef2 100644 > --- a/osm/opensm/osm_sm_mad_ctrl.c > +++ b/osm/opensm/osm_sm_mad_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sm_state_mgr.c b/osm/opensm/osm_sm_state_mgr.c > index a881f7f..8ae9889 100644 > --- a/osm/opensm/osm_sm_state_mgr.c > +++ b/osm/opensm/osm_sm_state_mgr.c > @@ -50,8 +50,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sminfo_rcv.c b/osm/opensm/osm_sminfo_rcv.c > index e5c4bbb..5914984 100644 > --- a/osm/opensm/osm_sminfo_rcv.c > +++ b/osm/opensm/osm_sminfo_rcv.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sminfo_rcv_ctrl.c b/osm/opensm/osm_sminfo_rcv_ctrl.c > index 76ae65c..327d7eb 100644 
> --- a/osm/opensm/osm_sminfo_rcv_ctrl.c > +++ b/osm/opensm/osm_sminfo_rcv_ctrl.c > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > index c97875c..97b017d 100644 > --- a/osm/opensm/osm_state_mgr.c > +++ b/osm/opensm/osm_state_mgr.c > @@ -50,7 +50,9 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_state_mgr_ctrl.c b/osm/opensm/osm_state_mgr_ctrl.c > index a7afc46..0bde333 100644 > --- a/osm/opensm/osm_state_mgr_ctrl.c > +++ b/osm/opensm/osm_state_mgr_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c > index 9b4bcfe..c251411 100644 > --- a/osm/opensm/osm_subnet.c > +++ b/osm/opensm/osm_subnet.c > @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_sw_info_rcv.c b/osm/opensm/osm_sw_info_rcv.c > index 7a1f72f..6bbd73a 100644 > --- a/osm/opensm/osm_sw_info_rcv.c > +++ b/osm/opensm/osm_sw_info_rcv.c > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_sw_info_rcv_ctrl.c > b/osm/opensm/osm_sw_info_rcv_ctrl.c > index a97a7dc..fb8fe50 100644 > --- a/osm/opensm/osm_sw_info_rcv_ctrl.c > +++ b/osm/opensm/osm_sw_info_rcv_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_sweep_fail_ctrl.c b/osm/opensm/osm_sweep_fail_ctrl.c > index 022988a..e27a540 100644 > --- a/osm/opensm/osm_sweep_fail_ctrl.c > +++ 
b/osm/opensm/osm_sweep_fail_ctrl.c > @@ -49,7 +49,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c > index fa726c6..7e89475 100644 > --- a/osm/opensm/osm_switch.c > +++ b/osm/opensm/osm_switch.c > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_trap_rcv.c b/osm/opensm/osm_trap_rcv.c > index 7e39832..9865f53 100644 > --- a/osm/opensm/osm_trap_rcv.c > +++ b/osm/opensm/osm_trap_rcv.c > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_trap_rcv_ctrl.c b/osm/opensm/osm_trap_rcv_ctrl.c > index 1e6bf45..ee5a1a4 100644 > --- a/osm/opensm/osm_trap_rcv_ctrl.c > +++ b/osm/opensm/osm_trap_rcv_ctrl.c > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c > index 4492c1a..95f4d04 100644 > --- a/osm/opensm/osm_ucast_mgr.c > +++ b/osm/opensm/osm_ucast_mgr.c > @@ -54,6 +54,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c > index b70cf21..44e1993 100644 > --- a/osm/opensm/osm_ucast_updn.c > +++ b/osm/opensm/osm_ucast_updn.c > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > #include > #include > diff --git a/osm/opensm/osm_vl15intf.c b/osm/opensm/osm_vl15intf.c > index f72620b..68f17c5 100644 > --- a/osm/opensm/osm_vl15intf.c > +++ b/osm/opensm/osm_vl15intf.c > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > 
#include > #include > diff --git a/osm/opensm/osm_vl_arb_rcv.c b/osm/opensm/osm_vl_arb_rcv.c > index 70fd5ed..e33a2f9 100644 > --- a/osm/opensm/osm_vl_arb_rcv.c > +++ b/osm/opensm/osm_vl_arb_rcv.c > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > #include > -#include > #include > #include > #include > diff --git a/osm/opensm/osm_vl_arb_rcv_ctrl.c b/osm/opensm/osm_vl_arb_rcv_ctrl.c > index 9113985..f1f22c7 100644 > --- a/osm/opensm/osm_vl_arb_rcv_ctrl.c > +++ b/osm/opensm/osm_vl_arb_rcv_ctrl.c > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > -#include > +#include > #include > #include > > diff --git a/osm/osmtest/include/osmtest_subnet.h > b/osm/osmtest/include/osmtest_subnet.h > index 0e7cf3e..277a2aa 100644 > --- a/osm/osmtest/include/osmtest_subnet.h > +++ b/osm/osmtest/include/osmtest_subnet.h > @@ -47,6 +47,7 @@ > #ifndef _OSMTEST_SUBNET_H_ > #define _OSMTEST_SUBNET_H_ > > +#include > #include > #include > #include > diff --git a/osm/osmtest/osmt_inform.c b/osm/osmtest/osmt_inform.c > index b24ae30..e1562db 100644 > --- a/osm/osmtest/osmt_inform.c > +++ b/osm/osmtest/osmt_inform.c > @@ -56,7 +56,6 @@ #include > #include > #include > #include > -#include > > #include > #include "osmtest.h" > diff --git a/osm/osmtest/osmt_slvl_vl_arb.c b/osm/osmtest/osmt_slvl_vl_arb.c > index 6cb8377..9fc84f6 100644 > --- a/osm/osmtest/osmt_slvl_vl_arb.c > +++ b/osm/osmtest/osmt_slvl_vl_arb.c > @@ -54,7 +54,6 @@ #include > #include > #include > #include > -#include > #include "osmtest.h" > > /********************************************************************** > diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c > index 78aff53..5eb5482 100644 > --- a/osm/osmtest/osmtest.c > +++ b/osm/osmtest/osmtest.c > @@ -56,8 +56,8 @@ #endif > > #include > #include > -#ifdef __WIN__ > #include > +#ifdef __WIN__ > #include > #else > #include > -- > 1.3.2 > > _______________________________________________ 
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From dotanb at mellanox.co.il Thu May 18 08:01:10 2006
From: dotanb at mellanox.co.il (Dotan Barak)
Date: Thu, 18 May 2006 18:01:10 +0300
Subject: [openib-general] [librmdacm] fix rping to return a value different than zero when there is a failure
Message-ID: <200605181801.10833.dotanb@mellanox.co.il>

Hi.

Here is a patch to fix this issue in the test.

I couldn't find the parameters for executing this test. Can you please send me
a command-line example that I can use?

Thanks

Added checks to the return values of all of the functions that may fail
(in order to add this test to the regression system).

Signed-off-by: Dotan Barak

Index: last_stable/src/userspace/librdmacm/examples/rping.c
===================================================================
--- last_stable.orig/src/userspace/librdmacm/examples/rping.c	2006-05-18 17:07:12.000000000 +0300
+++ last_stable/src/userspace/librdmacm/examples/rping.c	2006-05-18 17:44:47.000000000 +0300
@@ -148,10 +148,10 @@ struct rping_cb {
 	struct rdma_cm_id *child_cm_id;	/* connection on server side */
 };
 
-static void rping_cma_event_handler(struct rdma_cm_id *cma_id,
+static int rping_cma_event_handler(struct rdma_cm_id *cma_id,
 				    struct rdma_cm_event *event)
 {
-	int ret;
+	int ret = 0;
 	struct rping_cb *cb = cma_id->context;
 
 	DEBUG_LOG("cma_event type %d cma_id %p (%s)\n", event->event, cma_id,
@@ -194,6 +194,7 @@ static void rping_cma_event_handler(stru
 		fprintf(stderr, "cma event %d, error %d\n", event->event,
 			event->status);
 		sem_post(&cb->sem);
+		ret = -1;
 		break;
 
 	case RDMA_CM_EVENT_DISCONNECTED:
@@ -203,13 +204,17 @@ static void rping_cma_event_handler(stru
 
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
 		fprintf(stderr, "cma detected device removal!!!!\n");
+		ret = -1;
 		break;
 
 	default:
 		fprintf(stderr, "oof bad type!\n");
 		sem_post(&cb->sem);
+		ret = -1;
 		break;
 	}
+
+	return ret;
 }
 
 static int server_recv(struct rping_cb *cb, struct ibv_wc *wc)
@@ -248,7 +253,7 @@ static int client_recv(struct rping_cb *
 	return 0;
 }
 
-static void rping_cq_event_handler(struct rping_cb *cb)
+static int rping_cq_event_handler(struct rping_cb *cb)
 {
 	struct ibv_wc wc;
 	struct ibv_recv_wr *bad_wr;
@@ -258,6 +263,7 @@ static void rping_cq_event_handler(struc
 		if (wc.status) {
 			fprintf(stderr, "cq completion failed status %d\n",
 				wc.status);
+			ret = -1;
 			goto error;
 		}
 
@@ -297,6 +303,7 @@ static void rping_cq_event_handler(struc
 
 		default:
 			DEBUG_LOG("unknown!!!!! completion\n");
+			ret = -1;
 			goto error;
 		}
 	}
@@ -304,11 +311,13 @@ static void rping_cq_event_handler(struc
 		fprintf(stderr, "poll error %d\n", ret);
 		goto error;
 	}
-	return;
+	return 0;
 
 error:
 	cb->state = ERROR;
 	sem_post(&cb->sem);
+
+	return ret;
 }
 
 static int rping_accept(struct rping_cb *cb)
@@ -545,7 +554,9 @@ static void *cm_thread(void *arg)
 			fprintf(stderr, "rdma_get_cm_event err %d\n", ret);
 			exit(ret);
 		}
-		rping_cma_event_handler(event->id, event);
+		ret = rping_cma_event_handler(event->id, event);
+		if (ret)
+			exit(ret);
 		rdma_ack_cm_event(event);
 	}
 }
@@ -559,7 +570,7 @@ static void *cq_thread(void *arg)
 
 	DEBUG_LOG("cq_thread started.\n");
 
-	while (1) {
+	while (1) {
 		ret = ibv_get_cq_event(cb->channel, &ev_cq, &ev_ctx);
 		if (ret) {
 			fprintf(stderr, "Failed to get cq event!\n");
@@ -574,7 +585,9 @@ static void *cq_thread(void *arg)
 			fprintf(stderr, "Failed to set notify!\n");
 			exit(ret);
 		}
-		rping_cq_event_handler(cb);
+		ret = rping_cq_event_handler(cb);
+		if (ret)
+			exit(ret);
 		ibv_ack_cq_events(cb->cq, 1);
 	}
 }
@@ -591,10 +604,10 @@ static void rping_format_send(struct rpi
 		  info->buf, info->rkey, info->size);
 }
 
-static void rping_test_server(struct rping_cb *cb)
+static int rping_test_server(struct rping_cb *cb)
 {
 	struct ibv_send_wr *bad_wr;
-	int ret;
+	int ret = 0;
 
 	while (1) {
 		/* Wait for client's Start STAG/TO/Len */
@@ -602,6 +615,7 @@ static void rping_test_server(struct rpi
 		if (cb->state != RDMA_READ_ADV) {
 			fprintf(stderr, "wait for RDMA_READ_ADV state %d\n",
 				cb->state);
+			ret = -1;
 			break;
 		}
 
@@ -616,6 +630,7 @@ static void rping_test_server(struct rpi
 		ret = ibv_post_send(cb->qp, &cb->rdma_sq_wr, &bad_wr);
 		if (ret) {
 			fprintf(stderr, "post send error %d\n", ret);
+			ret = 1;
 			break;
 		}
 		DEBUG_LOG("server posted rdma read req \n");
@@ -625,6 +640,7 @@ static void rping_test_server(struct rpi
 		if (cb->state != RDMA_READ_COMPLETE) {
 			fprintf(stderr, "wait for RDMA_READ_COMPLETE state %d\n",
 				cb->state);
+			ret = -1;
 			break;
 		}
 		DEBUG_LOG("server received read complete\n");
@@ -646,6 +662,7 @@ static void rping_test_server(struct rpi
 		if (cb->state != RDMA_WRITE_ADV) {
 			fprintf(stderr, "wait for RDMA_WRITE_ADV state %d\n",
 				cb->state);
+			ret = -1;
 			break;
 		}
 		DEBUG_LOG("server received sink adv\n");
@@ -671,6 +688,7 @@ static void rping_test_server(struct rpi
 		if (cb->state != RDMA_WRITE_COMPLETE) {
 			fprintf(stderr, "wait for RDMA_WRITE_COMPLETE state %d\n",
 				cb->state);
+			ret = -1;
 			break;
 		}
 		DEBUG_LOG("server rdma write complete \n");
@@ -683,6 +701,8 @@ static void rping_test_server(struct rpi
 		}
 		DEBUG_LOG("server posted go ahead\n");
 	}
+
+	return ret;
 }
 
 static int rping_bind_server(struct rping_cb *cb)
@@ -719,19 +739,19 @@ static int rping_bind_server(struct rpin
 	return 0;
 }
 
-static void rping_run_server(struct rping_cb *cb)
+static int rping_run_server(struct rping_cb *cb)
 {
 	struct ibv_recv_wr *bad_wr;
 	int ret;
 
 	ret = rping_bind_server(cb);
 	if (ret)
-		return;
+		return ret;
 
 	ret = rping_setup_qp(cb, cb->child_cm_id);
 	if (ret) {
 		fprintf(stderr, "setup_qp failed: %d\n", ret);
-		return;
+		return ret;
 	}
 
 	ret = rping_setup_buffers(cb);
@@ -761,11 +781,13 @@ err2:
 	rping_free_buffers(cb);
 err1:
 	rping_free_qp(cb);
+
+	return ret;
 }
 
-static void rping_test_client(struct rping_cb *cb)
+static int rping_test_client(struct rping_cb *cb)
 {
-	int ping, start, cc, i, ret;
+	int ping, start, cc, i, ret = 0;
 	struct ibv_send_wr *bad_wr;
 	unsigned char c;
 
@@ -798,6 +820,7 @@ static void rping_test_client(struct rpi
 		if (cb->state != RDMA_WRITE_ADV) {
 			fprintf(stderr, "wait for RDMA_WRITE_ADV state %d\n",
 				cb->state);
+			ret = -1;
 			break;
 		}
 
@@ -813,18 +836,22 @@ static void rping_test_client(struct rpi
 		if (cb->state != RDMA_WRITE_COMPLETE) {
 			fprintf(stderr, "wait for RDMA_WRITE_COMPLETE state %d\n",
 				cb->state);
+			ret = -1;
 			break;
 		}
 
 		if (cb->validate)
 			if (memcmp(cb->start_buf, cb->rdma_buf, cb->size)) {
 				fprintf(stderr, "data mismatch!\n");
+				ret = -1;
 				break;
 			}
 
 		if (cb->verbose)
 			printf("ping data: %s\n", cb->rdma_buf);
 	}
+
+	return ret;
 }
 
 static int rping_connect_client(struct rping_cb *cb)
@@ -881,19 +908,19 @@ static int rping_bind_client(struct rpin
 	return 0;
 }
 
-static void rping_run_client(struct rping_cb *cb)
+static int rping_run_client(struct rping_cb *cb)
 {
 	struct ibv_recv_wr *bad_wr;
 	int ret;
 
 	ret = rping_bind_client(cb);
 	if (ret)
-		return;
+		return ret;
 
 	ret = rping_setup_qp(cb, cb->cm_id);
 	if (ret) {
 		fprintf(stderr, "setup_qp failed: %d\n", ret);
-		return;
+		return ret;
 	}
 
 	ret = rping_setup_buffers(cb);
@@ -922,6 +949,8 @@ err2:
 	rping_free_buffers(cb);
 err1:
 	rping_free_qp(cb);
+
+	return ret;
 }
 
 static void usage(char *name)
@@ -1039,9 +1068,9 @@ int main(int argc, char *argv[])
 	pthread_create(&cb->cmthread, NULL, cm_thread, cb);
 
 	if (cb->server)
-		rping_run_server(cb);
+		ret = rping_run_server(cb);
 	else
-		rping_run_client(cb);
+		ret = rping_run_client(cb);
 
 	DEBUG_LOG("destroy cm_id %p\n", cb->cm_id);
 	rdma_destroy_id(cb->cm_id);
@@ -1049,5 +1078,7 @@ out2:
 	rdma_destroy_event_channel(cb->cm_channel);
 out:
 	free(cb);
+
+	printf("return status %d\n", ret);
 	return ret;
 }

From halr at voltaire.com Thu May 18 08:07:16 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 May 2006 11:07:16 -0400
Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h [was:[PATCH] OpenSM: Use memory routines directly and eliminatecl_mem* routines]
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB4@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB4@mtlexch01.mtl.com>
Message-ID: <1147964812.18971.77653.camel@hal.voltaire.com>

Hi Eitan,

On Thu, 2006-05-18 at 11:00, Eitan Zahavi wrote:
> Hi Sasha, Hal,
>
> There are several applications (ibis and ibmgtsim) that depend on complib.
> The changes of cleaning up the cl_memory API affect these utilities.
> Can you please provide the list of APIs removed and their replacements?

cl_memset -> memset
cl_memclr(x, y) -> memset(x, 0, y)
cl_memcpy -> memcpy

Soon cl_malloc/cl_zalloc/cl_free will change (and the memory tracking
will be removed).

-- Hal

> Also if we eventually converge on a single complib for windows and linux
> then the Windows stack is going to be affected by these changes too.
>
> EZ
>
> Eitan Zahavi
> Senior Engineering Director, Software Architect
> Mellanox Technologies LTD
> Tel:+972-4-9097208
> Fax:+972-4-9593245
> P.O. Box 586 Yokneam 20692 ISRAEL
>
>
> > -----Original Message-----
> > From: openib-general-bounces at openib.org
> > [mailto:openib-general-bounces at openib.org] On Behalf Of Sasha Khapyorsky
> > Sent: Thursday, May 18, 2006 1:03 AM
> > To: Hal Rosenstock
> > Cc: openib-general at openib.org
> > Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h
> > [was:[PATCH] OpenSM: Use memory routines directly and eliminatecl_mem* routines]
> >
> > On 12:14 Wed 17 May , Hal Rosenstock wrote:
> > > OpenSM: Use memory routines directly and eliminate cl_mem* routines
> > > as these routines are part of ISO C
> > >
> > > Signed-off-by: Hal Rosenstock
> >
> > Following Hal's cleanup this includes string.h header file for proper
> > mem*() functions prototype definitions where necessary, removes/includes
> > cl_memory.h as needed. Also couple of unistd.h additions for close(),
> > sleep() and unlink() calls.
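
[Editorial sketch, not part of the original thread.] For code such as ibis or ibmgtsim that still calls the removed wrappers, the mapping Hal lists comes down to including string.h and calling the ISO C routines directly. The record type and the demo_* helper names below are hypothetical and exist only for illustration. The memcmp case is an assumption: it is inferred from cl_memcmp being dropped from libosmcomp.map in the quoted patch, not from Hal's list.

```c
#include <stddef.h>
#include <string.h>	/* prototypes for memset(), memcpy(), memcmp() */

/* Hypothetical record type, used only for this illustration. */
struct demo_rec {
	unsigned short lid;
	unsigned char data[8];
};

/* Per the mapping in the thread: cl_memclr(x, y) -> memset(x, 0, y). */
static void demo_rec_clear(struct demo_rec *p_rec)
{
	memset(p_rec, 0, sizeof(*p_rec));
}

/* Per the mapping in the thread: cl_memset -> memset. */
static void demo_rec_fill(struct demo_rec *p_rec, unsigned char fill)
{
	memset(p_rec->data, fill, sizeof(p_rec->data));
}

/* Per the mapping in the thread: cl_memcpy -> memcpy. */
static void demo_rec_copy(struct demo_rec *p_dst, const struct demo_rec *p_src)
{
	memcpy(p_dst, p_src, sizeof(*p_dst));
}

/* Assumed mapping (not stated by Hal): cl_memcmp -> memcmp.
 * Only the payload bytes are compared, so struct padding is never read. */
static int demo_rec_data_equal(const struct demo_rec *p_a,
			       const struct demo_rec *p_b)
{
	return memcmp(p_a->data, p_b->data, sizeof(p_a->data)) == 0;
}
```

Because the standard routines take the same argument order that Hal's mapping shows, porting a caller should mostly be a mechanical rename plus the string.h include. The allocation wrappers (cl_malloc/cl_zalloc/cl_free) are deliberately left out, since the thread only says they will change later.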
> > > > Signed-off-by: Sasha Khapyorsky > > > > > > --- > > > > osm/complib/cl_event_wheel.c | 1 + > > osm/complib/cl_map.c | 2 +- > > osm/complib/cl_memory.c | 1 + > > osm/complib/cl_perf.c | 2 ++ > > osm/complib/cl_pool.c | 1 + > > osm/complib/cl_ptr_vector.c | 1 + > > osm/complib/cl_threadpool.c | 1 + > > osm/complib/cl_timer.c | 1 + > > osm/complib/cl_vector.c | 1 + > > osm/complib/libosmcomp.map | 3 --- > > osm/include/complib/cl_byteswap.h | 3 +-- > > osm/include/complib/cl_memory.h | 1 - > > osm/include/iba/ib_types.h | 2 +- > > osm/include/opensm/osm_lin_fwd_tbl.h | 1 + > > osm/include/opensm/osm_madw.h | 1 + > > osm/include/opensm/osm_mcm_info.h | 1 + > > osm/include/opensm/osm_mtree.h | 1 + > > osm/include/opensm/osm_path.h | 1 + > > osm/include/opensm/osm_port.h | 1 + > > osm/include/opensm/osm_port_profile.h | 1 + > > osm/include/opensm/osm_rand_fwd_tbl.h | 1 + > > osm/include/vendor/osm_vendor_mlx_svc.h | 2 ++ > > osm/include/vendor/osm_vendor_mtl.h | 2 -- > > .../vendor/osm_vendor_mtl_transaction_mgr.h | 1 - > > osm/include/vendor/osm_vendor_ts.h | 1 - > > osm/libvendor/osm_pkt_randomizer.c | 2 ++ > > osm/libvendor/osm_vendor_al.c | 1 + > > osm/libvendor/osm_vendor_ibumad.c | 10 ++++++---- > > osm/libvendor/osm_vendor_ibumad_sa.c | 3 +++ > > osm/libvendor/osm_vendor_mlx.c | 2 ++ > > osm/libvendor/osm_vendor_mlx_anafa.c | 1 + > > osm/libvendor/osm_vendor_mlx_dispatcher.c | 1 + > > osm/libvendor/osm_vendor_mlx_hca.c | 1 + > > osm/libvendor/osm_vendor_mlx_hca_anafa.c | 1 + > > osm/libvendor/osm_vendor_mlx_ibmgt.c | 2 ++ > > osm/libvendor/osm_vendor_mlx_rmpp_ctx.c | 1 + > > osm/libvendor/osm_vendor_mlx_sa.c | 2 ++ > > osm/libvendor/osm_vendor_mlx_sar.c | 4 +++- > > osm/libvendor/osm_vendor_mlx_sender.c | 1 + > > osm/libvendor/osm_vendor_mlx_sim.c | 2 ++ > > osm/libvendor/osm_vendor_mlx_ts.c | 2 ++ > > osm/libvendor/osm_vendor_mlx_ts_anafa.c | 2 ++ > > osm/libvendor/osm_vendor_mtl.c | 2 ++ > > osm/libvendor/osm_vendor_mtl_transaction_mgr.c | 1 + > > 
osm/libvendor/osm_vendor_test.c | 1 + > > osm/libvendor/osm_vendor_ts.c | 2 ++ > > osm/libvendor/osm_vendor_umadt.c | 1 + > > osm/opensm/osm_db_files.c | 6 ++++-- > > osm/opensm/osm_db_pack.c | 1 + > > osm/opensm/osm_drop_mgr.c | 2 ++ > > osm/opensm/osm_fwd_tbl.c | 1 - > > osm/opensm/osm_helper.c | 2 +- > > osm/opensm/osm_inform.c | 1 + > > osm/opensm/osm_lid_mgr.c | 1 + > > osm/opensm/osm_lin_fwd_rcv.c | 2 +- > > osm/opensm/osm_lin_fwd_rcv_ctrl.c | 2 +- > > osm/opensm/osm_lin_fwd_tbl.c | 1 + > > osm/opensm/osm_link_mgr.c | 2 +- > > osm/opensm/osm_mad_pool.c | 1 + > > osm/opensm/osm_matrix.c | 1 + > > osm/opensm/osm_mcast_fwd_rcv.c | 2 +- > > osm/opensm/osm_mcast_fwd_rcv_ctrl.c | 2 +- > > osm/opensm/osm_mcast_mgr.c | 2 ++ > > osm/opensm/osm_mcast_tbl.c | 1 + > > osm/opensm/osm_mcm_info.c | 1 + > > osm/opensm/osm_mcm_port.c | 2 ++ > > osm/opensm/osm_mtree.c | 1 + > > osm/opensm/osm_multicast.c | 1 + > > osm/opensm/osm_node_desc_rcv.c | 2 +- > > osm/opensm/osm_node_desc_rcv_ctrl.c | 2 +- > > osm/opensm/osm_node_info_rcv.c | 2 +- > > osm/opensm/osm_node_info_rcv_ctrl.c | 2 +- > > osm/opensm/osm_opensm.c | 4 +--- > > osm/opensm/osm_pkey.c | 1 + > > osm/opensm/osm_pkey_mgr.c | 1 + > > osm/opensm/osm_pkey_rcv.c | 2 +- > > osm/opensm/osm_pkey_rcv_ctrl.c | 2 +- > > osm/opensm/osm_port.c | 1 + > > osm/opensm/osm_port_info_rcv.c | 2 +- > > osm/opensm/osm_port_info_rcv_ctrl.c | 2 +- > > osm/opensm/osm_prtn.c | 1 + > > osm/opensm/osm_qos.c | 1 + > > osm/opensm/osm_remote_sm.c | 2 +- > > osm/opensm/osm_req.c | 2 +- > > osm/opensm/osm_req_ctrl.c | 2 +- > > osm/opensm/osm_resp.c | 2 +- > > osm/opensm/osm_sa.c | 2 +- > > osm/opensm/osm_sa_class_port_info.c | 2 +- > > osm/opensm/osm_sa_class_port_info_ctrl.c | 2 +- > > osm/opensm/osm_sa_guidinfo_record.c | 2 +- > > osm/opensm/osm_sa_guidinfo_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_informinfo.c | 2 +- > > osm/opensm/osm_sa_informinfo_ctrl.c | 2 +- > > osm/opensm/osm_sa_lft_record.c | 1 + > > osm/opensm/osm_sa_lft_record_ctrl.c | 
2 +- > > osm/opensm/osm_sa_link_record.c | 2 +- > > osm/opensm/osm_sa_link_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_mad_ctrl.c | 2 +- > > osm/opensm/osm_sa_mcmember_record.c | 1 + > > osm/opensm/osm_sa_mcmember_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_multipath_record.c | 2 +- > > osm/opensm/osm_sa_multipath_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_node_record.c | 1 + > > osm/opensm/osm_sa_node_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_path_record.c | 2 +- > > osm/opensm/osm_sa_path_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_pkey_record.c | 2 +- > > osm/opensm/osm_sa_pkey_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_portinfo_record.c | 2 +- > > osm/opensm/osm_sa_portinfo_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_response.c | 2 +- > > osm/opensm/osm_sa_service_record.c | 2 +- > > osm/opensm/osm_sa_service_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_slvl_record.c | 2 +- > > osm/opensm/osm_sa_slvl_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_sminfo_record.c | 2 +- > > osm/opensm/osm_sa_sminfo_record_ctrl.c | 2 +- > > osm/opensm/osm_sa_vlarb_record.c | 2 +- > > osm/opensm/osm_sa_vlarb_record_ctrl.c | 2 +- > > osm/opensm/osm_service.c | 1 + > > osm/opensm/osm_slvl_map_rcv.c | 2 +- > > osm/opensm/osm_slvl_map_rcv_ctrl.c | 2 +- > > osm/opensm/osm_sm.c | 1 + > > osm/opensm/osm_sm_mad_ctrl.c | 2 +- > > osm/opensm/osm_sm_state_mgr.c | 2 +- > > osm/opensm/osm_sminfo_rcv.c | 1 + > > osm/opensm/osm_sminfo_rcv_ctrl.c | 2 +- > > osm/opensm/osm_state_mgr.c | 2 ++ > > osm/opensm/osm_state_mgr_ctrl.c | 2 +- > > osm/opensm/osm_subnet.c | 2 ++ > > osm/opensm/osm_sw_info_rcv.c | 2 +- > > osm/opensm/osm_sw_info_rcv_ctrl.c | 2 +- > > osm/opensm/osm_sweep_fail_ctrl.c | 2 +- > > osm/opensm/osm_switch.c | 1 + > > osm/opensm/osm_trap_rcv.c | 2 +- > > osm/opensm/osm_trap_rcv_ctrl.c | 2 +- > > osm/opensm/osm_ucast_mgr.c | 2 ++ > > osm/opensm/osm_ucast_updn.c | 1 + > > osm/opensm/osm_vl15intf.c | 2 +- > > osm/opensm/osm_vl_arb_rcv.c | 2 +- > > osm/opensm/osm_vl_arb_rcv_ctrl.c | 2 +- > > 
osm/osmtest/include/osmtest_subnet.h | 1 + > > osm/osmtest/osmt_inform.c | 1 - > > osm/osmtest/osmt_slvl_vl_arb.c | 1 - > > osm/osmtest/osmtest.c | 2 +- > > 145 files changed, 166 insertions(+), 88 deletions(-) > > > > e117de15a67314817a58b6300b432ec9ffa6a0a5 > > diff --git a/osm/complib/cl_event_wheel.c > b/osm/complib/cl_event_wheel.c > > index cf04df7..aaaa53d 100644 > > --- a/osm/complib/cl_event_wheel.c > > +++ b/osm/complib/cl_event_wheel.c > > @@ -40,6 +40,7 @@ # include > > #endif /* HAVE_CONFIG_H */ > > > > #include > > +#include > > #include > > #include > > > > diff --git a/osm/complib/cl_map.c b/osm/complib/cl_map.c > > index 974b0d3..8962e9a 100644 > > --- a/osm/complib/cl_map.c > > +++ b/osm/complib/cl_map.c > > @@ -70,10 +70,10 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > -#include > > > > > > > /*********************************************************************** > ******* > > diff --git a/osm/complib/cl_memory.c b/osm/complib/cl_memory.c > > index 49ff45d..a9ae948 100644 > > --- a/osm/complib/cl_memory.c > > +++ b/osm/complib/cl_memory.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #define _MEM_DEBUG_MODE_ 0 > > #ifdef _MEM_DEBUG_MODE_ > > diff --git a/osm/complib/cl_perf.c b/osm/complib/cl_perf.c > > index 753eba3..0c8ead2 100644 > > --- a/osm/complib/cl_perf.c > > +++ b/osm/complib/cl_perf.c > > @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > + > > /* > > * Always turn on performance tracking when building this file to > allow the > > * performance counter functions to be built into the component > library. 
> > diff --git a/osm/complib/cl_pool.c b/osm/complib/cl_pool.c > > index cfd2774..3fe07a8 100644 > > --- a/osm/complib/cl_pool.c > > +++ b/osm/complib/cl_pool.c > > @@ -52,6 +52,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/complib/cl_ptr_vector.c b/osm/complib/cl_ptr_vector.c > > index bddce00..5ab74c3 100644 > > --- a/osm/complib/cl_ptr_vector.c > > +++ b/osm/complib/cl_ptr_vector.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > > > diff --git a/osm/complib/cl_threadpool.c b/osm/complib/cl_threadpool.c > > index a2f620d..a2a4848 100644 > > --- a/osm/complib/cl_threadpool.c > > +++ b/osm/complib/cl_threadpool.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/complib/cl_timer.c b/osm/complib/cl_timer.c > > index 847545f..b3cc3e9 100644 > > --- a/osm/complib/cl_timer.c > > +++ b/osm/complib/cl_timer.c > > @@ -48,6 +48,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/complib/cl_vector.c b/osm/complib/cl_vector.c > > index 3e1a757..bcda8e0 100644 > > --- a/osm/complib/cl_vector.c > > +++ b/osm/complib/cl_vector.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > > > diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map > > index 7a7ee1d..73fb242 100644 > > --- a/osm/complib/libosmcomp.map > > +++ b/osm/complib/libosmcomp.map > > @@ -87,9 +87,6 @@ OSMCOMP_1.0 { > > __cl_find_mem; > > __cl_free_trk; > > __cl_free_ntrk; > > - cl_memset; > > - cl_memcpy; > > - cl_memcmp; > > __cl_perf_run_calibration; > > __cl_perf_construct; > > __cl_perf_init; > > diff --git 
a/osm/include/complib/cl_byteswap.h > b/osm/include/complib/cl_byteswap.h > > index 932d564..d144ea3 100644 > > --- a/osm/include/complib/cl_byteswap.h > > +++ b/osm/include/complib/cl_byteswap.h > > @@ -51,8 +51,7 @@ > > #ifndef _CL_BYTESWAP_H_ > > #define _CL_BYTESWAP_H_ > > > > - > > -#include > > +#include > > #include > > > > #ifdef __cplusplus > > diff --git a/osm/include/complib/cl_memory.h > b/osm/include/complib/cl_memory.h > > index 9f558ac..4bbf7a2 100644 > > --- a/osm/include/complib/cl_memory.h > > +++ b/osm/include/complib/cl_memory.h > > @@ -52,7 +52,6 @@ #define _CL_MEMORY_H_ > > > > > > #include > > -#include > > > > #ifdef __cplusplus > > # define BEGIN_C_DECLS extern "C" { > > diff --git a/osm/include/iba/ib_types.h b/osm/include/iba/ib_types.h > > index 811d836..b72e810 100644 > > --- a/osm/include/iba/ib_types.h > > +++ b/osm/include/iba/ib_types.h > > @@ -38,9 +38,9 @@ > > #if !defined(__IB_TYPES_H__) > > #define __IB_TYPES_H__ > > > > +#include > > #include > > #include > > -#include > > > > #ifdef __cplusplus > > # define BEGIN_C_DECLS extern "C" { > > diff --git a/osm/include/opensm/osm_lin_fwd_tbl.h > > b/osm/include/opensm/osm_lin_fwd_tbl.h > > index dee01a9..ca378a8 100644 > > --- a/osm/include/opensm/osm_lin_fwd_tbl.h > > +++ b/osm/include/opensm/osm_lin_fwd_tbl.h > > @@ -50,6 +50,7 @@ > > #ifndef _OSM_LIN_FWD_TBL_H_ > > #define _OSM_LIN_FWD_TBL_H_ > > > > +#include > > #include > > #include > > > > diff --git a/osm/include/opensm/osm_madw.h > b/osm/include/opensm/osm_madw.h > > index 2173957..4fde04c 100644 > > --- a/osm/include/opensm/osm_madw.h > > +++ b/osm/include/opensm/osm_madw.h > > @@ -51,6 +51,7 @@ > > #ifndef _OSM_MADW_H_ > > #define _OSM_MADW_H_ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/include/opensm/osm_mcm_info.h > > b/osm/include/opensm/osm_mcm_info.h > > index c4d5443..1f325b1 100644 > > --- a/osm/include/opensm/osm_mcm_info.h > > +++ b/osm/include/opensm/osm_mcm_info.h > > @@ -50,6 
+50,7 @@ > > #ifndef _OSM_MCM_INFO_H_ > > #define _OSM_MCM_INFO_H_ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/include/opensm/osm_mtree.h > b/osm/include/opensm/osm_mtree.h > > index 57c894b..013112d 100644 > > --- a/osm/include/opensm/osm_mtree.h > > +++ b/osm/include/opensm/osm_mtree.h > > @@ -51,6 +51,7 @@ > > #ifndef _OSM_MTREE_H_ > > #define _OSM_MTREE_H_ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/include/opensm/osm_path.h > b/osm/include/opensm/osm_path.h > > index bf1cc67..cb3bb8e 100644 > > --- a/osm/include/opensm/osm_path.h > > +++ b/osm/include/opensm/osm_path.h > > @@ -38,6 +38,7 @@ > > #ifndef _OSM_PATH_H_ > > #define _OSM_PATH_H_ > > > > +#include > > #include > > #include > > > > diff --git a/osm/include/opensm/osm_port.h > b/osm/include/opensm/osm_port.h > > index 46a0064..cf3f6f2 100644 > > --- a/osm/include/opensm/osm_port.h > > +++ b/osm/include/opensm/osm_port.h > > @@ -50,6 +50,7 @@ > > #ifndef _OSM_PORT_H_ > > #define _OSM_PORT_H_ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/include/opensm/osm_port_profile.h > > b/osm/include/opensm/osm_port_profile.h > > index 9a58115..9c0f7f7 100644 > > --- a/osm/include/opensm/osm_port_profile.h > > +++ b/osm/include/opensm/osm_port_profile.h > > @@ -50,6 +50,7 @@ > > #ifndef _OSM_PORT_PROFILE_H_ > > #define _OSM_PORT_PROFILE_H_ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/include/opensm/osm_rand_fwd_tbl.h > > b/osm/include/opensm/osm_rand_fwd_tbl.h > > index 1d293e5..fac9ffd 100644 > > --- a/osm/include/opensm/osm_rand_fwd_tbl.h > > +++ b/osm/include/opensm/osm_rand_fwd_tbl.h > > @@ -51,6 +51,7 @@ #ifndef _OSM_RAND_FWD_TBL_H_ > > #define _OSM_RAND_FWD_TBL_H_ > > > > #include > > +#include > > #include > > > > #ifdef __cplusplus > > diff --git a/osm/include/vendor/osm_vendor_mlx_svc.h > > b/osm/include/vendor/osm_vendor_mlx_svc.h > > index 69d379c..e4897d4 100644 > > --- 
a/osm/include/vendor/osm_vendor_mlx_svc.h > > +++ b/osm/include/vendor/osm_vendor_mlx_svc.h > > @@ -38,7 +38,9 @@ #ifndef _OSMV_SVC_H_ > > #define _OSMV_SVC_H_ > > > > #include > > +#include > > #include > > +#include > > #include > > > > #ifdef __cplusplus > > diff --git a/osm/include/vendor/osm_vendor_mtl.h > > b/osm/include/vendor/osm_vendor_mtl.h > > index 5837867..218bdf7 100644 > > --- a/osm/include/vendor/osm_vendor_mtl.h > > +++ b/osm/include/vendor/osm_vendor_mtl.h > > @@ -60,10 +60,8 @@ #define OUT > > #include "iba/ib_types.h" > > #include "iba/ib_al.h" > > #include > > -#include > > #include > > #include > > -#include > > > > #ifdef __cplusplus > > # define BEGIN_C_DECLS extern "C" { > > diff --git a/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > > b/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > > index 7bf938d..82d2cc2 100644 > > --- a/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > > +++ b/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > > @@ -61,7 +61,6 @@ #include > > #include > > #include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/include/vendor/osm_vendor_ts.h > > b/osm/include/vendor/osm_vendor_ts.h > > index b4c2f21..4414cba 100644 > > --- a/osm/include/vendor/osm_vendor_ts.h > > +++ b/osm/include/vendor/osm_vendor_ts.h > > @@ -59,7 +59,6 @@ #define OUT > > #include "iba/ib_types.h" > > #include "iba/ib_al.h" > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/libvendor/osm_pkt_randomizer.c > > b/osm/libvendor/osm_pkt_randomizer.c > > index 2fa7621..29df135 100644 > > --- a/osm/libvendor/osm_pkt_randomizer.c > > +++ b/osm/libvendor/osm_pkt_randomizer.c > > @@ -51,12 +51,14 @@ #endif /* HAVE_CONFIG_H */ > > > > #include > > #include > > +#include > > > > #ifndef WIN32 > > #include > > #include > > #endif > > > > +#include > > > > > /********************************************************************** > > * Return TRUE if the path is in a 
fault path, and FALSE otherwise. > > diff --git a/osm/libvendor/osm_vendor_al.c > b/osm/libvendor/osm_vendor_al.c > > index d26d6d8..3240625 100644 > > --- a/osm/libvendor/osm_vendor_al.c > > +++ b/osm/libvendor/osm_vendor_al.c > > @@ -59,6 +59,7 @@ #include > > > > #ifdef OSM_VENDOR_INTF_AL > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/libvendor/osm_vendor_ibumad.c > > b/osm/libvendor/osm_vendor_ibumad.c > > index 0a7fbe3..a3041d0 100644 > > --- a/osm/libvendor/osm_vendor_ibumad.c > > +++ b/osm/libvendor/osm_vendor_ibumad.c > > @@ -57,20 +57,22 @@ #include > > > > #ifdef OSM_VENDOR_INTF_OPENIB > > > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > #include > > #include > > #include > > #include > > #include > > -#include > > #include > > #include > > #include > > #include > > > > -#include > > -#include > > -#include > > > > /****s* OpenSM: Vendor AL/osm_umad_bind_info_t > > * NAME > > diff --git a/osm/libvendor/osm_vendor_ibumad_sa.c > > b/osm/libvendor/osm_vendor_ibumad_sa.c > > index 6eae887..568d39c 100644 > > --- a/osm/libvendor/osm_vendor_ibumad_sa.c > > +++ b/osm/libvendor/osm_vendor_ibumad_sa.c > > @@ -38,10 +38,13 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > > > +#include > > + > > #define MAX_PORTS 64 > > > > > /*********************************************************************** > ****** > > diff --git a/osm/libvendor/osm_vendor_mlx.c > b/osm/libvendor/osm_vendor_mlx.c > > index 4c75d41..4a4be06 100644 > > --- a/osm/libvendor/osm_vendor_mlx.c > > +++ b/osm/libvendor/osm_vendor_mlx.c > > @@ -38,12 +38,14 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > #include > > #include > > #include > > +#include > > > > /** > > * FORWARD REFERENCES > > diff --git a/osm/libvendor/osm_vendor_mlx_anafa.c > > b/osm/libvendor/osm_vendor_mlx_anafa.c > 
> index 32af9bb..3cd917f 100644 > > --- a/osm/libvendor/osm_vendor_mlx_anafa.c > > +++ b/osm/libvendor/osm_vendor_mlx_anafa.c > > @@ -55,6 +55,7 @@ #include > > #include > > #include > > > > +#include > > #include > > > > /** > > diff --git a/osm/libvendor/osm_vendor_mlx_dispatcher.c > > b/osm/libvendor/osm_vendor_mlx_dispatcher.c > > index 341e784..afa1473 100644 > > --- a/osm/libvendor/osm_vendor_mlx_dispatcher.c > > +++ b/osm/libvendor/osm_vendor_mlx_dispatcher.c > > @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/libvendor/osm_vendor_mlx_hca.c > > b/osm/libvendor/osm_vendor_mlx_hca.c > > index bb120ac..c0dca86 100644 > > --- a/osm/libvendor/osm_vendor_mlx_hca.c > > +++ b/osm/libvendor/osm_vendor_mlx_hca.c > > @@ -39,6 +39,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #if defined(OSM_VENDOR_INTF_MTL) | defined(OSM_VENDOR_INTF_TS) > > #undef IN > > #undef OUT > > diff --git a/osm/libvendor/osm_vendor_mlx_hca_anafa.c > > b/osm/libvendor/osm_vendor_mlx_hca_anafa.c > > index 5045563..8f87225 100644 > > --- a/osm/libvendor/osm_vendor_mlx_hca_anafa.c > > +++ b/osm/libvendor/osm_vendor_mlx_hca_anafa.c > > @@ -44,6 +44,7 @@ #undef IN > > #undef OUT > > > > #include > > +#include > > > > #include > > #include > > diff --git a/osm/libvendor/osm_vendor_mlx_ibmgt.c > > b/osm/libvendor/osm_vendor_mlx_ibmgt.c > > index 117ad12..ace790b 100644 > > --- a/osm/libvendor/osm_vendor_mlx_ibmgt.c > > +++ b/osm/libvendor/osm_vendor_mlx_ibmgt.c > > @@ -46,7 +46,9 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > +#include > > #include > > #include > > #include > > diff --git a/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > > b/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > > index 69708c9..df250e2 100644 > > --- a/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > > +++ 
b/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > > @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/libvendor/osm_vendor_mlx_sa.c > > b/osm/libvendor/osm_vendor_mlx_sa.c > > index 85fd810..212344a 100644 > > --- a/osm/libvendor/osm_vendor_mlx_sa.c > > +++ b/osm/libvendor/osm_vendor_mlx_sa.c > > @@ -40,6 +40,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/libvendor/osm_vendor_mlx_sar.c > > b/osm/libvendor/osm_vendor_mlx_sar.c > > index 5b0bd70..f6b6405 100644 > > --- a/osm/libvendor/osm_vendor_mlx_sar.c > > +++ b/osm/libvendor/osm_vendor_mlx_sar.c > > @@ -38,8 +38,10 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > +#include > > +#include > > > > ib_api_status_t > > osmv_rmpp_sar_init(osmv_rmpp_sar_t* p_sar, void* p_arbt_mad, > > diff --git a/osm/libvendor/osm_vendor_mlx_sender.c > > b/osm/libvendor/osm_vendor_mlx_sender.c > > index 3317702..e1ed0a0 100644 > > --- a/osm/libvendor/osm_vendor_mlx_sender.c > > +++ b/osm/libvendor/osm_vendor_mlx_sender.c > > @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/libvendor/osm_vendor_mlx_sim.c > > b/osm/libvendor/osm_vendor_mlx_sim.c > > index b927f2f..ba81e03 100644 > > --- a/osm/libvendor/osm_vendor_mlx_sim.c > > +++ b/osm/libvendor/osm_vendor_mlx_sim.c > > @@ -51,12 +51,14 @@ #include > > #include > > #include > > #include > > +#include > > > > #include > > #include > > #include > > #include > > > > +#include > > /* the simulator messages definition */ > > #include > > > > diff --git a/osm/libvendor/osm_vendor_mlx_ts.c > > b/osm/libvendor/osm_vendor_mlx_ts.c > > index 483b69b..a32173e 100644 > > --- a/osm/libvendor/osm_vendor_mlx_ts.c > > 
+++ b/osm/libvendor/osm_vendor_mlx_ts.c > > @@ -51,12 +51,14 @@ #include > > #include > > #include > > #include > > +#include > > > > #include > > #include > > #include > > #include > > > > +#include > > #include > > > > typedef struct _osmv_TOPSPIN_transport_mgr_ { > > diff --git a/osm/libvendor/osm_vendor_mlx_ts_anafa.c > > b/osm/libvendor/osm_vendor_mlx_ts_anafa.c > > index dd3c462..a9395df 100644 > > --- a/osm/libvendor/osm_vendor_mlx_ts_anafa.c > > +++ b/osm/libvendor/osm_vendor_mlx_ts_anafa.c > > @@ -52,6 +52,7 @@ #include > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -59,6 +60,7 @@ #include > #include > > #include > > > > +#include > > #include > > > > static void > > diff --git a/osm/libvendor/osm_vendor_mtl.c > b/osm/libvendor/osm_vendor_mtl.c > > index f9b2284..82a68de 100644 > > --- a/osm/libvendor/osm_vendor_mtl.c > > +++ b/osm/libvendor/osm_vendor_mtl.c > > @@ -43,6 +43,8 @@ #include > > > > #ifdef OSM_VENDOR_INTF_MTL > > > > +#include > > +#include > > #include > > #include > > /* HACK - I do not know how to prevent complib from loading kernel H > files */ > > diff --git a/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > > b/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > > index 997eb37..2b1c960 100644 > > --- a/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > > +++ b/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > > @@ -40,6 +40,7 @@ # include > > #endif /* HAVE_CONFIG_H */ > > > > #include > > +#include > > #include > > #include > > #include > > diff --git a/osm/libvendor/osm_vendor_test.c > b/osm/libvendor/osm_vendor_test.c > > index ecacc67..013262e 100644 > > --- a/osm/libvendor/osm_vendor_test.c > > +++ b/osm/libvendor/osm_vendor_test.c > > @@ -56,6 +56,7 @@ #include > > > > #ifdef OSM_VENDOR_INTF_TEST > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/libvendor/osm_vendor_ts.c > b/osm/libvendor/osm_vendor_ts.c > > index 16d52e2..fa51382 100644 > > --- 
a/osm/libvendor/osm_vendor_ts.c > > +++ b/osm/libvendor/osm_vendor_ts.c > > @@ -40,8 +40,10 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > +#include > > #include > > #include > > > > diff --git a/osm/libvendor/osm_vendor_umadt.c > > b/osm/libvendor/osm_vendor_umadt.c > > index 01d9b10..e27801a 100644 > > --- a/osm/libvendor/osm_vendor_umadt.c > > +++ b/osm/libvendor/osm_vendor_umadt.c > > @@ -61,6 +61,7 @@ #ifdef OSM_VENDOR_INTF_UMADT > > > > #include > > #include > > +#include > > > > #include > > #include > > diff --git a/osm/opensm/osm_db_files.c b/osm/opensm/osm_db_files.c > > index a8e82a7..930aaef 100644 > > --- a/osm/opensm/osm_db_files.c > > +++ b/osm/opensm/osm_db_files.c > > @@ -46,11 +46,13 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > -#include > > #include > > #include > > #include > > +#include > > +#include > > +#include > > +#include > > > > /****d* Database/OSM_DB_MAX_LINE_LEN > > * NAME > > diff --git a/osm/opensm/osm_db_pack.c b/osm/opensm/osm_db_pack.c > > index 3f90397..b93ac84 100644 > > --- a/osm/opensm/osm_db_pack.c > > +++ b/osm/opensm/osm_db_pack.c > > @@ -40,6 +40,7 @@ # include > > #endif /* HAVE_CONFIG_H */ > > > > #include > > +#include > > #include > > #include > > static inline void > > diff --git a/osm/opensm/osm_drop_mgr.c b/osm/opensm/osm_drop_mgr.c > > index 470e5df..929088a 100644 > > --- a/osm/opensm/osm_drop_mgr.c > > +++ b/osm/opensm/osm_drop_mgr.c > > @@ -51,7 +51,9 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_fwd_tbl.c b/osm/opensm/osm_fwd_tbl.c > > index 852e048..ee32194 100644 > > --- a/osm/opensm/osm_fwd_tbl.c > > +++ b/osm/opensm/osm_fwd_tbl.c > > @@ -51,7 +51,6 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > #include > > #include > > 
#include > > diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c > > index e54644b..3886609 100644 > > --- a/osm/opensm/osm_helper.c > > +++ b/osm/opensm/osm_helper.c > > @@ -51,7 +51,7 @@ #endif /* HAVE_CONFIG_H */ > > > > #include > > #include > > -#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_inform.c b/osm/opensm/osm_inform.c > > index f20b068..172190c 100644 > > --- a/osm/opensm/osm_inform.c > > +++ b/osm/opensm/osm_inform.c > > @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_lid_mgr.c b/osm/opensm/osm_lid_mgr.c > > index 31d0be4..a33a420 100644 > > --- a/osm/opensm/osm_lid_mgr.c > > +++ b/osm/opensm/osm_lid_mgr.c > > @@ -90,6 +90,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_lin_fwd_rcv.c > b/osm/opensm/osm_lin_fwd_rcv.c > > index 8ae7da8..339fe11 100644 > > --- a/osm/opensm/osm_lin_fwd_rcv.c > > +++ b/osm/opensm/osm_lin_fwd_rcv.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_lin_fwd_rcv_ctrl.c > > b/osm/opensm/osm_lin_fwd_rcv_ctrl.c > > index 4e915e7..987440d 100644 > > --- a/osm/opensm/osm_lin_fwd_rcv_ctrl.c > > +++ b/osm/opensm/osm_lin_fwd_rcv_ctrl.c > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_lin_fwd_tbl.c > b/osm/opensm/osm_lin_fwd_tbl.c > > index f8a6b87..3b4895f 100644 > > --- a/osm/opensm/osm_lin_fwd_tbl.c > > +++ b/osm/opensm/osm_lin_fwd_tbl.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff 
--git a/osm/opensm/osm_link_mgr.c b/osm/opensm/osm_link_mgr.c > > index c8307d3..87e9e46 100644 > > --- a/osm/opensm/osm_link_mgr.c > > +++ b/osm/opensm/osm_link_mgr.c > > @@ -50,8 +50,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_mad_pool.c b/osm/opensm/osm_mad_pool.c > > index 72f9db8..12ecabf 100644 > > --- a/osm/opensm/osm_mad_pool.c > > +++ b/osm/opensm/osm_mad_pool.c > > @@ -52,6 +52,7 @@ # include > > #endif /* HAVE_CONFIG_H */ > > > > #include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_matrix.c b/osm/opensm/osm_matrix.c > > index 3efb0bd..073d9b8 100644 > > --- a/osm/opensm/osm_matrix.c > > +++ b/osm/opensm/osm_matrix.c > > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > > > > > diff --git a/osm/opensm/osm_mcast_fwd_rcv.c > b/osm/opensm/osm_mcast_fwd_rcv.c > > index 73763f5..d0ffa59 100644 > > --- a/osm/opensm/osm_mcast_fwd_rcv.c > > +++ b/osm/opensm/osm_mcast_fwd_rcv.c > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > > b/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > > index a6f46fd..9201ecf 100644 > > --- a/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > > +++ b/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c > > index f729c61..96d3b0f 100644 > > --- a/osm/opensm/osm_mcast_mgr.c > > +++ b/osm/opensm/osm_mcast_mgr.c > > @@ -50,6 +50,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > +#include > > #include > > #include 
> > #include > > diff --git a/osm/opensm/osm_mcast_tbl.c b/osm/opensm/osm_mcast_tbl.c > > index 401d97c..b8fa325 100644 > > --- a/osm/opensm/osm_mcast_tbl.c > > +++ b/osm/opensm/osm_mcast_tbl.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_mcm_info.c b/osm/opensm/osm_mcm_info.c > > index 08c0d12..a5ac7f3 100644 > > --- a/osm/opensm/osm_mcm_info.c > > +++ b/osm/opensm/osm_mcm_info.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > > > > /********************************************************************** > > diff --git a/osm/opensm/osm_mcm_port.c b/osm/opensm/osm_mcm_port.c > > index e92ad76..16ed84e 100644 > > --- a/osm/opensm/osm_mcm_port.c > > +++ b/osm/opensm/osm_mcm_port.c > > @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > +#include > > #include > > > > > /********************************************************************** > > diff --git a/osm/opensm/osm_mtree.c b/osm/opensm/osm_mtree.c > > index f9d82d6..421e39e 100644 > > --- a/osm/opensm/osm_mtree.c > > +++ b/osm/opensm/osm_mtree.c > > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_multicast.c b/osm/opensm/osm_multicast.c > > index 2256741..690f7df 100644 > > --- a/osm/opensm/osm_multicast.c > > +++ b/osm/opensm/osm_multicast.c > > @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_node_desc_rcv.c > b/osm/opensm/osm_node_desc_rcv.c > > index 62fe034..f9fa22d 100644 > > --- a/osm/opensm/osm_node_desc_rcv.c > > +++ b/osm/opensm/osm_node_desc_rcv.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* 
HAVE_CONFIG_H */
> >
> > +#include
> > #include
> > -#include
> > #include
> > #include
> > #include
> > diff --git a/osm/opensm/osm_node_desc_rcv_ctrl.c
> > b/osm/opensm/osm_node_desc_rcv_ctrl.c
> > index 9f689e2..3f26b83 100644
> > --- a/osm/opensm/osm_node_desc_rcv_ctrl.c
> > +++ b/osm/opensm/osm_node_desc_rcv_ctrl.c
> > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H
> > # include
> > #endif /* HAVE_CONFIG_H */
> >
> > -#include
> > +#include
> > #include
> > #include
> >
> > diff --git a/osm/opensm/osm_node_info_rcv.c
> b/osm/opensm/osm_node_info_rcv.c
> > index c35e2b7..59257a0 100644
> > --- a/osm/opensm/osm_node_info_rcv.c
> > +++ b/osm/opensm/osm_node_info_rcv.c
> > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H
> > # include
> > #endif /* HAVE_CONFIG_H */
> >
> > +#include
> > #include
> > -#include
> > #include
> > #include
> > #include
> > diff --git a/osm/opensm/osm_node_info_rcv_ctrl.c
> > b/osm/opensm/osm_node_info_rcv_ctrl.c
> > index 478f9c4..cbff6ce 100644
> > --- a/osm/opensm/osm_node_info_rcv_ctrl.c
> > +++ b/osm/opensm/osm_node_info_rcv_ctrl.c
> > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H
> > # include
> > #endif /* HAVE_CONFIG_H */
> >
> > -#include
> > +#include
> > #include
> > #include
> >
> > diff --git a/osm/opensm/osm_opensm.c b/osm/opensm/osm_opensm.c
> > index 2a8e0f8..8c422b5 100644
> > --- a/osm/opensm/osm_opensm.c
> > +++ b/osm/opensm/osm_opensm.c
> > @@ -53,7 +53,7 @@ #endif /* HAVE_CONFIG_H */
> >
> > #include
> > #include
> > -#include
> > +#include
> > #include
> > #include
> > #include
> > @@ -130,8 +130,6 @@ osm_opensm_destroy(
> >
> > cl_plock_destroy( &p_osm->lock );
> >
> > - cl_mem_display( );
> > -
> > osm_log_destroy( &p_osm->log );
> > }
> >
> > diff --git a/osm/opensm/osm_pkey.c b/osm/opensm/osm_pkey.c
> > index b0cb869..5ecfdd9 100644
> > --- a/osm/opensm/osm_pkey.c
> > +++ b/osm/opensm/osm_pkey.c
> > @@ -51,6 +51,7 @@ #endif /* HAVE_CONFIG_H */
> >
> > #include
> > #include
> > +#include
> > #include
> > #include
> > #include
> > diff --git
a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > > index f98d13b..e08b7cc 100644 > > --- a/osm/opensm/osm_pkey_mgr.c > > +++ b/osm/opensm/osm_pkey_mgr.c > > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_pkey_rcv.c b/osm/opensm/osm_pkey_rcv.c > > index 8696dc4..5262a6b 100644 > > --- a/osm/opensm/osm_pkey_rcv.c > > +++ b/osm/opensm/osm_pkey_rcv.c > > @@ -39,8 +39,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_pkey_rcv_ctrl.c > b/osm/opensm/osm_pkey_rcv_ctrl.c > > index 77ebab2..cd4367a 100644 > > --- a/osm/opensm/osm_pkey_rcv_ctrl.c > > +++ b/osm/opensm/osm_pkey_rcv_ctrl.c > > @@ -43,7 +43,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_port.c b/osm/opensm/osm_port.c > > index f8c51e8..53ab006 100644 > > --- a/osm/opensm/osm_port.c > > +++ b/osm/opensm/osm_port.c > > @@ -52,6 +52,7 @@ # include > > #endif /* HAVE_CONFIG_H */ > > > > #include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_port_info_rcv.c > b/osm/opensm/osm_port_info_rcv.c > > index 119bcbd..a08c57c 100644 > > --- a/osm/opensm/osm_port_info_rcv.c > > +++ b/osm/opensm/osm_port_info_rcv.c > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_port_info_rcv_ctrl.c > > b/osm/opensm/osm_port_info_rcv_ctrl.c > > index 9f6001f..303bedb 100644 > > --- a/osm/opensm/osm_port_info_rcv_ctrl.c > > +++ b/osm/opensm/osm_port_info_rcv_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > 
#include > > #include > > > > diff --git a/osm/opensm/osm_prtn.c b/osm/opensm/osm_prtn.c > > index 26790b4..8b748c4 100644 > > --- a/osm/opensm/osm_prtn.c > > +++ b/osm/opensm/osm_prtn.c > > @@ -54,6 +54,7 @@ #include > > #include > > > > #include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_qos.c b/osm/opensm/osm_qos.c > > index cd5c26a..c23ef87 100644 > > --- a/osm/opensm/osm_qos.c > > +++ b/osm/opensm/osm_qos.c > > @@ -46,6 +46,7 @@ # include > > #endif /* HAVE_CONFIG_H */ > > > > #include > > +#include > > > > #include > > #include > > diff --git a/osm/opensm/osm_remote_sm.c b/osm/opensm/osm_remote_sm.c > > index eb65d22..b91264e 100644 > > --- a/osm/opensm/osm_remote_sm.c > > +++ b/osm/opensm/osm_remote_sm.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > > > > /********************************************************************** > > > **********************************************************************/ > > diff --git a/osm/opensm/osm_req.c b/osm/opensm/osm_req.c > > index 9ddc9e9..534694b 100644 > > --- a/osm/opensm/osm_req.c > > +++ b/osm/opensm/osm_req.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_req_ctrl.c b/osm/opensm/osm_req_ctrl.c > > index 708e7c9..2d0e7e0 100644 > > --- a/osm/opensm/osm_req_ctrl.c > > +++ b/osm/opensm/osm_req_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_resp.c b/osm/opensm/osm_resp.c > > index 9b5079a..aa60bf2 100644 > > --- a/osm/opensm/osm_resp.c > > +++ b/osm/opensm/osm_resp.c > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > 
-#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa.c b/osm/opensm/osm_sa.c > > index b33431c..fa7dad8 100644 > > --- a/osm/opensm/osm_sa.c > > +++ b/osm/opensm/osm_sa.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_class_port_info.c > > b/osm/opensm/osm_sa_class_port_info.c > > index 389bc9c..cfad739 100644 > > --- a/osm/opensm/osm_sa_class_port_info.c > > +++ b/osm/opensm/osm_sa_class_port_info.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_class_port_info_ctrl.c > > b/osm/opensm/osm_sa_class_port_info_ctrl.c > > index 219a837..c71af4c 100644 > > --- a/osm/opensm/osm_sa_class_port_info_ctrl.c > > +++ b/osm/opensm/osm_sa_class_port_info_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_guidinfo_record.c > > b/osm/opensm/osm_sa_guidinfo_record.c > > index 7d1eebf..601c809 100644 > > --- a/osm/opensm/osm_sa_guidinfo_record.c > > +++ b/osm/opensm/osm_sa_guidinfo_record.c > > @@ -54,8 +54,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_guidinfo_record_ctrl.c > > b/osm/opensm/osm_sa_guidinfo_record_ctrl.c > > index b252b20..f2211b1 100644 > > --- a/osm/opensm/osm_sa_guidinfo_record_ctrl.c > > +++ b/osm/opensm/osm_sa_guidinfo_record_ctrl.c > > @@ -54,7 +54,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_informinfo.c > b/osm/opensm/osm_sa_informinfo.c > 
> index 149e609..a820dea 100644 > > --- a/osm/opensm/osm_sa_informinfo.c > > +++ b/osm/opensm/osm_sa_informinfo.c > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_informinfo_ctrl.c > > b/osm/opensm/osm_sa_informinfo_ctrl.c > > index 75edabc..31644af 100644 > > --- a/osm/opensm/osm_sa_informinfo_ctrl.c > > +++ b/osm/opensm/osm_sa_informinfo_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_lft_record.c > b/osm/opensm/osm_sa_lft_record.c > > index b9b903e..2d17dbe 100644 > > --- a/osm/opensm/osm_sa_lft_record.c > > +++ b/osm/opensm/osm_sa_lft_record.c > > @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_lft_record_ctrl.c > > b/osm/opensm/osm_sa_lft_record_ctrl.c > > index 0682438..1cc2544 100644 > > --- a/osm/opensm/osm_sa_lft_record_ctrl.c > > +++ b/osm/opensm/osm_sa_lft_record_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_link_record.c > b/osm/opensm/osm_sa_link_record.c > > index 1a407e1..a525002 100644 > > --- a/osm/opensm/osm_sa_link_record.c > > +++ b/osm/opensm/osm_sa_link_record.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_link_record_ctrl.c > > b/osm/opensm/osm_sa_link_record_ctrl.c > > index 707c184..01db21d 100644 > > --- a/osm/opensm/osm_sa_link_record_ctrl.c > > +++ b/osm/opensm/osm_sa_link_record_ctrl.c > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > 
> # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_mad_ctrl.c > b/osm/opensm/osm_sa_mad_ctrl.c > > index 1f87ea2..81584ce 100644 > > --- a/osm/opensm/osm_sa_mad_ctrl.c > > +++ b/osm/opensm/osm_sa_mad_ctrl.c > > @@ -50,7 +50,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_mcmember_record.c > > b/osm/opensm/osm_sa_mcmember_record.c > > index 291fbf5..5129231 100644 > > --- a/osm/opensm/osm_sa_mcmember_record.c > > +++ b/osm/opensm/osm_sa_mcmember_record.c > > @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_mcmember_record_ctrl.c > > b/osm/opensm/osm_sa_mcmember_record_ctrl.c > > index 99a779a..a583979 100644 > > --- a/osm/opensm/osm_sa_mcmember_record_ctrl.c > > +++ b/osm/opensm/osm_sa_mcmember_record_ctrl.c > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_multipath_record.c > > b/osm/opensm/osm_sa_multipath_record.c > > index bdf53a3..c8efdb4 100644 > > --- a/osm/opensm/osm_sa_multipath_record.c > > +++ b/osm/opensm/osm_sa_multipath_record.c > > @@ -52,8 +52,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_multipath_record_ctrl.c > > b/osm/opensm/osm_sa_multipath_record_ctrl.c > > index 7c0337c..e330bb8 100644 > > --- a/osm/opensm/osm_sa_multipath_record_ctrl.c > > +++ b/osm/opensm/osm_sa_multipath_record_ctrl.c > > @@ -56,7 +56,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > 
> diff --git a/osm/opensm/osm_sa_node_record.c > b/osm/opensm/osm_sa_node_record.c > > index ecaa048..ac9be22 100644 > > --- a/osm/opensm/osm_sa_node_record.c > > +++ b/osm/opensm/osm_sa_node_record.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_node_record_ctrl.c > > b/osm/opensm/osm_sa_node_record_ctrl.c > > index dcf5944..61b363a 100644 > > --- a/osm/opensm/osm_sa_node_record_ctrl.c > > +++ b/osm/opensm/osm_sa_node_record_ctrl.c > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_path_record.c > b/osm/opensm/osm_sa_path_record.c > > index 1e4a137..7da6d70 100644 > > --- a/osm/opensm/osm_sa_path_record.c > > +++ b/osm/opensm/osm_sa_path_record.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_path_record_ctrl.c > > b/osm/opensm/osm_sa_path_record_ctrl.c > > index eab7171..9495785 100644 > > --- a/osm/opensm/osm_sa_path_record_ctrl.c > > +++ b/osm/opensm/osm_sa_path_record_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_pkey_record.c > b/osm/opensm/osm_sa_pkey_record.c > > index e60466b..0eeb0c0 100644 > > --- a/osm/opensm/osm_sa_pkey_record.c > > +++ b/osm/opensm/osm_sa_pkey_record.c > > @@ -43,8 +43,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_pkey_record_ctrl.c > > b/osm/opensm/osm_sa_pkey_record_ctrl.c > > index 01cdc0f..a9d8a8d 100644 > > --- 
a/osm/opensm/osm_sa_pkey_record_ctrl.c > > +++ b/osm/opensm/osm_sa_pkey_record_ctrl.c > > @@ -43,7 +43,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_portinfo_record.c > > b/osm/opensm/osm_sa_portinfo_record.c > > index 3acb8c9..e1ca873 100644 > > --- a/osm/opensm/osm_sa_portinfo_record.c > > +++ b/osm/opensm/osm_sa_portinfo_record.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_portinfo_record_ctrl.c > > b/osm/opensm/osm_sa_portinfo_record_ctrl.c > > index 831843b..4f53f04 100644 > > --- a/osm/opensm/osm_sa_portinfo_record_ctrl.c > > +++ b/osm/opensm/osm_sa_portinfo_record_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_response.c > b/osm/opensm/osm_sa_response.c > > index 30f561f..03c94f7 100644 > > --- a/osm/opensm/osm_sa_response.c > > +++ b/osm/opensm/osm_sa_response.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_service_record.c > > b/osm/opensm/osm_sa_service_record.c > > index 38ee80b..a65e41d 100644 > > --- a/osm/opensm/osm_sa_service_record.c > > +++ b/osm/opensm/osm_sa_service_record.c > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_service_record_ctrl.c > > b/osm/opensm/osm_sa_service_record_ctrl.c > > index 5f8c936..8af9cd7 100644 > > --- a/osm/opensm/osm_sa_service_record_ctrl.c > > +++ b/osm/opensm/osm_sa_service_record_ctrl.c > > 
@@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_slvl_record.c > b/osm/opensm/osm_sa_slvl_record.c > > index 237b99c..5d1928e 100644 > > --- a/osm/opensm/osm_sa_slvl_record.c > > +++ b/osm/opensm/osm_sa_slvl_record.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_slvl_record_ctrl.c > > b/osm/opensm/osm_sa_slvl_record_ctrl.c > > index d156bf1..7801508 100644 > > --- a/osm/opensm/osm_sa_slvl_record_ctrl.c > > +++ b/osm/opensm/osm_sa_slvl_record_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_sminfo_record.c > > b/osm/opensm/osm_sa_sminfo_record.c > > index 9c3f436..b9dee38 100644 > > --- a/osm/opensm/osm_sa_sminfo_record.c > > +++ b/osm/opensm/osm_sa_sminfo_record.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sa_sminfo_record_ctrl.c > > b/osm/opensm/osm_sa_sminfo_record_ctrl.c > > index 72c2fad..3b07920 100644 > > --- a/osm/opensm/osm_sa_sminfo_record_ctrl.c > > +++ b/osm/opensm/osm_sa_sminfo_record_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sa_vlarb_record.c > b/osm/opensm/osm_sa_vlarb_record.c > > index ddbef9c..059e5a9 100644 > > --- a/osm/opensm/osm_sa_vlarb_record.c > > +++ b/osm/opensm/osm_sa_vlarb_record.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include 
> > #include > > #include > > diff --git a/osm/opensm/osm_sa_vlarb_record_ctrl.c > > b/osm/opensm/osm_sa_vlarb_record_ctrl.c > > index f7ad3ed..a243e08 100644 > > --- a/osm/opensm/osm_sa_vlarb_record_ctrl.c > > +++ b/osm/opensm/osm_sa_vlarb_record_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_service.c b/osm/opensm/osm_service.c > > index 723e117..a1309d3 100644 > > --- a/osm/opensm/osm_service.c > > +++ b/osm/opensm/osm_service.c > > @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_slvl_map_rcv.c > b/osm/opensm/osm_slvl_map_rcv.c > > index 9a6acf5..33c3d45 100644 > > --- a/osm/opensm/osm_slvl_map_rcv.c > > +++ b/osm/opensm/osm_slvl_map_rcv.c > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_slvl_map_rcv_ctrl.c > > b/osm/opensm/osm_slvl_map_rcv_ctrl.c > > index ee357da..4da0eff 100644 > > --- a/osm/opensm/osm_slvl_map_rcv_ctrl.c > > +++ b/osm/opensm/osm_slvl_map_rcv_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c > > index f6e33c5..0e09f26 100644 > > --- a/osm/opensm/osm_sm.c > > +++ b/osm/opensm/osm_sm.c > > @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sm_mad_ctrl.c > b/osm/opensm/osm_sm_mad_ctrl.c > > index 1b90335..9dceef2 100644 > > --- a/osm/opensm/osm_sm_mad_ctrl.c > > +++ b/osm/opensm/osm_sm_mad_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* 
HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sm_state_mgr.c > b/osm/opensm/osm_sm_state_mgr.c > > index a881f7f..8ae9889 100644 > > --- a/osm/opensm/osm_sm_state_mgr.c > > +++ b/osm/opensm/osm_sm_state_mgr.c > > @@ -50,8 +50,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sminfo_rcv.c b/osm/opensm/osm_sminfo_rcv.c > > index e5c4bbb..5914984 100644 > > --- a/osm/opensm/osm_sminfo_rcv.c > > +++ b/osm/opensm/osm_sminfo_rcv.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sminfo_rcv_ctrl.c > b/osm/opensm/osm_sminfo_rcv_ctrl.c > > index 76ae65c..327d7eb 100644 > > --- a/osm/opensm/osm_sminfo_rcv_ctrl.c > > +++ b/osm/opensm/osm_sminfo_rcv_ctrl.c > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > > index c97875c..97b017d 100644 > > --- a/osm/opensm/osm_state_mgr.c > > +++ b/osm/opensm/osm_state_mgr.c > > @@ -50,7 +50,9 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_state_mgr_ctrl.c > b/osm/opensm/osm_state_mgr_ctrl.c > > index a7afc46..0bde333 100644 > > --- a/osm/opensm/osm_state_mgr_ctrl.c > > +++ b/osm/opensm/osm_state_mgr_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c > > index 9b4bcfe..c251411 100644 > > --- a/osm/opensm/osm_subnet.c > > +++ 
b/osm/opensm/osm_subnet.c > > @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sw_info_rcv.c > b/osm/opensm/osm_sw_info_rcv.c > > index 7a1f72f..6bbd73a 100644 > > --- a/osm/opensm/osm_sw_info_rcv.c > > +++ b/osm/opensm/osm_sw_info_rcv.c > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_sw_info_rcv_ctrl.c > > b/osm/opensm/osm_sw_info_rcv_ctrl.c > > index a97a7dc..fb8fe50 100644 > > --- a/osm/opensm/osm_sw_info_rcv_ctrl.c > > +++ b/osm/opensm/osm_sw_info_rcv_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_sweep_fail_ctrl.c > b/osm/opensm/osm_sweep_fail_ctrl.c > > index 022988a..e27a540 100644 > > --- a/osm/opensm/osm_sweep_fail_ctrl.c > > +++ b/osm/opensm/osm_sweep_fail_ctrl.c > > @@ -49,7 +49,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c > > index fa726c6..7e89475 100644 > > --- a/osm/opensm/osm_switch.c > > +++ b/osm/opensm/osm_switch.c > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_trap_rcv.c b/osm/opensm/osm_trap_rcv.c > > index 7e39832..9865f53 100644 > > --- a/osm/opensm/osm_trap_rcv.c > > +++ b/osm/opensm/osm_trap_rcv.c > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_trap_rcv_ctrl.c > b/osm/opensm/osm_trap_rcv_ctrl.c > > 
index 1e6bf45..ee5a1a4 100644 > > --- a/osm/opensm/osm_trap_rcv_ctrl.c > > +++ b/osm/opensm/osm_trap_rcv_ctrl.c > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c > > index 4492c1a..95f4d04 100644 > > --- a/osm/opensm/osm_ucast_mgr.c > > +++ b/osm/opensm/osm_ucast_mgr.c > > @@ -54,6 +54,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c > > index b70cf21..44e1993 100644 > > --- a/osm/opensm/osm_ucast_updn.c > > +++ b/osm/opensm/osm_ucast_updn.c > > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_vl15intf.c b/osm/opensm/osm_vl15intf.c > > index f72620b..68f17c5 100644 > > --- a/osm/opensm/osm_vl15intf.c > > +++ b/osm/opensm/osm_vl15intf.c > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_vl_arb_rcv.c b/osm/opensm/osm_vl_arb_rcv.c > > index 70fd5ed..e33a2f9 100644 > > --- a/osm/opensm/osm_vl_arb_rcv.c > > +++ b/osm/opensm/osm_vl_arb_rcv.c > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > #include > > -#include > > #include > > #include > > #include > > diff --git a/osm/opensm/osm_vl_arb_rcv_ctrl.c > b/osm/opensm/osm_vl_arb_rcv_ctrl.c > > index 9113985..f1f22c7 100644 > > --- a/osm/opensm/osm_vl_arb_rcv_ctrl.c > > +++ b/osm/opensm/osm_vl_arb_rcv_ctrl.c > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > -#include > > +#include > > #include > > #include > > > > diff --git 
a/osm/osmtest/include/osmtest_subnet.h > > b/osm/osmtest/include/osmtest_subnet.h > > index 0e7cf3e..277a2aa 100644 > > --- a/osm/osmtest/include/osmtest_subnet.h > > +++ b/osm/osmtest/include/osmtest_subnet.h > > @@ -47,6 +47,7 @@ > > #ifndef _OSMTEST_SUBNET_H_ > > #define _OSMTEST_SUBNET_H_ > > > > +#include > > #include > > #include > > #include > > diff --git a/osm/osmtest/osmt_inform.c b/osm/osmtest/osmt_inform.c > > index b24ae30..e1562db 100644 > > --- a/osm/osmtest/osmt_inform.c > > +++ b/osm/osmtest/osmt_inform.c > > @@ -56,7 +56,6 @@ #include > > #include > > #include > > #include > > -#include > > > > #include > > #include "osmtest.h" > > diff --git a/osm/osmtest/osmt_slvl_vl_arb.c > b/osm/osmtest/osmt_slvl_vl_arb.c > > index 6cb8377..9fc84f6 100644 > > --- a/osm/osmtest/osmt_slvl_vl_arb.c > > +++ b/osm/osmtest/osmt_slvl_vl_arb.c > > @@ -54,7 +54,6 @@ #include > > #include > > #include > > #include > > -#include > > #include "osmtest.h" > > > > > /********************************************************************** > > diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c > > index 78aff53..5eb5482 100644 > > --- a/osm/osmtest/osmtest.c > > +++ b/osm/osmtest/osmtest.c > > @@ -56,8 +56,8 @@ #endif > > > > #include > > #include > > -#ifdef __WIN__ > > #include > > +#ifdef __WIN__ > > #include > > #else > > #include > > -- > > 1.3.2 > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Thu May 18 08:20:31 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 18 May 2006 08:20:31 -0700 Subject: [openib-general] OpenIB 1.0 RC + PathScale problem References: <1147927964.3094.8.camel@localhost.localdomain> <200605181014.58567.jackm@mellanox.co.il> Message-ID: Jack> in openib 1.0 RC4, the function name was still Jack> 
ib_cmd_poll_cq. In the upcoming RC5, both names are
Jack> supported.

How did that happen? As far as I can tell the function was named
ibv_cmd_poll_cq right from the start, when it was added in svn rev 3783.

 - R.

From btmiller at helix.nih.gov Thu May 18 08:24:38 2006
From: btmiller at helix.nih.gov (Tim Miller)
Date: Thu, 18 May 2006 11:24:38 -0400
Subject: [openib-general] OpenIB 1.0 RC + PathScale problem
In-Reply-To:
References:
Message-ID:

On Thu, 18 May 2006, Di Domenico, Michael wrote:

> I'm certainly no expert, but I came across different but similar issues,
> where my applications were picking up another set of libraries, that I
> wasn't aware were on the system... I was getting the same 'undefined
> symbols' errors. You might want to check for ib libs that might be in
> your path.

Hi Michael,

Thanks to Bryan, Jack, and yourself for responding. I suspect it is a
library issue, but I'm having some trouble tracking down the exact source
of the problem. The first odd thing is that doing an nm of
/usr/local/lib/infiniband/ipathverbs.so shows ibv_cmd_poll_cq is indeed
undefined (the symbol is defined in libibverbs.so). Here's the ldd output
for ipathverbs.so:

tim at o8:/usr/local/lib 165$ ldd infiniband/ipathverbs.so
        libc.so.6 => /lib64/tls/libc.so.6 (0x00002b01d0d00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)

Oddly, it doesn't seem to depend on libibverbs. Is that normal? If not, I
must be doing something wrong in building the libraries. I tried
uninstalling all the userspace stuff (via make uninstall) and
reconfiguring/remaking after a make clean in the source dir. I built, in
order, libibverbs, libipathverbs, and libmthca. Is that the correct way to
do things (it's what I got from the quickstart Wiki entry)?

FWIW, this is on SuSE 9.3 with kernel 2.6.16.16 custom compiled, but this
seems to be entirely an issue with the userspace libs.
For completeness, here is the ldd output for libibverbs: tim at o8:/usr/local/lib 166$ ldd libibverbs.so libsysfs.so.1 => /lib64/libsysfs.so.1 (0x00002ac94dbd4000) libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x00002ac94dce0000) libdl.so.2 => /lib64/libdl.so.2 (0x00002ac94ddf5000) libc.so.6 => /lib64/tls/libc.so.6 (0x00002ac94def9000) /lib64/ld-linux-x86-64.so.2 (0x0000555555554000) The nm output is pretty long, but let me know if you want to see it too. Thanks, Tim -- Tim Miller System Administrator -- Laboratory of Computational Biology National Institutes of Health -- Bldg. 50 Rm. 3309 -- 301-402-0618 From mst at mellanox.co.il Thu May 18 08:32:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 18 May 2006 18:32:54 +0300 Subject: [openib-general] [PATCH] mthca: fix posting lists of 256 entries for tavor Message-ID: <20060518153254.GF30211@mellanox.co.il> If we post a list of length 256 exactly, nreq in doorbell gets set to 256 which is wrong: should be encoded by 0. This is because we only zero it out on next WR, which may not be there. The solution is to ring the doorbell after posting a WQE, not before posting the next one. Signed-off-by: Michael S. 
Tsirkin

Index: linux-2.6.16/drivers/infiniband/hw/mthca/mthca_qp.c
===================================================================
--- linux-2.6.16/drivers/infiniband/hw/mthca/mthca_qp.c (revision 7261)
+++ linux-2.6.16/drivers/infiniband/hw/mthca/mthca_qp.c (working copy)
@@ -1725,23 +1725,7 @@

 	ind = qp->rq.next_ind;

-	for (nreq = 0; wr; ++nreq, wr = wr->next) {
-		if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) {
-			nreq = 0;
-
-			doorbell[0] = cpu_to_be32((qp->rq.next_ind << qp->rq.wqe_shift) | size0);
-			doorbell[1] = cpu_to_be32(qp->qpn << 8);
-
-			wmb();
-
-			mthca_write64(doorbell,
-				      dev->kar + MTHCA_RECEIVE_DOORBELL,
-				      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
-
-			qp->rq.head += MTHCA_TAVOR_MAX_WQES_PER_RECV_DB;
-			size0 = 0;
-		}
-
+	for (nreq = 0; wr; wr = wr->next) {
 		if (mthca_wq_overflow(&qp->rq, nreq, qp->ibqp.recv_cq)) {
 			mthca_err(dev, "RQ %06x full (%u head, %u tail,"
 					" %d max, %d nreq)\n", qp->qpn,
@@ -1795,6 +1779,22 @@
 		++ind;
 		if (unlikely(ind >= qp->rq.max))
 			ind -= qp->rq.max;
+
+		if (unlikely(++nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) {
+			nreq = 0;
+
+			doorbell[0] = cpu_to_be32((qp->rq.next_ind << qp->rq.wqe_shift) | size0);
+			doorbell[1] = cpu_to_be32(qp->qpn << 8);
+
+			wmb();
+
+			mthca_write64(doorbell,
+				      dev->kar + MTHCA_RECEIVE_DOORBELL,
+				      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+
+			qp->rq.head += MTHCA_TAVOR_MAX_WQES_PER_RECV_DB;
+			size0 = 0;
+		}
 	}

 out:

--
MST

From mst at mellanox.co.il Thu May 18 08:33:08 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 18 May 2006 18:33:08 +0300
Subject: [openib-general] [PATCH] libmthca: fix posting lists with 256 entries
Message-ID: <20060518153308.GG30211@mellanox.co.il>

Fix posting lists of 256 entries. Same as previous kernel patch.

Signed-off-by: Michael S.
Tsirkin

Index: openib/src/userspace/libmthca/src/qp.c
===================================================================
--- openib/src/userspace/libmthca/src/qp.c (revision 7317)
+++ openib/src/userspace/libmthca/src/qp.c (working copy)
@@ -332,25 +332,7 @@ int mthca_tavor_post_recv(struct ibv_qp

 	ind = qp->rq.next_ind;

-	for (nreq = 0; wr; ++nreq, wr = wr->next) {
-		if (nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB) {
-			nreq = 0;
-
-			doorbell[0] = htonl((qp->rq.next_ind << qp->rq.wqe_shift) | size0);
-			doorbell[1] = htonl(ibqp->qp_num << 8);
-
-			/*
-			 * Make sure that descriptors are written
-			 * before doorbell is rung.
-			 */
-			mb();
-
-			mthca_write64(doorbell, to_mctx(ibqp->context), MTHCA_RECV_DOORBELL);
-
-			qp->rq.head += MTHCA_TAVOR_MAX_WQES_PER_RECV_DB;
-			size0 = 0;
-		}
-
+	for (nreq = 0; wr; wr = wr->next) {
 		if (wq_overflow(&qp->rq, nreq, to_mcq(qp->ibv_qp.recv_cq))) {
 			ret = -1;
 			*bad_wr = wr;
@@ -400,6 +382,24 @@ int mthca_tavor_post_recv(struct ibv_qp
 		++ind;
 		if (ind >= qp->rq.max)
 			ind -= qp->rq.max;
+
+		if (++nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB) {
+			nreq = 0;
+
+			doorbell[0] = htonl((qp->rq.next_ind << qp->rq.wqe_shift) | size0);
+			doorbell[1] = htonl(ibqp->qp_num << 8);
+
+			/*
+			 * Make sure that descriptors are written
+			 * before doorbell is rung.
+			 */
+			mb();
+
+			mthca_write64(doorbell, to_mctx(ibqp->context), MTHCA_RECV_DOORBELL);
+
+			qp->rq.head += MTHCA_TAVOR_MAX_WQES_PER_RECV_DB;
+			size0 = 0;
+		}
 	}

 out:

--
MST

From rheflin at atipa.com Thu May 18 08:51:02 2006
From: rheflin at atipa.com (Roger Heflin)
Date: Thu, 18 May 2006 10:51:02 -0500
Subject: [openib-general] openmpi on ib over pathscale (2.6.17-rc4+53 patches)
Message-ID: <446C97E6.5030005@atipa.com>

Hello,

I have been doing some testing with hpl over openmpi over ib on a
pathscale card. Previously I had run the same code over no-mem
non-pathscale cards with no apparent issues.
Currently xhpl starts and runs for a while but gets some odd error
messages (this is an improvement with the 53 patches, as before them the
kernel crashed on the second machine every time on startup). There do
appear to be some odd issues upon startup (xhpl won't start initially and
hangs forever, but if I "ifdown/up ib0" on both machines and then restart,
it will start up).

Here is what the run looks like:

[0,1,3][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
[0,1,3][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
[0,1,1][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
[0,1,1][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
[0,1,0][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
[0,1,0][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
[0,1,2][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
[0,1,2][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
============================================================================
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================

An explanation of the input/output parameters follows:
[0,1,3][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
ibv_create_qp: returned 0 byte(s) for max inline data
[0,1,3][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 20480 28672 40960
NB     : 64 80 96 112 120 128 136 144 152 160 240 288
PMAP   : Row-major process mapping
P      : 2
Q      : 2
PFACT  : Left Crout
NBMIN  : 2 4
NDIV   : 2
RFACT  : Left Crout
BCAST  : 1ring 1ringM 2ring 2ringM Blong BlongM
DEPTH  : 0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
   2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

[0,1,0][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
[0,1,0][btl_openib_endpoint.c:889:mca_btl_openib_endpoint_create_qp] ibv_create_qp: returned 0 byte(s) for max inline data
============================================================================
T/V          N      NB   P   Q     Time     Gflops
----------------------------------------------------------------------------
WR00L2L2     20480  64   2   2     208.52   2.747e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0190455 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0054105 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0010075 ......
PASSED
============================================================================
T/V          N      NB   P   Q     Time     Gflops
----------------------------------------------------------------------------
WR00L2L4     20480  64   2   2     208.51   2.747e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0206957 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0058793 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0010948 ...... PASSED
============================================================================
T/V          N      NB   P   Q     Time     Gflops
----------------------------------------------------------------------------
WR00L2C2     20480  64   2   2     210.67   2.719e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0190455 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0054105 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0010075 ...... PASSED
============================================================================
T/V          N      NB   P   Q     Time     Gflops
----------------------------------------------------------------------------
WR00L2C4     20480  64   2   2     206.97   2.767e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0191957 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0054531 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0010154 ...... PASSED
[0,1,0][btl_openib_component.c:722:mca_btl_openib_component_progress] error polling LP CQ with status 12 for wr_id 47112348205468 opcode 0
[0,1,0][btl_openib_component.c:722:mca_btl_openib_component_progress] error polling LP CQ with status 5 for wr_id 47112348205468 opcode 0
[0,1,0][btl_openib_component.c:722:mca_btl_openib_component_progress] error polling LP CQ with status 5 for wr_id 47112348271288 opcode 0

At this point in time it appears to be hung.
If I restart and re-run, it will hang at some different point. The
previous run hung after the first step; this run made it 4 steps. All
processes are still running (and using CPU), but no output is being
returned any longer. Ctrl-C will stop it and does stop the processes on
both nodes. Rebooting both nodes and starting clean does not seem to
change any behavior.

The above 3 messages always appear at the time that the hang appears to
happen, so they do appear to be related.

IP over OpenIB still appears to be pingable, so IB is still up, and there
are no abnormal messages in dmesg/messages on either of the 2 machines
being used.

I appear to be able to duplicate this, and I can collect any information
that would help when the hang happens. From a clean reboot it appears to
last longer, but in the end the messages look much the same.

Roger

From swise at opengridcomputing.com Thu May 18 08:59:39 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 18 May 2006 10:59:39 -0500
Subject: [openib-general] [librmdacm] fix rping to return a value different than zero when there is a failure
In-Reply-To: <200605181801.10833.dotanb@mellanox.co.il>
References: <200605181801.10833.dotanb@mellanox.co.il>
Message-ID: <1147967979.18914.29.camel@stevo-desktop>

On Thu, 2006-05-18 at 18:01 +0300, Dotan Barak wrote:
> Hi.
>
> Here is a patch to fix this issue in the test.
> I couldn't find the parameters for executing this test. Can you please send me a command-line example which I can use?
> thanks
>
>

[root at bass4 perftest]# rping -?
rping -c|s [-vVd] [-S size] [-C count] -a addr -p port
        -c              client side
        -s              server side
        -v              display ping data to stdout
        -V              verbosity
        -d              debug printfs
        -S size         ping data size
        -C count        ping count times
        -a addr         address
        -p port         port

To ping 100 packets of size 100 and see something, run this on the server:

# rping -s -S100 -a 0.0.0.0 -p 9999

and this on the client:

# rping -c -Vv -S 100 -C 100 -a -p 9999

I just noticed that the usage is wrong.
-V _validates_ the ping/pong data. patch comments below: > > > Added checks to the return values of all of the functions that may fail > (in order to add this test to the regression system). > > Signed-off-by: Dotan Barak > > > static int rping_accept(struct rping_cb *cb) > @@ -545,7 +554,9 @@ static void *cm_thread(void *arg) > fprintf(stderr, "rdma_get_cm_event err %d\n", ret); > exit(ret); > } > - rping_cma_event_handler(event->id, event); > + ret = rping_cma_event_handler(event->id, event); > + if (ret) > + exit(ret); > rdma_ack_cm_event(event); Won't the process hang on exit if all the events are not acked? I seem to remember this happening when I was debugging this program originally. If it will hang, then fix the patch to ack the events before exiting... > } > } > @@ -559,7 +570,7 @@ static void *cq_thread(void *arg) > > DEBUG_LOG("cq_thread started.\n"); > > - while (1) { > + while (1) { > ret = ibv_get_cq_event(cb->channel, &ev_cq, &ev_ctx); > if (ret) { > fprintf(stderr, "Failed to get cq event!\n"); > @@ -574,7 +585,9 @@ static void *cq_thread(void *arg) > fprintf(stderr, "Failed to set notify!\n"); > exit(ret); > } > - rping_cq_event_handler(cb); > + ret = rping_cq_event_handler(cb); > + if (ret); > + exit(ret); > ibv_ack_cq_events(cb->cq, 1); Same comment. From iod00d at hp.com Thu May 18 09:47:54 2006 From: iod00d at hp.com (Grant Grundler) Date: Thu, 18 May 2006 09:47:54 -0700 Subject: [openib-general] Re: testing IB with unreleased kernels In-Reply-To: <446C066E.2060209@voltaire.com> References: <446B2319.9030204@voltaire.com> <20060517171617.GA6719@esmail.cup.hp.com> <446C066E.2060209@voltaire.com> Message-ID: <20060518164754.GA11371@esmail.cup.hp.com> On Thu, May 18, 2006 at 08:30:22AM +0300, Or Gerlitz wrote: > Please note that both approaches suggested above will not force to test > latest IB code with the under-development kernel... Right. This is an open-source project. 
You can only "encourage" people to do one thing or another by making the
integration harder or easier. And if it's too hard (i.e. getting OFED bits
that match a given kernel ABI), too many people just won't bother.

> So there's no replacement for testing done at least by the openib
> maintainers (and distros!!! when they start moving to IB...) for:

"at least by the openib maintainers" is a start but not sufficient.
That's only a handful of people.

> +1 next-kernel-RC-versions downloaded from kernel.org (e.g. 2.6.17-RCX)
> +2 next-next-kernel-branches of infiniband.git (Roland's tree)
>
> Of course people are busy, and testing is derived from needs..

Yes, people are busy. But people who only "need" are waiting for RHEL4u4,
SLES10, et al. and are probably not even on this list. People who have
time, _interest_, and HW usually end up "playing around" (i.e. testing)
with bits if it's not too complicated.

thanks,
grant

From mshefty at ichips.intel.com Thu May 18 09:50:38 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 18 May 2006 09:50:38 -0700
Subject: [openib-general] [PATCH] cma: fix bind to ip
In-Reply-To: <20060518133627.GW30211@mellanox.co.il>
References: <20060518133627.GW30211@mellanox.co.il>
Message-ID: <446CA5DE.1040600@ichips.intel.com>

Thanks! Committed with only minor adjustment to spacing.

- Sean

From swise at opengridcomputing.com Thu May 18 09:56:03 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 18 May 2006 11:56:03 -0500
Subject: [openib-general] [PATCH] - rping usage statement error
Message-ID: <1147971363.18914.40.camel@stevo-desktop>

Committed r7341.

Steve.

----

Fixed usage statement.
Signed-off-by: Steve Wise

Index: rping.c
===================================================================
--- rping.c (revision 7337)
+++ rping.c (working copy)
@@ -931,7 +931,7 @@
 	printf("\t-c\t\tclient side\n");
 	printf("\t-s\t\tserver side\n");
 	printf("\t-v\t\tdisplay ping data to stdout\n");
-	printf("\t-V\t\tverbosity\n");
+	printf("\t-V\t\tvalidate ping data\n");
 	printf("\t-d\t\tdebug printfs\n");
 	printf("\t-S size \tping data size\n");
 	printf("\t-C count\tping count times\n");

From rdreier at cisco.com Thu May 18 10:04:07 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 18 May 2006 10:04:07 -0700
Subject: [openib-general] [PATCH] cma: fix bind to ip
In-Reply-To: <446CA5DE.1040600@ichips.intel.com> (Sean Hefty's message of "Thu, 18 May 2006 09:50:38 -0700")
References: <20060518133627.GW30211@mellanox.co.il> <446CA5DE.1040600@ichips.intel.com>
Message-ID:

Sean> Thanks! Committed with only minor adjustment to spacing. -

Should I add that commit to what I have queued for 2.6.18?

From sean.hefty at intel.com Thu May 18 10:09:17 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 18 May 2006 10:09:17 -0700
Subject: [openib-general] [PATCH] cma: fix bind to ip
In-Reply-To:
Message-ID:

> Sean> Thanks! Committed with only minor adjustment to spacing. -
>
>Should I add that commit to what I have queued for 2.6.18?

It shouldn't hurt, but isn't strictly needed. The changes are for SDP, and
IPv6 support still requires more work. My personal vote would be yes, with
the hope that the new SDP might work with the in kernel CMA.
- Sean From eitan at mellanox.co.il Thu May 18 10:28:16 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 18 May 2006 20:28:16 +0300 Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h[was:[PATCH] OpenSM: Use memory routines directly and eliminatecl_mem*routines] Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB5@mtlexch01.mtl.com> I think we need to change major version of libosmcomp such that applications compiled against the previous API will fail to link... Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Thursday, May 18, 2006 6:07 PM > To: Eitan Zahavi > Cc: Sasha Khapyorsky; openib-general at openib.org > Subject: RE: [openib-general] [PATCH] Replace cl_memory.h by > string.h[was:[PATCH] OpenSM: Use memory routines directly and > eliminatecl_mem*routines] > > Hi Eitan, > > On Thu, 2006-05-18 at 11:00, Eitan Zahavi wrote: > > Hi Sasha, Hal, > > > > There several applications (ibis and ibmgtsim) that depend on complib, > > The changes of cleaning up the cl_memory API affect these utilities. > > Can you please provide the list of APIs removed and their replacements ? > > cl_memset -> memset > cl_memclr(x, y) -> memset ( x, 0, y) > cl_memcpy -> memcpy > > Soon cl_malloc/cl_zalloc/cl_free will change (and the memory tracking > will be removed). > > -- Hal > > > Also if we eventually converge on a single complib for windows and linux > > then the Windows stack is going to be affected by these changes too. > > > > EZ > > > > Eitan Zahavi > > Senior Engineering Director, Software Architect > > Mellanox Technologies LTD > > Tel:+972-4-9097208 > > Fax:+972-4-9593245 > > P.O. 
Box 586 Yokneam 20692 ISRAEL > > > > > > > -----Original Message----- > > > From: openib-general-bounces at openib.org [mailto:openib-general- > > > bounces at openib.org] On Behalf Of Sasha Khapyorsky > > > Sent: Thursday, May 18, 2006 1:03 AM > > > To: Hal Rosenstock > > > Cc: openib-general at openib.org > > > Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h > > [was:[PATCH] > > > OpenSM: Use memory routines directly and eliminatecl_mem* routines] > > > > > > On 12:14 Wed 17 May , Hal Rosenstock wrote: > > > > OpenSM: Use memory routines directly and eliminate cl_mem* routines > > > > as these routines are part of ISO C > > > > > > > > Signed-off-by: Hal Rosenstock > > > > > > Following Hal's cleanup this includes string.h header file for proper > > > mem*() functions prototype definitions where necessary, > > removes/includes > > > cl_memory.h as needed. Also couple of unistd.h additions for close(), > > > sleep() and unlink() calls. > > > > > > Signed-off-by: Sasha Khapyorsky > > > > > > > > > --- > > > > > > osm/complib/cl_event_wheel.c | 1 + > > > osm/complib/cl_map.c | 2 +- > > > osm/complib/cl_memory.c | 1 + > > > osm/complib/cl_perf.c | 2 ++ > > > osm/complib/cl_pool.c | 1 + > > > osm/complib/cl_ptr_vector.c | 1 + > > > osm/complib/cl_threadpool.c | 1 + > > > osm/complib/cl_timer.c | 1 + > > > osm/complib/cl_vector.c | 1 + > > > osm/complib/libosmcomp.map | 3 --- > > > osm/include/complib/cl_byteswap.h | 3 +-- > > > osm/include/complib/cl_memory.h | 1 - > > > osm/include/iba/ib_types.h | 2 +- > > > osm/include/opensm/osm_lin_fwd_tbl.h | 1 + > > > osm/include/opensm/osm_madw.h | 1 + > > > osm/include/opensm/osm_mcm_info.h | 1 + > > > osm/include/opensm/osm_mtree.h | 1 + > > > osm/include/opensm/osm_path.h | 1 + > > > osm/include/opensm/osm_port.h | 1 + > > > osm/include/opensm/osm_port_profile.h | 1 + > > > osm/include/opensm/osm_rand_fwd_tbl.h | 1 + > > > osm/include/vendor/osm_vendor_mlx_svc.h | 2 ++ > > > 
osm/include/vendor/osm_vendor_mtl.h | 2 -- > > > .../vendor/osm_vendor_mtl_transaction_mgr.h | 1 - > > > osm/include/vendor/osm_vendor_ts.h | 1 - > > > osm/libvendor/osm_pkt_randomizer.c | 2 ++ > > > osm/libvendor/osm_vendor_al.c | 1 + > > > osm/libvendor/osm_vendor_ibumad.c | 10 ++++++---- > > > osm/libvendor/osm_vendor_ibumad_sa.c | 3 +++ > > > osm/libvendor/osm_vendor_mlx.c | 2 ++ > > > osm/libvendor/osm_vendor_mlx_anafa.c | 1 + > > > osm/libvendor/osm_vendor_mlx_dispatcher.c | 1 + > > > osm/libvendor/osm_vendor_mlx_hca.c | 1 + > > > osm/libvendor/osm_vendor_mlx_hca_anafa.c | 1 + > > > osm/libvendor/osm_vendor_mlx_ibmgt.c | 2 ++ > > > osm/libvendor/osm_vendor_mlx_rmpp_ctx.c | 1 + > > > osm/libvendor/osm_vendor_mlx_sa.c | 2 ++ > > > osm/libvendor/osm_vendor_mlx_sar.c | 4 +++- > > > osm/libvendor/osm_vendor_mlx_sender.c | 1 + > > > osm/libvendor/osm_vendor_mlx_sim.c | 2 ++ > > > osm/libvendor/osm_vendor_mlx_ts.c | 2 ++ > > > osm/libvendor/osm_vendor_mlx_ts_anafa.c | 2 ++ > > > osm/libvendor/osm_vendor_mtl.c | 2 ++ > > > osm/libvendor/osm_vendor_mtl_transaction_mgr.c | 1 + > > > osm/libvendor/osm_vendor_test.c | 1 + > > > osm/libvendor/osm_vendor_ts.c | 2 ++ > > > osm/libvendor/osm_vendor_umadt.c | 1 + > > > osm/opensm/osm_db_files.c | 6 ++++-- > > > osm/opensm/osm_db_pack.c | 1 + > > > osm/opensm/osm_drop_mgr.c | 2 ++ > > > osm/opensm/osm_fwd_tbl.c | 1 - > > > osm/opensm/osm_helper.c | 2 +- > > > osm/opensm/osm_inform.c | 1 + > > > osm/opensm/osm_lid_mgr.c | 1 + > > > osm/opensm/osm_lin_fwd_rcv.c | 2 +- > > > osm/opensm/osm_lin_fwd_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_lin_fwd_tbl.c | 1 + > > > osm/opensm/osm_link_mgr.c | 2 +- > > > osm/opensm/osm_mad_pool.c | 1 + > > > osm/opensm/osm_matrix.c | 1 + > > > osm/opensm/osm_mcast_fwd_rcv.c | 2 +- > > > osm/opensm/osm_mcast_fwd_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_mcast_mgr.c | 2 ++ > > > osm/opensm/osm_mcast_tbl.c | 1 + > > > osm/opensm/osm_mcm_info.c | 1 + > > > osm/opensm/osm_mcm_port.c | 2 ++ > > > 
osm/opensm/osm_mtree.c | 1 + > > > osm/opensm/osm_multicast.c | 1 + > > > osm/opensm/osm_node_desc_rcv.c | 2 +- > > > osm/opensm/osm_node_desc_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_node_info_rcv.c | 2 +- > > > osm/opensm/osm_node_info_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_opensm.c | 4 +--- > > > osm/opensm/osm_pkey.c | 1 + > > > osm/opensm/osm_pkey_mgr.c | 1 + > > > osm/opensm/osm_pkey_rcv.c | 2 +- > > > osm/opensm/osm_pkey_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_port.c | 1 + > > > osm/opensm/osm_port_info_rcv.c | 2 +- > > > osm/opensm/osm_port_info_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_prtn.c | 1 + > > > osm/opensm/osm_qos.c | 1 + > > > osm/opensm/osm_remote_sm.c | 2 +- > > > osm/opensm/osm_req.c | 2 +- > > > osm/opensm/osm_req_ctrl.c | 2 +- > > > osm/opensm/osm_resp.c | 2 +- > > > osm/opensm/osm_sa.c | 2 +- > > > osm/opensm/osm_sa_class_port_info.c | 2 +- > > > osm/opensm/osm_sa_class_port_info_ctrl.c | 2 +- > > > osm/opensm/osm_sa_guidinfo_record.c | 2 +- > > > osm/opensm/osm_sa_guidinfo_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_informinfo.c | 2 +- > > > osm/opensm/osm_sa_informinfo_ctrl.c | 2 +- > > > osm/opensm/osm_sa_lft_record.c | 1 + > > > osm/opensm/osm_sa_lft_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_link_record.c | 2 +- > > > osm/opensm/osm_sa_link_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_mad_ctrl.c | 2 +- > > > osm/opensm/osm_sa_mcmember_record.c | 1 + > > > osm/opensm/osm_sa_mcmember_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_multipath_record.c | 2 +- > > > osm/opensm/osm_sa_multipath_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_node_record.c | 1 + > > > osm/opensm/osm_sa_node_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_path_record.c | 2 +- > > > osm/opensm/osm_sa_path_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_pkey_record.c | 2 +- > > > osm/opensm/osm_sa_pkey_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_portinfo_record.c | 2 +- > > > osm/opensm/osm_sa_portinfo_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_response.c | 2 +- > > > 
osm/opensm/osm_sa_service_record.c | 2 +- > > > osm/opensm/osm_sa_service_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_slvl_record.c | 2 +- > > > osm/opensm/osm_sa_slvl_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_sminfo_record.c | 2 +- > > > osm/opensm/osm_sa_sminfo_record_ctrl.c | 2 +- > > > osm/opensm/osm_sa_vlarb_record.c | 2 +- > > > osm/opensm/osm_sa_vlarb_record_ctrl.c | 2 +- > > > osm/opensm/osm_service.c | 1 + > > > osm/opensm/osm_slvl_map_rcv.c | 2 +- > > > osm/opensm/osm_slvl_map_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_sm.c | 1 + > > > osm/opensm/osm_sm_mad_ctrl.c | 2 +- > > > osm/opensm/osm_sm_state_mgr.c | 2 +- > > > osm/opensm/osm_sminfo_rcv.c | 1 + > > > osm/opensm/osm_sminfo_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_state_mgr.c | 2 ++ > > > osm/opensm/osm_state_mgr_ctrl.c | 2 +- > > > osm/opensm/osm_subnet.c | 2 ++ > > > osm/opensm/osm_sw_info_rcv.c | 2 +- > > > osm/opensm/osm_sw_info_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_sweep_fail_ctrl.c | 2 +- > > > osm/opensm/osm_switch.c | 1 + > > > osm/opensm/osm_trap_rcv.c | 2 +- > > > osm/opensm/osm_trap_rcv_ctrl.c | 2 +- > > > osm/opensm/osm_ucast_mgr.c | 2 ++ > > > osm/opensm/osm_ucast_updn.c | 1 + > > > osm/opensm/osm_vl15intf.c | 2 +- > > > osm/opensm/osm_vl_arb_rcv.c | 2 +- > > > osm/opensm/osm_vl_arb_rcv_ctrl.c | 2 +- > > > osm/osmtest/include/osmtest_subnet.h | 1 + > > > osm/osmtest/osmt_inform.c | 1 - > > > osm/osmtest/osmt_slvl_vl_arb.c | 1 - > > > osm/osmtest/osmtest.c | 2 +- > > > 145 files changed, 166 insertions(+), 88 deletions(-) > > > > > > e117de15a67314817a58b6300b432ec9ffa6a0a5 > > > diff --git a/osm/complib/cl_event_wheel.c > > b/osm/complib/cl_event_wheel.c > > > index cf04df7..aaaa53d 100644 > > > --- a/osm/complib/cl_event_wheel.c > > > +++ b/osm/complib/cl_event_wheel.c > > > @@ -40,6 +40,7 @@ # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/complib/cl_map.c b/osm/complib/cl_map.c > > > index 
974b0d3..8962e9a 100644 > > > --- a/osm/complib/cl_map.c > > > +++ b/osm/complib/cl_map.c > > > @@ -70,10 +70,10 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > -#include > > > > > > > > > > > /*********************************************************************** > > ******* > > > diff --git a/osm/complib/cl_memory.c b/osm/complib/cl_memory.c > > > index 49ff45d..a9ae948 100644 > > > --- a/osm/complib/cl_memory.c > > > +++ b/osm/complib/cl_memory.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #define _MEM_DEBUG_MODE_ 0 > > > #ifdef _MEM_DEBUG_MODE_ > > > diff --git a/osm/complib/cl_perf.c b/osm/complib/cl_perf.c > > > index 753eba3..0c8ead2 100644 > > > --- a/osm/complib/cl_perf.c > > > +++ b/osm/complib/cl_perf.c > > > @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > + > > > /* > > > * Always turn on performance tracking when building this file to > > allow the > > > * performance counter functions to be built into the component > > library. 
> > > diff --git a/osm/complib/cl_pool.c b/osm/complib/cl_pool.c > > > index cfd2774..3fe07a8 100644 > > > --- a/osm/complib/cl_pool.c > > > +++ b/osm/complib/cl_pool.c > > > @@ -52,6 +52,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/complib/cl_ptr_vector.c b/osm/complib/cl_ptr_vector.c > > > index bddce00..5ab74c3 100644 > > > --- a/osm/complib/cl_ptr_vector.c > > > +++ b/osm/complib/cl_ptr_vector.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/complib/cl_threadpool.c b/osm/complib/cl_threadpool.c > > > index a2f620d..a2a4848 100644 > > > --- a/osm/complib/cl_threadpool.c > > > +++ b/osm/complib/cl_threadpool.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/complib/cl_timer.c b/osm/complib/cl_timer.c > > > index 847545f..b3cc3e9 100644 > > > --- a/osm/complib/cl_timer.c > > > +++ b/osm/complib/cl_timer.c > > > @@ -48,6 +48,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/complib/cl_vector.c b/osm/complib/cl_vector.c > > > index 3e1a757..bcda8e0 100644 > > > --- a/osm/complib/cl_vector.c > > > +++ b/osm/complib/cl_vector.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/complib/libosmcomp.map b/osm/complib/libosmcomp.map > > > index 7a7ee1d..73fb242 100644 > > > --- a/osm/complib/libosmcomp.map > > > +++ b/osm/complib/libosmcomp.map > > > @@ -87,9 +87,6 @@ OSMCOMP_1.0 { > > > __cl_find_mem; > > > __cl_free_trk; > > > __cl_free_ntrk; > > > - cl_memset; > > > - 
cl_memcpy; > > > - cl_memcmp; > > > __cl_perf_run_calibration; > > > __cl_perf_construct; > > > __cl_perf_init; > > > diff --git a/osm/include/complib/cl_byteswap.h > > b/osm/include/complib/cl_byteswap.h > > > index 932d564..d144ea3 100644 > > > --- a/osm/include/complib/cl_byteswap.h > > > +++ b/osm/include/complib/cl_byteswap.h > > > @@ -51,8 +51,7 @@ > > > #ifndef _CL_BYTESWAP_H_ > > > #define _CL_BYTESWAP_H_ > > > > > > - > > > -#include > > > +#include > > > #include > > > > > > #ifdef __cplusplus > > > diff --git a/osm/include/complib/cl_memory.h > > b/osm/include/complib/cl_memory.h > > > index 9f558ac..4bbf7a2 100644 > > > --- a/osm/include/complib/cl_memory.h > > > +++ b/osm/include/complib/cl_memory.h > > > @@ -52,7 +52,6 @@ #define _CL_MEMORY_H_ > > > > > > > > > #include > > > -#include > > > > > > #ifdef __cplusplus > > > # define BEGIN_C_DECLS extern "C" { > > > diff --git a/osm/include/iba/ib_types.h b/osm/include/iba/ib_types.h > > > index 811d836..b72e810 100644 > > > --- a/osm/include/iba/ib_types.h > > > +++ b/osm/include/iba/ib_types.h > > > @@ -38,9 +38,9 @@ > > > #if !defined(__IB_TYPES_H__) > > > #define __IB_TYPES_H__ > > > > > > +#include > > > #include > > > #include > > > -#include > > > > > > #ifdef __cplusplus > > > # define BEGIN_C_DECLS extern "C" { > > > diff --git a/osm/include/opensm/osm_lin_fwd_tbl.h > > > b/osm/include/opensm/osm_lin_fwd_tbl.h > > > index dee01a9..ca378a8 100644 > > > --- a/osm/include/opensm/osm_lin_fwd_tbl.h > > > +++ b/osm/include/opensm/osm_lin_fwd_tbl.h > > > @@ -50,6 +50,7 @@ > > > #ifndef _OSM_LIN_FWD_TBL_H_ > > > #define _OSM_LIN_FWD_TBL_H_ > > > > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/include/opensm/osm_madw.h > > b/osm/include/opensm/osm_madw.h > > > index 2173957..4fde04c 100644 > > > --- a/osm/include/opensm/osm_madw.h > > > +++ b/osm/include/opensm/osm_madw.h > > > @@ -51,6 +51,7 @@ > > > #ifndef _OSM_MADW_H_ > > > #define _OSM_MADW_H_ > > > > > > +#include > > > 
#include > > > #include > > > #include > > > diff --git a/osm/include/opensm/osm_mcm_info.h > > > b/osm/include/opensm/osm_mcm_info.h > > > index c4d5443..1f325b1 100644 > > > --- a/osm/include/opensm/osm_mcm_info.h > > > +++ b/osm/include/opensm/osm_mcm_info.h > > > @@ -50,6 +50,7 @@ > > > #ifndef _OSM_MCM_INFO_H_ > > > #define _OSM_MCM_INFO_H_ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/include/opensm/osm_mtree.h > > b/osm/include/opensm/osm_mtree.h > > > index 57c894b..013112d 100644 > > > --- a/osm/include/opensm/osm_mtree.h > > > +++ b/osm/include/opensm/osm_mtree.h > > > @@ -51,6 +51,7 @@ > > > #ifndef _OSM_MTREE_H_ > > > #define _OSM_MTREE_H_ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/include/opensm/osm_path.h > > b/osm/include/opensm/osm_path.h > > > index bf1cc67..cb3bb8e 100644 > > > --- a/osm/include/opensm/osm_path.h > > > +++ b/osm/include/opensm/osm_path.h > > > @@ -38,6 +38,7 @@ > > > #ifndef _OSM_PATH_H_ > > > #define _OSM_PATH_H_ > > > > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/include/opensm/osm_port.h > > b/osm/include/opensm/osm_port.h > > > index 46a0064..cf3f6f2 100644 > > > --- a/osm/include/opensm/osm_port.h > > > +++ b/osm/include/opensm/osm_port.h > > > @@ -50,6 +50,7 @@ > > > #ifndef _OSM_PORT_H_ > > > #define _OSM_PORT_H_ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/include/opensm/osm_port_profile.h > > > b/osm/include/opensm/osm_port_profile.h > > > index 9a58115..9c0f7f7 100644 > > > --- a/osm/include/opensm/osm_port_profile.h > > > +++ b/osm/include/opensm/osm_port_profile.h > > > @@ -50,6 +50,7 @@ > > > #ifndef _OSM_PORT_PROFILE_H_ > > > #define _OSM_PORT_PROFILE_H_ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/include/opensm/osm_rand_fwd_tbl.h > > > b/osm/include/opensm/osm_rand_fwd_tbl.h > > > index 1d293e5..fac9ffd 
100644 > > > --- a/osm/include/opensm/osm_rand_fwd_tbl.h > > > +++ b/osm/include/opensm/osm_rand_fwd_tbl.h > > > @@ -51,6 +51,7 @@ #ifndef _OSM_RAND_FWD_TBL_H_ > > > #define _OSM_RAND_FWD_TBL_H_ > > > > > > #include > > > +#include > > > #include > > > > > > #ifdef __cplusplus > > > diff --git a/osm/include/vendor/osm_vendor_mlx_svc.h > > > b/osm/include/vendor/osm_vendor_mlx_svc.h > > > index 69d379c..e4897d4 100644 > > > --- a/osm/include/vendor/osm_vendor_mlx_svc.h > > > +++ b/osm/include/vendor/osm_vendor_mlx_svc.h > > > @@ -38,7 +38,9 @@ #ifndef _OSMV_SVC_H_ > > > #define _OSMV_SVC_H_ > > > > > > #include > > > +#include > > > #include > > > +#include > > > #include > > > > > > #ifdef __cplusplus > > > diff --git a/osm/include/vendor/osm_vendor_mtl.h > > > b/osm/include/vendor/osm_vendor_mtl.h > > > index 5837867..218bdf7 100644 > > > --- a/osm/include/vendor/osm_vendor_mtl.h > > > +++ b/osm/include/vendor/osm_vendor_mtl.h > > > @@ -60,10 +60,8 @@ #define OUT > > > #include "iba/ib_types.h" > > > #include "iba/ib_al.h" > > > #include > > > -#include > > > #include > > > #include > > > -#include > > > > > > #ifdef __cplusplus > > > # define BEGIN_C_DECLS extern "C" { > > > diff --git a/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > > > b/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > > > index 7bf938d..82d2cc2 100644 > > > --- a/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > > > +++ b/osm/include/vendor/osm_vendor_mtl_transaction_mgr.h > > > @@ -61,7 +61,6 @@ #include > > > #include > > > #include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/include/vendor/osm_vendor_ts.h > > > b/osm/include/vendor/osm_vendor_ts.h > > > index b4c2f21..4414cba 100644 > > > --- a/osm/include/vendor/osm_vendor_ts.h > > > +++ b/osm/include/vendor/osm_vendor_ts.h > > > @@ -59,7 +59,6 @@ #define OUT > > > #include "iba/ib_types.h" > > > #include "iba/ib_al.h" > > > #include > > > -#include > > > #include > > > 
#include > > > #include > > > diff --git a/osm/libvendor/osm_pkt_randomizer.c > > > b/osm/libvendor/osm_pkt_randomizer.c > > > index 2fa7621..29df135 100644 > > > --- a/osm/libvendor/osm_pkt_randomizer.c > > > +++ b/osm/libvendor/osm_pkt_randomizer.c > > > @@ -51,12 +51,14 @@ #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > #include > > > +#include > > > > > > #ifndef WIN32 > > > #include > > > #include > > > #endif > > > > > > +#include > > > > > > > > /********************************************************************** > > > * Return TRUE if the path is in a fault path, and FALSE otherwise. > > > diff --git a/osm/libvendor/osm_vendor_al.c > > b/osm/libvendor/osm_vendor_al.c > > > index d26d6d8..3240625 100644 > > > --- a/osm/libvendor/osm_vendor_al.c > > > +++ b/osm/libvendor/osm_vendor_al.c > > > @@ -59,6 +59,7 @@ #include > > > > > > #ifdef OSM_VENDOR_INTF_AL > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/libvendor/osm_vendor_ibumad.c > > > b/osm/libvendor/osm_vendor_ibumad.c > > > index 0a7fbe3..a3041d0 100644 > > > --- a/osm/libvendor/osm_vendor_ibumad.c > > > +++ b/osm/libvendor/osm_vendor_ibumad.c > > > @@ -57,20 +57,22 @@ #include > > > > > > #ifdef OSM_VENDOR_INTF_OPENIB > > > > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +#include > > > #include > > > #include > > > #include > > > #include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > #include > > > > > > -#include > > > -#include > > > -#include > > > > > > /****s* OpenSM: Vendor AL/osm_umad_bind_info_t > > > * NAME > > > diff --git a/osm/libvendor/osm_vendor_ibumad_sa.c > > > b/osm/libvendor/osm_vendor_ibumad_sa.c > > > index 6eae887..568d39c 100644 > > > --- a/osm/libvendor/osm_vendor_ibumad_sa.c > > > +++ b/osm/libvendor/osm_vendor_ibumad_sa.c > > > @@ -38,10 +38,13 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > 
#include > > > #include > > > > > > +#include > > > + > > > #define MAX_PORTS 64 > > > > > > > > /*********************************************************************** > > ****** > > > diff --git a/osm/libvendor/osm_vendor_mlx.c > > b/osm/libvendor/osm_vendor_mlx.c > > > index 4c75d41..4a4be06 100644 > > > --- a/osm/libvendor/osm_vendor_mlx.c > > > +++ b/osm/libvendor/osm_vendor_mlx.c > > > @@ -38,12 +38,14 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > #include > > > #include > > > #include > > > +#include > > > > > > /** > > > * FORWARD REFERENCES > > > diff --git a/osm/libvendor/osm_vendor_mlx_anafa.c > > > b/osm/libvendor/osm_vendor_mlx_anafa.c > > > index 32af9bb..3cd917f 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_anafa.c > > > +++ b/osm/libvendor/osm_vendor_mlx_anafa.c > > > @@ -55,6 +55,7 @@ #include > > > #include > > > #include > > > > > > +#include > > > #include > > > > > > /** > > > diff --git a/osm/libvendor/osm_vendor_mlx_dispatcher.c > > > b/osm/libvendor/osm_vendor_mlx_dispatcher.c > > > index 341e784..afa1473 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_dispatcher.c > > > +++ b/osm/libvendor/osm_vendor_mlx_dispatcher.c > > > @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/libvendor/osm_vendor_mlx_hca.c > > > b/osm/libvendor/osm_vendor_mlx_hca.c > > > index bb120ac..c0dca86 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_hca.c > > > +++ b/osm/libvendor/osm_vendor_mlx_hca.c > > > @@ -39,6 +39,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #if defined(OSM_VENDOR_INTF_MTL) | defined(OSM_VENDOR_INTF_TS) > > > #undef IN > > > #undef OUT > > > diff --git a/osm/libvendor/osm_vendor_mlx_hca_anafa.c > > > b/osm/libvendor/osm_vendor_mlx_hca_anafa.c > > > index 5045563..8f87225 
100644 > > > --- a/osm/libvendor/osm_vendor_mlx_hca_anafa.c > > > +++ b/osm/libvendor/osm_vendor_mlx_hca_anafa.c > > > @@ -44,6 +44,7 @@ #undef IN > > > #undef OUT > > > > > > #include > > > +#include > > > > > > #include > > > #include > > > diff --git a/osm/libvendor/osm_vendor_mlx_ibmgt.c > > > b/osm/libvendor/osm_vendor_mlx_ibmgt.c > > > index 117ad12..ace790b 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_ibmgt.c > > > +++ b/osm/libvendor/osm_vendor_mlx_ibmgt.c > > > @@ -46,7 +46,9 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > > > b/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > > > index 69708c9..df250e2 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > > > +++ b/osm/libvendor/osm_vendor_mlx_rmpp_ctx.c > > > @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/libvendor/osm_vendor_mlx_sa.c > > > b/osm/libvendor/osm_vendor_mlx_sa.c > > > index 85fd810..212344a 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_sa.c > > > +++ b/osm/libvendor/osm_vendor_mlx_sa.c > > > @@ -40,6 +40,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/libvendor/osm_vendor_mlx_sar.c > > > b/osm/libvendor/osm_vendor_mlx_sar.c > > > index 5b0bd70..f6b6405 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_sar.c > > > +++ b/osm/libvendor/osm_vendor_mlx_sar.c > > > @@ -38,8 +38,10 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > +#include > > > +#include > > > > > > ib_api_status_t > > > osmv_rmpp_sar_init(osmv_rmpp_sar_t* p_sar, void* p_arbt_mad, > > > diff --git 
a/osm/libvendor/osm_vendor_mlx_sender.c > > > b/osm/libvendor/osm_vendor_mlx_sender.c > > > index 3317702..e1ed0a0 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_sender.c > > > +++ b/osm/libvendor/osm_vendor_mlx_sender.c > > > @@ -38,6 +38,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/libvendor/osm_vendor_mlx_sim.c > > > b/osm/libvendor/osm_vendor_mlx_sim.c > > > index b927f2f..ba81e03 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_sim.c > > > +++ b/osm/libvendor/osm_vendor_mlx_sim.c > > > @@ -51,12 +51,14 @@ #include > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > > #include > > > #include > > > #include > > > > > > +#include > > > /* the simulator messages definition */ > > > #include > > > > > > diff --git a/osm/libvendor/osm_vendor_mlx_ts.c > > > b/osm/libvendor/osm_vendor_mlx_ts.c > > > index 483b69b..a32173e 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_ts.c > > > +++ b/osm/libvendor/osm_vendor_mlx_ts.c > > > @@ -51,12 +51,14 @@ #include > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > > #include > > > #include > > > #include > > > > > > +#include > > > #include > > > > > > typedef struct _osmv_TOPSPIN_transport_mgr_ { > > > diff --git a/osm/libvendor/osm_vendor_mlx_ts_anafa.c > > > b/osm/libvendor/osm_vendor_mlx_ts_anafa.c > > > index dd3c462..a9395df 100644 > > > --- a/osm/libvendor/osm_vendor_mlx_ts_anafa.c > > > +++ b/osm/libvendor/osm_vendor_mlx_ts_anafa.c > > > @@ -52,6 +52,7 @@ #include > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > > #include > > > @@ -59,6 +60,7 @@ #include > > #include > > > #include > > > > > > +#include > > > #include > > > > > > static void > > > diff --git a/osm/libvendor/osm_vendor_mtl.c > > b/osm/libvendor/osm_vendor_mtl.c > > > index f9b2284..82a68de 100644 > > > --- 
a/osm/libvendor/osm_vendor_mtl.c > > > +++ b/osm/libvendor/osm_vendor_mtl.c > > > @@ -43,6 +43,8 @@ #include > > > > > > #ifdef OSM_VENDOR_INTF_MTL > > > > > > +#include > > > +#include > > > #include > > > #include > > > /* HACK - I do not know how to prevent complib from loading kernel H > > files */ > > > diff --git a/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > > > b/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > > > index 997eb37..2b1c960 100644 > > > --- a/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > > > +++ b/osm/libvendor/osm_vendor_mtl_transaction_mgr.c > > > @@ -40,6 +40,7 @@ # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/libvendor/osm_vendor_test.c > > b/osm/libvendor/osm_vendor_test.c > > > index ecacc67..013262e 100644 > > > --- a/osm/libvendor/osm_vendor_test.c > > > +++ b/osm/libvendor/osm_vendor_test.c > > > @@ -56,6 +56,7 @@ #include > > > > > > #ifdef OSM_VENDOR_INTF_TEST > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/libvendor/osm_vendor_ts.c > > b/osm/libvendor/osm_vendor_ts.c > > > index 16d52e2..fa51382 100644 > > > --- a/osm/libvendor/osm_vendor_ts.c > > > +++ b/osm/libvendor/osm_vendor_ts.c > > > @@ -40,8 +40,10 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/libvendor/osm_vendor_umadt.c > > > b/osm/libvendor/osm_vendor_umadt.c > > > index 01d9b10..e27801a 100644 > > > --- a/osm/libvendor/osm_vendor_umadt.c > > > +++ b/osm/libvendor/osm_vendor_umadt.c > > > @@ -61,6 +61,7 @@ #ifdef OSM_VENDOR_INTF_UMADT > > > > > > #include > > > #include > > > +#include > > > > > > #include > > > #include > > > diff --git a/osm/opensm/osm_db_files.c b/osm/opensm/osm_db_files.c > > > index a8e82a7..930aaef 100644 > > > --- a/osm/opensm/osm_db_files.c > > > +++ 
b/osm/opensm/osm_db_files.c > > > @@ -46,11 +46,13 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > -#include > > > #include > > > #include > > > #include > > > +#include > > > +#include > > > +#include > > > +#include > > > > > > /****d* Database/OSM_DB_MAX_LINE_LEN > > > * NAME > > > diff --git a/osm/opensm/osm_db_pack.c b/osm/opensm/osm_db_pack.c > > > index 3f90397..b93ac84 100644 > > > --- a/osm/opensm/osm_db_pack.c > > > +++ b/osm/opensm/osm_db_pack.c > > > @@ -40,6 +40,7 @@ # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > +#include > > > #include > > > #include > > > static inline void > > > diff --git a/osm/opensm/osm_drop_mgr.c b/osm/opensm/osm_drop_mgr.c > > > index 470e5df..929088a 100644 > > > --- a/osm/opensm/osm_drop_mgr.c > > > +++ b/osm/opensm/osm_drop_mgr.c > > > @@ -51,7 +51,9 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_fwd_tbl.c b/osm/opensm/osm_fwd_tbl.c > > > index 852e048..ee32194 100644 > > > --- a/osm/opensm/osm_fwd_tbl.c > > > +++ b/osm/opensm/osm_fwd_tbl.c > > > @@ -51,7 +51,6 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c > > > index e54644b..3886609 100644 > > > --- a/osm/opensm/osm_helper.c > > > +++ b/osm/opensm/osm_helper.c > > > @@ -51,7 +51,7 @@ #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > #include > > > -#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_inform.c b/osm/opensm/osm_inform.c > > > index f20b068..172190c 100644 > > > --- a/osm/opensm/osm_inform.c > > > +++ b/osm/opensm/osm_inform.c > > > @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > 
> > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_lid_mgr.c b/osm/opensm/osm_lid_mgr.c > > > index 31d0be4..a33a420 100644 > > > --- a/osm/opensm/osm_lid_mgr.c > > > +++ b/osm/opensm/osm_lid_mgr.c > > > @@ -90,6 +90,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_lin_fwd_rcv.c > > b/osm/opensm/osm_lin_fwd_rcv.c > > > index 8ae7da8..339fe11 100644 > > > --- a/osm/opensm/osm_lin_fwd_rcv.c > > > +++ b/osm/opensm/osm_lin_fwd_rcv.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_lin_fwd_rcv_ctrl.c > > > b/osm/opensm/osm_lin_fwd_rcv_ctrl.c > > > index 4e915e7..987440d 100644 > > > --- a/osm/opensm/osm_lin_fwd_rcv_ctrl.c > > > +++ b/osm/opensm/osm_lin_fwd_rcv_ctrl.c > > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_lin_fwd_tbl.c > > b/osm/opensm/osm_lin_fwd_tbl.c > > > index f8a6b87..3b4895f 100644 > > > --- a/osm/opensm/osm_lin_fwd_tbl.c > > > +++ b/osm/opensm/osm_lin_fwd_tbl.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_link_mgr.c b/osm/opensm/osm_link_mgr.c > > > index c8307d3..87e9e46 100644 > > > --- a/osm/opensm/osm_link_mgr.c > > > +++ b/osm/opensm/osm_link_mgr.c > > > @@ -50,8 +50,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_mad_pool.c b/osm/opensm/osm_mad_pool.c > > > index 72f9db8..12ecabf 
100644 > > > --- a/osm/opensm/osm_mad_pool.c > > > +++ b/osm/opensm/osm_mad_pool.c > > > @@ -52,6 +52,7 @@ # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_matrix.c b/osm/opensm/osm_matrix.c > > > index 3efb0bd..073d9b8 100644 > > > --- a/osm/opensm/osm_matrix.c > > > +++ b/osm/opensm/osm_matrix.c > > > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > > > > > > > diff --git a/osm/opensm/osm_mcast_fwd_rcv.c > > b/osm/opensm/osm_mcast_fwd_rcv.c > > > index 73763f5..d0ffa59 100644 > > > --- a/osm/opensm/osm_mcast_fwd_rcv.c > > > +++ b/osm/opensm/osm_mcast_fwd_rcv.c > > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > > > b/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > > > index a6f46fd..9201ecf 100644 > > > --- a/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > > > +++ b/osm/opensm/osm_mcast_fwd_rcv_ctrl.c > > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_mcast_mgr.c b/osm/opensm/osm_mcast_mgr.c > > > index f729c61..96d3b0f 100644 > > > --- a/osm/opensm/osm_mcast_mgr.c > > > +++ b/osm/opensm/osm_mcast_mgr.c > > > @@ -50,6 +50,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_mcast_tbl.c b/osm/opensm/osm_mcast_tbl.c > > > index 401d97c..b8fa325 100644 > > > --- a/osm/opensm/osm_mcast_tbl.c > > > +++ b/osm/opensm/osm_mcast_tbl.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > 
+#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_mcm_info.c b/osm/opensm/osm_mcm_info.c > > > index 08c0d12..a5ac7f3 100644 > > > --- a/osm/opensm/osm_mcm_info.c > > > +++ b/osm/opensm/osm_mcm_info.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > > > > > > /********************************************************************** > > > diff --git a/osm/opensm/osm_mcm_port.c b/osm/opensm/osm_mcm_port.c > > > index e92ad76..16ed84e 100644 > > > --- a/osm/opensm/osm_mcm_port.c > > > +++ b/osm/opensm/osm_mcm_port.c > > > @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > +#include > > > #include > > > > > > > > /********************************************************************** > > > diff --git a/osm/opensm/osm_mtree.c b/osm/opensm/osm_mtree.c > > > index f9d82d6..421e39e 100644 > > > --- a/osm/opensm/osm_mtree.c > > > +++ b/osm/opensm/osm_mtree.c > > > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_multicast.c b/osm/opensm/osm_multicast.c > > > index 2256741..690f7df 100644 > > > --- a/osm/opensm/osm_multicast.c > > > +++ b/osm/opensm/osm_multicast.c > > > @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_node_desc_rcv.c > > b/osm/opensm/osm_node_desc_rcv.c > > > index 62fe034..f9fa22d 100644 > > > --- a/osm/opensm/osm_node_desc_rcv.c > > > +++ b/osm/opensm/osm_node_desc_rcv.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_node_desc_rcv_ctrl.c > > 
> b/osm/opensm/osm_node_desc_rcv_ctrl.c > > > index 9f689e2..3f26b83 100644 > > > --- a/osm/opensm/osm_node_desc_rcv_ctrl.c > > > +++ b/osm/opensm/osm_node_desc_rcv_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_node_info_rcv.c > > b/osm/opensm/osm_node_info_rcv.c > > > index c35e2b7..59257a0 100644 > > > --- a/osm/opensm/osm_node_info_rcv.c > > > +++ b/osm/opensm/osm_node_info_rcv.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_node_info_rcv_ctrl.c > > > b/osm/opensm/osm_node_info_rcv_ctrl.c > > > index 478f9c4..cbff6ce 100644 > > > --- a/osm/opensm/osm_node_info_rcv_ctrl.c > > > +++ b/osm/opensm/osm_node_info_rcv_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_opensm.c b/osm/opensm/osm_opensm.c > > > index 2a8e0f8..8c422b5 100644 > > > --- a/osm/opensm/osm_opensm.c > > > +++ b/osm/opensm/osm_opensm.c > > > @@ -53,7 +53,7 @@ #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > #include > > > -#include > > > +#include > > > #include > > > #include > > > #include > > > @@ -130,8 +130,6 @@ osm_opensm_destroy( > > > > > > cl_plock_destroy( &p_osm->lock ); > > > > > > - cl_mem_display( ); > > > - > > > osm_log_destroy( &p_osm->log ); > > > } > > > > > > diff --git a/osm/opensm/osm_pkey.c b/osm/opensm/osm_pkey.c > > > index b0cb869..5ecfdd9 100644 > > > --- a/osm/opensm/osm_pkey.c > > > +++ b/osm/opensm/osm_pkey.c > > > @@ -51,6 +51,7 @@ #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > #include > > > +#include > > > #include > > > #include > > > #include > > > diff --git 
a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > > > index f98d13b..e08b7cc 100644 > > > --- a/osm/opensm/osm_pkey_mgr.c > > > +++ b/osm/opensm/osm_pkey_mgr.c > > > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_pkey_rcv.c b/osm/opensm/osm_pkey_rcv.c > > > index 8696dc4..5262a6b 100644 > > > --- a/osm/opensm/osm_pkey_rcv.c > > > +++ b/osm/opensm/osm_pkey_rcv.c > > > @@ -39,8 +39,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_pkey_rcv_ctrl.c > > b/osm/opensm/osm_pkey_rcv_ctrl.c > > > index 77ebab2..cd4367a 100644 > > > --- a/osm/opensm/osm_pkey_rcv_ctrl.c > > > +++ b/osm/opensm/osm_pkey_rcv_ctrl.c > > > @@ -43,7 +43,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_port.c b/osm/opensm/osm_port.c > > > index f8c51e8..53ab006 100644 > > > --- a/osm/opensm/osm_port.c > > > +++ b/osm/opensm/osm_port.c > > > @@ -52,6 +52,7 @@ # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_port_info_rcv.c > > b/osm/opensm/osm_port_info_rcv.c > > > index 119bcbd..a08c57c 100644 > > > --- a/osm/opensm/osm_port_info_rcv.c > > > +++ b/osm/opensm/osm_port_info_rcv.c > > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_port_info_rcv_ctrl.c > > > b/osm/opensm/osm_port_info_rcv_ctrl.c > > > index 9f6001f..303bedb 100644 > > > --- a/osm/opensm/osm_port_info_rcv_ctrl.c > > > +++ 
b/osm/opensm/osm_port_info_rcv_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_prtn.c b/osm/opensm/osm_prtn.c > > > index 26790b4..8b748c4 100644 > > > --- a/osm/opensm/osm_prtn.c > > > +++ b/osm/opensm/osm_prtn.c > > > @@ -54,6 +54,7 @@ #include > > > #include > > > > > > #include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_qos.c b/osm/opensm/osm_qos.c > > > index cd5c26a..c23ef87 100644 > > > --- a/osm/opensm/osm_qos.c > > > +++ b/osm/opensm/osm_qos.c > > > @@ -46,6 +46,7 @@ # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > #include > > > +#include > > > > > > #include > > > #include > > > diff --git a/osm/opensm/osm_remote_sm.c b/osm/opensm/osm_remote_sm.c > > > index eb65d22..b91264e 100644 > > > --- a/osm/opensm/osm_remote_sm.c > > > +++ b/osm/opensm/osm_remote_sm.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > > > > > > /********************************************************************** > > > > > **********************************************************************/ > > > diff --git a/osm/opensm/osm_req.c b/osm/opensm/osm_req.c > > > index 9ddc9e9..534694b 100644 > > > --- a/osm/opensm/osm_req.c > > > +++ b/osm/opensm/osm_req.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_req_ctrl.c b/osm/opensm/osm_req_ctrl.c > > > index 708e7c9..2d0e7e0 100644 > > > --- a/osm/opensm/osm_req_ctrl.c > > > +++ b/osm/opensm/osm_req_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > 
#include > > > #include > > > diff --git a/osm/opensm/osm_resp.c b/osm/opensm/osm_resp.c > > > index 9b5079a..aa60bf2 100644 > > > --- a/osm/opensm/osm_resp.c > > > +++ b/osm/opensm/osm_resp.c > > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa.c b/osm/opensm/osm_sa.c > > > index b33431c..fa7dad8 100644 > > > --- a/osm/opensm/osm_sa.c > > > +++ b/osm/opensm/osm_sa.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_class_port_info.c > > > b/osm/opensm/osm_sa_class_port_info.c > > > index 389bc9c..cfad739 100644 > > > --- a/osm/opensm/osm_sa_class_port_info.c > > > +++ b/osm/opensm/osm_sa_class_port_info.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_class_port_info_ctrl.c > > > b/osm/opensm/osm_sa_class_port_info_ctrl.c > > > index 219a837..c71af4c 100644 > > > --- a/osm/opensm/osm_sa_class_port_info_ctrl.c > > > +++ b/osm/opensm/osm_sa_class_port_info_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_guidinfo_record.c > > > b/osm/opensm/osm_sa_guidinfo_record.c > > > index 7d1eebf..601c809 100644 > > > --- a/osm/opensm/osm_sa_guidinfo_record.c > > > +++ b/osm/opensm/osm_sa_guidinfo_record.c > > > @@ -54,8 +54,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git 
a/osm/opensm/osm_sa_guidinfo_record_ctrl.c > > > b/osm/opensm/osm_sa_guidinfo_record_ctrl.c > > > index b252b20..f2211b1 100644 > > > --- a/osm/opensm/osm_sa_guidinfo_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_guidinfo_record_ctrl.c > > > @@ -54,7 +54,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_informinfo.c > > b/osm/opensm/osm_sa_informinfo.c > > > index 149e609..a820dea 100644 > > > --- a/osm/opensm/osm_sa_informinfo.c > > > +++ b/osm/opensm/osm_sa_informinfo.c > > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_informinfo_ctrl.c > > > b/osm/opensm/osm_sa_informinfo_ctrl.c > > > index 75edabc..31644af 100644 > > > --- a/osm/opensm/osm_sa_informinfo_ctrl.c > > > +++ b/osm/opensm/osm_sa_informinfo_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_lft_record.c > > b/osm/opensm/osm_sa_lft_record.c > > > index b9b903e..2d17dbe 100644 > > > --- a/osm/opensm/osm_sa_lft_record.c > > > +++ b/osm/opensm/osm_sa_lft_record.c > > > @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_lft_record_ctrl.c > > > b/osm/opensm/osm_sa_lft_record_ctrl.c > > > index 0682438..1cc2544 100644 > > > --- a/osm/opensm/osm_sa_lft_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_lft_record_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git 
a/osm/opensm/osm_sa_link_record.c > > b/osm/opensm/osm_sa_link_record.c > > > index 1a407e1..a525002 100644 > > > --- a/osm/opensm/osm_sa_link_record.c > > > +++ b/osm/opensm/osm_sa_link_record.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_link_record_ctrl.c > > > b/osm/opensm/osm_sa_link_record_ctrl.c > > > index 707c184..01db21d 100644 > > > --- a/osm/opensm/osm_sa_link_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_link_record_ctrl.c > > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_mad_ctrl.c > > b/osm/opensm/osm_sa_mad_ctrl.c > > > index 1f87ea2..81584ce 100644 > > > --- a/osm/opensm/osm_sa_mad_ctrl.c > > > +++ b/osm/opensm/osm_sa_mad_ctrl.c > > > @@ -50,7 +50,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_mcmember_record.c > > > b/osm/opensm/osm_sa_mcmember_record.c > > > index 291fbf5..5129231 100644 > > > --- a/osm/opensm/osm_sa_mcmember_record.c > > > +++ b/osm/opensm/osm_sa_mcmember_record.c > > > @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_mcmember_record_ctrl.c > > > b/osm/opensm/osm_sa_mcmember_record_ctrl.c > > > index 99a779a..a583979 100644 > > > --- a/osm/opensm/osm_sa_mcmember_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_mcmember_record_ctrl.c > > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git 
a/osm/opensm/osm_sa_multipath_record.c > > > b/osm/opensm/osm_sa_multipath_record.c > > > index bdf53a3..c8efdb4 100644 > > > --- a/osm/opensm/osm_sa_multipath_record.c > > > +++ b/osm/opensm/osm_sa_multipath_record.c > > > @@ -52,8 +52,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_multipath_record_ctrl.c > > > b/osm/opensm/osm_sa_multipath_record_ctrl.c > > > index 7c0337c..e330bb8 100644 > > > --- a/osm/opensm/osm_sa_multipath_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_multipath_record_ctrl.c > > > @@ -56,7 +56,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_node_record.c > > b/osm/opensm/osm_sa_node_record.c > > > index ecaa048..ac9be22 100644 > > > --- a/osm/opensm/osm_sa_node_record.c > > > +++ b/osm/opensm/osm_sa_node_record.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_node_record_ctrl.c > > > b/osm/opensm/osm_sa_node_record_ctrl.c > > > index dcf5944..61b363a 100644 > > > --- a/osm/opensm/osm_sa_node_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_node_record_ctrl.c > > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_path_record.c > > b/osm/opensm/osm_sa_path_record.c > > > index 1e4a137..7da6d70 100644 > > > --- a/osm/opensm/osm_sa_path_record.c > > > +++ b/osm/opensm/osm_sa_path_record.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include 
> > > diff --git a/osm/opensm/osm_sa_path_record_ctrl.c > > > b/osm/opensm/osm_sa_path_record_ctrl.c > > > index eab7171..9495785 100644 > > > --- a/osm/opensm/osm_sa_path_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_path_record_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_pkey_record.c > > b/osm/opensm/osm_sa_pkey_record.c > > > index e60466b..0eeb0c0 100644 > > > --- a/osm/opensm/osm_sa_pkey_record.c > > > +++ b/osm/opensm/osm_sa_pkey_record.c > > > @@ -43,8 +43,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_pkey_record_ctrl.c > > > b/osm/opensm/osm_sa_pkey_record_ctrl.c > > > index 01cdc0f..a9d8a8d 100644 > > > --- a/osm/opensm/osm_sa_pkey_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_pkey_record_ctrl.c > > > @@ -43,7 +43,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_portinfo_record.c > > > b/osm/opensm/osm_sa_portinfo_record.c > > > index 3acb8c9..e1ca873 100644 > > > --- a/osm/opensm/osm_sa_portinfo_record.c > > > +++ b/osm/opensm/osm_sa_portinfo_record.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_portinfo_record_ctrl.c > > > b/osm/opensm/osm_sa_portinfo_record_ctrl.c > > > index 831843b..4f53f04 100644 > > > --- a/osm/opensm/osm_sa_portinfo_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_portinfo_record_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > 
+#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_response.c > > b/osm/opensm/osm_sa_response.c > > > index 30f561f..03c94f7 100644 > > > --- a/osm/opensm/osm_sa_response.c > > > +++ b/osm/opensm/osm_sa_response.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_service_record.c > > > b/osm/opensm/osm_sa_service_record.c > > > index 38ee80b..a65e41d 100644 > > > --- a/osm/opensm/osm_sa_service_record.c > > > +++ b/osm/opensm/osm_sa_service_record.c > > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_service_record_ctrl.c > > > b/osm/opensm/osm_sa_service_record_ctrl.c > > > index 5f8c936..8af9cd7 100644 > > > --- a/osm/opensm/osm_sa_service_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_service_record_ctrl.c > > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_slvl_record.c > > b/osm/opensm/osm_sa_slvl_record.c > > > index 237b99c..5d1928e 100644 > > > --- a/osm/opensm/osm_sa_slvl_record.c > > > +++ b/osm/opensm/osm_sa_slvl_record.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_slvl_record_ctrl.c > > > b/osm/opensm/osm_sa_slvl_record_ctrl.c > > > index d156bf1..7801508 100644 > > > --- a/osm/opensm/osm_sa_slvl_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_slvl_record_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > 
> > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_sminfo_record.c > > > b/osm/opensm/osm_sa_sminfo_record.c > > > index 9c3f436..b9dee38 100644 > > > --- a/osm/opensm/osm_sa_sminfo_record.c > > > +++ b/osm/opensm/osm_sa_sminfo_record.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_sminfo_record_ctrl.c > > > b/osm/opensm/osm_sa_sminfo_record_ctrl.c > > > index 72c2fad..3b07920 100644 > > > --- a/osm/opensm/osm_sa_sminfo_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_sminfo_record_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sa_vlarb_record.c > > b/osm/opensm/osm_sa_vlarb_record.c > > > index ddbef9c..059e5a9 100644 > > > --- a/osm/opensm/osm_sa_vlarb_record.c > > > +++ b/osm/opensm/osm_sa_vlarb_record.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sa_vlarb_record_ctrl.c > > > b/osm/opensm/osm_sa_vlarb_record_ctrl.c > > > index f7ad3ed..a243e08 100644 > > > --- a/osm/opensm/osm_sa_vlarb_record_ctrl.c > > > +++ b/osm/opensm/osm_sa_vlarb_record_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_service.c b/osm/opensm/osm_service.c > > > index 723e117..a1309d3 100644 > > > --- a/osm/opensm/osm_service.c > > > +++ b/osm/opensm/osm_service.c > > > @@ -49,6 +49,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > 
#include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_slvl_map_rcv.c > > b/osm/opensm/osm_slvl_map_rcv.c > > > index 9a6acf5..33c3d45 100644 > > > --- a/osm/opensm/osm_slvl_map_rcv.c > > > +++ b/osm/opensm/osm_slvl_map_rcv.c > > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_slvl_map_rcv_ctrl.c > > > b/osm/opensm/osm_slvl_map_rcv_ctrl.c > > > index ee357da..4da0eff 100644 > > > --- a/osm/opensm/osm_slvl_map_rcv_ctrl.c > > > +++ b/osm/opensm/osm_slvl_map_rcv_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c > > > index f6e33c5..0e09f26 100644 > > > --- a/osm/opensm/osm_sm.c > > > +++ b/osm/opensm/osm_sm.c > > > @@ -55,6 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sm_mad_ctrl.c > > b/osm/opensm/osm_sm_mad_ctrl.c > > > index 1b90335..9dceef2 100644 > > > --- a/osm/opensm/osm_sm_mad_ctrl.c > > > +++ b/osm/opensm/osm_sm_mad_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sm_state_mgr.c > > b/osm/opensm/osm_sm_state_mgr.c > > > index a881f7f..8ae9889 100644 > > > --- a/osm/opensm/osm_sm_state_mgr.c > > > +++ b/osm/opensm/osm_sm_state_mgr.c > > > @@ -50,8 +50,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sminfo_rcv.c b/osm/opensm/osm_sminfo_rcv.c > > > 
index e5c4bbb..5914984 100644 > > > --- a/osm/opensm/osm_sminfo_rcv.c > > > +++ b/osm/opensm/osm_sminfo_rcv.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sminfo_rcv_ctrl.c > > b/osm/opensm/osm_sminfo_rcv_ctrl.c > > > index 76ae65c..327d7eb 100644 > > > --- a/osm/opensm/osm_sminfo_rcv_ctrl.c > > > +++ b/osm/opensm/osm_sminfo_rcv_ctrl.c > > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > > > index c97875c..97b017d 100644 > > > --- a/osm/opensm/osm_state_mgr.c > > > +++ b/osm/opensm/osm_state_mgr.c > > > @@ -50,7 +50,9 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_state_mgr_ctrl.c > > b/osm/opensm/osm_state_mgr_ctrl.c > > > index a7afc46..0bde333 100644 > > > --- a/osm/opensm/osm_state_mgr_ctrl.c > > > +++ b/osm/opensm/osm_state_mgr_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c > > > index 9b4bcfe..c251411 100644 > > > --- a/osm/opensm/osm_subnet.c > > > +++ b/osm/opensm/osm_subnet.c > > > @@ -51,6 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sw_info_rcv.c > > b/osm/opensm/osm_sw_info_rcv.c > > > index 7a1f72f..6bbd73a 100644 > > > --- a/osm/opensm/osm_sw_info_rcv.c > > > +++ b/osm/opensm/osm_sw_info_rcv.c > > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H 
> > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_sw_info_rcv_ctrl.c > > > b/osm/opensm/osm_sw_info_rcv_ctrl.c > > > index a97a7dc..fb8fe50 100644 > > > --- a/osm/opensm/osm_sw_info_rcv_ctrl.c > > > +++ b/osm/opensm/osm_sw_info_rcv_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_sweep_fail_ctrl.c > > b/osm/opensm/osm_sweep_fail_ctrl.c > > > index 022988a..e27a540 100644 > > > --- a/osm/opensm/osm_sweep_fail_ctrl.c > > > +++ b/osm/opensm/osm_sweep_fail_ctrl.c > > > @@ -49,7 +49,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_switch.c b/osm/opensm/osm_switch.c > > > index fa726c6..7e89475 100644 > > > --- a/osm/opensm/osm_switch.c > > > +++ b/osm/opensm/osm_switch.c > > > @@ -51,6 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_trap_rcv.c b/osm/opensm/osm_trap_rcv.c > > > index 7e39832..9865f53 100644 > > > --- a/osm/opensm/osm_trap_rcv.c > > > +++ b/osm/opensm/osm_trap_rcv.c > > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_trap_rcv_ctrl.c > > b/osm/opensm/osm_trap_rcv_ctrl.c > > > index 1e6bf45..ee5a1a4 100644 > > > --- a/osm/opensm/osm_trap_rcv_ctrl.c > > > +++ b/osm/opensm/osm_trap_rcv_ctrl.c > > > @@ -51,7 +51,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > 
> > > diff --git a/osm/opensm/osm_ucast_mgr.c b/osm/opensm/osm_ucast_mgr.c > > > index 4492c1a..95f4d04 100644 > > > --- a/osm/opensm/osm_ucast_mgr.c > > > +++ b/osm/opensm/osm_ucast_mgr.c > > > @@ -54,6 +54,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_ucast_updn.c b/osm/opensm/osm_ucast_updn.c > > > index b70cf21..44e1993 100644 > > > --- a/osm/opensm/osm_ucast_updn.c > > > +++ b/osm/opensm/osm_ucast_updn.c > > > @@ -50,6 +50,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_vl15intf.c b/osm/opensm/osm_vl15intf.c > > > index f72620b..68f17c5 100644 > > > --- a/osm/opensm/osm_vl15intf.c > > > +++ b/osm/opensm/osm_vl15intf.c > > > @@ -55,8 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_vl_arb_rcv.c b/osm/opensm/osm_vl_arb_rcv.c > > > index 70fd5ed..e33a2f9 100644 > > > --- a/osm/opensm/osm_vl_arb_rcv.c > > > +++ b/osm/opensm/osm_vl_arb_rcv.c > > > @@ -51,8 +51,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > diff --git a/osm/opensm/osm_vl_arb_rcv_ctrl.c > > b/osm/opensm/osm_vl_arb_rcv_ctrl.c > > > index 9113985..f1f22c7 100644 > > > --- a/osm/opensm/osm_vl_arb_rcv_ctrl.c > > > +++ b/osm/opensm/osm_vl_arb_rcv_ctrl.c > > > @@ -55,7 +55,7 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > -#include > > > +#include > > > #include > > > #include > > > > > > diff --git a/osm/osmtest/include/osmtest_subnet.h > > > b/osm/osmtest/include/osmtest_subnet.h > > > index 0e7cf3e..277a2aa 100644 > > > --- 
a/osm/osmtest/include/osmtest_subnet.h > > > +++ b/osm/osmtest/include/osmtest_subnet.h > > > @@ -47,6 +47,7 @@ > > > #ifndef _OSMTEST_SUBNET_H_ > > > #define _OSMTEST_SUBNET_H_ > > > > > > +#include > > > #include > > > #include > > > #include > > > diff --git a/osm/osmtest/osmt_inform.c b/osm/osmtest/osmt_inform.c > > > index b24ae30..e1562db 100644 > > > --- a/osm/osmtest/osmt_inform.c > > > +++ b/osm/osmtest/osmt_inform.c > > > @@ -56,7 +56,6 @@ #include > > > #include > > > #include > > > #include > > > -#include > > > > > > #include > > > #include "osmtest.h" > > > diff --git a/osm/osmtest/osmt_slvl_vl_arb.c > > b/osm/osmtest/osmt_slvl_vl_arb.c > > > index 6cb8377..9fc84f6 100644 > > > --- a/osm/osmtest/osmt_slvl_vl_arb.c > > > +++ b/osm/osmtest/osmt_slvl_vl_arb.c > > > @@ -54,7 +54,6 @@ #include > > > #include > > > #include > > > #include > > > -#include > > > #include "osmtest.h" > > > > > > > > /********************************************************************** > > > diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c > > > index 78aff53..5eb5482 100644 > > > --- a/osm/osmtest/osmtest.c > > > +++ b/osm/osmtest/osmtest.c > > > @@ -56,8 +56,8 @@ #endif > > > > > > #include > > > #include > > > -#ifdef __WIN__ > > > #include > > > +#ifdef __WIN__ > > > #include > > > #else > > > #include > > > -- > > > 1.3.2 > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > From eitan at mellanox.co.il Thu May 18 10:35:53 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 18 May 2006 20:35:53 +0300 Subject: [openib-general] RE: [libsdp] RFC: Configuration file enhancements Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB6@mtlexch01.mtl.com> Hi Michael, I agree, Let's make it even clearer: use sdp listen *:12865 use tcp 
destination 192.169.2.0/24 # tcp only to this destination
use both destination 192.168.1.0/24 # sdp with fallback
use both listen *:22 # ssh listening on both tcp and sdp sockets

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: Michael S. Tsirkin
> Sent: Thursday, May 18, 2006 2:02 PM
> To: Eitan Zahavi
> Cc: openib-general at openib.org
> Subject: Re: [libsdp] RFC: Configuration file enhancements
>
> Quoting r. Eitan Zahavi :
> > 3. Today: "match_both" is not clearly described as applying to passive side
> > only, even though it does not have a meaning for "active" side (since
> > connection is either on INET or SDP)
> >
> > Change: Error on cases where the user specified match_both destination ?
> >
> > 4. Today: If connect over SDP fails an automatic fall back to INET socket is
> > performed
> >
> > Change: "match_fallback" should be used for active side rules when fallback
> > is required. Moreover "match" will not fallback - i.e. if SDP socket is
> > required and fails - connect will return an error.
> >
> > Thanks
>
> IMO, unmatch, match_both match_fallback are misleading names: you still do
> matching in the same way, you supply a modifier affecting SDP/TCP
> selection.
>
> How about we have an extra parameter to match directive?
> It could be sdp, tcp, or both.
>
> Thus:
>
> match sdp listen *:12865
> match tcp destination 192.169.2.0/24 # tcp only to this destination
> match both destination 192.168.1.0/24 # sdp with fallback
>
> --
> MST

From joern at wohnheim.fh-wedel.de Thu May 18 10:41:12 2006
From: joern at wohnheim.fh-wedel.de (=?iso-8859-1?Q?J=F6rn?= Engel)
Date: Thu, 18 May 2006 19:41:12 +0200
Subject: [openib-general] Re: [PATCH 01/16] ehca: module infrastructure
In-Reply-To: <4468BD39.3010008@de.ibm.com>
References: <4468BD39.3010008@de.ibm.com>
Message-ID: <20060518174112.GB26113@wohnheim.fh-wedel.de>

On Mon, 15 May 2006 19:41:13 +0200, Heiko J Schick wrote:

> + * This source code is distributed under a dual license of GPL v2.0 and
> OpenIB

Your mailer is still mangling long lines, it seems. If you need a
quick solution, I could offer you a gmail invite.

> +
> +    EDEB_EX(7, "ret=%x", ret);
> +
> +    return ret;
> +
> +create_aqp1:
> +    ib_destroy_cq(sport->ibcq_aqp1);
> +
> +    EDEB_EX(7, "ret=%x", ret);
> +
> +    return ret;
> +}

Those two cases could be combined with a goto. Saves a tiny amount of
rodata.

> +#define EHCA_RESOURCE_ATTR(name)
> \
> +static ssize_t ehca_show_##name(struct device *dev,
> \

You have spaces instead of tabs in the lines the mailer mangled.

> +
> \
> +    data = rblock->name; \
> +    kfree(rblock); \
> +    \
> +    if ((strcmp(#name, "num_ports") == 0) && (ehca_nr_ports == 1)) \
> +        return snprintf(buf, 256, "1\n"); \
> +    else \
> +        return snprintf(buf, 256, "%d\n", data); \
> +    \

Is rblock->num_ports uninitialized when (ehca_nr_ports == 1)? Looks
rather odd.

> +    shca = (struct ehca_shca *)ib_alloc_device(sizeof(*shca));

A quick grep showed that every single return value of
ib_alloc_device() has a cast. Roland, can't you just change
ib_alloc_device() to return void*?

> +static struct of_device_id ehca_device_table[] =
> +{
> +    {
> +        .name = "lhca",
> +        .compatible = "IBM,lhca",
> +    },
> +    {},
> +};

Is the extra element needed?
> + if ((ret = ehca_create_slab_caches(&ehca_module))) { > + EDEB_ERR(4, "Cannot create SLAB caches"); > + ret = -ENOMEM; > + goto module_init1; > + } ret = try_something() if (ret) { ... } > + ehca_module.timer.data = (unsigned long)(void*)&ehca_module; Why the double cast? Jörn -- Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. -- Rob Pike From halr at voltaire.com Thu May 18 10:36:04 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 May 2006 13:36:04 -0400 Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h[was:[PATCH] OpenSM: Use memory routines directly and eliminatecl_mem*routines] In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB5@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBB5@mtlexch01.mtl.com> Message-ID: <1147973633.18971.80190.camel@hal.voltaire.com> On Thu, 2006-05-18 at 13:28, Eitan Zahavi wrote: > I think we need to change major version of libosmcomp such that > applications compiled against the previous API will fail to link... Right (missed that). Will be done shortly. -- Hal > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Thursday, May 18, 2006 6:07 PM > > To: Eitan Zahavi > > Cc: Sasha Khapyorsky; openib-general at openib.org > > Subject: RE: [openib-general] [PATCH] Replace cl_memory.h by > > string.h[was:[PATCH] OpenSM: Use memory routines directly and > > eliminatecl_mem*routines] > > > > Hi Eitan, > > > > On Thu, 2006-05-18 at 11:00, Eitan Zahavi wrote: > > > Hi Sasha, Hal, > > > > > > There several applications (ibis and ibmgtsim) that depend on > complib, > > > The changes of cleaning up the cl_memory API affect these utilities. 
> > > Can you please provide the list of APIs removed and their > replacements ? > > > > cl_memset -> memset > > cl_memclr(x, y) -> memset ( x, 0, y) > > cl_memcpy -> memcpy > > > > Soon cl_malloc/cl_zalloc/cl_free will change (and the memory tracking > > will be removed). > > > > -- Hal > > > > > Also if we eventually converge on a single complib for windows and > linux > > > then the Windows stack is going to be affected by these changes too. > > > > > > EZ > > > > > > Eitan Zahavi > > > Senior Engineering Director, Software Architect > > > Mellanox Technologies LTD > > > Tel:+972-4-9097208 > > > Fax:+972-4-9593245 > > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > > > > -----Original Message----- > > > > From: openib-general-bounces at openib.org [mailto:openib-general- > > > > bounces at openib.org] On Behalf Of Sasha Khapyorsky > > > > Sent: Thursday, May 18, 2006 1:03 AM > > > > To: Hal Rosenstock > > > > Cc: openib-general at openib.org > > > > Subject: [openib-general] [PATCH] Replace cl_memory.h by string.h > > > [was:[PATCH] > > > > OpenSM: Use memory routines directly and eliminatecl_mem* > routines] > > > > > > > > On 12:14 Wed 17 May , Hal Rosenstock wrote: > > > > > OpenSM: Use memory routines directly and eliminate cl_mem* > routines > > > > > as these routines are part of ISO C > > > > > > > > > > Signed-off-by: Hal Rosenstock > > > > > > > > Following Hal's cleanup this includes string.h header file for > proper > > > > mem*() functions prototype definitions where necessary, > > > removes/includes > > > > cl_memory.h as needed. Also couple of unistd.h additions for > close(), > > > > sleep() and unlink() calls. 
> > > > > > > > Signed-off-by: Sasha Khapyorsky From svenar at simula.no Thu May 18 06:55:30 2006 From: svenar at simula.no (Sven-Arne Reinemo) Date: Thu, 18 May 2006 15:55:30 +0200 Subject: [openib-general] Problems running IBMgtSim with OpenSM In-Reply-To: <1147956768.18971.75147.camel@hal.voltaire.com> References: <446C645B.3030206@simula.no> <1147956768.18971.75147.camel@hal.voltaire.com> Message-ID: <446C7CD2.4090705@simula.no> Anno Domini 18-05-2006 14:52, Hal Rosenstock wrote: > On Thu, 2006-05-18 at 08:11, Sven-Arne Reinemo wrote: >> Hi, >> >> I am trying to get the IBMgtSim running with OpenSM, > > I haven't used the simulator but I know OpenSM needs to be built > specially for use with the simulator. Was that done (How was the OpenSM > build invoked) ? Thanks! To make it work with the simulator I had to give configure the following options (--with-sim=): ./configure --with-osmv=sim --with-sim=/usr/local -- SAR ---- GnuPG public key - http://home.ifi.uio.no/~svenar/gpg.asc ---- "There are only 10 kinds of people in this world; those who know binary and those who don't." -- Unknown From rdreier at cisco.com Thu May 18 11:38:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 18 May 2006 11:38:15 -0700 Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor In-Reply-To: <20060518153254.GF30211@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 18 May 2006 18:32:54 +0300") References: <20060518153254.GF30211@mellanox.co.il> Message-ID: Thanks, applied. From rdreier at cisco.com Thu May 18 11:40:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 18 May 2006 11:40:15 -0700 Subject: [openib-general] Re: [PATCH] libmthca: fix posting lists with 256 entries In-Reply-To: <20060518153308.GG30211@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 18 May 2006 18:33:08 +0300") References: <20060518153308.GG30211@mellanox.co.il> Message-ID: Thanks, applied. 
From rdreier at cisco.com Thu May 18 11:47:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 18 May 2006 11:47:13 -0700 Subject: [openib-general][RFC][PATCH] mthca: HCA initialization parameters In-Reply-To: <20060518144810.GA9756@voltaire.com> (Leonid Arsh's message of "Thu, 18 May 2006 17:48:10 +0300") References: <20060518144810.GA9756@voltaire.com> Message-ID: Leonid> Hello, we need a capability to change the HCA parameters, Leonid> in order to tune its resources. Leonid> There is a special structure 'mthca_profile' in the Leonid> mthca driver, used during the HCA initialization and Leonid> determining different HCA initialization parameters, such Leonid> as maximum number of QPs, CQs, address vectors etc. Leonid> Unfortunately, the parameters cannot be defined outside Leonid> the driver. Leonid> Attached file implements a number of the module Leonid> parameters allowing to define the 'mthca_profile' values. Thanks, I've held off on doing this because adding these module parameters doesn't handle multiple different HCAs very gracefully. But I'm not sure if I really have a better solution -- (ab)using request_firmware() maybe? Does it make sense to tune all of these values? For example is anyone really changing the size of the user access region context? And certainly making num_uar tunable doesn't make any sense -- what do we do if the user asks for more UARs than the PCI BAR can cover? And what do we save if someone asks for fewer? The scheme of making the module parameter take effect only if it's non-zero seems really confusing to me. Someone is going to look in sysfs and see that num_qp is 0 and get confused. Also I think all of these values need to be powers of 2, so that should probably be enforced somehow (either by making the parameters be log-base-2 values, or using roundup_pow_of_two -- I'm not sure which is better). - R.
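The two ways of enforcing powers of 2 mentioned above can be compared with a small user-space sketch. This is only an illustration under stated assumptions: sketch_roundup_pow_of_two() is a hypothetical stand-in for the kernel's roundup_pow_of_two(), the function names profile_from_count() and profile_from_log2() are made up for this example, and none of it is taken from the mthca driver or from the posted patch.

```c
/* Hypothetical user-space stand-in for the kernel's roundup_pow_of_two().
 * Returns the smallest power of two >= n, or 0 if n is 0 or the result
 * would not fit in an unsigned int. */
static unsigned int sketch_roundup_pow_of_two(unsigned int n)
{
	unsigned int p = 1;

	if (n == 0 || n > 0x80000000u)
		return 0;
	while (p < n)
		p <<= 1;
	return p;
}

/* Option 1 (assumed semantics): the parameter is a raw count and 0 means
 * "keep the built-in default". Other values are rounded up. */
static unsigned int profile_from_count(unsigned int param, unsigned int dflt)
{
	return param ? sketch_roundup_pow_of_two(param) : dflt;
}

/* Option 2 (assumed semantics): the parameter is log2 of the count, so
 * every accepted value is a power of two by construction. Returns 0 for
 * exponents that do not fit in an unsigned int. */
static unsigned int profile_from_log2(unsigned int log_param)
{
	if (log_param >= 32)
		return 0;
	return 1u << log_param;
}
```

With option 1 a request for 50000 entries silently becomes 65536, which may surprise someone reading the value back from sysfs. With option 2 the value shown is exactly what is used, at the cost of a less obvious unit.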
From rdreier at cisco.com Thu May 18 11:48:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 18 May 2006 11:48:21 -0700 Subject: [openib-general] Re: [resend][RFC][PATCH] adding call to madvise In-Reply-To: <20060518042427.GA11533@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 18 May 2006 07:24:27 +0300") References: <20060511134217.GW5319@minantech.com> <20060511185926.GA1561@minantech.com> <20060514134240.GZ5319@minantech.com> <20060518042427.GA11533@mellanox.co.il> Message-ID: Michael> Will this break libibverbs on older kernels that don't Michael> have madvise? Maybe test MADV_DONTFORK during library Michael> startup and set a flag? All kernels have madvise() -- the only question is whether they support MADV_DONTFORK. The cost on kernels that don't is a system call to return -EINVAL. It might be worth setting a flag on library initialization to save all the system calls. But maybe not. - R. From halr at voltaire.com Thu May 18 12:01:55 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 May 2006 15:01:55 -0400 Subject: [openib-general] [PATCH] OpenSM: Eliminate error on active -> active port state transition Message-ID: <1147978910.18971.81627.camel@hal.voltaire.com> OpenSM: Eliminate error on active -> active port state transition SM may transition port from armed to active but in the meantime, due to passing a data packet with active enable set, the port may already have transitioned to active. Active -> active port state transition is indicated as an error but it isn't really an error so don't indicate error in the osm log.
Signed-off-by: Hal Rosenstock Index: include/opensm/osm_madw.h =================================================================== --- include/opensm/osm_madw.h (revision 7342) +++ include/opensm/osm_madw.h (working copy) @@ -185,6 +185,7 @@ typedef struct _osm_pi_context boolean_t light_sweep; boolean_t update_master_sm_base_lid; boolean_t ignore_errors; + boolean_t active_transition; } osm_pi_context_t; /*********/ Index: opensm/osm_lid_mgr.c =================================================================== --- opensm/osm_lid_mgr.c (revision 7324) +++ opensm/osm_lid_mgr.c (working copy) @@ -1191,6 +1191,7 @@ __osm_lid_mgr_set_physp_pi( context.pi_context.update_master_sm_base_lid = FALSE; context.pi_context.ignore_errors = FALSE; context.pi_context.light_sweep = FALSE; + context.pi_context.active_transition = FALSE; /* We need to set the cli_rereg bit when we are in first_time_master_sweep for Index: opensm/osm_link_mgr.c =================================================================== --- opensm/osm_link_mgr.c (revision 7336) +++ opensm/osm_link_mgr.c (working copy) @@ -320,7 +320,13 @@ __osm_link_mgr_set_physp_pi( if (port_state != IB_LINK_NO_CHANGE && ib_port_info_get_port_state(p_pi) != ib_port_info_get_port_state(p_old_pi) ) + { send_set = TRUE; + if (port_state == IB_LINK_ACTIVE) + context.pi_context.active_transition = TRUE; + else + context.pi_context.active_transition = FALSE; + } context.pi_context.node_guid = osm_node_get_node_guid( p_node ); context.pi_context.port_guid = osm_physp_get_port_guid( p_physp ); Index: opensm/osm_node_info_rcv.c =================================================================== --- opensm/osm_node_info_rcv.c (revision 7324) +++ opensm/osm_node_info_rcv.c (working copy) @@ -331,6 +331,7 @@ __osm_ni_rcv_process_new_node( context.pi_context.update_master_sm_base_lid = FALSE; context.pi_context.ignore_errors = FALSE; context.pi_context.light_sweep = FALSE; + context.pi_context.active_transition = FALSE; status = 
osm_req_get( p_rcv->p_gen_req, osm_physp_get_dr_path_ptr( p_physp ), Index: opensm/osm_pkey_mgr.c =================================================================== --- opensm/osm_pkey_mgr.c (revision 7324) +++ opensm/osm_pkey_mgr.c (working copy) @@ -120,6 +120,7 @@ pkey_mgr_enforce_partition( context.pi_context.update_master_sm_base_lid = FALSE; context.pi_context.ignore_errors = FALSE; context.pi_context.light_sweep = FALSE; + context.pi_context.active_transition = FALSE; return osm_req_set( p_req, osm_physp_get_dr_path_ptr( p_physp ), payload, sizeof(payload), Index: opensm/osm_port_info_rcv.c =================================================================== --- opensm/osm_port_info_rcv.c (revision 7324) +++ opensm/osm_port_info_rcv.c (working copy) @@ -586,6 +586,7 @@ osm_pi_rcv_process_set( ib_smp_t *p_smp; ib_port_info_t *p_pi; osm_pi_context_t *p_context; + osm_log_level_t level; OSM_LOG_ENTER( p_rcv->p_log, osm_pi_rcv_process_set ); @@ -605,16 +606,31 @@ osm_pi_rcv_process_set( /* check for error */ if (!p_context->ignore_errors && (cl_ntoh16(p_smp->status) & 0x7fff)) { - osm_log( p_rcv->p_log, OSM_LOG_ERROR, - "osm_pi_rcv_process_set: ERR 0F10: " - "Received error status for SetResp()\n"); + /* If port already ACTIVE, don't treat status 7 error as error */ + if (p_context->active_transition && + (cl_ntoh16(p_smp->status) & 0x7fff) == 0x1c) + { + level = OSM_LOG_INFO; + osm_log( p_rcv->p_log, OSM_LOG_INFO, + "osm_pi_rcv_process_set: " + "Received error status 0x%x for SetResp() during ACTIVE transition\n", + cl_ntoh16(p_smp->status) & 0x7fff); + /* Should there be a subsequent Get to validate that port is ACTIVE ? 
*/ + } + else + { + level = OSM_LOG_ERROR; + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "osm_pi_rcv_process_set: ERR 0F10: " + "Received error status for SetResp()\n"); + } osm_dump_port_info( p_rcv->p_log, osm_node_get_node_guid( p_node ), port_guid, port_num, p_pi, - OSM_LOG_ERROR); + level); } if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) Index: opensm/osm_qos.c =================================================================== --- opensm/osm_qos.c (revision 7324) +++ opensm/osm_qos.c (working copy) @@ -260,6 +260,7 @@ static ib_api_status_t vl_high_limit_upd context.pi_context.update_master_sm_base_lid = FALSE; context.pi_context.ignore_errors = FALSE; context.pi_context.light_sweep = FALSE; + context.pi_context.active_transition = FALSE; return osm_req_set(p_req, osm_physp_get_dr_path_ptr(p), payload, sizeof(payload), IB_MAD_ATTR_PORT_INFO, Index: opensm/osm_sm_state_mgr.c =================================================================== --- opensm/osm_sm_state_mgr.c (revision 7324) +++ opensm/osm_sm_state_mgr.c (working copy) @@ -187,6 +187,7 @@ __osm_sm_state_mgr_send_local_port_info_ /* with the new master lid value. 
*/ context.pi_context.update_master_sm_base_lid = TRUE; context.pi_context.light_sweep = FALSE; + context.pi_context.active_transition = FALSE; status = osm_req_get( p_sm_mgr->p_req, osm_physp_get_dr_path_ptr Index: opensm/osm_state_mgr.c =================================================================== --- opensm/osm_state_mgr.c (revision 7324) +++ opensm/osm_state_mgr.c (working copy) @@ -640,6 +640,7 @@ __osm_state_mgr_get_remote_port_info( mad_context.pi_context.light_sweep = TRUE; mad_context.pi_context.ignore_errors = FALSE; mad_context.pi_context.update_master_sm_base_lid = FALSE; + mad_context.pi_context.active_transition = FALSE; /* note that with some negative logic - if the query failed it means that * there is no point in going to heavy sweep */ Index: opensm/osm_sw_info_rcv.c =================================================================== --- opensm/osm_sw_info_rcv.c (revision 7324) +++ opensm/osm_sw_info_rcv.c (working copy) @@ -107,6 +107,7 @@ __osm_si_rcv_get_port_info( context.pi_context.update_master_sm_base_lid = FALSE; context.pi_context.ignore_errors = FALSE; context.pi_context.light_sweep = FALSE; + context.pi_context.active_transition = FALSE; num_ports = osm_node_get_num_physp( p_node ); osm_dr_path_init( &dr_path, From sean.hefty at intel.com Thu May 18 12:23:32 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 May 2006 12:23:32 -0700 Subject: [openib-general] [PATCH] CM: remove pkey from SIDR REQ - use pkey from path rec instead Message-ID: The pkey is provided into a SIDR REQ in two places, once as a parameter, and again in the path record. Remove the pkey as a parameter and always use that given in the path record. This change has no practical effect on ABI functionality. 
Signed-off-by: Sean Hefty --- Index: include/rdma/ib_cm.h =================================================================== --- include/rdma/ib_cm.h (revision 6884) +++ include/rdma/ib_cm.h (working copy) @@ -546,7 +546,6 @@ struct ib_cm_sidr_req_param { const void *private_data; u8 private_data_len; u8 max_cm_retries; - u16 pkey; }; /** Index: include/rdma/ib_user_cm.h =================================================================== --- include/rdma/ib_user_cm.h (revision 6884) +++ include/rdma/ib_user_cm.h (working copy) @@ -200,7 +200,7 @@ struct ib_ucm_sidr_req { __be64 sid; __u64 data; __u64 path; - __u16 pkey; + __u16 reserved_pkey; __u8 len; __u8 max_cm_retries; __u8 reserved[4]; Index: core/ucm.c =================================================================== --- core/ucm.c (revision 7119) +++ core/ucm.c (working copy) @@ -1044,7 +1044,6 @@ static ssize_t ib_ucm_send_sidr_req(stru param.service_id = cmd.sid; param.timeout_ms = cmd.timeout; param.max_cm_retries = cmd.max_cm_retries; - param.pkey = cmd.pkey; ctx = ib_ucm_ctx_get(file, cmd.id); if (!IS_ERR(ctx)) { Index: core/cm.c =================================================================== --- core/cm.c (revision 7197) +++ core/cm.c (working copy) @@ -2615,7 +2606,7 @@ static void cm_format_sidr_req(struct cm cm_format_mad_hdr(&sidr_req_msg->hdr, CM_SIDR_REQ_ATTR_ID, cm_form_tid(cm_id_priv, CM_MSG_SEQUENCE_SIDR)); sidr_req_msg->request_id = cm_id_priv->id.local_id; - sidr_req_msg->pkey = cpu_to_be16(param->pkey); + sidr_req_msg->pkey = cpu_to_be16(param->path->pkey); sidr_req_msg->service_id = param->service_id; if (param->private_data && param->private_data_len) From sean.hefty at intel.com Thu May 18 12:27:15 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 May 2006 12:27:15 -0700 Subject: [openib-general] [PATCH 1/3] verbs: add call to initialize ib_ah_attr from a work completion Message-ID: Expose a new call to initialize address handle attributes from a work completion. 
This functionality is duplicated by both verbs and the CM. Signed-off-by: Sean Hefty --- Index: include/rdma/ib_verbs.h =================================================================== --- include/rdma/ib_verbs.h (revision 6885) +++ include/rdma/ib_verbs.h (working copy) @@ -1110,6 +1110,20 @@ int ib_dealloc_pd(struct ib_pd *pd); struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr); /** + * ib_init_ah_from_wc - Initializes address handle attributes from a + * work completion. + * @device: Device on which the received message arrived. + * @port_num: Port on which the received message arrived. + * @wc: Work completion associated with the received message. + * @grh: References the received global route header. This parameter is + * ignored unless the work completion indicates that the GRH is valid. + * @ah_attr: Returned attributes that can be used when creating an address + * handle for replying to the message. + */ +int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc, + struct ib_grh *grh, struct ib_ah_attr *ah_attr); + +/** * ib_create_ah_from_wc - Creates an address handle associated with the * sender of the specified work completion. * @pd: The protection domain associated with the address handle. 
Index: core/verbs.c =================================================================== --- core/verbs.c (revision 6884) +++ core/verbs.c (working copy) @@ -125,35 +125,47 @@ struct ib_ah *ib_create_ah(struct ib_pd } EXPORT_SYMBOL(ib_create_ah); -struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, struct ib_wc *wc, - struct ib_grh *grh, u8 port_num) +int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc, + struct ib_grh *grh, struct ib_ah_attr *ah_attr) { - struct ib_ah_attr ah_attr; u32 flow_class; u16 gid_index; int ret; - memset(&ah_attr, 0, sizeof ah_attr); - ah_attr.dlid = wc->slid; - ah_attr.sl = wc->sl; - ah_attr.src_path_bits = wc->dlid_path_bits; - ah_attr.port_num = port_num; + memset(ah_attr, 0, sizeof *ah_attr); + ah_attr->dlid = wc->slid; + ah_attr->sl = wc->sl; + ah_attr->src_path_bits = wc->dlid_path_bits; + ah_attr->port_num = port_num; if (wc->wc_flags & IB_WC_GRH) { - ah_attr.ah_flags = IB_AH_GRH; - ah_attr.grh.dgid = grh->sgid; + ah_attr->ah_flags = IB_AH_GRH; + ah_attr->grh.dgid = grh->sgid; - ret = ib_find_cached_gid(pd->device, &grh->dgid, &port_num, + ret = ib_find_cached_gid(device, &grh->dgid, &port_num, &gid_index); if (ret) - return ERR_PTR(ret); + return ret; - ah_attr.grh.sgid_index = (u8) gid_index; + ah_attr->grh.sgid_index = (u8) gid_index; flow_class = be32_to_cpu(grh->version_tclass_flow); - ah_attr.grh.flow_label = flow_class & 0xFFFFF; - ah_attr.grh.traffic_class = (flow_class >> 20) & 0xFF; - ah_attr.grh.hop_limit = grh->hop_limit; + ah_attr->grh.flow_label = flow_class & 0xFFFFF; + ah_attr->grh.hop_limit = grh->hop_limit; + ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF; } + return 0; +} +EXPORT_SYMBOL(ib_init_ah_from_wc); + +struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, struct ib_wc *wc, + struct ib_grh *grh, u8 port_num) +{ + struct ib_ah_attr ah_attr; + int ret; + + ret = ib_init_ah_from_wc(pd->device, port_num, wc, grh, &ah_attr); + if (ret) + return ERR_PTR(ret); return 
ib_create_ah(pd, &ah_attr); } From sean.hefty at intel.com Thu May 18 12:32:47 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 May 2006 12:32:47 -0700 Subject: [openib-general] [PATCH 2/3] SA: add call to initialize ib_ah_attr from a path record In-Reply-To: Message-ID: Add a new call to initialize address handle attributes given a path record. This functionality is used by the CM, and would also be useful for users of UD QPs. Signed-off-by: Sean Hefty --- I located this function in the SA to avoid dependencies between ib_core and ib_sa. This patch may be applied independently of the previous patch in this series. Index: include/rdma/ib_sa.h =================================================================== --- include/rdma/ib_sa.h (revision 6884) +++ include/rdma/ib_sa.h (working copy) @@ -373,6 +373,14 @@ ib_sa_mcmember_rec_delete(struct ib_devi } /** + * ib_init_ah_from_path - Initialize address handle attributes based on an SA + * path record. + */ +int ib_init_ah_from_path(struct ib_device *device, u8 port_num, + struct ib_sa_path_rec *rec, + struct ib_ah_attr *ah_attr); + +/** * ib_sa_pack_attr - Copy an SA attribute from a host defined structure to * a network packed structure. * dst: Destination buffer. 
Index: core/sa_query.c =================================================================== --- core/sa_query.c (revision 6884) +++ core/sa_query.c (working copy) @@ -46,6 +46,7 @@ #include #include +#include MODULE_AUTHOR("Roland Dreier"); MODULE_DESCRIPTION("InfiniBand subnet administration query support"); @@ -440,6 +441,36 @@ void ib_sa_cancel_query(int id, struct i } EXPORT_SYMBOL(ib_sa_cancel_query); +int ib_init_ah_from_path(struct ib_device *device, u8 port_num, + struct ib_sa_path_rec *rec, struct ib_ah_attr *ah_attr) +{ + int ret; + u16 gid_index; + + memset(ah_attr, 0, sizeof *ah_attr); + ah_attr->dlid = be16_to_cpu(rec->dlid); + ah_attr->sl = rec->sl; + ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7F; + ah_attr->port_num = port_num; + + if (rec->sgid.global.subnet_prefix != rec->dgid.global.subnet_prefix) { + ah_attr->ah_flags = IB_AH_GRH; + ah_attr->grh.dgid = rec->dgid; + + ret = ib_find_cached_gid(device, &rec->sgid, &port_num, + &gid_index); + if (ret) + return ret; + + ah_attr->grh.sgid_index = (u8) gid_index; + ah_attr->grh.flow_label = be32_to_cpu(rec->flow_label); + ah_attr->grh.hop_limit = rec->hop_limit; + ah_attr->grh.traffic_class = rec->traffic_class; + } + return 0; +} +EXPORT_SYMBOL(ib_init_ah_from_path); + int ib_sa_pack_attr(void *dst, void *src, int attr_id) { switch (attr_id) { From rdreier at cisco.com Thu May 18 12:33:39 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 18 May 2006 12:33:39 -0700 Subject: [openib-general] [git pull] Please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus The changes and patch are: Ishai Rabinovitz: IB/srp: Complete correct SCSI commands on device reset Michael S. 
Tsirkin: IB/mthca: Fix posting lists of 256 receive requests for Tavor Roland Dreier: IB/mthca: Make fw_cmd_doorbell default to 0 IB/srp: Don't wait for disconnection if sending DREQ fails IB/srp: Get rid of extra scsi_host_put()s if reconnection fails IB/uverbs: Don't leak ref to mm on error path drivers/infiniband/core/uverbs_mem.c | 4 +++- drivers/infiniband/hw/mthca/mthca_cmd.c | 2 +- drivers/infiniband/hw/mthca/mthca_qp.c | 35 ++++++++++++++++--------------- drivers/infiniband/ulp/srp/ib_srp.c | 10 ++++----- 4 files changed, 27 insertions(+), 24 deletions(-) diff --git a/drivers/infiniband/core/uverbs_mem.c b/drivers/infiniband/core/uverbs_mem.c index 36a32c3..efe147d 100644 --- a/drivers/infiniband/core/uverbs_mem.c +++ b/drivers/infiniband/core/uverbs_mem.c @@ -211,8 +211,10 @@ void ib_umem_release_on_close(struct ib_ */ work = kmalloc(sizeof *work, GFP_KERNEL); - if (!work) + if (!work) { + mmput(mm); return; + } INIT_WORK(&work->work, ib_umem_account, work); work->mm = mm; diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c index 1985b5d..798e13e 100644 --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -182,7 +182,7 @@ struct mthca_cmd_context { u8 status; }; -static int fw_cmd_doorbell = 1; +static int fw_cmd_doorbell = 0; module_param(fw_cmd_doorbell, int, 0644); MODULE_PARM_DESC(fw_cmd_doorbell, "post FW commands through doorbell page if nonzero " "(and supported by FW)"); diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 19765f6..07c13be 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -1727,23 +1727,7 @@ int mthca_tavor_post_receive(struct ib_q ind = qp->rq.next_ind; - for (nreq = 0; wr; ++nreq, wr = wr->next) { - if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) { - nreq = 0; - - doorbell[0] = cpu_to_be32((qp->rq.next_ind << qp->rq.wqe_shift) | size0); - 
doorbell[1] = cpu_to_be32(qp->qpn << 8); - - wmb(); - - mthca_write64(doorbell, - dev->kar + MTHCA_RECEIVE_DOORBELL, - MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); - - qp->rq.head += MTHCA_TAVOR_MAX_WQES_PER_RECV_DB; - size0 = 0; - } - + for (nreq = 0; wr; wr = wr->next) { if (mthca_wq_overflow(&qp->rq, nreq, qp->ibqp.recv_cq)) { mthca_err(dev, "RQ %06x full (%u head, %u tail," " %d max, %d nreq)\n", qp->qpn, @@ -1797,6 +1781,23 @@ int mthca_tavor_post_receive(struct ib_q ++ind; if (unlikely(ind >= qp->rq.max)) ind -= qp->rq.max; + + ++nreq; + if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) { + nreq = 0; + + doorbell[0] = cpu_to_be32((qp->rq.next_ind << qp->rq.wqe_shift) | size0); + doorbell[1] = cpu_to_be32(qp->qpn << 8); + + wmb(); + + mthca_write64(doorbell, + dev->kar + MTHCA_RECEIVE_DOORBELL, + MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); + + qp->rq.head += MTHCA_TAVOR_MAX_WQES_PER_RECV_DB; + size0 = 0; + } } out: diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index c32ce43..9cbdffa 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -340,7 +340,10 @@ static void srp_disconnect_target(struct /* XXX should send SRP_I_LOGOUT request */ init_completion(&target->done); - ib_send_cm_dreq(target->cm_id, NULL, 0); + if (ib_send_cm_dreq(target->cm_id, NULL, 0)) { + printk(KERN_DEBUG PFX "Sending CM DREQ failed\n"); + return; + } wait_for_completion(&target->done); } @@ -351,7 +354,6 @@ static void srp_remove_work(void *target spin_lock_irq(target->scsi_host->host_lock); if (target->state != SRP_TARGET_DEAD) { spin_unlock_irq(target->scsi_host->host_lock); - scsi_host_put(target->scsi_host); return; } target->state = SRP_TARGET_REMOVED; @@ -365,8 +367,6 @@ static void srp_remove_work(void *target ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); scsi_host_put(target->scsi_host); - /* And another put to really free the target port... 
*/ - scsi_host_put(target->scsi_host); } static int srp_connect_target(struct srp_target_port *target) @@ -1241,7 +1241,7 @@ static int srp_reset_device(struct scsi_ list_for_each_entry_safe(req, tmp, &target->req_queue, list) if (req->scmnd->device == scmnd->device) { req->scmnd->result = DID_RESET << 16; - scmnd->scsi_done(scmnd); + req->scmnd->scsi_done(req->scmnd); srp_remove_req(target, req); } From sean.hefty at intel.com Thu May 18 12:35:54 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 May 2006 12:35:54 -0700 Subject: [openib-general] [PATCH 3/3] cm: convert CM to use address handle initialization helper functions In-Reply-To: Message-ID: Convert the CM to make use of the newly exported helper functions for initializing address handle attributes. Signed-off-by: Sean Hefty --- Index: core/cm.c =================================================================== --- core/cm.c (revision 7197) +++ core/cm.c (working copy) @@ -254,23 +255,13 @@ static void cm_set_private_data(struct c cm_id_priv->private_data_len = private_data_len; } -static void cm_set_ah_attr(struct ib_ah_attr *ah_attr, u8 port_num, - u16 dlid, u8 sl, u16 src_path_bits) -{ - memset(ah_attr, 0, sizeof ah_attr); - ah_attr->dlid = dlid; - ah_attr->sl = sl; - ah_attr->src_path_bits = src_path_bits; - ah_attr->port_num = port_num; -} - -static void cm_init_av_for_response(struct cm_port *port, - struct ib_wc *wc, struct cm_av *av) +static void cm_init_av_for_response(struct cm_port *port, struct ib_wc *wc, + struct ib_grh *grh, struct cm_av *av) { av->port = port; av->pkey_index = wc->pkey_index; - cm_set_ah_attr(&av->ah_attr, port->port_num, wc->slid, - wc->sl, wc->dlid_path_bits); + ib_init_ah_from_wc(port->cm_dev->device, port->port_num, wc, + grh, &av->ah_attr); } static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av) @@ -300,9 +291,8 @@ static int cm_init_av_by_path(struct ib_ return ret; av->port = port; - cm_set_ah_attr(&av->ah_attr, av->port->port_num, - 
be16_to_cpu(path->dlid), path->sl, - be16_to_cpu(path->slid) & 0x7F); + ib_init_ah_from_path(cm_dev->device, port->port_num, path, + &av->ah_attr); av->packet_life_time = path->packet_life_time; return 0; } @@ -1342,6 +1332,7 @@ static int cm_req_handler(struct cm_work cm_id_priv = container_of(cm_id, struct cm_id_private, id); cm_id_priv->id.remote_id = req_msg->local_comm_id; cm_init_av_for_response(work->port, work->mad_recv_wc->wc, + work->mad_recv_wc->recv_buf.grh, &cm_id_priv->av); cm_id_priv->timewait_info = cm_create_timewait_info(cm_id_priv-> id.local_id); @@ -2707,6 +2698,7 @@ static int cm_sidr_req_handler(struct cm cm_id_priv->av.dgid.global.subnet_prefix = cpu_to_be64(wc->slid); cm_id_priv->av.dgid.global.interface_id = 0; cm_init_av_for_response(work->port, work->mad_recv_wc->wc, + work->mad_recv_wc->recv_buf.grh, &cm_id_priv->av); cm_id_priv->id.remote_id = sidr_req_msg->request_id; cm_id_priv->id.state = IB_CM_SIDR_REQ_RCVD; From btmiller at helix.nih.gov Thu May 18 12:41:38 2006 From: btmiller at helix.nih.gov (Tim Miller) Date: Thu, 18 May 2006 15:41:38 -0400 Subject: [openib-general] OpenIB 1.0 RC + PathScale problem In-Reply-To: References: Message-ID: On Thu, 18 May 2006, Tim Miller wrote: > Thanks to Bryan, Jack, and yourself for responding. I suspect it is a library > issue, but I'm having some trouble tracking down the exact source of the > problem. The first odd thing that doing an nm of > /usr/local/lib/infiniband/ipathverbs.so shows ibv_cmd_poll_cq is indeed > undefined (the symbol is defined in libibverbs.so). Here's the ldd output for > ipathverbs.so: > > tim at o8:/usr/local/lib 165$ ldd infiniband/ipathverbs.so > libc.so.6 => /lib64/tls/libc.so.6 (0x00002b01d0d00000) > /lib64/ld-linux-x86-64.so.2 (0x0000555555554000) > > Oddly, it doesn't seem to depend on libibverbs. Is that normal? If not, I > must be doing something wrong in building the libraries. 
I tried uninstalling > all the userspace stuff (via make uninstall) and reconfiguring/remaking after > a make clean in the source dir. I built, in order, libibverbs, libipathverbs, > and libmthca. Is that the correct way to do things (it's what I got from the > quickstart Wiki entry)? Replying to my own post, it seems like the Makefile for libipathverbs is missing -libverbs. I added this and it gets rid of the error about the missing symbol. My application is still segfaulting, but I'm not sure if this is OpenIB's fault. Cheers, Tim -- Tim Miller System Administrator -- Laboratory of Computational Biology National Institutes of Health -- Bldg. 50 Rm. 3309 -- 301-402-0618 From sean.hefty at intel.com Thu May 18 12:44:52 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 May 2006 12:44:52 -0700 Subject: [openib-general] [PATCH] CM: store and return attributes needed to send to a UD QP after SIDR Message-ID: Modify the CM to maintain the necessary information needed to send to a UD QP after a user has performed SIDR. Expose the remote QPN, remote QKey, and address handle attributes through the ib_cm_init_qp_attr() routine, so that the information is available from userspace without changing the ABI.
Signed-off-by: Sean Hefty --- Index: core/cm.c =================================================================== --- core/cm.c (revision 7197) +++ core/cm.c (working copy) @@ -138,6 +138,7 @@ struct cm_id_private { __be64 tid; __be32 local_qpn; __be32 remote_qpn; + __be32 remote_qkey; enum ib_qp_type qp_type; __be32 sq_psn; __be32 rq_psn; @@ -2845,6 +2837,9 @@ static int cm_sidr_rep_handler(struct cm } cm_id_priv->id.state = IB_CM_IDLE; ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg); + + cm_id_priv->remote_qpn = cm_sidr_rep_get_qpn(sidr_rep_msg); + cm_id_priv->remote_qkey = sidr_rep_msg->qkey; spin_unlock_irqrestore(&cm_id_priv->lock, flags); cm_format_sidr_rep_event(work); @@ -3179,10 +3174,17 @@ static int cm_init_qp_rts_attr(struct cm int *qp_attr_mask) { unsigned long flags; - int ret; + int ret = 0; spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { + case IB_CM_IDLE: + /* UD QP - return attributes to send to remote QP */ + *qp_attr_mask = IB_QP_AV | IB_QP_DEST_QPN | IB_QP_QKEY; + qp_attr->ah_attr = cm_id_priv->av.ah_attr; + qp_attr->dest_qp_num = be32_to_cpu(cm_id_priv->remote_qpn); + qp_attr->qkey = be32_to_cpu(cm_id_priv->remote_qkey); + break; case IB_CM_REP_RCVD: case IB_CM_MRA_REP_SENT: case IB_CM_REP_SENT: @@ -3203,7 +3205,6 @@ static int cm_init_qp_rts_attr(struct cm *qp_attr_mask |= IB_QP_PATH_MIG_STATE; qp_attr->path_mig_state = IB_MIG_REARM; } - ret = 0; break; default: ret = -EINVAL; From halr at voltaire.com Thu May 18 12:39:59 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 18 May 2006 15:39:59 -0400 Subject: [openib-general] [PATCH] OpenSM/complib: Increment version number Message-ID: <1147981193.18971.82279.camel@hal.voltaire.com> OpenSM/complib: Increment version number (due to cl_mem* removal) Signed-off-by: Hal Rosenstock Index: complib/libosmcomp.map =================================================================== --- complib/libosmcomp.map (revision 7324) +++ complib/libosmcomp.map 
(working copy) @@ -1,4 +1,4 @@ -OSMCOMP_1.0 { +OSMCOMP_2.0 { global: cl_async_proc_construct; cl_async_proc_init; Index: complib/libosmcomp.ver =================================================================== --- complib/libosmcomp.ver (revision 7324) +++ complib/libosmcomp.ver (working copy) @@ -6,4 +6,4 @@ # API_REV - advance on any added API # RUNNING_REV - advance any change to the vendor files # AGE - number of backword versions the API still supports -LIBVERSION=1:0:0 +LIBVERSION=2:0:0 From sean.hefty at intel.com Thu May 18 13:06:21 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 May 2006 13:06:21 -0700 Subject: [openib-general] [RFC] [PATCH] RDMA CM: add UD QP support Message-ID: Add base support for UD QPs to the RDMA CM. This allows users of UD QPs access to the CMA's address translation services. From a usage model, UD QP support is provided through the UDP port space. Client calls are essentially the same as that used to establish a connection. That is, a client calls: resolve_addr, resolve_route, and connect. A server calls: listen and accept. Connect and accept correspond to SIDR REQ / SIDR REP, respectively. This patch introduces a new protocol for SIDR that is the same as that used by the CMA for connection REQs. Signed-off-by: Sean Hefty --- Index: include/rdma/rdma_cm.h =================================================================== --- include/rdma/rdma_cm.h (revision 6993) +++ include/rdma/rdma_cm.h (working copy) @@ -212,9 +212,15 @@ struct rdma_conn_param { /** * rdma_connect - Initiate an active connection request. + * @id: Connection identifier to connect. + * @conn_param: Connection information used for connected QPs. * * Users must have resolved a route for the rdma_cm_id to connect with * by having called rdma_resolve_route before calling this routine. + * + * This call will either connect to a remote QP or obtain remote QP + * information for unconnected rdma_cm_id's.
The actual operation is + * based on the rdma_cm_id's port space. */ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); @@ -253,4 +259,3 @@ int rdma_reject(struct rdma_cm_id *id, c int rdma_disconnect(struct rdma_cm_id *id); #endif /* RDMA_CM_H */ - Index: include/rdma/rdma_user_cm.h =================================================================== --- include/rdma/rdma_user_cm.h (revision 6949) +++ include/rdma/rdma_user_cm.h (working copy) @@ -38,7 +38,7 @@ #include #include -#define RDMA_USER_CM_ABI_VERSION 1 +#define RDMA_USER_CM_ABI_VERSION 2 #define RDMA_MAX_PRIVATE_DATA 256 @@ -72,6 +72,8 @@ struct rdma_ucm_cmd_hdr { struct rdma_ucm_create_id { __u64 uid; __u64 response; + __u16 ps; + __u8 reserved[6]; }; struct rdma_ucm_create_id_resp { Index: core/cma.c =================================================================== --- core/cma.c (revision 7339) +++ core/cma.c (working copy) @@ -66,6 +66,7 @@ static DEFINE_MUTEX(lock); static struct workqueue_struct *cma_wq; static DEFINE_IDR(sdp_ps); static DEFINE_IDR(tcp_ps); +static DEFINE_IDR(udp_ps); struct cma_device { struct list_head list; @@ -491,9 +492,17 @@ static inline int cma_any_addr(struct so return cma_zero_addr(addr) || cma_loopback_addr(addr); } +static inline __be16 cma_port(struct sockaddr *addr) +{ + if (addr->sa_family == AF_INET) + return ((struct sockaddr_in *) addr)->sin_port; + else + return ((struct sockaddr_in6 *) addr)->sin6_port; +} + static inline int cma_any_port(struct sockaddr *addr) { - return !((struct sockaddr_in *) addr)->sin_port; + return !cma_port(addr); } static int cma_get_net_info(void *hdr, enum rdma_port_space ps, @@ -833,8 +842,8 @@ out: return ret; } -static struct rdma_id_private* cma_new_id(struct rdma_cm_id *listen_id, - struct ib_cm_event *ib_event) +static struct rdma_id_private* cma_new_conn_id(struct rdma_cm_id *listen_id, + struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv; struct rdma_cm_id *id; @@ -877,6 +886,42 @@ 
err: return NULL; } +static struct rdma_id_private* cma_new_udp_id(struct rdma_cm_id *listen_id, + struct ib_cm_event *ib_event) +{ + struct rdma_id_private *id_priv; + struct rdma_cm_id *id; + union cma_ip_addr *src, *dst; + __u16 port; + u8 ip_ver; + int ret; + + id = rdma_create_id(listen_id->event_handler, listen_id->context, + listen_id->ps); + if (IS_ERR(id)) + return NULL; + + + if (cma_get_net_info(ib_event->private_data, listen_id->ps, + &ip_ver, &port, &src, &dst)) + goto err; + + cma_save_net_info(&id->route.addr, &listen_id->route.addr, + ip_ver, port, src, dst); + + ret = rdma_translate_ip(&id->route.addr.src_addr, + &id->route.addr.dev_addr); + if (ret) + goto err; + + id_priv = container_of(id, struct rdma_id_private, id); + id_priv->state = CMA_CONNECT; + return id_priv; +err: + rdma_destroy_id(id); + return NULL; +} + static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *listen_id, *conn_id; @@ -889,7 +934,10 @@ static int cma_req_handler(struct ib_cm_ goto out; } - conn_id = cma_new_id(&listen_id->id, ib_event); + if (listen_id->id.ps == RDMA_PS_UDP) + conn_id = cma_new_udp_id(&listen_id->id, ib_event); + else + conn_id = cma_new_conn_id(&listen_id->id, ib_event); if (!conn_id) { ret = -ENOMEM; goto out; @@ -926,8 +974,7 @@ out: static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr) { - return cpu_to_be64(((u64)ps << 16) + - be16_to_cpu(((struct sockaddr_in *) addr)->sin_port)); + return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr))); } static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr, @@ -1508,6 +1555,9 @@ static int cma_get_port(struct rdma_id_p case RDMA_PS_TCP: ps = &tcp_ps; break; + case RDMA_PS_UDP: + ps = &udp_ps; + break; default: return -EPROTONOSUPPORT; } @@ -1586,6 +1636,94 @@ static int cma_format_hdr(void *hdr, enu return 0; } +static int cma_sidr_rep_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *ib_event) 
+{ + struct rdma_id_private *id_priv = cm_id->context; + enum rdma_cm_event_type event; + struct ib_cm_sidr_rep_event_param *rep = &ib_event->param.sidr_rep_rcvd; + struct rdma_route *route; + int ret, status; + + if (!cma_comp(id_priv, CMA_CONNECT)) + return 0; + + atomic_inc(&id_priv->dev_remove); + switch (ib_event->event) { + case IB_CM_SIDR_REQ_ERROR: + event = RDMA_CM_EVENT_UNREACHABLE; + status = -ETIMEDOUT; + break; + case IB_CM_SIDR_REP_RECEIVED: + if (rep->status != IB_SIDR_SUCCESS) { + event = RDMA_CM_EVENT_UNREACHABLE; + status = ib_event->param.sidr_rep_rcvd.status; + break; + } + route = &id_priv->id.route; + if (rep->qkey != ntohs(cma_port(&route->addr.dst_addr))) { + event = RDMA_CM_EVENT_UNREACHABLE; + status = -EINVAL; + break; + } + event = RDMA_CM_EVENT_ESTABLISHED; + status = 0; + break; + default: + printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d", + ib_event->event); + ret = 0; + goto out; + } + + ret = cma_notify_user(id_priv, event, status, NULL, 0); + if (ret) { + /* Destroy the CM ID by returning a non-zero value. 
*/ + id_priv->cm_id.ib = NULL; + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return ret; + } +out: + cma_release_remove(id_priv); + return ret; +} + +static int cma_resolve_ib_udp(struct rdma_id_private *id_priv) +{ + struct ib_cm_sidr_req_param req; + struct rdma_route *route; + struct cma_hdr hdr; + int ret; + + id_priv->cm_id.ib = ib_create_cm_id(id_priv->id.device, + cma_sidr_rep_handler, id_priv); + if (IS_ERR(id_priv->cm_id.ib)) + return PTR_ERR(id_priv->cm_id.ib); + + route = &id_priv->id.route; + ret = cma_format_hdr(&hdr, id_priv->id.ps, route); + if (ret) + goto out; + + req.path = route->path_rec; + req.service_id = cma_get_service_id(id_priv->id.ps, + &route->addr.dst_addr); + req.timeout_ms = CMA_CM_RESPONSE_TIMEOUT; + req.private_data = &hdr; + req.private_data_len = sizeof hdr; + req.max_cm_retries = CMA_MAX_CM_RETRIES; + + ret = ib_send_cm_sidr_req(id_priv->cm_id.ib, &req); +out: + if (ret) { + ib_destroy_cm_id(id_priv->cm_id.ib); + id_priv->cm_id.ib = NULL; + } + return ret; +} + static int cma_connect_ib(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { @@ -1660,7 +1798,10 @@ int rdma_connect(struct rdma_cm_id *id, switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: - ret = cma_connect_ib(id_priv, conn_param); + if (id->ps == RDMA_PS_UDP) + ret = cma_resolve_ib_udp(id_priv); + else + ret = cma_connect_ib(id_priv, conn_param); break; default: ret = -ENOSYS; @@ -1702,6 +1843,21 @@ static int cma_accept_ib(struct rdma_id_ return ib_send_cm_rep(id_priv->cm_id.ib, &rep); } +static int cma_send_sidr_rep(struct rdma_id_private *id_priv, + enum ib_cm_sidr_status status) +{ + struct ib_cm_sidr_rep_param rep; + + memset(&rep, 0, sizeof rep); + rep.status = status; + if (status == IB_SIDR_SUCCESS) { + rep.qp_num = id_priv->qp_num; + rep.qkey = ntohs(cma_port(&id_priv->id.route.addr.src_addr)); + } + + return ib_send_cm_sidr_rep(id_priv->cm_id.ib, 
&rep); +} + int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; @@ -1719,7 +1875,9 @@ int rdma_accept(struct rdma_cm_id *id, s switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: - if (conn_param) + if (id->ps == RDMA_PS_UDP) + ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS); + else if (conn_param) ret = cma_accept_ib(id_priv, conn_param); else ret = cma_rep_recv(id_priv); @@ -1752,9 +1910,12 @@ int rdma_reject(struct rdma_cm_id *id, c switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: - ret = ib_send_cm_rej(id_priv->cm_id.ib, - IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, - private_data, private_data_len); + if (id->ps == RDMA_PS_UDP) + ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT); + else + ret = ib_send_cm_rej(id_priv->cm_id.ib, + IB_CM_REJ_CONSUMER_DEFINED, NULL, + 0, private_data, private_data_len); break; default: ret = -ENOSYS; @@ -1916,6 +2077,7 @@ static void cma_cleanup(void) destroy_workqueue(cma_wq); idr_destroy(&sdp_ps); idr_destroy(&tcp_ps); + idr_destroy(&udp_ps); } module_init(cma_init); Index: core/ucma.c =================================================================== --- core/ucma.c (revision 7119) +++ core/ucma.c (working copy) @@ -291,7 +291,7 @@ static ssize_t ucma_create_id(struct ucm return -ENOMEM; ctx->uid = cmd.uid; - ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, RDMA_PS_TCP); + ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps); if (IS_ERR(ctx->cm_id)) { ret = PTR_ERR(ctx->cm_id); goto err1; From sean.hefty at intel.com Thu May 18 13:10:43 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 May 2006 13:10:43 -0700 Subject: [openib-general] [PATCH] libibverbs: Add ibv_create/init_ah_from_wc() to userspace Message-ID: Add the helper functions to initialize address handle attributes and create them from work completion information. This assists users of UD QPs in creating reply messages. 
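To make the reply case concrete, here is a minimal sketch of how reply addressing can be derived from a received message: the sender's source fields become the reply's destination fields, and the first GRH word is split into traffic class and flow label. All demo_* types and the DEMO_WC_GRH flag are hypothetical stand-ins rather than libibverbs definitions, and the lookup of the local GID index, which needs a device query, is left out.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define DEMO_WC_GRH 0x1   /* hypothetical flag: the completion carries a GRH */

/* Stand-ins for the work completion, GRH, and address handle attributes. */
struct demo_wc {
	uint16_t slid;            /* LID of the sender */
	uint8_t  sl;
	uint8_t  dlid_path_bits;
	int      wc_flags;
};

struct demo_grh {
	uint32_t version_tclass_flow;   /* network byte order */
	uint8_t  hop_limit;
	uint8_t  sgid[16];
	uint8_t  dgid[16];
};

struct demo_ah_attr {
	uint16_t dlid;
	uint8_t  sl;
	uint8_t  src_path_bits;
	uint8_t  port_num;
	int      is_global;
	uint8_t  dgid[16];
	uint32_t flow_label;
	uint8_t  traffic_class;
	uint8_t  hop_limit;
};

/* Build attributes for replying to the sender of a received message. */
static void demo_reply_attr(const struct demo_wc *wc,
			    const struct demo_grh *grh, uint8_t port_num,
			    struct demo_ah_attr *attr)
{
	memset(attr, 0, sizeof *attr);
	attr->dlid = wc->slid;                 /* reply goes back to the sender */
	attr->sl = wc->sl;
	attr->src_path_bits = wc->dlid_path_bits;
	attr->port_num = port_num;

	if (wc->wc_flags & DEMO_WC_GRH) {
		uint32_t word = ntohl(grh->version_tclass_flow);

		attr->is_global = 1;
		memcpy(attr->dgid, grh->sgid, sizeof attr->dgid);
		attr->flow_label = word & 0xFFFFF;         /* low 20 bits */
		attr->traffic_class = (word >> 20) & 0xFF; /* next 8 bits */
		attr->hop_limit = grh->hop_limit;
	}
}
```

The GRH is ignored unless the completion flags say it is valid, which matches the behavior the patch documents for its grh parameter.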
Signed-off-by: Sean Hefty --- Index: include/infiniband/verbs.h =================================================================== --- include/infiniband/verbs.h (revision 7016) +++ include/infiniband/verbs.h (working copy) @@ -299,6 +299,15 @@ struct ibv_global_route { uint8_t traffic_class; }; +struct ibv_grh { + uint32_t version_tclass_flow; + uint16_t paylen; + uint8_t next_hdr; + uint8_t hop_limit; + union ibv_gid sgid; + union ibv_gid dgid; +}; + enum ibv_rate { IBV_RATE_MAX = 0, IBV_RATE_2_5_GBPS = 2, @@ -942,6 +951,36 @@ static inline int ibv_post_recv(struct i struct ibv_ah *ibv_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr); /** + * ibv_init_ah_from_wc - Initializes address handle attributes from a + * work completion. + * @context: Device context on which the received message arrived. + * @port_num: Port on which the received message arrived. + * @wc: Work completion associated with the received message. + * @grh: References the received global route header. This parameter is + * ignored unless the work completion indicates that the GRH is valid. + * @ah_attr: Returned attributes that can be used when creating an address + * handle for replying to the message. + */ +int ibv_init_ah_from_wc(struct ibv_context *context, uint8_t port_num, + struct ibv_wc *wc, struct ibv_grh *grh, + struct ibv_ah_attr *ah_attr); + +/** + * ibv_create_ah_from_wc - Creates an address handle associated with the + * sender of the specified work completion. + * @pd: The protection domain associated with the address handle. + * @wc: Work completion information associated with a received message. + * @grh: References the received global route header. This parameter is + * ignored unless the work completion indicates that the GRH is valid. + * @port_num: The outbound port number to associate with the address. + * + * The address handle is used to reference a local or global destination + * in all UD QP post sends. 
+ */ +struct ibv_ah *ibv_create_ah_from_wc(struct ibv_pd *pd, struct ibv_wc *wc, + struct ibv_grh *grh, uint8_t port_num); + +/** * ibv_destroy_ah - Destroy an address handle. */ int ibv_destroy_ah(struct ibv_ah *ah); Index: src/libibverbs.map =================================================================== --- src/libibverbs.map (revision 7016) +++ src/libibverbs.map (working copy) @@ -32,6 +32,8 @@ IBVERBS_1.0 { ibv_modify_qp; ibv_destroy_qp; ibv_create_ah; + ibv_init_ah_from_wc; + ibv_create_ah_from_wc; ibv_destroy_ah; ibv_attach_mcast; ibv_detach_mcast; Index: src/verbs.c =================================================================== --- src/verbs.c (revision 7016) +++ src/verbs.c (working copy) @@ -392,6 +392,62 @@ struct ibv_ah *ibv_create_ah(struct ibv_ return ah; } +static int ibv_find_gid_index(struct ibv_context *context, uint8_t port_num, + union ibv_gid *gid) +{ + union ibv_gid sgid; + int i = 0, ret; + + do { + ret = ibv_query_gid(context, port_num, i++, &sgid); + } while (!ret && memcmp(&sgid, gid, sizeof *gid)); + + return ret ? 
ret : i - 1; +} + +int ibv_init_ah_from_wc(struct ibv_context *context, uint8_t port_num, + struct ibv_wc *wc, struct ibv_grh *grh, + struct ibv_ah_attr *ah_attr) +{ + uint32_t flow_class; + int ret; + + memset(ah_attr, 0, sizeof *ah_attr); + ah_attr->dlid = wc->slid; + ah_attr->sl = wc->sl; + ah_attr->src_path_bits = wc->dlid_path_bits; + ah_attr->port_num = port_num; + + if (wc->wc_flags & IBV_WC_GRH) { + ah_attr->is_global = 1; + ah_attr->grh.dgid = grh->sgid; + + ret = ibv_find_gid_index(context, port_num, &grh->dgid); + if (ret < 0) + return ret; + + ah_attr->grh.sgid_index = (uint8_t) ret; + flow_class = ntohl(grh->version_tclass_flow); + ah_attr->grh.flow_label = flow_class & 0xFFFFF; + ah_attr->grh.hop_limit = grh->hop_limit; + ah_attr->grh.traffic_class = (flow_class >> 20) & 0xFF; + } + return 0; +} + +struct ibv_ah *ibv_create_ah_from_wc(struct ibv_pd *pd, struct ibv_wc *wc, + struct ibv_grh *grh, uint8_t port_num) +{ + struct ibv_ah_attr ah_attr; + int ret; + + ret = ibv_init_ah_from_wc(pd->context, port_num, wc, grh, &ah_attr); + if (ret) + return NULL; + + return ibv_create_ah(pd, &ah_attr); +} + int ibv_destroy_ah(struct ibv_ah *ah) { return ah->context->ops.destroy_ah(ah); Index: ChangeLog =================================================================== --- ChangeLog (revision 7016) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2006-05-16 Sean Hefty + + * src/verbs.c include/infiniband/verbs.h: Add new routines: + ibv_init_ah_from_wc() and ibv_create_ah_from_wc() to simplify UD QP + communication. + 2006-05-02 Roland Dreier * Release version 1.0.3. From sean.hefty at intel.com Thu May 18 13:20:51 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 May 2006 13:20:51 -0700 Subject: [openib-general] [PATCH] libibcm: remove pkey from SIDR REQ In-Reply-To: Message-ID: Remove the pkey from the API for SIDR REQ. The pkey is provided in the path record. 
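The idea of taking the P_Key from the path record instead of a separate parameter could look roughly like the sketch below. The demo_* structures and the helper are hypothetical, not libibcm's API, and the NULL check reflects an assumption that a caller may leave the path unset.

```c
#include <stdint.h>
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-ins for the path record and SIDR REQ parameters. */
struct demo_path_rec {
	uint16_t pkey;
};

struct demo_sidr_req_param {
	const struct demo_path_rec *path;
	uint64_t service_id;
};

/* Read the P_Key for a SIDR REQ from its path record, if one was given. */
static int demo_sidr_req_pkey(const struct demo_sidr_req_param *param,
			      uint16_t *pkey)
{
	if (!param || !param->path)
		return -EINVAL;   /* assumption: a missing path is an error */

	*pkey = param->path->pkey;
	return 0;
}
```

Checking the pointer before dereferencing it keeps the helper safe for callers that have not resolved a path yet.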
Signed-off-by: Sean Hefty --- Index: include/infiniband/cm.h =================================================================== --- include/infiniband/cm.h (revision 7020) +++ include/infiniband/cm.h (working copy) @@ -528,7 +528,6 @@ struct ib_cm_sidr_req_param { void *private_data; uint8_t private_data_len; uint8_t max_cm_retries; - uint16_t pkey; }; /** Index: src/cm.c =================================================================== --- src/cm.c (revision 7020) +++ src/cm.c (working copy) @@ -724,7 +724,7 @@ int ib_cm_send_sidr_req(struct ib_cm_id cmd->id = cm_id->handle; cmd->sid = param->service_id; cmd->timeout = param->timeout_ms; - cmd->pkey = param->pkey; + cmd->pkey = param->path->pkey; cmd->max_cm_retries = param->max_cm_retries; if (param->path) { From mdidomenico at silverstorm.com Thu May 18 13:47:07 2006 From: mdidomenico at silverstorm.com (Di Domenico, Michael) Date: Thu, 18 May 2006 16:47:07 -0400 Subject: [openib-general] OpenIB 1.0 RC + PathScale problem Message-ID: Tim, The only thing I can suggest is to search your filesystem for another 'libib*' library. I've noticed that RedHat EL 4 Update 3 comes with some infiniband libraries under /usr/lib (suse probably does also) and they conflicted with MVAPICH until I removed the rpm's... > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Tim Miller > Sent: Thursday, May 18, 2006 11:25 AM > To: openib-general at openib.org > Subject: RE: [openib-general] OpenIB 1.0 RC + PathScale problem > > On Thu, 18 May 2006, Di Domenico, Michael wrote: > > > I'm certainly no expert, but I came across different but similar issues, > > where my applications where picking up another set of libraries, that I > > wasn't aware were on the system... I was getting the same 'undefined > > symbols' errors. You might want to check for ib libs that might be in > > your path. 
> > Hi Michael, > > Thanks to Bryan, Jack, and yourself for responding. I suspect it is a > library issue, but I'm having some trouble tracking down the exact source > of the problem. The first odd thing that doing an nm of > /usr/local/lib/infiniband/ipathverbs.so shows ibv_cmd_poll_cq is indeed > undefined (the symbol is defined in libibverbs.so). Here's the ldd output > for ipathverbs.so: > > tim at o8:/usr/local/lib 165$ ldd infiniband/ipathverbs.so > libc.so.6 => /lib64/tls/libc.so.6 (0x00002b01d0d00000) > /lib64/ld-linux-x86-64.so.2 (0x0000555555554000) > > Oddly, it doesn't seem to depend on libibverbs. Is that normal? If not, I > must be doing something wrong in building the libraries. I tried > uninstalling all the userspace stuff (via make uninstall) and > reconfiguring/remaking after a make clean in the source dir. I built, in > order, libibverbs, libipathvers, and libmthca. Is that the correct way to > do things (it's what I got from the quickstart Wiki entry)? > > FWIW, this is on SuSE 9.3 with kernel 2.6.16.16 custom compiled, but this > seems to be entirely an issue with the userspace libs. For completeness, > here is the ldd output for libibverbs: > > tim at o8:/usr/local/lib 166$ ldd libibverbs.so > libsysfs.so.1 => /lib64/libsysfs.so.1 (0x00002ac94dbd4000) > libpthread.so.0 => /lib64/tls/libpthread.so.0 > (0x00002ac94dce0000) > libdl.so.2 => /lib64/libdl.so.2 (0x00002ac94ddf5000) > libc.so.6 => /lib64/tls/libc.so.6 (0x00002ac94def9000) > /lib64/ld-linux-x86-64.so.2 (0x0000555555554000) > > The nm output is pretty long, but let me know if you want to see it too. > > Thanks, > Tim > > -- > Tim Miller > System Administrator -- Laboratory of Computational Biology > National Institutes of Health -- Bldg. 50 Rm. 
3309 -- 301-402- > 0618 > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general From sean.hefty at intel.com Thu May 18 14:09:33 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 May 2006 14:09:33 -0700 Subject: [openib-general] [PATCH] librdmacm: add UD QP support In-Reply-To: Message-ID: And a patch to support UD QPs from userspace through the librdmacm. Included in the patch is a test program. Signed-off-by: Sean Hefty --- Index: include/rdma/rdma_cma_ib.h =================================================================== --- include/rdma/rdma_cma_ib.h (revision 0) +++ include/rdma/rdma_cma_ib.h (revision 0) @@ -0,0 +1,47 @@ +/* + * Copyright (c) 2006 Intel Corporation. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. + * + * 2) under the terms of the "The BSD License" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. + * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. + * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. 
+ * + */ + +#if !defined(RDMA_CMA_IB_H) +#define RDMA_CMA_IB_H + +#include + +/** + * rdma_get_udp_dst_attr - Retrieve information about a UDP destination. + * @id: Connection identifier associated with the request. + * @ah_attr: Address handle attributes. A caller uses these attributes to + * create an address handle when communicating with the destination. + * @qpn: The remote QP number associated with the UDP address. + * @qkey: The QKey of the remote QP. + * + * Users must have called rdma_connect() to resolve the destination information. + */ +int rdma_get_dst_attr(struct rdma_cm_id *id, struct ibv_ah_attr *ah_attr, + uint32_t *remote_qpn, uint32_t *remote_qkey); + +#endif /* RDMA_CMA_IB_H */ Property changes on: include/rdma/rdma_cma_ib.h ___________________________________________________________________ Name: svn:executable + * Index: include/rdma/rdma_cma_abi.h =================================================================== --- include/rdma/rdma_cma_abi.h (revision 7021) +++ include/rdma/rdma_cma_abi.h (working copy) @@ -40,7 +40,7 @@ */ #define RDMA_USER_CM_MIN_ABI_VERSION 1 -#define RDMA_USER_CM_MAX_ABI_VERSION 1 +#define RDMA_USER_CM_MAX_ABI_VERSION 2 #define RDMA_MAX_PRIVATE_DATA 256 @@ -68,9 +68,16 @@ struct ucma_abi_cmd_hdr { __u16 out; }; +struct ucma_abi_create_id_v1 { + __u64 uid; + __u64 response; +}; + struct ucma_abi_create_id { __u64 uid; __u64 response; + __u16 ps; + __u8 reserved[6]; }; struct ucma_abi_create_id_resp { Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 7077) +++ include/rdma/rdma_cma.h (working copy) @@ -54,6 +54,11 @@ enum rdma_cm_event_type { RDMA_CM_EVENT_DEVICE_REMOVAL, }; +enum rdma_port_space { + RDMA_PS_TCP = 0x0106, + RDMA_PS_UDP = 0x0111, +}; + /* Protocol levels for get/set options. 
*/ enum { RDMA_PROTO_IP = 0, @@ -95,6 +100,7 @@ struct rdma_cm_id { void *context; struct ibv_qp *qp; struct rdma_route route; + enum rdma_port_space ps; uint8_t port_num; }; @@ -126,9 +132,11 @@ void rdma_destroy_event_channel(struct r * @id: A reference where the allocated communication identifier will be * returned. * @context: User specified context associated with the rdma_cm_id. + * @ps: RDMA port space. */ int rdma_create_id(struct rdma_event_channel *channel, - struct rdma_cm_id **id, void *context); + struct rdma_cm_id **id, void *context, + enum rdma_port_space ps); /** * rdma_destroy_id - Release a communication identifier. @@ -199,6 +207,10 @@ struct rdma_conn_param { uint8_t flow_control; uint8_t retry_count; /* ignored when accepting */ uint8_t rnr_retry_count; + /* Fields below ignored if a QP is created on the rdma_cm_id. */ + uint8_t srq; + uint32_t qp_num; + enum ibv_qp_type qp_type; }; /** @@ -283,4 +295,18 @@ int rdma_get_option(struct rdma_cm_id *i int rdma_set_option(struct rdma_cm_id *id, int level, int optname, void *optval, size_t optlen); +static inline uint16_t rdma_get_src_port(struct rdma_cm_id *id) +{ + return id->route.addr.src_addr.sin6_family == PF_INET6 ? + id->route.addr.src_addr.sin6_port : + ((struct sockaddr_in *) &id->route.addr.src_addr)->sin_port; +} + +static inline uint16_t rdma_get_dst_port(struct rdma_cm_id *id) +{ + return id->route.addr.dst_addr.sin6_family == PF_INET6 ? 
+ id->route.addr.dst_addr.sin6_port : + ((struct sockaddr_in *) &id->route.addr.dst_addr)->sin_port; +} + #endif /* RDMA_CMA_H */ Index: src/cma.c =================================================================== --- src/cma.c (revision 7079) +++ src/cma.c (working copy) @@ -282,7 +282,8 @@ static void ucma_free_id(struct cma_id_p } static struct cma_id_private *ucma_alloc_id(struct rdma_event_channel *channel, - void *context) + void *context, + enum rdma_port_space ps) { struct cma_id_private *id_priv; @@ -292,6 +293,7 @@ static struct cma_id_private *ucma_alloc memset(id_priv, 0, sizeof *id_priv); id_priv->id.context = context; + id_priv->id.ps = ps; id_priv->id.channel = channel; pthread_mutex_init(&id_priv->mut, NULL); if (pthread_cond_init(&id_priv->cond, NULL)) @@ -303,8 +305,44 @@ err: ucma_free_id(id_priv); return NULL; } +static int ucma_create_id_v1(struct rdma_event_channel *channel, + struct rdma_cm_id **id, void *context, + enum rdma_port_space ps) +{ + struct ucma_abi_create_id_resp *resp; + struct ucma_abi_create_id_v1 *cmd; + struct cma_id_private *id_priv; + void *msg; + int ret, size; + + if (ps != RDMA_PS_TCP) { + fprintf(stderr, "librdmacm: Kernel ABI does not support " + "requested port space.\n"); + return -EPROTONOSUPPORT; + } + + id_priv = ucma_alloc_id(channel, context, ps); + if (!id_priv) + return -ENOMEM; + + CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_CREATE_ID, size); + cmd->uid = (uintptr_t) id_priv; + + ret = write(channel->fd, msg, size); + if (ret != size) + goto err; + + id_priv->handle = resp->id; + *id = &id_priv->id; + return 0; + +err: ucma_free_id(id_priv); + return ret; +} + int rdma_create_id(struct rdma_event_channel *channel, - struct rdma_cm_id **id, void *context) + struct rdma_cm_id **id, void *context, + enum rdma_port_space ps) { struct ucma_abi_create_id_resp *resp; struct ucma_abi_create_id *cmd; @@ -316,12 +354,16 @@ int rdma_create_id(struct rdma_event_cha if (ret) return ret; - id_priv = 
ucma_alloc_id(channel, context); + if (abi_ver == 1) + return ucma_create_id_v1(channel, id, context, ps); + + id_priv = ucma_alloc_id(channel, context, ps); if (!id_priv) return -ENOMEM; CMA_CREATE_MSG_CMD_RESP(msg, cmd, resp, UCMA_CMD_CREATE_ID, size); cmd->uid = (uintptr_t) id_priv; + cmd->ps = ps; ret = write(channel->fd, msg, size); if (ret != size) @@ -618,6 +660,36 @@ static int ucma_init_ib_qp(struct cma_id IBV_QP_PKEY_INDEX | IBV_QP_PORT); } +static int ucma_init_ud_qp(struct cma_id_private *id_priv, struct ibv_qp *qp) +{ + struct ibv_qp_attr qp_attr; + struct ib_addr *ibaddr; + int ret; + + ibaddr = &id_priv->id.route.addr.addr.ibaddr; + ret = ucma_find_pkey(id_priv->cma_dev, id_priv->id.port_num, + ibaddr->pkey, &qp_attr.pkey_index); + if (ret) + return ret; + + qp_attr.port_num = id_priv->id.port_num; + qp_attr.qp_state = IBV_QPS_INIT; + qp_attr.qkey = ntohs(rdma_get_src_port(&id_priv->id)); + ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX | + IBV_QP_PORT | IBV_QP_QKEY); + if (ret) + return ret; + + qp_attr.qp_state = IBV_QPS_RTR; + ret = ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE); + if (ret) + return ret; + + qp_attr.qp_state = IBV_QPS_RTS; + qp_attr.sq_psn = 0; + return ibv_modify_qp(qp, &qp_attr, IBV_QP_STATE | IBV_QP_SQ_PSN); +} + int rdma_create_qp(struct rdma_cm_id *id, struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr) { @@ -633,7 +705,10 @@ int rdma_create_qp(struct rdma_cm_id *id if (!qp) return -ENOMEM; - ret = ucma_init_ib_qp(id_priv, qp); + if (id->ps == RDMA_PS_UDP) + ret = ucma_init_ud_qp(id_priv, qp); + else + ret = ucma_init_ib_qp(id_priv, qp); if (ret) goto err; @@ -651,11 +726,12 @@ void rdma_destroy_qp(struct rdma_cm_id * static void ucma_copy_conn_param_to_kern(struct ucma_abi_conn_param *dst, struct rdma_conn_param *src, - struct ibv_qp *qp) + uint32_t qp_num, + enum ibv_qp_type qp_type, uint8_t srq) { - dst->qp_num = qp->qp_num; - dst->qp_type = qp->qp_type; - dst->srq = (qp->srq != NULL); + dst->qp_num = 
qp_num; + dst->qp_type = qp_type; + dst->srq = srq; dst->responder_resources = src->responder_resources; dst->initiator_depth = src->initiator_depth; dst->flow_control = src->flow_control; @@ -681,7 +757,15 @@ int rdma_connect(struct rdma_cm_id *id, CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_CONNECT, size); id_priv = container_of(id, struct cma_id_private, id); cmd->id = id_priv->handle; - ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, id->qp); + if (id->qp) + ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, + id->qp->qp_num, id->qp->qp_type, + (id->qp->srq != NULL)); + else + ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, + conn_param->qp_num, + conn_param->qp_type, + conn_param->srq); ret = write(id->channel->fd, msg, size); if (ret != size) @@ -716,15 +800,25 @@ int rdma_accept(struct rdma_cm_id *id, s void *msg; int ret, size; - ret = ucma_modify_qp_rtr(id); - if (ret) - return ret; + if (id->ps != RDMA_PS_UDP) { + ret = ucma_modify_qp_rtr(id); + if (ret) + return ret; + } CMA_CREATE_MSG_CMD(msg, cmd, UCMA_CMD_ACCEPT, size); id_priv = container_of(id, struct cma_id_private, id); cmd->id = id_priv->handle; cmd->uid = (uintptr_t) id_priv; - ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, id->qp); + if (id->qp) + ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, + id->qp->qp_num, id->qp->qp_type, + (id->qp->srq != NULL)); + else + ucma_copy_conn_param_to_kern(&cmd->conn_param, conn_param, + conn_param->qp_num, + conn_param->qp_type, + conn_param->srq); ret = write(id->channel->fd, msg, size); if (ret != size) { @@ -826,7 +920,8 @@ static int ucma_process_conn_req(struct int ret; listen_id_priv = container_of(event->id, struct cma_id_private, id); - id_priv = ucma_alloc_id(event->id->channel, event->id->context); + id_priv = ucma_alloc_id(event->id->channel, event->id->context, + event->id->ps); if (!id_priv) { ucma_destroy_kern_id(event->id->channel->fd, handle); ret = -ENOMEM; @@ -948,6 +1043,9 @@ retry: } break; 
case RDMA_CM_EVENT_ESTABLISHED: + if (id_priv->id.ps == RDMA_PS_UDP) + break; + evt->status = ucma_process_establish(&id_priv->id); if (evt->status) { evt->event = RDMA_CM_EVENT_CONNECT_ERROR; @@ -1021,3 +1119,20 @@ int rdma_set_option(struct rdma_cm_id *i return 0; } + +int rdma_get_dst_attr(struct rdma_cm_id *id, struct ibv_ah_attr *ah_attr, + uint32_t *remote_qpn, uint32_t *remote_qkey) +{ + struct ibv_qp_attr qp_attr; + int qp_attr_mask, ret; + + qp_attr.qp_state = IBV_QPS_RTS; + ret = rdma_init_qp_attr(id, &qp_attr, &qp_attr_mask); + if (ret) + return ret; + + *ah_attr = qp_attr.ah_attr; + *remote_qpn = qp_attr.dest_qp_num; + *remote_qkey = qp_attr.qkey; + return 0; +} Index: src/librdmacm.map =================================================================== --- src/librdmacm.map (revision 7019) +++ src/librdmacm.map (working copy) @@ -18,5 +18,6 @@ RDMACM_1.0 { rdma_ack_cm_event; rdma_get_option; rdma_set_option; + rdma_get_dst_attr; local: *; }; Index: librdmacm.spec.in =================================================================== --- librdmacm.spec.in (revision 7018) +++ librdmacm.spec.in (working copy) @@ -66,3 +66,4 @@ rm -rf $RPM_BUILD_ROOT %defattr(-,root,root) %{_bindir}/rping %{_bindir}/ucmatose +%{_bindir}/udaddy Index: Makefile.am =================================================================== --- Makefile.am (revision 7018) +++ Makefile.am (working copy) @@ -18,19 +18,23 @@ endif src_librdmacm_la_SOURCES = src/cma.c src_librdmacm_la_LDFLAGS = -avoid-version $(rdmacm_version_script) -bin_PROGRAMS = examples/ucmatose examples/rping +bin_PROGRAMS = examples/ucmatose examples/rping examples/udaddy examples_ucmatose_SOURCES = examples/cmatose.c examples_ucmatose_LDADD = $(top_builddir)/src/librdmacm.la examples_rping_SOURCES = examples/rping.c examples_rping_LDADD = $(top_builddir)/src/librdmacm.la +examples_udaddy_SOURCES = examples/udaddy.c +examples_udaddy_LDADD = $(top_builddir)/src/librdmacm.la librdmacmincludedir = $(includedir)/rdma 
librdmacminclude_HEADERS = include/rdma/rdma_cma_abi.h \ + include/rdma/rdma_cma_ib.h \ include/rdma/rdma_cma.h EXTRA_DIST = include/rdma/rdma_cma_abi.h \ include/rdma/rdma_cma.h \ + include/rdma/rdma_cma_ib.h \ src/librdmacm.map \ librdmacm.spec.in Index: examples/rping.c =================================================================== --- examples/rping.c (revision 7019) +++ examples/rping.c (working copy) @@ -1028,7 +1028,7 @@ int main(int argc, char *argv[]) goto out; } - ret = rdma_create_id(cb->cm_channel, &cb->cm_id, cb); + ret = rdma_create_id(cb->cm_channel, &cb->cm_id, cb, RDMA_PS_TCP); if (ret) { ret = errno; fprintf(stderr, "rdma_create_id error %d\n", ret); Index: examples/udaddy.c =================================================================== --- examples/udaddy.c (revision 0) +++ examples/udaddy.c (revision 0) @@ -0,0 +1,636 @@ +/* + * Copyright (c) 2005 Intel Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id$ + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +/* + * To execute: + * Server: rdma_cmatose + * Client: rdma_cmatose "dst_ip=ip" + */ + +struct cmatest_node { + int id; + struct rdma_cm_id *cma_id; + int connected; + struct ibv_pd *pd; + struct ibv_cq *cq; + struct ibv_mr *mr; + struct ibv_ah *ah; + uint32_t remote_qpn; + uint32_t remote_qkey; + void *mem; +}; + +struct cmatest { + struct rdma_event_channel *channel; + struct cmatest_node *nodes; + int conn_index; + int connects_left; + + struct sockaddr_in dst_in; + struct sockaddr *dst_addr; + struct sockaddr_in src_in; + struct sockaddr *src_addr; +}; + +static struct cmatest test; +static int connections = 1; +static int message_size = 100; +static int message_count = 10; +static int is_server; + +static int create_message(struct cmatest_node *node) +{ + if (!message_size) + message_count = 0; + + if (!message_count) + return 0; + + node->mem = malloc(message_size + sizeof(struct ibv_grh)); + if (!node->mem) { + printf("failed message allocation\n"); + return -1; + } + node->mr = ibv_reg_mr(node->pd, node->mem, + message_size + sizeof(struct ibv_grh), + IBV_ACCESS_LOCAL_WRITE); + if (!node->mr) { + printf("failed to reg MR\n"); + goto err; + } + return 0; +err: + free(node->mem); + return -1; +} + +static int init_node(struct cmatest_node *node) +{ + struct ibv_qp_init_attr init_qp_attr; + int cqe, ret; + + node->pd = ibv_alloc_pd(node->cma_id->verbs); + if (!node->pd) { + ret = -ENOMEM; + printf("cmatose: unable to allocate PD\n"); + goto out; + } + + cqe = message_count ? 
message_count * 2 : 2; + node->cq = ibv_create_cq(node->cma_id->verbs, cqe, node, 0, 0); + if (!node->cq) { + ret = -ENOMEM; + printf("cmatose: unable to create CQ\n"); + goto out; + } + + memset(&init_qp_attr, 0, sizeof init_qp_attr); + init_qp_attr.cap.max_send_wr = message_count ? message_count : 1; + init_qp_attr.cap.max_recv_wr = message_count ? message_count : 1; + init_qp_attr.cap.max_send_sge = 1; + init_qp_attr.cap.max_recv_sge = 1; + init_qp_attr.qp_context = node; + init_qp_attr.sq_sig_all = 0; + init_qp_attr.qp_type = IBV_QPT_UD; + init_qp_attr.send_cq = node->cq; + init_qp_attr.recv_cq = node->cq; + ret = rdma_create_qp(node->cma_id, node->pd, &init_qp_attr); + if (ret) { + printf("cmatose: unable to create QP: %d\n", ret); + goto out; + } + + ret = create_message(node); + if (ret) { + printf("cmatose: failed to create messages: %d\n", ret); + goto out; + } +out: + return ret; +} + +static int post_recvs(struct cmatest_node *node) +{ + struct ibv_recv_wr recv_wr, *recv_failure; + struct ibv_sge sge; + int i, ret = 0; + + if (!message_count) + return 0; + + recv_wr.next = NULL; + recv_wr.sg_list = &sge; + recv_wr.num_sge = 1; + recv_wr.wr_id = (uintptr_t) node; + + sge.length = message_size + sizeof(struct ibv_grh); + sge.lkey = node->mr->lkey; + sge.addr = (uintptr_t) node->mem; + + for (i = 0; i < message_count && !ret; i++ ) { + ret = ibv_post_recv(node->cma_id->qp, &recv_wr, &recv_failure); + if (ret) { + printf("failed to post receives: %d\n", ret); + break; + } + } + return ret; +} + +static int post_sends(struct cmatest_node *node, int signal_flag) +{ + struct ibv_send_wr send_wr, *bad_send_wr; + struct ibv_sge sge; + int i, ret = 0; + + if (!node->connected || !message_count) + return 0; + + send_wr.next = NULL; + send_wr.sg_list = &sge; + send_wr.num_sge = 1; + send_wr.opcode = IBV_WR_SEND_WITH_IMM; + send_wr.send_flags = IBV_SEND_INLINE | signal_flag; + send_wr.wr_id = (unsigned long)node; + send_wr.imm_data = htonl(node->cma_id->qp->qp_num); 
+ + send_wr.wr.ud.ah = node->ah; + send_wr.wr.ud.remote_qpn = node->remote_qpn; + send_wr.wr.ud.remote_qkey = node->remote_qkey; + + sge.length = message_size - sizeof(struct ibv_grh); + sge.lkey = node->mr->lkey; + sge.addr = (uintptr_t) node->mem; + + for (i = 0; i < message_count && !ret; i++) { + ret = ibv_post_send(node->cma_id->qp, &send_wr, &bad_send_wr); + if (ret) + printf("failed to post sends: %d\n", ret); + } + return ret; +} + +static void connect_error(void) +{ + test.connects_left--; +} + +static int addr_handler(struct cmatest_node *node) +{ + int ret; + + ret = rdma_resolve_route(node->cma_id, 2000); + if (ret) { + printf("cmatose: resolve route failed: %d\n", ret); + connect_error(); + } + return ret; +} + +static int route_handler(struct cmatest_node *node) +{ + struct rdma_conn_param conn_param; + int ret; + + ret = init_node(node); + if (ret) + goto err; + + ret = post_recvs(node); + if (ret) + goto err; + + memset(&conn_param, 0, sizeof conn_param); + conn_param.qp_num = node->cma_id->qp->qp_num; + conn_param.qp_type = node->cma_id->qp->qp_type; + conn_param.retry_count = 5; + ret = rdma_connect(node->cma_id, &conn_param); + if (ret) { + printf("cmatose: failure connecting: %d\n", ret); + goto err; + } + return 0; +err: + connect_error(); + return ret; +} + +static int connect_handler(struct rdma_cm_id *cma_id) +{ + struct cmatest_node *node; + struct rdma_conn_param conn_param; + int ret; + + if (test.conn_index == connections) { + ret = -ENOMEM; + goto err1; + } + node = &test.nodes[test.conn_index++]; + + node->cma_id = cma_id; + cma_id->context = node; + + ret = init_node(node); + if (ret) + goto err2; + + ret = post_recvs(node); + if (ret) + goto err2; + + memset(&conn_param, 0, sizeof conn_param); + conn_param.qp_num = node->cma_id->qp->qp_num; + conn_param.qp_type = node->cma_id->qp->qp_type; + ret = rdma_accept(node->cma_id, &conn_param); + if (ret) { + printf("cmatose: failure accepting: %d\n", ret); + goto err2; + } + node->connected 
= 1; + test.connects_left--; + return 0; + +err2: + node->cma_id = NULL; + connect_error(); +err1: + printf("cmatose: failing connection request\n"); + rdma_reject(cma_id, NULL, 0); + return ret; +} + +static int resolved_handler(struct cmatest_node *node) +{ + struct ibv_ah_attr ah_attr; + int ret; + + ret = rdma_get_dst_attr(node->cma_id, &ah_attr, &node->remote_qpn, + &node->remote_qkey); + if (ret) { + printf("udaddy: failure getting destination attributes\n"); + goto err; + } + + node->ah = ibv_create_ah(node->pd, &ah_attr); + if (!node->ah) { + printf("udaddy: failure creating address handle\n"); + goto err; + } + + node->connected = 1; + test.connects_left--; + return 0; +err: + connect_error(); + return ret; +} + +static int cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) +{ + int ret = 0; + + switch (event->event) { + case RDMA_CM_EVENT_ADDR_RESOLVED: + ret = addr_handler(cma_id->context); + break; + case RDMA_CM_EVENT_ROUTE_RESOLVED: + ret = route_handler(cma_id->context); + break; + case RDMA_CM_EVENT_CONNECT_REQUEST: + ret = connect_handler(cma_id); + break; + case RDMA_CM_EVENT_ESTABLISHED: + ret = resolved_handler(cma_id->context); + break; + case RDMA_CM_EVENT_ADDR_ERROR: + case RDMA_CM_EVENT_ROUTE_ERROR: + case RDMA_CM_EVENT_CONNECT_ERROR: + case RDMA_CM_EVENT_UNREACHABLE: + case RDMA_CM_EVENT_REJECTED: + printf("cmatose: event: %d, error: %d\n", event->event, + event->status); + connect_error(); + ret = event->status; + break; + case RDMA_CM_EVENT_DEVICE_REMOVAL: + /* Cleanup will occur after test completes. 
*/ + break; + default: + break; + } + return ret; +} + +static void destroy_node(struct cmatest_node *node) +{ + if (!node->cma_id) + return; + + if (node->ah) + ibv_destroy_ah(node->ah); + + if (node->cma_id->qp) + rdma_destroy_qp(node->cma_id); + + if (node->cq) + ibv_destroy_cq(node->cq); + + if (node->mem) { + ibv_dereg_mr(node->mr); + free(node->mem); + } + + if (node->pd) + ibv_dealloc_pd(node->pd); + + /* Destroy the RDMA ID after all device resources */ + rdma_destroy_id(node->cma_id); +} + +static int alloc_nodes(void) +{ + int ret, i; + + test.nodes = malloc(sizeof *test.nodes * connections); + if (!test.nodes) { + printf("cmatose: unable to allocate memory for test nodes\n"); + return -ENOMEM; + } + memset(test.nodes, 0, sizeof *test.nodes * connections); + + for (i = 0; i < connections; i++) { + test.nodes[i].id = i; + if (!is_server) { + ret = rdma_create_id(test.channel, + &test.nodes[i].cma_id, + &test.nodes[i], RDMA_PS_UDP); + if (ret) + goto err; + } + } + return 0; +err: + while (--i >= 0) + rdma_destroy_id(test.nodes[i].cma_id); + free(test.nodes); + return ret; +} + +static void destroy_nodes(void) +{ + int i; + + for (i = 0; i < connections; i++) + destroy_node(&test.nodes[i]); + free(test.nodes); +} + +static void create_reply_ah(struct cmatest_node *node, struct ibv_wc *wc) +{ + node->ah = ibv_create_ah_from_wc(node->pd, wc, node->mem, + node->cma_id->port_num); + node->remote_qpn = ntohl(wc->imm_data); + node->remote_qkey = ntohs(rdma_get_dst_port(node->cma_id)); +} + +static int poll_cqs(void) +{ + struct ibv_wc wc[8]; + int done, i, ret; + + for (i = 0; i < connections; i++) { + if (!test.nodes[i].connected) + continue; + + for (done = 0; done < message_count; done += ret) { + ret = ibv_poll_cq(test.nodes[i].cq, 8, wc); + if (ret < 0) { + printf("cmatose: failed polling CQ: %d\n", ret); + return ret; + } + + if (ret && !test.nodes[i].ah) + create_reply_ah(&test.nodes[i], wc); + } + } + return 0; +} + +static int connect_events(void) +{ + 
struct rdma_cm_event *event; + int ret = 0; + + while (test.connects_left && !ret) { + ret = rdma_get_cm_event(test.channel, &event); + if (!ret) { + ret = cma_handler(event->id, event); + rdma_ack_cm_event(event); + } + } + return ret; +} + +static int run_server(void) +{ + struct rdma_cm_id *listen_id; + int i, ret; + + printf("cmatose: starting server\n"); + ret = rdma_create_id(test.channel, &listen_id, &test, RDMA_PS_UDP); + if (ret) { + printf("cmatose: listen request failed\n"); + return ret; + } + + test.src_in.sin_family = PF_INET; + test.src_in.sin_port = 7174; + ret = rdma_bind_addr(listen_id, test.src_addr); + if (ret) { + printf("cmatose: bind address failed: %d\n", ret); + return ret; + } + + ret = rdma_listen(listen_id, 0); + if (ret) { + printf("cmatose: failure trying to listen: %d\n", ret); + goto out; + } + + connect_events(); + + if (message_count) { + printf("receiving data transfers\n"); + ret = poll_cqs(); + if (ret) + goto out; + + printf("sending replies\n"); + for (i = 0; i < connections; i++) { + ret = post_sends(&test.nodes[i], IBV_SEND_SIGNALED); + if (ret) + goto out; + } + + ret = poll_cqs(); + if (ret) + goto out; + printf("data transfers complete\n"); + } +out: + rdma_destroy_id(listen_id); + return ret; +} + +static int get_addr(char *dst, struct sockaddr_in *addr) +{ + struct addrinfo *res; + int ret; + + ret = getaddrinfo(dst, NULL, NULL, &res); + if (ret) { + printf("getaddrinfo failed - invalid hostname or IP address\n"); + return ret; + } + + if (res->ai_family != PF_INET) { + ret = -1; + goto out; + } + + *addr = *(struct sockaddr_in *) res->ai_addr; +out: + freeaddrinfo(res); + return ret; +} + +static int run_client(char *dst, char *src) +{ + int i, ret; + + printf("cmatose: starting client\n"); + if (src) { + ret = get_addr(src, &test.src_in); + if (ret) + return ret; + } + + ret = get_addr(dst, &test.dst_in); + if (ret) + return ret; + + test.dst_in.sin_port = 7174; + + printf("cmatose: connecting\n"); + for (i = 0; i < 
connections; i++) { + ret = rdma_resolve_addr(test.nodes[i].cma_id, + src ? test.src_addr : NULL, + test.dst_addr, 2000); + if (ret) { + printf("cmatose: failure getting addr: %d\n", ret); + connect_error(); + return ret; + } + } + + ret = connect_events(); + if (ret) + goto out; + + if (message_count) { + printf("initiating data transfers\n"); + for (i = 0; i < connections; i++) { + ret = post_sends(&test.nodes[i], 0); + if (ret) + goto out; + } + printf("receiving data transfers\n"); + ret = poll_cqs(); + if (ret) + goto out; + + printf("data transfers complete\n"); + } +out: + return ret; +} + +int main(int argc, char **argv) +{ + int ret; + + if (argc > 3) { + printf("usage: %s [server_addr [src_addr]]\n", argv[0]); + exit(1); + } + is_server = (argc == 1); + + test.dst_addr = (struct sockaddr *) &test.dst_in; + test.src_addr = (struct sockaddr *) &test.src_in; + test.connects_left = connections; + + test.channel = rdma_create_event_channel(); + if (!test.channel) { + printf("failed to create event channel\n"); + exit(1); + } + + if (alloc_nodes()) + exit(1); + + if (is_server) + ret = run_server(); + else + ret = run_client(argv[1], (argc == 3) ? 
argv[2] : NULL); + + printf("test complete\n"); + destroy_nodes(); + rdma_destroy_event_channel(test.channel); + + printf("return status %d\n", ret); + return ret; +} Property changes on: examples/udaddy.c ___________________________________________________________________ Name: svn:executable + * Index: examples/cmatose.c =================================================================== --- examples/cmatose.c (revision 7298) +++ examples/cmatose.c (working copy) @@ -380,7 +380,7 @@ static int alloc_nodes(void) if (!is_server) { ret = rdma_create_id(test.channel, &test.nodes[i].cma_id, - &test.nodes[i]); + &test.nodes[i], RDMA_PS_TCP); if (ret) goto err; } @@ -466,7 +466,7 @@ static int run_server(void) int i, ret; printf("cmatose: starting server\n"); - ret = rdma_create_id(test.channel, &listen_id, &test); + ret = rdma_create_id(test.channel, &listen_id, &test, RDMA_PS_TCP); if (ret) { printf("cmatose: listen request failed\n"); return ret; From jlentini at netapp.com Thu May 18 14:38:40 2006 From: jlentini at netapp.com (James Lentini) Date: Thu, 18 May 2006 17:38:40 -0400 (EDT) Subject: [openib-general] RE: [PATCH2] uDAPL: fix uCMA provider event types and dapl_ep_create segv bug In-Reply-To: References: Message-ID: On Wed, 17 May 2006, Arlin Davis wrote: > >Fix for uCMA provider to return the correct event as a result of > >rejects. Also, ran into a segv bug with dapl_ep_create when > >creating without a conn_evd. Committed in revision 7346. 
From bugzilla-daemon at openib.org Thu May 18 15:59:15 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Thu, 18 May 2006 15:59:15 -0700 (PDT)
Subject: [openib-general] [Bug 90] New: IPoIB stops working after IB switch cards rebooted in loop
Message-ID: <20060518225915.90CE3228645@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=90

           Summary: IPoIB stops working after IB switch cards rebooted in loop
           Product: OpenFabrics Linux
           Version: 1.0rc4
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: IPoIB
        AssignedTo: bugzilla at openib.org
        ReportedBy: sweitzen at cisco.com

Cisco SQA has an automated stress test that reboots non-controller cards
in a Topspin chassis, then tries IPoIB traffic from the first IB host on
the network to all other IB hosts.

After looping this test ~50 times, OFED 1.0 rc4 i686 IPoIB stops working
on a test network with a Cisco SFS-7008 and 32 PCI-X hosts (each host has
3 IPoIB interfaces) on it.

This test runs fine with Cisco Linux IB host drivers (looped it 300 times
over the weekend).

I'll keep gathering more details.

I can recover IPoIB by shutting down the IPoIB interfaces and then
bringing them back up.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at openib.org Thu May 18 16:01:05 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Thu, 18 May 2006 16:01:05 -0700 (PDT)
Subject: [openib-general] [Bug 90] IPoIB stops working after IB switch cards rebooted in loop
Message-ID: <20060518230105.A82C0228653@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=90

sweitzen at cisco.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org      |rolandd at cisco.com

------- Additional Comments From sweitzen at cisco.com 2006-05-18 16:01 -------
Roland, since this is reproducible in our SQA lab, are you willing to
take a look at this one?

From mshefty at ichips.intel.com Thu May 18 16:24:16 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 18 May 2006 16:24:16 -0700
Subject: [openib-general] [PATCH 2/3] SA: add call to initialize ib_ah_attr from a path record
In-Reply-To:
References:
Message-ID: <446D0220.8000709@ichips.intel.com>

Sean Hefty wrote:
> +int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
> +			 struct ib_sa_path_rec *rec, struct ib_ah_attr *ah_attr)
> +{
> +	int ret;
> +	u16 gid_index;
> +
> +	memset(ah_attr, 0, sizeof *ah_attr);
> +	ah_attr->dlid = be16_to_cpu(rec->dlid);
> +	ah_attr->sl = rec->sl;
> +	ah_attr->src_path_bits = be16_to_cpu(rec->slid) & 0x7F;
> +	ah_attr->port_num = port_num;
> +
> +	if (rec->sgid.global.subnet_prefix != rec->dgid.global.subnet_prefix) {
> +		ah_attr->ah_flags = IB_AH_GRH;

I should note that I compared the subnet prefixes to determine if the GRH
should be used. Reading back over the 'GRH flag in ib_ah_attr' thread, it
looks like there's consensus that hop_limit > 1 is the check that we
want. I will update the code accordingly.
- Sean

From bos at pathscale.com Thu May 18 16:43:32 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 18 May 2006 16:43:32 -0700
Subject: [openib-general] Re: openmpi on ib over pathscale (2.6.17-rc4+53 patches)
In-Reply-To: <446C97E6.5030005@atipa.com>
References: <446C97E6.5030005@atipa.com>
Message-ID: <1147995812.3620.0.camel@chalcedony.pathscale.com>

On Thu, 2006-05-18 at 10:51 -0500, Roger Heflin wrote:

> I appear to be able to duplicate this, and I can collect any information
> that would
> help when the hang happens.

I have some more patches a-cooking, which may include a fix for this
problem. I'll keep you posted.

Hi,

I got "Unhandled CM event 6" (IB_CM_DREQ_ERROR) and "Unhandled CM event 7"
(IB_CM_DREQ_RECEIVED). So here is a patch that handles these CM events.

This is an initial patch. Maybe it will be more efficient to initiate a
reconnect in case we get IB_CM_DREQ_RECEIVED. What do you think?

Signed-off-by: Ishai Rabinovitz

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-05-19 00:05:30.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-05-19 01:35:37.000000000 +0300
@@ -1214,13 +1231,29 @@ static int srp_cm_handler(struct ib_cm_i
 		target->status = 0;
 		break;
 
+	case IB_CM_DREQ_ERROR:
+		printk(KERN_ERR PFX
+		       "IB_CM_DREQ_ERROR received - connection closed\n");
+		/* no need to set comp - there will be a TIMEWAIT_EXIT */
+		break;
+
+	case IB_CM_DREQ_RECEIVED:
+		printk(KERN_ERR PFX
+		       "IB_CM_DREQ_RECEIVED received - connection closed\n");
+		if (ib_send_cm_drep(target->cm_id, NULL, 0))
+			printk(KERN_ERR PFX "ib_send_cm_drep failed\n");
+		/* no need to set comp - there will be a TIMEWAIT_EXIT */
+		break;
+
 	default:
 		printk(KERN_WARNING PFX "Unhandled CM event %d\n", event->event);
 		break;
 	}
 
-	if (comp)
+	if (comp) {
+		printk(KERN_ERR PFX "srp_cm_handler: complete to %p\n", target);
 		complete(&target->done);
+	}
 
 	kfree(qp_attr);

-- 
Ishai Rabinovitz

From christopherx.b.kasten at intel.com Thu May 18 17:26:53 2006
From: christopherx.b.kasten at intel.com (Kasten, ChristopherX B)
Date: Thu, 18 May 2006 17:26:53 -0700
Subject: [openib-general] Howto setup rping and krping for OpenIB with AMSO1100
Message-ID:

Hello,

After spending some time installing OpenIB with AMSO1100, I have managed
to get rping and krping to work. Here is a howto document describing the
steps I took in the process.

This does not get opensm to work, as I am still getting a port guid error
on that front.

You are welcome to add this to any documentation area if you find it
useful. I have not written a detailed howto like this before, and I am
also new to this technology, so forgive me if there are any errors.

Cheers,
Chris Kasten

---------------------------------------------------------------------------
How to install OpenIB with an Ammasso1100 network card
Enables rping, krping
Based on the version from May 10, 2006.
---------------------------------------------------------------------------
Follow the Installation Cheat Sheet with a few variations detailed below
(substituting Ammasso for Mellanox):
https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet
___________
Step 3:
-----------
I used linux kernel 2.6.16.15

In make menuconfig include the following as modules-
Device Drivers -> Infiniband support -> Infiniband support
                                        Infiniband userspace MAD support
                                        Infiniband userspace access (verbs & cm)
                                        Ammasso 1100 HCA support
                                        Kernel RDMA Ping Module
______________
Steps 5/6:
--------------
Ignore ib_mthca on modprobe

______________________________
ICS Building management tools:
------------------------------
In src/userspace/management/osm/
rename authors -> AUTHORS
       news -> NEWS
       readme -> README
then run ./autogen.sh && ./configure

___________________________________
Building userspace verbs libraries:
-----------------------------------
Install sysfsutils before you build these libraries. I used sysfsutils-1.3.0
./configure
make
make install

Instead of libmthca, build libamso
For rping, build librdmacm

Use the standard:

./autogen.sh
./configure
make
make install

for these libraries.
_______________________________________________
Don't bother with the "Testing" step and beyond
-----------------------------------------------

-----------------------------------------------------------------------------
Download and install:
http://www.opengridcomputing.com/downloads/ogc_amso_kit_20060308.tgz.
-----------------------------------------------------------------------------
On one of my machines, the following was not added to
/etc/modprobe.conf.dist:

(as one line)
install ib_core for i in ib_core ib_mad ib_cm ib_sa ib_ucm ib_umad ib_uverbs; do /sbin/modprobe --ignore-install $i; done

Make sure this is in /etc/modprobe.conf.dist so the modules will load
automatically at boot

-----------------------------------------------------------------------------
Check to see that the following modules are loaded:

ib_uverbs - - <-uverbs  adds /dev/infiniband/uverbs0
rdma_cm
iw_cm
ib_addr
ib_cm
ib_local_sa
ib_sa
ib_core
iw_c2 - - - - <-amso

-----These two didn't load automatically for me------
rdma_krping - <-krping
rdma_ucm- - - <-rping   adds /dev/infiniband/rdma_cm

-----------------------------------------------------------------------------
Start up interfaces
-------------------
After a fresh reboot, check which modules are loaded. I had to add these
two.

modprobe rdma_krping
modprobe rdma_ucm

Machine 1:  ifconfig iw0 192.168.69.149 up
(server)    ifconfig eth0 192.168.68.149 up

Machine 2:  ifconfig iw1 192.168.69.148 up
(client)    ifconfig eth1 192.168.68.148 up
-----------------------------------------------------------------------------
Testing:
______
krping
------
On server: /bin/echo server,port=9999,addr=192.168.69.149,validate > /proc/krping
On client: /bin/echo client,port=9999,addr=192.168.69.149,validate > /proc/krping
           (addr = server iw addr)

To verify: cat /proc/krping

1 listen:              indicates the server is waiting
1-amso0 n1 n2 n3 ...:  indicates the connection has been made

If n = 0 for all n, there is probably an error

_____
rping
-----
Server: rping -s -vV -C10 -S10 -a 0.0.0.0 -p 9999
Client: rping -c -vV -C10 -S10 -a 192.168.69.149 -p 9999

This should output:
ping data : rdma-ping (10x because of -C10)

From rdreier at cisco.com Thu May 18 17:46:09 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 18 May 2006 17:46:09 -0700
Subject: [openib-general] Howto setup rping and krping for OpenIB with AMSO1100
In-Reply-To: (ChristopherX B. Kasten's message of "Thu, 18 May 2006 17:26:53 -0700")
References:
Message-ID:

    ChristopherX> Hello, After spending some time installing OpenIB
    ChristopherX> with AMSO1100, I have managed to get rping and
    ChristopherX> krping to work. Here is a howto document describing
    ChristopherX> the steps I took in the process. This does not get
    ChristopherX> opensm to work, as I am still getting a port guid
    ChristopherX> error on that front.

Thanks for the documentation. Perhaps you could add it to the
openib.org wiki?

It doesn't make sense to run OpenSM with the Ammasso device or with
any iWARP/ethernet device. Subnet managers are strictly an InfiniBand
concept.

Maybe OpenSM should look at the node_type entry in sysfs and print a
more informative message if someone tries to run it on an iWARP RNIC?

 - R.

From christopherx.b.kasten at intel.com Thu May 18 17:57:26 2006
From: christopherx.b.kasten at intel.com (Kasten, ChristopherX B)
Date: Thu, 18 May 2006 17:57:26 -0700
Subject: [openib-general] Howto setup rping and krping for OpenIB with AMSO1100
Message-ID:

Ok, thanks for the info. I thought the reason opensm didn't work with
Ammasso might have been along the lines of -it wasn't meant to-. Good
to know.

I'll go ahead and add it to the wiki then.

Chris

-----Original Message-----
From: Roland Dreier [mailto:rdreier at cisco.com]
Sent: Thursday, May 18, 2006 5:46 PM
To: Kasten, ChristopherX B
Cc: openib-general at openib.org
Subject: Re: [openib-general] Howto setup rping and krping for OpenIB with AMSO1100

    ChristopherX> Hello, After spending some time installing OpenIB
    ChristopherX> with AMSO1100, I have managed to get rping and
    ChristopherX> krping to work. Here is a howto document describing
    ChristopherX> the steps I took in the process. This does not get
    ChristopherX> opensm to work, as I am still getting a port guid
    ChristopherX> error on that front.

Thanks for the documentation. Perhaps you could add it to the
openib.org wiki?

It doesn't make sense to run OpenSM with the Ammasso device or with
any iWARP/ethernet device. Subnet managers are strictly an InfiniBand
concept.

Maybe OpenSM should look at the node_type entry in sysfs and print a
more informative message if someone tries to run it on an iWARP RNIC?

 - R.

From halr at voltaire.com Thu May 18 18:34:46 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 May 2006 21:34:46 -0400
Subject: [openib-general] Howto setup rping and krping for OpenIB with AMSO1100
In-Reply-To:
References:
Message-ID: <1148002473.18971.89193.camel@hal.voltaire.com>

Hi Chris,

On Thu, 2006-05-18 at 20:26, Kasten, ChristopherX B wrote:
> Hello,
>
> After spending some time installing OpenIB with AMSO1100, I have
> managed to get rping and krping to work. Here is a howto document
> describing the steps I took in the process.

This looks like OFED installation doc with some mods. Should this be
sent to the OFED list ?

> This does not get opensm to work, as I am still getting a port guid
> error on that front.

How do you invoke OpenSM ? Any options ?

-- Hal

From halr at voltaire.com Thu May 18 18:54:04 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 May 2006 21:54:04 -0400
Subject: [openib-general] Howto setup rping and krping for OpenIB with AMSO1100
In-Reply-To:
References:
Message-ID: <1148003636.18971.89591.camel@hal.voltaire.com>

On Thu, 2006-05-18 at 20:46, Roland Dreier wrote:
> ChristopherX> Hello, After spending some time installing OpenIB
> ChristopherX> with AMSO1100, I have managed to get rping and
> ChristopherX> krping to work. Here is a howto document describing
> ChristopherX> the steps I took in the process.
This does not get > ChristopherX> opensm to work, as I am still getting a port guid > ChristopherX> error on that front. > > Thanks for the documentation. Perhaps you could add it to the > openib.org wiki? > > It doesn't make sense to run OpenSM with the Ammasso device or with > any iWARP/ethernet device. Subnet managers are strictly an InfiniBand > concept. > > Maybe OpenSM should look at the node_type entry in sysfs and print a > more informative message if someone tries to run it on an iWARP RNIC? I don't think OpenSM ever tries to run on an iWARP RNIC. There are 2 modes of starting up OpenSM. One where it finds the "first" port (active port state, then physical state link up, and finally physical state not disabled) on an IB NIC (no -g specified) and the other where an explicit GUID is chosen by the user (admin). In the first case, any iWARP RNICs are bypassed. If there are no available IB NICs, then an error should be indicated by libibumad. In the second case, here's what I get when I attempt to start OpenSM explicitly on the AMSO1100: ibstat shows: ??? 'amso0' ??? type: AMSO1100 Number of ports: 1 Firmware version: 0.0.0 Hardware version: 0 Node GUID: 0x000db200066d0000 System image GUID: 0x000db200066d0000 Port 1: State: Active Physical state: No state change Rate: 2 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x009f0000 Port GUID: 0x73e01580ffffffff opensm -g=0x73e01580ffffffff Error from osm_opensm_bind (0x2A) The error messages in osm.log show: Jan 27 17:56:15 064682 [AB002D00] -> osm_vendor_bind: Binding to port 0x73e01580ffffffff.
Jan 27 17:56:15 065370 [AB002D00] -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1 Jan 27 17:56:15 065396 [AB002D00] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed Jan 27 17:56:15 065410 [AB002D00] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR) Jan 27 17:56:15 065447 [AB002D00] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind Guess I could add something explicitly here for this case to make it more obvious. -- Hal > - R. > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From jgunthorpe at obsidianresearch.com Thu May 18 20:09:37 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Thu, 18 May 2006 21:09:37 -0600 Subject: [openib-general] [PATCH 2/3] SA: add call to initialize ib_ah_attr from a path record In-Reply-To: <446D0220.8000709@ichips.intel.com> References: <446D0220.8000709@ichips.intel.com> Message-ID: <20060519030937.GB15832@obsidianresearch.com> On Thu, May 18, 2006 at 04:24:16PM -0700, Sean Hefty wrote: > I should note that I compared the subnet prefixes to determine if the GRH > should be used. Reading back over the 'GRH flag in ib_ah_attr' thread, it > looks like there's consensus that hop_limit > 1 is the check that we want. > I will update the code accordingly. I think it is also prudent to not use a GRH if the DGID's prefix is fe80::/64 (link local scope). Jason From services.de.cartes.desjardins at scd.desjardins.com Thu May 18 21:35:34 2006 From: services.de.cartes.desjardins at scd.desjardins.com (services.de.cartes.desjardins at scd.desjardins.com) Date: Fri, 19 May 2006 13:35:34 +0900 (JST) Subject: [openib-general] Information importante Message-ID: <20060519043534.BBAF2406249@mx2.ibcjapan.co.jp> An HTML attachment was scrubbed...
URL: From zhushisongzhu at yahoo.com Fri May 19 01:40:09 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Fri, 19 May 2006 01:40:09 -0700 (PDT) Subject: [openib-general] Re: OFED RC4 also can't support >2000 connections In-Reply-To: <446C8847.4050601@mellanox.co.il> Message-ID: <20060519084009.42683.qmail@web36910.mail.mud.yahoo.com> i'll test latest svn with FC5-x86_64. Will you fix the problem in OFED release or svn version? tks zhu --- Tziporet Koren wrote: > zhu shi song wrote: > > Do you have any time table for sdp? My project is > > urgent to use sdp. If it's too late, I'll think > > another method. Because SDP has reliable > connection > > semantics, it's very useful for such as web, ftp, > > https general applications. So infiband can > easily > > extend to new application area except cluster > > computing customized applications. But > unfortunately > > it can't work. I hope you can speed your > development > > process. I'll try my best to help you. I'm > studying > > the source code now. > > tks > > zhu > > > > > We expect it to be robust next month (June) > > Tziporet > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From mst at mellanox.co.il Fri May 19 03:23:14 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 19 May 2006 13:23:14 +0300 Subject: [openib-general] Re: OFED RC4 also can't support >2000 connections In-Reply-To: <20060519084009.42683.qmail@web36910.mail.mud.yahoo.com> References: <446C8847.4050601@mellanox.co.il> <20060519084009.42683.qmail@web36910.mail.mud.yahoo.com> Message-ID: <20060519102313.GA14962@mellanox.co.il> svn is always the latest code. Quoting r. zhu shi song : > Subject: Re: [openib-general] Re: OFED RC4 also can't support >2000 connections > > i'll test latest svn with FC5-x86_64. Will you fix > the problem in OFED release or svn version? 
> tks > zhu > > --- Tziporet Koren wrote: > > > zhu shi song wrote: > > > Do you have any time table for sdp? My project is > > > urgent to use sdp. If it's too late, I'll think > > > another method. Because SDP has reliable > > connection > > > semantics, it's very useful for such as web, ftp, > > > https general applications. So infiband can > > easily > > > extend to new application area except cluster > > > computing customized applications. But > > unfortunately > > > it can't work. I hope you can speed your > > development > > > process. I'll try my best to help you. I'm > > studying > > > the source code now. > > > tks > > > zhu > > > > > > > > We expect it to be robust next month (June) > > > > Tziporet > > > > > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > -- MST From halr at voltaire.com Fri May 19 03:47:39 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 May 2006 06:47:39 -0400 Subject: [openfabrics-ewg] Re: [openib-general] Howto setup rping and krping for OpenIB with AMSO1100 In-Reply-To: <1148002473.18971.89193.camel@hal.voltaire.com> References: <1148002473.18971.89193.camel@hal.voltaire.com> Message-ID: <1148035655.18971.100201.camel@hal.voltaire.com> On Thu, 2006-05-18 at 21:34, Hal Rosenstock wrote: > > After spending some time installing OpenIB with AMSO1100, I have > > managed to get rping and krping to work. Here is a howto document > > describing the steps I took in the process. > > This looks like OFED installation doc with some mods. Should this be > sent to the OFED list ? I don't think that iWARP is not part of OFED 1.0. > > This does not get opensm to work, as I am still getting a port guid > > error on that front. > > How do you invoke OpenSM ? Any options ? What error do you get ? What does osm.log say ? 
-- Hal From suri at baymicrosystems.com Fri May 19 07:10:33 2006 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Fri, 19 May 2006 10:10:33 -0400 Subject: [openib-general] [PATCH 2/3] SA: add call to initializeib_ah_attr from a path record In-Reply-To: <20060519030937.GB15832@obsidianresearch.com> Message-ID: <200605191410.k4JEAcQj012113@mail.baymicrosystems.com> Jason: This might work in this case. But, if you look at RFC4391(section 4.1 Broadcast GID params, item 4. other parameters) : "The broadcast-GID's scope bits need to be set based on whether the IPoIB link is confined within an IB subnet or the IPoIB link spans multiple IB subnets. A default of local-subnet scope (i.e., 0x2) is RECOMMENDED. A node might determine the scope bits to use by interactively searching for a broadcast- GID of ever greater scope by first starting with the local- scope. Or, an implementation might include the scope bits as a configuration parameter" How can a node determine the scope param in Roland's crazy setup(when the scope bits are not a config param)? Thanks, Suri > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Jason Gunthorpe > Sent: Thursday, May 18, 2006 11:10 PM > To: Sean Hefty > Cc: 'Roland Dreier'; openib-general at openib.org > Subject: Re: [openib-general] [PATCH 2/3] SA: add call to > initializeib_ah_attr from a path record > > On Thu, May 18, 2006 at 04:24:16PM -0700, Sean Hefty wrote: > > > I should note that I compared the subnet prefixes to determine if the > GRH > > should be used. Reading back over the 'GRH flag in ib_ah_attr' thread, > it > > looks like there's consensus that hop_limit > 1 is the check that we > want. > > I will update the code accordingly. > > I think it is also prudent to not use a GRH if the DGID's prefix is > fe80::/64 (link local scope). 
> > Jason > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib- > general From halr at voltaire.com Fri May 19 07:19:50 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 May 2006 10:19:50 -0400 Subject: [openib-general] [PATCH 2/3] SA: add call to initialize ib_ah_attr from a path record In-Reply-To: <20060519030937.GB15832@obsidianresearch.com> References: <446D0220.8000709@ichips.intel.com> <20060519030937.GB15832@obsidianresearch.com> Message-ID: <1148048385.18971.104330.camel@hal.voltaire.com> On Thu, 2006-05-18 at 23:09, Jason Gunthorpe wrote: > On Thu, May 18, 2006 at 04:24:16PM -0700, Sean Hefty wrote: > > > I should note that I compared the subnet prefixes to determine if the GRH > > should be used. Reading back over the 'GRH flag in ib_ah_attr' thread, it > > looks like there's consensus that hop_limit > 1 is the check that we want. > > I will update the code accordingly. > > I think it is also prudent to not use a GRH if the DGID's prefix is > fe80::/64 (link local scope). Agreed. This saves on the overhead in the local subnet case. However, wouldn't the HopLimit returned from the SA in this case be 0 or 1 so why would that check be needed ? If it is, I think the DGID prefix checking may be more than just this. 
-- Hal From halr at voltaire.com Fri May 19 08:44:07 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 May 2006 11:44:07 -0400 Subject: [openib-general] [PATCH] [TRIVIAL] IPoIB doc: Update for IPoIB RFCs being issued by IETF Message-ID: <1148053427.18971.105953.camel@hal.voltaire.com> IPoIB doc: Update for IPoIB RFCs being issued by IETF Now that the IPoIB I-Ds are turning into RFCs, update the IPoIB doc Signed-off-by: Hal Rosenstock Index: linux-kernel/docs/ipoib.txt =================================================================== --- linux-kernel/docs/ipoib.txt (revision 6738) +++ linux-kernel/docs/ipoib.txt (working copy) @@ -1,9 +1,9 @@ IP OVER INFINIBAND The ib_ipoib driver is an implementation of the IP over InfiniBand - protocol as specified by the latest Internet-Drafts issued by the - IETF ipoib working group. It is a "native" implementation in the - sense of setting the interface type to ARPHRD_INFINIBAND and the + protocol as specified by the latest Internet-Drafts and RFCs issued + by the IETF ipoib working group. It is a "native" implementation in + the sense of setting the interface type to ARPHRD_INFINIBAND and the hardware address length to 20 (earlier proprietary implementations masqueraded to the kernel as ethernet interfaces). From rdreier at cisco.com Fri May 19 08:58:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 19 May 2006 08:58:21 -0700 Subject: [openib-general] Re: [PATCH] [TRIVIAL] IPoIB doc: Update for IPoIB RFCs being issued by IETF In-Reply-To: <1148053427.18971.105953.camel@hal.voltaire.com> (Hal Rosenstock's message of "19 May 2006 11:44:07 -0400") References: <1148053427.18971.105953.camel@hal.voltaire.com> Message-ID: Probably would make sense to mention the exact RFC numbers if we're going to change this... 
From jgunthorpe at obsidianresearch.com Fri May 19 09:05:52 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Fri, 19 May 2006 10:05:52 -0600 Subject: [openib-general] [PATCH 2/3] SA: add call to initialize ib_ah_attr from a path record In-Reply-To: <1148048385.18971.104330.camel@hal.voltaire.com> References: <446D0220.8000709@ichips.intel.com> <20060519030937.GB15832@obsidianresearch.com> <1148048385.18971.104330.camel@hal.voltaire.com> Message-ID: <20060519160552.GD15832@obsidianresearch.com> On Fri, May 19, 2006 at 10:19:50AM -0400, Hal Rosenstock wrote: > > I think it is also prudent to not use a GRH if the DGID's prefix is > > fe80::/64 (link local scope). > > Agreed. This saves on the overhead in the local subnet case. > > However, wouldn't the HopLimit returned from the SA in this case be 0 or > 1 so why would that check be needed ? > > If it is, I think the DGID prefix checking may be more than just this. I think the SA should work like you are describing, but how sure are we that all existing ones do? I'm only suggesting it in case there are some broken SA's out there. Jason From christopherx.b.kasten at intel.com Fri May 19 09:07:44 2006 From: christopherx.b.kasten at intel.com (Kasten, ChristopherX B) Date: Fri, 19 May 2006 09:07:44 -0700 Subject: [openfabrics-ewg] Re: [openib-general] Howto setup rping andkrping for OpenIB with AMSO1100 Message-ID: When I run opensm, I get the error: Using default guid 0x0 Error: Could not get port guid The log file at /var/log/osm.log has the error: osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind It sounds like it wasn't meant to work on ethernet in the first place. Thanks for the clarification. 
Chris -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Friday, May 19, 2006 3:48 AM To: Kasten, ChristopherX B Cc: OpenFabricsEWG; openib-general at openib.org Subject: Re: [openfabrics-ewg] Re: [openib-general] Howto setup rping andkrping for OpenIB with AMSO1100 On Thu, 2006-05-18 at 21:34, Hal Rosenstock wrote: > > After spending some time installing OpenIB with AMSO1100, I have > > managed to get rping and krping to work. Here is a howto document > > describing the steps I took in the process. > > This looks like OFED installation doc with some mods. Should this be > sent to the OFED list ? I don't think that iWARP is not part of OFED 1.0. > > This does not get opensm to work, as I am still getting a port guid > > error on that front. > > How do you invoke OpenSM ? Any options ? What error do you get ? What does osm.log say ? -- Hal From halr at voltaire.com Fri May 19 09:15:29 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 May 2006 12:15:29 -0400 Subject: [openib-general] [PATCH 2/3] SA: add call to initialize ib_ah_attr from a path record In-Reply-To: <20060519160552.GD15832@obsidianresearch.com> References: <446D0220.8000709@ichips.intel.com> <20060519030937.GB15832@obsidianresearch.com> <1148048385.18971.104330.camel@hal.voltaire.com> <20060519160552.GD15832@obsidianresearch.com> Message-ID: <1148055182.18971.106526.camel@hal.voltaire.com> On Fri, 2006-05-19 at 12:05, Jason Gunthorpe wrote: > On Fri, May 19, 2006 at 10:19:50AM -0400, Hal Rosenstock wrote: > > > > I think it is also prudent to not use a GRH if the DGID's prefix is > > > fe80::/64 (link local scope). > > > > Agreed. This saves on the overhead in the local subnet case. > > > > However, wouldn't the HopLimit returned from the SA in this case be 0 or > > 1 so why would that check be needed ? > > > > If it is, I think the DGID prefix checking may be more than just this. 
> > I think the SA should work like you are describing, but how sure are > we that all existing ones do? I'm only suggesting it in case there are > some broken SA's out there. Are you aware of any currently that don't ? In this case, IMO the penalty of including the GRH should be paid (until those SMs are fixed) rather than adding any additional prefix checking into the end node. -- Hal From halr at voltaire.com Fri May 19 09:18:49 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 May 2006 12:18:49 -0400 Subject: [openib-general] Re: [PATCH] [TRIVIAL] IPoIB doc: Update for IPoIB RFCs being issued by IETF In-Reply-To: References: <1148053427.18971.105953.camel@hal.voltaire.com> Message-ID: <1148055297.18971.106572.camel@hal.voltaire.com> On Fri, 2006-05-19 at 11:58, Roland Dreier wrote: > Probably would make sense to mention the exact RFC numbers if we're > going to change this... IPoIB architecture is RFC 4392 Transmission of IP over IB is RFC 4391 DHCP over IB is RFC 4390 IPoIB-CM (which is not currently supported) is still an I-D Do you want to update this or should I send an updated patch ? -- Hal From amit_byron at yahoo.com Fri May 19 09:37:10 2006 From: amit_byron at yahoo.com (amit byron) Date: Fri, 19 May 2006 09:37:10 -0700 (PDT) Subject: [openib-general] infiniband Message-ID: <20060519163710.96839.qmail@web38507.mail.mud.yahoo.com> hi, i've couple of questions: o where can i find sample code to do rdma write? how to setup rdma write & read between two infiniband nodes? o where can i find documentation on infiniband api? thanks, Amit. __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mshefty at ichips.intel.com Fri May 19 09:44:59 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 19 May 2006 09:44:59 -0700 Subject: [openib-general] infiniband In-Reply-To: <20060519163710.96839.qmail@web38507.mail.mud.yahoo.com> References: <20060519163710.96839.qmail@web38507.mail.mud.yahoo.com> Message-ID: <446DF60B.4040600@ichips.intel.com> amit byron wrote: > o where can i find sample code to do rdma write? how to > setup rdma write & read between two infiniband nodes? There are some kernel tests in svn/gen2/utils/src/linux-kernel/infiniband/util. See krping for RDMA read/write. There are usermode tests in svn/gen2/trunk/src/userspace/librdmacm/examples (see rping). > o where can i find documentation on infiniband api? Verbs are documented in the IB Architecture spec. There's additional documentation surrounding most calls in the header / source files. - Sean From eeb at bartonsoftware.com Fri May 19 10:10:43 2006 From: eeb at bartonsoftware.com (Eric Barton) Date: Fri, 19 May 2006 18:10:43 +0100 Subject: [openib-general] multiple RDMA_CM_EVENT_DISCONNECTED callbacks Message-ID: <200605191710.k4JHAhOe005688@robert.bartonsoftware.com> Hi, I'm using the rdma_cm API. I've seen it call my CM callback with RDMA_CM_EVENT_DISCONNECTED twice. Is this a bug? I've implemented a connection teardown procedure that checks whether this is the first time it has been called on a given connection. If not, it's a NOOP. Otherwise it schedules the connection for teardown by a thread. This thread calls rdma_disconnect() and then explicitly moves the QP state to error (but maybe that's redundant?). I call this teardown procedure any time sends or receives complete with error, or when I get the RDMA_CM_EVENT_DISCONNECTED callback. I refcount my connections, so after I've called the teardown function, I'm basically just waiting for the refs to drain to zero, including the CM ref which is released when I see RDMA_CM_EVENT_DISCONNECTED.
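[Editorial sketch, not part of the original message.] The "first call wins, later calls are a NOOP" teardown described above can be illustrated in plain C11. This is a minimal userspace sketch under stated assumptions, not the poster's code and not the rdma_cm implementation: struct conn, conn_init(), conn_start_teardown() and conn_put() are hypothetical names, and a counter stands in for scheduling the teardown thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection object; all names here are illustrative only. */
enum conn_state { CONN_CONNECTED, CONN_CLOSING };

struct conn {
    _Atomic int state;     /* holds an enum conn_state value */
    atomic_int  refs;      /* outstanding references, including the CM's */
    int         teardowns; /* how many times teardown work was scheduled */
};

static void conn_init(struct conn *c)
{
    atomic_init(&c->state, CONN_CONNECTED);
    atomic_init(&c->refs, 1);
    c->teardowns = 0;
}

/*
 * Returns true only for the caller that wins the CONNECTED -> CLOSING
 * transition. Every later caller (a second DISCONNECTED callback, a
 * failed send or receive completion) gets false and does nothing.
 */
static bool conn_start_teardown(struct conn *c)
{
    int expected = CONN_CONNECTED;

    if (!atomic_compare_exchange_strong(&c->state, &expected, CONN_CLOSING))
        return false;   /* already closing: no-op */

    c->teardowns++;     /* stand-in for queueing work to a teardown thread */
    return true;
}

/* Drop one reference; returns true when the last reference is gone. */
static bool conn_put(struct conn *c)
{
    return atomic_fetch_sub(&c->refs, 1) == 1;
}
```

With a guard like this, a duplicated disconnect event costs one failed compare-and-exchange rather than a second teardown, but the reference taken on behalf of the CM must still be dropped exactly once.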
I had a typo in my code that meant that I was sending with opcode IB_WC_SEND, rather than IB_WR_SEND, causing a remote access error (IB_WC_SEND == 0 == IB_WR_RDMA_WRITE). Posted receives on the remote QP all completed with error (I guess openib moved the QP state to error) and the send completed locally with error. This meant that both sides were racing to call rdma_disconnect(), and that's when I got 2 CM callbacks with RDMA_CM_EVENT_DISCONNECTED for the same connection. All this was running on a 2.6.9 EL based kernel (LLNL) and openib subversion version #6829. -- Cheers, Eric --------------------------------------------------- |Eric Barton Barton Software | |9 York Gardens Tel: +44 (117) 330 1575 | |Clifton Mobile: +44 (7909) 680 356 | |Bristol BS8 4LL Fax: call first | |United Kingdom E-Mail: eeb at bartonsoftware.com| --------------------------------------------------- From jgunthorpe at obsidianresearch.com Fri May 19 10:20:05 2006 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Fri, 19 May 2006 11:20:05 -0600 Subject: [openib-general] [PATCH 2/3] SA: add call to initialize ib_ah_attr from a path record In-Reply-To: <1148055182.18971.106526.camel@hal.voltaire.com> References: <446D0220.8000709@ichips.intel.com> <20060519030937.GB15832@obsidianresearch.com> <1148048385.18971.104330.camel@hal.voltaire.com> <20060519160552.GD15832@obsidianresearch.com> <1148055182.18971.106526.camel@hal.voltaire.com> Message-ID: <20060519172005.GA23662@obsidianresearch.com> On Fri, May 19, 2006 at 12:15:29PM -0400, Hal Rosenstock wrote: > Are you aware of any currently that don't ? Nope > In this case, IMO the penalty of including the GRH should be paid (until > those SMs are fixed) rather than adding any additional prefix checking > into the end node. I'm fine with that. It could always be added in later if it is a problem.
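[Editorial sketch, not part of the original message.] The two GRH checks discussed in this thread can be compared side by side. This is a self-contained sketch under stated assumptions, not the code from the patch: struct path_info and the function names are hypothetical, and only the hop_limit > 1 test and the fe80::/64 prefix come from the discussion.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of the path-record fields under discussion. */
struct path_info {
    uint8_t dgid[16];   /* destination GID, network byte order */
    uint8_t hop_limit;  /* HopLimit as returned by the SA */
};

/* True if the GID's 64-bit prefix is fe80::/64 (link-local scope). */
static bool gid_is_link_local(const uint8_t gid[16])
{
    static const uint8_t prefix[8] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0 };

    return memcmp(gid, prefix, sizeof(prefix)) == 0;
}

/* The check the thread settles on: use a GRH only when hop_limit > 1. */
static bool path_needs_grh(const struct path_info *p)
{
    return p->hop_limit > 1;
}

/*
 * Stricter variant that also skips the GRH for a link-local DGID, as a
 * guard against an SA returning a large HopLimit for a local path. The
 * thread leaves this out for now and notes it could be added later.
 */
static bool path_needs_grh_strict(const struct path_info *p)
{
    return p->hop_limit > 1 && !gid_is_link_local(p->dgid);
}
```

The two functions differ only for a path whose DGID is link-local but whose HopLimit is greater than 1, which is the misbehaving-SA case raised above.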
Jason From mshefty at ichips.intel.com Fri May 19 10:34:23 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 19 May 2006 10:34:23 -0700 Subject: [openib-general] multiple RDMA_CM_EVENT_DISCONNECTED callbacks In-Reply-To: <200605191710.k4JHAhOe005688@robert.bartonsoftware.com> References: <200605191710.k4JHAhOe005688@robert.bartonsoftware.com> Message-ID: <446E019F.1030405@ichips.intel.com> Eric Barton wrote: > I'm using the rdma_cm API. I've seen it call my CM callback with > RDMA_CM_EVENT_DISCONNECTED twice. Is this a bug? I would consider this a bug. The problem is that the underlying IB CM is reporting two events: DREQ_RECEIVED, followed by DREP_RECEIVED. The RDMA CM reports both of these as DISCONNECTED events. It would make more sense to me if the RDMA CM only reported the first event, and discarded the second. > I've implemented a connection teardown procedure that checks whether this is the > first time it has been called on a given connection. If not, it's a NOOP. > Otherwise it schedules the connection for teardown by a thread. This thread > calls rdma_disconnect() and then explicitly moves the QP state to error (but > maybe that's redundant?). rdma_disconnect will transition the QP into the error state. I recently updated the documentation to clarify this. Thanks for the report. - Sean From xma at us.ibm.com Fri May 19 11:10:28 2006 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 19 May 2006 11:10:28 -0700 Subject: [openib-general] ipoib_reap_ah question In-Reply-To: Message-ID: Hello Roland, Is there any particular reason to use the ipoib_reap_ah thread? In my tx_ring removal patch, I tested without the ipoib_reap_ah work queue by simply adding kref_get(), kref_put() in ipoib_send(), and I didn't see any difference, including in performance. If there is no other risk, I will remove it to make it simple.
Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From amit_byron at yahoo.com Fri May 19 11:22:14 2006 From: amit_byron at yahoo.com (amit byron) Date: Fri, 19 May 2006 11:22:14 -0700 (PDT) Subject: [openib-general] infiniband In-Reply-To: <446DF60B.4040600@ichips.intel.com> Message-ID: <20060519182214.11605.qmail@web38514.mail.mud.yahoo.com> hi Sean, the krping test module make use of rdma_cm* apis. will the module work with ib_cm* api? should i using rdma_cm module or ib_cm module, which is the standard? thanks, Amit. Sean Hefty wrote: amit byron wrote: > o where can i find sample code to do rdma write? how to > setup rdma write & read between two infiniband nodes? There are some kernel tests in svn/gen2/utils/src/linux-kernel/infiniband/util. See krping for RDMA read/write. There are usermode tests in svn/gen2/trunk/src/userspace/librdmacm/examples (see rping). > o where can i find documentation on infiniband api? Verbs are documented in the IB Architecture spec. There's additional documentation surrounding most calls in the header / source files. - Sean __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.hefty at intel.com Fri May 19 11:31:04 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 19 May 2006 11:31:04 -0700 Subject: [openib-general] infiniband In-Reply-To: <20060519182214.11605.qmail@web38514.mail.mud.yahoo.com> Message-ID: the krping test module make use of rdma_cm* apis. will the module work with ib_cm* api? You would have to adapt the module to use the ib_cm APIs. The posting of the work requests is the same, however. should i using rdma_cm module or ib_cm module, which is the standard? 
Both are supported. The rdma_cm will allow you to connect using IP addresses, and operates at a higher level. For most applications, it is probably sufficient, and will be easier to work with. With the ib_cm API, you will need to perform your own SA queries, and handle device hotplug yourself, but you avoid any connection abstraction. - Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Fri May 19 11:59:31 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 19 May 2006 14:59:31 -0400 Subject: [openfabrics-ewg] Re: [openib-general] Howto setup rping andkrping for OpenIB with AMSO1100 In-Reply-To: References: Message-ID: <1148065167.18971.109851.camel@hal.voltaire.com> On Fri, 2006-05-19 at 12:07, Kasten, ChristopherX B wrote: > When I run opensm, Do you just start it as: opensm and there is just the Ammasso card in your machine (and no IB HCAs) ? > I get the error: > > Using default guid 0x0 > Error: Could not get port guid > > The log file at /var/log/osm.log has the error: > > osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind > > It sounds like it wasn't meant to work on ethernet in the first place. It's not. Just wanted to see if the errors you were getting were what I expected. Thanks. -- Hal > Thanks for the clarification. > > Chris > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Friday, May 19, 2006 3:48 AM > To: Kasten, ChristopherX B > Cc: OpenFabricsEWG; openib-general at openib.org > Subject: Re: [openfabrics-ewg] Re: [openib-general] Howto setup rping > andkrping for OpenIB with AMSO1100 > > On Thu, 2006-05-18 at 21:34, Hal Rosenstock wrote: > > > After spending some time installing OpenIB with AMSO1100, I have > > > managed to get rping and krping to work. Here is a howto document > > > describing the steps I took in the process. > > > > This looks like OFED installation doc with some mods. Should this be > > sent to the OFED list ? 
> > I don't think that iWARP is not part of OFED 1.0. > > > > This does not get opensm to work, as I am still getting a port > guid > > > error on that front. > > > > How do you invoke OpenSM ? Any options ? > > What error do you get ? What does osm.log say ? > > -- Hal From sean.hefty at intel.com Fri May 19 13:07:07 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 19 May 2006 13:07:07 -0700 Subject: [openib-general] [PATCH] multiple RDMA_CM_EVENT_DISCONNECTED callbacks In-Reply-To: <200605191710.k4JHAhOe005688@robert.bartonsoftware.com> Message-ID: Eric Can you try this patch and let me know if it fixes your problem? - Sean --- Prevent generating duplicated DISCONNECT events. Signed-off-by: Sean Hefty --- Index: cma.c =================================================================== --- cma.c (revision 7362) +++ cma.c (working copy) @@ -83,6 +83,7 @@ enum cma_state { CMA_ROUTE_QUERY, CMA_ROUTE_RESOLVED, CMA_CONNECT, + CMA_DISCONNECT, CMA_ADDR_BOUND, CMA_LISTEN, CMA_DEVICE_REMOVAL, @@ -801,6 +802,8 @@ static int cma_ib_handler(struct ib_cm_i status = -ETIMEDOUT; /* fall through */ case IB_CM_DREQ_RECEIVED: case IB_CM_DREP_RECEIVED: + if (!cma_comp_exch(id_priv, CMA_CONNECT, CMA_DISCONNECT)) + goto out; event = RDMA_CM_EVENT_DISCONNECTED; break; case IB_CM_TIMEWAIT_EXIT: @@ -1770,7 +1773,8 @@ int rdma_disconnect(struct rdma_cm_id *i int ret; id_priv = container_of(id, struct rdma_id_private, id); - if (!cma_comp(id_priv, CMA_CONNECT)) + if (!cma_comp(id_priv, CMA_CONNECT) && + !cma_comp(id_priv, CMA_DISCONNECT)) return -EINVAL; ret = cma_modify_qp_err(id); From services.de.cartes.desjardins at scd.desjardins.com Fri May 19 13:26:29 2006 From: services.de.cartes.desjardins at scd.desjardins.com (services.de.cartes.desjardins at scd.desjardins.com) Date: Fri, 19 May 2006 16:26:29 -0400 Subject: [openib-general] Information importante Message-ID: An HTML attachment was scrubbed... 
URL: 

From panda at cse.ohio-state.edu Sat May 20 22:57:00 2006
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Sun, 21 May 2006 01:57:00 -0400 (EDT)
Subject: [openib-general] Announcing the Release of MVAPICH2 0.9.3 with multi-threading support and anonymous SVN access
Message-ID: <200605210557.k4L5v0sO009768@xi.cse.ohio-state.edu>

The MVAPICH team is pleased to announce the availability of MVAPICH2
0.9.3 with the following new features:

 - Multi-threading support: This support is available for Gen2, VAPI
   and uDAPL transport interfaces. In addition, multi-threading support
   for TCP/IP interface (provided by MPICH2 stack) is also available.

 - Integrated with MPICH2 1.0.3 stack

 - Advanced AVL tree-based Resource-aware registration cache

 - Tuning and Optimization of various collective algorithms for a wide
   range of system sizes

 - Processor affinity for intra-node shared memory communication

 - Auto-detection of InfiniBand adapters for Gen2

MVAPICH2 0.9.3 release supports Gen2, VAPI and uDAPL transport
interfaces. It also has support for the standard TCP/IP (provided by
MPICH2 stack). It is optimized for the following platforms, OS,
compilers and InfiniBand adapters:

 - Platforms: EM64T, Opteron, IA-32, PPC and Mac G5
 - Operating Systems: Linux, Solaris and Mac OSX
 - Compilers: gcc, intel, pathscale and pgi
 - InfiniBand Adapters:
     - Mellanox adapters with PCI-X and PCI-Express (SDR and DDR with
       mem-full and mem-free cards)
     - PathScale adapter (through OpenIB/Gen2 support)
     - IBM ehca adapter (through OpenIB/Gen2 support)

More details on all features and supported platforms can be obtained by
visiting the project's web page -> Overview -> features.

Starting with this 0.9.3 release, MVAPICH team is also pleased to
announce the availability of the MVAPICH2 code base through anonymous
SVN access. Nightly tarballs are also available. The mvapich-commit
mailing list can also be used by users, developers and vendors to keep
track of all commits happening to the SVN.
MVAPICH2 0.9.3 continues to deliver excellent performance. Sample
performance numbers include:

 - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR:
     Two-sided operations:
       - 3.28 microsec one-way latency (4 bytes)
       - 1475 MB/sec unidirectional bandwidth
       - 2661 MB/sec bidirectional bandwidth
     One-sided operations:
       - 4.99 microsec Put latency
       - 1476 MB/sec unidirectional Put bandwidth
       - 2661 MB/sec bidirectional Put bandwidth

 - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-SDR:
     Two-sided operations:
       - 3.71 microsec one-way latency (4 bytes)
       - 964 MB/sec unidirectional bandwidth
       - 1846 MB/sec bidirectional bandwidth
     One-sided operations:
       - 6.12 microsec Put latency
       - 964 MB/sec unidirectional Put bandwidth
       - 1846 MB/sec bidirectional Put bandwidth

 - OpenIB/Gen2 on Opteron with PCI-Ex and IBA-SDR:
     Two-sided operations:
       - 3.38 microsec one-way latency (4 bytes)
       - 971 MB/sec unidirectional bandwidth
       - 1867 MB/sec bidirectional bandwidth
     One-sided operations:
       - 5.98 microsec Put latency
       - 971 MB/sec unidirectional Put bandwidth
       - 1867 MB/sec bidirectional Put bandwidth

 - Solaris uDAPL/IBTL on Opteron with PCI-Ex and IBA-SDR:
     Two-sided operations:
       - 5.41 microsec one-way latency (4 bytes)
       - 981 MB/sec unidirectional bandwidth
       - 1903 MB/sec bidirectional bandwidth
     One-sided operations:
       - 7.42 microsec Put latency
       - 981 MB/sec unidirectional Put bandwidth
       - 1903 MB/sec bidirectional Put bandwidth

 - OpenIB/Gen2 uDAPL on Opteron with PCI-Ex and IBA-SDR:
     Two-sided operations:
       - 3.61 microsec one-way latency (4 bytes)
       - 971 MB/sec unidirectional bandwidth
       - 1894 MB/sec bidirectional bandwidth
     One-sided operations:
       - 6.10 microsec Put latency
       - 971 MB/sec unidirectional Put bandwidth
       - 1894 MB/sec bidirectional Put bandwidth

Performance numbers for all other platforms, system configurations and
operations can be viewed by visiting `Performance' section of the
project's web page.
Additional features of the MVAPICH2 0.9.3 release include:

 - Performance similar to MVAPICH: With the ADI-3-level design,
   MVAPICH2 0.9.3 delivers similar performance for two-sided operations
   compared to MVAPICH 0.9.7. Organizations and users who want the best
   performance for both two-sided and one-sided operations, and who
   also want to exploit the `multi-threading' capability, may migrate
   from the MVAPICH code base to the MVAPICH2 code base.

 - A set of benchmarks to evaluate both two-sided and one-sided
   operations (Put, Get, and Accumulate). A new micro-benchmark
   (Multi-threaded Latency Test) has been added.

 - An enhanced and detailed `User Guide' is now available (in both html
   and pdf forms) from the FAQ page.

To download the MVAPICH2 0.9.3 package and access the anonymous SVN,
please visit the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/

A stripped down version of this release is also available at the OpenIB
SVN.

All feedback, including bug reports and hints for performance tuning,
is welcome. Please post it to the mvapich-discuss mailing list.

Thanks,

MVAPICH Team at OSU/NBCL

======================================================================
The MVAPICH/MVAPICH2 project is currently supported with funding from
U.S. National Science Foundation, U.S. DOE Office of Science, Mellanox,
Intel, Cisco Systems, Sun Microsystems and Linux Networx; and with
equipment support from Advanced Clustering, AMD, Apple, Appro, Dell,
IBM, Intel, Mellanox, Microway, PathScale, SilverStorm and Sun
Microsystems. Other technology partners include Etnus.
====================================================================== From ogerlitz at voltaire.com Sat May 20 23:07:28 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 21 May 2006 09:07:28 +0300 Subject: [openib-general] [PATCH] CM: remove pkey from SIDR REQ - use pkey from path rec instead In-Reply-To: References: Message-ID: <447003A0.8060101@voltaire.com> Sean Hefty wrote: > The pkey is provided into a SIDR REQ in two places, once as a parameter, > and again in the path record. Remove the pkey as a parameter and always > use that given in the path record. > > This change has no practical effect on ABI functionality. > > Signed-off-by: Sean Hefty > --- > Index: include/rdma/ib_cm.h > =================================================================== > --- include/rdma/ib_cm.h (revision 6884) > +++ include/rdma/ib_cm.h (working copy) > @@ -546,7 +546,6 @@ struct ib_cm_sidr_req_param { > const void *private_data; > u8 private_data_len; > u8 max_cm_retries; > - u16 pkey; > }; while reviewing this i noticed that this documentation error: ib_send_cm_sidr_rep - Sends a service ID resolution ***request*** to the remote node. which you can fix when committing this patch. Or. From ogerlitz at voltaire.com Sat May 20 23:32:44 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 21 May 2006 09:32:44 +0300 Subject: [openib-general] Re: [PATCH 0/6] iSER (iSCSI Extensions for RDMA) initiator In-Reply-To: References: <15ddcffd0605101233x104265adp31c3fbd13f541f96@mail.gmail.com> <15ddcffd0605110033r5f250597sbb0265610c2a8028@mail.gmail.com> <446B1E84.9020505@voltaire.com> <446C0286.70708@voltaire.com> <446C0710.8090803@voltaire.com> <446C17B0.4010500@voltaire.com> Message-ID: <4470098C.8050501@voltaire.com> Roland Dreier wrote: > Or> Sure, better safe than sorry is good habit! its just this two > Or> weeks short time frame for three (iscsi && cma -> iser) > Or> serialized pushes which worries me a little, i guess there's > Or> nothing we can do about it. 
> > It's not really a short window at all -- I would be surprised if it > even took more than 3 days to merge everything. All the maintainers > with git trees generally send Linus a pull request on the first day of > the merge window. > > Anyway the only thing that iser is really serialized against is the > iscsi merge. I will send Linus a request to pull iser as soon as he > has pulled James's tree. If Linus has not pulled my for-2.6.18 tree > yet, then he'll just get a bigger merge including the CMA etc. OK, thanks for the clarification, it makes thing clearer now. Or. From tziporet at mellanox.co.il Sun May 21 01:27:36 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 21 May 2006 11:27:36 +0300 Subject: [openib-general] Re: OFED RC4 also can't support >2000 connections In-Reply-To: <20060519084009.42683.qmail@web36910.mail.mud.yahoo.com> References: <20060519084009.42683.qmail@web36910.mail.mud.yahoo.com> Message-ID: <44702478.3070000@mellanox.co.il> zhu shi song wrote: > i'll test latest svn with FC5-x86_64. Will you fix > the problem in OFED release or svn version? > tks > zhu > > Both Tziporet From leonida at voltaire.com Sun May 21 04:38:54 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Sun, 21 May 2006 14:38:54 +0300 Subject: [openib-general][RFC][PATCH] mthca: HCA initialization parameters In-Reply-To: References: <20060518144810.GA9756@voltaire.com> Message-ID: <4470514E.8020501@voltaire.com> Roland Dreier wrote: > Leonid> Hello, we need a capability to change the HCA parameters, > Leonid> in order to tune its resources. > > Leonid> There is a special structure 'mthca_profile' in the > Leonid> MTHA driver, used during the HCA initialization and > Leonid> determining different HCA initialization parameters, such > Leonid> as maximum number of QPs, CQs, address vectors etc. > Leonid> Unfortunately, the parameters can not be defined outside > Leonid> the driver. 
> > Leonid> Attached file implements a number of the module > Leonid> parameters allowing to define the 'mthca_profile' values. > > Thanks, I've held off on doing this because adding these module > parameters doesn't handle multiple different HCAs very gracefully. > But I'm not sure if I really have a better solution -- (ab)using > request_firmware() maybe? > Do you mean querying the firmware to determine the type of the HCA? > Does it make sense to tune all of these values? I'm not sure that every one of the parameters will be used but my feeling is that we want to let the user change the whole profile for completeness. > For example is anyone > really changing the size of the user access region context? And > certainly making num_uar tunable doesn't make any sense -- what do we > do if the user asks for more UARs than the PCI BAR can cover? And > what do we save if someone asks for fewer? > You are right, we should enforce some boundaries here. I guess that only power users will tweak the profile parameters. > The scheme of making the module parameter take effect only if its > non-zero seems really confusing to me. Someone is going to look in > sysfs and see that num_qp is 0 and get confused. > We have two ideas here: 1. We can set the access permissions to 0 and that way no sysfs entries will be created. 2. Update the module parameters with the real values after the initialization phase. What do you think ? > Also I think all of these values need to be powers of 2, so that > should probably be enforced somehow (either by making the parameters > be log-base-2 values, or using roundup_pow_of_two -- I'm not sure > which is better). > Sounds like the right thing to do, will update the patch. > - R. 
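
[Editor's sketch, not part of the original thread.] The exchange above
raises three checks on a user-supplied profile value: a zero meaning
"use the driver default", an upper boundary, and rounding to a power of
two. The following is a minimal user-space illustration of how those
could combine. It is not mthca code: round_up_pow2() is a hypothetical
stand-in for the kernel's roundup_pow_of_two(), sanitize_profile_value()
and its arguments are invented for this example, and max is assumed to
be a power of two itself.

```c
/*
 * Hypothetical stand-in for the kernel's roundup_pow_of_two().
 * The caller must keep n small enough that the shift cannot overflow;
 * sanitize_profile_value() below guarantees that by clamping first.
 */
static unsigned long round_up_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical helper, not taken from the mthca driver.
 *   requested == 0   -> keep the driver default
 *   requested >= max -> clamp to max (assumed to be a power of two)
 *   otherwise        -> round up to the next power of two
 */
static unsigned long sanitize_profile_value(unsigned long requested,
					    unsigned long dflt,
					    unsigned long max)
{
	if (requested == 0)
		return dflt;
	if (requested >= max)
		return max;
	return round_up_pow2(requested);
}
```

With a default of 65536 and a maximum of 1 << 20, a request of 3000
would become 4096, and a request of 5000000 would be clamped to
1048576. Taking the parameter as a log-base-2 value instead, as Roland
also suggests, would make the rounding step unnecessary.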
From sashak at voltaire.com Sun May 21 09:34:44 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 21 May 2006 19:34:44 +0300 Subject: [openib-general] [PATCH] opensm: configurable VLStallCount values Message-ID: <20060521163444.GL14503@sashak.voltaire.com> Hello Hal, This adds configurable vl_stall_count and leaf_vl_stall_count values for switch external ports. Also fixes existed hoq_lifetime processing. The features are: don't bother about not connected ports, hoq_lifetime setup is only for switch and router ports, vl_stall_count setup is only for switch ports, leaf_ values are used for switch ports connected to a CA. Signed-off-by: Sasha Khapyorsky --- osm/include/opensm/osm_base.h | 17 ++++++++++++- osm/include/opensm/osm_subnet.h | 10 ++++++++ osm/opensm/osm_link_mgr.c | 52 ++++++++++++++++++++++++--------------- osm/opensm/osm_subnet.c | 20 +++++++++++++++ 4 files changed, 78 insertions(+), 21 deletions(-) 4fe5d21c60e16bc77ee7d739a3c9170f5197ba8a diff --git a/osm/include/opensm/osm_base.h b/osm/include/opensm/osm_base.h index 4e29d2b..53e85d4 100644 --- a/osm/include/opensm/osm_base.h +++ b/osm/include/opensm/osm_base.h @@ -318,6 +318,20 @@ #define OSM_DEFAULT_HEAD_OF_QUEUE_LIFE 0 #define OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE 0xC /***********/ +/****d* OpenSM: Base/OSM_DEFAULT_VL_STALL_COUNT +* NAME +* OSM_DEFAULT_LEAF_VL_COUNT +* +* DESCRIPTION +* Sets the number of contiguous head of queue life time drops that +* puts the VL into stalled state. In stalled state the port supposed to +* drop everything for 8*(head of queue lifetime) +* +* SYNOPSIS +*/ +#define OSM_DEFAULT_VL_STALL_COUNT 0x7 +/***********/ + /****d* OpenSM: Base/OSM_DEFAULT_LEAF_VL_STALL_COUNT * NAME * OSM_DEFAULT_LEAF_VL_STALL_COUNT @@ -325,7 +339,8 @@ #define OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_L * DESCRIPTION * Sets the number of contiguous head of queue life time drops that * puts the VL into stalled state. 
In stalled state the port supposed to -* drop everything for 8*(head of queue lifetime) +* drop everything for 8*(head of queue lifetime). For switch ports +* driving a CAs. * We use here the value of 1 - so any drop due to HOQ means stalling the VL * * SYNOPSIS diff --git a/osm/include/opensm/osm_subnet.h b/osm/include/opensm/osm_subnet.h index 319c494..affa13c 100644 --- a/osm/include/opensm/osm_subnet.h +++ b/osm/include/opensm/osm_subnet.h @@ -249,6 +249,8 @@ typedef struct _osm_subn_opt boolean_t force_log_flush; uint8_t subnet_timeout; uint8_t packet_life_time; + uint8_t vl_stall_count; + uint8_t leaf_vl_stall_count; uint8_t head_of_queue_lifetime; uint8_t leaf_head_of_queue_lifetime; uint8_t local_phy_errors_threshold; @@ -345,6 +347,14 @@ typedef struct _osm_subn_opt * The subnet_timeout that will be set for all the ports in the * design SubnMgt.Set(PortInfo.vl_stall_life)) * +* vl_stall_count +* The number of sequential packets dropped that caused the port +* to enter the VLStalled state. +* +* leaf_vl_stall_count +* The number of sequential packets dropped that caused the port +* to enter the VLStalled state. 
For switch ports driving a CA +* * head_of_queue_lifetime * The maximal time a packet can live at the head of a VL queue * on any port not driving an HCA port diff --git a/osm/opensm/osm_link_mgr.c b/osm/opensm/osm_link_mgr.c index c495ec3..225da81 100644 --- a/osm/opensm/osm_link_mgr.c +++ b/osm/opensm/osm_link_mgr.c @@ -248,27 +248,39 @@ __osm_link_mgr_set_physp_pi( Several timeout mechanisms: */ p_remote_physp = osm_physp_get_remote( p_physp ); - - if (p_remote_physp && - osm_physp_is_valid(p_remote_physp) && - (osm_node_get_type( osm_physp_get_node_ptr(p_remote_physp) ) != - IB_NODE_TYPE_SWITCH)) - { - /* we drive an HCA port so we need to set stall-count to 1 and - use leaf hoq value */ - ib_port_info_set_hoq_lifetime( - p_pi, p_mgr->p_subn->opt.leaf_head_of_queue_lifetime); - ib_port_info_set_vl_stall_count( - p_pi, OSM_DEFAULT_LEAF_VL_STALL_COUNT); - } - else - { - ib_port_info_set_hoq_lifetime( - p_pi, p_mgr->p_subn->opt.head_of_queue_lifetime); + if (port_num != 0 && p_remote_physp && + osm_physp_is_valid(p_remote_physp)) { + if (osm_node_get_type(osm_physp_get_node_ptr(p_physp)) == + IB_NODE_TYPE_ROUTER) + { + ib_port_info_set_hoq_lifetime( + p_pi, p_mgr->p_subn->opt.head_of_queue_lifetime); + } + else if (osm_node_get_type(osm_physp_get_node_ptr(p_physp)) == + IB_NODE_TYPE_SWITCH) + { + if (osm_node_get_type(osm_physp_get_node_ptr(p_remote_physp)) == + IB_NODE_TYPE_CA) + { + ib_port_info_set_hoq_lifetime( + p_pi, p_mgr->p_subn->opt.leaf_head_of_queue_lifetime); + ib_port_info_set_vl_stall_count( + p_pi, p_mgr->p_subn->opt.leaf_vl_stall_count); + } + else + { + ib_port_info_set_hoq_lifetime( + p_pi, p_mgr->p_subn->opt.head_of_queue_lifetime); + ib_port_info_set_vl_stall_count( + p_pi, p_mgr->p_subn->opt.vl_stall_count); + } + } + if ( ib_port_info_get_hoq_lifetime(p_pi) != + ib_port_info_get_hoq_lifetime(p_old_pi) || + ib_port_info_get_vl_stall_count(p_pi) != + ib_port_info_get_vl_stall_count(p_old_pi) ) + send_set = TRUE; } - if ( 
ib_port_info_get_hoq_lifetime(p_pi) != - ib_port_info_get_hoq_lifetime(p_old_pi) ) - send_set = TRUE; ib_port_info_set_phy_and_overrun_err_thd( p_pi, diff --git a/osm/opensm/osm_subnet.c b/osm/opensm/osm_subnet.c index c251411..0cf0869 100644 --- a/osm/opensm/osm_subnet.c +++ b/osm/opensm/osm_subnet.c @@ -441,6 +441,8 @@ osm_subn_set_default_opt( p_opt->force_log_flush = FALSE; p_opt->subnet_timeout = OSM_DEFAULT_SUBNET_TIMEOUT; p_opt->packet_life_time = OSM_DEFAULT_SWITCH_PACKET_LIFE; + p_opt->vl_stall_count = OSM_DEFAULT_VL_STALL_COUNT; + p_opt->leaf_vl_stall_count = OSM_DEFAULT_LEAF_VL_STALL_COUNT; p_opt->head_of_queue_lifetime = OSM_DEFAULT_HEAD_OF_QUEUE_LIFE; p_opt->leaf_head_of_queue_lifetime = OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE; p_opt->local_phy_errors_threshold = OSM_DEFAULT_ERROR_THRESHOLD; @@ -844,6 +846,14 @@ osm_subn_parse_conf_file( p_key, p_val, &p_opts->packet_life_time); __osm_subn_opts_unpack_uint8( + "vl_stall_count", + p_key, p_val, &p_opts->vl_stall_count); + + __osm_subn_opts_unpack_uint8( + "leaf_vl_stall_count", + p_key, p_val, &p_opts->leaf_vl_stall_count); + + __osm_subn_opts_unpack_uint8( "head_of_queue_lifetime", p_key, p_val, &p_opts->head_of_queue_lifetime); @@ -980,6 +990,14 @@ osm_subn_write_conf_file( "# The actual time is 4.096usec * 2^\n" "# The value 0x14 disables this mechanism\n" "packet_life_time 0x%02x\n\n" + "# The number of sequential packets dropped that caused the port\n" + "# to enter the VLStalled state. The result of setting this value to\n" + "# zero is undefined.\n" + "vl_stall_count 0x%02x\n\n" + "# The number of sequential packets dropped that caused the port\n" + "# to enter the VLStalled state. For switch ports driving a CA. The\n" + "# result of setting this value to zero is undefined.\n" + "leaf_vl_stall_count 0x%02x\n\n" "# The code of maximal time a packet can wait at the head of\n" "# transmission queue. 
\n" "# The actual time is 4.096usec * 2^\n" @@ -1004,6 +1022,8 @@ osm_subn_write_conf_file( cl_ntoh64(p_opts->subnet_prefix), p_opts->lmc, p_opts->packet_life_time, + p_opts->vl_stall_count, + p_opts->leaf_vl_stall_count, p_opts->head_of_queue_lifetime, p_opts->leaf_head_of_queue_lifetime, p_opts->max_op_vls, -- 1.3.2 From sean.hefty at intel.com Sun May 21 11:37:26 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sun, 21 May 2006 11:37:26 -0700 Subject: [openib-general] [PATCH] CM: remove pkey from SIDR REQ - use pkey from path rec instead In-Reply-To: <447003A0.8060101@voltaire.com> Message-ID: >while reviewing this i noticed that this documentation error: > >ib_send_cm_sidr_rep - Sends a service ID resolution ***request*** to the >remote node. > >which you can fix when committing this patch. Thanks - I'll update the documentation as part of this patch. - Sean From sashak at voltaire.com Sun May 21 15:16:00 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 May 2006 01:16:00 +0300 Subject: [openib-general] [PATCH] opensm: remove osm_pkey_mgr.h Message-ID: <20060521221600.30041.98576.stgit@sashak.voltaire.com> Since we expect that osm_pkey_mgr_process() will be called only from osm_state_mgr_process() this patch replaces osm_pkey_mgr.h header file by local prototype. 
Signed-off-by: Sasha Khapyorsky --- osm/include/Makefile.am | 1 osm/include/opensm/osm_pkey_mgr.h | 92 ------------------------------------- osm/opensm/osm_pkey_mgr.c | 1 osm/opensm/osm_state_mgr.c | 3 + 4 files changed, 2 insertions(+), 95 deletions(-) diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am index b23b1de..2bee762 100644 --- a/osm/include/Makefile.am +++ b/osm/include/Makefile.am @@ -96,7 +96,6 @@ EXTRA_DIST = \ $(srcdir)/opensm/st.h \ $(srcdir)/opensm/osm_mcast_tbl.h \ $(srcdir)/opensm/osm_pkey.h \ - $(srcdir)/opensm/osm_pkey_mgr.h \ $(srcdir)/opensm/osm_sa_mad_ctrl.h \ $(srcdir)/opensm/osm_req_ctrl.h \ $(srcdir)/opensm/osm_sw_info_rcv.h \ diff --git a/osm/include/opensm/osm_pkey_mgr.h b/osm/include/opensm/osm_pkey_mgr.h deleted file mode 100644 index cb0075d..0000000 --- a/osm/include/opensm/osm_pkey_mgr.h +++ /dev/null @@ -1,92 +0,0 @@ -/* - * Copyright (c) 2006 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - -/* - * Abstract: - * Prototype for osm_pkey_mgr_process() function - * This is part of the OpenSM family of objects. - * - * Environment: - * Linux User Mode - * - * $Revision: 1.4 $ - */ - - -#ifndef _OSM_PKEY_MGR_H_ -#define _OSM_PKEY_MGR_H_ - -#include -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/****f* OpenSM: P_Key Manager/osm_pkey_mgr_process -* NAME -* osm_pkey_mgr_process -* -* DESCRIPTION -* This function enforces the pkey rules on the SM DB. -* -* SYNOPSIS -*/ -osm_signal_t -osm_pkey_mgr_process( - IN osm_opensm_t *p_osm ); -/* -* PARAMETERS -* p_osm -* [in] Pointer to an osm_opensm_t object. 
-* -* RETURN VALUES -* None -* -* NOTES -* -* SEE ALSO -*********/ - -END_C_DECLS - -#endif /* _OSM_PKEY_MGR_H_ */ diff --git a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c index e08b7cc..91c1a95 100644 --- a/osm/opensm/osm_pkey_mgr.c +++ b/osm/opensm/osm_pkey_mgr.c @@ -56,7 +56,6 @@ #include #include #include #include -#include #include #include diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index 42fd5e8..724b2b7 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -66,14 +66,15 @@ #include #include #include #include -#include #include #include #include /********************************************************************** + * Prototypes for manager processors used locally **********************************************************************/ osm_signal_t osm_qos_setup(IN osm_opensm_t * p_osm); +osm_signal_t osm_pkey_mgr_process(IN osm_opensm_t * p_osm); /********************************************************************** **********************************************************************/ From sashak at voltaire.com Sun May 21 16:02:12 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 May 2006 02:02:12 +0300 Subject: [openib-general] [PATCH] RFC: opensm: serialize osm_state_mgr_process() Message-ID: <20060521230212.GA30176@sashak.voltaire.com> Hello, Please comment (and test). Thanks, Sasha. This serializes execution of osm_state_mgr_process() and removes the big state_lock. This should reduce "locked state" time and prevent potential dispatcher blocking. Other important change here is direct usage of pthread primitives instead of "traditional" cl_thread* stuff. 
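
[Editor's sketch, not part of the original message.] The diff below
adds signal_mask, mutex and cond fields to osm_sm_t, but the osm_sm.c
hunks that use them are not visible in this excerpt. As a rough
illustration of the general pattern such fields suggest (pending
signals collected in a bitmask, one consumer thread woken through a
condition variable), here is a self-contained sketch. Every name in it
(struct sig_queue, sig_queue_post, sig_queue_wait) is hypothetical, and
it should not be read as the actual OpenSM implementation.

```c
#include <pthread.h>

/* Hypothetical stand-in for the fields the patch adds to osm_sm_t. */
struct sig_queue {
	unsigned signal_mask;	/* one bit per pending signal */
	pthread_mutex_t mutex;
	pthread_cond_t cond;
};

static void sig_queue_init(struct sig_queue *q)
{
	q->signal_mask = 0;
	pthread_mutex_init(&q->mutex, NULL);
	pthread_cond_init(&q->cond, NULL);
}

/*
 * Producer side: record the signal and wake the consumer thread.
 * The signal number must be smaller than the bit width of unsigned.
 */
static void sig_queue_post(struct sig_queue *q, unsigned signal)
{
	pthread_mutex_lock(&q->mutex);
	q->signal_mask |= 1u << signal;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->mutex);
}

/*
 * Consumer side: block until at least one signal is pending, then
 * take all pending signals at once and clear the mask. The loop
 * guards against spurious wakeups.
 */
static unsigned sig_queue_wait(struct sig_queue *q)
{
	unsigned pending;

	pthread_mutex_lock(&q->mutex);
	while (q->signal_mask == 0)
		pthread_cond_wait(&q->cond, &q->mutex);
	pending = q->signal_mask;
	q->signal_mask = 0;
	pthread_mutex_unlock(&q->mutex);
	return pending;
}
```

Because only the single thread calling sig_queue_wait() ever acts on
the signals, the processing itself is serialized, while posters hold
the mutex only long enough to set a bit.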
Signed-off-by: Sasha Khapyorsky --- osm/include/Makefile.am | 1 osm/include/opensm/osm_msgdef.h | 15 -- osm/include/opensm/osm_sm.h | 36 ++++- osm/include/opensm/osm_state_mgr.h | 4 - osm/include/opensm/osm_state_mgr_ctrl.h | 236 ------------------------------- osm/opensm/Makefile.am | 5 - osm/opensm/osm_helper.c | 1 osm/opensm/osm_node_info_rcv.c | 5 - osm/opensm/osm_port_info_rcv.c | 3 osm/opensm/osm_sm.c | 160 +++++++++++---------- osm/opensm/osm_sm_mad_ctrl.c | 26 --- osm/opensm/osm_sm_state_mgr.c | 8 + osm/opensm/osm_sminfo_rcv.c | 8 + osm/opensm/osm_state_mgr.c | 24 --- osm/opensm/osm_state_mgr_ctrl.c | 132 ----------------- osm/opensm/osm_sw_info_rcv.c | 3 osm/opensm/osm_sweep_fail_ctrl.c | 4 - osm/opensm/osm_trap_rcv.c | 3 osm/opensm/osm_vl15intf.c | 25 --- 19 files changed, 143 insertions(+), 556 deletions(-) diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am index 2bee762..0af78a0 100644 --- a/osm/include/Makefile.am +++ b/osm/include/Makefile.am @@ -120,7 +120,6 @@ EXTRA_DIST = \ $(srcdir)/opensm/osm_vl15intf.h \ $(srcdir)/opensm/osm_drop_mgr.h \ $(srcdir)/opensm/osm_port_info_rcv.h \ - $(srcdir)/opensm/osm_state_mgr_ctrl.h \ $(srcdir)/complib/cl_thread_osd.h \ $(srcdir)/complib/cl_packon.h \ $(srcdir)/complib/cl_atomic_osd.h \ diff --git a/osm/include/opensm/osm_msgdef.h b/osm/include/opensm/osm_msgdef.h index a1b5743..6956c86 100644 --- a/osm/include/opensm/osm_msgdef.h +++ b/osm/include/opensm/osm_msgdef.h @@ -148,20 +148,6 @@ BEGIN_C_DECLS * * SOURCE ***********/ -/****d* OpenSM: Dispatcher Messages/OSM_MSG_NO_SMPS_OUTSTANDING -* NAME -* OSM_MSG_NO_SMPS_OUTSTANDING -* -* DESCRIPTION -* Message indicating that there are no outstanding SMPs on the subnet. 
-* -* NOTES -* Sent by: osm_mad_ctrl_t -* Received by: osm_state_mgr_ctrl_t -* Delivery notice: no -* -* SOURCE -***********/ enum { OSM_MSG_REQ = 0, @@ -169,7 +155,6 @@ enum OSM_MSG_MAD_PORT_INFO, OSM_MSG_MAD_SWITCH_INFO, OSM_MSG_MAD_NODE_DESC, - OSM_MSG_NO_SMPS_OUTSTANDING, OSM_MSG_MAD_NODE_RECORD, OSM_MSG_MAD_PORTINFO_RECORD, OSM_MSG_MAD_SERVICE_RECORD, diff --git a/osm/include/opensm/osm_sm.h b/osm/include/opensm/osm_sm.h index d6086d4..def43c8 100644 --- a/osm/include/opensm/osm_sm.h +++ b/osm/include/opensm/osm_sm.h @@ -69,7 +69,6 @@ #include #include #include -#include #include #include #include @@ -131,7 +130,6 @@ BEGIN_C_DECLS typedef struct _osm_sm { osm_thread_state_t thread_state; - cl_event_t signal; cl_event_t subnet_up_event; cl_thread_t sweeper; osm_subn_t *p_subn; @@ -143,6 +141,9 @@ typedef struct _osm_sm cl_dispatcher_t *p_disp; cl_plock_t *p_lock; atomic32_t sm_trans_id; + unsigned signal_mask; + pthread_mutex_t mutex; + pthread_cond_t cond; osm_req_t req; osm_req_ctrl_t req_ctrl; osm_resp_t resp; @@ -155,7 +156,6 @@ typedef struct _osm_sm osm_sm_mad_ctrl_t mad_ctrl; osm_si_rcv_t si_rcv; osm_si_rcv_ctrl_t si_rcv_ctrl; - osm_state_mgr_ctrl_t state_mgr_ctrl; osm_lid_mgr_t lid_mgr; osm_ucast_mgr_t ucast_mgr; osm_link_mgr_t link_mgr; @@ -387,6 +387,33 @@ osm_sm_init( * SM object, osm_sm_construct, osm_sm_destroy *********/ +/****f* OpenSM: SM/osm_sm_signal +* NAME +* osm_sm_signal +* +* DESCRIPTION +* Forward signal to state engine +* +* SYNOPSIS +*/ +void +osm_sm_signal( + IN osm_sm_t* const p_sm, + IN osm_signal_t signal ); +/* +* PARAMETERS +* p_sm +* [in] Pointer to an osm_sm_t object. +* +* signal +* [in] Signal to the state engine. +* +* NOTES +* +* SEE ALSO +* SM object +*********/ + /****f* OpenSM: SM/osm_sm_sweep * NAME * osm_sm_sweep @@ -404,9 +431,6 @@ osm_sm_sweep( * p_sm * [in] Pointer to an osm_sm_t object. * -* RETURN VALUES -* IB_SUCCESS if the sweep completed successfully. 
-* * NOTES * * SEE ALSO diff --git a/osm/include/opensm/osm_state_mgr.h b/osm/include/opensm/osm_state_mgr.h index ad4afa0..5e76463 100644 --- a/osm/include/opensm/osm_state_mgr.h +++ b/osm/include/opensm/osm_state_mgr.h @@ -116,7 +116,6 @@ typedef struct _osm_state_mgr osm_stats_t *p_stats; struct _osm_sm_state_mgr *p_sm_state_mgr; const osm_sm_mad_ctrl_t *p_mad_ctrl; - cl_spinlock_t state_lock; cl_spinlock_t idle_lock; cl_qlist_t idle_time_list; cl_plock_t *p_lock; @@ -161,9 +160,6 @@ typedef struct _osm_state_mgr * p_mad_ctrl * Pointer to the SM's MAD Controller object. * -* state_lock -* Spinlock guarding the state and processes. -* * p_lock * lock guarding the subnet object. * diff --git a/osm/include/opensm/osm_state_mgr_ctrl.h b/osm/include/opensm/osm_state_mgr_ctrl.h deleted file mode 100644 index 9ffcfb0..0000000 --- a/osm/include/opensm/osm_state_mgr_ctrl.h +++ /dev/null @@ -1,236 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - -/* - * Abstract: - * Declaration of osm_state_mgr_ctrl_t. - * This object represents a controller that receives the - * State indication after a subnet sweep. - * This object is part of the OpenSM family of objects. - * - * Environment: - * Linux User Mode - * - * $Revision: 1.4 $ - */ - -#ifndef _OSM_STATE_MGR_CTRL_H_ -#define _OSM_STATE_MGR_CTRL_H_ - - -#include -#include -#include -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -/****h* OpenSM/State Manager Controller -* NAME -* State Manager Controller -* -* DESCRIPTION -* The State Manager Controller object encapsulates the information -* needed to pass the dispatcher message from the dispatcher -* to the State Manager. -* -* The State Manager Controller object is thread safe. -* -* This object should be treated as opaque and should be -* manipulated only through the provided functions. -* -* AUTHOR -* Steve King, Intel -* -*********/ -/****s* OpenSM: State Manager Controller/osm_state_mgr_ctrl_t -* NAME -* osm_state_mgr_ctrl_t -* -* DESCRIPTION -* State Manager Controller structure. -* -* This object should be treated as opaque and should -* be manipulated only through the provided functions. 
-* -* SYNOPSIS -*/ -typedef struct _osm_state_mgr_ctrl -{ - osm_state_mgr_t *p_mgr; - osm_log_t *p_log; - cl_dispatcher_t *p_disp; - cl_disp_reg_handle_t h_disp; - -} osm_state_mgr_ctrl_t; -/* -* FIELDS -* p_mgr -* Pointer to the State Manager object. -* -* p_log -* Pointer to the log object. -* -* p_disp -* Pointer to the Dispatcher. -* -* h_disp -* Handle returned from dispatcher registration. -* -* SEE ALSO -* State Manager Controller object -*********/ - -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_construct -* NAME -* osm_state_mgr_ctrl_construct -* -* DESCRIPTION -* This function constructs a State Manager Controller object. -* -* SYNOPSIS -*/ -void -osm_state_mgr_ctrl_construct( - IN osm_state_mgr_ctrl_t* const p_ctrl ); -/* -* PARAMETERS -* p_ctrl -* [in] Pointer to a State Manager Controller -* object to construct. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* Allows calling osm_state_mgr_ctrl_init, osm_state_mgr_ctrl_destroy, -* and osm_state_mgr_ctrl_is_inited. -* -* Calling osm_state_mgr_ctrl_construct is a prerequisite to calling any other -* method except osm_state_mgr_ctrl_init. -* -* SEE ALSO -* State Manager Controller object, osm_state_mgr_ctrl_init, -* osm_state_mgr_ctrl_destroy -*********/ - -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_destroy -* NAME -* osm_state_mgr_ctrl_destroy -* -* DESCRIPTION -* The osm_state_mgr_ctrl_destroy function destroys the object, releasing -* all resources. -* -* SYNOPSIS -*/ -void -osm_state_mgr_ctrl_destroy( - IN osm_state_mgr_ctrl_t* const p_ctrl ); -/* -* PARAMETERS -* p_ctrl -* [in] Pointer to the object to destroy. -* -* RETURN VALUE -* This function does not return a value. -* -* NOTES -* Performs any necessary cleanup of the specified -* State Manager Controller object. -* Further operations should not be attempted on the destroyed object. 
-* This function should only be called after a call to -* osm_state_mgr_ctrl_construct or osm_state_mgr_ctrl_init. -* -* SEE ALSO -* State Manager Controller object, osm_state_mgr_ctrl_construct, -* osm_state_mgr_ctrl_init -*********/ - -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_init -* NAME -* osm_state_mgr_ctrl_init -* -* DESCRIPTION -* The osm_state_mgr_ctrl_init function initializes a -* State Manager Controller object for use. -* -* SYNOPSIS -*/ -ib_api_status_t -osm_state_mgr_ctrl_init( - IN osm_state_mgr_ctrl_t* const p_ctrl, - IN osm_state_mgr_t* const p_mgr, - IN osm_log_t* const p_log, - IN cl_dispatcher_t* const p_disp ); -/* -* PARAMETERS -* p_ctrl -* [in] Pointer to an osm_state_mgr_ctrl_t object to initialize. -* -* p_mgr -* [in] Pointer to an osm_state_mgr_t object. -* -* p_log -* [in] Pointer to the log object. -* -* p_disp -* [in] Pointer to the OpenSM central Dispatcher. -* -* RETURN VALUES -* IB_SUCCESS if the State Manager Controller object -* was initialized successfully. -* -* NOTES -* Allows calling other State Manager Controller methods. -* -* SEE ALSO -* State Manager Controller object, osm_state_mgr_ctrl_construct, -* osm_state_mgr_ctrl_destroy -*********/ - -END_C_DECLS - -#endif /* OSM_STATE_MGR_CTRL_H_ */ diff --git a/osm/opensm/Makefile.am b/osm/opensm/Makefile.am index 43fe8c1..7b1060a 100644 --- a/osm/opensm/Makefile.am +++ b/osm/opensm/Makefile.am @@ -78,8 +78,7 @@ opensm_SOURCES = main.c osm_console.c os osm_slvl_map_rcv.c osm_slvl_map_rcv_ctrl.c \ osm_sm.c osm_sminfo_rcv.c \ osm_sminfo_rcv_ctrl.c osm_sm_mad_ctrl.c \ - osm_sm_state_mgr.c osm_state_mgr.c \ - osm_state_mgr_ctrl.c osm_subnet.c \ + osm_sm_state_mgr.c osm_state_mgr.c osm_subnet.c \ osm_sweep_fail_ctrl.c osm_sw_info_rcv.c \ osm_sw_info_rcv_ctrl.c osm_switch.c \ osm_prtn.c osm_prtn_config.c osm_qos.c \ @@ -104,7 +103,7 @@ # we need to be able to load libraries f # we always give precedence to local tree libs and then use the pre-installed ones. 
opensm_LDADD = -L../complib -L../libvendor -L. $(OSMV_LDADD) -lopensm -losmcomp -losmvendor -opensm_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread +opensm_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread -lrt opensmincludedir = $(includedir)/infiniband/opensm diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c index 3886609..a966bbe 100644 --- a/osm/opensm/osm_helper.c +++ b/osm/opensm/osm_helper.c @@ -1895,7 +1895,6 @@ static const char* const __osm_disp_msg_ "OSM_MSG_MAD_PORT_INFO,", "OSM_MSG_MAD_SWITCH_INFO", "OSM_MSG_MAD_NODE_DESC", - "OSM_MSG_NO_SMPS_OUTSTANDING", "OSM_MSG_MAD_NODE_RECORD", "OSM_MSG_MAD_PORTINFO_RECORD", "OSM_MSG_MAD_SERVICE_RECORD", diff --git a/osm/opensm/osm_node_info_rcv.c b/osm/opensm/osm_node_info_rcv.c index 59257a0..7ea4366 100644 --- a/osm/opensm/osm_node_info_rcv.c +++ b/osm/opensm/osm_node_info_rcv.c @@ -69,6 +69,7 @@ #include #include #include #include +#include /********************************************************************** @@ -1088,11 +1089,11 @@ osm_ni_rcv_process( /* * If we processed a new node - need to signal to the state_mgr that - * change detected. BUT - we cannot call the osm_state_mgr_process + * change detected. BUT - we cannot call the osm_sm_signal * from within the lock of p_rcv->p_lock (can cause a deadlock). 
*/ if ( process_new_flag ) - osm_state_mgr_process( p_rcv->p_state_mgr, OSM_SIGNAL_CHANGE_DETECTED ); + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, OSM_SIGNAL_CHANGE_DETECTED ); Exit: OSM_LOG_EXIT( p_rcv->p_log ); diff --git a/osm/opensm/osm_port_info_rcv.c b/osm/opensm/osm_port_info_rcv.c index a08c57c..7405ef0 100644 --- a/osm/opensm/osm_port_info_rcv.c +++ b/osm/opensm/osm_port_info_rcv.c @@ -69,6 +69,7 @@ #include #include #include #include +#include /********************************************************************** **********************************************************************/ @@ -701,7 +702,7 @@ osm_pi_rcv_process( " port = %u, Commencing heavy sweep\n", cl_ntoh64( node_guid ), cl_ntoh64( port_guid ) ); - osm_state_mgr_process( p_rcv->p_state_mgr, + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, OSM_SIGNAL_CHANGE_DETECTED ); goto Exit; } diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c index 0e09f26..5b5eb3f 100644 --- a/osm/opensm/osm_sm.c +++ b/osm/opensm/osm_sm.c @@ -55,6 +55,8 @@ #if HAVE_CONFIG_H # include #endif /* HAVE_CONFIG_H */ +#include +#include #include #include #include @@ -79,53 +81,65 @@ void __osm_sm_sweeper( IN void *p_ptr ) { - ib_api_status_t status; osm_sm_t *const p_sm = ( osm_sm_t * ) p_ptr; + unsigned i, signals; OSM_LOG_ENTER( p_sm->p_log, __osm_sm_sweeper ); - if( p_sm->thread_state == OSM_THREAD_STATE_INIT ) - { - p_sm->thread_state = OSM_THREAD_STATE_RUN; - } - - /* If the sweep interval was updated before - then run only if - * it is not zero. 
 */ - while( p_sm->thread_state == OSM_THREAD_STATE_RUN && - p_sm->p_subn->opt.sweep_interval != 0 ) - { - /* do the sweep only if we are in MASTER state */ - if( p_sm->p_subn->sm_state == IB_SMINFO_STATE_MASTER || - p_sm->p_subn->sm_state == IB_SMINFO_STATE_DISCOVERING ) - osm_state_mgr_process( &p_sm->state_mgr, OSM_SIGNAL_SWEEP ); + do { + signals = 0; + pthread_mutex_lock(&p_sm->mutex); + if (p_sm->signal_mask == 0) + pthread_cond_wait(&p_sm->cond, &p_sm->mutex); + signals = p_sm->signal_mask; + p_sm->signal_mask = 0; + pthread_mutex_unlock(&p_sm->mutex); + for (i = 0 ; signals ; i++) { + if (signals&1) + osm_state_mgr_process( &p_sm->state_mgr, i); + signals >>= 1; + } + } while (p_sm->thread_state == OSM_THREAD_STATE_RUN); - /* - * Wait on the event with a timeout. - * Sweeps may be iniated "off schedule" by simply - * signaling the event. - */ - status = cl_event_wait_on( &p_sm->signal, - p_sm->p_subn->opt.sweep_interval * 1000000, - TRUE ); + OSM_LOG_EXIT( p_sm->p_log ); +} - if( status == CL_SUCCESS ) - { - if( osm_log_is_active( p_sm->p_log, OSM_LOG_DEBUG ) ) - { - osm_log( p_sm->p_log, OSM_LOG_DEBUG, - "__osm_sm_sweeper: " "Off schedule sweep signalled\n" ); - } +/********************************************************************** + **********************************************************************/ +void +__osm_sm_sweeper_periodic( + IN void *p_ptr ) +{ + struct timespec times; + osm_sm_t *const p_sm = ( osm_sm_t * ) p_ptr; + unsigned i, signals; + int ret; + + OSM_LOG_ENTER( p_sm->p_log, __osm_sm_sweeper_periodic ); + + do { + clock_gettime(CLOCK_REALTIME, &times); + times.tv_sec += p_sm->p_subn->opt.sweep_interval; + signals = 0; + pthread_mutex_lock(&p_sm->mutex); + if (p_sm->signal_mask == 0 && + (ret = pthread_cond_timedwait(&p_sm->cond, &p_sm->mutex, + &times)) == ETIMEDOUT && + ( p_sm->p_subn->sm_state == IB_SMINFO_STATE_MASTER || + p_sm->p_subn->sm_state == IB_SMINFO_STATE_DISCOVERING )) { + signals = OSM_SIGNAL_SWEEP; } else - { - if( status !=
CL_TIMEOUT ) - { - osm_log( p_sm->p_log, OSM_LOG_ERROR, - "__osm_sm_sweeper: ERR 2E01: " - "Event wait failed (%s)\n", CL_STATUS_MSG( status ) ); - } + else { + signals = p_sm->signal_mask; + p_sm->signal_mask = 0; } - } + pthread_mutex_unlock(&p_sm->mutex); + for (i = 0 ; signals ; i++) { + if (signals&1) + osm_state_mgr_process( &p_sm->state_mgr, i); + signals >>= 1; + } + } while (p_sm->thread_state == OSM_THREAD_STATE_RUN); OSM_LOG_EXIT( p_sm->p_log ); } @@ -139,7 +153,6 @@ osm_sm_construct( memset( p_sm, 0, sizeof( *p_sm ) ); p_sm->thread_state = OSM_THREAD_STATE_NONE; p_sm->sm_trans_id = OSM_SM_INITIAL_TID_VALUE; - cl_event_construct( &p_sm->signal ); cl_event_construct( &p_sm->subnet_up_event ); cl_thread_construct( &p_sm->sweeper ); osm_req_construct( &p_sm->req ); @@ -158,7 +171,6 @@ osm_sm_construct( osm_ucast_mgr_construct( &p_sm->ucast_mgr ); osm_link_mgr_construct( &p_sm->link_mgr ); osm_state_mgr_construct( &p_sm->state_mgr ); - osm_state_mgr_ctrl_construct( &p_sm->state_mgr_ctrl ); osm_drop_mgr_construct( &p_sm->drop_mgr ); osm_lft_rcv_construct( &p_sm->lft_rcv ); osm_lft_rcv_ctrl_construct( &p_sm->lft_rcv_ctrl ); @@ -185,24 +197,14 @@ void osm_sm_shutdown( IN osm_sm_t * const p_sm ) { - boolean_t signal_event = FALSE; - OSM_LOG_ENTER( p_sm->p_log, osm_sm_shutdown ); /* * Signal our threads that we're leaving. - */ - if( p_sm->thread_state != OSM_THREAD_STATE_NONE ) - signal_event = TRUE; - - p_sm->thread_state = OSM_THREAD_STATE_EXIT; - - /* - * Don't trigger unless event has been initialized. * Destroy the thread before we tear down the other objects. 
*/ - if( signal_event ) - cl_event_signal( &p_sm->signal ); + p_sm->thread_state = OSM_THREAD_STATE_EXIT; + osm_sm_signal( p_sm, OSM_SIGNAL_NONE ); cl_thread_destroy( &p_sm->sweeper ); @@ -225,7 +227,6 @@ osm_sm_shutdown( osm_vla_rcv_ctrl_destroy( &p_sm->vla_rcv_ctrl ); osm_pkey_rcv_ctrl_destroy( &p_sm->pkey_rcv_ctrl ); osm_sweep_fail_ctrl_destroy( &p_sm->sweep_fail_ctrl ); - osm_state_mgr_ctrl_destroy( &p_sm->state_mgr_ctrl ); OSM_LOG_EXIT( p_sm->p_log ); } @@ -257,12 +258,14 @@ osm_sm_destroy( osm_state_mgr_destroy( &p_sm->state_mgr ); osm_sm_state_mgr_destroy( &p_sm->sm_state_mgr ); osm_mcast_mgr_destroy( &p_sm->mcast_mgr ); - cl_event_destroy( &p_sm->signal ); cl_event_destroy( &p_sm->subnet_up_event ); if( p_sm->p_report_buf != NULL ) cl_free( p_sm->p_report_buf ); + pthread_cond_destroy(&p_sm->cond); + pthread_mutex_destroy(&p_sm->mutex); + osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format Waived */ OSM_LOG_EXIT( p_sm->p_log ); } @@ -303,14 +306,15 @@ osm_sm_init( status = IB_INSUFFICIENT_MEMORY; goto Exit; } - status = cl_event_init( &p_sm->signal, FALSE ); - if( status != CL_SUCCESS ) - goto Exit; status = cl_event_init( &p_sm->subnet_up_event, FALSE ); if( status != CL_SUCCESS ) goto Exit; + p_sm->signal_mask = 0; + pthread_mutex_init(&p_sm->mutex, NULL); + pthread_cond_init(&p_sm->cond, NULL); + status = osm_sm_mad_ctrl_init( &p_sm->mad_ctrl, p_sm->p_subn, p_sm->p_mad_pool, @@ -416,12 +420,6 @@ osm_sm_init( if( status != IB_SUCCESS ) goto Exit; - status = osm_state_mgr_ctrl_init( &p_sm->state_mgr_ctrl, - &p_sm->state_mgr, - p_sm->p_log, p_sm->p_disp ); - if( status != IB_SUCCESS ) - goto Exit; - status = osm_drop_mgr_init( &p_sm->drop_mgr, p_sm->p_subn, p_sm->p_log, &p_sm->req, p_sm->p_lock ); @@ -523,16 +521,15 @@ osm_sm_init( /* * Now that the component objects are initialized, start - * the sweeper thread if the user wants sweeping. + * the sweeper thread. 
*/ - if( p_sm->p_subn->opt.sweep_interval ) - { - p_sm->thread_state = OSM_THREAD_STATE_INIT; - status = cl_thread_init( &p_sm->sweeper, __osm_sm_sweeper, p_sm, - "opensm sweeper" ); - if( status != IB_SUCCESS ) - goto Exit; - } + p_sm->thread_state = OSM_THREAD_STATE_RUN; + status = cl_thread_init( &p_sm->sweeper, + p_sm->p_subn->opt.sweep_interval > 0 ? + __osm_sm_sweeper_periodic : __osm_sm_sweeper, + p_sm, "opensm sweeper" ); + if( status != IB_SUCCESS ) + goto Exit; Exit: OSM_LOG_EXIT( p_log ); @@ -542,11 +539,26 @@ osm_sm_init( /********************************************************************** **********************************************************************/ void +osm_sm_signal( + IN osm_sm_t* const p_sm, + IN osm_signal_t signal ) +{ + OSM_LOG_ENTER( p_sm->p_log, osm_sm_signal ); + pthread_mutex_lock(&p_sm->mutex); + p_sm->signal_mask |= (1 << signal); + pthread_cond_signal(&p_sm->cond); + pthread_mutex_unlock(&p_sm->mutex); + OSM_LOG_EXIT( p_sm->p_log ); +} + +/********************************************************************** + **********************************************************************/ +void osm_sm_sweep( IN osm_sm_t * const p_sm ) { OSM_LOG_ENTER( p_sm->p_log, osm_sm_sweep ); - osm_state_mgr_process( &p_sm->state_mgr, OSM_SIGNAL_SWEEP ); + osm_sm_signal( p_sm, OSM_SIGNAL_SWEEP ); OSM_LOG_EXIT( p_sm->p_log ); } diff --git a/osm/opensm/osm_sm_mad_ctrl.c b/osm/opensm/osm_sm_mad_ctrl.c index 9dceef2..1982873 100644 --- a/osm/opensm/osm_sm_mad_ctrl.c +++ b/osm/opensm/osm_sm_mad_ctrl.c @@ -81,7 +81,6 @@ __osm_sm_mad_ctrl_retire_trans_mad( IN osm_madw_t* const p_madw ) { uint32_t outstanding; - cl_status_t status; OSM_LOG_ENTER( p_ctrl->p_log, __osm_sm_mad_ctrl_retire_trans_mad ); @@ -115,31 +114,10 @@ __osm_sm_mad_ctrl_retire_trans_mad( The wire is clean. Signal the state manager. 
*/ - if( osm_log_is_active( p_ctrl->p_log, OSM_LOG_DEBUG ) ) - { - osm_log( p_ctrl->p_log, OSM_LOG_DEBUG, - "__osm_sm_mad_ctrl_retire_trans_mad: " - "Posting Dispatcher message %s\n", - osm_get_disp_msg_str( OSM_MSG_NO_SMPS_OUTSTANDING ) ); - } - - status = cl_disp_post( p_ctrl->h_disp, - OSM_MSG_NO_SMPS_OUTSTANDING, - (void *)OSM_SIGNAL_NO_PENDING_TRANSACTIONS, - NULL, - NULL ); - - if( status != CL_SUCCESS ) - { - osm_log( p_ctrl->p_log, OSM_LOG_ERROR, - "__osm_sm_mad_ctrl_retire_trans_mad: ERR 3101: " - "Dispatcher post message failed (%s)\n", - CL_STATUS_MSG( status ) ); - goto Exit; - } + osm_sm_signal( &p_ctrl->p_subn->p_osm->sm, + OSM_SIGNAL_NO_PENDING_TRANSACTIONS ); } - Exit: OSM_LOG_EXIT( p_ctrl->p_log ); } /************/ diff --git a/osm/opensm/osm_sm_state_mgr.c b/osm/opensm/osm_sm_state_mgr.c index feeda45..2c6da4e 100644 --- a/osm/opensm/osm_sm_state_mgr.c +++ b/osm/opensm/osm_sm_state_mgr.c @@ -563,7 +563,7 @@ osm_sm_state_mgr_process( /* * Stop the discovering */ - osm_state_mgr_process( p_sm_mgr->p_state_mgr, + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED ); break; case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE: @@ -610,7 +610,7 @@ osm_sm_state_mgr_process( __osm_sm_state_mgr_discovering_msg( p_sm_mgr ); p_sm_mgr->p_subn->sm_state = IB_SMINFO_STATE_DISCOVERING; p_sm_mgr->p_subn->coming_out_of_standby = TRUE; - osm_state_mgr_process( p_sm_mgr->p_state_mgr, OSM_SIGNAL_EXIT_STBY ); + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, OSM_SIGNAL_EXIT_STBY ); break; case OSM_SM_SIGNAL_DISABLE: /* @@ -641,7 +641,7 @@ osm_sm_state_mgr_process( */ p_sm_mgr->p_subn->master_sm_base_lid = p_sm_mgr->p_subn->sm_base_lid; p_sm_mgr->p_subn->coming_out_of_standby = TRUE; - osm_state_mgr_process( p_sm_mgr->p_state_mgr, OSM_SIGNAL_EXIT_STBY ); + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, OSM_SIGNAL_EXIT_STBY ); break; case OSM_SM_SIGNAL_ACKNOWLEDGE: /* @@ -704,7 +704,7 @@ osm_sm_state_mgr_process( "Received 
OSM_SM_SIGNAL_HANDOVER\n" ); p_sm_mgr->p_polling_sm = NULL; p_sm_mgr->p_subn->force_immediate_heavy_sweep = TRUE; - osm_state_mgr_process( p_sm_mgr->p_state_mgr, OSM_SIGNAL_SWEEP ); + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, OSM_SIGNAL_SWEEP ); break; case OSM_SM_SIGNAL_HANDOVER_SENT: /* diff --git a/osm/opensm/osm_sminfo_rcv.c b/osm/opensm/osm_sminfo_rcv.c index 5914984..4af549b 100644 --- a/osm/opensm/osm_sminfo_rcv.c +++ b/osm/opensm/osm_sminfo_rcv.c @@ -425,10 +425,10 @@ __osm_sminfo_rcv_process_set_request( } /********************************************************************** - * Return a signal with which to call the osm_state_mgr_process. + * Return a signal with which to call the osm_sm_signal. * This is done since we are locked by p_rcv->p_lock in this function, - * and thus cannot call osm_state_mgr_process (that locks the state_lock). - * If return OSM_SIGNAL_NONE - do not call osm_state_mgr_process. + * and thus cannot call osm_sm_signal. + * If return OSM_SIGNAL_NONE - do not call osm_sm_signal. **********************************************************************/ osm_signal_t __osm_sminfo_rcv_process_get_sm( @@ -676,7 +676,7 @@ __osm_sminfo_rcv_process_get_response( /* If process_get_sm_ret_val != OSM_SIGNAL_NONE then we have to signal * to the state_mgr with that signal. 
*/ if (process_get_sm_ret_val != OSM_SIGNAL_NONE) - osm_state_mgr_process( p_rcv->p_state_mgr, + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, process_get_sm_ret_val ); OSM_LOG_EXIT( p_rcv->p_log ); } diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c index 724b2b7..ff1c65c 100644 --- a/osm/opensm/osm_state_mgr.c +++ b/osm/opensm/osm_state_mgr.c @@ -83,7 +83,6 @@ osm_state_mgr_construct( IN osm_state_mgr_t * const p_mgr ) { memset( p_mgr, 0, sizeof( *p_mgr ) ); - cl_spinlock_construct( &p_mgr->state_lock ); cl_spinlock_construct( &p_mgr->idle_lock ); p_mgr->state = OSM_SM_STATE_INIT; } @@ -99,7 +98,6 @@ osm_state_mgr_destroy( OSM_LOG_ENTER( p_mgr->p_log, osm_state_mgr_destroy ); /* destroy the locks */ - cl_spinlock_destroy( &p_mgr->state_lock ); cl_spinlock_destroy( &p_mgr->idle_lock ); OSM_LOG_EXIT( p_mgr->p_log ); @@ -162,14 +160,6 @@ osm_state_mgr_init( p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS; p_mgr->next_stage_signal = OSM_SIGNAL_NONE; - status = cl_spinlock_init( &p_mgr->state_lock ); - if( status != CL_SUCCESS ) - { - osm_log( p_mgr->p_log, OSM_LOG_ERROR, - "osm_state_mgr_init: ERR 3301: " - "Spinlock init failed (%s)\n", CL_STATUS_MSG( status ) ); - } - cl_qlist_init( &p_mgr->idle_time_list ); status = cl_spinlock_init( &p_mgr->idle_lock ); @@ -1897,16 +1887,6 @@ osm_state_mgr_process( if( osm_exit_flag ) signal = OSM_SIGNAL_NONE; - /* - * The state lock prevents many race conditions from screwing - * up the state transition process. For example, if an function - * puts transactions on the wire, the state lock guarantees this - * loop will see the return code ("DONE PENDING") of the function - * before the "NO OUTSTANDING TRANSACTIONS" signal is asynchronously - * received. 
- */ - cl_spinlock_acquire( &p_mgr->state_lock ); - while( signal != OSM_SIGNAL_NONE ) { if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) @@ -2957,8 +2937,6 @@ osm_state_mgr_process( p_mgr->state_step_mode = OSM_STATE_STEP_BREAK; } - cl_spinlock_release( &p_mgr->state_lock ); - OSM_LOG_EXIT( p_mgr->p_log ); } @@ -2994,7 +2972,7 @@ osm_state_mgr_process_idle( cl_qlist_insert_tail( &p_mgr->idle_time_list, &p_idle_item->list_item ); cl_spinlock_release( &p_mgr->idle_lock ); - osm_state_mgr_process( p_mgr, OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST ); + osm_sm_signal( &p_mgr->p_subn->p_osm->sm, OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST ); OSM_LOG_EXIT( p_mgr->p_log ); diff --git a/osm/opensm/osm_state_mgr_ctrl.c b/osm/opensm/osm_state_mgr_ctrl.c deleted file mode 100644 index 0bde333..0000000 --- a/osm/opensm/osm_state_mgr_ctrl.c +++ /dev/null @@ -1,132 +0,0 @@ -/* - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - * $Id$ - */ - - -/* - * Abstract: - * Implementation of osm_state_mgr_ctrl_t. - * This object represents the State Manager Controller object. - * This object is part of the opensm family of objects. - * - * Environment: - * Linux User Mode - * - * $Revision: 1.5 $ - */ - -/* - Next available error code: 0x1601 -*/ - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include -#include - -/********************************************************************** - **********************************************************************/ -void -__osm_state_mgr_ctrl_disp_callback( - IN void *context, - IN void *p_data ) -{ - /* ignore return status when invoked via the dispatcher */ - osm_state_mgr_process( ((osm_state_mgr_ctrl_t*)context)->p_mgr, - (osm_signal_t)(p_data) ); -} - -/********************************************************************** - **********************************************************************/ -void -osm_state_mgr_ctrl_construct( - IN osm_state_mgr_ctrl_t* const p_ctrl ) -{ - memset( p_ctrl, 0, sizeof(*p_ctrl) ); - p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; -} - -/********************************************************************** - **********************************************************************/ -void -osm_state_mgr_ctrl_destroy( - IN osm_state_mgr_ctrl_t* const p_ctrl ) -{ - CL_ASSERT( p_ctrl ); - cl_disp_unregister( p_ctrl->h_disp ); -} - -/********************************************************************** - 
**********************************************************************/ -ib_api_status_t -osm_state_mgr_ctrl_init( - IN osm_state_mgr_ctrl_t* const p_ctrl, - IN osm_state_mgr_t* const p_mgr, - IN osm_log_t* const p_log, - IN cl_dispatcher_t* const p_disp ) -{ - ib_api_status_t status = IB_SUCCESS; - - OSM_LOG_ENTER( p_log, osm_state_mgr_ctrl_init ); - - osm_state_mgr_ctrl_construct( p_ctrl ); - p_ctrl->p_log = p_log; - - p_ctrl->p_mgr = p_mgr; - p_ctrl->p_disp = p_disp; - - p_ctrl->h_disp = cl_disp_register( - p_disp, - OSM_MSG_NO_SMPS_OUTSTANDING, - __osm_state_mgr_ctrl_disp_callback, - p_ctrl ); - - if( p_ctrl->h_disp == CL_DISP_INVALID_HANDLE ) - { - osm_log( p_log, OSM_LOG_ERROR, - "osm_state_mgr_ctrl_init: ERR 3401: " - "Dispatcher registration failed\n" ); - status = IB_INSUFFICIENT_RESOURCES; - goto Exit; - } - - Exit: - OSM_LOG_EXIT( p_log ); - return( status ); -} - diff --git a/osm/opensm/osm_sw_info_rcv.c b/osm/opensm/osm_sw_info_rcv.c index 6bbd73a..61aff27 100644 --- a/osm/opensm/osm_sw_info_rcv.c +++ b/osm/opensm/osm_sw_info_rcv.c @@ -60,6 +60,7 @@ #include #include #include #include +#include #include /********************************************************************** @@ -673,7 +674,7 @@ osm_si_rcv_process( if (__osm_si_rcv_process_existing( p_rcv, p_node, p_sw, p_madw )) { CL_PLOCK_RELEASE( p_rcv->p_lock ); - osm_state_mgr_process( p_rcv->p_state_mgr, + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, OSM_SIGNAL_CHANGE_DETECTED ); goto Exit; } diff --git a/osm/opensm/osm_sweep_fail_ctrl.c b/osm/opensm/osm_sweep_fail_ctrl.c index e27a540..9e41ec7 100644 --- a/osm/opensm/osm_sweep_fail_ctrl.c +++ b/osm/opensm/osm_sweep_fail_ctrl.c @@ -52,6 +52,8 @@ #endif /* HAVE_CONFIG_H */ #include #include #include +#include +#include /********************************************************************** **********************************************************************/ @@ -68,7 +70,7 @@ __osm_sweep_fail_ctrl_disp_callback( /* Notify the state manager that we 
had a light sweep failure. */ - osm_state_mgr_process( p_ctrl->p_state_mgr, + osm_sm_signal( &p_ctrl->p_state_mgr->p_subn->p_osm->sm, OSM_SIGNAL_LIGHT_SWEEP_FAIL ); OSM_LOG_EXIT( p_ctrl->p_log ); diff --git a/osm/opensm/osm_trap_rcv.c b/osm/opensm/osm_trap_rcv.c index 9865f53..fb32ce9 100644 --- a/osm/opensm/osm_trap_rcv.c +++ b/osm/opensm/osm_trap_rcv.c @@ -589,8 +589,7 @@ __osm_trap_rcv_process_request( p_rcv->p_subn->force_immediate_heavy_sweep = TRUE; } - osm_state_mgr_process( p_rcv->p_state_mgr, - OSM_SIGNAL_SWEEP ); + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, OSM_SIGNAL_SWEEP ); } /* If we reached here due to trap 129/130/131 - do not need to do diff --git a/osm/opensm/osm_vl15intf.c b/osm/opensm/osm_vl15intf.c index 68f17c5..c3adb6e 100644 --- a/osm/opensm/osm_vl15intf.c +++ b/osm/opensm/osm_vl15intf.c @@ -62,6 +62,7 @@ #include #include #include #include +#include #include #include @@ -156,7 +157,6 @@ __osm_vl15_poller( if( status != IB_SUCCESS ) { uint32_t outstanding; - cl_status_t cl_status; osm_log( p_vl->p_log, OSM_LOG_ERROR, "__osm_vl15_poller: ERR 3E03: " @@ -202,27 +202,8 @@ __osm_vl15_poller( The wire is clean. Signal the state manager. 
 */ - if( osm_log_is_active( p_vl->p_log, OSM_LOG_DEBUG ) ) - { - osm_log( p_vl->p_log, OSM_LOG_DEBUG, - "__osm_vl15_poller: " - "Posting Dispatcher message %s\n", - osm_get_disp_msg_str( OSM_MSG_NO_SMPS_OUTSTANDING ) ); - } - - cl_status = cl_disp_post( p_vl->h_disp, - OSM_MSG_NO_SMPS_OUTSTANDING, - (void *)OSM_SIGNAL_NO_PENDING_TRANSACTIONS, - NULL, - NULL ); - - if( cl_status != CL_SUCCESS ) - { - osm_log( p_vl->p_log, OSM_LOG_ERROR, - "__osm_vl15_poller: ERR 3E06: " - "Dispatcher post message failed (%s)\n", - CL_STATUS_MSG( cl_status ) ); - } + osm_sm_signal( &p_vl->p_subn->p_osm->sm, + OSM_SIGNAL_NO_PENDING_TRANSACTIONS ); } } } From pw at osc.edu Sun May 21 16:39:17 2006 From: pw at osc.edu (Pete Wyckoff) Date: Sun, 21 May 2006 19:39:17 -0400 Subject: [openib-general] vapi versus openib imm_data Message-ID: <20060521233917.GA25201@osc.edu> I have an application (PVFS2) that can use either VAPI (ibgd-1.8.2) or OpenIB (libibverbs-1.0.3.1.fc4 and libmthca-1.0.1.fc.4). Things work just fine between one machine using VAPI and another using OpenIB, but immediate data in an RDMA write comes through byte-swapped. Both ends are x86_64 hosts. I'm using a heuristic now, knowing the range of values that are valid, but is there any good way to tell if the other side has put its imm_data in network byte order or not? -- Pete From zhushisongzhu at yahoo.com Sun May 21 21:08:06 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Sun, 21 May 2006 21:08:06 -0700 (PDT) Subject: [openib-general] can't compile ttcp.aio.c using gen2 svn version Message-ID: <20060522040806.76174.qmail@web36905.mail.mud.yahoo.com> gen2 svn is compatible with ttcp.aio.c. there is no where to define SOL_SDP constant. zhu
From amit_byron at yahoo.com Sun May 21 21:25:32 2006 From: amit_byron at yahoo.com (amit byron) Date: Sun, 21 May 2006 21:25:32 -0700 (PDT) Subject: [openib-general] connection management Message-ID: <20060522042532.46143.qmail@web38501.mail.mud.yahoo.com> hello, i'm using openib stack as fast low latency transport between two machines viz. A & B using ib_cm apis for connection management with single-port hca. i have following questions/queries: o to make multiple connections using ib_send_cm_req() i would have make connection requests using ib_send_cm_req() with different service id. is this sufficient, or i missed something? i can use the same port number, correct? o machine A is connected to machine B using ib_cm's request, response, ready to use protocol (machine A initiates the connection request). the question i have: for machine B to send message to machine A, is it sufficient for machine B to do a lookup using ib_sa_path_rec_get(), and then do ib_post_send()? if no, then what else needs to be done? thanks, Amit. From mst at mellanox.co.il Sun May 21 22:17:22 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 22 May 2006 08:17:22 +0300 Subject: [openib-general] Re: can't compile ttcp.aio.c using gen2 svn version In-Reply-To: <20060522040806.76174.qmail@web36905.mail.mud.yahoo.com> References: <20060522040806.76174.qmail@web36905.mail.mud.yahoo.com> Message-ID: <20060522051721.GA14583@mellanox.co.il> Quoting r. zhu shi song : > Subject: can't compile ttcp.aio.c using gen2 svn version > > gen2 svn is compatible with ttcp.aio.c. there is no > where to define SOL_SDP constant. > > zhu I don't support AIO on sockets at the moment. -- MST From mst at mellanox.co.il Sun May 21 22:19:23 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Mon, 22 May 2006 08:19:23 +0300 Subject: [openib-general] Re: vapi versus openib imm_data In-Reply-To: <20060521233917.GA25201@osc.edu> References: <20060521233917.GA25201@osc.edu> Message-ID: <20060522051923.GB14583@mellanox.co.il> Quoting r. Pete Wyckoff : > is there any good way to tell if the other side has put > its imm_data in network byte order or not? gen2 always assumes imm_data is given in network byte order. -- MST From eitan at mellanox.co.il Sun May 21 23:24:23 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 22 May 2006 09:24:23 +0300 Subject: [openib-general] RE: [PATCH] RFC: opensm: serialize osm_state_mgr_process() Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBC9@mtlexch01.mtl.com> Hi Sasha, The idea to use pthread in OpenSM code is totally wrong. Please stop doing this. We want this code to be shared with Windows and this breaks it. Also please provide a clear RFC for what this patch is trying to do. Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Monday, May 22, 2006 2:02 AM > To: Hal Rosenstock; openib-general at openib.org > Cc: Eitan Zahavi; Yael Kalka; Ofer Gigi; sashak at voltaire.com; elid at voltaire.com > Subject: [PATCH] RFC: opensm: serialize osm_state_mgr_process() > > Hello, > > Please comment (and test). > > Thanks, > Sasha. > > This serializes execution of osm_state_mgr_process() and removes the big > state_lock. This should reduce "locked state" time and prevent potential > dispatcher blocking. > > Other important change here is direct usage of pthread primitives > instead of "traditional" cl_thread* stuff. 
> > Signed-off-by: Sasha Khapyorsky > --- > > osm/include/Makefile.am | 1 > osm/include/opensm/osm_msgdef.h | 15 -- > osm/include/opensm/osm_sm.h | 36 ++++- > osm/include/opensm/osm_state_mgr.h | 4 - > osm/include/opensm/osm_state_mgr_ctrl.h | 236 ------------------------------- > osm/opensm/Makefile.am | 5 - > osm/opensm/osm_helper.c | 1 > osm/opensm/osm_node_info_rcv.c | 5 - > osm/opensm/osm_port_info_rcv.c | 3 > osm/opensm/osm_sm.c | 160 +++++++++++---------- > osm/opensm/osm_sm_mad_ctrl.c | 26 --- > osm/opensm/osm_sm_state_mgr.c | 8 + > osm/opensm/osm_sminfo_rcv.c | 8 + > osm/opensm/osm_state_mgr.c | 24 --- > osm/opensm/osm_state_mgr_ctrl.c | 132 ----------------- > osm/opensm/osm_sw_info_rcv.c | 3 > osm/opensm/osm_sweep_fail_ctrl.c | 4 - > osm/opensm/osm_trap_rcv.c | 3 > osm/opensm/osm_vl15intf.c | 25 --- > 19 files changed, 143 insertions(+), 556 deletions(-) > > diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am > index 2bee762..0af78a0 100644 > --- a/osm/include/Makefile.am > +++ b/osm/include/Makefile.am > @@ -120,7 +120,6 @@ EXTRA_DIST = \ > $(srcdir)/opensm/osm_vl15intf.h \ > $(srcdir)/opensm/osm_drop_mgr.h \ > $(srcdir)/opensm/osm_port_info_rcv.h \ > - $(srcdir)/opensm/osm_state_mgr_ctrl.h \ > $(srcdir)/complib/cl_thread_osd.h \ > $(srcdir)/complib/cl_packon.h \ > $(srcdir)/complib/cl_atomic_osd.h \ > diff --git a/osm/include/opensm/osm_msgdef.h b/osm/include/opensm/osm_msgdef.h > index a1b5743..6956c86 100644 > --- a/osm/include/opensm/osm_msgdef.h > +++ b/osm/include/opensm/osm_msgdef.h > @@ -148,20 +148,6 @@ BEGIN_C_DECLS > * > * SOURCE > ***********/ > -/****d* OpenSM: Dispatcher Messages/OSM_MSG_NO_SMPS_OUTSTANDING > -* NAME > -* OSM_MSG_NO_SMPS_OUTSTANDING > -* > -* DESCRIPTION > -* Message indicating that there are no outstanding SMPs on the subnet. 
> -* > -* NOTES > -* Sent by: osm_mad_ctrl_t > -* Received by: osm_state_mgr_ctrl_t > -* Delivery notice: no > -* > -* SOURCE > -***********/ > enum > { > OSM_MSG_REQ = 0, > @@ -169,7 +155,6 @@ enum > OSM_MSG_MAD_PORT_INFO, > OSM_MSG_MAD_SWITCH_INFO, > OSM_MSG_MAD_NODE_DESC, > - OSM_MSG_NO_SMPS_OUTSTANDING, > OSM_MSG_MAD_NODE_RECORD, > OSM_MSG_MAD_PORTINFO_RECORD, > OSM_MSG_MAD_SERVICE_RECORD, > diff --git a/osm/include/opensm/osm_sm.h b/osm/include/opensm/osm_sm.h > index d6086d4..def43c8 100644 > --- a/osm/include/opensm/osm_sm.h > +++ b/osm/include/opensm/osm_sm.h > @@ -69,7 +69,6 @@ #include #include > #include > #include > -#include > #include > #include > #include > @@ -131,7 +130,6 @@ BEGIN_C_DECLS > typedef struct _osm_sm > { > osm_thread_state_t thread_state; > - cl_event_t signal; > cl_event_t subnet_up_event; > cl_thread_t sweeper; > osm_subn_t *p_subn; > @@ -143,6 +141,9 @@ typedef struct _osm_sm > cl_dispatcher_t *p_disp; > cl_plock_t *p_lock; > atomic32_t sm_trans_id; > + unsigned signal_mask; > + pthread_mutex_t mutex; > + pthread_cond_t cond; > osm_req_t req; > osm_req_ctrl_t req_ctrl; > osm_resp_t resp; > @@ -155,7 +156,6 @@ typedef struct _osm_sm > osm_sm_mad_ctrl_t mad_ctrl; > osm_si_rcv_t si_rcv; > osm_si_rcv_ctrl_t si_rcv_ctrl; > - osm_state_mgr_ctrl_t state_mgr_ctrl; > osm_lid_mgr_t lid_mgr; > osm_ucast_mgr_t ucast_mgr; > osm_link_mgr_t link_mgr; > @@ -387,6 +387,33 @@ osm_sm_init( > * SM object, osm_sm_construct, osm_sm_destroy > *********/ > > +/****f* OpenSM: SM/osm_sm_signal > +* NAME > +* osm_sm_signal > +* > +* DESCRIPTION > +* Forward signal to state engine > +* > +* SYNOPSIS > +*/ > +void > +osm_sm_signal( > + IN osm_sm_t* const p_sm, > + IN osm_signal_t signal ); > +/* > +* PARAMETERS > +* p_sm > +* [in] Pointer to an osm_sm_t object. > +* > +* signal > +* [in] Signal to the state engine. 
> +* > +* NOTES > +* > +* SEE ALSO > +* SM object > +*********/ > + > /****f* OpenSM: SM/osm_sm_sweep > * NAME > * osm_sm_sweep > @@ -404,9 +431,6 @@ osm_sm_sweep( > * p_sm > * [in] Pointer to an osm_sm_t object. > * > -* RETURN VALUES > -* IB_SUCCESS if the sweep completed successfully. > -* > * NOTES > * > * SEE ALSO > diff --git a/osm/include/opensm/osm_state_mgr.h > b/osm/include/opensm/osm_state_mgr.h > index ad4afa0..5e76463 100644 > --- a/osm/include/opensm/osm_state_mgr.h > +++ b/osm/include/opensm/osm_state_mgr.h > @@ -116,7 +116,6 @@ typedef struct _osm_state_mgr > osm_stats_t *p_stats; > struct _osm_sm_state_mgr *p_sm_state_mgr; > const osm_sm_mad_ctrl_t *p_mad_ctrl; > - cl_spinlock_t state_lock; > cl_spinlock_t idle_lock; > cl_qlist_t idle_time_list; > cl_plock_t *p_lock; > @@ -161,9 +160,6 @@ typedef struct _osm_state_mgr > * p_mad_ctrl > * Pointer to the SM's MAD Controller object. > * > -* state_lock > -* Spinlock guarding the state and processes. > -* > * p_lock > * lock guarding the subnet object. > * > diff --git a/osm/include/opensm/osm_state_mgr_ctrl.h > b/osm/include/opensm/osm_state_mgr_ctrl.h > deleted file mode 100644 > index 9ffcfb0..0000000 > --- a/osm/include/opensm/osm_state_mgr_ctrl.h > +++ /dev/null > @@ -1,236 +0,0 @@ > -/* > - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. > - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. > - * > - * This software is available to you under a choice of one of two > - * licenses. 
You may choose to be licensed under the terms of the GNU > - * General Public License (GPL) Version 2, available from the file > - * COPYING in the main directory of this source tree, or the > - * OpenIB.org BSD license below: > - * > - * Redistribution and use in source and binary forms, with or > - * without modification, are permitted provided that the following > - * conditions are met: > - * > - * - Redistributions of source code must retain the above > - * copyright notice, this list of conditions and the following > - * disclaimer. > - * > - * - Redistributions in binary form must reproduce the above > - * copyright notice, this list of conditions and the following > - * disclaimer in the documentation and/or other materials > - * provided with the distribution. > - * > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE > WARRANTIES OF > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT > HOLDERS > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN > AN > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR > IN > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > THE > - * SOFTWARE. > - * > - * $Id$ > - */ > - > - > -/* > - * Abstract: > - * Declaration of osm_state_mgr_ctrl_t. > - * This object represents a controller that receives the > - * State indication after a subnet sweep. > - * This object is part of the OpenSM family of objects. 
> - * > - * Environment: > - * Linux User Mode > - * > - * $Revision: 1.4 $ > - */ > - > -#ifndef _OSM_STATE_MGR_CTRL_H_ > -#define _OSM_STATE_MGR_CTRL_H_ > - > - > -#include > -#include > -#include > -#include > - > -#ifdef __cplusplus > -# define BEGIN_C_DECLS extern "C" { > -# define END_C_DECLS } > -#else /* !__cplusplus */ > -# define BEGIN_C_DECLS > -# define END_C_DECLS > -#endif /* __cplusplus */ > - > -BEGIN_C_DECLS > - > -/****h* OpenSM/State Manager Controller > -* NAME > -* State Manager Controller > -* > -* DESCRIPTION > -* The State Manager Controller object encapsulates the information > -* needed to pass the dispatcher message from the dispatcher > -* to the State Manager. > -* > -* The State Manager Controller object is thread safe. > -* > -* This object should be treated as opaque and should be > -* manipulated only through the provided functions. > -* > -* AUTHOR > -* Steve King, Intel > -* > -*********/ > -/****s* OpenSM: State Manager Controller/osm_state_mgr_ctrl_t > -* NAME > -* osm_state_mgr_ctrl_t > -* > -* DESCRIPTION > -* State Manager Controller structure. > -* > -* This object should be treated as opaque and should > -* be manipulated only through the provided functions. > -* > -* SYNOPSIS > -*/ > -typedef struct _osm_state_mgr_ctrl > -{ > - osm_state_mgr_t *p_mgr; > - osm_log_t *p_log; > - cl_dispatcher_t *p_disp; > - cl_disp_reg_handle_t h_disp; > - > -} osm_state_mgr_ctrl_t; > -/* > -* FIELDS > -* p_mgr > -* Pointer to the State Manager object. > -* > -* p_log > -* Pointer to the log object. > -* > -* p_disp > -* Pointer to the Dispatcher. > -* > -* h_disp > -* Handle returned from dispatcher registration. > -* > -* SEE ALSO > -* State Manager Controller object > -*********/ > - > -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_construct > -* NAME > -* osm_state_mgr_ctrl_construct > -* > -* DESCRIPTION > -* This function constructs a State Manager Controller object. 
> -* > -* SYNOPSIS > -*/ > -void > -osm_state_mgr_ctrl_construct( > - IN osm_state_mgr_ctrl_t* const p_ctrl ); > -/* > -* PARAMETERS > -* p_ctrl > -* [in] Pointer to a State Manager Controller > -* object to construct. > -* > -* RETURN VALUE > -* This function does not return a value. > -* > -* NOTES > -* Allows calling osm_state_mgr_ctrl_init, osm_state_mgr_ctrl_destroy, > -* and osm_state_mgr_ctrl_is_inited. > -* > -* Calling osm_state_mgr_ctrl_construct is a prerequisite to calling any other > -* method except osm_state_mgr_ctrl_init. > -* > -* SEE ALSO > -* State Manager Controller object, osm_state_mgr_ctrl_init, > -* osm_state_mgr_ctrl_destroy > -*********/ > - > -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_destroy > -* NAME > -* osm_state_mgr_ctrl_destroy > -* > -* DESCRIPTION > -* The osm_state_mgr_ctrl_destroy function destroys the object, releasing > -* all resources. > -* > -* SYNOPSIS > -*/ > -void > -osm_state_mgr_ctrl_destroy( > - IN osm_state_mgr_ctrl_t* const p_ctrl ); > -/* > -* PARAMETERS > -* p_ctrl > -* [in] Pointer to the object to destroy. > -* > -* RETURN VALUE > -* This function does not return a value. > -* > -* NOTES > -* Performs any necessary cleanup of the specified > -* State Manager Controller object. > -* Further operations should not be attempted on the destroyed object. > -* This function should only be called after a call to > -* osm_state_mgr_ctrl_construct or osm_state_mgr_ctrl_init. > -* > -* SEE ALSO > -* State Manager Controller object, osm_state_mgr_ctrl_construct, > -* osm_state_mgr_ctrl_init > -*********/ > - > -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_init > -* NAME > -* osm_state_mgr_ctrl_init > -* > -* DESCRIPTION > -* The osm_state_mgr_ctrl_init function initializes a > -* State Manager Controller object for use. 
> -* > -* SYNOPSIS > -*/ > -ib_api_status_t > -osm_state_mgr_ctrl_init( > - IN osm_state_mgr_ctrl_t* const p_ctrl, > - IN osm_state_mgr_t* const p_mgr, > - IN osm_log_t* const p_log, > - IN cl_dispatcher_t* const p_disp ); > -/* > -* PARAMETERS > -* p_ctrl > -* [in] Pointer to an osm_state_mgr_ctrl_t object to initialize. > -* > -* p_mgr > -* [in] Pointer to an osm_state_mgr_t object. > -* > -* p_log > -* [in] Pointer to the log object. > -* > -* p_disp > -* [in] Pointer to the OpenSM central Dispatcher. > -* > -* RETURN VALUES > -* IB_SUCCESS if the State Manager Controller object > -* was initialized successfully. > -* > -* NOTES > -* Allows calling other State Manager Controller methods. > -* > -* SEE ALSO > -* State Manager Controller object, osm_state_mgr_ctrl_construct, > -* osm_state_mgr_ctrl_destroy > -*********/ > - > -END_C_DECLS > - > -#endif /* OSM_STATE_MGR_CTRL_H_ */ > diff --git a/osm/opensm/Makefile.am b/osm/opensm/Makefile.am > index 43fe8c1..7b1060a 100644 > --- a/osm/opensm/Makefile.am > +++ b/osm/opensm/Makefile.am > @@ -78,8 +78,7 @@ opensm_SOURCES = main.c osm_console.c os > osm_slvl_map_rcv.c osm_slvl_map_rcv_ctrl.c \ > osm_sm.c osm_sminfo_rcv.c \ > osm_sminfo_rcv_ctrl.c osm_sm_mad_ctrl.c \ > - osm_sm_state_mgr.c osm_state_mgr.c \ > - osm_state_mgr_ctrl.c osm_subnet.c \ > + osm_sm_state_mgr.c osm_state_mgr.c osm_subnet.c \ > osm_sweep_fail_ctrl.c osm_sw_info_rcv.c \ > osm_sw_info_rcv_ctrl.c osm_switch.c \ > osm_prtn.c osm_prtn_config.c osm_qos.c \ > @@ -104,7 +103,7 @@ # we need to be able to load libraries f > # we always give precedence to local tree libs and then use the pre-installed ones. > opensm_LDADD = -L../complib -L../libvendor -L. 
$(OSMV_LDADD) -lopensm - > losmcomp -losmvendor > > -opensm_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread > +opensm_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread -lrt > > opensmincludedir = $(includedir)/infiniband/opensm > > diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c > index 3886609..a966bbe 100644 > --- a/osm/opensm/osm_helper.c > +++ b/osm/opensm/osm_helper.c > @@ -1895,7 +1895,6 @@ static const char* const __osm_disp_msg_ > "OSM_MSG_MAD_PORT_INFO,", > "OSM_MSG_MAD_SWITCH_INFO", > "OSM_MSG_MAD_NODE_DESC", > - "OSM_MSG_NO_SMPS_OUTSTANDING", > "OSM_MSG_MAD_NODE_RECORD", > "OSM_MSG_MAD_PORTINFO_RECORD", > "OSM_MSG_MAD_SERVICE_RECORD", > diff --git a/osm/opensm/osm_node_info_rcv.c b/osm/opensm/osm_node_info_rcv.c > index 59257a0..7ea4366 100644 > --- a/osm/opensm/osm_node_info_rcv.c > +++ b/osm/opensm/osm_node_info_rcv.c > @@ -69,6 +69,7 @@ #include > #include > #include > #include > +#include > > > /********************************************************************** > @@ -1088,11 +1089,11 @@ osm_ni_rcv_process( > > /* > * If we processed a new node - need to signal to the state_mgr that > - * change detected. BUT - we cannot call the osm_state_mgr_process > + * change detected. BUT - we cannot call the osm_sm_signal > * from within the lock of p_rcv->p_lock (can cause a deadlock). 
> */ > if ( process_new_flag ) > - osm_state_mgr_process( p_rcv->p_state_mgr, > OSM_SIGNAL_CHANGE_DETECTED ); > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > OSM_SIGNAL_CHANGE_DETECTED ); > > Exit: > OSM_LOG_EXIT( p_rcv->p_log ); > diff --git a/osm/opensm/osm_port_info_rcv.c b/osm/opensm/osm_port_info_rcv.c > index a08c57c..7405ef0 100644 > --- a/osm/opensm/osm_port_info_rcv.c > +++ b/osm/opensm/osm_port_info_rcv.c > @@ -69,6 +69,7 @@ #include > #include > #include > #include > +#include > > /********************************************************************** > **********************************************************************/ > @@ -701,7 +702,7 @@ osm_pi_rcv_process( > " port = %u, Commencing heavy sweep\n", > cl_ntoh64( node_guid ), > cl_ntoh64( port_guid ) ); > - osm_state_mgr_process( p_rcv->p_state_mgr, > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > OSM_SIGNAL_CHANGE_DETECTED ); > goto Exit; > } > diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c > index 0e09f26..5b5eb3f 100644 > --- a/osm/opensm/osm_sm.c > +++ b/osm/opensm/osm_sm.c > @@ -55,6 +55,8 @@ #if HAVE_CONFIG_H > # include > #endif /* HAVE_CONFIG_H */ > > +#include > +#include > #include > #include > #include > @@ -79,53 +81,65 @@ void > __osm_sm_sweeper( > IN void *p_ptr ) > { > - ib_api_status_t status; > osm_sm_t *const p_sm = ( osm_sm_t * ) p_ptr; > + unsigned i, signals; > > OSM_LOG_ENTER( p_sm->p_log, __osm_sm_sweeper ); > > - if( p_sm->thread_state == OSM_THREAD_STATE_INIT ) > - { > - p_sm->thread_state = OSM_THREAD_STATE_RUN; > - } > - > - /* If the sweep interval was updated before - then run only if > - * it is not zero. 
*/ > - while( p_sm->thread_state == OSM_THREAD_STATE_RUN && > - p_sm->p_subn->opt.sweep_interval != 0 ) > - { > - /* do the sweep only if we are in MASTER state */ > - if( p_sm->p_subn->sm_state == IB_SMINFO_STATE_MASTER || > - p_sm->p_subn->sm_state == IB_SMINFO_STATE_DISCOVERING ) > - osm_state_mgr_process( &p_sm->state_mgr, OSM_SIGNAL_SWEEP ); > + do { > + signals = 0; > + pthread_mutex_lock(&p_sm->mutex); > + if (p_sm->signal_mask == 0) > + pthread_cond_wait(&p_sm->cond, &p_sm->mutex); > + signals = p_sm->signal_mask; > + p_sm->signal_mask = 0; > + pthread_mutex_unlock(&p_sm->mutex); > + for (i = 0 ; signals ; i++) { > + if (signals&1) > + osm_state_mgr_process( &p_sm->state_mgr, i); > + signals >>= 1; > + } > + } while (p_sm->thread_state == OSM_THREAD_STATE_RUN); > > - /* > - * Wait on the event with a timeout. > - * Sweeps may be iniated "off schedule" by simply > - * signaling the event. > - */ > - status = cl_event_wait_on( &p_sm->signal, > - p_sm->p_subn->opt.sweep_interval * 1000000, > - TRUE ); > + OSM_LOG_EXIT( p_sm->p_log ); > +} > > - if( status == CL_SUCCESS ) > - { > - if( osm_log_is_active( p_sm->p_log, OSM_LOG_DEBUG ) ) > - { > - osm_log( p_sm->p_log, OSM_LOG_DEBUG, > - "__osm_sm_sweeper: " "Off schedule sweep signalled\n" ); > - } > +/********************************************************************** > + **********************************************************************/ > +void > +__osm_sm_sweeper_periodic( > + IN void *p_ptr ) > +{ > + struct timespec times; > + osm_sm_t *const p_sm = ( osm_sm_t * ) p_ptr; > + unsigned i, signals; > + int ret; > + > + OSM_LOG_ENTER( p_sm->p_log, __osm_sm_sweeper_periodic ); > + > + do { > + clock_gettime(CLOCK_REALTIME, &times); > + times.tv_sec += p_sm->p_subn->opt.sweep_interval; > + signals = 0; > + pthread_mutex_lock(&p_sm->mutex); > + if (p_sm->signal_mask == 0 && > + (ret = pthread_cond_timedwait(&p_sm->cond, &p_sm->mutex, > + &times)) == ETIMEDOUT && > + ( p_sm->p_subn->sm_state == IB_SMINFO_STATE_MASTER
|| > + p_sm->p_subn->sm_state == IB_SMINFO_STATE_DISCOVERING )) { > + signals = OSM_SIGNAL_SWEEP; > } > - else > - { > - if( status != CL_TIMEOUT ) > - { > - osm_log( p_sm->p_log, OSM_LOG_ERROR, > - "__osm_sm_sweeper: ERR 2E01: " > - "Event wait failed (%s)\n", CL_STATUS_MSG( status ) ); > - } > + else { > + signals = p_sm->signal_mask; > + p_sm->signal_mask = 0; > } > - } > + pthread_mutex_unlock(&p_sm->mutex); > + for (i = 0 ; signals ; i++) { > + if (signals&1) > + osm_state_mgr_process( &p_sm->state_mgr, i); > + signals >>= 1; > + } > + } while (p_sm->thread_state == OSM_THREAD_STATE_RUN); > > OSM_LOG_EXIT( p_sm->p_log ); > } > @@ -139,7 +153,6 @@ osm_sm_construct( > memset( p_sm, 0, sizeof( *p_sm ) ); > p_sm->thread_state = OSM_THREAD_STATE_NONE; > p_sm->sm_trans_id = OSM_SM_INITIAL_TID_VALUE; > - cl_event_construct( &p_sm->signal ); > cl_event_construct( &p_sm->subnet_up_event ); > cl_thread_construct( &p_sm->sweeper ); > osm_req_construct( &p_sm->req ); > @@ -158,7 +171,6 @@ osm_sm_construct( > osm_ucast_mgr_construct( &p_sm->ucast_mgr ); > osm_link_mgr_construct( &p_sm->link_mgr ); > osm_state_mgr_construct( &p_sm->state_mgr ); > - osm_state_mgr_ctrl_construct( &p_sm->state_mgr_ctrl ); > osm_drop_mgr_construct( &p_sm->drop_mgr ); > osm_lft_rcv_construct( &p_sm->lft_rcv ); > osm_lft_rcv_ctrl_construct( &p_sm->lft_rcv_ctrl ); > @@ -185,24 +197,14 @@ void > osm_sm_shutdown( > IN osm_sm_t * const p_sm ) > { > - boolean_t signal_event = FALSE; > - > OSM_LOG_ENTER( p_sm->p_log, osm_sm_shutdown ); > > /* > * Signal our threads that we're leaving. > - */ > - if( p_sm->thread_state != OSM_THREAD_STATE_NONE ) > - signal_event = TRUE; > - > - p_sm->thread_state = OSM_THREAD_STATE_EXIT; > - > - /* > - * Don't trigger unless event has been initialized. > * Destroy the thread before we tear down the other objects. 
> */ > - if( signal_event ) > - cl_event_signal( &p_sm->signal ); > + p_sm->thread_state = OSM_THREAD_STATE_EXIT; > + osm_sm_signal( p_sm, OSM_SIGNAL_NONE ); > > cl_thread_destroy( &p_sm->sweeper ); > > @@ -225,7 +227,6 @@ osm_sm_shutdown( > osm_vla_rcv_ctrl_destroy( &p_sm->vla_rcv_ctrl ); > osm_pkey_rcv_ctrl_destroy( &p_sm->pkey_rcv_ctrl ); > osm_sweep_fail_ctrl_destroy( &p_sm->sweep_fail_ctrl ); > - osm_state_mgr_ctrl_destroy( &p_sm->state_mgr_ctrl ); > > OSM_LOG_EXIT( p_sm->p_log ); > } > @@ -257,12 +258,14 @@ osm_sm_destroy( > osm_state_mgr_destroy( &p_sm->state_mgr ); > osm_sm_state_mgr_destroy( &p_sm->sm_state_mgr ); > osm_mcast_mgr_destroy( &p_sm->mcast_mgr ); > - cl_event_destroy( &p_sm->signal ); > cl_event_destroy( &p_sm->subnet_up_event ); > > if( p_sm->p_report_buf != NULL ) > cl_free( p_sm->p_report_buf ); > > + pthread_cond_destroy(&p_sm->cond); > + pthread_mutex_destroy(&p_sm->mutex); > + > osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format Waived */ > OSM_LOG_EXIT( p_sm->p_log ); > } > @@ -303,14 +306,15 @@ osm_sm_init( > status = IB_INSUFFICIENT_MEMORY; > goto Exit; > } > - status = cl_event_init( &p_sm->signal, FALSE ); > - if( status != CL_SUCCESS ) > - goto Exit; > > status = cl_event_init( &p_sm->subnet_up_event, FALSE ); > if( status != CL_SUCCESS ) > goto Exit; > > + p_sm->signal_mask = 0; > + pthread_mutex_init(&p_sm->mutex, NULL); > + pthread_cond_init(&p_sm->cond, NULL); > + > status = osm_sm_mad_ctrl_init( &p_sm->mad_ctrl, > p_sm->p_subn, > p_sm->p_mad_pool, > @@ -416,12 +420,6 @@ osm_sm_init( > if( status != IB_SUCCESS ) > goto Exit; > > - status = osm_state_mgr_ctrl_init( &p_sm->state_mgr_ctrl, > - &p_sm->state_mgr, > - p_sm->p_log, p_sm->p_disp ); > - if( status != IB_SUCCESS ) > - goto Exit; > - > status = osm_drop_mgr_init( &p_sm->drop_mgr, > p_sm->p_subn, > p_sm->p_log, &p_sm->req, p_sm->p_lock ); > @@ -523,16 +521,15 @@ osm_sm_init( > > /* > * Now that the component objects are initialized, start > - * the sweeper 
thread if the user wants sweeping. > + * the sweeper thread. > */ > - if( p_sm->p_subn->opt.sweep_interval ) > - { > - p_sm->thread_state = OSM_THREAD_STATE_INIT; > - status = cl_thread_init( &p_sm->sweeper, __osm_sm_sweeper, p_sm, > - "opensm sweeper" ); > - if( status != IB_SUCCESS ) > - goto Exit; > - } > + p_sm->thread_state = OSM_THREAD_STATE_RUN; > + status = cl_thread_init( &p_sm->sweeper, > + p_sm->p_subn->opt.sweep_interval > 0 ? > + __osm_sm_sweeper_periodic : __osm_sm_sweeper, > + p_sm, "opensm sweeper" ); > + if( status != IB_SUCCESS ) > + goto Exit; > > Exit: > OSM_LOG_EXIT( p_log ); > @@ -542,11 +539,26 @@ osm_sm_init( > /********************************************************************** > **********************************************************************/ > void > +osm_sm_signal( > + IN osm_sm_t* const p_sm, > + IN osm_signal_t signal ) > +{ > + OSM_LOG_ENTER( p_sm->p_log, osm_sm_signal ); > + pthread_mutex_lock(&p_sm->mutex); > + p_sm->signal_mask |= (1 << signal); > + pthread_cond_signal(&p_sm->cond); > + pthread_mutex_unlock(&p_sm->mutex); > + OSM_LOG_EXIT( p_sm->p_log ); > +} > + > +/********************************************************************** > + **********************************************************************/ > +void > osm_sm_sweep( > IN osm_sm_t * const p_sm ) > { > OSM_LOG_ENTER( p_sm->p_log, osm_sm_sweep ); > - osm_state_mgr_process( &p_sm->state_mgr, OSM_SIGNAL_SWEEP ); > + osm_sm_signal( p_sm, OSM_SIGNAL_SWEEP ); > OSM_LOG_EXIT( p_sm->p_log ); > } > > diff --git a/osm/opensm/osm_sm_mad_ctrl.c b/osm/opensm/osm_sm_mad_ctrl.c > index 9dceef2..1982873 100644 > --- a/osm/opensm/osm_sm_mad_ctrl.c > +++ b/osm/opensm/osm_sm_mad_ctrl.c > @@ -81,7 +81,6 @@ __osm_sm_mad_ctrl_retire_trans_mad( > IN osm_madw_t* const p_madw ) > { > uint32_t outstanding; > - cl_status_t status; > > OSM_LOG_ENTER( p_ctrl->p_log, __osm_sm_mad_ctrl_retire_trans_mad ); > > @@ -115,31 +114,10 @@ __osm_sm_mad_ctrl_retire_trans_mad( > The wire is 
clean. > Signal the state manager. > */ > - if( osm_log_is_active( p_ctrl->p_log, OSM_LOG_DEBUG ) ) > - { > - osm_log( p_ctrl->p_log, OSM_LOG_DEBUG, > - "__osm_sm_mad_ctrl_retire_trans_mad: " > - "Posting Dispatcher message %s\n", > - osm_get_disp_msg_str( OSM_MSG_NO_SMPS_OUTSTANDING ) ); > - } > - > - status = cl_disp_post( p_ctrl->h_disp, > - OSM_MSG_NO_SMPS_OUTSTANDING, > - (void *)OSM_SIGNAL_NO_PENDING_TRANSACTIONS, > - NULL, > - NULL ); > - > - if( status != CL_SUCCESS ) > - { > - osm_log( p_ctrl->p_log, OSM_LOG_ERROR, > - "__osm_sm_mad_ctrl_retire_trans_mad: ERR 3101: " > - "Dispatcher post message failed (%s)\n", > - CL_STATUS_MSG( status ) ); > - goto Exit; > - } > + osm_sm_signal( &p_ctrl->p_subn->p_osm->sm, > + OSM_SIGNAL_NO_PENDING_TRANSACTIONS ); > } > > - Exit: > OSM_LOG_EXIT( p_ctrl->p_log ); > } > /************/ > diff --git a/osm/opensm/osm_sm_state_mgr.c b/osm/opensm/osm_sm_state_mgr.c > index feeda45..2c6da4e 100644 > --- a/osm/opensm/osm_sm_state_mgr.c > +++ b/osm/opensm/osm_sm_state_mgr.c > @@ -563,7 +563,7 @@ osm_sm_state_mgr_process( > /* > * Stop the discovering > */ > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED ); > break; > case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE: > @@ -610,7 +610,7 @@ osm_sm_state_mgr_process( > __osm_sm_state_mgr_discovering_msg( p_sm_mgr ); > p_sm_mgr->p_subn->sm_state = IB_SMINFO_STATE_DISCOVERING; > p_sm_mgr->p_subn->coming_out_of_standby = TRUE; > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, OSM_SIGNAL_EXIT_STBY > ); > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > OSM_SIGNAL_EXIT_STBY ); > break; > case OSM_SM_SIGNAL_DISABLE: > /* > @@ -641,7 +641,7 @@ osm_sm_state_mgr_process( > */ > p_sm_mgr->p_subn->master_sm_base_lid = p_sm_mgr->p_subn->sm_base_lid; > p_sm_mgr->p_subn->coming_out_of_standby = TRUE; > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, OSM_SIGNAL_EXIT_STBY > ); > + osm_sm_signal( 
&p_sm_mgr->p_subn->p_osm->sm, > OSM_SIGNAL_EXIT_STBY ); > break; > case OSM_SM_SIGNAL_ACKNOWLEDGE: > /* > @@ -704,7 +704,7 @@ osm_sm_state_mgr_process( > "Received OSM_SM_SIGNAL_HANDOVER\n" ); > p_sm_mgr->p_polling_sm = NULL; > p_sm_mgr->p_subn->force_immediate_heavy_sweep = TRUE; > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, OSM_SIGNAL_SWEEP ); > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, OSM_SIGNAL_SWEEP ); > break; > case OSM_SM_SIGNAL_HANDOVER_SENT: > /* > diff --git a/osm/opensm/osm_sminfo_rcv.c b/osm/opensm/osm_sminfo_rcv.c > index 5914984..4af549b 100644 > --- a/osm/opensm/osm_sminfo_rcv.c > +++ b/osm/opensm/osm_sminfo_rcv.c > @@ -425,10 +425,10 @@ __osm_sminfo_rcv_process_set_request( > } > > /********************************************************************** > - * Return a signal with which to call the osm_state_mgr_process. > + * Return a signal with which to call the osm_sm_signal. > * This is done since we are locked by p_rcv->p_lock in this function, > - * and thus cannot call osm_state_mgr_process (that locks the state_lock). > - * If return OSM_SIGNAL_NONE - do not call osm_state_mgr_process. > + * and thus cannot call osm_sm_signal. > + * If return OSM_SIGNAL_NONE - do not call osm_sm_signal. > **********************************************************************/ > osm_signal_t > __osm_sminfo_rcv_process_get_sm( > @@ -676,7 +676,7 @@ __osm_sminfo_rcv_process_get_response( > /* If process_get_sm_ret_val != OSM_SIGNAL_NONE then we have to signal > * to the state_mgr with that signal. 
*/ > if (process_get_sm_ret_val != OSM_SIGNAL_NONE) > - osm_state_mgr_process( p_rcv->p_state_mgr, > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > process_get_sm_ret_val ); > OSM_LOG_EXIT( p_rcv->p_log ); > } > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > index 724b2b7..ff1c65c 100644 > --- a/osm/opensm/osm_state_mgr.c > +++ b/osm/opensm/osm_state_mgr.c > @@ -83,7 +83,6 @@ osm_state_mgr_construct( > IN osm_state_mgr_t * const p_mgr ) > { > memset( p_mgr, 0, sizeof( *p_mgr ) ); > - cl_spinlock_construct( &p_mgr->state_lock ); > cl_spinlock_construct( &p_mgr->idle_lock ); > p_mgr->state = OSM_SM_STATE_INIT; > } > @@ -99,7 +98,6 @@ osm_state_mgr_destroy( > OSM_LOG_ENTER( p_mgr->p_log, osm_state_mgr_destroy ); > > /* destroy the locks */ > - cl_spinlock_destroy( &p_mgr->state_lock ); > cl_spinlock_destroy( &p_mgr->idle_lock ); > > OSM_LOG_EXIT( p_mgr->p_log ); > @@ -162,14 +160,6 @@ osm_state_mgr_init( > p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS; > p_mgr->next_stage_signal = OSM_SIGNAL_NONE; > > - status = cl_spinlock_init( &p_mgr->state_lock ); > - if( status != CL_SUCCESS ) > - { > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, > - "osm_state_mgr_init: ERR 3301: " > - "Spinlock init failed (%s)\n", CL_STATUS_MSG( status ) ); > - } > - > cl_qlist_init( &p_mgr->idle_time_list ); > > status = cl_spinlock_init( &p_mgr->idle_lock ); > @@ -1897,16 +1887,6 @@ osm_state_mgr_process( > if( osm_exit_flag ) > signal = OSM_SIGNAL_NONE; > > - /* > - * The state lock prevents many race conditions from screwing > - * up the state transition process. For example, if an function > - * puts transactions on the wire, the state lock guarantees this > - * loop will see the return code ("DONE PENDING") of the function > - * before the "NO OUTSTANDING TRANSACTIONS" signal is asynchronously > - * received. 
> - */ > - cl_spinlock_acquire( &p_mgr->state_lock ); > - > while( signal != OSM_SIGNAL_NONE ) > { > if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) > @@ -2957,8 +2937,6 @@ osm_state_mgr_process( > p_mgr->state_step_mode = OSM_STATE_STEP_BREAK; > } > > - cl_spinlock_release( &p_mgr->state_lock ); > - > OSM_LOG_EXIT( p_mgr->p_log ); > } > > @@ -2994,7 +2972,7 @@ osm_state_mgr_process_idle( > cl_qlist_insert_tail( &p_mgr->idle_time_list, &p_idle_item->list_item ); > cl_spinlock_release( &p_mgr->idle_lock ); > > - osm_state_mgr_process( p_mgr, OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST > ); > + osm_sm_signal( &p_mgr->p_subn->p_osm->sm, > OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST ); > > OSM_LOG_EXIT( p_mgr->p_log ); > > diff --git a/osm/opensm/osm_state_mgr_ctrl.c b/osm/opensm/osm_state_mgr_ctrl.c > deleted file mode 100644 > index 0bde333..0000000 > --- a/osm/opensm/osm_state_mgr_ctrl.c > +++ /dev/null > @@ -1,132 +0,0 @@ > -/* > - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. > - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. > - * > - * This software is available to you under a choice of one of two > - * licenses. You may choose to be licensed under the terms of the GNU > - * General Public License (GPL) Version 2, available from the file > - * COPYING in the main directory of this source tree, or the > - * OpenIB.org BSD license below: > - * > - * Redistribution and use in source and binary forms, with or > - * without modification, are permitted provided that the following > - * conditions are met: > - * > - * - Redistributions of source code must retain the above > - * copyright notice, this list of conditions and the following > - * disclaimer. 
> - * > - * - Redistributions in binary form must reproduce the above > - * copyright notice, this list of conditions and the following > - * disclaimer in the documentation and/or other materials > - * provided with the distribution. > - * > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE > WARRANTIES OF > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT > HOLDERS > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN > AN > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR > IN > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > THE > - * SOFTWARE. > - * > - * $Id$ > - */ > - > - > -/* > - * Abstract: > - * Implementation of osm_state_mgr_ctrl_t. > - * This object represents the State Manager Controller object. > - * This object is part of the opensm family of objects. > - * > - * Environment: > - * Linux User Mode > - * > - * $Revision: 1.5 $ > - */ > - > -/* > - Next available error code: 0x1601 > -*/ > - > -#if HAVE_CONFIG_H > -# include > -#endif /* HAVE_CONFIG_H */ > - > -#include > -#include > -#include > - > -/********************************************************************** > - **********************************************************************/ > -void > -__osm_state_mgr_ctrl_disp_callback( > - IN void *context, > - IN void *p_data ) > -{ > - /* ignore return status when invoked via the dispatcher */ > - osm_state_mgr_process( ((osm_state_mgr_ctrl_t*)context)->p_mgr, > - (osm_signal_t)(p_data) ); > -} > - > -/********************************************************************** > - **********************************************************************/ > -void > -osm_state_mgr_ctrl_construct( > - IN osm_state_mgr_ctrl_t* const p_ctrl ) > -{ > - memset( p_ctrl, 0, sizeof(*p_ctrl) ); > - p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; > -} > - > 
-/********************************************************************** > - **********************************************************************/ > -void > -osm_state_mgr_ctrl_destroy( > - IN osm_state_mgr_ctrl_t* const p_ctrl ) > -{ > - CL_ASSERT( p_ctrl ); > - cl_disp_unregister( p_ctrl->h_disp ); > -} > - > -/********************************************************************** > - **********************************************************************/ > -ib_api_status_t > -osm_state_mgr_ctrl_init( > - IN osm_state_mgr_ctrl_t* const p_ctrl, > - IN osm_state_mgr_t* const p_mgr, > - IN osm_log_t* const p_log, > - IN cl_dispatcher_t* const p_disp ) > -{ > - ib_api_status_t status = IB_SUCCESS; > - > - OSM_LOG_ENTER( p_log, osm_state_mgr_ctrl_init ); > - > - osm_state_mgr_ctrl_construct( p_ctrl ); > - p_ctrl->p_log = p_log; > - > - p_ctrl->p_mgr = p_mgr; > - p_ctrl->p_disp = p_disp; > - > - p_ctrl->h_disp = cl_disp_register( > - p_disp, > - OSM_MSG_NO_SMPS_OUTSTANDING, > - __osm_state_mgr_ctrl_disp_callback, > - p_ctrl ); > - > - if( p_ctrl->h_disp == CL_DISP_INVALID_HANDLE ) > - { > - osm_log( p_log, OSM_LOG_ERROR, > - "osm_state_mgr_ctrl_init: ERR 3401: " > - "Dispatcher registration failed\n" ); > - status = IB_INSUFFICIENT_RESOURCES; > - goto Exit; > - } > - > - Exit: > - OSM_LOG_EXIT( p_log ); > - return( status ); > -} > - > diff --git a/osm/opensm/osm_sw_info_rcv.c b/osm/opensm/osm_sw_info_rcv.c > index 6bbd73a..61aff27 100644 > --- a/osm/opensm/osm_sw_info_rcv.c > +++ b/osm/opensm/osm_sw_info_rcv.c > @@ -60,6 +60,7 @@ #include > #include > #include > #include > +#include > #include > > /********************************************************************** > @@ -673,7 +674,7 @@ osm_si_rcv_process( > if (__osm_si_rcv_process_existing( p_rcv, p_node, p_sw, p_madw )) > { > CL_PLOCK_RELEASE( p_rcv->p_lock ); > - osm_state_mgr_process( p_rcv->p_state_mgr, > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > OSM_SIGNAL_CHANGE_DETECTED ); > goto Exit; > } > diff 
--git a/osm/opensm/osm_sweep_fail_ctrl.c b/osm/opensm/osm_sweep_fail_ctrl.c > index e27a540..9e41ec7 100644 > --- a/osm/opensm/osm_sweep_fail_ctrl.c > +++ b/osm/opensm/osm_sweep_fail_ctrl.c > @@ -52,6 +52,8 @@ #endif /* HAVE_CONFIG_H */ > #include > #include > #include > +#include > +#include > > /********************************************************************** > **********************************************************************/ > @@ -68,7 +70,7 @@ __osm_sweep_fail_ctrl_disp_callback( > /* > Notify the state manager that we had a light sweep failure. > */ > - osm_state_mgr_process( p_ctrl->p_state_mgr, > + osm_sm_signal( &p_ctrl->p_state_mgr->p_subn->p_osm->sm, > OSM_SIGNAL_LIGHT_SWEEP_FAIL ); > > OSM_LOG_EXIT( p_ctrl->p_log ); > diff --git a/osm/opensm/osm_trap_rcv.c b/osm/opensm/osm_trap_rcv.c > index 9865f53..fb32ce9 100644 > --- a/osm/opensm/osm_trap_rcv.c > +++ b/osm/opensm/osm_trap_rcv.c > @@ -589,8 +589,7 @@ __osm_trap_rcv_process_request( > > p_rcv->p_subn->force_immediate_heavy_sweep = TRUE; > } > - osm_state_mgr_process( p_rcv->p_state_mgr, > - OSM_SIGNAL_SWEEP ); > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, OSM_SIGNAL_SWEEP ); > } > > /* If we reached here due to trap 129/130/131 - do not need to do > diff --git a/osm/opensm/osm_vl15intf.c b/osm/opensm/osm_vl15intf.c > index 68f17c5..c3adb6e 100644 > --- a/osm/opensm/osm_vl15intf.c > +++ b/osm/opensm/osm_vl15intf.c > @@ -62,6 +62,7 @@ #include > #include > #include > #include > +#include > #include > #include > > @@ -156,7 +157,6 @@ __osm_vl15_poller( > if( status != IB_SUCCESS ) > { > uint32_t outstanding; > - cl_status_t cl_status; > > osm_log( p_vl->p_log, OSM_LOG_ERROR, > "__osm_vl15_poller: ERR 3E03: " > @@ -202,27 +202,8 @@ __osm_vl15_poller( > The wire is clean. > Signal the state manager. 
> */ > - if( osm_log_is_active( p_vl->p_log, OSM_LOG_DEBUG ) ) > - { > - osm_log( p_vl->p_log, OSM_LOG_DEBUG, > - "__osm_vl15_poller: " > - "Posting Dispatcher message %s\n", > - osm_get_disp_msg_str( OSM_MSG_NO_SMPS_OUTSTANDING ) ); > - } > - > - cl_status = cl_disp_post( p_vl->h_disp, > - OSM_MSG_NO_SMPS_OUTSTANDING, > - (void *)OSM_SIGNAL_NO_PENDING_TRANSACTIONS, > - NULL, > - NULL ); > - > - if( cl_status != CL_SUCCESS ) > - { > - osm_log( p_vl->p_log, OSM_LOG_ERROR, > - "__osm_vl15_poller: ERR 3E06: " > - "Dispatcher post message failed (%s)\n", > - CL_STATUS_MSG( cl_status ) ); > - } > + osm_sm_signal( &p_vl->p_subn->p_osm->sm, > + OSM_SIGNAL_NO_PENDING_TRANSACTIONS ); > } > } > } From eitan at mellanox.co.il Sun May 21 23:27:03 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 22 May 2006 09:27:03 +0300 Subject: [openib-general] RE: [PATCH] opensm: remove osm_pkey_mgr.h Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBCA@mtlexch01.mtl.com> Hi Sasha, Every OpenSM manager has an H file. Instead of trying to "save" lines of code - please focus on improving code readability and structure. NOTE: this is not kernel code. The tradeoff for user land code is different. If you save us one header file - but break the structure of the code you make more damage than good. Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Monday, May 22, 2006 1:16 AM > To: Hal Rosenstock > Cc: openib-general at openib.org; Eitan Zahavi; Yael Kalka; Ofer Gigi > Subject: [PATCH] opensm: remove osm_pkey_mgr.h > > > Since we expect that osm_pkey_mgr_process() will be called only from > osm_state_mgr_process() this patch replaces osm_pkey_mgr.h header file > by local prototype. 
> > Signed-off-by: Sasha Khapyorsky > --- > > osm/include/Makefile.am | 1 > osm/include/opensm/osm_pkey_mgr.h | 92 ------------------------------------- > osm/opensm/osm_pkey_mgr.c | 1 > osm/opensm/osm_state_mgr.c | 3 + > 4 files changed, 2 insertions(+), 95 deletions(-) > > diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am > index b23b1de..2bee762 100644 > --- a/osm/include/Makefile.am > +++ b/osm/include/Makefile.am > @@ -96,7 +96,6 @@ EXTRA_DIST = \ > $(srcdir)/opensm/st.h \ > $(srcdir)/opensm/osm_mcast_tbl.h \ > $(srcdir)/opensm/osm_pkey.h \ > - $(srcdir)/opensm/osm_pkey_mgr.h \ > $(srcdir)/opensm/osm_sa_mad_ctrl.h \ > $(srcdir)/opensm/osm_req_ctrl.h \ > $(srcdir)/opensm/osm_sw_info_rcv.h \ > diff --git a/osm/include/opensm/osm_pkey_mgr.h > b/osm/include/opensm/osm_pkey_mgr.h > deleted file mode 100644 > index cb0075d..0000000 > --- a/osm/include/opensm/osm_pkey_mgr.h > +++ /dev/null > @@ -1,92 +0,0 @@ > -/* > - * Copyright (c) 2006 Voltaire, Inc. All rights reserved. > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. > - * > - * This software is available to you under a choice of one of two > - * licenses. You may choose to be licensed under the terms of the GNU > - * General Public License (GPL) Version 2, available from the file > - * COPYING in the main directory of this source tree, or the > - * OpenIB.org BSD license below: > - * > - * Redistribution and use in source and binary forms, with or > - * without modification, are permitted provided that the following > - * conditions are met: > - * > - * - Redistributions of source code must retain the above > - * copyright notice, this list of conditions and the following > - * disclaimer. > - * > - * - Redistributions in binary form must reproduce the above > - * copyright notice, this list of conditions and the following > - * disclaimer in the documentation and/or other materials > - * provided with the distribution. 
> - * > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE > WARRANTIES OF > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT > HOLDERS > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN > AN > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR > IN > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > THE > - * SOFTWARE. > - * > - * $Id$ > - */ > - > - > -/* > - * Abstract: > - * Prototype for osm_pkey_mgr_process() function > - * This is part of the OpenSM family of objects. > - * > - * Environment: > - * Linux User Mode > - * > - * $Revision: 1.4 $ > - */ > - > - > -#ifndef _OSM_PKEY_MGR_H_ > -#define _OSM_PKEY_MGR_H_ > - > -#include > -#include > - > -#ifdef __cplusplus > -# define BEGIN_C_DECLS extern "C" { > -# define END_C_DECLS } > -#else /* !__cplusplus */ > -# define BEGIN_C_DECLS > -# define END_C_DECLS > -#endif /* __cplusplus */ > - > -BEGIN_C_DECLS > - > -/****f* OpenSM: P_Key Manager/osm_pkey_mgr_process > -* NAME > -* osm_pkey_mgr_process > -* > -* DESCRIPTION > -* This function enforces the pkey rules on the SM DB. > -* > -* SYNOPSIS > -*/ > -osm_signal_t > -osm_pkey_mgr_process( > - IN osm_opensm_t *p_osm ); > -/* > -* PARAMETERS > -* p_osm > -* [in] Pointer to an osm_opensm_t object. 
> -* > -* RETURN VALUES > -* None > -* > -* NOTES > -* > -* SEE ALSO > -*********/ > - > -END_C_DECLS > - > -#endif /* _OSM_PKEY_MGR_H_ */ > diff --git a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > index e08b7cc..91c1a95 100644 > --- a/osm/opensm/osm_pkey_mgr.c > +++ b/osm/opensm/osm_pkey_mgr.c > @@ -56,7 +56,6 @@ #include > #include > #include > #include > -#include > #include > #include > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > index 42fd5e8..724b2b7 100644 > --- a/osm/opensm/osm_state_mgr.c > +++ b/osm/opensm/osm_state_mgr.c > @@ -66,14 +66,15 @@ #include > #include > #include > #include > -#include > #include > #include > #include > > /********************************************************************** > + * Prototypes for manager processors used locally > **********************************************************************/ > osm_signal_t osm_qos_setup(IN osm_opensm_t * p_osm); > +osm_signal_t osm_pkey_mgr_process(IN osm_opensm_t * p_osm); > > /********************************************************************** > **********************************************************************/ From jackm at mellanox.co.il Sun May 21 23:30:21 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Mon, 22 May 2006 09:30:21 +0300 Subject: [openib-general] Re: vapi versus openib imm_data In-Reply-To: <20060522051923.GB14583@mellanox.co.il> References: <20060521233917.GA25201@osc.edu> <20060522051923.GB14583@mellanox.co.il> Message-ID: <200605220930.22274.jackm@mellanox.co.il> On Monday 22 May 2006 08:19, Michael S. Tsirkin wrote: > Quoting r. Pete Wyckoff : > > is there any good way to tell if the other side has put > > its imm_data in network byte order or not? > > gen2 always assumes imm_data is given in network byte order. VAPI assumes that imm_data is given (i.e., supplied to the API) in host byte order. 
If the host is a little-endian host (as PCs are), the mlxhh (i.e., inner) layer will convert immediate data to network byte order on the send, and will convert received immediate data from network byte order to host byte order on receive -- and the VAPI caller will receive the immediate data in host byte order. - Jack From sashak at voltaire.com Mon May 22 01:12:15 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 May 2006 11:12:15 +0300 Subject: [openib-general] Re: [PATCH] RFC: opensm: serialize osm_state_mgr_process() In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBC9@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBC9@mtlexch01.mtl.com> Message-ID: <20060522081215.GB30176@sashak.voltaire.com> Hi Eitan, Thanks for the comments. On 09:24 Mon 22 May , Eitan Zahavi wrote: > > The idea to use pthread in OpenSM code is totally wrong. > Please stop doing this. We want this code to be shared with Windows and > this breaks it. The problem is that complib does not provide the needed primitives, mostly in the synchronization and thread management areas (like pthread_cond_wait() and friends, pthread_cancel() and friends). Extending complib does not look very helpful for Windows either - complib uses pthreads as its backend anyway. Could using a pthread library on Windows solve the sharing issue? Another option I can think of is to use a pthread wrapper (in the same way as is done today with complib). > > Also please provide a clear RFC for what this patch is trying to do. Basically this serializes execution of osm_state_mgr_process(): instead of being called directly from the dispatcher's callback (possibly waiting for the lock), the state manager is signaled (and woken up if necessary). As a result we don't need the big state_lock anymore; a mutex is needed only to protect updates of signal_mask (OpenSM's osm_signal_t, not *nix signals).
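To make the idea concrete, here is a minimal sketch of the "signal mask plus condition variable" pattern described above. It is not the OpenSM code: sig_box_t, sig_box_post(), sig_box_take(), sig_box_dispatch() and record_signal() are hypothetical names made up for illustration. One deliberate difference from the patch is assumed here: the wait is done in a while loop rather than a single if, to tolerate spurious wakeups.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical illustration only; these names are not part of OpenSM. */
typedef struct sig_box {
    pthread_mutex_t mutex;  /* protects 'mask' and nothing else */
    pthread_cond_t  cond;   /* wakes the single consumer thread */
    unsigned        mask;   /* bit N set => signal N is pending */
} sig_box_t;

static void sig_box_init(sig_box_t *b)
{
    b->mask = 0;
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cond, NULL);
}

static void sig_box_destroy(sig_box_t *b)
{
    pthread_cond_destroy(&b->cond);
    pthread_mutex_destroy(&b->mutex);
}

/* Producer side: record the signal and wake the consumer.
 * The caller never runs the handler itself, so it never blocks
 * on a long-running state machine step. */
static void sig_box_post(sig_box_t *b, unsigned sig)
{
    assert(sig < 8 * sizeof(b->mask));
    pthread_mutex_lock(&b->mutex);
    b->mask |= 1u << sig;
    pthread_cond_signal(&b->cond);
    pthread_mutex_unlock(&b->mutex);
}

/* Consumer side: sleep until something is pending, then take the
 * whole mask at once and clear it. The loop guards against
 * spurious wakeups. */
static unsigned sig_box_take(sig_box_t *b)
{
    unsigned pending;

    pthread_mutex_lock(&b->mutex);
    while (b->mask == 0)
        pthread_cond_wait(&b->cond, &b->mutex);
    pending = b->mask;
    b->mask = 0;
    pthread_mutex_unlock(&b->mutex);
    return pending;
}

/* Run the handler once per pending signal, lowest number first,
 * with the mutex released. Returns how many signals were handled. */
static unsigned sig_box_dispatch(unsigned pending,
                                 void (*handler)(unsigned sig, void *ctx),
                                 void *ctx)
{
    unsigned sig, handled = 0;

    for (sig = 0; pending; sig++, pending >>= 1) {
        if (pending & 1u) {
            handler(sig, ctx);
            handled++;
        }
    }
    return handled;
}

/* Example handler: remembers which signals it saw in *ctx. */
static void record_signal(unsigned sig, void *ctx)
{
    *(unsigned *)ctx |= 1u << sig;
}
```

Note that with a bit mask, posting the same signal twice before the consumer wakes up delivers it only once, and all handlers run on one thread, which is what makes a large state lock unnecessary in this scheme.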
Sasha > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > Sent: Monday, May 22, 2006 2:02 AM > > To: Hal Rosenstock; openib-general at openib.org > > Cc: Eitan Zahavi; Yael Kalka; Ofer Gigi; sashak at voltaire.com; > elid at voltaire.com > > Subject: [PATCH] RFC: opensm: serialize osm_state_mgr_process() > > > > Hello, > > > > Please comment (and test). > > > > Thanks, > > Sasha. > > > > This serializes execution of osm_state_mgr_process() and removes the > big > > state_lock. This should reduce "locked state" time and prevent > potential > > dispatcher blocking. > > > > Other important change here is direct usage of pthread primitives > > instead of "traditional" cl_thread* stuff. > > > > Signed-off-by: Sasha Khapyorsky > > --- > > > > osm/include/Makefile.am | 1 > > osm/include/opensm/osm_msgdef.h | 15 -- > > osm/include/opensm/osm_sm.h | 36 ++++- > > osm/include/opensm/osm_state_mgr.h | 4 - > > osm/include/opensm/osm_state_mgr_ctrl.h | 236 > ------------------------------- > > osm/opensm/Makefile.am | 5 - > > osm/opensm/osm_helper.c | 1 > > osm/opensm/osm_node_info_rcv.c | 5 - > > osm/opensm/osm_port_info_rcv.c | 3 > > osm/opensm/osm_sm.c | 160 +++++++++++---------- > > osm/opensm/osm_sm_mad_ctrl.c | 26 --- > > osm/opensm/osm_sm_state_mgr.c | 8 + > > osm/opensm/osm_sminfo_rcv.c | 8 + > > osm/opensm/osm_state_mgr.c | 24 --- > > osm/opensm/osm_state_mgr_ctrl.c | 132 ----------------- > > osm/opensm/osm_sw_info_rcv.c | 3 > > osm/opensm/osm_sweep_fail_ctrl.c | 4 - > > osm/opensm/osm_trap_rcv.c | 3 > > osm/opensm/osm_vl15intf.c | 25 --- > > 19 files changed, 143 insertions(+), 556 deletions(-) > > > > diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am > > index 2bee762..0af78a0 100644 > > --- a/osm/include/Makefile.am > > +++ 
b/osm/include/Makefile.am > > @@ -120,7 +120,6 @@ EXTRA_DIST = \ > > $(srcdir)/opensm/osm_vl15intf.h \ > > $(srcdir)/opensm/osm_drop_mgr.h \ > > $(srcdir)/opensm/osm_port_info_rcv.h \ > > - $(srcdir)/opensm/osm_state_mgr_ctrl.h \ > > $(srcdir)/complib/cl_thread_osd.h \ > > $(srcdir)/complib/cl_packon.h \ > > $(srcdir)/complib/cl_atomic_osd.h \ > > diff --git a/osm/include/opensm/osm_msgdef.h > b/osm/include/opensm/osm_msgdef.h > > index a1b5743..6956c86 100644 > > --- a/osm/include/opensm/osm_msgdef.h > > +++ b/osm/include/opensm/osm_msgdef.h > > @@ -148,20 +148,6 @@ BEGIN_C_DECLS > > * > > * SOURCE > > ***********/ > > -/****d* OpenSM: Dispatcher Messages/OSM_MSG_NO_SMPS_OUTSTANDING > > -* NAME > > -* OSM_MSG_NO_SMPS_OUTSTANDING > > -* > > -* DESCRIPTION > > -* Message indicating that there are no outstanding SMPs on the > subnet. > > -* > > -* NOTES > > -* Sent by: osm_mad_ctrl_t > > -* Received by: osm_state_mgr_ctrl_t > > -* Delivery notice: no > > -* > > -* SOURCE > > -***********/ > > enum > > { > > OSM_MSG_REQ = 0, > > @@ -169,7 +155,6 @@ enum > > OSM_MSG_MAD_PORT_INFO, > > OSM_MSG_MAD_SWITCH_INFO, > > OSM_MSG_MAD_NODE_DESC, > > - OSM_MSG_NO_SMPS_OUTSTANDING, > > OSM_MSG_MAD_NODE_RECORD, > > OSM_MSG_MAD_PORTINFO_RECORD, > > OSM_MSG_MAD_SERVICE_RECORD, > > diff --git a/osm/include/opensm/osm_sm.h b/osm/include/opensm/osm_sm.h > > index d6086d4..def43c8 100644 > > --- a/osm/include/opensm/osm_sm.h > > +++ b/osm/include/opensm/osm_sm.h > > @@ -69,7 +69,6 @@ #include > #include > > #include > > #include > > -#include > > #include > > #include > > #include > > @@ -131,7 +130,6 @@ BEGIN_C_DECLS > > typedef struct _osm_sm > > { > > osm_thread_state_t thread_state; > > - cl_event_t signal; > > cl_event_t subnet_up_event; > > cl_thread_t sweeper; > > osm_subn_t *p_subn; > > @@ -143,6 +141,9 @@ typedef struct _osm_sm > > cl_dispatcher_t *p_disp; > > cl_plock_t *p_lock; > > atomic32_t sm_trans_id; > > + unsigned signal_mask; > > + pthread_mutex_t mutex; > > + 
pthread_cond_t cond; > > osm_req_t req; > > osm_req_ctrl_t req_ctrl; > > osm_resp_t resp; > > @@ -155,7 +156,6 @@ typedef struct _osm_sm > > osm_sm_mad_ctrl_t mad_ctrl; > > osm_si_rcv_t si_rcv; > > osm_si_rcv_ctrl_t si_rcv_ctrl; > > - osm_state_mgr_ctrl_t state_mgr_ctrl; > > osm_lid_mgr_t lid_mgr; > > osm_ucast_mgr_t ucast_mgr; > > osm_link_mgr_t link_mgr; > > @@ -387,6 +387,33 @@ osm_sm_init( > > * SM object, osm_sm_construct, osm_sm_destroy > > *********/ > > > > +/****f* OpenSM: SM/osm_sm_signal > > +* NAME > > +* osm_sm_signal > > +* > > +* DESCRIPTION > > +* Forward signal to state engine > > +* > > +* SYNOPSIS > > +*/ > > +void > > +osm_sm_signal( > > + IN osm_sm_t* const p_sm, > > + IN osm_signal_t signal ); > > +/* > > +* PARAMETERS > > +* p_sm > > +* [in] Pointer to an osm_sm_t object. > > +* > > +* signal > > +* [in] Signal to the state engine. > > +* > > +* NOTES > > +* > > +* SEE ALSO > > +* SM object > > +*********/ > > + > > /****f* OpenSM: SM/osm_sm_sweep > > * NAME > > * osm_sm_sweep > > @@ -404,9 +431,6 @@ osm_sm_sweep( > > * p_sm > > * [in] Pointer to an osm_sm_t object. > > * > > -* RETURN VALUES > > -* IB_SUCCESS if the sweep completed successfully. > > -* > > * NOTES > > * > > * SEE ALSO > > diff --git a/osm/include/opensm/osm_state_mgr.h > > b/osm/include/opensm/osm_state_mgr.h > > index ad4afa0..5e76463 100644 > > --- a/osm/include/opensm/osm_state_mgr.h > > +++ b/osm/include/opensm/osm_state_mgr.h > > @@ -116,7 +116,6 @@ typedef struct _osm_state_mgr > > osm_stats_t *p_stats; > > struct _osm_sm_state_mgr *p_sm_state_mgr; > > const osm_sm_mad_ctrl_t *p_mad_ctrl; > > - cl_spinlock_t state_lock; > > cl_spinlock_t idle_lock; > > cl_qlist_t idle_time_list; > > cl_plock_t *p_lock; > > @@ -161,9 +160,6 @@ typedef struct _osm_state_mgr > > * p_mad_ctrl > > * Pointer to the SM's MAD Controller object. > > * > > -* state_lock > > -* Spinlock guarding the state and processes. > > -* > > * p_lock > > * lock guarding the subnet object. 
> > * > > diff --git a/osm/include/opensm/osm_state_mgr_ctrl.h > > b/osm/include/opensm/osm_state_mgr_ctrl.h > > deleted file mode 100644 > > index 9ffcfb0..0000000 > > --- a/osm/include/opensm/osm_state_mgr_ctrl.h > > +++ /dev/null > > @@ -1,236 +0,0 @@ > > -/* > > - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. > > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights > reserved. > > - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. > > - * > > - * This software is available to you under a choice of one of two > > - * licenses. You may choose to be licensed under the terms of the > GNU > > - * General Public License (GPL) Version 2, available from the file > > - * COPYING in the main directory of this source tree, or the > > - * OpenIB.org BSD license below: > > - * > > - * Redistribution and use in source and binary forms, with or > > - * without modification, are permitted provided that the > following > > - * conditions are met: > > - * > > - * - Redistributions of source code must retain the above > > - * copyright notice, this list of conditions and the following > > - * disclaimer. > > - * > > - * - Redistributions in binary form must reproduce the above > > - * copyright notice, this list of conditions and the following > > - * disclaimer in the documentation and/or other materials > > - * provided with the distribution. > > - * > > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE > > WARRANTIES OF > > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT > > HOLDERS > > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN > > AN > > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR > > IN > > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > > THE > > - * SOFTWARE. 
> > - * > > - * $Id$ > > - */ > > - > > - > > -/* > > - * Abstract: > > - * Declaration of osm_state_mgr_ctrl_t. > > - * This object represents a controller that receives the > > - * State indication after a subnet sweep. > > - * This object is part of the OpenSM family of objects. > > - * > > - * Environment: > > - * Linux User Mode > > - * > > - * $Revision: 1.4 $ > > - */ > > - > > -#ifndef _OSM_STATE_MGR_CTRL_H_ > > -#define _OSM_STATE_MGR_CTRL_H_ > > - > > - > > -#include > > -#include > > -#include > > -#include > > - > > -#ifdef __cplusplus > > -# define BEGIN_C_DECLS extern "C" { > > -# define END_C_DECLS } > > -#else /* !__cplusplus */ > > -# define BEGIN_C_DECLS > > -# define END_C_DECLS > > -#endif /* __cplusplus */ > > - > > -BEGIN_C_DECLS > > - > > -/****h* OpenSM/State Manager Controller > > -* NAME > > -* State Manager Controller > > -* > > -* DESCRIPTION > > -* The State Manager Controller object encapsulates the information > > -* needed to pass the dispatcher message from the dispatcher > > -* to the State Manager. > > -* > > -* The State Manager Controller object is thread safe. > > -* > > -* This object should be treated as opaque and should be > > -* manipulated only through the provided functions. > > -* > > -* AUTHOR > > -* Steve King, Intel > > -* > > -*********/ > > -/****s* OpenSM: State Manager Controller/osm_state_mgr_ctrl_t > > -* NAME > > -* osm_state_mgr_ctrl_t > > -* > > -* DESCRIPTION > > -* State Manager Controller structure. > > -* > > -* This object should be treated as opaque and should > > -* be manipulated only through the provided functions. > > -* > > -* SYNOPSIS > > -*/ > > -typedef struct _osm_state_mgr_ctrl > > -{ > > - osm_state_mgr_t *p_mgr; > > - osm_log_t *p_log; > > - cl_dispatcher_t *p_disp; > > - cl_disp_reg_handle_t h_disp; > > - > > -} osm_state_mgr_ctrl_t; > > -/* > > -* FIELDS > > -* p_mgr > > -* Pointer to the State Manager object. > > -* > > -* p_log > > -* Pointer to the log object. 
> > -* > > -* p_disp > > -* Pointer to the Dispatcher. > > -* > > -* h_disp > > -* Handle returned from dispatcher registration. > > -* > > -* SEE ALSO > > -* State Manager Controller object > > -*********/ > > - > > -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_construct > > -* NAME > > -* osm_state_mgr_ctrl_construct > > -* > > -* DESCRIPTION > > -* This function constructs a State Manager Controller object. > > -* > > -* SYNOPSIS > > -*/ > > -void > > -osm_state_mgr_ctrl_construct( > > - IN osm_state_mgr_ctrl_t* const p_ctrl ); > > -/* > > -* PARAMETERS > > -* p_ctrl > > -* [in] Pointer to a State Manager Controller > > -* object to construct. > > -* > > -* RETURN VALUE > > -* This function does not return a value. > > -* > > -* NOTES > > -* Allows calling osm_state_mgr_ctrl_init, > osm_state_mgr_ctrl_destroy, > > -* and osm_state_mgr_ctrl_is_inited. > > -* > > -* Calling osm_state_mgr_ctrl_construct is a prerequisite to > calling any other > > -* method except osm_state_mgr_ctrl_init. > > -* > > -* SEE ALSO > > -* State Manager Controller object, osm_state_mgr_ctrl_init, > > -* osm_state_mgr_ctrl_destroy > > -*********/ > > - > > -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_destroy > > -* NAME > > -* osm_state_mgr_ctrl_destroy > > -* > > -* DESCRIPTION > > -* The osm_state_mgr_ctrl_destroy function destroys the object, > releasing > > -* all resources. > > -* > > -* SYNOPSIS > > -*/ > > -void > > -osm_state_mgr_ctrl_destroy( > > - IN osm_state_mgr_ctrl_t* const p_ctrl ); > > -/* > > -* PARAMETERS > > -* p_ctrl > > -* [in] Pointer to the object to destroy. > > -* > > -* RETURN VALUE > > -* This function does not return a value. > > -* > > -* NOTES > > -* Performs any necessary cleanup of the specified > > -* State Manager Controller object. > > -* Further operations should not be attempted on the destroyed > object. 
> > -* This function should only be called after a call to > > -* osm_state_mgr_ctrl_construct or osm_state_mgr_ctrl_init. > > -* > > -* SEE ALSO > > -* State Manager Controller object, osm_state_mgr_ctrl_construct, > > -* osm_state_mgr_ctrl_init > > -*********/ > > - > > -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_init > > -* NAME > > -* osm_state_mgr_ctrl_init > > -* > > -* DESCRIPTION > > -* The osm_state_mgr_ctrl_init function initializes a > > -* State Manager Controller object for use. > > -* > > -* SYNOPSIS > > -*/ > > -ib_api_status_t > > -osm_state_mgr_ctrl_init( > > - IN osm_state_mgr_ctrl_t* const p_ctrl, > > - IN osm_state_mgr_t* const p_mgr, > > - IN osm_log_t* const p_log, > > - IN cl_dispatcher_t* const p_disp ); > > -/* > > -* PARAMETERS > > -* p_ctrl > > -* [in] Pointer to an osm_state_mgr_ctrl_t object to > initialize. > > -* > > -* p_mgr > > -* [in] Pointer to an osm_state_mgr_t object. > > -* > > -* p_log > > -* [in] Pointer to the log object. > > -* > > -* p_disp > > -* [in] Pointer to the OpenSM central Dispatcher. > > -* > > -* RETURN VALUES > > -* IB_SUCCESS if the State Manager Controller object > > -* was initialized successfully. > > -* > > -* NOTES > > -* Allows calling other State Manager Controller methods. 
> > -* > > -* SEE ALSO > > -* State Manager Controller object, osm_state_mgr_ctrl_construct, > > -* osm_state_mgr_ctrl_destroy > > -*********/ > > - > > -END_C_DECLS > > - > > -#endif /* OSM_STATE_MGR_CTRL_H_ */ > > diff --git a/osm/opensm/Makefile.am b/osm/opensm/Makefile.am > > index 43fe8c1..7b1060a 100644 > > --- a/osm/opensm/Makefile.am > > +++ b/osm/opensm/Makefile.am > > @@ -78,8 +78,7 @@ opensm_SOURCES = main.c osm_console.c os > > osm_slvl_map_rcv.c osm_slvl_map_rcv_ctrl.c \ > > osm_sm.c osm_sminfo_rcv.c \ > > osm_sminfo_rcv_ctrl.c osm_sm_mad_ctrl.c \ > > - osm_sm_state_mgr.c osm_state_mgr.c \ > > - osm_state_mgr_ctrl.c osm_subnet.c \ > > + osm_sm_state_mgr.c osm_state_mgr.c osm_subnet.c \ > > osm_sweep_fail_ctrl.c osm_sw_info_rcv.c \ > > osm_sw_info_rcv_ctrl.c osm_switch.c \ > > osm_prtn.c osm_prtn_config.c osm_qos.c \ > > @@ -104,7 +103,7 @@ # we need to be able to load libraries f > > # we always give precedence to local tree libs and then use the > pre-installed ones. > > opensm_LDADD = -L../complib -L../libvendor -L. 
$(OSMV_LDADD) -lopensm > - > > losmcomp -losmvendor > > > > -opensm_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread > > +opensm_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread -lrt > > > > opensmincludedir = $(includedir)/infiniband/opensm > > > > diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c > > index 3886609..a966bbe 100644 > > --- a/osm/opensm/osm_helper.c > > +++ b/osm/opensm/osm_helper.c > > @@ -1895,7 +1895,6 @@ static const char* const __osm_disp_msg_ > > "OSM_MSG_MAD_PORT_INFO,", > > "OSM_MSG_MAD_SWITCH_INFO", > > "OSM_MSG_MAD_NODE_DESC", > > - "OSM_MSG_NO_SMPS_OUTSTANDING", > > "OSM_MSG_MAD_NODE_RECORD", > > "OSM_MSG_MAD_PORTINFO_RECORD", > > "OSM_MSG_MAD_SERVICE_RECORD", > > diff --git a/osm/opensm/osm_node_info_rcv.c > b/osm/opensm/osm_node_info_rcv.c > > index 59257a0..7ea4366 100644 > > --- a/osm/opensm/osm_node_info_rcv.c > > +++ b/osm/opensm/osm_node_info_rcv.c > > @@ -69,6 +69,7 @@ #include > > #include > > #include > > #include > > +#include > > > > > > > /********************************************************************** > > @@ -1088,11 +1089,11 @@ osm_ni_rcv_process( > > > > /* > > * If we processed a new node - need to signal to the state_mgr > that > > - * change detected. BUT - we cannot call the osm_state_mgr_process > > + * change detected. BUT - we cannot call the osm_sm_signal > > * from within the lock of p_rcv->p_lock (can cause a deadlock). 
> > */ > > if ( process_new_flag ) > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > OSM_SIGNAL_CHANGE_DETECTED ); > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > > OSM_SIGNAL_CHANGE_DETECTED ); > > > > Exit: > > OSM_LOG_EXIT( p_rcv->p_log ); > > diff --git a/osm/opensm/osm_port_info_rcv.c > b/osm/opensm/osm_port_info_rcv.c > > index a08c57c..7405ef0 100644 > > --- a/osm/opensm/osm_port_info_rcv.c > > +++ b/osm/opensm/osm_port_info_rcv.c > > @@ -69,6 +69,7 @@ #include > > #include > > #include > > #include > > +#include > > > > > /********************************************************************** > > > **********************************************************************/ > > @@ -701,7 +702,7 @@ osm_pi_rcv_process( > > " port = %u, Commencing heavy sweep\n", > > cl_ntoh64( node_guid ), > > cl_ntoh64( port_guid ) ); > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > > OSM_SIGNAL_CHANGE_DETECTED ); > > goto Exit; > > } > > diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c > > index 0e09f26..5b5eb3f 100644 > > --- a/osm/opensm/osm_sm.c > > +++ b/osm/opensm/osm_sm.c > > @@ -55,6 +55,8 @@ #if HAVE_CONFIG_H > > # include > > #endif /* HAVE_CONFIG_H */ > > > > +#include > > +#include > > #include > > #include > > #include > > @@ -79,53 +81,65 @@ void > > __osm_sm_sweeper( > > IN void *p_ptr ) > > { > > - ib_api_status_t status; > > osm_sm_t *const p_sm = ( osm_sm_t * ) p_ptr; > > + unsigned i, signals; > > > > OSM_LOG_ENTER( p_sm->p_log, __osm_sm_sweeper ); > > > > - if( p_sm->thread_state == OSM_THREAD_STATE_INIT ) > > - { > > - p_sm->thread_state = OSM_THREAD_STATE_RUN; > > - } > > - > > - /* If the sweep interval was updated before - then run only if > > - * it is not zero. 
*/ > > - while( p_sm->thread_state == OSM_THREAD_STATE_RUN && > > - p_sm->p_subn->opt.sweep_interval != 0 ) > > - { > > - /* do the sweep only if we are in MASTER state */ > > - if( p_sm->p_subn->sm_state == IB_SMINFO_STATE_MASTER || > > - p_sm->p_subn->sm_state == IB_SMINFO_STATE_DISCOVERING ) > > - osm_state_mgr_process( &p_sm->state_mgr, OSM_SIGNAL_SWEEP ); > > + do { > > + signals = 0; > > + pthread_mutex_lock(&p_sm->mutex); > > + if (p_sm->signal_mask == 0) > > + pthread_cond_wait(&p_sm->cond, &p_sm->mutex); > > + signals = p_sm->signal_mask; > > + p_sm->signal_mask = 0; > > + pthread_mutex_unlock(&p_sm->mutex); > > + for (i = 0 ; signals ; i++) { > > + if (signals&1) > > + osm_state_mgr_process( &p_sm->state_mgr, i); > > + signals >>= 1; > > + } > > + } while (p_sm->thread_state == OSM_THREAD_STATE_RUN); > > > > - /* > > - * Wait on the event with a timeout. > > - * Sweeps may be iniated "off schedule" by simply > > - * signaling the event. > > - */ > > - status = cl_event_wait_on( &p_sm->signal, > > - p_sm->p_subn->opt.sweep_interval * > 1000000, > > - TRUE ); > > + OSM_LOG_EXIT( p_sm->p_log ); > > +} > > > > - if( status == CL_SUCCESS ) > > - { > > - if( osm_log_is_active( p_sm->p_log, OSM_LOG_DEBUG ) ) > > - { > > - osm_log( p_sm->p_log, OSM_LOG_DEBUG, > > - "__osm_sm_sweeper: " "Off schedule sweep > signalled\n" ); > > - } > > > +/********************************************************************** > > + > **********************************************************************/ > > +void > > +__osm_sm_sweeper_periodic( > > + IN void *p_ptr ) > > +{ > > + struct timespec times; > > + osm_sm_t *const p_sm = ( osm_sm_t * ) p_ptr; > > + unsigned i, signals; > > + int ret; > > + > > + OSM_LOG_ENTER( p_sm->p_log, __osm_sm_sweeper_periodic ); > > + > > + do { > > + clock_gettime(CLOCK_REALTIME, &times); > > + times.tv_sec += p_sm->p_subn->opt.sweep_interval; > > + signals = 0; > > + pthread_mutex_lock(&p_sm->mutex); > > + if (p_sm->signal_mask == 0 && > > + (ret = 
pthread_cond_timedwait(&p_sm->cond, &p_sm->mutex, > > + &times)) == ETIMEDOUT && > > + ( p_sm->p_subn->sm_state == IB_SMINFO_STATE_MASTER || > > + p_sm->p_subn->sm_state == IB_SMINFO_STATE_DISCOVERING > )) { > > + signals = OSM_SIGNAL_SWEEP; > > } > > - else > > - { > > - if( status != CL_TIMEOUT ) > > - { > > - osm_log( p_sm->p_log, OSM_LOG_ERROR, > > - "__osm_sm_sweeper: ERR 2E01: " > > - "Event wait failed (%s)\n", CL_STATUS_MSG( > status ) ); > > - } > > + else { > > + signals = p_sm->signal_mask; > > + p_sm->signal_mask = 0; > > } > > - } > > + pthread_mutex_unlock(&p_sm->mutex); > > + for (i = 0 ; signals ; i++) { > > + if (signals&1) > > + osm_state_mgr_process( &p_sm->state_mgr, i); > > + signals >>= 1; > > + } > > + } while (p_sm->thread_state == OSM_THREAD_STATE_RUN); > > > > OSM_LOG_EXIT( p_sm->p_log ); > > } > > @@ -139,7 +153,6 @@ osm_sm_construct( > > memset( p_sm, 0, sizeof( *p_sm ) ); > > p_sm->thread_state = OSM_THREAD_STATE_NONE; > > p_sm->sm_trans_id = OSM_SM_INITIAL_TID_VALUE; > > - cl_event_construct( &p_sm->signal ); > > cl_event_construct( &p_sm->subnet_up_event ); > > cl_thread_construct( &p_sm->sweeper ); > > osm_req_construct( &p_sm->req ); > > @@ -158,7 +171,6 @@ osm_sm_construct( > > osm_ucast_mgr_construct( &p_sm->ucast_mgr ); > > osm_link_mgr_construct( &p_sm->link_mgr ); > > osm_state_mgr_construct( &p_sm->state_mgr ); > > - osm_state_mgr_ctrl_construct( &p_sm->state_mgr_ctrl ); > > osm_drop_mgr_construct( &p_sm->drop_mgr ); > > osm_lft_rcv_construct( &p_sm->lft_rcv ); > > osm_lft_rcv_ctrl_construct( &p_sm->lft_rcv_ctrl ); > > @@ -185,24 +197,14 @@ void > > osm_sm_shutdown( > > IN osm_sm_t * const p_sm ) > > { > > - boolean_t signal_event = FALSE; > > - > > OSM_LOG_ENTER( p_sm->p_log, osm_sm_shutdown ); > > > > /* > > * Signal our threads that we're leaving. 
> > - */ > > - if( p_sm->thread_state != OSM_THREAD_STATE_NONE ) > > - signal_event = TRUE; > > - > > - p_sm->thread_state = OSM_THREAD_STATE_EXIT; > > - > > - /* > > - * Don't trigger unless event has been initialized. > > * Destroy the thread before we tear down the other objects. > > */ > > - if( signal_event ) > > - cl_event_signal( &p_sm->signal ); > > + p_sm->thread_state = OSM_THREAD_STATE_EXIT; > > + osm_sm_signal( p_sm, OSM_SIGNAL_NONE ); > > > > cl_thread_destroy( &p_sm->sweeper ); > > > > @@ -225,7 +227,6 @@ osm_sm_shutdown( > > osm_vla_rcv_ctrl_destroy( &p_sm->vla_rcv_ctrl ); > > osm_pkey_rcv_ctrl_destroy( &p_sm->pkey_rcv_ctrl ); > > osm_sweep_fail_ctrl_destroy( &p_sm->sweep_fail_ctrl ); > > - osm_state_mgr_ctrl_destroy( &p_sm->state_mgr_ctrl ); > > > > OSM_LOG_EXIT( p_sm->p_log ); > > } > > @@ -257,12 +258,14 @@ osm_sm_destroy( > > osm_state_mgr_destroy( &p_sm->state_mgr ); > > osm_sm_state_mgr_destroy( &p_sm->sm_state_mgr ); > > osm_mcast_mgr_destroy( &p_sm->mcast_mgr ); > > - cl_event_destroy( &p_sm->signal ); > > cl_event_destroy( &p_sm->subnet_up_event ); > > > > if( p_sm->p_report_buf != NULL ) > > cl_free( p_sm->p_report_buf ); > > > > + pthread_cond_destroy(&p_sm->cond); > > + pthread_mutex_destroy(&p_sm->mutex); > > + > > osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format > Waived */ > > OSM_LOG_EXIT( p_sm->p_log ); > > } > > @@ -303,14 +306,15 @@ osm_sm_init( > > status = IB_INSUFFICIENT_MEMORY; > > goto Exit; > > } > > - status = cl_event_init( &p_sm->signal, FALSE ); > > - if( status != CL_SUCCESS ) > > - goto Exit; > > > > status = cl_event_init( &p_sm->subnet_up_event, FALSE ); > > if( status != CL_SUCCESS ) > > goto Exit; > > > > + p_sm->signal_mask = 0; > > + pthread_mutex_init(&p_sm->mutex, NULL); > > + pthread_cond_init(&p_sm->cond, NULL); > > + > > status = osm_sm_mad_ctrl_init( &p_sm->mad_ctrl, > > p_sm->p_subn, > > p_sm->p_mad_pool, > > @@ -416,12 +420,6 @@ osm_sm_init( > > if( status != IB_SUCCESS ) > > goto Exit; > > > 
> - status = osm_state_mgr_ctrl_init( &p_sm->state_mgr_ctrl, > > - &p_sm->state_mgr, > > - p_sm->p_log, p_sm->p_disp ); > > - if( status != IB_SUCCESS ) > > - goto Exit; > > - > > status = osm_drop_mgr_init( &p_sm->drop_mgr, > > p_sm->p_subn, > > p_sm->p_log, &p_sm->req, p_sm->p_lock > ); > > @@ -523,16 +521,15 @@ osm_sm_init( > > > > /* > > * Now that the component objects are initialized, start > > - * the sweeper thread if the user wants sweeping. > > + * the sweeper thread. > > */ > > - if( p_sm->p_subn->opt.sweep_interval ) > > - { > > - p_sm->thread_state = OSM_THREAD_STATE_INIT; > > - status = cl_thread_init( &p_sm->sweeper, __osm_sm_sweeper, > p_sm, > > - "opensm sweeper" ); > > - if( status != IB_SUCCESS ) > > - goto Exit; > > - } > > + p_sm->thread_state = OSM_THREAD_STATE_RUN; > > + status = cl_thread_init( &p_sm->sweeper, > > + p_sm->p_subn->opt.sweep_interval > 0 ? > > + __osm_sm_sweeper_periodic : > __osm_sm_sweeper, > > + p_sm, "opensm sweeper" ); > > + if( status != IB_SUCCESS ) > > + goto Exit; > > > > Exit: > > OSM_LOG_EXIT( p_log ); > > @@ -542,11 +539,26 @@ osm_sm_init( > > > /********************************************************************** > > > **********************************************************************/ > > void > > +osm_sm_signal( > > + IN osm_sm_t* const p_sm, > > + IN osm_signal_t signal ) > > +{ > > + OSM_LOG_ENTER( p_sm->p_log, osm_sm_signal ); > > + pthread_mutex_lock(&p_sm->mutex); > > + p_sm->signal_mask |= (1 << signal); > > + pthread_cond_signal(&p_sm->cond); > > + pthread_mutex_unlock(&p_sm->mutex); > > + OSM_LOG_EXIT( p_sm->p_log ); > > +} > > + > > > +/********************************************************************** > > + > **********************************************************************/ > > +void > > osm_sm_sweep( > > IN osm_sm_t * const p_sm ) > > { > > OSM_LOG_ENTER( p_sm->p_log, osm_sm_sweep ); > > - osm_state_mgr_process( &p_sm->state_mgr, OSM_SIGNAL_SWEEP ); > > + osm_sm_signal( p_sm, 
OSM_SIGNAL_SWEEP ); > > OSM_LOG_EXIT( p_sm->p_log ); > > } > > > > diff --git a/osm/opensm/osm_sm_mad_ctrl.c > b/osm/opensm/osm_sm_mad_ctrl.c > > index 9dceef2..1982873 100644 > > --- a/osm/opensm/osm_sm_mad_ctrl.c > > +++ b/osm/opensm/osm_sm_mad_ctrl.c > > @@ -81,7 +81,6 @@ __osm_sm_mad_ctrl_retire_trans_mad( > > IN osm_madw_t* const p_madw ) > > { > > uint32_t outstanding; > > - cl_status_t status; > > > > OSM_LOG_ENTER( p_ctrl->p_log, __osm_sm_mad_ctrl_retire_trans_mad ); > > > > @@ -115,31 +114,10 @@ __osm_sm_mad_ctrl_retire_trans_mad( > > The wire is clean. > > Signal the state manager. > > */ > > - if( osm_log_is_active( p_ctrl->p_log, OSM_LOG_DEBUG ) ) > > - { > > - osm_log( p_ctrl->p_log, OSM_LOG_DEBUG, > > - "__osm_sm_mad_ctrl_retire_trans_mad: " > > - "Posting Dispatcher message %s\n", > > - osm_get_disp_msg_str( OSM_MSG_NO_SMPS_OUTSTANDING ) ); > > - } > > - > > - status = cl_disp_post( p_ctrl->h_disp, > > - OSM_MSG_NO_SMPS_OUTSTANDING, > > - (void > *)OSM_SIGNAL_NO_PENDING_TRANSACTIONS, > > - NULL, > > - NULL ); > > - > > - if( status != CL_SUCCESS ) > > - { > > - osm_log( p_ctrl->p_log, OSM_LOG_ERROR, > > - "__osm_sm_mad_ctrl_retire_trans_mad: ERR 3101: " > > - "Dispatcher post message failed (%s)\n", > > - CL_STATUS_MSG( status ) ); > > - goto Exit; > > - } > > + osm_sm_signal( &p_ctrl->p_subn->p_osm->sm, > > + OSM_SIGNAL_NO_PENDING_TRANSACTIONS ); > > } > > > > - Exit: > > OSM_LOG_EXIT( p_ctrl->p_log ); > > } > > /************/ > > diff --git a/osm/opensm/osm_sm_state_mgr.c > b/osm/opensm/osm_sm_state_mgr.c > > index feeda45..2c6da4e 100644 > > --- a/osm/opensm/osm_sm_state_mgr.c > > +++ b/osm/opensm/osm_sm_state_mgr.c > > @@ -563,7 +563,7 @@ osm_sm_state_mgr_process( > > /* > > * Stop the discovering > > */ > > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, > > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > > > OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED ); > > break; > > case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE: > > @@ -610,7 +610,7 @@ 
osm_sm_state_mgr_process( > > __osm_sm_state_mgr_discovering_msg( p_sm_mgr ); > > p_sm_mgr->p_subn->sm_state = IB_SMINFO_STATE_DISCOVERING; > > p_sm_mgr->p_subn->coming_out_of_standby = TRUE; > > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, > OSM_SIGNAL_EXIT_STBY > > ); > > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > > OSM_SIGNAL_EXIT_STBY ); > > break; > > case OSM_SM_SIGNAL_DISABLE: > > /* > > @@ -641,7 +641,7 @@ osm_sm_state_mgr_process( > > */ > > p_sm_mgr->p_subn->master_sm_base_lid = > p_sm_mgr->p_subn->sm_base_lid; > > p_sm_mgr->p_subn->coming_out_of_standby = TRUE; > > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, > OSM_SIGNAL_EXIT_STBY > > ); > > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > > OSM_SIGNAL_EXIT_STBY ); > > break; > > case OSM_SM_SIGNAL_ACKNOWLEDGE: > > /* > > @@ -704,7 +704,7 @@ osm_sm_state_mgr_process( > > "Received OSM_SM_SIGNAL_HANDOVER\n" ); > > p_sm_mgr->p_polling_sm = NULL; > > p_sm_mgr->p_subn->force_immediate_heavy_sweep = TRUE; > > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, > OSM_SIGNAL_SWEEP ); > > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > OSM_SIGNAL_SWEEP ); > > break; > > case OSM_SM_SIGNAL_HANDOVER_SENT: > > /* > > diff --git a/osm/opensm/osm_sminfo_rcv.c b/osm/opensm/osm_sminfo_rcv.c > > index 5914984..4af549b 100644 > > --- a/osm/opensm/osm_sminfo_rcv.c > > +++ b/osm/opensm/osm_sminfo_rcv.c > > @@ -425,10 +425,10 @@ __osm_sminfo_rcv_process_set_request( > > } > > > > > /********************************************************************** > > - * Return a signal with which to call the osm_state_mgr_process. > > + * Return a signal with which to call the osm_sm_signal. > > * This is done since we are locked by p_rcv->p_lock in this > function, > > - * and thus cannot call osm_state_mgr_process (that locks the > state_lock). > > - * If return OSM_SIGNAL_NONE - do not call osm_state_mgr_process. > > + * and thus cannot call osm_sm_signal. > > + * If return OSM_SIGNAL_NONE - do not call osm_sm_signal. 
> > > **********************************************************************/ > > osm_signal_t > > __osm_sminfo_rcv_process_get_sm( > > @@ -676,7 +676,7 @@ __osm_sminfo_rcv_process_get_response( > > /* If process_get_sm_ret_val != OSM_SIGNAL_NONE then we have to > signal > > * to the state_mgr with that signal. */ > > if (process_get_sm_ret_val != OSM_SIGNAL_NONE) > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > > process_get_sm_ret_val ); > > OSM_LOG_EXIT( p_rcv->p_log ); > > } > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > > index 724b2b7..ff1c65c 100644 > > --- a/osm/opensm/osm_state_mgr.c > > +++ b/osm/opensm/osm_state_mgr.c > > @@ -83,7 +83,6 @@ osm_state_mgr_construct( > > IN osm_state_mgr_t * const p_mgr ) > > { > > memset( p_mgr, 0, sizeof( *p_mgr ) ); > > - cl_spinlock_construct( &p_mgr->state_lock ); > > cl_spinlock_construct( &p_mgr->idle_lock ); > > p_mgr->state = OSM_SM_STATE_INIT; > > } > > @@ -99,7 +98,6 @@ osm_state_mgr_destroy( > > OSM_LOG_ENTER( p_mgr->p_log, osm_state_mgr_destroy ); > > > > /* destroy the locks */ > > - cl_spinlock_destroy( &p_mgr->state_lock ); > > cl_spinlock_destroy( &p_mgr->idle_lock ); > > > > OSM_LOG_EXIT( p_mgr->p_log ); > > @@ -162,14 +160,6 @@ osm_state_mgr_init( > > p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS; > > p_mgr->next_stage_signal = OSM_SIGNAL_NONE; > > > > - status = cl_spinlock_init( &p_mgr->state_lock ); > > - if( status != CL_SUCCESS ) > > - { > > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, > > - "osm_state_mgr_init: ERR 3301: " > > - "Spinlock init failed (%s)\n", CL_STATUS_MSG( status ) > ); > > - } > > - > > cl_qlist_init( &p_mgr->idle_time_list ); > > > > status = cl_spinlock_init( &p_mgr->idle_lock ); > > @@ -1897,16 +1887,6 @@ osm_state_mgr_process( > > if( osm_exit_flag ) > > signal = OSM_SIGNAL_NONE; > > > > - /* > > - * The state lock prevents many race conditions from screwing > > - * up the state transition process. 
For example, if an function > > - * puts transactions on the wire, the state lock guarantees this > > - * loop will see the return code ("DONE PENDING") of the function > > - * before the "NO OUTSTANDING TRANSACTIONS" signal is > asynchronously > > - * received. > > - */ > > - cl_spinlock_acquire( &p_mgr->state_lock ); > > - > > while( signal != OSM_SIGNAL_NONE ) > > { > > if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) > > @@ -2957,8 +2937,6 @@ osm_state_mgr_process( > > p_mgr->state_step_mode = OSM_STATE_STEP_BREAK; > > } > > > > - cl_spinlock_release( &p_mgr->state_lock ); > > - > > OSM_LOG_EXIT( p_mgr->p_log ); > > } > > > > @@ -2994,7 +2972,7 @@ osm_state_mgr_process_idle( > > cl_qlist_insert_tail( &p_mgr->idle_time_list, > &p_idle_item->list_item ); > > cl_spinlock_release( &p_mgr->idle_lock ); > > > > - osm_state_mgr_process( p_mgr, OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST > > ); > > + osm_sm_signal( &p_mgr->p_subn->p_osm->sm, > > OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST ); > > > > OSM_LOG_EXIT( p_mgr->p_log ); > > > > diff --git a/osm/opensm/osm_state_mgr_ctrl.c > b/osm/opensm/osm_state_mgr_ctrl.c > > deleted file mode 100644 > > index 0bde333..0000000 > > --- a/osm/opensm/osm_state_mgr_ctrl.c > > +++ /dev/null > > @@ -1,132 +0,0 @@ > > -/* > > - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. > > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights > reserved. > > - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. > > - * > > - * This software is available to you under a choice of one of two > > - * licenses. 
You may choose to be licensed under the terms of the > GNU > > - * General Public License (GPL) Version 2, available from the file > > - * COPYING in the main directory of this source tree, or the > > - * OpenIB.org BSD license below: > > - * > > - * Redistribution and use in source and binary forms, with or > > - * without modification, are permitted provided that the > following > > - * conditions are met: > > - * > > - * - Redistributions of source code must retain the above > > - * copyright notice, this list of conditions and the following > > - * disclaimer. > > - * > > - * - Redistributions in binary form must reproduce the above > > - * copyright notice, this list of conditions and the following > > - * disclaimer in the documentation and/or other materials > > - * provided with the distribution. > > - * > > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE > > WARRANTIES OF > > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT > > HOLDERS > > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN > > AN > > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR > > IN > > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > > THE > > - * SOFTWARE. > > - * > > - * $Id$ > > - */ > > - > > - > > -/* > > - * Abstract: > > - * Implementation of osm_state_mgr_ctrl_t. > > - * This object represents the State Manager Controller object. > > - * This object is part of the opensm family of objects. 
> > - * > > - * Environment: > > - * Linux User Mode > > - * > > - * $Revision: 1.5 $ > > - */ > > - > > -/* > > - Next available error code: 0x1601 > > -*/ > > - > > -#if HAVE_CONFIG_H > > -# include > > -#endif /* HAVE_CONFIG_H */ > > - > > -#include > > -#include > > -#include > > - > > > -/********************************************************************** > > - > **********************************************************************/ > > -void > > -__osm_state_mgr_ctrl_disp_callback( > > - IN void *context, > > - IN void *p_data ) > > -{ > > - /* ignore return status when invoked via the dispatcher */ > > - osm_state_mgr_process( ((osm_state_mgr_ctrl_t*)context)->p_mgr, > > - (osm_signal_t)(p_data) ); > > -} > > - > > > -/********************************************************************** > > - > **********************************************************************/ > > -void > > -osm_state_mgr_ctrl_construct( > > - IN osm_state_mgr_ctrl_t* const p_ctrl ) > > -{ > > - memset( p_ctrl, 0, sizeof(*p_ctrl) ); > > - p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; > > -} > > - > > > -/********************************************************************** > > - > **********************************************************************/ > > -void > > -osm_state_mgr_ctrl_destroy( > > - IN osm_state_mgr_ctrl_t* const p_ctrl ) > > -{ > > - CL_ASSERT( p_ctrl ); > > - cl_disp_unregister( p_ctrl->h_disp ); > > -} > > - > > > -/********************************************************************** > > - > **********************************************************************/ > > -ib_api_status_t > > -osm_state_mgr_ctrl_init( > > - IN osm_state_mgr_ctrl_t* const p_ctrl, > > - IN osm_state_mgr_t* const p_mgr, > > - IN osm_log_t* const p_log, > > - IN cl_dispatcher_t* const p_disp ) > > -{ > > - ib_api_status_t status = IB_SUCCESS; > > - > > - OSM_LOG_ENTER( p_log, osm_state_mgr_ctrl_init ); > > - > > - osm_state_mgr_ctrl_construct( p_ctrl ); > > - p_ctrl->p_log = p_log; > > - > 
> - p_ctrl->p_mgr = p_mgr; > > - p_ctrl->p_disp = p_disp; > > - > > - p_ctrl->h_disp = cl_disp_register( > > - p_disp, > > - OSM_MSG_NO_SMPS_OUTSTANDING, > > - __osm_state_mgr_ctrl_disp_callback, > > - p_ctrl ); > > - > > - if( p_ctrl->h_disp == CL_DISP_INVALID_HANDLE ) > > - { > > - osm_log( p_log, OSM_LOG_ERROR, > > - "osm_state_mgr_ctrl_init: ERR 3401: " > > - "Dispatcher registration failed\n" ); > > - status = IB_INSUFFICIENT_RESOURCES; > > - goto Exit; > > - } > > - > > - Exit: > > - OSM_LOG_EXIT( p_log ); > > - return( status ); > > -} > > - > > diff --git a/osm/opensm/osm_sw_info_rcv.c > b/osm/opensm/osm_sw_info_rcv.c > > index 6bbd73a..61aff27 100644 > > --- a/osm/opensm/osm_sw_info_rcv.c > > +++ b/osm/opensm/osm_sw_info_rcv.c > > @@ -60,6 +60,7 @@ #include > > #include > > #include > > #include > > +#include > > #include > > > > > /********************************************************************** > > @@ -673,7 +674,7 @@ osm_si_rcv_process( > > if (__osm_si_rcv_process_existing( p_rcv, p_node, p_sw, p_madw > )) > > { > > CL_PLOCK_RELEASE( p_rcv->p_lock ); > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > > OSM_SIGNAL_CHANGE_DETECTED ); > > goto Exit; > > } > > diff --git a/osm/opensm/osm_sweep_fail_ctrl.c > b/osm/opensm/osm_sweep_fail_ctrl.c > > index e27a540..9e41ec7 100644 > > --- a/osm/opensm/osm_sweep_fail_ctrl.c > > +++ b/osm/opensm/osm_sweep_fail_ctrl.c > > @@ -52,6 +52,8 @@ #endif /* HAVE_CONFIG_H */ > > #include > > #include > > #include > > +#include > > +#include > > > > > /********************************************************************** > > > **********************************************************************/ > > @@ -68,7 +70,7 @@ __osm_sweep_fail_ctrl_disp_callback( > > /* > > Notify the state manager that we had a light sweep failure. 
> > */ > > - osm_state_mgr_process( p_ctrl->p_state_mgr, > > + osm_sm_signal( &p_ctrl->p_state_mgr->p_subn->p_osm->sm, > > OSM_SIGNAL_LIGHT_SWEEP_FAIL ); > > > > OSM_LOG_EXIT( p_ctrl->p_log ); > > diff --git a/osm/opensm/osm_trap_rcv.c b/osm/opensm/osm_trap_rcv.c > > index 9865f53..fb32ce9 100644 > > --- a/osm/opensm/osm_trap_rcv.c > > +++ b/osm/opensm/osm_trap_rcv.c > > @@ -589,8 +589,7 @@ __osm_trap_rcv_process_request( > > > > p_rcv->p_subn->force_immediate_heavy_sweep = TRUE; > > } > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > - OSM_SIGNAL_SWEEP ); > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, OSM_SIGNAL_SWEEP ); > > } > > > > /* If we reached here due to trap 129/130/131 - do not need to do > > diff --git a/osm/opensm/osm_vl15intf.c b/osm/opensm/osm_vl15intf.c > > index 68f17c5..c3adb6e 100644 > > --- a/osm/opensm/osm_vl15intf.c > > +++ b/osm/opensm/osm_vl15intf.c > > @@ -62,6 +62,7 @@ #include > > #include > > #include > > #include > > +#include > > #include > > #include > > > > @@ -156,7 +157,6 @@ __osm_vl15_poller( > > if( status != IB_SUCCESS ) > > { > > uint32_t outstanding; > > - cl_status_t cl_status; > > > > osm_log( p_vl->p_log, OSM_LOG_ERROR, > > "__osm_vl15_poller: ERR 3E03: " > > @@ -202,27 +202,8 @@ __osm_vl15_poller( > > The wire is clean. > > Signal the state manager. 
> > */ > > - if( osm_log_is_active( p_vl->p_log, OSM_LOG_DEBUG ) ) > > - { > > - osm_log( p_vl->p_log, OSM_LOG_DEBUG, > > - "__osm_vl15_poller: " > > - "Posting Dispatcher message %s\n", > > - osm_get_disp_msg_str( > OSM_MSG_NO_SMPS_OUTSTANDING ) ); > > - } > > - > > - cl_status = cl_disp_post( p_vl->h_disp, > > - OSM_MSG_NO_SMPS_OUTSTANDING, > > - (void > *)OSM_SIGNAL_NO_PENDING_TRANSACTIONS, > > - NULL, > > - NULL ); > > - > > - if( cl_status != CL_SUCCESS ) > > - { > > - osm_log( p_vl->p_log, OSM_LOG_ERROR, > > - "__osm_vl15_poller: ERR 3E06: " > > - "Dispatcher post message failed (%s)\n", > > - CL_STATUS_MSG( cl_status ) ); > > - } > > + osm_sm_signal( &p_vl->p_subn->p_osm->sm, > > + OSM_SIGNAL_NO_PENDING_TRANSACTIONS ); > > } > > } > > } From mst at mellanox.co.il Mon May 22 01:30:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 22 May 2006 11:30:54 +0300 Subject: [openib-general] [PATCH] cma: fix bind to ip In-Reply-To: References: <20060518133627.GW30211@mellanox.co.il> <446CA5DE.1040600@ichips.intel.com> Message-ID: <20060522083054.GF15161@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [openib-general] [PATCH] cma: fix bind to ip > > Sean> Thanks! Committed with only minor adjustment to spacing. - > > Should I add that commit to what I have queued for 2.6.18? > I'd like very much for the following two patches to be merged. While SDP isn't likely to be in kernel 2.6.18, I think its very confusing to have CMA API in-kernel and on trunk to have slightly different semantics. r7339 | sean.hefty | 2006-05-18 20:02:08 +0300 (Thu, 18 May 2006) | 8 lines Fix private data format for bind to specific IP for SDP. Further, CMA format mask for IPv6 was set incorrectly (hint - memset(foo, 1, bar) does not set memory to all ones) so fix that. Signed-off-by: Ali Ayoub Signed-off-by: Michael S. 
Tsirkin
Signed-off-by: Sean Hefty
------------------------------------------------------------------------
r7238 | sean.hefty | 2006-05-16 19:21:03 +0300 (Tue, 16 May 2006) | 5 lines

Expose CONNECT_RESPONSE message to SDP clients, while still performing the QP transitions.

Signed-off-by: Sean Hefty

--
MST

From k_mahesh85 at yahoo.co.in Mon May 22 02:08:04 2006
From: k_mahesh85 at yahoo.co.in (keshetti mahesh)
Date: Mon, 22 May 2006 10:08:04 +0100 (BST)
Subject: [openib-general] RDMA kernel utilities-newbie
Message-ID: <20060522090804.53858.qmail@web8325.mail.in.yahoo.com>

I need to develop a kernel utility capable of RDMA read/write operations. I have seen the example utilities under the "svn/gen2/utils/src/linux-kernel/infiniband/util/" tree. Where can I find the documentation related to them?

Regards,
K.Mahesh

From sashak at voltaire.com Mon May 22 02:14:09 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 22 May 2006 12:14:09 +0300
Subject: [openib-general] Re: [PATCH] opensm: remove osm_pkey_mgr.h
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBCA@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBCA@mtlexch01.mtl.com>
Message-ID: <20060522091409.GC30176@sashak.voltaire.com>

Hi Eitan,

On 09:27 Mon 22 May , Eitan Zahavi wrote:
>
> Every OpenSM manager has an H file.

Even if it is not necessary? Why?

> Instead of trying to "save" lines of code - please focus on improving
> code readability and structure.

This is exactly my point - to improve code readability and structure (and obviously this is not last change needed in this area). And I cannot see how huge amount of duplicated code and useless structures may help there. Instead I can see how it hurts.

> NOTE: this is not kernel code.
The tradeoff for user land code is > different. Yes, this is not kernel code, but I don't see your point. (don't think that you are referring kernel sources as example of unreadable and bad structured code :)) Probably you see this patch as "extremal optimization"? I see it as cleanup and structure improvement. Sasha > If you save us one header file - but break the structure of the code you > make more damage than good. > > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > Sent: Monday, May 22, 2006 1:16 AM > > To: Hal Rosenstock > > Cc: openib-general at openib.org; Eitan Zahavi; Yael Kalka; Ofer Gigi > > Subject: [PATCH] opensm: remove osm_pkey_mgr.h > > > > > > Since we expect that osm_pkey_mgr_process() will be called only from > > osm_state_mgr_process() this patch replaces osm_pkey_mgr.h header file > > by local prototype. 
> > > > Signed-off-by: Sasha Khapyorsky > > --- > > > > osm/include/Makefile.am | 1 > > osm/include/opensm/osm_pkey_mgr.h | 92 > ------------------------------------- > > osm/opensm/osm_pkey_mgr.c | 1 > > osm/opensm/osm_state_mgr.c | 3 + > > 4 files changed, 2 insertions(+), 95 deletions(-) > > > > diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am > > index b23b1de..2bee762 100644 > > --- a/osm/include/Makefile.am > > +++ b/osm/include/Makefile.am > > @@ -96,7 +96,6 @@ EXTRA_DIST = \ > > $(srcdir)/opensm/st.h \ > > $(srcdir)/opensm/osm_mcast_tbl.h \ > > $(srcdir)/opensm/osm_pkey.h \ > > - $(srcdir)/opensm/osm_pkey_mgr.h \ > > $(srcdir)/opensm/osm_sa_mad_ctrl.h \ > > $(srcdir)/opensm/osm_req_ctrl.h \ > > $(srcdir)/opensm/osm_sw_info_rcv.h \ > > diff --git a/osm/include/opensm/osm_pkey_mgr.h > > b/osm/include/opensm/osm_pkey_mgr.h > > deleted file mode 100644 > > index cb0075d..0000000 > > --- a/osm/include/opensm/osm_pkey_mgr.h > > +++ /dev/null > > @@ -1,92 +0,0 @@ > > -/* > > - * Copyright (c) 2006 Voltaire, Inc. All rights reserved. > > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights > reserved. > > - * > > - * This software is available to you under a choice of one of two > > - * licenses. You may choose to be licensed under the terms of the > GNU > > - * General Public License (GPL) Version 2, available from the file > > - * COPYING in the main directory of this source tree, or the > > - * OpenIB.org BSD license below: > > - * > > - * Redistribution and use in source and binary forms, with or > > - * without modification, are permitted provided that the > following > > - * conditions are met: > > - * > > - * - Redistributions of source code must retain the above > > - * copyright notice, this list of conditions and the following > > - * disclaimer. 
> > - * > > - * - Redistributions in binary form must reproduce the above > > - * copyright notice, this list of conditions and the following > > - * disclaimer in the documentation and/or other materials > > - * provided with the distribution. > > - * > > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE > > WARRANTIES OF > > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT > > HOLDERS > > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN > > AN > > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR > > IN > > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > > THE > > - * SOFTWARE. > > - * > > - * $Id$ > > - */ > > - > > - > > -/* > > - * Abstract: > > - * Prototype for osm_pkey_mgr_process() function > > - * This is part of the OpenSM family of objects. > > - * > > - * Environment: > > - * Linux User Mode > > - * > > - * $Revision: 1.4 $ > > - */ > > - > > - > > -#ifndef _OSM_PKEY_MGR_H_ > > -#define _OSM_PKEY_MGR_H_ > > - > > -#include > > -#include > > - > > -#ifdef __cplusplus > > -# define BEGIN_C_DECLS extern "C" { > > -# define END_C_DECLS } > > -#else /* !__cplusplus */ > > -# define BEGIN_C_DECLS > > -# define END_C_DECLS > > -#endif /* __cplusplus */ > > - > > -BEGIN_C_DECLS > > - > > -/****f* OpenSM: P_Key Manager/osm_pkey_mgr_process > > -* NAME > > -* osm_pkey_mgr_process > > -* > > -* DESCRIPTION > > -* This function enforces the pkey rules on the SM DB. > > -* > > -* SYNOPSIS > > -*/ > > -osm_signal_t > > -osm_pkey_mgr_process( > > - IN osm_opensm_t *p_osm ); > > -/* > > -* PARAMETERS > > -* p_osm > > -* [in] Pointer to an osm_opensm_t object. 
> > -* > > -* RETURN VALUES > > -* None > > -* > > -* NOTES > > -* > > -* SEE ALSO > > -*********/ > > - > > -END_C_DECLS > > - > > -#endif /* _OSM_PKEY_MGR_H_ */ > > diff --git a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > > index e08b7cc..91c1a95 100644 > > --- a/osm/opensm/osm_pkey_mgr.c > > +++ b/osm/opensm/osm_pkey_mgr.c > > @@ -56,7 +56,6 @@ #include > > #include > > #include > > #include > > -#include > > #include > > #include > > > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > > index 42fd5e8..724b2b7 100644 > > --- a/osm/opensm/osm_state_mgr.c > > +++ b/osm/opensm/osm_state_mgr.c > > @@ -66,14 +66,15 @@ #include > > #include > > #include > > #include > > -#include > > #include > > #include > > #include > > > > > /********************************************************************** > > + * Prototypes for manager processors used locally > > > **********************************************************************/ > > osm_signal_t osm_qos_setup(IN osm_opensm_t * p_osm); > > +osm_signal_t osm_pkey_mgr_process(IN osm_opensm_t * p_osm); > > > > > /********************************************************************** > > > **********************************************************************/ From katiyar.mohit at gmail.com Mon May 22 02:41:14 2006 From: katiyar.mohit at gmail.com (Mohit Katiyar) Date: Mon, 22 May 2006 18:41:14 +0900 Subject: [openib-general] iSCSI host adapter entries Message-ID: <46465bb30605220241u28eb958ayd094d3a6bc4174a3@mail.gmail.com> Hi all, I have a query regarding the Host adapters. Currently I am using SLES 9 on x86_64 machine and I have an InfiniBand Host adapter. 
The entry in the sys directory for the host is as follows:

optgfs:~ # ll /sys/class/scsi_host/iscsi/*
-r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/cmd_per_lun
--w------- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/connfailtimeout
lrwxrwxrwx 1 root root    0 May 22 13:31 /sys/class/scsi_host/iscsi/device -> ../../../devices/platform/iscsi
--w------- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/diskcommandtimeout
-r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/host_busy
-r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/host_no
--w------- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/log
-rw-r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/no_partition_check
-r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/proc_name
--w------- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/scan
-r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/sg_tablesize
--w------- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/shutdown
-r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/unchecked_isa_dma
-r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/unique_id

I want to know: if I had two IB host adapters (or any other iSCSI adapters) on the same machine, what would be the pattern of their entries in the /sys/class/scsi_host/iscsi directory?

Thanks,
Mohit Katiyar

From eitan at mellanox.co.il Mon May 22 03:55:19 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 22 May 2006 13:55:19 +0300
Subject: [openib-general] RE: [PATCH] opensm: remove osm_pkey_mgr.h
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBD3@mtlexch01.mtl.com>

Hi Sasha,

My point is simple: OpenSM has a very structured skeleton:
1. All MAD receivers have two c files and two h files:
1.1 A MAD receive controller, which deals with dispatcher registration.
1.2 A MAD receiver, which deals with all the action happening after such a MAD is received.
2. All algorithm stages (managers) have a c file and an h file. An algorithm stage might be LID assignment, routing, partition enforcement, etc.
3. All SMDB objects have a c file and an h file. Examples are Nodes, Ports, Multicast registrations, etc.

These are the structural code rules for OpenSM. Even if you think it is better to merge functionality and avoid having some of these h files, and you might even be able to save some lines of code by doing that, you break the code structure.

If you personally like to work with flat code - this is your preference. I prefer having a clear structure. So if I need to know where the pkey manager object is defined, what its internal state variables are, what algorithms it uses, etc., I can simply open up osm_pkey_mgr.h and find this out.

If you want to redesign the OpenSM structure to be "simpler" or more "effective" you can propose doing that. But doing it in the salami way is just going to hurt stability and leave us with no structure at all. Unless we are willing to re-architect the code in a clean manner and spend the years of development and validation for these changes - let's keep the code with a clear structure.

Eitan

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> Sent: Monday, May 22, 2006 12:14 PM
> To: Eitan Zahavi
> Cc: Hal Rosenstock; openib-general at openib.org; Yael Kalka; Ofer Gigi
> Subject: Re: [PATCH] opensm: remove osm_pkey_mgr.h
>
> Hi Eitan,
>
> On 09:27 Mon 22 May , Eitan Zahavi wrote:
> >
> > Every OpenSM manager has an H file.
>
> Even if it is not necessary? Why?
>
> > Instead of trying to "save" lines of code - please focus on improving
> > code readability and structure.
> > This is exactly my point - to improve code readability and structure > (and obviously this is not last change needed in this area). And I > cannot see how huge amount of duplicated code and useless structures > may help there. Instead I can see how it hurts. > > > NOTE: this is not kernel code. The tradeoff for user land code is > > different. > > Yes, this is not kernel code, but I don't see your point. (don't think > that you are referring kernel sources as example of unreadable and bad > structured code :)) > > Probably you see this patch as "extremal optimization"? I see it as > cleanup and structure improvement. > > Sasha > > > If you save us one header file - but break the structure of the code you > > make more damage than good. > > > > Eitan Zahavi > > Senior Engineering Director, Software Architect > > Mellanox Technologies LTD > > Tel:+972-4-9097208 > > Fax:+972-4-9593245 > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > -----Original Message----- > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > > Sent: Monday, May 22, 2006 1:16 AM > > > To: Hal Rosenstock > > > Cc: openib-general at openib.org; Eitan Zahavi; Yael Kalka; Ofer Gigi > > > Subject: [PATCH] opensm: remove osm_pkey_mgr.h > > > > > > > > > Since we expect that osm_pkey_mgr_process() will be called only from > > > osm_state_mgr_process() this patch replaces osm_pkey_mgr.h header file > > > by local prototype. 
> > > > > > Signed-off-by: Sasha Khapyorsky > > > --- > > > > > > osm/include/Makefile.am | 1 > > > osm/include/opensm/osm_pkey_mgr.h | 92 > > ------------------------------------- > > > osm/opensm/osm_pkey_mgr.c | 1 > > > osm/opensm/osm_state_mgr.c | 3 + > > > 4 files changed, 2 insertions(+), 95 deletions(-) > > > > > > diff --git a/osm/include/Makefile.am b/osm/include/Makefile.am > > > index b23b1de..2bee762 100644 > > > --- a/osm/include/Makefile.am > > > +++ b/osm/include/Makefile.am > > > @@ -96,7 +96,6 @@ EXTRA_DIST = \ > > > $(srcdir)/opensm/st.h \ > > > $(srcdir)/opensm/osm_mcast_tbl.h \ > > > $(srcdir)/opensm/osm_pkey.h \ > > > - $(srcdir)/opensm/osm_pkey_mgr.h \ > > > $(srcdir)/opensm/osm_sa_mad_ctrl.h \ > > > $(srcdir)/opensm/osm_req_ctrl.h \ > > > $(srcdir)/opensm/osm_sw_info_rcv.h \ > > > diff --git a/osm/include/opensm/osm_pkey_mgr.h > > > b/osm/include/opensm/osm_pkey_mgr.h > > > deleted file mode 100644 > > > index cb0075d..0000000 > > > --- a/osm/include/opensm/osm_pkey_mgr.h > > > +++ /dev/null > > > @@ -1,92 +0,0 @@ > > > -/* > > > - * Copyright (c) 2006 Voltaire, Inc. All rights reserved. > > > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights > > reserved. > > > - * > > > - * This software is available to you under a choice of one of two > > > - * licenses. You may choose to be licensed under the terms of the > > GNU > > > - * General Public License (GPL) Version 2, available from the file > > > - * COPYING in the main directory of this source tree, or the > > > - * OpenIB.org BSD license below: > > > - * > > > - * Redistribution and use in source and binary forms, with or > > > - * without modification, are permitted provided that the > > following > > > - * conditions are met: > > > - * > > > - * - Redistributions of source code must retain the above > > > - * copyright notice, this list of conditions and the following > > > - * disclaimer. 
> > > - * > > > - * - Redistributions in binary form must reproduce the above > > > - * copyright notice, this list of conditions and the following > > > - * disclaimer in the documentation and/or other materials > > > - * provided with the distribution. > > > - * > > > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY > KIND, > > > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE > > > WARRANTIES OF > > > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR > COPYRIGHT > > > HOLDERS > > > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER > IN > > > AN > > > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF > OR > > > IN > > > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS > IN > > > THE > > > - * SOFTWARE. > > > - * > > > - * $Id$ > > > - */ > > > - > > > - > > > -/* > > > - * Abstract: > > > - * Prototype for osm_pkey_mgr_process() function > > > - * This is part of the OpenSM family of objects. > > > - * > > > - * Environment: > > > - * Linux User Mode > > > - * > > > - * $Revision: 1.4 $ > > > - */ > > > - > > > - > > > -#ifndef _OSM_PKEY_MGR_H_ > > > -#define _OSM_PKEY_MGR_H_ > > > - > > > -#include > > > -#include > > > - > > > -#ifdef __cplusplus > > > -# define BEGIN_C_DECLS extern "C" { > > > -# define END_C_DECLS } > > > -#else /* !__cplusplus */ > > > -# define BEGIN_C_DECLS > > > -# define END_C_DECLS > > > -#endif /* __cplusplus */ > > > - > > > -BEGIN_C_DECLS > > > - > > > -/****f* OpenSM: P_Key Manager/osm_pkey_mgr_process > > > -* NAME > > > -* osm_pkey_mgr_process > > > -* > > > -* DESCRIPTION > > > -* This function enforces the pkey rules on the SM DB. > > > -* > > > -* SYNOPSIS > > > -*/ > > > -osm_signal_t > > > -osm_pkey_mgr_process( > > > - IN osm_opensm_t *p_osm ); > > > -/* > > > -* PARAMETERS > > > -* p_osm > > > -* [in] Pointer to an osm_opensm_t object. 
> > > -* > > > -* RETURN VALUES > > > -* None > > > -* > > > -* NOTES > > > -* > > > -* SEE ALSO > > > -*********/ > > > - > > > -END_C_DECLS > > > - > > > -#endif /* _OSM_PKEY_MGR_H_ */ > > > diff --git a/osm/opensm/osm_pkey_mgr.c b/osm/opensm/osm_pkey_mgr.c > > > index e08b7cc..91c1a95 100644 > > > --- a/osm/opensm/osm_pkey_mgr.c > > > +++ b/osm/opensm/osm_pkey_mgr.c > > > @@ -56,7 +56,6 @@ #include > > > #include > > > #include > > > #include > > > -#include > > > #include > > > #include > > > > > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > > > index 42fd5e8..724b2b7 100644 > > > --- a/osm/opensm/osm_state_mgr.c > > > +++ b/osm/opensm/osm_state_mgr.c > > > @@ -66,14 +66,15 @@ #include > > > #include > > > #include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > > > > > > /********************************************************************** > > > + * Prototypes for manager processors used locally > > > > > **********************************************************************/ > > > osm_signal_t osm_qos_setup(IN osm_opensm_t * p_osm); > > > +osm_signal_t osm_pkey_mgr_process(IN osm_opensm_t * p_osm); > > > > > > > > /********************************************************************** > > > > > **********************************************************************/ From eitan at mellanox.co.il Mon May 22 04:44:21 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 22 May 2006 14:44:21 +0300 Subject: [openib-general] RE: [PATCH] RFC: opensm: serialize osm_state_mgr_process() Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBD4@mtlexch01.mtl.com> Hi Sasha, > > > > The idea to use pthread in OpenSM code is totally wrong. > > Please stop doing this. We want this code to be shared with Windows and > > this breaks it. 
>
> The problem is that complib does not provide the needed primitives, mostly
> in the synchronization and thread management areas (like pthread_cond_wait()
> and friends, pthread_cancel() and friends).
[EZ] The problem is that any code you write in OpenSM should be portable to the existing complib, so that it is portable to Windows. If some thread interface is missing (I am surprised we could get so far without that primitive), please add it to complib and also make sure Windows native threads can support that primitive.
>
> The option to extend complib also does not look very helpful for Windows
> either - complib uses pthreads as backend anyway.
[EZ] No, the Windows complib does not use pthreads at all.
>
> Would using a pthread library on Windows solve the sharing issue?
[EZ] It might, if you are willing to go through this unneeded exercise ... We have a complib in Windows that is well tested and functional today. I do not see why you need to rewrite it.
>
> Another option I can think of is to use a pthread wrapper (in the same
> way as it is done today with complib).
[EZ] This does not solve the need to implement the same functionality in Windows. And since I do not want to re-implement complib for Windows, I prefer sticking with complib wrappers.
>
> >
> > Also please provide a clear RFC for what this patch is trying to do.
>
> Basically this serializes execution of osm_state_mgr_process(), so
> instead of being called directly via the dispatcher's callback (with possible
> waiting for the lock), state_manager will be signaled (and woken up if
> necessary). As a result we don't need the big state_lock anymore; a mutex is
> needed only to protect signal_mask (opensm's osm_signal_t, not *nix
> signals) updates.
[EZ] I do not see what problem you are trying to solve. What is wrong with the current implementation? Are you fixing a bug? The dispatching mechanism used allows passing the state manager signals which must be processed in order. The state_lock guarantees this order.
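[Editorial sketch, not part of either message] The mechanism under discussion (callers set a bit in a mask under a mutex, and a single worker thread drains the mask and does all processing) can be illustrated roughly as below. Every name here (struct engine, engine_signal(), the SIG_* values) is hypothetical and only stands in for the osm_sm_t fields and osm_sm_signal() in the quoted patch; this is a minimal sketch assuming plain POSIX threads, not the OpenSM code. One property worth keeping in mind when reading the ordering argument above: a bitmask records which signals are pending, not how many times or in what order they were raised, so the worker handles them in bit order.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical signal numbers, standing in for osm_signal_t values. */
enum { SIG_NONE = 0, SIG_SWEEP = 1, SIG_CHANGE = 2, SIG_MAX = 3 };

/* Hypothetical engine state, standing in for the fields added to osm_sm_t. */
struct engine {
    pthread_mutex_t mutex;      /* protects signal_mask and running */
    pthread_cond_t cond;        /* wakes the worker when state changes */
    unsigned signal_mask;       /* bit i set means signal i is pending */
    int running;                /* cleared to ask the worker to exit */
    unsigned handled[SIG_MAX];  /* per-signal counters, written by worker */
};

void engine_init(struct engine *e)
{
    memset(e, 0, sizeof(*e));
    pthread_mutex_init(&e->mutex, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->running = 1;
}

void engine_destroy(struct engine *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->mutex);
}

/* Callers only record the signal and wake the worker; they never process. */
void engine_signal(struct engine *e, unsigned sig)
{
    assert(sig < SIG_MAX);
    pthread_mutex_lock(&e->mutex);
    e->signal_mask |= 1u << sig;
    pthread_cond_signal(&e->cond);
    pthread_mutex_unlock(&e->mutex);
}

void engine_stop(struct engine *e)
{
    pthread_mutex_lock(&e->mutex);
    e->running = 0;
    pthread_cond_signal(&e->cond);
    pthread_mutex_unlock(&e->mutex);
}

/* Single worker: all processing happens here, one signal at a time. */
void *engine_thread(void *arg)
{
    struct engine *e = arg;

    for (;;) {
        unsigned signals, i;
        int running;

        pthread_mutex_lock(&e->mutex);
        /* Loop rather than a single test: guards against spurious wakeups. */
        while (e->signal_mask == 0 && e->running)
            pthread_cond_wait(&e->cond, &e->mutex);
        signals = e->signal_mask;   /* take everything pending ... */
        e->signal_mask = 0;         /* ... and clear it under the lock */
        running = e->running;
        pthread_mutex_unlock(&e->mutex);

        /* Process outside the lock, lowest bit first. */
        for (i = 0; signals; i++, signals >>= 1)
            if (signals & 1)
                e->handled[i]++;    /* placeholder for the real handler */

        if (!running)
            break;
    }
    return NULL;
}
```

A caller would start engine_thread() with pthread_create(), raise signals with engine_signal(), and shut down with engine_stop() followed by pthread_join(). The mutex is held only for the mask update, which is the point Sasha makes about no longer needing a big lock around the processing itself.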
> > Sasha > > > Eitan Zahavi > > Senior Engineering Director, Software Architect > > Mellanox Technologies LTD > > Tel:+972-4-9097208 > > Fax:+972-4-9593245 > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > -----Original Message----- > > > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > > > Sent: Monday, May 22, 2006 2:02 AM > > > To: Hal Rosenstock; openib-general at openib.org > > > Cc: Eitan Zahavi; Yael Kalka; Ofer Gigi; sashak at voltaire.com; > > elid at voltaire.com > > > Subject: [PATCH] RFC: opensm: serialize osm_state_mgr_process() > > > > > > Hello, > > > > > > Please comment (and test). > > > > > > Thanks, > > > Sasha. > > > > > > This serializes execution of osm_state_mgr_process() and removes the > > big > > > state_lock. This should reduce "locked state" time and prevent > > potential > > > dispatcher blocking. > > > > > > Other important change here is direct usage of pthread primitives > > > instead of "traditional" cl_thread* stuff. > > > > > > Signed-off-by: Sasha Khapyorsky > > > --- > > > > > > osm/include/Makefile.am | 1 > > > osm/include/opensm/osm_msgdef.h | 15 -- > > > osm/include/opensm/osm_sm.h | 36 ++++- > > > osm/include/opensm/osm_state_mgr.h | 4 - > > > osm/include/opensm/osm_state_mgr_ctrl.h | 236 > > ------------------------------- > > > osm/opensm/Makefile.am | 5 - > > > osm/opensm/osm_helper.c | 1 > > > osm/opensm/osm_node_info_rcv.c | 5 - > > > osm/opensm/osm_port_info_rcv.c | 3 > > > osm/opensm/osm_sm.c | 160 +++++++++++---------- > > > osm/opensm/osm_sm_mad_ctrl.c | 26 --- > > > osm/opensm/osm_sm_state_mgr.c | 8 + > > > osm/opensm/osm_sminfo_rcv.c | 8 + > > > osm/opensm/osm_state_mgr.c | 24 --- > > > osm/opensm/osm_state_mgr_ctrl.c | 132 ----------------- > > > osm/opensm/osm_sw_info_rcv.c | 3 > > > osm/opensm/osm_sweep_fail_ctrl.c | 4 - > > > osm/opensm/osm_trap_rcv.c | 3 > > > osm/opensm/osm_vl15intf.c | 25 --- > > > 19 files changed, 143 insertions(+), 556 deletions(-) > > > > > > diff --git 
a/osm/include/Makefile.am b/osm/include/Makefile.am > > > index 2bee762..0af78a0 100644 > > > --- a/osm/include/Makefile.am > > > +++ b/osm/include/Makefile.am > > > @@ -120,7 +120,6 @@ EXTRA_DIST = \ > > > $(srcdir)/opensm/osm_vl15intf.h \ > > > $(srcdir)/opensm/osm_drop_mgr.h \ > > > $(srcdir)/opensm/osm_port_info_rcv.h \ > > > - $(srcdir)/opensm/osm_state_mgr_ctrl.h \ > > > $(srcdir)/complib/cl_thread_osd.h \ > > > $(srcdir)/complib/cl_packon.h \ > > > $(srcdir)/complib/cl_atomic_osd.h \ > > > diff --git a/osm/include/opensm/osm_msgdef.h > > b/osm/include/opensm/osm_msgdef.h > > > index a1b5743..6956c86 100644 > > > --- a/osm/include/opensm/osm_msgdef.h > > > +++ b/osm/include/opensm/osm_msgdef.h > > > @@ -148,20 +148,6 @@ BEGIN_C_DECLS > > > * > > > * SOURCE > > > ***********/ > > > -/****d* OpenSM: Dispatcher Messages/OSM_MSG_NO_SMPS_OUTSTANDING > > > -* NAME > > > -* OSM_MSG_NO_SMPS_OUTSTANDING > > > -* > > > -* DESCRIPTION > > > -* Message indicating that there are no outstanding SMPs on the > > subnet. 
> > > -* > > > -* NOTES > > > -* Sent by: osm_mad_ctrl_t > > > -* Received by: osm_state_mgr_ctrl_t > > > -* Delivery notice: no > > > -* > > > -* SOURCE > > > -***********/ > > > enum > > > { > > > OSM_MSG_REQ = 0, > > > @@ -169,7 +155,6 @@ enum > > > OSM_MSG_MAD_PORT_INFO, > > > OSM_MSG_MAD_SWITCH_INFO, > > > OSM_MSG_MAD_NODE_DESC, > > > - OSM_MSG_NO_SMPS_OUTSTANDING, > > > OSM_MSG_MAD_NODE_RECORD, > > > OSM_MSG_MAD_PORTINFO_RECORD, > > > OSM_MSG_MAD_SERVICE_RECORD, > > > diff --git a/osm/include/opensm/osm_sm.h b/osm/include/opensm/osm_sm.h > > > index d6086d4..def43c8 100644 > > > --- a/osm/include/opensm/osm_sm.h > > > +++ b/osm/include/opensm/osm_sm.h > > > @@ -69,7 +69,6 @@ #include > > #include > > > #include > > > #include > > > -#include > > > #include > > > #include > > > #include > > > @@ -131,7 +130,6 @@ BEGIN_C_DECLS > > > typedef struct _osm_sm > > > { > > > osm_thread_state_t thread_state; > > > - cl_event_t signal; > > > cl_event_t subnet_up_event; > > > cl_thread_t sweeper; > > > osm_subn_t *p_subn; > > > @@ -143,6 +141,9 @@ typedef struct _osm_sm > > > cl_dispatcher_t *p_disp; > > > cl_plock_t *p_lock; > > > atomic32_t sm_trans_id; > > > + unsigned signal_mask; > > > + pthread_mutex_t mutex; > > > + pthread_cond_t cond; > > > osm_req_t req; > > > osm_req_ctrl_t req_ctrl; > > > osm_resp_t resp; > > > @@ -155,7 +156,6 @@ typedef struct _osm_sm > > > osm_sm_mad_ctrl_t mad_ctrl; > > > osm_si_rcv_t si_rcv; > > > osm_si_rcv_ctrl_t si_rcv_ctrl; > > > - osm_state_mgr_ctrl_t state_mgr_ctrl; > > > osm_lid_mgr_t lid_mgr; > > > osm_ucast_mgr_t ucast_mgr; > > > osm_link_mgr_t link_mgr; > > > @@ -387,6 +387,33 @@ osm_sm_init( > > > * SM object, osm_sm_construct, osm_sm_destroy > > > *********/ > > > > > > +/****f* OpenSM: SM/osm_sm_signal > > > +* NAME > > > +* osm_sm_signal > > > +* > > > +* DESCRIPTION > > > +* Forward signal to state engine > > > +* > > > +* SYNOPSIS > > > +*/ > > > +void > > > +osm_sm_signal( > > > + IN osm_sm_t* const p_sm, > > > + IN 
osm_signal_t signal ); > > > +/* > > > +* PARAMETERS > > > +* p_sm > > > +* [in] Pointer to an osm_sm_t object. > > > +* > > > +* signal > > > +* [in] Signal to the state engine. > > > +* > > > +* NOTES > > > +* > > > +* SEE ALSO > > > +* SM object > > > +*********/ > > > + > > > /****f* OpenSM: SM/osm_sm_sweep > > > * NAME > > > * osm_sm_sweep > > > @@ -404,9 +431,6 @@ osm_sm_sweep( > > > * p_sm > > > * [in] Pointer to an osm_sm_t object. > > > * > > > -* RETURN VALUES > > > -* IB_SUCCESS if the sweep completed successfully. > > > -* > > > * NOTES > > > * > > > * SEE ALSO > > > diff --git a/osm/include/opensm/osm_state_mgr.h > > > b/osm/include/opensm/osm_state_mgr.h > > > index ad4afa0..5e76463 100644 > > > --- a/osm/include/opensm/osm_state_mgr.h > > > +++ b/osm/include/opensm/osm_state_mgr.h > > > @@ -116,7 +116,6 @@ typedef struct _osm_state_mgr > > > osm_stats_t *p_stats; > > > struct _osm_sm_state_mgr *p_sm_state_mgr; > > > const osm_sm_mad_ctrl_t *p_mad_ctrl; > > > - cl_spinlock_t state_lock; > > > cl_spinlock_t idle_lock; > > > cl_qlist_t idle_time_list; > > > cl_plock_t *p_lock; > > > @@ -161,9 +160,6 @@ typedef struct _osm_state_mgr > > > * p_mad_ctrl > > > * Pointer to the SM's MAD Controller object. > > > * > > > -* state_lock > > > -* Spinlock guarding the state and processes. > > > -* > > > * p_lock > > > * lock guarding the subnet object. > > > * > > > diff --git a/osm/include/opensm/osm_state_mgr_ctrl.h > > > b/osm/include/opensm/osm_state_mgr_ctrl.h > > > deleted file mode 100644 > > > index 9ffcfb0..0000000 > > > --- a/osm/include/opensm/osm_state_mgr_ctrl.h > > > +++ /dev/null > > > @@ -1,236 +0,0 @@ > > > -/* > > > - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. > > > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights > > reserved. > > > - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. > > > - * > > > - * This software is available to you under a choice of one of two > > > - * licenses. 
You may choose to be licensed under the terms of the > > GNU > > > - * General Public License (GPL) Version 2, available from the file > > > - * COPYING in the main directory of this source tree, or the > > > - * OpenIB.org BSD license below: > > > - * > > > - * Redistribution and use in source and binary forms, with or > > > - * without modification, are permitted provided that the > > following > > > - * conditions are met: > > > - * > > > - * - Redistributions of source code must retain the above > > > - * copyright notice, this list of conditions and the following > > > - * disclaimer. > > > - * > > > - * - Redistributions in binary form must reproduce the above > > > - * copyright notice, this list of conditions and the following > > > - * disclaimer in the documentation and/or other materials > > > - * provided with the distribution. > > > - * > > > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY > KIND, > > > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE > > > WARRANTIES OF > > > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR > COPYRIGHT > > > HOLDERS > > > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER > IN > > > AN > > > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF > OR > > > IN > > > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS > IN > > > THE > > > - * SOFTWARE. > > > - * > > > - * $Id$ > > > - */ > > > - > > > - > > > -/* > > > - * Abstract: > > > - * Declaration of osm_state_mgr_ctrl_t. > > > - * This object represents a controller that receives the > > > - * State indication after a subnet sweep. > > > - * This object is part of the OpenSM family of objects. 
> > > - * > > > - * Environment: > > > - * Linux User Mode > > > - * > > > - * $Revision: 1.4 $ > > > - */ > > > - > > > -#ifndef _OSM_STATE_MGR_CTRL_H_ > > > -#define _OSM_STATE_MGR_CTRL_H_ > > > - > > > - > > > -#include > > > -#include > > > -#include > > > -#include > > > - > > > -#ifdef __cplusplus > > > -# define BEGIN_C_DECLS extern "C" { > > > -# define END_C_DECLS } > > > -#else /* !__cplusplus */ > > > -# define BEGIN_C_DECLS > > > -# define END_C_DECLS > > > -#endif /* __cplusplus */ > > > - > > > -BEGIN_C_DECLS > > > - > > > -/****h* OpenSM/State Manager Controller > > > -* NAME > > > -* State Manager Controller > > > -* > > > -* DESCRIPTION > > > -* The State Manager Controller object encapsulates the information > > > -* needed to pass the dispatcher message from the dispatcher > > > -* to the State Manager. > > > -* > > > -* The State Manager Controller object is thread safe. > > > -* > > > -* This object should be treated as opaque and should be > > > -* manipulated only through the provided functions. > > > -* > > > -* AUTHOR > > > -* Steve King, Intel > > > -* > > > -*********/ > > > -/****s* OpenSM: State Manager Controller/osm_state_mgr_ctrl_t > > > -* NAME > > > -* osm_state_mgr_ctrl_t > > > -* > > > -* DESCRIPTION > > > -* State Manager Controller structure. > > > -* > > > -* This object should be treated as opaque and should > > > -* be manipulated only through the provided functions. > > > -* > > > -* SYNOPSIS > > > -*/ > > > -typedef struct _osm_state_mgr_ctrl > > > -{ > > > - osm_state_mgr_t *p_mgr; > > > - osm_log_t *p_log; > > > - cl_dispatcher_t *p_disp; > > > - cl_disp_reg_handle_t h_disp; > > > - > > > -} osm_state_mgr_ctrl_t; > > > -/* > > > -* FIELDS > > > -* p_mgr > > > -* Pointer to the State Manager object. > > > -* > > > -* p_log > > > -* Pointer to the log object. > > > -* > > > -* p_disp > > > -* Pointer to the Dispatcher. > > > -* > > > -* h_disp > > > -* Handle returned from dispatcher registration. 
> > > -* > > > -* SEE ALSO > > > -* State Manager Controller object > > > -*********/ > > > - > > > -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_construct > > > -* NAME > > > -* osm_state_mgr_ctrl_construct > > > -* > > > -* DESCRIPTION > > > -* This function constructs a State Manager Controller object. > > > -* > > > -* SYNOPSIS > > > -*/ > > > -void > > > -osm_state_mgr_ctrl_construct( > > > - IN osm_state_mgr_ctrl_t* const p_ctrl ); > > > -/* > > > -* PARAMETERS > > > -* p_ctrl > > > -* [in] Pointer to a State Manager Controller > > > -* object to construct. > > > -* > > > -* RETURN VALUE > > > -* This function does not return a value. > > > -* > > > -* NOTES > > > -* Allows calling osm_state_mgr_ctrl_init, > > osm_state_mgr_ctrl_destroy, > > > -* and osm_state_mgr_ctrl_is_inited. > > > -* > > > -* Calling osm_state_mgr_ctrl_construct is a prerequisite to > > calling any other > > > -* method except osm_state_mgr_ctrl_init. > > > -* > > > -* SEE ALSO > > > -* State Manager Controller object, osm_state_mgr_ctrl_init, > > > -* osm_state_mgr_ctrl_destroy > > > -*********/ > > > - > > > -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_destroy > > > -* NAME > > > -* osm_state_mgr_ctrl_destroy > > > -* > > > -* DESCRIPTION > > > -* The osm_state_mgr_ctrl_destroy function destroys the object, > > releasing > > > -* all resources. > > > -* > > > -* SYNOPSIS > > > -*/ > > > -void > > > -osm_state_mgr_ctrl_destroy( > > > - IN osm_state_mgr_ctrl_t* const p_ctrl ); > > > -/* > > > -* PARAMETERS > > > -* p_ctrl > > > -* [in] Pointer to the object to destroy. > > > -* > > > -* RETURN VALUE > > > -* This function does not return a value. > > > -* > > > -* NOTES > > > -* Performs any necessary cleanup of the specified > > > -* State Manager Controller object. > > > -* Further operations should not be attempted on the destroyed > > object. 
> > > -* This function should only be called after a call to > > > -* osm_state_mgr_ctrl_construct or osm_state_mgr_ctrl_init. > > > -* > > > -* SEE ALSO > > > -* State Manager Controller object, osm_state_mgr_ctrl_construct, > > > -* osm_state_mgr_ctrl_init > > > -*********/ > > > - > > > -/****f* OpenSM: State Manager Controller/osm_state_mgr_ctrl_init > > > -* NAME > > > -* osm_state_mgr_ctrl_init > > > -* > > > -* DESCRIPTION > > > -* The osm_state_mgr_ctrl_init function initializes a > > > -* State Manager Controller object for use. > > > -* > > > -* SYNOPSIS > > > -*/ > > > -ib_api_status_t > > > -osm_state_mgr_ctrl_init( > > > - IN osm_state_mgr_ctrl_t* const p_ctrl, > > > - IN osm_state_mgr_t* const p_mgr, > > > - IN osm_log_t* const p_log, > > > - IN cl_dispatcher_t* const p_disp ); > > > -/* > > > -* PARAMETERS > > > -* p_ctrl > > > -* [in] Pointer to an osm_state_mgr_ctrl_t object to > > initialize. > > > -* > > > -* p_mgr > > > -* [in] Pointer to an osm_state_mgr_t object. > > > -* > > > -* p_log > > > -* [in] Pointer to the log object. > > > -* > > > -* p_disp > > > -* [in] Pointer to the OpenSM central Dispatcher. > > > -* > > > -* RETURN VALUES > > > -* IB_SUCCESS if the State Manager Controller object > > > -* was initialized successfully. > > > -* > > > -* NOTES > > > -* Allows calling other State Manager Controller methods. 
> > > -* > > > -* SEE ALSO > > > -* State Manager Controller object, osm_state_mgr_ctrl_construct, > > > -* osm_state_mgr_ctrl_destroy > > > -*********/ > > > - > > > -END_C_DECLS > > > - > > > -#endif /* OSM_STATE_MGR_CTRL_H_ */ > > > diff --git a/osm/opensm/Makefile.am b/osm/opensm/Makefile.am > > > index 43fe8c1..7b1060a 100644 > > > --- a/osm/opensm/Makefile.am > > > +++ b/osm/opensm/Makefile.am > > > @@ -78,8 +78,7 @@ opensm_SOURCES = main.c osm_console.c os > > > osm_slvl_map_rcv.c osm_slvl_map_rcv_ctrl.c \ > > > osm_sm.c osm_sminfo_rcv.c \ > > > osm_sminfo_rcv_ctrl.c osm_sm_mad_ctrl.c \ > > > - osm_sm_state_mgr.c osm_state_mgr.c \ > > > - osm_state_mgr_ctrl.c osm_subnet.c \ > > > + osm_sm_state_mgr.c osm_state_mgr.c osm_subnet.c \ > > > osm_sweep_fail_ctrl.c osm_sw_info_rcv.c \ > > > osm_sw_info_rcv_ctrl.c osm_switch.c \ > > > osm_prtn.c osm_prtn_config.c osm_qos.c \ > > > @@ -104,7 +103,7 @@ # we need to be able to load libraries f > > > # we always give precedence to local tree libs and then use the > > pre-installed ones. > > > opensm_LDADD = -L../complib -L../libvendor -L. 
$(OSMV_LDADD) -lopensm > > - > > > losmcomp -losmvendor > > > > > > -opensm_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread > > > +opensm_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread -lrt > > > > > > opensmincludedir = $(includedir)/infiniband/opensm > > > > > > diff --git a/osm/opensm/osm_helper.c b/osm/opensm/osm_helper.c > > > index 3886609..a966bbe 100644 > > > --- a/osm/opensm/osm_helper.c > > > +++ b/osm/opensm/osm_helper.c > > > @@ -1895,7 +1895,6 @@ static const char* const __osm_disp_msg_ > > > "OSM_MSG_MAD_PORT_INFO,", > > > "OSM_MSG_MAD_SWITCH_INFO", > > > "OSM_MSG_MAD_NODE_DESC", > > > - "OSM_MSG_NO_SMPS_OUTSTANDING", > > > "OSM_MSG_MAD_NODE_RECORD", > > > "OSM_MSG_MAD_PORTINFO_RECORD", > > > "OSM_MSG_MAD_SERVICE_RECORD", > > > diff --git a/osm/opensm/osm_node_info_rcv.c > > b/osm/opensm/osm_node_info_rcv.c > > > index 59257a0..7ea4366 100644 > > > --- a/osm/opensm/osm_node_info_rcv.c > > > +++ b/osm/opensm/osm_node_info_rcv.c > > > @@ -69,6 +69,7 @@ #include > > > #include > > > #include > > > #include > > > +#include > > > > > > > > > > > /********************************************************************** > > > @@ -1088,11 +1089,11 @@ osm_ni_rcv_process( > > > > > > /* > > > * If we processed a new node - need to signal to the state_mgr > > that > > > - * change detected. BUT - we cannot call the osm_state_mgr_process > > > + * change detected. BUT - we cannot call the osm_sm_signal > > > * from within the lock of p_rcv->p_lock (can cause a deadlock). 
> > > */ > > > if ( process_new_flag ) > > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > > OSM_SIGNAL_CHANGE_DETECTED ); > > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > > > OSM_SIGNAL_CHANGE_DETECTED ); > > > > > > Exit: > > > OSM_LOG_EXIT( p_rcv->p_log ); > > > diff --git a/osm/opensm/osm_port_info_rcv.c > > b/osm/opensm/osm_port_info_rcv.c > > > index a08c57c..7405ef0 100644 > > > --- a/osm/opensm/osm_port_info_rcv.c > > > +++ b/osm/opensm/osm_port_info_rcv.c > > > @@ -69,6 +69,7 @@ #include > > > #include > > > #include > > > #include > > > +#include > > > > > > > > /********************************************************************** > > > > > **********************************************************************/ > > > @@ -701,7 +702,7 @@ osm_pi_rcv_process( > > > " port = %u, Commencing heavy sweep\n", > > > cl_ntoh64( node_guid ), > > > cl_ntoh64( port_guid ) ); > > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > > > OSM_SIGNAL_CHANGE_DETECTED ); > > > goto Exit; > > > } > > > diff --git a/osm/opensm/osm_sm.c b/osm/opensm/osm_sm.c > > > index 0e09f26..5b5eb3f 100644 > > > --- a/osm/opensm/osm_sm.c > > > +++ b/osm/opensm/osm_sm.c > > > @@ -55,6 +55,8 @@ #if HAVE_CONFIG_H > > > # include > > > #endif /* HAVE_CONFIG_H */ > > > > > > +#include > > > +#include > > > #include > > > #include > > > #include > > > @@ -79,53 +81,65 @@ void > > > __osm_sm_sweeper( > > > IN void *p_ptr ) > > > { > > > - ib_api_status_t status; > > > osm_sm_t *const p_sm = ( osm_sm_t * ) p_ptr; > > > + unsigned i, signals; > > > > > > OSM_LOG_ENTER( p_sm->p_log, __osm_sm_sweeper ); > > > > > > - if( p_sm->thread_state == OSM_THREAD_STATE_INIT ) > > > - { > > > - p_sm->thread_state = OSM_THREAD_STATE_RUN; > > > - } > > > - > > > - /* If the sweep interval was updated before - then run only if > > > - * it is not zero. 
*/ > > > - while( p_sm->thread_state == OSM_THREAD_STATE_RUN && > > > - p_sm->p_subn->opt.sweep_interval != 0 ) > > > - { > > > - /* do the sweep only if we are in MASTER state */ > > > - if( p_sm->p_subn->sm_state == IB_SMINFO_STATE_MASTER || > > > - p_sm->p_subn->sm_state == IB_SMINFO_STATE_DISCOVERING ) > > > - osm_state_mgr_process( &p_sm->state_mgr, OSM_SIGNAL_SWEEP ); > > > + do { > > > + signals = 0; > > > + pthread_mutex_lock(&p_sm->mutex); > > > + if (p_sm->signal_mask == 0) > > > + pthread_cond_wait(&p_sm->cond, &p_sm->mutex); > > > + signals = p_sm->signal_mask; > > > + p_sm->signal_mask = 0; > > > + pthread_mutex_unlock(&p_sm->mutex); > > > + for (i = 0 ; signals ; i++) { > > > + if (signals&1) > > > + osm_state_mgr_process( &p_sm->state_mgr, i); > > > + signals >>= 1; > > > + } > > > + } while (p_sm->thread_state == OSM_THREAD_STATE_RUN); > > > > > > - /* > > > - * Wait on the event with a timeout. > > > - * Sweeps may be iniated "off schedule" by simply > > > - * signaling the event. 
> > > - */ > > > - status = cl_event_wait_on( &p_sm->signal, > > > - p_sm->p_subn->opt.sweep_interval * > > 1000000, > > > - TRUE ); > > > + OSM_LOG_EXIT( p_sm->p_log ); > > > +} > > > > > > - if( status == CL_SUCCESS ) > > > - { > > > - if( osm_log_is_active( p_sm->p_log, OSM_LOG_DEBUG ) ) > > > - { > > > - osm_log( p_sm->p_log, OSM_LOG_DEBUG, > > > - "__osm_sm_sweeper: " "Off schedule sweep > > signalled\n" ); > > > - } > > > > > +/********************************************************************** > > > + > > **********************************************************************/ > > > +void > > > +__osm_sm_sweeper_periodic( > > > + IN void *p_ptr ) > > > +{ > > > + struct timespec times; > > > + osm_sm_t *const p_sm = ( osm_sm_t * ) p_ptr; > > > + unsigned i, signals; > > > + int ret; > > > + > > > + OSM_LOG_ENTER( p_sm->p_log, __osm_sm_sweeper_periodic ); > > > + > > > + do { > > > + clock_gettime(CLOCK_REALTIME, &times); > > > + times.tv_sec += p_sm->p_subn->opt.sweep_interval; > > > + signals = 0; > > > + pthread_mutex_lock(&p_sm->mutex); > > > + if (p_sm->signal_mask == 0 && > > > + (ret = pthread_cond_timedwait(&p_sm->cond, &p_sm->mutex, > > > + &times)) == ETIMEDOUT && > > > + ( p_sm->p_subn->sm_state == IB_SMINFO_STATE_MASTER || > > > + p_sm->p_subn->sm_state == IB_SMINFO_STATE_DISCOVERING > > )) { > > > + signals = OSM_SIGNAL_SWEEP; > > > } > > > - else > > > - { > > > - if( status != CL_TIMEOUT ) > > > - { > > > - osm_log( p_sm->p_log, OSM_LOG_ERROR, > > > - "__osm_sm_sweeper: ERR 2E01: " > > > - "Event wait failed (%s)\n", CL_STATUS_MSG( > > status ) ); > > > - } > > > + else { > > > + signals = p_sm->signal_mask; > > > + p_sm->signal_mask = 0; > > > } > > > - } > > > + pthread_mutex_unlock(&p_sm->mutex); > > > + for (i = 0 ; signals ; i++) { > > > + if (signals&1) > > > + osm_state_mgr_process( &p_sm->state_mgr, i); > > > + signals >>= 1; > > > + } > > > + } while (p_sm->thread_state == OSM_THREAD_STATE_RUN); > > > > > > OSM_LOG_EXIT( p_sm->p_log ); > > > }
> > > @@ -139,7 +153,6 @@ osm_sm_construct( > > > memset( p_sm, 0, sizeof( *p_sm ) ); > > > p_sm->thread_state = OSM_THREAD_STATE_NONE; > > > p_sm->sm_trans_id = OSM_SM_INITIAL_TID_VALUE; > > > - cl_event_construct( &p_sm->signal ); > > > cl_event_construct( &p_sm->subnet_up_event ); > > > cl_thread_construct( &p_sm->sweeper ); > > > osm_req_construct( &p_sm->req ); > > > @@ -158,7 +171,6 @@ osm_sm_construct( > > > osm_ucast_mgr_construct( &p_sm->ucast_mgr ); > > > osm_link_mgr_construct( &p_sm->link_mgr ); > > > osm_state_mgr_construct( &p_sm->state_mgr ); > > > - osm_state_mgr_ctrl_construct( &p_sm->state_mgr_ctrl ); > > > osm_drop_mgr_construct( &p_sm->drop_mgr ); > > > osm_lft_rcv_construct( &p_sm->lft_rcv ); > > > osm_lft_rcv_ctrl_construct( &p_sm->lft_rcv_ctrl ); > > > @@ -185,24 +197,14 @@ void > > > osm_sm_shutdown( > > > IN osm_sm_t * const p_sm ) > > > { > > > - boolean_t signal_event = FALSE; > > > - > > > OSM_LOG_ENTER( p_sm->p_log, osm_sm_shutdown ); > > > > > > /* > > > * Signal our threads that we're leaving. > > > - */ > > > - if( p_sm->thread_state != OSM_THREAD_STATE_NONE ) > > > - signal_event = TRUE; > > > - > > > - p_sm->thread_state = OSM_THREAD_STATE_EXIT; > > > - > > > - /* > > > - * Don't trigger unless event has been initialized. > > > * Destroy the thread before we tear down the other objects. 
> > > */ > > > - if( signal_event ) > > > - cl_event_signal( &p_sm->signal ); > > > + p_sm->thread_state = OSM_THREAD_STATE_EXIT; > > > + osm_sm_signal( p_sm, OSM_SIGNAL_NONE ); > > > > > > cl_thread_destroy( &p_sm->sweeper ); > > > > > > @@ -225,7 +227,6 @@ osm_sm_shutdown( > > > osm_vla_rcv_ctrl_destroy( &p_sm->vla_rcv_ctrl ); > > > osm_pkey_rcv_ctrl_destroy( &p_sm->pkey_rcv_ctrl ); > > > osm_sweep_fail_ctrl_destroy( &p_sm->sweep_fail_ctrl ); > > > - osm_state_mgr_ctrl_destroy( &p_sm->state_mgr_ctrl ); > > > > > > OSM_LOG_EXIT( p_sm->p_log ); > > > } > > > @@ -257,12 +258,14 @@ osm_sm_destroy( > > > osm_state_mgr_destroy( &p_sm->state_mgr ); > > > osm_sm_state_mgr_destroy( &p_sm->sm_state_mgr ); > > > osm_mcast_mgr_destroy( &p_sm->mcast_mgr ); > > > - cl_event_destroy( &p_sm->signal ); > > > cl_event_destroy( &p_sm->subnet_up_event ); > > > > > > if( p_sm->p_report_buf != NULL ) > > > cl_free( p_sm->p_report_buf ); > > > > > > + pthread_cond_destroy(&p_sm->cond); > > > + pthread_mutex_destroy(&p_sm->mutex); > > > + > > > osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format > > Waived */ > > > OSM_LOG_EXIT( p_sm->p_log ); > > > } > > > @@ -303,14 +306,15 @@ osm_sm_init( > > > status = IB_INSUFFICIENT_MEMORY; > > > goto Exit; > > > } > > > - status = cl_event_init( &p_sm->signal, FALSE ); > > > - if( status != CL_SUCCESS ) > > > - goto Exit; > > > > > > status = cl_event_init( &p_sm->subnet_up_event, FALSE ); > > > if( status != CL_SUCCESS ) > > > goto Exit; > > > > > > + p_sm->signal_mask = 0; > > > + pthread_mutex_init(&p_sm->mutex, NULL); > > > + pthread_cond_init(&p_sm->cond, NULL); > > > + > > > status = osm_sm_mad_ctrl_init( &p_sm->mad_ctrl, > > > p_sm->p_subn, > > > p_sm->p_mad_pool, > > > @@ -416,12 +420,6 @@ osm_sm_init( > > > if( status != IB_SUCCESS ) > > > goto Exit; > > > > > > - status = osm_state_mgr_ctrl_init( &p_sm->state_mgr_ctrl, > > > - &p_sm->state_mgr, > > > - p_sm->p_log, p_sm->p_disp ); > > > - if( status != IB_SUCCESS ) > > > - 
goto Exit; > > > - > > > status = osm_drop_mgr_init( &p_sm->drop_mgr, > > > p_sm->p_subn, > > > p_sm->p_log, &p_sm->req, p_sm->p_lock > > ); > > > @@ -523,16 +521,15 @@ osm_sm_init( > > > > > > /* > > > * Now that the component objects are initialized, start > > > - * the sweeper thread if the user wants sweeping. > > > + * the sweeper thread. > > > */ > > > - if( p_sm->p_subn->opt.sweep_interval ) > > > - { > > > - p_sm->thread_state = OSM_THREAD_STATE_INIT; > > > - status = cl_thread_init( &p_sm->sweeper, __osm_sm_sweeper, > > p_sm, > > > - "opensm sweeper" ); > > > - if( status != IB_SUCCESS ) > > > - goto Exit; > > > - } > > > + p_sm->thread_state = OSM_THREAD_STATE_RUN; > > > + status = cl_thread_init( &p_sm->sweeper, > > > + p_sm->p_subn->opt.sweep_interval > 0 ? > > > + __osm_sm_sweeper_periodic : > > __osm_sm_sweeper, > > > + p_sm, "opensm sweeper" ); > > > + if( status != IB_SUCCESS ) > > > + goto Exit; > > > > > > Exit: > > > OSM_LOG_EXIT( p_log ); > > > @@ -542,11 +539,26 @@ osm_sm_init( > > > > > /********************************************************************** > > > > > **********************************************************************/ > > > void > > > +osm_sm_signal( > > > + IN osm_sm_t* const p_sm, > > > + IN osm_signal_t signal ) > > > +{ > > > + OSM_LOG_ENTER( p_sm->p_log, osm_sm_signal ); > > > + pthread_mutex_lock(&p_sm->mutex); > > > + p_sm->signal_mask |= (1 << signal); > > > + pthread_cond_signal(&p_sm->cond); > > > + pthread_mutex_unlock(&p_sm->mutex); > > > + OSM_LOG_EXIT( p_sm->p_log ); > > > +} > > > + > > > > > +/********************************************************************** > > > + > > **********************************************************************/ > > > +void > > > osm_sm_sweep( > > > IN osm_sm_t * const p_sm ) > > > { > > > OSM_LOG_ENTER( p_sm->p_log, osm_sm_sweep ); > > > - osm_state_mgr_process( &p_sm->state_mgr, OSM_SIGNAL_SWEEP ); > > > + osm_sm_signal( p_sm, OSM_SIGNAL_SWEEP ); > > > OSM_LOG_EXIT( 
p_sm->p_log ); > > > } > > > > > > diff --git a/osm/opensm/osm_sm_mad_ctrl.c > > b/osm/opensm/osm_sm_mad_ctrl.c > > > index 9dceef2..1982873 100644 > > > --- a/osm/opensm/osm_sm_mad_ctrl.c > > > +++ b/osm/opensm/osm_sm_mad_ctrl.c > > > @@ -81,7 +81,6 @@ __osm_sm_mad_ctrl_retire_trans_mad( > > > IN osm_madw_t* const p_madw ) > > > { > > > uint32_t outstanding; > > > - cl_status_t status; > > > > > > OSM_LOG_ENTER( p_ctrl->p_log, __osm_sm_mad_ctrl_retire_trans_mad ); > > > > > > @@ -115,31 +114,10 @@ __osm_sm_mad_ctrl_retire_trans_mad( > > > The wire is clean. > > > Signal the state manager. > > > */ > > > - if( osm_log_is_active( p_ctrl->p_log, OSM_LOG_DEBUG ) ) > > > - { > > > - osm_log( p_ctrl->p_log, OSM_LOG_DEBUG, > > > - "__osm_sm_mad_ctrl_retire_trans_mad: " > > > - "Posting Dispatcher message %s\n", > > > - osm_get_disp_msg_str( OSM_MSG_NO_SMPS_OUTSTANDING ) ); > > > - } > > > - > > > - status = cl_disp_post( p_ctrl->h_disp, > > > - OSM_MSG_NO_SMPS_OUTSTANDING, > > > - (void > > *)OSM_SIGNAL_NO_PENDING_TRANSACTIONS, > > > - NULL, > > > - NULL ); > > > - > > > - if( status != CL_SUCCESS ) > > > - { > > > - osm_log( p_ctrl->p_log, OSM_LOG_ERROR, > > > - "__osm_sm_mad_ctrl_retire_trans_mad: ERR 3101: " > > > - "Dispatcher post message failed (%s)\n", > > > - CL_STATUS_MSG( status ) ); > > > - goto Exit; > > > - } > > > + osm_sm_signal( &p_ctrl->p_subn->p_osm->sm, > > > + OSM_SIGNAL_NO_PENDING_TRANSACTIONS ); > > > } > > > > > > - Exit: > > > OSM_LOG_EXIT( p_ctrl->p_log ); > > > } > > > /************/ > > > diff --git a/osm/opensm/osm_sm_state_mgr.c > > b/osm/opensm/osm_sm_state_mgr.c > > > index feeda45..2c6da4e 100644 > > > --- a/osm/opensm/osm_sm_state_mgr.c > > > +++ b/osm/opensm/osm_sm_state_mgr.c > > > @@ -563,7 +563,7 @@ osm_sm_state_mgr_process( > > > /* > > > * Stop the discovering > > > */ > > > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, > > > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > > > > > OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED ); > > > 
break; > > > case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE: > > > @@ -610,7 +610,7 @@ osm_sm_state_mgr_process( > > > __osm_sm_state_mgr_discovering_msg( p_sm_mgr ); > > > p_sm_mgr->p_subn->sm_state = IB_SMINFO_STATE_DISCOVERING; > > > p_sm_mgr->p_subn->coming_out_of_standby = TRUE; > > > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, > > OSM_SIGNAL_EXIT_STBY > > > ); > > > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > > > OSM_SIGNAL_EXIT_STBY ); > > > break; > > > case OSM_SM_SIGNAL_DISABLE: > > > /* > > > @@ -641,7 +641,7 @@ osm_sm_state_mgr_process( > > > */ > > > p_sm_mgr->p_subn->master_sm_base_lid = > > p_sm_mgr->p_subn->sm_base_lid; > > > p_sm_mgr->p_subn->coming_out_of_standby = TRUE; > > > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, > > OSM_SIGNAL_EXIT_STBY > > > ); > > > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > > > OSM_SIGNAL_EXIT_STBY ); > > > break; > > > case OSM_SM_SIGNAL_ACKNOWLEDGE: > > > /* > > > @@ -704,7 +704,7 @@ osm_sm_state_mgr_process( > > > "Received OSM_SM_SIGNAL_HANDOVER\n" ); > > > p_sm_mgr->p_polling_sm = NULL; > > > p_sm_mgr->p_subn->force_immediate_heavy_sweep = TRUE; > > > - osm_state_mgr_process( p_sm_mgr->p_state_mgr, > > OSM_SIGNAL_SWEEP ); > > > + osm_sm_signal( &p_sm_mgr->p_subn->p_osm->sm, > > OSM_SIGNAL_SWEEP ); > > > break; > > > case OSM_SM_SIGNAL_HANDOVER_SENT: > > > /* > > > diff --git a/osm/opensm/osm_sminfo_rcv.c b/osm/opensm/osm_sminfo_rcv.c > > > index 5914984..4af549b 100644 > > > --- a/osm/opensm/osm_sminfo_rcv.c > > > +++ b/osm/opensm/osm_sminfo_rcv.c > > > @@ -425,10 +425,10 @@ __osm_sminfo_rcv_process_set_request( > > > } > > > > > > > > /********************************************************************** > > > - * Return a signal with which to call the osm_state_mgr_process. > > > + * Return a signal with which to call the osm_sm_signal. 
> > > * This is done since we are locked by p_rcv->p_lock in this > > function, > > > - * and thus cannot call osm_state_mgr_process (that locks the > > state_lock). > > > - * If return OSM_SIGNAL_NONE - do not call osm_state_mgr_process. > > > + * and thus cannot call osm_sm_signal. > > > + * If return OSM_SIGNAL_NONE - do not call osm_sm_signal. > > > > > **********************************************************************/ > > > osm_signal_t > > > __osm_sminfo_rcv_process_get_sm( > > > @@ -676,7 +676,7 @@ __osm_sminfo_rcv_process_get_response( > > > /* If process_get_sm_ret_val != OSM_SIGNAL_NONE then we have to > > signal > > > * to the state_mgr with that signal. */ > > > if (process_get_sm_ret_val != OSM_SIGNAL_NONE) > > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > > > process_get_sm_ret_val ); > > > OSM_LOG_EXIT( p_rcv->p_log ); > > > } > > > diff --git a/osm/opensm/osm_state_mgr.c b/osm/opensm/osm_state_mgr.c > > > index 724b2b7..ff1c65c 100644 > > > --- a/osm/opensm/osm_state_mgr.c > > > +++ b/osm/opensm/osm_state_mgr.c > > > @@ -83,7 +83,6 @@ osm_state_mgr_construct( > > > IN osm_state_mgr_t * const p_mgr ) > > > { > > > memset( p_mgr, 0, sizeof( *p_mgr ) ); > > > - cl_spinlock_construct( &p_mgr->state_lock ); > > > cl_spinlock_construct( &p_mgr->idle_lock ); > > > p_mgr->state = OSM_SM_STATE_INIT; > > > } > > > @@ -99,7 +98,6 @@ osm_state_mgr_destroy( > > > OSM_LOG_ENTER( p_mgr->p_log, osm_state_mgr_destroy ); > > > > > > /* destroy the locks */ > > > - cl_spinlock_destroy( &p_mgr->state_lock ); > > > cl_spinlock_destroy( &p_mgr->idle_lock ); > > > > > > OSM_LOG_EXIT( p_mgr->p_log ); > > > @@ -162,14 +160,6 @@ osm_state_mgr_init( > > > p_mgr->state_step_mode = OSM_STATE_STEP_CONTINUOUS; > > > p_mgr->next_stage_signal = OSM_SIGNAL_NONE; > > > > > > - status = cl_spinlock_init( &p_mgr->state_lock ); > > > - if( status != CL_SUCCESS ) > > > - { > > > - osm_log( p_mgr->p_log, OSM_LOG_ERROR, > > > - 
"osm_state_mgr_init: ERR 3301: " > > > - "Spinlock init failed (%s)\n", CL_STATUS_MSG( status ) > > ); > > > - } > > > - > > > cl_qlist_init( &p_mgr->idle_time_list ); > > > > > > status = cl_spinlock_init( &p_mgr->idle_lock ); > > > @@ -1897,16 +1887,6 @@ osm_state_mgr_process( > > > if( osm_exit_flag ) > > > signal = OSM_SIGNAL_NONE; > > > > > > - /* > > > - * The state lock prevents many race conditions from screwing > > > - * up the state transition process. For example, if an function > > > - * puts transactions on the wire, the state lock guarantees this > > > - * loop will see the return code ("DONE PENDING") of the function > > > - * before the "NO OUTSTANDING TRANSACTIONS" signal is > > asynchronously > > > - * received. > > > - */ > > > - cl_spinlock_acquire( &p_mgr->state_lock ); > > > - > > > while( signal != OSM_SIGNAL_NONE ) > > > { > > > if( osm_log_is_active( p_mgr->p_log, OSM_LOG_DEBUG ) ) > > > @@ -2957,8 +2937,6 @@ osm_state_mgr_process( > > > p_mgr->state_step_mode = OSM_STATE_STEP_BREAK; > > > } > > > > > > - cl_spinlock_release( &p_mgr->state_lock ); > > > - > > > OSM_LOG_EXIT( p_mgr->p_log ); > > > } > > > > > > @@ -2994,7 +2972,7 @@ osm_state_mgr_process_idle( > > > cl_qlist_insert_tail( &p_mgr->idle_time_list, > > &p_idle_item->list_item ); > > > cl_spinlock_release( &p_mgr->idle_lock ); > > > > > > - osm_state_mgr_process( p_mgr, > OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST > > > ); > > > + osm_sm_signal( &p_mgr->p_subn->p_osm->sm, > > > OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST ); > > > > > > OSM_LOG_EXIT( p_mgr->p_log ); > > > > > > diff --git a/osm/opensm/osm_state_mgr_ctrl.c > > b/osm/opensm/osm_state_mgr_ctrl.c > > > deleted file mode 100644 > > > index 0bde333..0000000 > > > --- a/osm/opensm/osm_state_mgr_ctrl.c > > > +++ /dev/null > > > @@ -1,132 +0,0 @@ > > > -/* > > > - * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. > > > - * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights > > reserved. 
> > > - * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. > > > - * > > > - * This software is available to you under a choice of one of two > > > - * licenses. You may choose to be licensed under the terms of the > > GNU > > > - * General Public License (GPL) Version 2, available from the file > > > - * COPYING in the main directory of this source tree, or the > > > - * OpenIB.org BSD license below: > > > - * > > > - * Redistribution and use in source and binary forms, with or > > > - * without modification, are permitted provided that the > > following > > > - * conditions are met: > > > - * > > > - * - Redistributions of source code must retain the above > > > - * copyright notice, this list of conditions and the following > > > - * disclaimer. > > > - * > > > - * - Redistributions in binary form must reproduce the above > > > - * copyright notice, this list of conditions and the following > > > - * disclaimer in the documentation and/or other materials > > > - * provided with the distribution. > > > - * > > > - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY > KIND, > > > - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE > > > WARRANTIES OF > > > - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > > - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR > COPYRIGHT > > > HOLDERS > > > - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER > IN > > > AN > > > - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF > OR > > > IN > > > - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS > IN > > > THE > > > - * SOFTWARE. > > > - * > > > - * $Id$ > > > - */ > > > - > > > - > > > -/* > > > - * Abstract: > > > - * Implementation of osm_state_mgr_ctrl_t. > > > - * This object represents the State Manager Controller object. > > > - * This object is part of the opensm family of objects. 
> > > - * > > > - * Environment: > > > - * Linux User Mode > > > - * > > > - * $Revision: 1.5 $ > > > - */ > > > - > > > -/* > > > - Next available error code: 0x1601 > > > -*/ > > > - > > > -#if HAVE_CONFIG_H > > > -# include > > > -#endif /* HAVE_CONFIG_H */ > > > - > > > -#include > > > -#include > > > -#include > > > - > > > > > -/********************************************************************** > > > - > > **********************************************************************/ > > > -void > > > -__osm_state_mgr_ctrl_disp_callback( > > > - IN void *context, > > > - IN void *p_data ) > > > -{ > > > - /* ignore return status when invoked via the dispatcher */ > > > - osm_state_mgr_process( ((osm_state_mgr_ctrl_t*)context)->p_mgr, > > > - (osm_signal_t)(p_data) ); > > > -} > > > - > > > > > -/********************************************************************** > > > - > > **********************************************************************/ > > > -void > > > -osm_state_mgr_ctrl_construct( > > > - IN osm_state_mgr_ctrl_t* const p_ctrl ) > > > -{ > > > - memset( p_ctrl, 0, sizeof(*p_ctrl) ); > > > - p_ctrl->h_disp = CL_DISP_INVALID_HANDLE; > > > -} > > > - > > > > > -/********************************************************************** > > > - > > **********************************************************************/ > > > -void > > > -osm_state_mgr_ctrl_destroy( > > > - IN osm_state_mgr_ctrl_t* const p_ctrl ) > > > -{ > > > - CL_ASSERT( p_ctrl ); > > > - cl_disp_unregister( p_ctrl->h_disp ); > > > -} > > > - > > > > > -/********************************************************************** > > > - > > **********************************************************************/ > > > -ib_api_status_t > > > -osm_state_mgr_ctrl_init( > > > - IN osm_state_mgr_ctrl_t* const p_ctrl, > > > - IN osm_state_mgr_t* const p_mgr, > > > - IN osm_log_t* const p_log, > > > - IN cl_dispatcher_t* const p_disp ) > > > -{ > > > - ib_api_status_t status = IB_SUCCESS; > > > - > > 
> - OSM_LOG_ENTER( p_log, osm_state_mgr_ctrl_init ); > > > - > > > - osm_state_mgr_ctrl_construct( p_ctrl ); > > > - p_ctrl->p_log = p_log; > > > - > > > - p_ctrl->p_mgr = p_mgr; > > > - p_ctrl->p_disp = p_disp; > > > - > > > - p_ctrl->h_disp = cl_disp_register( > > > - p_disp, > > > - OSM_MSG_NO_SMPS_OUTSTANDING, > > > - __osm_state_mgr_ctrl_disp_callback, > > > - p_ctrl ); > > > - > > > - if( p_ctrl->h_disp == CL_DISP_INVALID_HANDLE ) > > > - { > > > - osm_log( p_log, OSM_LOG_ERROR, > > > - "osm_state_mgr_ctrl_init: ERR 3401: " > > > - "Dispatcher registration failed\n" ); > > > - status = IB_INSUFFICIENT_RESOURCES; > > > - goto Exit; > > > - } > > > - > > > - Exit: > > > - OSM_LOG_EXIT( p_log ); > > > - return( status ); > > > -} > > > - > > > diff --git a/osm/opensm/osm_sw_info_rcv.c > > b/osm/opensm/osm_sw_info_rcv.c > > > index 6bbd73a..61aff27 100644 > > > --- a/osm/opensm/osm_sw_info_rcv.c > > > +++ b/osm/opensm/osm_sw_info_rcv.c > > > @@ -60,6 +60,7 @@ #include > > > #include > > > #include > > > #include > > > +#include > > > #include > > > > > > > > /********************************************************************** > > > @@ -673,7 +674,7 @@ osm_si_rcv_process( > > > if (__osm_si_rcv_process_existing( p_rcv, p_node, p_sw, p_madw > > )) > > > { > > > CL_PLOCK_RELEASE( p_rcv->p_lock ); > > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, > > > OSM_SIGNAL_CHANGE_DETECTED ); > > > goto Exit; > > > } > > > diff --git a/osm/opensm/osm_sweep_fail_ctrl.c > > b/osm/opensm/osm_sweep_fail_ctrl.c > > > index e27a540..9e41ec7 100644 > > > --- a/osm/opensm/osm_sweep_fail_ctrl.c > > > +++ b/osm/opensm/osm_sweep_fail_ctrl.c > > > @@ -52,6 +52,8 @@ #endif /* HAVE_CONFIG_H */ > > > #include > > > #include > > > #include > > > +#include > > > +#include > > > > > > > > /********************************************************************** > > > > > **********************************************************************/ > 
> > @@ -68,7 +70,7 @@ __osm_sweep_fail_ctrl_disp_callback( > > > /* > > > Notify the state manager that we had a light sweep failure. > > > */ > > > - osm_state_mgr_process( p_ctrl->p_state_mgr, > > > + osm_sm_signal( &p_ctrl->p_state_mgr->p_subn->p_osm->sm, > > > OSM_SIGNAL_LIGHT_SWEEP_FAIL ); > > > > > > OSM_LOG_EXIT( p_ctrl->p_log ); > > > diff --git a/osm/opensm/osm_trap_rcv.c b/osm/opensm/osm_trap_rcv.c > > > index 9865f53..fb32ce9 100644 > > > --- a/osm/opensm/osm_trap_rcv.c > > > +++ b/osm/opensm/osm_trap_rcv.c > > > @@ -589,8 +589,7 @@ __osm_trap_rcv_process_request( > > > > > > p_rcv->p_subn->force_immediate_heavy_sweep = TRUE; > > > } > > > - osm_state_mgr_process( p_rcv->p_state_mgr, > > > - OSM_SIGNAL_SWEEP ); > > > + osm_sm_signal( &p_rcv->p_subn->p_osm->sm, OSM_SIGNAL_SWEEP ); > > > } > > > > > > /* If we reached here due to trap 129/130/131 - do not need to do > > > diff --git a/osm/opensm/osm_vl15intf.c b/osm/opensm/osm_vl15intf.c > > > index 68f17c5..c3adb6e 100644 > > > --- a/osm/opensm/osm_vl15intf.c > > > +++ b/osm/opensm/osm_vl15intf.c > > > @@ -62,6 +62,7 @@ #include > > > #include > > > #include > > > #include > > > +#include > > > #include > > > #include > > > > > > @@ -156,7 +157,6 @@ __osm_vl15_poller( > > > if( status != IB_SUCCESS ) > > > { > > > uint32_t outstanding; > > > - cl_status_t cl_status; > > > > > > osm_log( p_vl->p_log, OSM_LOG_ERROR, > > > "__osm_vl15_poller: ERR 3E03: " > > > @@ -202,27 +202,8 @@ __osm_vl15_poller( > > > The wire is clean. > > > Signal the state manager. 
> > > */ > > > - if( osm_log_is_active( p_vl->p_log, OSM_LOG_DEBUG ) ) > > > - { > > > - osm_log( p_vl->p_log, OSM_LOG_DEBUG, > > > - "__osm_vl15_poller: " > > > - "Posting Dispatcher message %s\n", > > > - osm_get_disp_msg_str( > > OSM_MSG_NO_SMPS_OUTSTANDING ) ); > > > - } > > > - > > > - cl_status = cl_disp_post( p_vl->h_disp, > > > - OSM_MSG_NO_SMPS_OUTSTANDING, > > > - (void > > *)OSM_SIGNAL_NO_PENDING_TRANSACTIONS, > > > - NULL, > > > - NULL ); > > > - > > > - if( cl_status != CL_SUCCESS ) > > > - { > > > - osm_log( p_vl->p_log, OSM_LOG_ERROR, > > > - "__osm_vl15_poller: ERR 3E06: " > > > - "Dispatcher post message failed (%s)\n", > > > - CL_STATUS_MSG( cl_status ) ); > > > - } > > > + osm_sm_signal( &p_vl->p_subn->p_osm->sm, > > > + OSM_SIGNAL_NO_PENDING_TRANSACTIONS ); > > > } > > > } > > > } From ishai at mellanox.co.il Mon May 22 05:20:29 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Mon, 22 May 2006 15:20:29 +0300 Subject: [openib-general] SRP: [PATCH] Handling DREQ In-Reply-To: <20060518235435.GA24917@mellanox.co.il> References: <20060518235435.GA24917@mellanox.co.il> Message-ID: <20060522122029.GA1498@mellanox.co.il> On Fri, May 19, 2006 at 02:54:35AM +0300, Ishai Rabinovitz wrote: Hi Fixed the patch a little bit. > Hi, > > I got "Unhandled CM event 6" (IB_CM_DREQ_ERROR) and "Unhandled CM event 7" > (IB_CM_DREQ_RECEIVED). > > So here is a patch that handles these CM events. > > This is an initial patch. Maybe it will be more efficient to initiate a reconnect > in case we get IB_CM_DREQ_RECEIVED. What do you think? 
> > Signed-off-by: Ishai Rabinovitz > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c =================================================================== --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-21 18:19:58.000000000 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-22 09:32:00.000000000 +0300 @@ -1217,6 +1217,20 @@ static int srp_cm_handler(struct ib_cm_i target->status = 0; break; + case IB_CM_DREQ_ERROR: + printk(KERN_ERR PFX + "IB_CM_DREQ_ERROR received - connection closed\n"); + /* no need to set comp - there will be a TIMEWAIT_EXIT */ + break; + + case IB_CM_DREQ_RECEIVED: + printk(KERN_WARNING PFX + "IB_CM_DREQ_RECEIVED received - connection closed\n"); + if (ib_send_cm_drep(cm_id, NULL, 0)) + printk(KERN_ERR PFX "ib_send_cm_drep failed\n"); + /* no need to set comp - there will be a TIMEWAIT_EXIT */ + break; + default: printk(KERN_WARNING PFX "Unhandled CM event %d\n", event->event); break; -- Ishai Rabinovitz From muli at il.ibm.com Mon May 22 06:08:24 2006 From: muli at il.ibm.com (Muli Ben-Yehuda) Date: Mon, 22 May 2006 09:08:24 -0400 Subject: [openib-general] SRP: [PATCH] Handling DREQ In-Reply-To: <20060522122029.GA1498@mellanox.co.il> References: <20060518235435.GA24917@mellanox.co.il> <20060522122029.GA1498@mellanox.co.il> Message-ID: <20060522130824.GA3359@rhun.haifa.ibm.com> On Mon, May 22, 2006 at 03:20:29PM +0300, Ishai Rabinovitz wrote: > + case IB_CM_DREQ_RECEIVED: > + printk(KERN_WARNING PFX > + "IB_CM_DREQ_RECEIVED received - connection closed\n"); > + if (ib_send_cm_drep(cm_id, NULL, 0)) > + printk(KERN_ERR PFX "ib_send_cm_drep > failed\n"); Assuming ib_send_cm_drep returns -ERRNO, please print the return value here. 
Cheers, Muli From halr at voltaire.com Mon May 22 06:06:43 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 22 May 2006 09:06:43 -0400 Subject: [openib-general] Re: [PATCH] opensm: configurable VLStallCount values In-Reply-To: <20060521163444.GL14503@sashak.voltaire.com> References: <20060521163444.GL14503@sashak.voltaire.com> Message-ID: <1148303201.4470.59532.camel@hal.voltaire.com> On Sun, 2006-05-21 at 12:34, Sasha Khapyorsky wrote: > Hello Hal, > > This adds configurable vl_stall_count and leaf_vl_stall_count values for > switch external ports. Also fixes existed hoq_lifetime processing. The > features are: don't bother about not connected ports, hoq_lifetime > setup is only for switch and router ports, vl_stall_count setup is only > for switch ports, leaf_ values are used for switch ports connected to a > CA. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. -- Hal From sashak at voltaire.com Mon May 22 06:43:47 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 May 2006 16:43:47 +0300 Subject: [openib-general] Re: [PATCH] opensm: remove osm_pkey_mgr.h In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBD3@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBD3@mtlexch01.mtl.com> Message-ID: <20060522134347.GF30176@sashak.voltaire.com> On 13:55 Mon 22 May , Eitan Zahavi wrote: > Hi Sasha, > > My point is simple: > > OpenSM has a very structured skeleton: > 1. All mad receivers have two c files and two h files > 1.1 mad receive controller which deals with dispatcher registration . > 1.2 Mad receiver which deals with all the action happening after such a > mad is received > 2. All algorithm stages (managers) have a c file and h file > An algorithm stage might be lid assignment, routing, partition > enforcement etc > 3. All SMDB objects have a c file and h files. > Examples are Nodes, Ports, Multicast registrations etc > > These are the structural code rules for OpenSM. 
> > Even if you think it is better to merge functionality and avoid having > some of these h files and you might even be able to save some lines of > code by doing that, you break the code structure. If you personally like > to work with flat code - this is your preference. > I prefer having clear structure. So if I need to know where is the pkey > manager object defined? What are its internal state variables? What are > the algorithms it uses? Etc I can simply open up osm_pkey_mgr.h and find > this out. I would agree with your last example, but that is not the case here - what you will actually find in osm_pkey_mgr.h is an object with no fields related to pkey management, but instead with four pointers duplicated from somewhere else (and you will need to dig through the whole tree to find where they were actually copied from). Do you call this "clear structure"? > If you want to redesign OpenSM structure to be "simpler" or more > "effective" you can propose doing that. But doing it in the salami way > is just going to hurt stability and leave us with no structure at all. > Unless we are willing to re-architect the code in a clean manner and > spend the years of development and validation for these changes - lets > keep the code with clear structure. So your proposal is to wait years in order to remove an unused object? Also note that this cleanup has nothing in common with re-architecting the code; it does not even touch the OpenSM architecture. Sasha From pw at osc.edu Mon May 22 07:38:01 2006 From: pw at osc.edu (Pete Wyckoff) Date: Mon, 22 May 2006 10:38:01 -0400 Subject: [openib-general] Re: vapi versus openib imm_data In-Reply-To: <200605220930.22274.jackm@mellanox.co.il> References: <20060521233917.GA25201@osc.edu> <20060522051923.GB14583@mellanox.co.il> <200605220930.22274.jackm@mellanox.co.il> Message-ID: <20060522143801.GB13663@osc.edu> jackm at mellanox.co.il wrote on Mon, 22 May 2006 09:30 +0300: > On Monday 22 May 2006 08:19, Michael S. Tsirkin wrote: > > Quoting r.
Pete Wyckoff : > > > is there any good way to tell if the other side has put > > > its imm_data in network byte order or not? > > > > gen2 always assumes imm_data is given in network byte order. > > VAPI assumes that imm_data is given (i.e., supplied to the API) in host byte > order. If the host is a little-endian host (as PCs are), the mlxhh (i.e., > inner) layer will convert immediate data to network byte order on the send, > and will convert received immediate data from Network byte order to host byte > order on receive -- and the VAPI caller will receive the immediate data in > host byte order. Thanks both. I'll solve this by adding htonl/ntohl, only on the VAPI side, when sending immediate data and reading it back from the CQ. That should undo the little-endian-only byte swap introduced by VAPI. Much easier than trying to figure out what the other side is doing with its immediate data. -- Pete From sashak at voltaire.com Mon May 22 08:10:20 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 May 2006 18:10:20 +0300 Subject: [openib-general] [PATCH] opensm: fix byte ordering in ib_member_get/set_sl_flow_hop() Message-ID: <20060522151020.GJ30176@sashak.voltaire.com> This fixes net/host byte ordering in ib_member_get/set_sl_flow_hop() functions. 
Signed-off-by: Sasha Khapyorsky --- osm/include/iba/ib_types.h | 29 +++++++++++------------------ 1 files changed, 11 insertions(+), 18 deletions(-) f8c16e36c5294d25136737c7855ffcf718866898 diff --git a/osm/include/iba/ib_types.h b/osm/include/iba/ib_types.h index 86133ed..7526b4b 100644 --- a/osm/include/iba/ib_types.h +++ b/osm/include/iba/ib_types.h @@ -6304,20 +6304,18 @@ ib_member_get_sl_flow_hop( OUT uint32_t* const p_flow_lbl, OUT uint8_t* const p_hop ) { - ib_net32_t tmp_sl_flow_hop; + uint32_t tmp = cl_ntoh32(sl_flow_hop); - if (p_sl) - *p_sl = (uint8_t)(sl_flow_hop & 0x0f); - - tmp_sl_flow_hop = sl_flow_hop >> 4; + if (p_hop) + *p_hop = (uint8_t)tmp; + tmp >>= 8; if (p_flow_lbl) - *p_flow_lbl = (uint32_t)(tmp_sl_flow_hop & 0xfffff); - - tmp_sl_flow_hop = tmp_sl_flow_hop >> 20; + *p_flow_lbl = (uint32_t)(tmp & 0xfffff); + tmp >>= 20; - if (p_hop) - *p_hop = (uint8_t)(tmp_sl_flow_hop & 0xff); + if (p_sl) + *p_sl = (uint8_t)tmp; } /* * PARAMETERS @@ -6355,14 +6353,9 @@ ib_member_set_sl_flow_hop( IN const uint32_t flow_label, IN const uint8_t hop_limit ) { - ib_net32_t sl_flow_hop; - - sl_flow_hop = hop_limit; - sl_flow_hop = sl_flow_hop << 20; - sl_flow_hop = sl_flow_hop | flow_label; - sl_flow_hop = sl_flow_hop << 2; - sl_flow_hop = sl_flow_hop | sl; - return (sl_flow_hop); + uint32_t tmp; + tmp = (sl << 28) | ((flow_label & 0xfffff) << 8) | hop_limit; + return cl_hton32(tmp); } /* * PARAMETERS -- 1.3.2 From mst at mellanox.co.il Mon May 22 08:31:01 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 22 May 2006 18:31:01 +0300 Subject: [openib-general] [PATCH] comma at end of enumerator list Message-ID: <20060522153101.GO15161@mellanox.co.il> While comma at end of enumerator list is legal since 1999, some tools (notably gcc versions pre-4.0) seem to default to 1989 mode when running with -pedantic flag, and warn about this usage. 
Since most of our enums in header files do not have a comma at the end, it's probably easier to fix the remaining two cases than educate all users of libibverbs on the virtues of C99. Signed-off-by: Michael S. Tsirkin Index: libibverbs/include/infiniband/verbs.h =================================================================== --- libibverbs/include/infiniband/verbs.h (revision 7212) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -88,7 +88,7 @@ IBV_DEVICE_SYS_IMAGE_GUID = 1 << 11, IBV_DEVICE_RC_RNR_NAK_GEN = 1 << 12, IBV_DEVICE_SRQ_RESIZE = 1 << 13, - IBV_DEVICE_N_NOTIFY_CQ = 1 << 14, + IBV_DEVICE_N_NOTIFY_CQ = 1 << 14 }; enum ibv_atomic_cap { @@ -338,7 +338,7 @@ enum ibv_srq_attr_mask { IBV_SRQ_MAX_WR = 1 << 0, - IBV_SRQ_LIMIT = 1 << 1, + IBV_SRQ_LIMIT = 1 << 1 }; struct ibv_srq_attr { -- MST From rdreier at cisco.com Mon May 22 08:55:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 08:55:33 -0700 Subject: [openib-general] vapi versus openib imm_data References: <20060521233917.GA25201@osc.edu> Message-ID: Pete> I have an application (PVFS2) that can use either VAPI Pete> (ibgd-1.8.2) or OpenIB (libibverbs-1.0.3.1.fc4 and Pete> libmthca-1.0.1.fc.4). Things work just fine between one Pete> machine using VAPI and another using OpenIB, but immediate Pete> data in an RDMA write comes through byte-swapped. Both ends Pete> are x86_64 hosts. Pete> I'm using a heuristic now, knowing the range of values that Pete> are valid, but is there any good way to tell if the other Pete> side has put its imm_data in network byte order or not? libmthca (i.e. openib) puts the immediate data from the work request directly onto the wire without doing any swapping. From looking at the code, it seems that the old VAPI code will byte swap the immediate data before sending it. So the simplest solution seems to be for you to use htonl/ntohl when running on top of the new verbs, and leave it out when running on top of VAPI. - R.
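A minimal sketch of the byte-order handling suggested in the message above, assuming a 32-bit immediate value and the POSIX htonl/ntohl functions. The helper names imm_to_wire, imm_from_wire and wire_is_msb_first are hypothetical and exist only for illustration; they are not part of libibverbs or VAPI.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a host-order immediate value to network
 * (big-endian) order, for a stack that places the value on the wire
 * without swapping it itself. */
static inline uint32_t imm_to_wire(uint32_t host_val)
{
    return htonl(host_val);
}

/* Hypothetical helper: convert a received network-order immediate value
 * back to host order. */
static inline uint32_t imm_from_wire(uint32_t wire_val)
{
    return ntohl(wire_val);
}

/* Return 1 if the in-memory bytes of the converted value are laid out
 * most-significant byte first, which is what network byte order means. */
static inline int wire_is_msb_first(uint32_t host_val)
{
    uint32_t wire = imm_to_wire(host_val);
    unsigned char b[4];

    memcpy(b, &wire, sizeof(b));
    return b[0] == ((host_val >> 24) & 0xff) &&
           b[1] == ((host_val >> 16) & 0xff) &&
           b[2] == ((host_val >> 8) & 0xff) &&
           b[3] == (host_val & 0xff);
}
```

With libibverbs the converted value would presumably be what is stored in the imm_data field of the send work request, and the reverse conversion applied to the imm_data read from the work completion. On a big-endian host both helpers are no-ops, so the same code works regardless of host endianness.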
From rdreier at cisco.com Mon May 22 09:00:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 09:00:34 -0700 Subject: [openib-general] [PATCH] cma: fix bind to ip In-Reply-To: <20060522083054.GF15161@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 22 May 2006 11:30:54 +0300") References: <20060518133627.GW30211@mellanox.co.il> <446CA5DE.1040600@ichips.intel.com> <20060522083054.GF15161@mellanox.co.il> Message-ID: Michael> I'd like very much for the following two patches to be Michael> merged. While SDP isn't likely to be in kernel 2.6.18, I Michael> think its very confusing to have CMA API in-kernel and on Michael> trunk to have slightly different semantics. OK, can you send me a patch against my for-2.6.18 branch that has the changes you're looking for? Thanks, Roland From rdreier at cisco.com Mon May 22 09:00:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 09:00:33 -0700 Subject: [openib-general] Announcing the Release of MVAPICH2 0.9.3 with multi-threading support and anonymous SVN access References: <200605210557.k4L5v0sO009768@xi.cse.ohio-state.edu> Message-ID: > - Solaris uDAPL/IBTL on Opteron with PCI-Ex and IBA-SDR: > Two-sided operations: > - 5.41 microsec one-way latency (4 bytes) > - OpenIB/Gen2 uDAPL on Opteron with PCI-Ex and IBA-SDR: > Two-sided operations: > - 3.61 microsec one-way latency (4 bytes) Just out of curiosity, do you have any idea why Solaris uDAPL does so much worse than Linux uDAPL on the same hardware? - R.
From mshefty at ichips.intel.com Mon May 22 09:16:00 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 22 May 2006 09:16:00 -0700 Subject: [openib-general] RDMA kernel utilities-newbie In-Reply-To: <20060522090804.53858.qmail@web8325.mail.in.yahoo.com> References: <20060522090804.53858.qmail@web8325.mail.in.yahoo.com> Message-ID: <4471E3C0.3060807@ichips.intel.com> keshetti mahesh wrote: > i need to develop a kernel utility capable of RDMA read/write operations > i have seen example utilities under > "svn/gen2/utils/src/linux-kernel/infiniband/util/" tree. > where can i find the documentation related to them? There's no documentation for those utilities - just the code. - Sean From sean.hefty at intel.com Mon May 22 09:19:39 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 22 May 2006 09:19:39 -0700 Subject: [openib-general] Re: vapi versus openib imm_data In-Reply-To: <20060522143801.GB13663@osc.edu> Message-ID: >Thanks both. I'll solve this by adding htonl/ntohl, only on the >VAPI side Since VAPI is wanting the data in host order, while openib uses network order, it makes more sense to me to do the swapping on the openib side. - Sean From mst at mellanox.co.il Mon May 22 09:26:32 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 22 May 2006 19:26:32 +0300 Subject: [openib-general] [PATCH] cma: fix bind to ip In-Reply-To: References: <20060518133627.GW30211@mellanox.co.il> <446CA5DE.1040600@ichips.intel.com> <20060522083054.GF15161@mellanox.co.il> Message-ID: <20060522162632.GP15161@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [openib-general] [PATCH] cma: fix bind to ip > > Michael> I'd like very much for the following two patches to be > Michael> merged. While SDP isn't likely to be in kernel 2.6.18, I > Michael> think its very confusing to have CMA API in-kernel and on > Michael> trunk to have slightly different semantics. 
> > OK, can you send me a patch against my for-2.6.18 branch that has the > changes you're looking for? > > Thanks, > Roland > Split or rolled into one? -- MST From rdreier at cisco.com Mon May 22 09:37:21 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 09:37:21 -0700 Subject: [openib-general] [PATCH] cma: fix bind to ip In-Reply-To: <20060522162632.GP15161@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 22 May 2006 19:26:32 +0300") References: <20060518133627.GW30211@mellanox.co.il> <446CA5DE.1040600@ichips.intel.com> <20060522083054.GF15161@mellanox.co.il> <20060522162632.GP15161@mellanox.co.il> Message-ID: Michael> Split or rolled into one? Just roll it up. I haven't been tracking changes to the patches I have queued up -- I just want to merge the latest version. - R. From halr at voltaire.com Mon May 22 09:34:10 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 22 May 2006 12:34:10 -0400 Subject: [openib-general] [PATCH] OpenSM: Support C14-24-2.1 in terms of valid components when PortState is DOWN Message-ID: <1148315648.4470.64114.camel@hal.voltaire.com> OpenSM: Support C14-24-2.1 in terms of valid components when PortState is DOWN Signed-off-by: Hal Rosenstock Index: include/opensm/osm_port.h =================================================================== --- include/opensm/osm_port.h (revision 7396) +++ include/opensm/osm_port.h (working copy) @@ -427,7 +427,8 @@ osm_physp_set_health( * osm_physp_set_port_info * * DESCRIPTION -* Copies the PortInfo attribute into the Physical Port object. +* Copies the PortInfo attribute into the Physical Port object +* based on the PortState. 
* * SYNOPSIS */ @@ -438,7 +439,19 @@ osm_physp_set_port_info( { CL_ASSERT( p_pi ); CL_ASSERT( osm_physp_is_valid( p_physp ) ); - p_physp->port_info = *p_pi; + + if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) + { + /* If PortState is down, only copy PortState */ + /* PortPhysicalState per C14-24-2.1 */ + ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN); + ib_port_info_set_port_phys_state( + ib_port_info_get_port_phys_state(p_pi), &p_physp->port_info); + } + else + { + p_physp->port_info = *p_pi; + } } /* * PARAMETERS From rdreier at cisco.com Mon May 22 09:41:12 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 09:41:12 -0700 Subject: [openib-general] Re: [PATCH] comma at end of enumerator list In-Reply-To: <20060522153101.GO15161@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 22 May 2006 18:31:01 +0300") References: <20060522153101.GO15161@mellanox.co.il> Message-ID: Can't hurt I guess.. applied From ftillier at silverstorm.com Mon May 22 09:43:35 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Mon, 22 May 2006 09:43:35 -0700 Subject: [openib-general] Re: vapi versus openib imm_data In-Reply-To: References: <20060522143801.GB13663@osc.edu> Message-ID: <79ae2f320605220943q37747562sb2009f95fc62b4f0@mail.gmail.com> On 5/22/06, Sean Hefty wrote: > >Thanks both. I'll solve this by adding htonl/ntohl, only on the > >VAPI side > > Since VAPI is wanting the data in host order, while openib uses network order, > it makes more sense to me to do the swapping on the openib side. It doesn't matter what VAPI wants - it's the application that matters. If the application is using the immediate data for flags, you don't need any swapping on the OpenIB side of things, and you can avoid the swap altogether. While this makes the VAPI implementation less efficient (two swaps of the immediate data), hopefully that implementation will be replaced overtime with OpenIB leaving an optimal solution. 
Again, this all depends on what the app is doing with the immediate data. - Fab From rdreier at cisco.com Mon May 22 09:48:09 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 09:48:09 -0700 Subject: [openib-general] Re: [PATCH] [TRIVIAL] IPoIB doc: Update for IPoIB RFCs being issued by IETF In-Reply-To: <1148055297.18971.106572.camel@hal.voltaire.com> (Hal Rosenstock's message of "19 May 2006 12:18:49 -0400") References: <1148053427.18971.105953.camel@hal.voltaire.com> <1148055297.18971.106572.camel@hal.voltaire.com> Message-ID: I queued this for 2.6.18: IPoIB: Mention RFC numbers in documentation Now that the IETF has released RFCs covering IPoIB, give the numbers in the documentation for IPoIB. Signed-off-by: Roland Dreier diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt index 5c5a4cc..1870355 100644 --- a/Documentation/infiniband/ipoib.txt +++ b/Documentation/infiniband/ipoib.txt @@ -1,10 +1,10 @@ IP OVER INFINIBAND The ib_ipoib driver is an implementation of the IP over InfiniBand - protocol as specified by the latest Internet-Drafts issued by the - IETF ipoib working group. It is a "native" implementation in the - sense of setting the interface type to ARPHRD_INFINIBAND and the - hardware address length to 20 (earlier proprietary implementations + protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib + working group. It is a "native" implementation in the sense of + setting the interface type to ARPHRD_INFINIBAND and the hardware + address length to 20 (earlier proprietary implementations masqueraded to the kernel as ethernet interfaces). 
Partitions and P_Keys @@ -53,3 +53,7 @@ References IETF IP over InfiniBand (ipoib) Working Group http://ietf.org/html.charters/ipoib-charter.html + Transmission of IP over InfiniBand (IPoIB) (RFC 4391) + http://ietf.org/rfc/rfc4391.txt + IP over InfiniBand (IPoIB) Architecture (RFC 4392) + http://ietf.org/rfc/rfc4392.txt From mshefty at ichips.intel.com Mon May 22 09:54:23 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 22 May 2006 09:54:23 -0700 Subject: [openib-general] Re: vapi versus openib imm_data In-Reply-To: <79ae2f320605220943q37747562sb2009f95fc62b4f0@mail.gmail.com> References: <20060522143801.GB13663@osc.edu> <79ae2f320605220943q37747562sb2009f95fc62b4f0@mail.gmail.com> Message-ID: <4471ECBF.4080203@ichips.intel.com> Fabian Tillier wrote: > It doesn't matter what VAPI wants - it's the application that matters. > If the application is using the immediate data for flags, you don't > need any swapping on the OpenIB side of things, and you can avoid the > swap altogether. While this makes the VAPI implementation less > efficient (two swaps of the immediate data), hopefully that > implementation will be replaced overtime with OpenIB leaving an > optimal solution. Again, this all depends on what the app is doing > with the immediate data. If you run openib on different platforms, one big endian and the other little endiand, this doesn't work. - Sean From rdreier at cisco.com Mon May 22 09:58:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 09:58:13 -0700 Subject: [openib-general] Re: ipoib_reap_ah question In-Reply-To: (Shirley Ma's message of "Fri, 19 May 2006 11:10:28 -0700") References: Message-ID: Shirley> Hello Roland, Is there any particular reason to use Shirley> ipoib_reap_ah thread? Shirley> In my tx_ring removal patch, I tested without Shirley> ipoib_reap_ah work queue by simply adding kref_get(), Shirley> kref_put() in ipoib_send(), and i didn't see any Shirley> difference including performance. 
If there is no other Shirley> risk, I will remove it to make it simple. Using kref_get() and kref_put() adds an atomic operation to the data path. You may not be measuring the cost but an atomic operation is just about the most expensive thing you can do. I think you should keep your patches simple -- "one idea per patch." So if you want to experiment with both tx_ring removal and the reap_ah removal, keep in mind that they should be merged as separate patches. So you should probably develop them that way. - R. From rdreier at cisco.com Mon May 22 09:53:17 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 09:53:17 -0700 Subject: [openib-general][RFC][PATCH] mthca: HCA initialization parameters In-Reply-To: <4470514E.8020501@voltaire.com> (Leonid Arsh's message of "Sun, 21 May 2006 14:38:54 +0300") References: <20060518144810.GA9756@voltaire.com> <4470514E.8020501@voltaire.com> Message-ID: Leonid> Do you mean querying the firmware to determine the type of Leonid> the HCA? No, I meant using the request_firmware() function to ask userspace for a profile for each HCA, rather than allowing only one profile to be provided (which will be used for all HCAs in the system). But I'm not sure that request_firmware() really makes sense. I just wonder if having one profile for all HCAs is the right answer. Maybe it is. Roland> Does it make sense to tune all of these values? Leonid> I'm not sure that every one of the parameters will be used Leonid> but my feeling is that we want to let the user change the Leonid> whole profile for completeness. My bias is in the other direction. If we can't justify have something be settable then I don't think we should add the setting. It's just adding complexity for no benefit. Also, my point was that changing the number of UARs doesn't make any sense in any circumstance. Reducing the number doesn't save anything significant, and increasing the number won't work. So why have the parameter? Leonid> We have two ideas here: 1. 
We can set the access Leonid> permissions to 0 and that way no sysfs entries will be Leonid> created. 2. Update the module parameters with the real Leonid> values after the initialization phase. #2 makes a lot more sense to me. - R. From ftillier at silverstorm.com Mon May 22 10:16:37 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Mon, 22 May 2006 10:16:37 -0700 Subject: [openib-general] Re: vapi versus openib imm_data In-Reply-To: <4471ECBF.4080203@ichips.intel.com> References: <20060522143801.GB13663@osc.edu> <79ae2f320605220943q37747562sb2009f95fc62b4f0@mail.gmail.com> <4471ECBF.4080203@ichips.intel.com> Message-ID: <79ae2f320605221016p6c76fcao9c1f014f6d582624@mail.gmail.com> On 5/22/06, Sean Hefty wrote: > Fabian Tillier wrote: > > It doesn't matter what VAPI wants - it's the application that matters. > > If the application is using the immediate data for flags, you don't > > need any swapping on the OpenIB side of things, and you can avoid the > > swap altogether. While this makes the VAPI implementation less > > efficient (two swaps of the immediate data), hopefully that > > implementation will be replaced overtime with OpenIB leaving an > > optimal solution. Again, this all depends on what the app is doing > > with the immediate data. > > If you run openib on different platforms, one big endian and the other little > endiand, this doesn't work. If you swap the flag constants in little endian systems so they're always in network order - something the compiler can do for you - then you're checking for a bit, and it is safe to treat the value in network order always. The flag constant will be different in little endian and big endian, but will represent the same bit. If you do any arithmetic, then you need it in host order. 
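A minimal sketch of the flag case described above, assuming the immediate data is a 32-bit value held in network byte order. The flag name and both helpers are hypothetical and belong to neither VAPI nor OpenIB; only htonl() and ntohl() are real library calls.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>

/* Hypothetical application flag, written in host order. */
#define APP_FLAG_EAGER 0x00000100u

/*
 * Test a flag directly on network-order immediate data.
 * The constant is swapped, not the data, so the same bit is
 * checked on little-endian and big-endian hosts.
 */
static int imm_has_flag(uint32_t imm_net, uint32_t flag_host)
{
    return (imm_net & htonl(flag_host)) != 0;
}

/*
 * Arithmetic only makes sense in host order: convert, compute,
 * and convert back before handing the value to the wire again.
 */
static uint32_t imm_increment(uint32_t imm_net)
{
    return htonl(ntohl(imm_net) + 1);
}
```

When the flag argument is a compile-time constant, the compiler can usually fold the htonl() call, so the bit test costs no runtime swap.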
- Fab From mshefty at ichips.intel.com Mon May 22 10:26:42 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 22 May 2006 10:26:42 -0700 Subject: [openib-general] Re: vapi versus openib imm_data In-Reply-To: <79ae2f320605221016p6c76fcao9c1f014f6d582624@mail.gmail.com> References: <20060522143801.GB13663@osc.edu> <79ae2f320605220943q37747562sb2009f95fc62b4f0@mail.gmail.com> <4471ECBF.4080203@ichips.intel.com> <79ae2f320605221016p6c76fcao9c1f014f6d582624@mail.gmail.com> Message-ID: <4471F451.6080009@ichips.intel.com> Fabian Tillier wrote: > If you swap the flag constants in little endian systems so they're > always in network order - something the compiler can do for you - then > you're checking for a bit, and it is safe to treat the value in > network order always. The flag constant will be different in little > endian and big endian, but will represent the same bit. Correct - but you must still provide the immediate data in network byte order. I didn't say that the swapping had to be done at runtime, only that it is better to do it on the openib side. Putting it on the VAPI side is only going to cause the the app to break later, and means that the app isn't obeying the requirements of either interface. - Sean
From sashak at voltaire.com Mon May 22 10:56:37 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 May 2006 20:56:37 +0300 Subject: [openib-general] Re: [PATCH] RFC: opensm: serialize osm_state_mgr_process() In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBD4@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30102BBD4@mtlexch01.mtl.com> Message-ID: <20060522175637.GM30176@sashak.voltaire.com> On 14:44 Mon 22 May , Eitan Zahavi wrote: > Hi Sasha, > > > > > > > The idea to use pthread in OpenSM code is totally wrong. > > > Please stop doing this. We want this code to be shared with Windows > and > > > this breaks it. > > > > The problem is that complib does not provide needed primitives, mostly > > in synchronization and thread management areas (like > pthread_cond_wait() > > and friends, pthread_cancel() and friends). > [EZ] The problem is that any code you write in OpenSM should be portable > to existing complib such that it would be portable to windows. > If some thread interface is missing (I am surprised we could get so far > without that primitive) - please add to complib and also make sure > Windows native threads can support that primitive. This will not help to Windows anyway (just additional works on our side). complib on linux has pthread as backend, any new complib wrapper will require a works on windows side. > > The option to extend complib also does not look very helpful for > Windows > > too - complib uses pthreads as backend anyway. > [EZ] No the windows complib does not use pthread at all. linux's does. > > Is using of pthread library with Windows may solve sharing issue? > [EZ] It might if you are willing to go through this un-needed exercise > ... > We have a complib in windows that is well tested and functional today. > I do not see why you need to re-write it. > > > > Other option I may think about is to use pthread wrapper (in the same > > way as it is done today with complib).
> [EZ] This does not solve the need to implement the same functionality in > windows. > And since I do not want to re-implement complib for windows - I prefer > sticking with complib wrappers. And why new complib wrapper will be better (or simpler) than new pthread wrapper? (pthread at least is much more standard and generally popular). > > > > > > > > Also please provide a clear RFC for what this patch is trying to do. > > > > Basically this serializes execution of osm_state_mgr_process(), so > > instead of to be directly called via dispatcher's callback (with > possible > > waiting for the lock), state_manager will be signaled (and wakeuped if > > necessary), as result we don't need big state_lock anymore, mutex is > > needed only to protect signal_mask (opensm's osm_signal_t, not *nix > > signals) update. > [EZ] I do not see what problem you are trying to solve? from original post: This serializes execution of osm_state_mgr_process() and removes the big state_lock. This should reduce "locked state" time and prevent potential dispatcher blocking. > What is wrong with the current implementation? In addititon to above: There is impossible to know reliably is there pended signals or transactions, etc. In general the current implementation looks > Are you fixing a bug? Mostly this patch was born yet in period of deadlocking (on the big state_lock due to broken signal processing) and endless blocking (due to bad broken mad counters). And we discussed several needed improvements. I've just pended this stuff until nearly end of 1.0 cycle. Other obvious thing we will need is to resolve cl_atomic() workaround which was done then. > The dispatching mechanism used allows passing the state manager signals > which must be processed in order. The state_lock guarantees this order. This is done as complib's cl_spinlock() and finally via pthread_mutex. Does pthread_mutex_lock() guarantees the order? 
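A minimal sketch of the serialization described above, assuming plain POSIX threads: callers only set a bit in a mask and wake the worker, and the mutex protects nothing but the mask. Every name here is hypothetical and none of it is taken from the OpenSM or complib sources.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state manager; not OpenSM code. */
struct state_mgr {
    pthread_mutex_t lock;        /* protects signal_mask and stop only */
    pthread_cond_t  wake;
    uint32_t        signal_mask; /* pending signals, one bit each */
    bool            stop;
    uint32_t        processed;   /* written by the worker, read after join */
    pthread_t       thread;
};

static void *state_mgr_worker(void *arg)
{
    struct state_mgr *m = arg;

    pthread_mutex_lock(&m->lock);
    for (;;) {
        /* Sleep until something is pending or shutdown is requested. */
        while (m->signal_mask == 0 && !m->stop)
            pthread_cond_wait(&m->wake, &m->lock);
        if (m->signal_mask == 0 && m->stop)
            break;

        uint32_t pending = m->signal_mask;
        m->signal_mask = 0;

        /* Processing runs without the lock, so signalers never block on it. */
        pthread_mutex_unlock(&m->lock);
        m->processed |= pending;
        pthread_mutex_lock(&m->lock);
    }
    pthread_mutex_unlock(&m->lock);
    return NULL;
}

static int state_mgr_start(struct state_mgr *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->wake, NULL);
    m->signal_mask = 0;
    m->stop = false;
    m->processed = 0;
    return pthread_create(&m->thread, NULL, state_mgr_worker, m);
}

/* Called from dispatcher context: set the bit and wake the worker. */
static void state_mgr_signal(struct state_mgr *m, uint32_t bit)
{
    pthread_mutex_lock(&m->lock);
    m->signal_mask |= bit;
    pthread_cond_signal(&m->wake);
    pthread_mutex_unlock(&m->lock);
}

/* Drain pending signals, stop the worker, and report what it handled. */
static uint32_t state_mgr_stop(struct state_mgr *m)
{
    pthread_mutex_lock(&m->lock);
    m->stop = true;
    pthread_cond_signal(&m->wake);
    pthread_mutex_unlock(&m->lock);
    pthread_join(m->thread, NULL);
    pthread_cond_destroy(&m->wake);
    pthread_mutex_destroy(&m->lock);
    return m->processed;
}
```

A bit mask coalesces repeated signals and does not record arrival order, so a design that must process signals strictly in order would need a queue instead. POSIX leaves the choice of which blocked thread acquires a mutex to the scheduling policy, so lock acquisition order alone should not be relied on for that.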
Sasha From pw at osc.edu Mon May 22 11:14:41 2006 From: pw at osc.edu (Pete Wyckoff) Date: Mon, 22 May 2006 14:14:41 -0400 Subject: [openib-general] Re: vapi versus openib imm_data In-Reply-To: <79ae2f320605220943q37747562sb2009f95fc62b4f0@mail.gmail.com> References: <20060522143801.GB13663@osc.edu> <79ae2f320605220943q37747562sb2009f95fc62b4f0@mail.gmail.com> Message-ID: <20060522181441.GA14213@osc.edu> ftillier at silverstorm.com wrote on Mon, 22 May 2006 09:43 -0700: > On 5/22/06, Sean Hefty wrote: > >>Thanks both. I'll solve this by adding htonl/ntohl, only on the > >>VAPI side > > > >Since VAPI is wanting the data in host order, while openib uses network > >order, > >it makes more sense to me to do the swapping on the openib side. > > It doesn't matter what VAPI wants - it's the application that matters. > If the application is using the immediate data for flags, you don't > need any swapping on the OpenIB side of things, and you can avoid the > swap altogether. While this makes the VAPI implementation less > efficient (two swaps of the immediate data), hopefully that > implementation will be replaced overtime with OpenIB leaving an > optimal solution. Again, this all depends on what the app is doing > with the immediate data. Even though it would be more efficient today for my little-endian hosts to do the byteswap on the OpenIB side only, I share Fab's thoughts about what the more prevalent API will be in the future. The app in question has already handled swapping the immediate data above the network layer, to support heterogeneous configurations. I didn't expect a byte-swapping service from VAPI or OpenIB and was a bit surprised to find that VAPI provides it. Correcting it on the VAPI side rather than the OpenIB side fits my expectations better, and hopefully is more natural for other programmers with a sockets background too. 
-- Pete From mshefty at ichips.intel.com Mon May 22 11:29:30 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 22 May 2006 11:29:30 -0700 Subject: [openib-general] connection management In-Reply-To: <20060522042532.46143.qmail@web38501.mail.mud.yahoo.com> References: <20060522042532.46143.qmail@web38501.mail.mud.yahoo.com> Message-ID: <4472030A.7080009@ichips.intel.com> amit byron wrote: > o to make multiple connections using ib_send_cm_req() > i would have make connection requests using > ib_send_cm_req() with different service id. is this > sufficient, or i missed something? i can use the > same port number, correct? You need one service ID per connection. You would call ib_send_cm_req() with each service id. > o machine A is connected to machine B using ib_cm's > request, response, ready to use protocol (machine A > initiates the connection request). the question i > have: for machine B to send message to machine A, > is it sufficient for machine B to do a lookup using > ib_sa_path_rec_get(), and then do ib_post_send()? > if no, then what else needs to be done? I'm assuming that machine B performed a listen. It can send a message to A immediately after receiving an RTU or a message from A. There's no need to get a path record. You will need to transition the QPs on both sides. You can also refer to svn/gen2/utils/src/linux-kernel/infiniband/util/cmpost for a kernel test utility that operates over the ib_cm. A userspace test utility is svn/gen2/trunk/src/userspace/libibcm/examples/cmpost.c. There's also an alternate connection manager that uses IP addressing available as the rdma_cm for both user/kernel clients if you're interested. 
- Sean

From bugzilla-daemon at openib.org Mon May 22 12:20:21 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Mon, 22 May 2006 12:20:21 -0700 (PDT)
Subject: [openib-general] [Bug 91] New: sizeof(srp_indirect_buf) wrong on 64-bit platforms
Message-ID: <20060522192021.D09BD2283D7@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=91

Summary: sizeof(srp_indirect_buf) wrong on 64-bit platforms
Product: OpenFabrics Linux
Version: 1.0rc2
Platform: IA64
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: SRP
AssignedTo: bugzilla at openib.org
ReportedBy: chas at cmf.nrl.navy.mil

the packed attribute should be applied to the entire struct instead of just the array of srp_direct elements. the size of this struct should be 20, not 24 as the following test program indicates. nothing critical depends on the sizeof of srp_indirect_buf but it would be nice to be consistent.

#include <stdio.h>

#define u64 unsigned long long
#define __be64 unsigned long long
#define __be32 unsigned int

struct srp_direct_buf {
	__be64 va;
	__be32 key;
	__be32 len;
};

/* packed applied to the array member only */
struct srp_indirect_buf {
	struct srp_direct_buf table_desc;
	__be32 len;
	struct srp_direct_buf desc_list[0] __attribute__((packed));
};

/* packed applied to the whole struct */
struct srp_indirect_buf2 {
	struct srp_direct_buf table_desc;
	__be32 len;
	struct srp_direct_buf desc_list[0];
} __attribute__((packed));

int main(void)
{
	printf("sizeof(struct srp_direct_buf) = %zu\n", sizeof(struct srp_direct_buf));
	printf("sizeof(struct srp_indirect_buf) = %zu\n", sizeof(struct srp_indirect_buf));
	printf("sizeof(struct srp_indirect_buf2) = %zu\n", sizeof(struct srp_indirect_buf2));
	return 0;
}

--- drivers/infiniband/include/scsi/srp.h.000 2006-05-22 14:56:51.337237500 -0400
+++ drivers/infiniband/include/scsi/srp.h 2006-05-22 12:03:27.141582521 -0400
@@ -101,8 +101,8 @@
 struct srp_indirect_buf {
 	struct srp_direct_buf table_desc;
 	__be32 len;
-	struct srp_direct_buf desc_list[0] __attribute__((packed));
-};
+	struct srp_direct_buf desc_list[0];
+}
__attribute__((packed)); enum { SRP_MULTICHAN_SINGLE = 0, ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From amit_byron at yahoo.com Mon May 22 12:26:19 2006 From: amit_byron at yahoo.com (amit byron) Date: Mon, 22 May 2006 19:26:19 +0000 (UTC) Subject: [openib-general] connection management Message-ID: Sean, thanks for answering! Amit From bugzilla-daemon at openib.org Mon May 22 12:58:48 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 22 May 2006 12:58:48 -0700 (PDT) Subject: [openib-general] [Bug 91] sizeof(srp_indirect_buf) wrong on 64-bit platforms Message-ID: <20060522195848.465182283D7@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=91 ------- Additional Comments From rolandd at cisco.com 2006-05-22 12:58 ------- This is already fixed in upstream kernels. I'm not sure what OFED has to do to pick up the fix, since is not maintained as part of the openib svn. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Mon May 22 13:11:05 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 22 May 2006 13:11:05 -0700 (PDT) Subject: [openib-general] [Bug 92] New: sizeof(srp_indirect_buf) wrong on 64-bit platforms Message-ID: <20060522201105.30E262283D7@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=92 Summary: sizeof(srp_indirect_buf) wrong on 64-bit platforms Product: OpenFabrics Linux Version: 1.0rc2 Platform: IA64 OS/Version: All Status: NEW Severity: normal Priority: P2 Component: SRP AssignedTo: bugzilla at openib.org ReportedBy: chas at cmf.nrl.navy.mil the packed attribute should be applied to the entire struct instead of the just the array of srp_direct elements. the size of this struct should be 20, not 24 as the following test program indicates. 
nothing critical depends on the sizeof of srp_indirect_buf but it would be nice to be consistent.

#include <stdio.h>

#define u64 unsigned long long
#define __be64 unsigned long long
#define __be32 unsigned int

struct srp_direct_buf {
	__be64 va;
	__be32 key;
	__be32 len;
};

/* packed applied to the array member only */
struct srp_indirect_buf {
	struct srp_direct_buf table_desc;
	__be32 len;
	struct srp_direct_buf desc_list[0] __attribute__((packed));
};

/* packed applied to the whole struct */
struct srp_indirect_buf2 {
	struct srp_direct_buf table_desc;
	__be32 len;
	struct srp_direct_buf desc_list[0];
} __attribute__((packed));

int main(void)
{
	printf("sizeof(struct srp_direct_buf) = %zu\n", sizeof(struct srp_direct_buf));
	printf("sizeof(struct srp_indirect_buf) = %zu\n", sizeof(struct srp_indirect_buf));
	printf("sizeof(struct srp_indirect_buf2) = %zu\n", sizeof(struct srp_indirect_buf2));
	return 0;
}

--- drivers/infiniband/include/scsi/srp.h.000 2006-05-22 14:56:51.337237500 -0400
+++ drivers/infiniband/include/scsi/srp.h 2006-05-22 12:03:27.141582521 -0400
@@ -101,8 +101,8 @@
 struct srp_indirect_buf {
 	struct srp_direct_buf table_desc;
 	__be32 len;
-	struct srp_direct_buf desc_list[0] __attribute__((packed));
-};
+	struct srp_direct_buf desc_list[0];
+} __attribute__((packed));
 
 enum {
 	SRP_MULTICHAN_SINGLE = 0,

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
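The size difference in the report above comes from tail padding: without whole-struct packing, the compiler rounds the struct size up to a multiple of its alignment. A minimal sketch of that effect follows, assuming GCC or Clang and fixed-width types in place of the kernel's __be64/__be32. The struct names are hypothetical, and a C99 flexible array member stands in for the header's desc_list[0].

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct srp_direct_buf: 8 + 4 + 4 bytes. */
struct direct_buf {
    uint64_t va;
    uint32_t key;
    uint32_t len;
};

/*
 * No whole-struct packing: 20 bytes of members, and the compiler
 * may append tail padding so that sizeof is a multiple of the
 * struct's alignment (8 where uint64_t is 8-byte aligned).
 */
struct indirect_buf_plain {
    struct direct_buf table_desc;
    uint32_t len;
    struct direct_buf desc_list[];
};

/* Whole-struct packing, as the proposed patch does: no tail padding. */
struct indirect_buf_packed {
    struct direct_buf table_desc;
    uint32_t len;
    struct direct_buf desc_list[];
} __attribute__((packed));

static size_t plain_size(void)
{
    return sizeof(struct indirect_buf_plain);
}

static size_t packed_size(void)
{
    return sizeof(struct indirect_buf_packed);
}
```

The packed variant is 20 bytes everywhere, while the plain variant depends on the platform's alignment of 64-bit integers, which is why the mismatch only shows up on some targets.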
From bugzilla-daemon at openib.org Mon May 22 13:13:00 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 22 May 2006 13:13:00 -0700 (PDT) Subject: [openib-general] [Bug 92] sizeof(srp_indirect_buf) wrong on 64-bit platforms Message-ID: <20060522201300.451F22283D7@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=92 rolandd at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Additional Comments From rolandd at cisco.com 2006-05-22 13:12 ------- *** This bug has been marked as a duplicate of 91 *** ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Mon May 22 13:13:00 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 22 May 2006 13:13:00 -0700 (PDT) Subject: [openib-general] [Bug 91] sizeof(srp_indirect_buf) wrong on 64-bit platforms Message-ID: <20060522201300.C1C48228540@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=91 ------- Additional Comments From rolandd at cisco.com 2006-05-22 13:13 ------- *** Bug 92 has been marked as a duplicate of this bug. *** ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Mon May 22 13:13:30 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 22 May 2006 13:13:30 -0700 (PDT) Subject: [openib-general] [Bug 92] sizeof(srp_indirect_buf) wrong on 64-bit platforms Message-ID: <20060522201330.76A472283D7@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=92 ------- Additional Comments From chas at cmf.nrl.navy.mil 2006-05-22 13:13 ------- good damned browser. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at openib.org Mon May 22 13:15:22 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Mon, 22 May 2006 13:15:22 -0700 (PDT) Subject: [openib-general] [Bug 91] sizeof(srp_indirect_buf) wrong on 64-bit platforms Message-ID: <20060522201522.919CB2283D7@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=91 ------- Additional Comments From chas at cmf.nrl.navy.mil 2006-05-22 13:15 ------- i imagine scsi_srp_4979_to_2_6_14.patch in ofed could be fixed. this creates srp.h for those who are missing it. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Mon May 22 13:09:38 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 22 May 2006 23:09:38 +0300 Subject: [openib-general] Fwd: [Bug 91] sizeof(srp_indirect_buf) wrong on 64-bit platforms Message-ID: <20060522200938.GB18107@mellanox.co.il> Roland, maybe this means we need scsi/srp.h in svn for now? svn is supposed to work on 2.6.16 ... ----- Forwarded message from bugzilla-daemon at openib.org ----- From: bugzilla-daemon at openib.org Date: Mon, 22 May 2006 12:58:48 -0700 (PDT) Subject: [Bug 91] sizeof(srp_indirect_buf) wrong on 64-bit platforms X-Spam: exempt http://openib.org/bugzilla/show_bug.cgi?id=91 ------- Additional Comments From rolandd at cisco.com 2006-05-22 12:58 ------- This is already fixed in upstream kernels. I'm not sure what OFED has to do to pick up the fix, since is not maintained as part of the openib svn. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
_______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general ----- End forwarded message ----- -- MST From rdreier at cisco.com Mon May 22 13:09:48 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 13:09:48 -0700 Subject: [openib-general] SRP: [PATCH] Handling DREQ In-Reply-To: <20060522122029.GA1498@mellanox.co.il> (Ishai Rabinovitz's message of "Mon, 22 May 2006 15:20:29 +0300") References: <20060518235435.GA24917@mellanox.co.il> <20060522122029.GA1498@mellanox.co.il> Message-ID: Thanks, I applied the version below. > This is an initial patch. Maybe it will be more efficient to initiate a reconnect > in case we get IB_CM_DREQ_RECEIVED. What do you think? I guess it depends on when real targets would send DREQs. In general if a target is saying it wants to close a connection, it seems sort of rude to try and reconnect immediately... - R. 
Index: infiniband/ulp/srp/ib_srp.c =================================================================== --- infiniband/ulp/srp/ib_srp.c (revision 7398) +++ infiniband/ulp/srp/ib_srp.c (working copy) @@ -1201,11 +1201,10 @@ static int srp_cm_handler(struct ib_cm_i srp_cm_rej_handler(cm_id, event, target); break; - case IB_CM_MRA_RECEIVED: - printk(KERN_ERR PFX "MRA received\n"); - break; - - case IB_CM_DREP_RECEIVED: + case IB_CM_DREQ_RECEIVED: + printk(KERN_WARNING PFX "DREQ received - connection closed\n"); + if (ib_send_cm_drep(cm_id, NULL, 0)) + printk(KERN_ERR PFX "Sending CM DREP failed\n"); break; case IB_CM_TIMEWAIT_EXIT: @@ -1215,6 +1214,11 @@ static int srp_cm_handler(struct ib_cm_i target->status = 0; break; + case IB_CM_MRA_RECEIVED: + case IB_CM_DREQ_ERROR: + case IB_CM_DREP_RECEIVED: + break; + default: printk(KERN_WARNING PFX "Unhandled CM event %d\n", event->event); break; From rdreier at cisco.com Mon May 22 13:10:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 13:10:34 -0700 Subject: [openib-general] Re: Fwd: [Bug 91] sizeof(srp_indirect_buf) wrong on 64-bit platforms In-Reply-To: <20060522200938.GB18107@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 22 May 2006 23:09:38 +0300") References: <20060522200938.GB18107@mellanox.co.il> Message-ID: > Roland, maybe this means we need scsi/srp.h in svn for now? > svn is supposed to work on 2.6.16 ... As far as I can tell the bug in the header has no effect on how the IB SRP initiator works. - R. From rdreier at cisco.com Mon May 22 13:11:25 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 13:11:25 -0700 Subject: [openib-general] Re: Fwd: [Bug 91] sizeof(srp_indirect_buf) wrong on 64-bit platforms In-Reply-To: (Roland Dreier's message of "Mon, 22 May 2006 13:10:34 -0700") References: <20060522200938.GB18107@mellanox.co.il> Message-ID: Roland> As far as I can tell the bug in the header has no effect Roland> on how the IB SRP initiator works. 
And if it does have a practical impact, the best way to handle it would probably be to get a fix into 2.6.16 via the -stable team. - R. From mshefty at ichips.intel.com Mon May 22 10:50:50 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 22 May 2006 10:50:50 -0700 Subject: [openib-general] [PATCH v3] ipoib: convert to use new multicast interface In-Reply-To: References: Message-ID: <4471F9FA.5090807@ichips.intel.com> Sean Hefty wrote: > Convert ipoib to make use of the new multicast module interface. I've committed this patch to svn. - Sean From Thomas.Talpey at netapp.com Mon May 22 13:25:15 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Mon, 22 May 2006 16:25:15 -0400 Subject: [openib-general] NFS/RDMA for Linux: client and server update release 5 Message-ID: <7.0.1.0.2.20060522161137.04202e30@netapp.com> Network Appliance is pleased to announce release 5 of the NFS/RDMA client and server for Linux 2.6.16.16. This update to the April 19 release adds improved server parallel performance and fixes various issues. This code supports both InfiniBand and iWARP transports. Comments and feedback welcome. We're especially interested in successful test reports! Thanks. Tom Talpey, for the various NFS/RDMA projects. From xma at us.ibm.com Mon May 22 13:48:05 2006 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 22 May 2006 13:48:05 -0700 Subject: [openib-general] Re: ipoib_reap_ah question In-Reply-To: Message-ID: Roland Dreier wrote on 05/22/2006 09:58:13 AM: > I think you should keep your patches simple -- "one idea per patch." > So if you want to experiment with both tx_ring removal and the reap_ah > removal, keep in mind that they should be merged as separate patches. > So you should probably develop them that way. > > - R. I will, thanks. If I separate these two patches I will have to use last_send as atomic_t in tx_ring removal.
Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From rdreier at cisco.com Mon May 22 14:06:03 2006 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 May 2006 14:06:03 -0700 Subject: [openib-general] Re: ipoib_reap_ah question In-Reply-To: (Shirley Ma's message of "Mon, 22 May 2006 13:48:05 -0700") References: Message-ID: Shirley> I will, thanks. If I separate these two patches I will Shirley> have to use last_send as atomic_t in tx_ring removal. Why? And how does it help? ipoib never does any arithmetic on last_send so I don't see what changing it to atomic_t accomplishes. - R. From xma at us.ibm.com Mon May 22 16:03:18 2006 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 22 May 2006 16:03:18 -0700 Subject: [openib-general] [PATCH][0/7]ipoib performance patches In-Reply-To: Message-ID: Hello Roland, Let me start submitting some of the performance patches one by one for review. These patches have been validated; more tests are still going on. 1. splitting CQ and CQ handler into send/recv, changing the default NUM_WC value to a bigger size. 2. requeue packets because of send queue overrun 3. remove tx_ring 4. replace ipoib_reap_ah with kref_get()/kref_put() 5. remove rx_ring 6. change poll_cq from interrupt context to thread context, multiple threads support on both send and recv 7. tunable poll interval parameters to sync with the hardware driver Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From xma at us.ibm.com Mon May 22 16:09:13 2006 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 22 May 2006 17:09:13 -0600 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- split CQ In-Reply-To: Message-ID: 1.
splitting CQ and CQ handler into send/recv, changing NUM_SEND(RECV)WC value to bigger size. Signed-off-by: Shirley Ma diff -urpN infiniband/ulp/ipoib/ipoib.h infiniband-split-cq/ulp/ipoib/ipoib.h --- infiniband/ulp/ipoib/ipoib.h 2006-04-05 17:43:18.000000000 -0700 +++ infiniband-split-cq/ulp/ipoib/ipoib.h 2006-05-22 08:48:38.000000000 -0700 @@ -2,6 +2,8 @@ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2004 Voltaire, Inc. All rights reserved. + * Copyright (c) 2006 International Business Machines Corp., + * All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -71,7 +73,8 @@ enum { IPOIB_MAX_QUEUE_SIZE = 8192, IPOIB_MIN_QUEUE_SIZE = 2, - IPOIB_NUM_WC = 4, + IPOIB_NUM_SEND_WC = 32, + IPOIB_NUM_RECV_WC = 4, IPOIB_MAX_PATH_REC_QUEUE = 3, IPOIB_MAX_MCAST_QUEUE = 3, @@ -151,7 +154,8 @@ struct ipoib_dev_priv { u16 pkey; struct ib_pd *pd; struct ib_mr *mr; - struct ib_cq *cq; + struct ib_cq *send_cq; + struct ib_cq *recv_cq; struct ib_qp *qp; u32 qkey; @@ -164,15 +168,13 @@ struct ipoib_dev_priv { struct ipoib_rx_buf *rx_ring; - spinlock_t tx_lock; + spinlock_t tx_lock ____cacheline_aligned_in_smp; struct ipoib_tx_buf *tx_ring; unsigned tx_head; unsigned tx_tail; struct ib_sge tx_sge; struct ib_send_wr tx_wr; - struct ib_wc ibwc[IPOIB_NUM_WC]; - struct list_head dead_ahs; struct ib_event_handler event_handler; @@ -245,7 +247,8 @@ extern struct workqueue_struct *ipoib_wo /* functions */ -void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr); +void ipoib_ib_send_completion(struct ib_cq *cq, void *dev_ptr); +void ipoib_ib_recv_completion(struct ib_cq *cq, void *dev_ptr); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, struct ib_pd *pd, struct ib_ah_attr *attr); diff -urpN infiniband/ulp/ipoib/ipoib_ib.c infiniband-split-cq/ulp/ipoib/ipoib_ib.c --- 
infiniband/ulp/ipoib/ipoib_ib.c 2006-04-05 17:43:18.000000000 -0700 +++ infiniband-split-cq/ulp/ipoib/ipoib_ib.c 2006-05-22 08:48:23.000000000 -0700 @@ -3,6 +3,8 @@ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2005 Mellanox Technologies. All rights reserved. * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2006 International Business Machines Corp., + * All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -50,7 +52,6 @@ MODULE_PARM_DESC(data_debug_level, "Enable data path debug tracing if > 0"); #endif -#define IPOIB_OP_RECV (1ul << 31) static DEFINE_MUTEX(pkey_mutex); @@ -108,7 +109,7 @@ static int ipoib_ib_post_receive(struct list.lkey = priv->mr->lkey; param.next = NULL; - param.wr_id = id | IPOIB_OP_RECV; + param.wr_id = id; param.sg_list = &list; param.num_sge = 1; @@ -175,8 +176,8 @@ static int ipoib_ib_post_receives(struct return 0; } -static void ipoib_ib_handle_wc(struct net_device *dev, - struct ib_wc *wc) +static void ipoib_ib_handle_recv_wc(struct net_device *dev, + struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); unsigned int wr_id = wc->wr_id; @@ -184,121 +185,142 @@ static void ipoib_ib_handle_wc(struct ne ipoib_dbg_data(priv, "called: id %d, op %d, status: %d\n", wr_id, wc->opcode, wc->status); - if (wr_id & IPOIB_OP_RECV) { - wr_id &= ~IPOIB_OP_RECV; - - if (wr_id < ipoib_recvq_size) { - struct sk_buff *skb = priv->rx_ring[wr_id].skb; - dma_addr_t addr = priv->rx_ring[wr_id].mapping; - - if (unlikely(wc->status != IB_WC_SUCCESS)) { - if (wc->status != IB_WC_WR_FLUSH_ERR) - ipoib_warn(priv, "failed recv event " - "(status=%d, wrid=%d vend_err %x)\n", - wc->status, wr_id, wc->vendor_err); - dma_unmap_single(priv->ca->dma_device, addr, - IPOIB_BUF_SIZE, DMA_FROM_DEVICE); - dev_kfree_skb_any(skb); - priv->rx_ring[wr_id].skb = NULL; - return; - } - - /* - * If we 
can't allocate a new RX buffer, dump - * this packet and reuse the old buffer. - */ - if (unlikely(ipoib_alloc_rx_skb(dev, wr_id))) { - ++priv->stats.rx_dropped; - goto repost; - } - - ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", - wc->byte_len, wc->slid); + if (wr_id < ipoib_recvq_size) { + struct sk_buff *skb = priv->rx_ring[wr_id].skb; + dma_addr_t addr = priv->rx_ring[wr_id].mapping; + + if (unlikely(wc->status != IB_WC_SUCCESS)) { + if (wc->status != IB_WC_WR_FLUSH_ERR) + ipoib_warn(priv, "failed recv event " + "(status=%d, wrid=%d vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); dma_unmap_single(priv->ca->dma_device, addr, IPOIB_BUF_SIZE, DMA_FROM_DEVICE); + dev_kfree_skb_any(skb); + priv->rx_ring[wr_id].skb = NULL; + return; + } - skb_put(skb, wc->byte_len); - skb_pull(skb, IB_GRH_BYTES); + /* + * If we can't allocate a new RX buffer, dump + * this packet and reuse the old buffer. + */ + if (unlikely(ipoib_alloc_rx_skb(dev, wr_id))) { + ++priv->stats.rx_dropped; + goto repost; + } - if (wc->slid != priv->local_lid || - wc->src_qp != priv->qp->qp_num) { - skb->protocol = ((struct ipoib_header *) skb->data)->proto; - skb->mac.raw = skb->data; - skb_pull(skb, IPOIB_ENCAP_LEN); - - dev->last_rx = jiffies; - ++priv->stats.rx_packets; - priv->stats.rx_bytes += skb->len; - - skb->dev = dev; - /* XXX get correct PACKET_ type here */ - skb->pkt_type = PACKET_HOST; - netif_rx_ni(skb); - } else { - ipoib_dbg_data(priv, "dropping loopback packet\n"); - dev_kfree_skb_any(skb); - } + ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", + wc->byte_len, wc->slid); - repost: - if (unlikely(ipoib_ib_post_receive(dev, wr_id))) - ipoib_warn(priv, "ipoib_ib_post_receive failed " - "for buf %d\n", wr_id); - } else - ipoib_warn(priv, "completion event with wrid %d\n", - wr_id); + dma_unmap_single(priv->ca->dma_device, addr, + IPOIB_BUF_SIZE, DMA_FROM_DEVICE); - } else { - struct ipoib_tx_buf *tx_req; - unsigned long flags; + skb_put(skb, wc->byte_len); + 
skb_pull(skb, IB_GRH_BYTES); - if (wr_id >= ipoib_sendq_size) { - ipoib_warn(priv, "completion event with wrid %d (> %d)\n", - wr_id, ipoib_sendq_size); - return; + if (wc->slid != priv->local_lid || + wc->src_qp != priv->qp->qp_num) { + skb->protocol = ((struct ipoib_header *) skb->data)->proto; + skb->mac.raw = skb->data; + skb_pull(skb, IPOIB_ENCAP_LEN); + + dev->last_rx = jiffies; + ++priv->stats.rx_packets; + priv->stats.rx_bytes += skb->len; + + skb->dev = dev; + /* XXX get correct PACKET_ type here */ + skb->pkt_type = PACKET_HOST; + netif_rx_ni(skb); + } else { + ipoib_dbg_data(priv, "dropping loopback packet\n"); + dev_kfree_skb_any(skb); } - ipoib_dbg_data(priv, "send complete, wrid %d\n", wr_id); + repost: + if (unlikely(ipoib_ib_post_receive(dev, wr_id))) + ipoib_warn(priv, "ipoib_ib_post_receive failed " + "for buf %d\n", wr_id); + } else + ipoib_warn(priv, "completion event with wrid %d\n", + wr_id); +} + +static void ipoib_ib_handle_send_wc(struct net_device *dev, + struct ib_wc *wc) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned int wr_id = wc->wr_id; + struct ipoib_tx_buf *tx_req; + unsigned long flags; + + ipoib_dbg_data(priv, "called: id %d, op %d, status: %d\n", + wr_id, wc->opcode, wc->status); - tx_req = &priv->tx_ring[wr_id]; + if (wr_id >= ipoib_sendq_size) { + ipoib_warn(priv, "completion event with wrid %d (> %d)\n", + wr_id, ipoib_sendq_size); + return; + } - dma_unmap_single(priv->ca->dma_device, - pci_unmap_addr(tx_req, mapping), - tx_req->skb->len, - DMA_TO_DEVICE); + ipoib_dbg_data(priv, "send complete, wrid %d\n", wr_id); - ++priv->stats.tx_packets; - priv->stats.tx_bytes += tx_req->skb->len; + tx_req = &priv->tx_ring[wr_id]; - dev_kfree_skb_any(tx_req->skb); + dma_unmap_single(priv->ca->dma_device, + pci_unmap_addr(tx_req, mapping), + tx_req->skb->len, + DMA_TO_DEVICE); - spin_lock_irqsave(&priv->tx_lock, flags); - ++priv->tx_tail; - if (netif_queue_stopped(dev) && - priv->tx_head - priv->tx_tail <= ipoib_sendq_size 
>> 1) - netif_wake_queue(dev); - spin_unlock_irqrestore(&priv->tx_lock, flags); + ++priv->stats.tx_packets; + priv->stats.tx_bytes += tx_req->skb->len; - if (wc->status != IB_WC_SUCCESS && - wc->status != IB_WC_WR_FLUSH_ERR) - ipoib_warn(priv, "failed send event " - "(status=%d, wrid=%d vend_err %x)\n", - wc->status, wr_id, wc->vendor_err); - } + dev_kfree_skb_any(tx_req->skb); + + spin_lock_irqsave(&priv->tx_lock, flags); + ++priv->tx_tail; + if (netif_queue_stopped(dev) && + priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) + netif_wake_queue(dev); + spin_unlock_irqrestore(&priv->tx_lock, flags); + + if (wc->status != IB_WC_SUCCESS && + wc->status != IB_WC_WR_FLUSH_ERR) + ipoib_warn(priv, "failed send event " + "(status=%d, wrid=%d vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); +} + +void ipoib_ib_send_completion(struct ib_cq *cq, void *dev_ptr) +{ + struct net_device *dev = (struct net_device *) dev_ptr; + struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_wc ibwc[IPOIB_NUM_SEND_WC]; + int n, i; + + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + do { + n = ib_poll_cq(cq, IPOIB_NUM_SEND_WC, ibwc); + for (i = 0; i < n; ++i) + ipoib_ib_handle_send_wc(dev, ibwc + i); + } while (n != 0); } -void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) +void ipoib_ib_recv_completion(struct ib_cq *cq, void *dev_ptr) { struct net_device *dev = (struct net_device *) dev_ptr; struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_wc ibwc[IPOIB_NUM_RECV_WC]; int n, i; ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); do { - n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc); + n = ib_poll_cq(cq, IPOIB_NUM_RECV_WC, ibwc); for (i = 0; i < n; ++i) - ipoib_ib_handle_wc(dev, priv->ibwc + i); - } while (n == IPOIB_NUM_WC); + ipoib_ib_handle_recv_wc(dev, ibwc); + } while (n != 0); } static inline int post_send(struct ipoib_dev_priv *priv, diff -urpN infiniband/ulp/ipoib/ipoib_main.c infiniband-split-cq/ulp/ipoib/ipoib_main.c --- infiniband/ulp/ipoib/ipoib_main.c 2006-05-03 
13:16:18.000000000 -0700 +++ infiniband-split-cq/ulp/ipoib/ipoib_main.c 2006-05-22 08:48:47.000000000 -0700 @@ -2,6 +2,8 @@ * Copyright (c) 2004 Topspin Communications. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2004 Voltaire, Inc. All rights reserved. + * Copyright (c) 2006 International Business Machines Corp., + * All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU diff -urpN infiniband/ulp/ipoib/ipoib_verbs.c infiniband-split-cq/ulp/ipoib/ipoib_verbs.c --- infiniband/ulp/ipoib/ipoib_verbs.c 2006-04-05 17:43:18.000000000 -0700 +++ infiniband-split-cq/ulp/ipoib/ipoib_verbs.c 2006-05-22 08:48:54.000000000 -0700 @@ -1,6 +1,8 @@ /* * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. * Copyright (c) 2005 Mellanox Technologies. All rights reserved. + * Copyright (c) 2006 International Business Machines Corp., + * All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -174,24 +176,35 @@ int ipoib_transport_dev_init(struct net_ return -ENODEV; } - priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, - ipoib_sendq_size + ipoib_recvq_size + 1); - if (IS_ERR(priv->cq)) { - printk(KERN_WARNING "%s: failed to create CQ\n", ca->name); + priv->send_cq = ib_create_cq(priv->ca, ipoib_ib_send_completion, NULL, dev, + ipoib_sendq_size + 1); + if (IS_ERR(priv->send_cq)) { + printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name); goto out_free_pd; } - if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP)) - goto out_free_cq; + if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP)) + goto out_free_send_cq; + + + priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_recv_completion, NULL, dev, + ipoib_recvq_size + 1); + if (IS_ERR(priv->recv_cq)) { + printk(KERN_WARNING "%s: failed to create recv CQ\n", ca->name); + goto out_free_send_cq; + } + + if (ib_req_notify_cq(priv->recv_cq, IB_CQ_NEXT_COMP)) + goto out_free_recv_cq; priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE); if (IS_ERR(priv->mr)) { printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name); - goto out_free_cq; + goto out_free_recv_cq; } - init_attr.send_cq = priv->cq; - init_attr.recv_cq = priv->cq, + init_attr.send_cq = priv->send_cq; + init_attr.recv_cq = priv->recv_cq, priv->qp = ib_create_qp(priv->pd, &init_attr); if (IS_ERR(priv->qp)) { @@ -215,8 +228,11 @@ int ipoib_transport_dev_init(struct net_ out_free_mr: ib_dereg_mr(priv->mr); -out_free_cq: - ib_destroy_cq(priv->cq); +out_free_recv_cq: + ib_destroy_cq(priv->recv_cq); + +out_free_send_cq: + ib_destroy_cq(priv->send_cq); out_free_pd: ib_dealloc_pd(priv->pd); @@ -238,7 +254,10 @@ void ipoib_transport_dev_cleanup(struct if (ib_dereg_mr(priv->mr)) ipoib_warn(priv, "ib_dereg_mr failed\n"); - if (ib_destroy_cq(priv->cq)) + if (ib_destroy_cq(priv->send_cq)) + ipoib_warn(priv, "ib_cq_destroy failed\n"); + + if (ib_destroy_cq(priv->recv_cq)) 
ipoib_warn(priv, "ib_cq_destroy failed\n"); if (ib_dealloc_pd(priv->pd)) Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- A non-text attachment was scrubbed... Name: infiniband-split-cq.patch1 Type: application/octet-stream Size: 13884 bytes Desc: not available URL: From zhushisongzhu at yahoo.com Mon May 22 19:56:03 2006 From: zhushisongzhu at yahoo.com (zhu shi song) Date: Mon, 22 May 2006 19:56:03 -0700 (PDT) Subject: [openib-general] which way to port squid for supporting large amount concurrent connections In-Reply-To: <20060522051721.GA14583@mellanox.co.il> Message-ID: <20060523025603.68783.qmail@web36907.mail.mud.yahoo.com> I won't wait for SDP to be OK. I hope to use another method to port squid. I know VAPI and uDAPL can do this. But can you suggest which is the best way for me to adopt? And how can I get more info about how to program using these methods? tks zhu From lindahl at pathscale.com Mon May 22 21:33:24 2006 From: lindahl at pathscale.com (Greg Lindahl) Date: Mon, 22 May 2006 21:33:24 -0700 Subject: [openib-general] which way to port squid for supporting large amount concurrent connections In-Reply-To: <20060523025603.68783.qmail@web36907.mail.mud.yahoo.com> References: <20060522051721.GA14583@mellanox.co.il> <20060523025603.68783.qmail@web36907.mail.mud.yahoo.com> Message-ID: <20060523043324.GA4338@greglaptop> On Mon, May 22, 2006 at 07:56:03PM -0700, zhu shi song wrote: > I won't wait for SDP to be OK. I hope to use another method to > port squid. How about IPoIB?
-- greg From eitan at mellanox.co.il Mon May 22 22:43:29 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 23 May 2006 08:43:29 +0300 Subject: [openib-general] RE: [PATCH] OpenSM: Support C14-24-2.1 in terms of valid componentswhen PortState is DOWN Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3023686EA@mtlexch01.mtl.com> Looks good Thanks Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Monday, May 22, 2006 7:34 PM > To: openib-general at openib.org > Cc: Eitan Zahavi > Subject: [PATCH] OpenSM: Support C14-24-2.1 in terms of valid componentswhen > PortState is DOWN > > OpenSM: Support C14-24-2.1 in terms of valid components when PortState > is DOWN > > Signed-off-by: Hal Rosenstock > > Index: include/opensm/osm_port.h > =================================================================== > --- include/opensm/osm_port.h (revision 7396) > +++ include/opensm/osm_port.h (working copy) > @@ -427,7 +427,8 @@ osm_physp_set_health( > * osm_physp_set_port_info > * > * DESCRIPTION > -* Copies the PortInfo attribute into the Physical Port object. > +* Copies the PortInfo attribute into the Physical Port object > +* based on the PortState. 
> * > * SYNOPSIS > */ > @@ -438,7 +439,19 @@ osm_physp_set_port_info( > { > CL_ASSERT( p_pi ); > CL_ASSERT( osm_physp_is_valid( p_physp ) ); > - p_physp->port_info = *p_pi; > + > + if (ib_port_info_get_port_state(p_pi) == IB_LINK_DOWN) > + { > + /* If PortState is down, only copy PortState */ > + /* PortPhysicalState per C14-24-2.1 */ > + ib_port_info_set_port_state(&p_physp->port_info, IB_LINK_DOWN); > + ib_port_info_set_port_phys_state( > + ib_port_info_get_port_phys_state(p_pi), &p_physp->port_info); > + } > + else > + { > + p_physp->port_info = *p_pi; > + } > } > /* > * PARAMETERS > From eitan at mellanox.co.il Mon May 22 23:14:27 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 23 May 2006 09:14:27 +0300 Subject: [openib-general] RE: [PATCH] opensm: remove osm_pkey_mgr.h Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3023686EB@mtlexch01.mtl.com> Hi Sasha > On 13:55 Mon 22 May , Eitan Zahavi wrote: > > Hi Sasha, > > > > My point is simple: > > > > OpenSM has a very structured skeleton: > > 1. All mad receivers have two c files and two h files > > 1.1 mad receive controller which deals with dispatcher registration . > > 1.2 Mad receiver which deals with all the action happening after such a > > mad is received > > 2. All algorithm stages (managers) have a c file and h file > > An algorithm stage might be lid assignment, routing, partition > > enforcement etc > > 3. All SMDB objects have a c file and h files. > > Examples are Nodes, Ports, Multicast registrations etc > > > > These are the structural code rules for OpenSM. > > > > Even if you think it is better to merge functionality and avoid having > > some of these h files and you might even be able to save some lines of > > code by doing that, you break the code structure. If you personally like > > to work with flat code - this is your preference. > > I prefer having clear structure. So if I need to know where is the pkey > > manager object defined? What are its internal state variables? 
What are > > the algorithms it uses? Etc I can simply open up osm_pkey_mgr.h and find > > this out. > > I would be agree with your last example, but it is not the case - what > you will actually find in osm_pkey_mgr.h is some object with no related > to pkey management fields, but instead with four duplicated from > somewhere pointers (and you will need to dig the whole tree in order to > find from where actually it was copied). Do you call this "clear > structure"? [EZ] What is important is not what the manager specific data or functions are but the fact anybody knows where to find them. So once it is clear the osm_partition_mgr is described in osm_partition_mgr.h I can know with one glance that it does not have any special data or function. If I do not have such an h file to learn about the partition manager where would I look for that info? If you keep track of the structure you do not need to dig so much to find where the manager pointers are defined. You should KNOW they must be passed to the manager on its initialization at the osm_sm . And yes - I call the state where you can know by simple rule what file to look for some object definition a clear structure. And I call the state when you can not tell which object should be defined in what file - a mess. > > > If you want to redesign OpenSM structure to be "simpler" or more > > "effective" you can propose doing that. But doing it in the salami way > > is just going to hurt stability and leave us with no structure at all. > > Unless we are willing to re-architect the code in a clean manner and > > spend the years of development and validation for these changes - lets > > keep the code with clear structure. > > So your proposition is to wait years in order to remove unused object? [EZ] If the pkey manager is not used - yes go ahead and remove it. But the case is that there is a partition manager so my objection is to having a manager without an h file. 
Regarding code flattening and structure rule violations, my position is to avoid messing up the project's structure for obscure reasons, killing its stability and structure. > > Also note that this cleanup has nothing in common with re-architecting the > code, it does not even touch OpenSM architecture. [EZ] Call it whatever you like - if you are continuously going to modify the structure it is a major re-write which will impact stability. What regression testing are you running before posting these patches? From arne.redlich at xiranet.com Tue May 23 00:10:05 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Tue, 23 May 2006 09:10:05 +0200 Subject: [openib-general] Re: Fwd: [Bug 91] sizeof(srp_indirect_buf) wrong on 64-bit platforms In-Reply-To: (Roland Dreier's message of "Mon, 22 May 2006 13:10:34 -0700") References: <20060522200938.GB18107@mellanox.co.il> Message-ID: <877j4dnrn6.fsf@confield.dd.xiranet.com> Roland Dreier writes: > > Roland, maybe this means we need scsi/srp.h in svn for now? > > svn is supposed to work on 2.6.16 ... > > As far as I can tell the bug in the header has no effect on how the IB > SRP initiator works. Roland, I'm afraid it *does* have an effect, unfortunately. There's the following code in ib_srp.c::srp_map_data(), around the lines 540 - 550: struct srp_indirect_buf *buf = (void *) cmd->add_data; /* snip */ buf->table_desc.va = cpu_to_be64(req->cmd->dma + sizeof *cmd + sizeof *buf); So if a target actually RDMA Reads the indirect descriptor table, it will use a wrong address.
Arne -- Arne Redlich Xiranet Communications GmbH From ishai at mellanox.co.il Tue May 23 02:03:40 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Tue, 23 May 2006 12:03:40 +0300 Subject: [openib-general] Re: Fwd: [Bug 91] sizeof(srp_indirect_buf) wrongon 64-bit platforms In-Reply-To: <877j4dnrn6.fsf@confield.dd.xiranet.com> References: <877j4dnrn6.fsf@confield.dd.xiranet.com> Message-ID: <20060523090340.GA12666@mellanox.co.il> On Tue, May 23, 2006 at 10:10:05AM +0300, Arne Redlich wrote: > Roland Dreier writes: > > > > Roland, maybe this means we need scsi/srp.h in svn for now? > > > svn is supposed to work on 2.6.16 ... > > > > As far as I can tell the bug in the header has no effect on how the IB > > SRP initiator works. > > Roland, > > I'm afraid it *does* have an effect, unfortunately. There's the following code in ib_srp.c::srp_map_data(), around the lines 540 - 550: > > struct srp_indirect_buf *buf = (void *) cmd->add_data; > > /* snip */ > > buf->table_desc.va = cpu_to_be64(req->cmd->dma + > sizeof *cmd + > sizeof *buf); > > So if a target actually RDMA Reads the indirect descriptor table, it will use a wrong address. > It looks to me that there is no effect after all. This buf->table_desc.va should point to the desc_list array in the srp_indirect_buf. When the code enters the values to this array (buf->desc_list[i]) it uses the address that is corresponding to sizeof *buf. To sum it up, there will be a change in the address the target sees but the data will still be in the address the target sees. 
> Arne > -- > Arne Redlich > Xiranet Communications GmbH > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Ishai Rabinovitz From ogerlitz at voltaire.com Tue May 23 02:09:40 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 23 May 2006 12:09:40 +0300 (IDT) Subject: [openib-general] [PATCH 1/2] mthca support for max_map_per_fmr device attribute Message-ID: The max fmr remaps device attribute is not set by the driver, so the generic fmr_pool uses a default of 32. Enlaring this quantity would make the amortized cost of remaps lower. With the current mthca "default profile" on memfull HCA 17 bits are used for MPT addressing so an FMR can be remapped 2^15 - 1 >> 32 times. Looking in the VAPI driver, i understand there might be some issue with the memfree HCA(s) which can break this patch. In vapi/kernel/mlxhh/rhh/hob.c :: XHH_hob_query_struct_init() i see the following code: + /* max maps per fmr: in Arbel we use the 8-LSbits for FMR memkey modification */ + hca_cap_p->max_num_map_per_fmr = 255; I am not sure if the RHH code is used for both Arbel and Sinai memfree or only for Sinai. Also, if the patch makes sense and the memfree issue is resolved, i'd like to change the name of the device attribute from max_map_per_fmr to max_remaps_per_fmr, i can resend this patch series with this fix. Or. 
Signed-off-by: Or Gerlitz Index: hw/mthca/mthca_provider.c =================================================================== --- hw/mthca/mthca_provider.c (revision 7031) +++ hw/mthca/mthca_provider.c (working copy) @@ -116,6 +116,11 @@ static int mthca_query_device(struct ib_ props->max_total_mcast_qp_attach = props->max_mcast_qp_attach * props->max_mcast_grp; + /* FMR can be remapped 2^B - 1 times where B < 32 is the number of bits + * which are not used for MPT addressing + */ + props->max_map_per_fmr = (1 << (32 - + long_log2(mdev->limits.num_mpts))) - 1; err = 0; out: kfree(in_mad); From ogerlitz at voltaire.com Tue May 23 02:12:38 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 23 May 2006 12:12:38 +0300 (IDT) Subject: [openib-general] [PATCH 2/2] port the fmr pool to use the max_map_per_fmr device attribute In-Reply-To: Message-ID: This patch ports the generic fmr pool to query the ib device and use the device attribute as for the max number of fmr remaps. If the device does not suport the attribute, the code reverts to use the IB_FMR_MAX_REMAPS (32) default. Or. Signed-off-by: Or Gerlitz Index: core/fmr_pool.c =================================================================== --- core/fmr_pool.c (revision 7031) +++ core/fmr_pool.c (working copy) @@ -54,7 +54,7 @@ enum { /* * If an FMR is not in use, then the list member will point to either * its pool's free_list (if the FMR can be mapped again; that is, - * remap_count < IB_FMR_MAX_REMAPS) or its pool's dirty_list (if the + * remap_count < device_attr.max_map_per_fmr) or its pool's dirty_list (if the * FMR needs to be unmapped before being remapped). In either of * these cases it is a bug if the ref_count is not 0. 
In other words, * if ref_count is > 0, then the list member must not be linked into @@ -84,6 +84,7 @@ struct ib_fmr_pool { int pool_size; int max_pages; + int max_remaps; int dirty_watermark; int dirty_len; struct list_head free_list; @@ -214,6 +215,7 @@ struct ib_fmr_pool *ib_create_fmr_pool(s { struct ib_device *device; struct ib_fmr_pool *pool; + struct ib_device_attr device_attr; int i; int ret; @@ -228,6 +230,15 @@ struct ib_fmr_pool *ib_create_fmr_pool(s return ERR_PTR(-ENOSYS); } + ret = ib_query_device(device, &device_attr); + if (ret) { + printk(KERN_WARNING "couldn't query device"); + return ERR_PTR(ret); + } + /* use the default max remaps for drivers not setting the attribute */ + if (!device_attr.max_map_per_fmr) + device_attr.max_map_per_fmr = IB_FMR_MAX_REMAPS; + pool = kmalloc(sizeof *pool, GFP_KERNEL); if (!pool) { printk(KERN_WARNING "couldn't allocate pool struct"); @@ -258,6 +269,7 @@ struct ib_fmr_pool *ib_create_fmr_pool(s pool->pool_size = 0; pool->max_pages = params->max_pages_per_fmr; + pool->max_remaps = device_attr.max_map_per_fmr; pool->dirty_watermark = params->dirty_watermark; pool->dirty_len = 0; spin_lock_init(&pool->pool_lock); @@ -279,7 +291,7 @@ struct ib_fmr_pool *ib_create_fmr_pool(s struct ib_pool_fmr *fmr; struct ib_fmr_attr attr = { .max_pages = params->max_pages_per_fmr, - .max_maps = IB_FMR_MAX_REMAPS, + .max_maps = device_attr.max_map_per_fmr, .page_shift = params->page_shift }; @@ -489,7 +501,7 @@ int ib_fmr_pool_unmap(struct ib_pool_fmr --fmr->ref_count; if (!fmr->ref_count) { - if (fmr->remap_count < IB_FMR_MAX_REMAPS) { + if (fmr->remap_count < pool->max_remaps) { list_add_tail(&fmr->list, &pool->free_list); } else { list_add_tail(&fmr->list, &pool->dirty_list); From ogerlitz at voltaire.com Tue May 23 04:11:05 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 23 May 2006 14:11:05 +0300 Subject: [openib-general] [PATCH 1/2] mthca support for max_map_per_fmr device attribute In-Reply-To: References: 
Message-ID: <4472EDC9.8080706@voltaire.com> Or Gerlitz wrote: > The max fmr remaps device attribute is not set by the driver, so the generic > fmr_pool uses a default of 32. Enlarging this quantity would make the amortized > cost of remaps lower. With the current mthca "default profile" on memfull HCA > 17 bits are used for MPT addressing so an FMR can be remapped 2^15 - 1 >> 32 times. Actually, the bigger (than unmap amortized cost) problem i was facing with the unmap count being very low is the following: say my app publishes N credits and serving each credit consumes one FMR, so my app implementation created the pool with 2N FMRs and set the watermark to N. When "requests" come fast enough, there's a window in time when there's an unmapping of N FMRs running as a batch, but out of the remaining N FMRs some are already dirty and can't be used to serve a credit. So the app fails temporarily... So, setting the watermark to 0.5N might solve this, but since enlarging the number of remaps is trivial, i'd like to do it first. The app i am talking about is a SCSI LLD (e.g. iSER, SRP) where each SCSI command consumes one FMR and the LLD posts to the SCSI ML how many commands can be issued in parallel. Or. From arne.redlich at xiranet.com Tue May 23 04:25:31 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Tue, 23 May 2006 13:25:31 +0200 Subject: [openib-general] Re: Fwd: [Bug 91] sizeof(srp_indirect_buf) wrongon 64-bit platforms In-Reply-To: <20060523090340.GA12666@mellanox.co.il> (Ishai Rabinovitz's message of "Tue, 23 May 2006 12:03:40 +0300") References: <877j4dnrn6.fsf@confield.dd.xiranet.com> <20060523090340.GA12666@mellanox.co.il> Message-ID: <87mzd9yod0.fsf@confield.dd.xiranet.com> Ishai Rabinovitz writes: > On Tue, May 23, 2006 at 10:10:05AM +0300, Arne Redlich wrote: >> Roland Dreier writes: >> >> > > Roland, maybe this means we need scsi/srp.h in svn for now? >> > > svn is supposed to work on 2.6.16 ...
>> > >> > As far as I can tell the bug in the header has no effect on how the IB >> > SRP initiator works. >> >> Roland, >> >> I'm afraid it *does* have an effect, unfortunately. There's the following code in ib_srp.c::srp_map_data(), around the lines 540 - 550: >> >> struct srp_indirect_buf *buf = (void *) cmd->add_data; >> >> /* snip */ >> >> buf->table_desc.va = cpu_to_be64(req->cmd->dma + >> sizeof *cmd + >> sizeof *buf); >> >> So if a target actually RDMA Reads the indirect descriptor table, it will use a wrong address. >> > > It looks to me that there is no effect after all. > This buf->table_desc.va should point to the desc_list array in the srp_indirect_buf. > When the code enters the values to this array (buf->desc_list[i]) it uses the > address that is corresponding to sizeof *buf. > > To sum it up, there will be a change in the address the target sees but the > data will still be in the address the target sees. No, unfortunately this is wrong. The code posted below resembles the offending part in ib_srp.c. It results in this output on an x86_64: sizeof p: 24 offset of table_desc: 0 offset of len: 16 offset of desc_list: 20 addr. 
of p: ffff81003c389f28 p->table_desc.va: ffff81003c389f40 p->desc_list[0]: ffff81003c389f3c Arne -- Arne Redlich Xiranet Communications GmbH /* --------------- cut here ------------------------------------------------ */ #include #include /* my is already fixed, here's the broken version again */ struct srp_indirect_buf_old { struct srp_direct_buf table_desc; __be32 len; struct srp_direct_buf desc_list[0] __attribute__((packed)); }; MODULE_LICENSE("GPL"); static void __exit test_fini(void) { return; } static int __init test_init(void) { struct srp_indirect_buf_old buf, *p; p = &buf; p->table_desc.va = (u64)(unsigned long)p; p->table_desc.va += sizeof(struct srp_indirect_buf_old); printk("sizeof p: %lu\n", sizeof(*p)); printk("offset of table_desc: %lu\n", offsetof(struct srp_indirect_buf_old, table_desc)); printk("offset of len: %lu\n", offsetof(struct srp_indirect_buf_old, len)); printk("offset of desc_list: %lu\n", offsetof(struct srp_indirect_buf_old, desc_list)); printk("addr. of p: %p\n", p); printk("p->table_desc.va: %llx\n", p->table_desc.va); printk("p->desc_list[0]: %p\n", &p->desc_list[0]); return 0; } module_init(test_init); module_exit(test_fini); From Thomas.Talpey at netapp.com Tue May 23 04:30:33 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Tue, 23 May 2006 07:30:33 -0400 Subject: [openib-general] [PATCH 1/2] mthca support for max_map_per_fmr device attribute In-Reply-To: <4472EDC9.8080706@voltaire.com> References: <4472EDC9.8080706@voltaire.com> Message-ID: <7.0.1.0.2.20060523072806.0421e2d8@netapp.com> Doesn't this change only *increase* the window of vulnerability which FMRs suffer? I.e. when you say "dirty", you mean "still mapped", right? Tom. At 07:11 AM 5/23/2006, Or Gerlitz wrote: >Or Gerlitz wrote: >> The max fmr remaps device attribute is not set by the driver, so the generic >> fmr_pool uses a default of 32. Enlaring this quantity would make the >amortized >> cost of remaps lower. 
With the current mthca "default profile" on memfull HCA
>> 17 bits are used for MPT addressing so an FMR can be remapped 2^15 - 1 >> 32 times.
>
>Actually, the bigger (than unmap amortized cost) problem I was facing
>with the unmap count being very low is the following: say my app
>publishes N credits and serving each credit consumes one FMR, so my app
>implementation created the pool with 2N FMRs and set the watermark to N.
>
>When "requests" come fast enough, there's a window in time when there's
>an unmapping of N FMRs running as a batch, but out of the remaining N FMRs
>some are already dirty and can't be used to serve a credit. So the app
>fails temporarily... So, setting the watermark to 0.5N might solve this,
>but since enlarging the number of remaps is trivial, I'd like to do it
>first.
>
>The app I am talking about is a SCSI LLD (e.g. iSER, SRP) where each SCSI
>command consumes one FMR and the LLD posts to the SCSI ML how many
>commands can be issued in parallel.
>
>Or.

From ogerlitz at voltaire.com Tue May 23 04:37:32 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 23 May 2006 14:37:32 +0300
Subject: [openib-general] [PATCH 1/2] mthca support for max_map_per_fmr device attribute
In-Reply-To: <7.0.1.0.2.20060523072806.0421e2d8@netapp.com>
References: <4472EDC9.8080706@voltaire.com> <7.0.1.0.2.20060523072806.0421e2d8@netapp.com>
Message-ID: <4472F3FC.3000707@voltaire.com>

Talpey, Thomas wrote:
> Doesn't this change only *increase* the window of vulnerability
> which FMRs suffer? I.e. when you say "dirty", you mean "still mapped",
> right?
I am not sure I can quantify how much the vulnerability is increased; however,
please recall that the openib fmr pool is meant for users of the
Mellanox proprietary FMRs, whose unmapping is not done per usage, etc.

Or.

From jackm at mellanox.co.il Tue May 23 04:59:44 2006
From: jackm at mellanox.co.il (Jack Morgenstein)
Date: Tue, 23 May 2006 14:59:44 +0300
Subject: [openib-general] [PATCH] mad: prevent duplicate RMPP sessions on responder side
Message-ID: <200605231459.46326.jackm@mellanox.co.il>

Prevent opening multiple RMPP MAD transaction sessions at responder side
with the same TID, GID/LID, class.

Could happen if RMPP requests are retried while response is in progress.

Signed-off-by: Jack Morgenstein

Index: openib_branch1.0/drivers/infiniband/core/mad.c
===================================================================
--- openib_branch1.0.orig/drivers/infiniband/core/mad.c
+++ openib_branch1.0/drivers/infiniband/core/mad.c
@@ -1038,6 +1038,102 @@ int ib_send_mad(struct ib_mad_send_wr_pr
 	return ret;
 }
 
+static inline int is_rmpp_data(struct ib_mad *mad)
+{
+	struct ib_rmpp_mad *r;
+
+	if (!ib_is_mad_class_rmpp(mad->mad_hdr.mgmt_class))
+		return 0;
+
+	r = (struct ib_rmpp_mad *)mad;
+	return (ib_get_rmpp_flags(&r->rmpp_hdr) & IB_MGMT_RMPP_FLAG_ACTIVE) &&
+	       r->rmpp_hdr.rmpp_type == IB_MGMT_RMPP_TYPE_DATA;
+}
+
+static inline int send_has_same_class(struct ib_mad_send_wr_private *mwr1,
+				      struct ib_mad_send_wr_private *mwr2)
+{
+	return (((struct ib_mad *)(mwr1->send_buf.mad))->mad_hdr.mgmt_class ==
+		((struct ib_mad *)(mwr2->send_buf.mad))->mad_hdr.mgmt_class);
+}
+
+static int send_has_same_gid(struct ib_mad_agent_private *agent,
+			     struct ib_mad_send_wr_private *lwr,
+			     struct ib_mad_send_wr_private *swr)
+{
+	struct ib_ah_attr sattr, lattr;
+	u8 lmethod = ((struct ib_mad *)(lwr->send_buf.mad))->mad_hdr.method;
+	u8 smethod = ((struct ib_mad *)(swr->send_buf.mad))->mad_hdr.method;
+	u8 lmc;
+
+	/* one is a response mad, other is not */
+	if ((lmethod &
IB_MGMT_METHOD_RESP) != (smethod & IB_MGMT_METHOD_RESP)) + return 0; + + /* need to compare GIDs/LIDs */ + if (ib_query_ah(swr->send_buf.ah, &sattr) || + ib_query_ah(lwr->send_buf.ah, &lattr)) + /* No AH data. Assume not equal, to avoid false positives. */ + return 0; + + if (!(smethod & IB_MGMT_METHOD_RESP)) { + /* Is not a response */ + if (!(lattr.ah_flags & IB_AH_GRH) && + !(sattr.ah_flags & IB_AH_GRH)) { + /* no GIDs, compare src_path_bits */ + if (ib_get_cached_lmc(agent->agent.device, + agent->agent.port_num, + &lmc)) + return 0; + return (!lmc || !((sattr.src_path_bits ^ + lattr.src_path_bits) & + ((1 << lmc) - 1))); + } + if ((lattr.ah_flags & IB_AH_GRH) && (sattr.ah_flags & IB_AH_GRH)) + return lattr.grh.sgid_index == sattr.grh.sgid_index; + return 0; + } + + /* comparing send responses */ + if (!(lattr.ah_flags & IB_AH_GRH) && !(sattr.ah_flags & IB_AH_GRH)) + /* No GIDs. Compare LIDs */ + return (sattr.dlid && (lattr.dlid == sattr.dlid)); + if ((lattr.ah_flags & IB_AH_GRH) && (sattr.ah_flags & IB_AH_GRH)) + /* check if GIDs are equal */ + return (!memcmp(lattr.grh.dgid.raw, sattr.grh.dgid.raw, 16)); + /* one has GID, other does not. 
Assume different dest */ + return 0; +} + +static int check_dup_send_mad(struct ib_mad_agent_private *agent, + struct ib_mad_send_wr_private *send_wr) +{ + struct ib_mad_send_wr_private *t; + + if (!is_rmpp_data(send_wr->send_buf.mad)) + return 0; + list_for_each_entry(t, &agent->wait_list, agent_list) { + if (t->tid == send_wr->tid && + send_has_same_class(t, send_wr) && + send_has_same_gid(agent, t, send_wr)) + return 1; + } + + /* + * It's possible to send a duplicate mad before we've + * been notified that the first send has completed + */ + list_for_each_entry(t, &agent->send_list, agent_list) { + if (is_rmpp_data(t->send_buf.mad) && + t->tid == send_wr->tid && send_has_same_class(t, send_wr) && + send_has_same_gid(agent, t, send_wr)) { + /* Verify request has not been canceled */ + return (send_wr->status == IB_WC_SUCCESS) ? 1 : 0; + } + } + return 0; +} + /* * ib_post_send_mad - Posts MAD(s) to the send queue of the QP associated * with the registered client @@ -1102,6 +1198,12 @@ int ib_post_send_mad(struct ib_mad_send_ /* Reference MAD agent until send completes */ atomic_inc(&mad_agent_priv->refcount); spin_lock_irqsave(&mad_agent_priv->lock, flags); + if (check_dup_send_mad(mad_agent_priv, mad_send_wr)) { + /* Duplicate send request */ + spin_unlock_irqrestore(&mad_agent_priv->lock, flags); + atomic_dec(&mad_agent_priv->refcount); + return -EBUSY; + } list_add_tail(&mad_send_wr->agent_list, &mad_agent_priv->send_list); spin_unlock_irqrestore(&mad_agent_priv->lock, flags); From ishai at mellanox.co.il Tue May 23 07:23:53 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Tue, 23 May 2006 17:23:53 +0300 Subject: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading In-Reply-To: References: Message-ID: <20060523142353.GA15159@mellanox.co.il> On Wed, May 17, 2006 at 02:55:57AM +0300, Roland Dreier wrote: > > + /* > > + * We need 2 scsi_host_put becuase there are two get: > > + * in scsi_host_alloc and in scsi_add_host > > + */ > > + 
scsi_host_put(target->scsi_host); > > scsi_host_put(target->scsi_host); > > Hmm, this doesn't seem right to me. If I try this, then I get a crash > because the scsi_host is already gone after the first put. I verified > that the reference count is 1 before these puts, and with the > unmodified module I don't see anything left in /sys/class/scsi_host > after unloading the module. > > What kernel are you seeing problems with? I'm testing with an > up-to-date git kernel, although I doubt it makes a difference (did > SCSI reference counting change recently??). > > I do think there are some extra scsi_host_put() calls in > srp_remove_work() -- I think the double scsi_host_put() dates back to > a version (which I may never even have checked in) where there was a > scsi_host_get() to avoid the scsi_host going away between the > schedule_work() and srp_remove_work() actually running. > > So the patch below seems correct to me. > > What do you think? > > --- linux-kernel/infiniband/ulp/srp/ib_srp.c (revision 7245) > +++ linux-kernel/infiniband/ulp/srp/ib_srp.c (working copy) > @@ -353,7 +356,6 @@ static void srp_remove_work(void *target > spin_lock_irq(target->scsi_host->host_lock); > if (target->state != SRP_TARGET_DEAD) { > spin_unlock_irq(target->scsi_host->host_lock); > - scsi_host_put(target->scsi_host); > return; > } > target->state = SRP_TARGET_REMOVED; > @@ -367,8 +369,6 @@ static void srp_remove_work(void *target > ib_destroy_cm_id(target->cm_id); > srp_free_target_ib(target); > scsi_host_put(target->scsi_host); > - /* And another put to really free the target port... */ > - scsi_host_put(target->scsi_host); > } > > static int srp_connect_target(struct srp_target_port *target) > > Roland, As I told you before, your patch looks correct. Are you going to apply it? 
--
Ishai Rabinovitz

From rdreier at cisco.com Tue May 23 08:20:17 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 23 May 2006 08:20:17 -0700
Subject: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
In-Reply-To: <20060523142353.GA15159@mellanox.co.il> (Ishai Rabinovitz's message of "Tue, 23 May 2006 17:23:53 +0300")
References: <20060523142353.GA15159@mellanox.co.il>
Message-ID:

    Ishai> As I told you before, your patch looks correct. Are you
    Ishai> going to apply it?

Sorry, I committed it to git but forgot to check it into svn.

(One more reason to stop maintaining kernel drivers in svn)

 - R.

From jlentini at netapp.com Tue May 23 08:40:33 2006
From: jlentini at netapp.com (James Lentini)
Date: Tue, 23 May 2006 11:40:33 -0400 (EDT)
Subject: [openib-general] which way to port squid for supporting large amount concurrent connections
In-Reply-To: <20060523025603.68783.qmail@web36907.mail.mud.yahoo.com>
References: <20060523025603.68783.qmail@web36907.mail.mud.yahoo.com>
Message-ID:

On Mon, 22 May 2006, zhu shi song wrote:

> I won't wait sdp OK. I hope to use another method to
> port squid. I know VAPI, uDAP can do this. But can
> you suggest me which is best way I should adopt? And
> How can I get more info about how to program using
> these methods?

uDAPL was defined by an industry consortium:

 http://www.datcollaborative.org/

The website has API documentation. (Disclaimer: I'm the maintainer of
the uDAPL open source project)

VAPI is an older verbs API specified by Mellanox. The current
OpenFabrics (aka OpenIB) project does not have a VAPI userspace
library.

Another option is the low-level OpenFabrics verbs API. I don't know
of any documentation for this.
From bugzilla-daemon at openib.org Tue May 23 09:09:52 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Tue, 23 May 2006 09:09:52 -0700 (PDT) Subject: [openib-general] [Bug 92] sizeof(srp_indirect_buf) wrong on 64-bit platforms Message-ID: <20060523160952.2025E228417@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=92 sweitzen at cisco.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED ------- Additional Comments From sweitzen at cisco.com 2006-05-23 09:09 ------- Close dup. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eeb at bartonsoftware.com Tue May 23 09:03:48 2006 From: eeb at bartonsoftware.com (Eric Barton) Date: Tue, 23 May 2006 17:03:48 +0100 Subject: [openib-general] RE: [PATCH] multiple RDMA_CM_EVENT_DISCONNECTED callbacks In-Reply-To: Message-ID: <006b01c67e82$7a24acf0$0281a8c0@ebpc> Sean, I just tested your patch and checked that it prevents the double DISCONNECT event callback (it does :). Cheers, Eric --------------------------------------------------- |Eric Barton Barton Software | |9 York Gardens Tel: +44 (117) 330 1575 | |Clifton Mobile: +44 (7909) 680 356 | |Bristol BS8 4LL Fax: call first | |United Kingdom E-Mail: eeb at bartonsoftware.com| --------------------------------------------------- > -----Original Message----- > From: Sean Hefty [mailto:sean.hefty at intel.com] > Sent: 19 May 2006 9:07 PM > To: 'Eric Barton'; openib-general at openib.org > Subject: [PATCH] multiple RDMA_CM_EVENT_DISCONNECTED callbacks > > Eric > > Can you try this patch and let me know if it fixes your problem? > > - Sean > --- > Prevent generating duplicated DISCONNECT events. 
> > Signed-off-by: Sean Hefty > --- > Index: cma.c > =================================================================== > --- cma.c (revision 7362) > +++ cma.c (working copy) > @@ -83,6 +83,7 @@ enum cma_state { > CMA_ROUTE_QUERY, > CMA_ROUTE_RESOLVED, > CMA_CONNECT, > + CMA_DISCONNECT, > CMA_ADDR_BOUND, > CMA_LISTEN, > CMA_DEVICE_REMOVAL, > @@ -801,6 +802,8 @@ static int cma_ib_handler(struct ib_cm_i > status = -ETIMEDOUT; /* fall through */ > case IB_CM_DREQ_RECEIVED: > case IB_CM_DREP_RECEIVED: > + if (!cma_comp_exch(id_priv, CMA_CONNECT, > CMA_DISCONNECT)) > + goto out; > event = RDMA_CM_EVENT_DISCONNECTED; > break; > case IB_CM_TIMEWAIT_EXIT: > @@ -1770,7 +1773,8 @@ int rdma_disconnect(struct rdma_cm_id *i > int ret; > > id_priv = container_of(id, struct rdma_id_private, id); > - if (!cma_comp(id_priv, CMA_CONNECT)) > + if (!cma_comp(id_priv, CMA_CONNECT) && > + !cma_comp(id_priv, CMA_DISCONNECT)) > return -EINVAL; > > ret = cma_modify_qp_err(id); > From rdreier at cisco.com Tue May 23 09:06:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 09:06:15 -0700 Subject: [openib-general] Re: Fwd: [Bug 91] sizeof(srp_indirect_buf) wrongon 64-bit platforms In-Reply-To: <87mzd9yod0.fsf@confield.dd.xiranet.com> (Arne Redlich's message of "Tue, 23 May 2006 13:25:31 +0200") References: <877j4dnrn6.fsf@confield.dd.xiranet.com> <20060523090340.GA12666@mellanox.co.il> <87mzd9yod0.fsf@confield.dd.xiranet.com> Message-ID: >>> I'm afraid it *does* have an effect, unfortunately. Hmm, go ahead and forward the fix from 2.6.17 to the stable team for kernel 2.6.16 if this bug affects your target. Thanks, Roland From rdreier at cisco.com Tue May 23 09:09:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 09:09:05 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- split CQ In-Reply-To: (Shirley Ma's message of "Mon, 22 May 2006 17:09:13 -0600") References: Message-ID: Did you send the other 6 patches in this series? 
I was waiting to comment until I had all the patches, but there is one really bad thing here: > + IPOIB_NUM_SEND_WC = 32, > +void ipoib_ib_send_completion(struct ib_cq *cq, void *dev_ptr) > +{ > + struct net_device *dev = (struct net_device *) dev_ptr; > + struct ipoib_dev_priv *priv = netdev_priv(dev); > + struct ib_wc ibwc[IPOIB_NUM_SEND_WC]; If I'm doing the math correctly, this function now uses more than 2K of stack, which is of course unacceptable. I don't think there's any way around keeping the wc array in the ipoib_dev_priv structure. - R. From halr at voltaire.com Tue May 23 09:03:16 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 23 May 2006 12:03:16 -0400 Subject: [openib-general] [PATCH] OpenSM/complib: Restore cl_mem* routines as deprecated rather than removing them altogether Message-ID: <1148400193.4470.95112.camel@hal.voltaire.com> OpenSM/complib: Restore cl_mem* routines as deprecated rather than removing them altogether Signed-off-by: Hal Rosenstock Note: If this approach is acceptable, I will be doing the same with cl_malloc, cl_zalloc, cl_free, and friends. Index: include/complib/cl_memory.h =================================================================== --- include/complib/cl_memory.h (revision 7432) +++ include/complib/cl_memory.h (working copy) @@ -436,7 +436,7 @@ cl_malloc( * environments. * * SEE ALSO -* Memory Management, cl_free, cl_zalloc +* Memory Management, cl_free, cl_zalloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp **********/ @@ -467,7 +467,7 @@ cl_zalloc( * environments. * * SEE ALSO -* Memory Management, cl_free, cl_malloc +* Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, cl_memcmp **********/ @@ -502,6 +502,142 @@ cl_free( **********/ +/****f* Public: Memory Management/cl_memset +* NAME +* cl_memset +* +* DESCRIPTION +* The cl_memset function sets every byte in a memory range to a given value. 
+* +* SYNOPSIS +*/ +void __attribute__((deprecated)) +cl_memset( + IN void* const p_memory, + IN const uint8_t fill, + IN const size_t count ); +/* +* PARAMETERS +* p_memory +* [in] Pointer to a memory block. +* +* fill +* [in] Byte value with which to fill the memory. +* +* count +* [in] Number of bytes to set. +* +* RETURN VALUE +* This function does not return a value. +* +* SEE ALSO +* Memory Management, cl_memclr, cl_memcpy, cl_memcmp +**********/ + + +/****f* Public: Memory Management/cl_memclr +* NAME +* cl_memclr +* +* DESCRIPTION +* The cl_memclr function sets every byte in a memory range to zero. +* +* SYNOPSIS +*/ +static inline void __attribute__((deprecated)) +cl_memclr( + IN void* const p_memory, + IN const size_t count ) +{ + memset( p_memory, 0, count ); +} +/* +* PARAMETERS +* p_memory +* [in] Pointer to a memory block. +* +* count +* [in] Number of bytes to set. +* +* RETURN VALUE +* This function does not return a value. +* +* SEE ALSO +* Memory Management, cl_memset, cl_memcpy, cl_memcmp +**********/ + + +/****f* Public: Memory Management/cl_memcpy +* NAME +* cl_memcpy +* +* DESCRIPTION +* The cl_memcpy function copies a given number of bytes from +* one buffer to another. +* +* SYNOPSIS +*/ +void __attribute__((deprecated)) * +cl_memcpy( + IN void* const p_dest, + IN const void* const p_src, + IN const size_t count ); +/* +* PARAMETERS +* p_dest +* [in] Pointer to the buffer being copied to. +* +* p_src +* [in] Pointer to the buffer being copied from. +* +* count +* [in] Number of bytes to copy from the source buffer to the +* destination buffer. +* +* RETURN VALUE +* This function does not return a value. +* +* SEE ALSO +* Memory Management, cl_memset, cl_memclr, cl_memcmp +**********/ + + +/****f* Public: Memory Management/cl_memcmp +* NAME +* cl_memcmp +* +* DESCRIPTION +* The cl_memcmp function compares two memory buffers. 
+* +* SYNOPSIS +*/ +int32_t __attribute__((deprecated)) +cl_memcmp( + IN const void* const p_mem, + IN const void* const p_ref, + IN const size_t count ); +/* +* PARAMETERS +* p_mem +* [in] Pointer to a memory block being compared. +* +* p_ref +* [in] Pointer to the reference memory block to compare against. +* +* count +* [in] Number of bytes to compare. +* +* RETURN VALUES +* Returns less than zero if p_mem is less than p_ref. +* +* Returns greater than zero if p_mem is greater than p_ref. +* +* Returns zero if the two memory regions are the identical. +* +* SEE ALSO +* Memory Management, cl_memset, cl_memclr, cl_memcpy +**********/ + /****f* Public: Memory Management/cl_get_pagesize * NAME * cl_get_pagesize Index: complib/cl_memory_osd.c =================================================================== --- complib/cl_memory_osd.c (revision 7432) +++ complib/cl_memory_osd.c (working copy) @@ -69,3 +69,30 @@ __cl_free_priv( free( p_memory ); } +void +cl_memset( + IN void* const p_memory, + IN const uint8_t fill, + IN const size_t count ) +{ + memset( p_memory, fill, count ); +} + +void* +cl_memcpy( + IN void* const p_dest, + IN const void* const p_src, + IN const size_t count ) +{ + return( memcpy( p_dest, p_src, count ) ); +} + +int32_t +cl_memcmp( + IN const void* const p_mem, + IN const void* const p_ref, + IN const size_t count ) +{ + return( memcmp( p_mem, p_ref, count ) ); +} + Index: complib/libosmcomp.map =================================================================== --- complib/libosmcomp.map (revision 7432) +++ complib/libosmcomp.map (working copy) @@ -1,4 +1,4 @@ -OSMCOMP_1.0 { +OSMCOMP_1.1 { global: cl_async_proc_construct; cl_async_proc_init; @@ -87,6 +87,9 @@ OSMCOMP_1.0 { __cl_find_mem; __cl_free_trk; __cl_free_ntrk; + cl_memset; + cl_memcpy; + cl_memcmp; __cl_perf_run_calibration; __cl_perf_construct; __cl_perf_init; Index: complib/libosmcomp.ver =================================================================== --- complib/libosmcomp.ver 
(revision 7432) +++ complib/libosmcomp.ver (working copy) @@ -6,4 +6,4 @@ # API_REV - advance on any added API # RUNNING_REV - advance any change to the vendor files # AGE - number of backward versions the API still supports -LIBVERSION=1:0:0 +LIBVERSION=1:1:0 From rdreier at cisco.com Tue May 23 09:11:20 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 09:11:20 -0700 Subject: [openib-general] [PATCH 1/2] mthca support for max_map_per_fmr device attribute In-Reply-To: <4472EDC9.8080706@voltaire.com> (Or Gerlitz's message of "Tue, 23 May 2006 14:11:05 +0300") References: <4472EDC9.8080706@voltaire.com> Message-ID: Or> When "requests" come fast enough, there's a window in time Or> when there's an unmapping of N FMRs running at batch, but out Or> of the remaining N FMRs some are already dirty and can't be Or> used to serve a credit. So the app fails temporally... So, Or> setting the watermark to 0.5N might solve this, but since Or> enlarging the number of remaps is trivial, i'd like to do it Or> first. I don't quite understand how increasing the max remap count really helps you that much. Increasing it would just make this failure less frequent, but it would still occur, right? - R. From rdreier at cisco.com Tue May 23 09:13:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 09:13:00 -0700 Subject: [openib-general] Re: [PATCH 1/2] mthca support for max_map_per_fmr device attribute In-Reply-To: (Or Gerlitz's message of "Tue, 23 May 2006 12:09:40 +0300 (IDT)") References: Message-ID: Or> Also, if the patch makes sense and the memfree issue is Or> resolved, i'd like to change the name of the device attribute Or> from max_map_per_fmr to max_remaps_per_fmr, i can resend this Or> patch series with this fix. The patch makes sense, although of course you need to make sure you understand and handle the mem-free case as well if you want it applied. 
I'm not sure changing to max_remaps_per_fmr is really clearer, since
the value counts the first mapping of the FMR (which is not a
remapping). But I guess I could be convinced if more people think
it's clearer.

 - R.

From mshefty at ichips.intel.com Tue May 23 09:13:43 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 23 May 2006 09:13:43 -0700
Subject: [openib-general] RE: [PATCH] multiple RDMA_CM_EVENT_DISCONNECTED callbacks
In-Reply-To: <006b01c67e82$7a24acf0$0281a8c0@ebpc>
References: <006b01c67e82$7a24acf0$0281a8c0@ebpc>
Message-ID: <447334B7.9080002@ichips.intel.com>

Eric Barton wrote:
> I just tested your patch and checked that it prevents the double
> DISCONNECT event callback (it does :).

Thanks for testing this. I've committed this patch to svn.

- Sean

From rdreier at cisco.com Tue May 23 09:14:36 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 23 May 2006 09:14:36 -0700
Subject: [openib-general] Re: [PATCH 2/2] port the fmr pool to use the max_map_per_fmr device attribute
In-Reply-To: (Or Gerlitz's message of "Tue, 23 May 2006 12:12:38 +0300 (IDT)")
References:
Message-ID:

 > + struct ib_device_attr device_attr;

How big is struct ib_device_attr? I've usually been reluctant to put
this type of thing on the stack to avoid bloating stack usage too
much.

 - R.

From rdreier at cisco.com Tue May 23 09:15:22 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 23 May 2006 09:15:22 -0700
Subject: [openib-general] RE: [PATCH] multiple RDMA_CM_EVENT_DISCONNECTED callbacks
In-Reply-To: <447334B7.9080002@ichips.intel.com> (Sean Hefty's message of "Tue, 23 May 2006 09:13:43 -0700")
References: <006b01c67e82$7a24acf0$0281a8c0@ebpc> <447334B7.9080002@ichips.intel.com>
Message-ID:

    Sean> Thanks for testing this. I've committed this patch to svn.

Should this be merged into what I have queued for 2.6.18?

 - R.
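
The guard that Sean's patch adds to cma_ib_handler() can be illustrated
outside the kernel. What follows is only a minimal user-space sketch,
assuming C11 atomics stand in for whatever locking the real
cma_comp_exch() uses; the names conn, comp_exch and on_dreq_or_drep are
hypothetical stand-ins for illustration and do not appear in cma.c.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the CMA_CONNECT / CMA_DISCONNECT states. */
enum conn_state { ST_CONNECT, ST_DISCONNECT };

struct conn {
	_Atomic int state;
	int disconnect_events;	/* DISCONNECTED callbacks delivered so far */
};

/*
 * Move the connection from 'from' to 'to' only if it is currently in
 * 'from'. Returns nonzero if this caller performed the transition.
 */
static int comp_exch(struct conn *c, int from, int to)
{
	int expected = from;

	return atomic_compare_exchange_strong(&c->state, &expected, to);
}

/*
 * Called for each DREQ/DREP-style notification (assumed model of the
 * handler). Only the caller that wins the CONNECT -> DISCONNECT
 * transition reports the event, so a second notification is dropped.
 */
static void on_dreq_or_drep(struct conn *c)
{
	if (!comp_exch(c, ST_CONNECT, ST_DISCONNECT))
		return;
	c->disconnect_events++;
}
```

Calling on_dreq_or_drep() twice on a connection that starts in
ST_CONNECT leaves disconnect_events at 1, because the second call fails
the exchange and returns early.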
From mshefty at ichips.intel.com Tue May 23 09:23:56 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 23 May 2006 09:23:56 -0700
Subject: [openib-general] Re: [PATCH] mad: prevent duplicate RMPP sessions on responder side
In-Reply-To: <200605231459.46326.jackm@mellanox.co.il>
References: <200605231459.46326.jackm@mellanox.co.il>
Message-ID: <4473371C.6040504@ichips.intel.com>

Jack Morgenstein wrote:
> Prevent opening multiple RMPP MAD transaction sessions at responder side
> with the same TID, GID/LID, class.
>
> Could happen if RMPP requests are retried while response is in progress.

My preference for handling this is to detect and discard duplicate requests,
and verify that response MADs match a request when being sent. See the mail
thread starting at:

http://openib.org/pipermail/openib-general/2006-April/020703.html

This will also help us add in support for DS RMPP.

For kernel clients, I anticipate that this sort of change is fairly small.
Userspace support requires a bit more work, especially if we don't want to
change the ABI.

- Sean

From mshefty at ichips.intel.com Tue May 23 09:25:36 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 23 May 2006 09:25:36 -0700
Subject: [openib-general] RE: [PATCH] multiple RDMA_CM_EVENT_DISCONNECTED callbacks
In-Reply-To:
References: <006b01c67e82$7a24acf0$0281a8c0@ebpc> <447334B7.9080002@ichips.intel.com>
Message-ID: <44733780.2090002@ichips.intel.com>

Roland Dreier wrote:
> Sean> Thanks for testing this. I've committed this patch to svn.
>
> Should this be merged into what I have queued for 2.6.18?

I think so. I was going to send another update later today that included the
patches that Michael wanted for SDP support as well. (I didn't see that he had
posted those yet.)
- Sean

From rdreier at cisco.com Tue May 23 09:29:21 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 23 May 2006 09:29:21 -0700
Subject: [openib-general] RE: [PATCH] multiple RDMA_CM_EVENT_DISCONNECTED callbacks
In-Reply-To: <44733780.2090002@ichips.intel.com> (Sean Hefty's message of "Tue, 23 May 2006 09:25:36 -0700")
References: <006b01c67e82$7a24acf0$0281a8c0@ebpc> <447334B7.9080002@ichips.intel.com> <44733780.2090002@ichips.intel.com>
Message-ID:

    Sean> I think so. I was going to send another update later today
    Sean> that included the patches that Michael wanted for SDP
    Sean> support as well. (I didn't see that he had posted those
    Sean> yet.)

OK, I'll wait for that.

Thanks,
  Roland

From mst at mellanox.co.il Tue May 23 09:31:09 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 23 May 2006 19:31:09 +0300
Subject: [openib-general] RE: [PATCH] multiple RDMA_CM_EVENT_DISCONNECTED callbacks
In-Reply-To: <44733780.2090002@ichips.intel.com>
References: <006b01c67e82$7a24acf0$0281a8c0@ebpc> <447334B7.9080002@ichips.intel.com> <44733780.2090002@ichips.intel.com>
Message-ID: <20060523163109.GB3377@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: Re: [openib-general] RE: [PATCH] multiple RDMA_CM_EVENT_DISCONNECTED callbacks
>
> Roland Dreier wrote:
> > Sean> Thanks for testing this. I've committed this patch to svn.
> >
> >Should this be merged into what I have queued for 2.6.18?
>
> I think so. I was going to send another update later today that included
> the patches that Michael wanted for SDP support as well. (I didn't see
> that he had posted those yet.)

Didn't have the time yet, sorry.
Please go ahead if you can, otherwise I'll generate the patches later this week.
--
MST

From rdreier at cisco.com Tue May 23 09:57:03 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 23 May 2006 09:57:03 -0700
Subject: [openib-general] Plans for libibverbs 1.1
Message-ID:

I'm planning on branching the libibverbs tree so that I can open a 1.1
development branch where ABI/API stability is not a requirement.

My current plan is to copy the current src/userspace/libibverbs tree
in svn to src/userspace/libibverbs-1.0. The libibverbs-1.0 tree would
be used for stable maintenance (only changes that preserve ABI and
API stability will be accepted), and the libibverbs tree would be used
for new development.

I would expect a libibverbs 1.1-pre1 snapshot release shortly, with
the goal of a full stable libibverbs 1.1 release in 3 or 4 months.

So far I have the changes below queued up for the new libibverbs 1.1
tree. The main changes are getting rid of libsysfs use, and removing
the deprecated ib_XXX symbols.

If no one raises any problems, I'll commit all of this tomorrow (Weds).

Thanks,
  Roland

--- libibverbs/libibverbs.spec.in (revision 7435)
+++ libibverbs/libibverbs.spec.in (working copy)
@@ -3,18 +3,16 @@
 %define ver @VERSION@
 
 Name: libibverbs
-Version: 1.0.4
-Release: 1%{?dist}
+Version: 1.1
+Release: 0.1.pre1%{?dist}
 Summary: A library for direct userspace use of InfiniBand
 
 Group: System Environment/Libraries
 License: GPL/BSD
 Url: http://openib.org/
-Source: http://openib.org/downloads/libibverbs-1.0.4.tar.gz
+Source: http://openib.org/downloads/libibverbs-1.1-pre1.tar.gz
 BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
 
-BuildRequires: %{_includedir}/sysfs/libsysfs.h
-
 %description
 libibverbs is a library that allows userspace processes to use
 InfiniBand "verbs" as described in the InfiniBand Architecture
@@ -27,7 +25,6 @@ also be installed.
%package devel Summary: Development files for the libibverbs library Group: System Environment/Libraries -Requires: %{name} = %{version}-%{release} %{_includedir}/sysfs/libsysfs.h %description devel Static libraries and header files for the libibverbs verbs library. @@ -77,6 +74,10 @@ rm -rf $RPM_BUILD_ROOT %{_mandir}/man1/* %changelog +* Mon May 22 2006 Roland Dreier - 1.1-0.1.pre1 +- New upstream release +- Remove dependency on libsysfs, since it is no longer used + * Thu May 4 2006 Roland Dreier - 1.0.4-1 - New upstream release --- libibverbs/debian/control (revision 7435) +++ libibverbs/debian/control (working copy) @@ -1,11 +1,11 @@ Source: libibverbs Priority: extra Maintainer: Roland Dreier -Build-Depends: cdbs (>= 0.4.25-1), debhelper (>= 5), autotools-dev, libsysfs-dev -Standards-Version: 3.7.0 +Build-Depends: cdbs (>= 0.4.25-1), debhelper (>= 5), autotools-dev +Standards-Version: 3.7.2 Section: libs -Package: libibverbs1 +Package: libibverbs2 Section: libs Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends}, adduser @@ -23,22 +23,22 @@ Description: A library for direct usersp Package: libibverbs-dev Section: libdevel Architecture: any -Depends: ${misc:Depends}, libibverbs1 (= ${Source-Version}), libsysfs-dev +Depends: ${misc:Depends}, libibverbs2 (= ${Source-Version}) Description: Development files for the libibverbs library libibverbs is a library that allows userspace processes to use InfiniBand "verbs" as described in the InfiniBand Architecture Specification. This includes direct hardware access for fast path operations. . - This package is needed to compile programs against libibverbs1. + This package is needed to compile programs against libibverbs2. It contains the header files and static libraries (optionally) needed for compiling. 
-Package: libibverbs1-dbg +Package: libibverbs2-dbg Section: libdevel Priority: extra Architecture: any -Depends: ${misc:Depends}, libibverbs1 (= ${Source-Version}) +Depends: ${misc:Depends}, libibverbs2 (= ${Source-Version}) Description: Debugging symbols for the libibverbs library libibverbs is a library that allows userspace processes to use InfiniBand "verbs" as described in the InfiniBand Architecture @@ -46,7 +46,7 @@ Description: Debugging symbols for the l operations. . This package contains the debugging symbols associated with - libibverbs1. They will automatically be used by gdb for debugging + libibverbs2. They will automatically be used by gdb for debugging libibverbs-related issues. Package: ibverbs-utils @@ -59,5 +59,5 @@ Description: Examples for the libibverbs Specification. This includes direct hardware access for fast path operations. . - This package contains useful libibverbs1 example programs such as + This package contains useful libibverbs2 example programs such as ibv_devinfo, which displays information about InfiniBand devices. --- libibverbs/debian/libibverbs1.postinst (revision 7435) +++ libibverbs/debian/libibverbs1.postinst (working copy) @@ -1,12 +0,0 @@ -#!/bin/sh -# postinst script for libibverbs1 - -set -e - -if [ "$1" != configure ]; then - exit 0 -fi - -getent group rdma > /dev/null 2>&1 || addgroup --system --quiet rdma - -#DEBHELPER# --- libibverbs/debian/changelog (revision 7435) +++ libibverbs/debian/changelog (working copy) @@ -1,8 +1,10 @@ -libibverbs (1.0.4-1) unstable; urgency=low +libibverbs (1.0.99+1.1-pre1-1) unstable; urgency=low - * New upstream release. + * New upstream prerelease. + * soname bumped to 2 due to ABI changes. 
+ * Update to Standards-Version: 3.7.2 - -- Roland Dreier Thu, 4 May 2006 13:46:44 -0700 + -- Roland Dreier Mon, 22 May 2006 23:13:00 -0700 libibverbs (1.0.3-1) unstable; urgency=low --- libibverbs/debian/libibverbs1.install (revision 7435) +++ libibverbs/debian/libibverbs1.install (working copy) @@ -1 +0,0 @@ -usr/lib/libibverbs*.so.* --- libibverbs/include/infiniband/driver.h (revision 7435) +++ libibverbs/include/infiniband/driver.h (working copy) @@ -1,6 +1,6 @@ /* * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. - * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2005 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two @@ -37,8 +37,6 @@ #ifndef INFINIBAND_DRIVER_H #define INFINIBAND_DRIVER_H -#include - #include #include @@ -54,16 +52,17 @@ * Device-specific drivers should declare their device init function * as below (the name must be "openib_driver_init"): * - * struct ibv_device *openib_driver_init(struct sysfs_class_device *); + * struct ibv_device *ibv_driver_init(const char *uverbs_sys_path, + * int abi_version); * - * libibverbs will call each driver's openib_driver_init() function - * once for each InfiniBand device. If the device is one that the - * driver can support, it should return a struct ibv_device * with the - * ops member filled in. If the driver does not support the device, - * it should return NULL from openib_driver_init(). + * libibverbs will call each driver's ibv_driver_init() function once + * for each InfiniBand device. If the device is one that the driver + * can support, it should return a struct ibv_device * with the ops + * member filled in. If the driver does not support the device, it + * should return NULL from openib_driver_init(). 
*/ -typedef struct ibv_device *(*ibv_driver_init_func)(struct sysfs_class_device *); +typedef struct ibv_device *(*ibv_driver_init_func)(const char *, int); int ibv_cmd_get_context(struct ibv_context *context, struct ibv_get_context *cmd, size_t cmd_size, struct ibv_get_context_resp *resp, --- libibverbs/include/infiniband/verbs.h (revision 7435) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -1,7 +1,7 @@ /* * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. * Copyright (c) 2004 Intel Corporation. All rights reserved. - * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems, Inc. All rights reserved. * Copyright (c) 2005 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two @@ -41,8 +41,6 @@ #include #include -#include - #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { # define END_C_DECLS } @@ -559,11 +557,22 @@ struct ibv_device_ops { void (*free_context)(struct ibv_context *context); }; +enum { + IBV_SYSFS_NAME_MAX = 64, + IBV_SYSFS_PATH_MAX = 256 +}; + struct ibv_device { - struct sysfs_class_device *dev; - struct sysfs_class_device *ibdev; - struct ibv_driver *driver; - struct ibv_device_ops ops; + struct ibv_driver *driver; + struct ibv_device_ops ops; + /* Name of underlying kernel IB device, eg "mthca0" */ + char name[IBV_SYSFS_NAME_MAX]; + /* Name of uverbs device, eg "uverbs0" */ + char dev_name[IBV_SYSFS_NAME_MAX]; + /* Path to infiniband_verbs class device in sysfs */ + char dev_path[IBV_SYSFS_PATH_MAX]; + /* Path to infiniband class device in sysfs */ + char ibdev_path[IBV_SYSFS_PATH_MAX]; }; struct ibv_context_ops { --- libibverbs/include/infiniband/marshall.h (revision 7435) +++ libibverbs/include/infiniband/marshall.h (working copy) @@ -46,12 +46,6 @@ # define END_C_DECLS #endif /* __cplusplus */ -#if __GNUC__ >= 3 -# define __attribute_deprecated __attribute__((deprecated)) -#else -# define 
__attribute_deprecated -#endif - BEGIN_C_DECLS void ibv_copy_qp_attr_from_kern(struct ibv_qp_attr *dst, @@ -63,21 +57,6 @@ void ibv_copy_path_rec_from_kern(struct void ibv_copy_path_rec_to_kern(struct ibv_kern_path_rec *dst, struct ibv_sa_path_rec *src); -/* - * Obsolete, deprecated names. Will be removed in libibverbs 1.1. - */ - -void ib_copy_qp_attr_from_kern(struct ibv_qp_attr *dst, - struct ibv_kern_qp_attr *src) __attribute_deprecated; - -void ib_copy_path_rec_from_kern(struct ib_sa_path_rec *dst, - struct ib_kern_path_rec *src) __attribute_deprecated; - -void ib_copy_path_rec_to_kern(struct ib_kern_path_rec *dst, - struct ib_sa_path_rec *src) __attribute_deprecated; - END_C_DECLS -#undef __attribute_deprecated - #endif /* INFINIBAND_MARSHALL_H */ --- libibverbs/include/infiniband/sa.h (revision 7435) +++ libibverbs/include/infiniband/sa.h (working copy) @@ -38,13 +38,6 @@ #include -/* - * Obsolete, deprecated names. Will be removed in libibverbs 1.1. - */ -#define ib_sa_path_rec ibv_sa_path_rec -#define ib_sa_mcmember_rec ibv_sa_mcmember_rec -#define ib_sa_service_rec ibv_sa_service_rec - struct ibv_sa_path_rec { /* reserved */ /* reserved */ --- libibverbs/configure.in (revision 7435) +++ libibverbs/configure.in (working copy) @@ -1,11 +1,11 @@ dnl Process this file with autoconf to produce a configure script. AC_PREREQ(2.57) -AC_INIT(libibverbs, 1.0.4, openib-general at openib.org) +AC_INIT(libibverbs, 1.1-pre1, openib-general at openib.org) AC_CONFIG_SRCDIR([src/ibverbs.h]) AC_CONFIG_AUX_DIR(config) AM_CONFIG_HEADER(config.h) -AM_INIT_AUTOMAKE(libibverbs, 1.0.4) +AM_INIT_AUTOMAKE(libibverbs, 1.1-pre1) AM_PROG_LIBTOOL @@ -17,12 +17,8 @@ AC_CHECK_LIB(dl, dlsym, [], AC_MSG_ERROR([dlsym() not found. libibverbs requires libdl.])) AC_CHECK_LIB(pthread, pthread_mutex_init, [], AC_MSG_ERROR([pthread_mutex_init() not found. libibverbs requires libpthread.])) -AC_CHECK_LIB(sysfs, sysfs_open_class, [], - AC_MSG_ERROR([sysfs_open_class() not found. 
libibverbs requires libsysfs.])) dnl Checks for header files. -AC_CHECK_HEADER(sysfs/libsysfs.h, [], - AC_MSG_ERROR([ not found. libibverbs requires libsysfs.])) AC_HEADER_STDC dnl Checks for typedefs, structures, and compiler characteristics. --- libibverbs/ChangeLog (revision 7435) +++ libibverbs/ChangeLog (working copy) @@ -1,3 +1,25 @@ +2006-05-22 Roland Dreier + + * examples/devinfo.c (print_hca_cap): Read board_id attribute from + sysfs using ibv_read_sysfs_file() instead of libsysfs. + + * src/cmd.c, src/marshall.c, src/sysfs.c: Include , + since it is no longer implicitly included via . + + * include/infiniband/driver.h, include/infiniband/verbs.h, + src/device.c, src/init.c, src/verbs.c: Remove dependency on + libsysfs by implementing what is required directly on top of + filesystem operations. + + * include/infiniband/driver.h, src/init.c: Change name of driver + entry point to ibv_driver_init(), and update prototype to remove + libsysfs dependency. + + * src/marshall.c, include/infiniband/marshall.h, + include/infiniband/sa.h: Remove deprecated ib_xxx symbols. + + * Create libibverbs 1.1 branch and bump version number to 1.1-pre1. + 2006-05-22 Michael S. Tsirkin * include/infiniband/verbs.h: Remove trailing commas from --- libibverbs/src/libibverbs.map (revision 7435) +++ libibverbs/src/libibverbs.map (working copy) @@ -72,8 +72,5 @@ IBVERBS_1.0 { ibv_get_sysfs_path; ibv_read_sysfs_file; - ib_copy_qp_attr_from_kern; - ib_copy_path_rec_from_kern; - ib_copy_path_rec_to_kern; local: *; }; --- libibverbs/src/device.c (revision 7435) +++ libibverbs/src/device.c (working copy) @@ -1,5 +1,6 @@ /* * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -82,7 +83,7 @@ void ibv_free_device_list(struct ibv_dev const char *ibv_get_device_name(struct ibv_device *device) { - return device->ibdev->name; + return device->name; } uint64_t ibv_get_device_guid(struct ibv_device *device) @@ -92,7 +93,7 @@ uint64_t ibv_get_device_guid(struct ibv_ uint16_t parts[4]; int i; - if (ibv_read_sysfs_file(device->ibdev->path, "node_guid", + if (ibv_read_sysfs_file(device->ibdev_path, "node_guid", attr, sizeof attr) < 0) return 0; @@ -112,13 +113,15 @@ struct ibv_context *ibv_open_device(stru int cmd_fd; struct ibv_context *context; - asprintf(&devpath, "/dev/infiniband/%s", device->dev->name); + asprintf(&devpath, "/dev/infiniband/%s", device->dev_name); /* * We'll only be doing writes, but we need O_RDWR in case the * provider needs to mmap() the file. */ cmd_fd = open(devpath, O_RDWR); + free(devpath); + if (cmd_fd < 0) return NULL; --- libibverbs/src/verbs.c (revision 7435) +++ libibverbs/src/verbs.c (working copy) @@ -1,6 +1,6 @@ /* * Copyright (c) 2005 Topspin Communications. All rights reserved. - * Copyright (c) 2006 Cisco Systems. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -99,7 +99,7 @@ int ibv_query_gid(struct ibv_context *co snprintf(name, sizeof name, "ports/%d/gids/%d", port_num, index); - if (ibv_read_sysfs_file(context->device->ibdev->path, name, + if (ibv_read_sysfs_file(context->device->ibdev_path, name, attr, sizeof attr) < 0) return -1; @@ -122,7 +122,7 @@ int ibv_query_pkey(struct ibv_context *c snprintf(name, sizeof name, "ports/%d/pkeys/%d", port_num, index); - if (ibv_read_sysfs_file(context->device->ibdev->path, name, + if (ibv_read_sysfs_file(context->device->ibdev_path, name, attr, sizeof attr) < 0) return -1; --- libibverbs/src/init.c (revision 7435) +++ libibverbs/src/init.c (working copy) @@ -1,5 +1,6 @@ /* * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -43,6 +44,7 @@ #include #include #include +#include #include "ibverbs.h" @@ -71,7 +73,7 @@ static void load_driver(char *so_path) } dlerror(); - init_func = dlsym(dlhandle, "openib_driver_init"); + init_func = dlsym(dlhandle, "ibv_driver_init"); if (dlerror() != NULL || !init_func) { dlclose(dlhandle); return; @@ -118,40 +120,45 @@ static void find_drivers(char *dir) globfree(&so_glob); } -static struct ibv_device *init_drivers(struct sysfs_class_device *verbs_dev) +static struct ibv_device *init_drivers(const char *class_path, + const char *dev_name) { - struct sysfs_class_device *ib_dev; struct ibv_driver *driver; struct ibv_device *dev; - char ibdev_name[64]; + int abi_ver = 0; + char sys_path[IBV_SYSFS_PATH_MAX]; + char ibdev_name[IBV_SYSFS_NAME_MAX]; + char value[8]; - if (ibv_read_sysfs_file(verbs_dev->path, "ibdev", - ibdev_name, sizeof ibdev_name) < 0) { - fprintf(stderr, PFX "Warning: no ibdev class attr for %s\n", - verbs_dev->name); - return NULL; - } + 
snprintf(sys_path, sizeof sys_path, "%s/%s", + class_path, dev_name); - ib_dev = sysfs_open_class_device("infiniband", ibdev_name); - if (!ib_dev) { - fprintf(stderr, PFX "Warning: no infiniband class device %s for %s\n", - ibdev_name, verbs_dev->name); + if (ibv_read_sysfs_file(sys_path, "abi_version", value, sizeof value) > 0) + abi_ver = strtol(value, NULL, 10); + + if (ibv_read_sysfs_file(sys_path, "ibdev", ibdev_name, sizeof ibdev_name) < 0) { + fprintf(stderr, PFX "Warning: no ibdev class attr for %s\n", + sys_path); return NULL; } for (driver = driver_list; driver; driver = driver->next) { - dev = driver->init_func(verbs_dev); - if (dev) { - dev->dev = verbs_dev; - dev->ibdev = ib_dev; - dev->driver = driver; + dev = driver->init_func(sys_path, abi_ver); + if (!dev) + continue; + + dev->driver = driver; + strcpy(dev->dev_path, sys_path); + snprintf(dev->ibdev_path, IBV_SYSFS_PATH_MAX, "%s/class/infiniband/%s", + ibv_get_sysfs_path(), ibdev_name); + strcpy(dev->dev_name, dev_name); + strcpy(dev->name, ibdev_name); - return dev; - } + return dev; } fprintf(stderr, PFX "Warning: no userspace device-specific driver found for %s\n" - " driver search path: ", verbs_dev->name); + " driver search path: ", dev_name); if (user_path) fprintf(stderr, "%s:", user_path); fprintf(stderr, "%s\n", default_path); @@ -159,17 +166,10 @@ static struct ibv_device *init_drivers(s return NULL; } -static int check_abi_version(void) +static int check_abi_version(const char *path) { - const char *path; char value[8]; - path = ibv_get_sysfs_path(); - if (!path) { - fprintf(stderr, PFX "Fatal: couldn't find sysfs mount.\n"); - return -1; - } - if (ibv_read_sysfs_file(path, "class/infiniband_verbs/abi_version", value, sizeof value) < 0) { fprintf(stderr, PFX "Fatal: couldn't read uverbs ABI version.\n"); @@ -191,10 +191,11 @@ static int check_abi_version(void) HIDDEN int ibverbs_init(struct ibv_device ***list) { + const char *sysfs_path; char *wr_path, *dir; - struct sysfs_class *cls; - 
struct dlist *verbs_dev_list; - struct sysfs_class_device *verbs_dev; + char class_path[IBV_SYSFS_PATH_MAX]; + DIR *class_dir; + struct dirent *dent; struct ibv_device *device; struct ibv_device **new_list; int num_devices = 0; @@ -227,35 +228,45 @@ HIDDEN int ibverbs_init(struct ibv_devic */ load_driver(NULL); - cls = sysfs_open_class("infiniband_verbs"); - if (!cls) { - fprintf(stderr, PFX "Fatal: couldn't open sysfs class 'infiniband_verbs'.\n"); + sysfs_path = ibv_get_sysfs_path(); + if (!sysfs_path) { + fprintf(stderr, PFX "Fatal: couldn't find sysfs mount.\n"); return 0; } - if (check_abi_version()) + if (check_abi_version(sysfs_path)) return 0; - verbs_dev_list = sysfs_get_class_devices(cls); - if (!verbs_dev_list) { - fprintf(stderr, PFX "Fatal: no infiniband class devices found.\n"); + snprintf(class_path, sizeof class_path, "%s/class/infiniband_verbs", + sysfs_path); + class_dir = opendir(class_path); + if (!class_dir) { + fprintf(stderr, PFX "Fatal: couldn't open sysfs class " + "directory '%s'.\n", class_path); return 0; } - dlist_for_each_data(verbs_dev_list, verbs_dev, struct sysfs_class_device) { - device = init_drivers(verbs_dev); - if (device) { - if (list_size <= num_devices) { - list_size = list_size ? list_size * 2 : 1; - new_list = realloc(*list, list_size * sizeof (struct ibv_device *)); - if (!new_list) - goto out; - *list = new_list; - } - (*list)[num_devices++] = device; + while ((dent = readdir(class_dir))) { + if (dent->d_name[0] == '.' || dent->d_type == DT_REG) + continue; + + device = init_drivers(class_path, dent->d_name); + if (!device) + continue; + + if (list_size <= num_devices) { + list_size = list_size ? 
list_size * 2 : 1; + new_list = realloc(*list, list_size * sizeof (struct ibv_device *)); + if (!new_list) + goto out; + *list = new_list; } + + (*list)[num_devices++] = device; } + closedir(class_dir); + out: return num_devices; } --- libibverbs/src/cmd.c (revision 7435) +++ libibverbs/src/cmd.c (working copy) @@ -1,7 +1,7 @@ /* * Copyright (c) 2005 Topspin Communications. All rights reserved. * Copyright (c) 2005 PathScale, Inc. All rights reserved. - * Copyright (c) 2006 Cisco Systems. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -43,6 +43,7 @@ #include #include #include +#include #include "ibverbs.h" --- libibverbs/src/marshall.c (revision 7435) +++ libibverbs/src/marshall.c (working copy) @@ -34,6 +34,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include + #include static void ibv_copy_ah_attr_from_kern(struct ibv_ah_attr *dst, @@ -138,21 +140,3 @@ void ibv_copy_path_rec_to_kern(struct ib dst->preference = src->preference; dst->packet_life_time_selector = src->packet_life_time_selector; } - -void ib_copy_qp_attr_from_kern(struct ibv_qp_attr *dst, - struct ibv_kern_qp_attr *src) -{ - return ibv_copy_qp_attr_from_kern(dst, src); -} - -void ib_copy_path_rec_from_kern(struct ib_sa_path_rec *dst, - struct ib_kern_path_rec *src) -{ - return ibv_copy_path_rec_from_kern(dst, src); -} - -void ib_copy_path_rec_to_kern(struct ib_kern_path_rec *dst, - struct ib_sa_path_rec *src) -{ - return ibv_copy_path_rec_to_kern(dst, src); -} --- libibverbs/src/sysfs.c (revision 7435) +++ libibverbs/src/sysfs.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006 Cisco Systems. All rights reserved. + * Copyright (c) 2006 Cisco Systems, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -42,6 +42,7 @@ #include #include #include +#include #include "ibverbs.h" @@ -64,7 +65,7 @@ const char *ibv_get_sysfs_path(void) if (env) { int len; - sysfs_path = strndup(env, 256); + sysfs_path = strndup(env, IBV_SYSFS_PATH_MAX); len = strlen(sysfs_path); while (len > 0 && sysfs_path[len - 1] == '/') { --len; --- libibverbs/Makefile.am (revision 7435) +++ libibverbs/Makefile.am (working copy) @@ -16,7 +16,7 @@ endif src_libibverbs_la_SOURCES = src/cmd.c src/device.c src/init.c src/marshall.c \ src/memory.c src/sysfs.c src/verbs.c -src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \ +src_libibverbs_la_LDFLAGS = -version-info 2 -export-dynamic \ $(libibverbs_version_script) src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map @@ -49,8 +49,8 @@ man_MANS = man/ibv_asyncwatch.1 man/ibv_ man/ibv_srq_pingpong.1 DEBIAN = debian/changelog debian/compat debian/control debian/copyright \ - debian/ibverbs-utils.install debian/libibverbs1.install \ - debian/libibverbs1.postinst debian/libibverbs-dev.install \ + debian/ibverbs-utils.install debian/libibverbs2.install \ + debian/libibverbs2.postinst debian/libibverbs-dev.install \ debian/rules EXTRA_DIST = include/infiniband/driver.h include/infiniband/kern-abi.h \ --- libibverbs/README (revision 7435) +++ libibverbs/README (working copy) @@ -127,15 +127,6 @@ the 1.1 release: driver ABI, because a new method will need to be added to struct ibv_context_ops. - * Eliminate the dependency on libsysfs by implementing the required - sysfs handling directly. This will break the API, because the dev - and ibdev members of struct ibv_device will be removed. It will - also break the device driver ABI, because the signature of the - driver initialization function will change. The driver - initialization function will be changed as part of this work; this - has the added benefit of allowing us to choose a better name than - "openib_driver_init." 
- Other possibilities ------------------- --- libibverbs/examples/devinfo.c (revision 7435) +++ libibverbs/examples/devinfo.c (working copy) @@ -47,6 +47,7 @@ #include #include +#include #include static int verbose = 0; @@ -169,7 +170,6 @@ static int print_hca_cap(struct ibv_devi struct ibv_context *ctx; struct ibv_device_attr device_attr; struct ibv_port_attr port_attr; - struct sysfs_attribute *attr; int rc = 0; uint8_t port; char buf[256]; @@ -194,11 +194,9 @@ static int print_hca_cap(struct ibv_devi printf("\tvendor_id:\t\t\t0x%04x\n", device_attr.vendor_id); printf("\tvendor_part_id:\t\t\t%d\n", device_attr.vendor_part_id); printf("\thw_ver:\t\t\t\t0x%X\n", device_attr.hw_ver); - attr = sysfs_get_classdev_attr(ib_dev->ibdev, "board_id"); - if (attr) { - printf("\tboard_id:\t\t\t%s", attr->value); - sysfs_close_attribute(attr); - } + + if (ibv_read_sysfs_file(ib_dev->ibdev_path, "board_id", buf, sizeof buf) > 0) + printf("\tboard_id:\t\t\t%s\n", buf); printf("\tphys_port_cnt:\t\t\t%d\n", device_attr.phys_port_cnt); From xma at us.ibm.com Tue May 23 10:20:52 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 23 May 2006 10:20:52 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- split CQ In-Reply-To: Message-ID: Roland Dreier wrote on 05/23/2006 09:09:05 AM: > Did you send the other 6 patches in this series? Yes, I am splitting these patches. > I was waiting to comment until I had all the patches, but there is one > really bad thing here: > > > + IPOIB_NUM_SEND_WC = 32, > > > +void ipoib_ib_send_completion(struct ib_cq *cq, void *dev_ptr) > > +{ > > + struct net_device *dev = (struct net_device *) dev_ptr; > > + struct ipoib_dev_priv *priv = netdev_priv(dev); > > + struct ib_wc ibwc[IPOIB_NUM_SEND_WC]; > > If I'm doing the math correctly, this function now uses more than 2K > of stack, which is of course unacceptable. > > I don't think there's any way around keeping the wc array in the > ipoib_dev_priv structure. > > - R. 
The stack is 4k, not 8K anymore. I think we can still use
IPOIB_NUM_SEND_WC as 4. I modified mthca_XXX_post_send to remove lock
totally before(since sender is exclusive), and found that lock didn't
impact performance too much.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From mst at mellanox.co.il Tue May 23 10:30:47 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 23 May 2006 20:30:47 +0300
Subject: [openib-general] Re: Plans for libibverbs 1.1
In-Reply-To:
References:
Message-ID: <20060523173047.GC3377@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Plans for libibverbs 1.1
>
> I'm planning on branching the libibverbs tree so that I can open a 1.1
> development branch where ABI/API stability is not a requirement. My
> current plan is to copy the current src/userspace/libibverbs tree in
> svn to src/userspace/libibverbs-1.0. The libibverbs-1.0 tree would be
> used for stable maintainence (only changes that preserve ABI and API
> stability will be accepted), and the libibverbs tree would be used for
> new development.

Might some place under branches be better?

--
MST

From rdreier at cisco.com Tue May 23 10:33:42 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 23 May 2006 10:33:42 -0700
Subject: [openib-general] Re: Plans for libibverbs 1.1
In-Reply-To: <20060523173047.GC3377@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 23 May 2006 20:30:47 +0300")
References: <20060523173047.GC3377@mellanox.co.il>
Message-ID:

    Michael> Might some place under branches be better?

Someplace under git would be better of course ;)

But I don't think I want to move the libibverbs-1.0 tree too far out
of the way. After all it will still be required for low-level driver
libraries that haven't been converted yet.

 - R.
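
As an illustration of the libsysfs removal discussed in the libibverbs 1.1
thread above, here is a minimal sketch of reading a sysfs attribute and
walking a class directory using plain filesystem calls. It is not the
libibverbs code: read_attr_file() and count_class_devices() are hypothetical
names chosen for this example, the 512-byte path buffer is an arbitrary
assumption, and a stat() check stands in for the d_type test used in the
posted patch.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the libibverbs API): read the attribute file
 * dir/file into buf, strip one trailing newline, NUL-terminate, and
 * return the resulting string length, or -1 on any error.
 */
static int read_attr_file(const char *dir, const char *file,
                          char *buf, size_t size)
{
    char path[512];
    ssize_t len;
    int n, fd;

    if (size == 0)
        return -1;

    n = snprintf(path, sizeof path, "%s/%s", dir, file);
    if (n < 0 || n >= (int) sizeof path)
        return -1;              /* path would have been truncated */

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    len = read(fd, buf, size - 1);
    close(fd);
    if (len < 0)
        return -1;

    if (len > 0 && buf[len - 1] == '\n')
        --len;
    buf[len] = '\0';

    return (int) len;
}

/*
 * Hypothetical helper: count the device entries below a class directory
 * such as <sysfs>/class/infiniband_verbs.  Dot entries are skipped, and
 * only entries that resolve to directories are counted, so plain
 * attribute files like abi_version are ignored.
 */
static int count_class_devices(const char *class_path)
{
    struct dirent *dent;
    struct stat st;
    char path[512];
    int count = 0;
    int n;
    DIR *dir;

    dir = opendir(class_path);
    if (!dir)
        return -1;

    while ((dent = readdir(dir))) {
        if (dent->d_name[0] == '.')
            continue;

        n = snprintf(path, sizeof path, "%s/%s", class_path, dent->d_name);
        if (n < 0 || n >= (int) sizeof path)
            continue;

        /* stat() follows symlinks, so linked device entries count too. */
        if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
            ++count;
    }

    closedir(dir);
    return count;
}
```

A caller would typically build the class path from the sysfs mount point,
read attributes such as "ibdev" or "abi_version" from each device directory
with the first helper, and treat a negative return value as "attribute not
available".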
From sashak at voltaire.com Tue May 23 10:59:48 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 23 May 2006 20:59:48 +0300
Subject: [openib-general] Re: [PATCH] opensm: remove osm_pkey_mgr.h
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3023686EB@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E3023686EB@mtlexch01.mtl.com>
Message-ID: <20060523175948.GC6472@sashak.voltaire.com>

On 09:14 Tue 23 May , Eitan Zahavi wrote:
> >
> > I would be agree with your last example, but it is not the case - what
> > you will actually find in osm_pkey_mgr.h is some object with no related
> > to pkey management fields, but instead with four duplicated from
> > somewhere pointers (and you will need to dig the whole tree in order to
> > find from where actually it was copied). Do you call this "clear
> > structure"?
> [EZ] What is important is not what the manager specific data or
> functions are but the fact anybody knows where to find them. So once it
> is clear the osm_partition_mgr is described in osm_partition_mgr.h I can
> know with one glance that it does not have any special data or function.

You need to keep the file in order to know that there is no information
in this file? (Am I missing something?)

> If I do not have such an h file to learn about the partition manager
> where would I look for that info?

Is "no file" - "no info" not clear enough?

> If you keep track of the structure you do not need to dig so much to
> find where the manager pointers are defined. You should KNOW they must
> be passed to the manager on its initialization at the osm_sm .
>
> And yes - I call the state where you can know by simple rule what file
> to look for some object definition a clear structure. And I call the
> state when you can not tell which object should be defined in what file
> - a mess.

I agree with last paragraph ("AS IS" this may be good "rule" IMO), but
cannot see how it is related to our discussion or to the proposed patches.
The goal of those patches is not to move object definition to some
unclear "secret" place, but to remove the useless object and associated
with it obsolete header file.

> >
> > > If you want to redesign OpenSM structure to be "simpler" or more
> > > "effective" you can propose doing that. But doing it in the salami way
> > > is just going to hurt stability and leave us with no structure at all.
> > > Unless we are willing to re-architect the code in a clean manner and
> > > spend the years of development and validation for these changes - lets
> > > keep the code with clear structure.
> >
> > So your proposition is to wait years in order to remove unused object?
> [EZ] If the pkey manager is not used - yes go ahead and remove it.

Great. I am happy that you are agree to not keep unused objects.

> But
> the case is that there is a partition manager so my objection is to
> having a manager without an h file.

But why we need to keep the file without information? (somehow it reminds
the joke about a programmer with two glasses - one with water and one
without water... :))

>
> Regarding code flattening and structure rules violations my position is
> to avoid messing one project structure for obscure reasons, killing its
> stability and structure.
> >
> > Also note that this cleanup has nothing similar with to re-architect the
> > code, it does not even touch OpenSM architecture.
> [EZ] Call it whatever you like - if you continuously going to modify the
> structure it is a major re-write which will impact stability.

As programmer you should know that such kind of cleanup is safe and cannot
cause instability. And if you are referring such kind of instability where
moving any character in the source code may cause errors then I would
prefer to "shake" the code just in order to find and fix more "hidden"
bugs.

> What
> regression testing are you running before you posting these patches?

I don't have too much serious test facilities, so my generic test case is
simple - I'm trying to keep OpenSM running 24x7 on one of my machines (I
need to stop it from time to time in order to rerun with fresh executable,
to test some specific feature or for 32/64 bit change) under 'osmtest' and
'kill -HUP' in loop.

Sasha

From ishai at mellanox.co.il Tue May 23 11:14:12 2006
From: ishai at mellanox.co.il (Ishai Rabinovitz)
Date: Tue, 23 May 2006 21:14:12 +0300
Subject: [openib-general] Get email notification to svn commit
Message-ID: <20060523181412.GB15159@mellanox.co.il>

Hi,

I understand that there is a way to be informed by mail when there is a commit to the svn repository.
How can I register?
Can I register to get notification only on svn commit to specific directories?

Thanks
--
Ishai Rabinovitz

From eeb at bartonsoftware.com Tue May 23 11:21:47 2006
From: eeb at bartonsoftware.com (Eric Barton)
Date: Tue, 23 May 2006 19:21:47 +0100
Subject: [openib-general] different send and receive CQs
Message-ID: <200605231821.k4NILlRT032327@robert.bartonsoftware.com>

Hi,

More dumb questions :)

In my ULP (lustre networking) I maintain a common pool of send
descriptors and per-connection receive descriptors. So it seems
reasonable to have a single CQ for all sends and one CQ per-connection
for receives.

My code worked OK with a single CQ per connection, but when I try
separate send and receive CQs, I don't see any send completions.

Has anyone else tried separate send and receive CQs?

--
Cheers,
Eric

From halr at voltaire.com Tue May 23 11:15:47 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 23 May 2006 14:15:47 -0400
Subject: [openib-general] Get email notification to svn commit
In-Reply-To: <20060523181412.GB15159@mellanox.co.il>
References: <20060523181412.GB15159@mellanox.co.il>
Message-ID: <1148408147.4470.97884.camel@hal.voltaire.com>

On Tue, 2006-05-23 at 14:14, Ishai Rabinovitz wrote:
> Hi,
>
> I understand that there is a way to be informed by mail when there is a commit to the svn repository.
> How can I register?

See http://openib.org/mailman/listinfo/openib-commits

> Can I register to get notification only on svn commit to specific directories?

Not as far as I know.

-- Hal

>
> Thanks

From rdreier at cisco.com Tue May 23 11:26:52 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 23 May 2006 11:26:52 -0700
Subject: [openib-general] different send and receive CQs
In-Reply-To: <200605231821.k4NILlRT032327@robert.bartonsoftware.com> (Eric Barton's message of "Tue, 23 May 2006 19:21:47 +0100")
References: <200605231821.k4NILlRT032327@robert.bartonsoftware.com>
Message-ID:

    Eric> My code worked OK with a single CQ per connection, but when
    Eric> I try separate send and receive CQs, I don't see any send
    Eric> completions.

    Eric> Has anyone else tried separate send and receive CQs?

Yes, for example I know Shirley Ma is working on patches that change
IPoIB to use separate CQs for sends and receives.

Are you not seeing any completions when you poll the CQ, or are you
not getting completion events? Some things to check would be that you
are requesting notification on all the CQs you want events on, and
that you are not posting unsignaled send requests.

 - R.

From bos at pathscale.com Tue May 23 11:32:28 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Tue, 23 May 2006 11:32:28 -0700
Subject: [openib-general] [PATCH 0 of 10] ipath patches for 2.6.17
Message-ID:

Hi, Roland -

Here are some patches for 2.6.17.
They all fix kernel or userspace crasher bugs. I may have a few more for you in a while, too. Regards, Message-ID: Make sure modify_qp won't modify the QP if any of the changes failed. Signed-off-by: Bryan O'Sullivan diff -r bc968dacc860 -r bb640dcf4d9d drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Tue May 23 11:29:15 2006 -0700 @@ -427,6 +427,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask) { + struct ipath_ibdev *dev = to_idev(ibqp->device); struct ipath_qp *qp = to_iqp(ibqp); enum ib_qp_state cur_state, new_state; unsigned long flags; @@ -443,6 +444,19 @@ int ipath_modify_qp(struct ib_qp *ibqp, attr_mask)) goto inval; + if (attr_mask & IB_QP_AV) + if (attr->ah_attr.dlid == 0 || + attr->ah_attr.dlid >= IPS_MULTICAST_LID_BASE) + goto inval; + + if (attr_mask & IB_QP_PKEY_INDEX) + if (attr->pkey_index >= ipath_layer_get_npkeys(dev->dd)) + goto inval; + + if (attr_mask & IB_QP_MIN_RNR_TIMER) + if (attr->min_rnr_timer > 31) + goto inval; + switch (new_state) { case IB_QPS_RESET: ipath_reset_qp(qp); @@ -457,13 +471,8 @@ int ipath_modify_qp(struct ib_qp *ibqp, } - if (attr_mask & IB_QP_PKEY_INDEX) { - struct ipath_ibdev *dev = to_idev(ibqp->device); - - if (attr->pkey_index >= ipath_layer_get_npkeys(dev->dd)) - goto inval; + if (attr_mask & IB_QP_PKEY_INDEX) qp->s_pkey_index = attr->pkey_index; - } if (attr_mask & IB_QP_DEST_QPN) qp->remote_qpn = attr->dest_qp_num; @@ -479,12 +488,8 @@ int ipath_modify_qp(struct ib_qp *ibqp, if (attr_mask & IB_QP_ACCESS_FLAGS) qp->qp_access_flags = attr->qp_access_flags; - if (attr_mask & IB_QP_AV) { - if (attr->ah_attr.dlid == 0 || - attr->ah_attr.dlid >= IPS_MULTICAST_LID_BASE) - goto inval; + if (attr_mask & IB_QP_AV) qp->remote_ah_attr = attr->ah_attr; - } if (attr_mask & IB_QP_PATH_MTU) qp->path_mtu = attr->path_mtu; @@ -499,11 +504,8 @@ int 
ipath_modify_qp(struct ib_qp *ibqp, qp->s_rnr_retry_cnt = qp->s_rnr_retry; } - if (attr_mask & IB_QP_MIN_RNR_TIMER) { - if (attr->min_rnr_timer > 31) - goto inval; + if (attr_mask & IB_QP_MIN_RNR_TIMER) qp->s_min_rnr_timer = attr->min_rnr_timer; - } if (attr_mask & IB_QP_QKEY) qp->qkey = attr->qkey; From bos at pathscale.com Tue May 23 11:32:31 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 11:32:31 -0700 Subject: [openib-general] [PATCH 3 of 10] ipath - fix reporting of driver version to userspace In-Reply-To: Message-ID: <386fe7306b31e9866161.1148409151@eng-12.pathscale.com> Fix the interface version that gets exported to userspace. Signed-off-by: Bryan O'Sullivan diff -r bb640dcf4d9d -r 386fe7306b31 drivers/infiniband/hw/ipath/ipath_file_ops.c --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Tue May 23 11:29:15 2006 -0700 @@ -139,7 +139,7 @@ static int ipath_get_base_info(struct ip kinfo->spi_piosize = dd->ipath_ibmaxlen; kinfo->spi_mtu = dd->ipath_ibmaxlen; /* maxlen, not ibmtu */ kinfo->spi_port = pd->port_port; - kinfo->spi_sw_version = IPATH_USER_SWVERSION; + kinfo->spi_sw_version = IPATH_KERN_SWVERSION; kinfo->spi_hw_version = dd->ipath_revision; if (copy_to_user(ubase, kinfo, sizeof(*kinfo))) From bos at pathscale.com Tue May 23 11:32:29 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 11:32:29 -0700 Subject: [openib-general] [PATCH 1 of 10] ipath - fix spinlock recursion bug In-Reply-To: Message-ID: The local loopback path for RC can lock the rkey table lock without blocking interrupts. The receive interrupt path can then call ipath_rkey_ok() and deadlock. Remove the redundant lock. 
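The failure mode described above can be shown in miniature in plain C. This is only a single-threaded toy under stated assumptions, not the ipath code: `toy_table`, `toy_lookup` and the `irq_depth` parameter are made-up names, and a counter stands in for the point where a real spin_lock() would spin forever because the lock holder is the very code that was interrupted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TABLE_SIZE 4

/* Hypothetical stand-in for the key table and its non-recursive lock. */
struct toy_table {
	bool lock_held;		/* models the spinlock state */
	bool use_lock;		/* true: pre-patch behaviour, false: post-patch */
	int deadlocks;		/* times a real spin_lock() would have hung */
	int slot[TOY_TABLE_SIZE];
};

/*
 * Look up one slot. If irq_depth > 0, an "interrupt" re-enters the same
 * lookup on the same CPU while the outer call is in its critical
 * section, which is the interleaving the patch description warns about.
 * Returns the slot value, or -1 where the kernel would deadlock.
 */
static int toy_lookup(struct toy_table *t, size_t i, int irq_depth)
{
	int v;

	assert(t != NULL);

	if (t->use_lock) {
		if (t->lock_held) {
			/* Lock owner is the interrupted code: never released. */
			t->deadlocks++;
			return -1;
		}
		t->lock_held = true;	/* taken without blocking interrupts */
	}

	if (irq_depth > 0)
		(void)toy_lookup(t, i, irq_depth - 1);	/* interrupt fires here */

	v = t->slot[i % TOY_TABLE_SIZE];

	if (t->use_lock)
		t->lock_held = false;
	return v;
}
```

With `use_lock` set, `toy_lookup(&t, 1, 1)` records one would-be deadlock in `t.deadlocks`; with `use_lock` cleared the nested call simply completes. Dropping the lock is only correct when, as the description says, the lock is redundant for the lookup; otherwise the usual alternative is to block interrupts while holding it.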
Signed-off-by: Bryan O'Sullivan diff -r abcc41d46f4c -r bc968dacc860 drivers/infiniband/hw/ipath/ipath_keys.c --- a/drivers/infiniband/hw/ipath/ipath_keys.c Tue May 23 11:29:11 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_keys.c Tue May 23 11:29:15 2006 -0700 @@ -136,9 +136,7 @@ int ipath_lkey_ok(struct ipath_lkey_tabl ret = 1; goto bail; } - spin_lock(&rkt->lock); mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))]; - spin_unlock(&rkt->lock); if (unlikely(mr == NULL || mr->lkey != sge->lkey)) { ret = 0; goto bail; @@ -184,8 +182,6 @@ bail: * @acc: access flags * * Return 1 if successful, otherwise 0. - * - * The QP r_rq.lock should be held. */ int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, u32 len, u64 vaddr, u32 rkey, int acc) @@ -196,9 +192,7 @@ int ipath_rkey_ok(struct ipath_ibdev *de size_t off; int ret; - spin_lock(&rkt->lock); mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))]; - spin_unlock(&rkt->lock); if (unlikely(mr == NULL || mr->lkey != rkey)) { ret = 0; goto bail; From bos at pathscale.com Tue May 23 11:32:32 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 11:32:32 -0700 Subject: [openib-general] [PATCH 4 of 10] ipath - replace uses of LIST_POISON In-Reply-To: Message-ID: Per Andrew's request. Signed-off-by: Bryan O'Sullivan diff -r 386fe7306b31 -r c7cf56636dd1 drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Tue May 23 11:29:15 2006 -0700 @@ -375,10 +375,10 @@ static void ipath_error_qp(struct ipath_ spin_lock(&dev->pending_lock); /* XXX What if its already removed by the timeout code? 
*/ - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); - if (qp->piowait.next != LIST_POISON1) - list_del(&qp->piowait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); + if (!list_empty(&qp->piowait)) + list_del_init(&qp->piowait); spin_unlock(&dev->pending_lock); wc.status = IB_WC_WR_FLUSH_ERR; @@ -712,10 +712,8 @@ struct ib_qp *ipath_create_qp(struct ib_ init_attr->qp_type == IB_QPT_RC ? ipath_do_rc_send : ipath_do_uc_send, (unsigned long)qp); - qp->piowait.next = LIST_POISON1; - qp->piowait.prev = LIST_POISON2; - qp->timerwait.next = LIST_POISON1; - qp->timerwait.prev = LIST_POISON2; + INIT_LIST_HEAD(&qp->piowait); + INIT_LIST_HEAD(&qp->timerwait); qp->state = IB_QPS_RESET; qp->s_wq = swq; qp->s_size = init_attr->cap.max_send_wr + 1; @@ -785,10 +783,10 @@ int ipath_destroy_qp(struct ib_qp *ibqp) /* Make sure the QP isn't on the timeout list. */ spin_lock_irqsave(&dev->pending_lock, flags); - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); - if (qp->piowait.next != LIST_POISON1) - list_del(&qp->piowait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); + if (!list_empty(&qp->piowait)) + list_del_init(&qp->piowait); spin_unlock_irqrestore(&dev->pending_lock, flags); /* @@ -857,10 +855,10 @@ void ipath_sqerror_qp(struct ipath_qp *q spin_lock(&dev->pending_lock); /* XXX What if its already removed by the timeout code? 
*/ - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); - if (qp->piowait.next != LIST_POISON1) - list_del(&qp->piowait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); + if (!list_empty(&qp->piowait)) + list_del_init(&qp->piowait); spin_unlock(&dev->pending_lock); ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1); diff -r 386fe7306b31 -r c7cf56636dd1 drivers/infiniband/hw/ipath/ipath_rc.c --- a/drivers/infiniband/hw/ipath/ipath_rc.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_rc.c Tue May 23 11:29:15 2006 -0700 @@ -57,7 +57,7 @@ static void ipath_init_restart(struct ip qp->s_len = wqe->length - len; dev = to_idev(qp->ibqp.device); spin_lock(&dev->pending_lock); - if (qp->timerwait.next == LIST_POISON1) + if (list_empty(&qp->timerwait)) list_add_tail(&qp->timerwait, &dev->pending[dev->pending_index]); spin_unlock(&dev->pending_lock); @@ -356,7 +356,7 @@ static inline int ipath_make_rc_req(stru if ((int)(qp->s_psn - qp->s_next_psn) > 0) qp->s_next_psn = qp->s_psn; spin_lock(&dev->pending_lock); - if (qp->timerwait.next == LIST_POISON1) + if (list_empty(&qp->timerwait)) list_add_tail(&qp->timerwait, &dev->pending[dev->pending_index]); spin_unlock(&dev->pending_lock); @@ -726,8 +726,8 @@ void ipath_restart_rc(struct ipath_qp *q */ dev = to_idev(qp->ibqp.device); spin_lock(&dev->pending_lock); - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); spin_unlock(&dev->pending_lock); if (wqe->wr.opcode == IB_WR_RDMA_READ) @@ -886,8 +886,8 @@ static int do_rc_ack(struct ipath_qp *qp * just won't find anything to restart if we ACK everything. 
*/ spin_lock(&dev->pending_lock); - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); spin_unlock(&dev->pending_lock); /* @@ -1194,8 +1194,7 @@ static inline void ipath_rc_rcv_resp(str IB_WR_RDMA_READ)) goto ack_done; spin_lock(&dev->pending_lock); - if (qp->s_rnr_timeout == 0 && - qp->timerwait.next != LIST_POISON1) + if (qp->s_rnr_timeout == 0 && !list_empty(&qp->timerwait)) list_move_tail(&qp->timerwait, &dev->pending[dev->pending_index]); spin_unlock(&dev->pending_lock); diff -r 386fe7306b31 -r c7cf56636dd1 drivers/infiniband/hw/ipath/ipath_ruc.c --- a/drivers/infiniband/hw/ipath/ipath_ruc.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Tue May 23 11:29:15 2006 -0700 @@ -435,7 +435,7 @@ void ipath_no_bufs_available(struct ipat unsigned long flags; spin_lock_irqsave(&dev->pending_lock, flags); - if (qp->piowait.next == LIST_POISON1) + if (list_empty(&qp->piowait)) list_add_tail(&qp->piowait, &dev->piowait); spin_unlock_irqrestore(&dev->pending_lock, flags); /* diff -r 386fe7306b31 -r c7cf56636dd1 drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Tue May 23 11:29:15 2006 -0700 @@ -464,7 +464,7 @@ static void ipath_ib_timer(void *arg) last = &dev->pending[dev->pending_index]; while (!list_empty(last)) { qp = list_entry(last->next, struct ipath_qp, timerwait); - list_del(&qp->timerwait); + list_del_init(&qp->timerwait); qp->timer_next = resend; resend = qp; atomic_inc(&qp->refcount); @@ -474,7 +474,7 @@ static void ipath_ib_timer(void *arg) qp = list_entry(last->next, struct ipath_qp, timerwait); if (--qp->s_rnr_timeout == 0) { do { - list_del(&qp->timerwait); + list_del_init(&qp->timerwait); tasklet_hi_schedule(&qp->s_task); if (list_empty(last)) break; @@ -554,7 +554,7 @@ static int ipath_ib_piobufavail(void *ar while 
(!list_empty(&dev->piowait)) { qp = list_entry(dev->piowait.next, struct ipath_qp, piowait); - list_del(&qp->piowait); + list_del_init(&qp->piowait); tasklet_hi_schedule(&qp->s_task); } spin_unlock_irqrestore(&dev->pending_lock, flags); From bos at pathscale.com Tue May 23 11:32:33 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 11:32:33 -0700 Subject: [openib-general] [PATCH 5 of 10] ipath - fix NULL dereference during cleanup In-Reply-To: Message-ID: <6bf52c0f0f0d0df39a78.1148409153@eng-12.pathscale.com> Fix NULL deref due to pcidev being clobbered before dd->ipath_f_cleanup() was called. Signed-off-by: Bryan O'Sullivan diff -r c7cf56636dd1 -r 6bf52c0f0f0d drivers/infiniband/hw/ipath/ipath_driver.c --- a/drivers/infiniband/hw/ipath/ipath_driver.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_driver.c Tue May 23 11:29:15 2006 -0700 @@ -1905,19 +1905,19 @@ static void __exit infinipath_cleanup(vo } else ipath_dbg("irq is 0, not doing free_irq " "for unit %u\n", dd->ipath_unit); + + /* + * we check for NULL here, because it's outside + * the kregbase check, and we need to call it + * after the free_irq. Thus it's possible that + * the function pointers were never initialized. + */ + if (dd->ipath_f_cleanup) + /* clean up chip-specific stuff */ + dd->ipath_f_cleanup(dd); + dd->pcidev = NULL; } - - /* - * we check for NULL here, because it's outside the kregbase - * check, and we need to call it after the free_irq. Thus - * it's possible that the function pointers were never - * initialized. 
- */ - if (dd->ipath_f_cleanup) - /* clean up chip-specific stuff */ - dd->ipath_f_cleanup(dd); - spin_lock_irqsave(&ipath_devs_lock, flags); } From bos at pathscale.com Tue May 23 11:32:38 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 11:32:38 -0700 Subject: [openib-general] [PATCH 10 of 10] ipath - deref correct pointer when using kernel SMA In-Reply-To: Message-ID: At this point, the core QP structure hasn't been initialized, so what's in there isn't valid. Get the same information elsewhere. Signed-off-by: Bryan O'Sullivan diff -r 3d844dee2f61 -r c892bcb21ac1 drivers/infiniband/hw/ipath/ipath_qp.c --- a/drivers/infiniband/hw/ipath/ipath_qp.c Tue May 23 11:29:16 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_qp.c Tue May 23 11:29:16 2006 -0700 @@ -734,7 +734,7 @@ struct ib_qp *ipath_create_qp(struct ib_ ipath_reset_qp(qp); /* Tell the core driver that the kernel SMA is present. */ - if (qp->ibqp.qp_type == IB_QPT_SMI) + if (init_attr->qp_type == IB_QPT_SMI) ipath_layer_set_verbs_flags(dev->dd, IPATH_VERBS_KERNEL_SMA); break; From bos at pathscale.com Tue May 23 11:32:36 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 11:32:36 -0700 Subject: [openib-general] [PATCH 8 of 10] ipath - register as IB device owner In-Reply-To: Message-ID: <551717ecc3dbd997fc64.1148409156@eng-12.pathscale.com> This fixes an oops. 
Signed-off-by: Bryan O'Sullivan diff -r 8d87788e21b1 -r 551717ecc3db drivers/infiniband/hw/ipath/ipath_verbs.c --- a/drivers/infiniband/hw/ipath/ipath_verbs.c Tue May 23 11:29:16 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Tue May 23 11:29:16 2006 -0700 @@ -951,6 +951,7 @@ static void *ipath_register_ib_device(in idev->dd = dd; strlcpy(dev->name, "ipath%d", IB_DEVICE_NAME_MAX); + dev->owner = THIS_MODULE; dev->node_guid = ipath_layer_get_guid(dd); dev->uverbs_abi_ver = IPATH_UVERBS_ABI_VERSION; dev->uverbs_cmd_mask = From bos at pathscale.com Tue May 23 11:32:35 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 11:32:35 -0700 Subject: [openib-general] [PATCH 7 of 10] ipath - enable PE800 receive interrupts on user ports In-Reply-To: Message-ID: <8d87788e21b1800f357f.1148409155@eng-12.pathscale.com> Fixed so it works on the PE-800. It had not previously been updated to match PE-800 receive interrupt differences from HT-400. Signed-off-by: Bryan O'Sullivan diff -r 5d7e365286b3 -r 8d87788e21b1 drivers/infiniband/hw/ipath/ipath_file_ops.c --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Tue May 23 11:29:16 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Tue May 23 11:29:16 2006 -0700 @@ -1224,6 +1224,10 @@ static unsigned int ipath_poll(struct fi if (tail == head) { set_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag); + if(dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */ + (void)ipath_write_ureg(dd, ur_rcvhdrhead, + dd->ipath_rhdrhead_intr_off + | head, pd->port_port); poll_wait(fp, &pd->port_wait, pt); if (test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) { From bos at pathscale.com Tue May 23 11:32:34 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 11:32:34 -0700 Subject: [openib-general] [PATCH 6 of 10] ipath - enable GPIO interrupt on HT-460 In-Reply-To: Message-ID: <5d7e365286b3a3096fba.1148409154@eng-12.pathscale.com> This is required for even semi-decent performance on OpenIB. 
Signed-off-by: Bryan O'Sullivan diff -r 6bf52c0f0f0d -r 5d7e365286b3 drivers/infiniband/hw/ipath/ipath_eeprom.c --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c Tue May 23 11:29:16 2006 -0700 @@ -505,11 +505,10 @@ static u8 flash_csum(struct ipath_flash * ipath_get_guid - get the GUID from the i2c device * @dd: the infinipath device * - * When we add the multi-chip support, we will probably have to add - * the ability to use the number of guids field, and get the guid from - * the first chip's flash, to use for all of them. - */ -void ipath_get_guid(struct ipath_devdata *dd) + * We have the capability to use the ipath_nguid field, and get + * the guid from the first chip's flash, to use for all of them. + */ +void ipath_get_eeprom_info(struct ipath_devdata *dd) { void *buf; struct ipath_flash *ifp; diff -r 6bf52c0f0f0d -r 5d7e365286b3 drivers/infiniband/hw/ipath/ipath_ht400.c --- a/drivers/infiniband/hw/ipath/ipath_ht400.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_ht400.c Tue May 23 11:29:16 2006 -0700 @@ -607,7 +607,12 @@ static int ipath_ht_boardname(struct ipa case 4: /* Ponderosa is one of the bringup boards */ n = "Ponderosa"; break; - case 5: /* HT-460 original production board */ + case 5: + /* + * HT-460 original production board; two production levels, with + * different serial number ranges. See ipath_ht_early_init() for + * case where we enable IPATH_GPIO_INTR for later serial # range. 
+ */ n = "InfiniPath_HT-460"; break; case 6: @@ -642,7 +647,7 @@ static int ipath_ht_boardname(struct ipa if (n) snprintf(name, namelen, "%s", n); - if (dd->ipath_majrev != 3 || dd->ipath_minrev != 2) { + if (dd->ipath_majrev != 3 || (dd->ipath_minrev < 2 || dd->ipath_minrev > 3)) { /* * This version of the driver only supports the HT-400 * Rev 3.2 @@ -1520,6 +1525,18 @@ static int ipath_ht_early_init(struct ip */ ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, INFINIPATH_S_ABORT); + + ipath_get_eeprom_info(dd); + if(dd->ipath_boardrev == 5 && dd->ipath_serial[0] == '1' && + dd->ipath_serial[1] == '2' && dd->ipath_serial[2] == '8') { + /* + * Later production HT-460 has same changes as HT-465, so + * can use GPIO interrupts. They have serial #'s starting + * with 128, rather than 112. + */ + dd->ipath_flags |= IPATH_GPIO_INTR; + dd->ipath_flags &= ~IPATH_POLL_RX_INTR; + } return 0; } diff -r 6bf52c0f0f0d -r 5d7e365286b3 drivers/infiniband/hw/ipath/ipath_init_chip.c --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Tue May 23 11:29:16 2006 -0700 @@ -879,7 +879,6 @@ int ipath_init_chip(struct ipath_devdata done: if (!ret) { - ipath_get_guid(dd); *dd->ipath_statusp |= IPATH_STATUS_CHIP_PRESENT; if (!dd->ipath_f_intrsetup(dd)) { /* now we can enable all interrupts from the chip */ diff -r 6bf52c0f0f0d -r 5d7e365286b3 drivers/infiniband/hw/ipath/ipath_kernel.h --- a/drivers/infiniband/hw/ipath/ipath_kernel.h Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Tue May 23 11:29:16 2006 -0700 @@ -650,7 +650,7 @@ void ipath_init_pe800_funcs(struct ipath void ipath_init_pe800_funcs(struct ipath_devdata *); /* init HT-400-specific func */ void ipath_init_ht400_funcs(struct ipath_devdata *); -void ipath_get_guid(struct ipath_devdata *); +void ipath_get_eeprom_info(struct ipath_devdata *); u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg); /* diff -r 
6bf52c0f0f0d -r 5d7e365286b3 drivers/infiniband/hw/ipath/ipath_pe800.c --- a/drivers/infiniband/hw/ipath/ipath_pe800.c Tue May 23 11:29:15 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_pe800.c Tue May 23 11:29:16 2006 -0700 @@ -1180,6 +1180,8 @@ static int ipath_pe_early_init(struct ip */ dd->ipath_rhdrhead_intr_off = 1ULL<<32; + ipath_get_eeprom_info(dd); + return 0; } From bos at pathscale.com Tue May 23 11:32:37 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 11:32:37 -0700 Subject: [openib-general] [PATCH 9 of 10] ipath - fix null deref during rdma ops In-Reply-To: Message-ID: <3d844dee2f612417bdb8.1148409157@eng-12.pathscale.com> The problem was that node A's sending thread, which handles sending RDMA read response data, would write the trigger word, the last packet would be sent, node B would send a new RDMA read request, node A's interrupt handler would initialize s_rdma_sge, then node A's sending thread would update s_rdma_sge. This didn't happen very often naturally but was more frequent with 1 byte RDMA reads. Rather than adding more locking or increasing the QP structure size and copying sge data, I modified the copy routine to update the pointers before writing the trigger word to avoid the update race. Signed-off-by: Ralph Campbell Signed-off-by: Bryan O'Sullivan diff -r 551717ecc3db -r 3d844dee2f61 drivers/infiniband/hw/ipath/ipath_layer.c --- a/drivers/infiniband/hw/ipath/ipath_layer.c Tue May 23 11:29:16 2006 -0700 +++ b/drivers/infiniband/hw/ipath/ipath_layer.c Tue May 23 11:29:16 2006 -0700 @@ -872,12 +872,13 @@ static void copy_io(u32 __iomem *piobuf, update_sge(ss, len); length -= len; } + /* Update address before sending packet. 
*/ + update_sge(ss, length); /* must flush early everything before trigger word */ ipath_flush_wc(); __raw_writel(last, piobuf); /* be sure trigger word is written */ ipath_flush_wc(); - update_sge(ss, length); } /** @@ -943,17 +944,18 @@ int ipath_verbs_send(struct ipath_devdat if (likely(ss->num_sge == 1 && len <= ss->sge.length && !((unsigned long)ss->sge.vaddr & (sizeof(u32) - 1)))) { u32 w; - + u32 *addr = (u32 *) ss->sge.vaddr; + + /* Update address before sending packet. */ + update_sge(ss, len); /* Need to round up for the last dword in the packet. */ w = (len + 3) >> 2; - __iowrite32_copy(piobuf, ss->sge.vaddr, w - 1); + __iowrite32_copy(piobuf, addr, w - 1); /* must flush early everything before trigger word */ ipath_flush_wc(); - __raw_writel(((u32 *) ss->sge.vaddr)[w - 1], - piobuf + w - 1); + __raw_writel(addr[w - 1], piobuf + w - 1); /* be sure trigger word is written */ ipath_flush_wc(); - update_sge(ss, len); ret = 0; goto bail; } From sean.hefty at intel.com Tue May 23 12:12:53 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 23 May 2006 12:12:53 -0700 Subject: [openib-general] [PATCH] git rdma_cm: update CMA in for-2.6.18 branch Message-ID: This patch should bring the RDMA CM in the 2.6.18 branch up to the tip. 
Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 0e6e4d6..b798f77 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include #include @@ -82,6 +83,7 @@ enum cma_state { CMA_ROUTE_QUERY, CMA_ROUTE_RESOLVED, CMA_CONNECT, + CMA_DISCONNECT, CMA_ADDR_BOUND, CMA_LISTEN, CMA_DEVICE_REMOVAL, @@ -767,10 +769,10 @@ static int cma_ib_handler(struct ib_cm_i u8 private_data_len = 0; int ret = 0, status = 0; + atomic_inc(&id_priv->dev_remove); if (!cma_comp(id_priv, CMA_CONNECT)) - return 0; + goto out; - atomic_inc(&id_priv->dev_remove); switch (ib_event->event) { case IB_CM_REQ_ERROR: case IB_CM_REP_ERROR: @@ -781,7 +783,7 @@ static int cma_ib_handler(struct ib_cm_i status = cma_verify_rep(id_priv, ib_event->private_data); if (status) event = RDMA_CM_EVENT_CONNECT_ERROR; - else if (id_priv->id.qp) { + else if (id_priv->id.qp && id_priv->id.ps != RDMA_PS_SDP) { status = cma_rep_recv(id_priv); event = status ? 
RDMA_CM_EVENT_CONNECT_ERROR : RDMA_CM_EVENT_ESTABLISHED; @@ -798,6 +800,8 @@ static int cma_ib_handler(struct ib_cm_i status = -ETIMEDOUT; /* fall through */ case IB_CM_DREQ_RECEIVED: case IB_CM_DREP_RECEIVED: + if (!cma_comp_exch(id_priv, CMA_CONNECT, CMA_DISCONNECT)) + goto out; event = RDMA_CM_EVENT_DISCONNECTED; break; case IB_CM_TIMEWAIT_EXIT: @@ -1201,6 +1205,30 @@ err1: return ret; } +int rdma_set_ib_paths(struct rdma_cm_id *id, + struct ib_sa_path_rec *path_rec, int num_paths) +{ + struct rdma_id_private *id_priv; + int ret; + + id_priv = container_of(id, struct rdma_id_private, id); + if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ROUTE_RESOLVED)) + return -EINVAL; + + id->route.path_rec = kmalloc(sizeof *path_rec * num_paths, GFP_KERNEL); + if (!id->route.path_rec) { + ret = -ENOMEM; + goto err; + } + + memcpy(id->route.path_rec, path_rec, sizeof *path_rec * num_paths); + return 0; +err: + cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_ADDR_RESOLVED); + return ret; +} +EXPORT_SYMBOL(rdma_set_ib_paths); + int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) { struct rdma_id_private *id_priv; @@ -1734,7 +1762,8 @@ int rdma_disconnect(struct rdma_cm_id *i int ret; id_priv = container_of(id, struct rdma_id_private, id); - if (!cma_comp(id_priv, CMA_CONNECT)) + if (!cma_comp(id_priv, CMA_CONNECT) && + !cma_comp(id_priv, CMA_DISCONNECT)) return -EINVAL; ret = cma_modify_qp_err(id); diff --git a/include/rdma/rdma_cm_ib.h b/include/rdma/rdma_cm_ib.h new file mode 100644 index 0000000..e8c3af1 --- /dev/null +++ b/include/rdma/rdma_cm_ib.h @@ -0,0 +1,47 @@ +/* + * Copyright (c) 2006 Intel Corporation. All rights reserved. + * + * This Software is licensed under one of the following licenses: + * + * 1) under the terms of the "Common Public License 1.0" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/cpl.php. 
+ * + * 2) under the terms of the "The BSD License" a copy of which is + * available from the Open Source Initiative, see + * http://www.opensource.org/licenses/bsd-license.php. + * + * 3) under the terms of the "GNU General Public License (GPL) Version 2" a + * copy of which is available from the Open Source Initiative, see + * http://www.opensource.org/licenses/gpl-license.php. + * + * Licensee has the right to choose one of the above licenses. + * + * Redistributions of source code must retain the above copyright + * notice and one of the license notices. + * + * Redistributions in binary form must reproduce both the above copyright + * notice, one of the license notices in the documentation + * and/or other materials provided with the distribution. + * + */ + +#if !defined(RDMA_CM_IB_H) +#define RDMA_CM_IB_H + +#include + +/** + * rdma_set_ib_paths - Manually sets the path records used to establish a + * connection. + * @id: Connection identifier associated with the request. + * @path_rec: Reference to the path record + * + * This call permits a user to specify routing information for rdma_cm_id's + * bound to Infiniband devices. It is called on the client side of a + * connection and replaces the call to rdma_resolve_route. + */ +int rdma_set_ib_paths(struct rdma_cm_id *id, + struct ib_sa_path_rec *path_rec, int num_paths); + +#endif /* RDMA_CM_IB_H */ From paul.lundin at gmail.com Tue May 23 12:34:09 2006 From: paul.lundin at gmail.com (Paul Lundin) Date: Tue, 23 May 2006 15:34:09 -0400 Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o Message-ID: Hi All, I just started working with openIB in the past week. I am having an issue getting the kernel modules to compile with the stock rhel4 u3 kernel. 
I have applied the patches found at https://openib.org/svn/gen2/branches/backport/2.6.9_U3/ and followed the instructions from https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet but I have been getting the following error:

  LD      /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/built-in.o
  LD      /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/built-in.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/index.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/addr.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.o
/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c: In function `ib_cm_cleanup':
/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c:3367: warning: implicit declaration of function `idr_destroy'
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/packer.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/ud_header.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/verbs.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.o
/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: error: unknown field `uevent' specified in initializer
/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: warning: initialization from incompatible pointer type
make[2]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.o] Error 1
make[1]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core] Error 2
make: *** [_module_/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband] Error 2
make: Leaving directory `/usr/src/kernels/2.6.9-34.EL-ppc64'

Any help would be appreciated. As noted this is on a ppc64 machine. The rhel4 u3 install does *NOT* configure openIB by default like it does on intel architectures. I was wondering if openIB has been tested at all on ppc64 and if this was even possible at this point.

Regards.
Paul

From mst at mellanox.co.il Tue May 23 12:36:16 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 23 May 2006 22:36:16 +0300
Subject: [openib-general] Re: Plans for libibverbs 1.1
In-Reply-To:
References: <20060523173047.GC3377@mellanox.co.il>
Message-ID: <20060523193616.GD3377@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: Plans for libibverbs 1.1
>
> Michael> Might some place under branches be better?
>
> Someplace under git would be better of course ;)
>
> But I don't think I want to move the libibverbs-1.0 tree too far out
> of the way. After all it will still be required for low-level driver
> libraries that haven't been converted yet.

I'm playing with git-svn a bit - seems to work so far. Is it possible to use it to develop in git and keep svn in sync?

--
MST

From sweitzen at cisco.com Tue May 23 12:38:56 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 23 May 2006 12:38:56 -0700
Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o
Message-ID:

OFED 1.0 rc4 does compile and run on RHEL4 U3 ppc64.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

________________________________

From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Paul Lundin
Sent: Tuesday, May 23, 2006 12:34 PM
To: openib-general at openib.org
Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o

Hi All,
I just started working with openIB in the past week. I am having an issue getting the kernel modules to compile with the stock rhel4 u3 kernel.
I have applied the patches found at https://openib.org/svn/gen2/branches/backport/2.6.9_U3/ and followed the instructions from https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet but I have been getting the following error:

  LD      /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/built-in.o
  LD      /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/built-in.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/index.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/addr.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.o
/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c: In function `ib_cm_cleanup':
/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c:3367: warning: implicit declaration of function `idr_destroy'
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/packer.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/ud_header.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/verbs.o
  CC [M]  /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.o
/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: error: unknown field `uevent' specified in initializer
/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: warning: initialization from incompatible pointer type
make[2]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.o] Error 1
make[1]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core] Error 2
make: *** [_module_/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband] Error 2
make: Leaving directory `/usr/src/kernels/2.6.9-34.EL-ppc64'

Any help would be appreciated. As noted this is on a ppc64 machine. The rhel4 u3 install does *NOT* configure openIB by default like it does on intel architectures. I was wondering if openIB has been tested at all on ppc64 and if this was even possible at this point.

Regards.
Paul
From paul.lundin at gmail.com Tue May 23 12:42:24 2006 From: paul.lundin at gmail.com (Paul) Date: Tue, 23 May 2006 15:42:24 -0400 Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o In-Reply-To: References: Message-ID: Scott, Thanks for the confirmation and the quick reply. Any ideas as to what might be causing the error in question ? Regards. On 5/23/06, Scott Weitzenkamp (sweitzen) wrote: > > OFED 1.0 rc4 does compile and run on RHEL4 U3 ppc64. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > ------------------------------ > *From:* openib-general-bounces at openib.org [mailto: > openib-general-bounces at openib.org] *On Behalf Of *Paul Lundin > *Sent:* Tuesday, May 23, 2006 12:34 PM > *To:* openib-general at openib.org > *Subject:* [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o > > Hi All, > I just started working with openIB in the past week. I am having an > issue getting the kernel modules to compile with the stock rhel4 u3 kernel.
> I have applied the patches found at > https://openib.org/svn/gen2/branches/backport/2.6.9_U3/ and followed the > instructions from https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet > but I have been getting the following error: > > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/built-in.o > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/built-in.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/index.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/addr.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.o > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c: In > function `ib_cm_cleanup': > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c:3367: > warning: implicit declaration of function `idr_destroy' > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64 > /drivers/infiniband/core/packer.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64 > /drivers/infiniband/core/ud_header.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/verbs.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.o > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: > error: unknown field `uevent' specified in initializer > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: > warning: initialization from incompatible pointer type > make[2]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.o] > Error 1 > make[1]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core] > Error 2 > make: *** [_module_/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband] > Error 2 > make: Leaving directory `/usr/src/kernels/2.6.9-34.EL-ppc64' > > Any help would be appreciated. As noted this is on a ppc64 machine. The > rhel4 u3 install does *NOT* configure openIB by default like it does on > intel architectures.
I was wondering if openIB has been tested at all on > ppc64 and if this was even possible at this point. > > Regards. > Paul > > >
From sweitzen at cisco.com Tue May 23 12:43:56 2006 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Tue, 23 May 2006 12:43:56 -0700 Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o Message-ID: No clue, I know if you grab OFED 1.0 rc4 tarball and run install.sh, it should work. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: Paul [mailto:paul.lundin at gmail.com] Sent: Tuesday, May 23, 2006 12:42 PM To: Scott Weitzenkamp (sweitzen) Cc: openib-general at openib.org Subject: Re: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o Scott, Thanks for the confirmation and the quick reply. Any ideas as to what might be causing the error in question ? Regards. On 5/23/06, Scott Weitzenkamp (sweitzen) wrote: OFED 1.0 rc4 does compile and run on RHEL4 U3 ppc64. Scott Weitzenkamp SQA and Release Manager Server Virtualization Business Unit Cisco Systems ________________________________ From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Paul Lundin Sent: Tuesday, May 23, 2006 12:34 PM To: openib-general at openib.org Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o Hi All, I just started working with openIB in the past week. I am having an issue getting the kernel modules to compile with the stock rhel4 u3 kernel.
I have applied the patches found at https://openib.org/svn/gen2/branches/backport/2.6.9_U3/ and followed the instructions from https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet but I have been getting the following error: LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/built-in.o LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/built-in.o CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/index.o CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/addr.o CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.o /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c: In function `ib_cm_cleanup': /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c:3367: warning: implicit declaration of function `idr_destroy' CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/packer.o CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/ud_header.o CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/verbs.o CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.o /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: error: unknown field `uevent' specified in initializer /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: warning: initialization from incompatible pointer type make[2]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.o] Error 1 make[1]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core] Error 2 make: *** [_module_/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband] Error 2 make: Leaving directory `/usr/src/kernels/2.6.9-34.EL-ppc64' Any help would be appreciated. As noted this is on a ppc64 machine. The rhel4 u3 install does *NOT* configure openIB by default like it does on intel architectures. I was wondering if openIB has been tested at all on ppc64 and if this was even possible at this point. Regards.
Paul
From rdreier at cisco.com Tue May 23 12:49:15 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 12:49:15 -0700 Subject: [openib-general] Re: Plans for libibverbs 1.1 In-Reply-To: <20060523193616.GD3377@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 23 May 2006 22:36:16 +0300") References: <20060523173047.GC3377@mellanox.co.il> <20060523193616.GD3377@mellanox.co.il> Message-ID: Michael> I'm playing with git-svn a bit - seems to work so far. Michael> Is it possible to use it to develop in git and keep svn Michael> in sync? Yes, but obviously what happens in the svn tree cannot be something that can't be represented in svn. - R. From mst at mellanox.co.il Tue May 23 12:55:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 23 May 2006 22:55:54 +0300 Subject: [openib-general] Re: Plans for libibverbs 1.1 In-Reply-To: References: <20060523173047.GC3377@mellanox.co.il> <20060523193616.GD3377@mellanox.co.il> Message-ID: <20060523195554.GF3377@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Plans for libibverbs 1.1 > > Michael> I'm playing with git-svn a bit - seems to work so far. > Michael> Is it possible to use it to develop in git and keep svn > Michael> in sync? > > Yes, but obviously what happens in the svn tree cannot be something > that can't be represented in svn. Couldn't parse this. Could you explain? -- MST From rdreier at cisco.com Tue May 23 13:07:54 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 13:07:54 -0700 Subject: [openib-general] Re: Plans for libibverbs 1.1 In-Reply-To: <20060523195554.GF3377@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 23 May 2006 22:55:54 +0300") References: <20060523173047.GC3377@mellanox.co.il> <20060523193616.GD3377@mellanox.co.il> <20060523195554.GF3377@mellanox.co.il> Message-ID: Michael> Couldn't parse this. Could you explain?
Just that git-svn doesn't really help all that much if the canonical upstream repository is still svn. You can't do anything beyond the svn model of linear history anyway. - R.
From mst at mellanox.co.il Tue May 23 13:09:19 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 23 May 2006 23:09:19 +0300 Subject: [openib-general] Re: Compilation issues on rhel4 u3 ppc64 sysfs.o In-Reply-To: References: Message-ID: <20060523200919.GG3377@mellanox.co.il> Quoting r. Paul Lundin : > Subject: Compilation issues on rhel4 u3 ppc64 sysfs.o > > Hi All, > I just started working with openIB in the past week. I am having an issue getting the kernel modules to compile with the stock rhel4 u3 kernel. I have applied the patches found at https://openib.org/svn/gen2/branches/backport/2.6.9_U3/ and followed the instructions from https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet
Note these instructions are for 2.6.16
> but I have been getting the following error: > > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/built-in.o > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/built-in.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/index.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/addr.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.o > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c: In function `ib_cm_cleanup': > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c:3367: warning: implicit declaration of function `idr_destroy'
After you apply linux_idr_6554_to_2_6_13.patch a file ./include/linux/idr.h will be created for you. You should build with directory ./include/linux first on include list for this to take effect.
something like make -C $(KSRC) SUBDIRS="$(CWD)/$(SRC)linux-kernel/infiniband" \ LINUXINCLUDE='-I$(CWD)/include \ -I$(CWD)/$(SRC)linux-kernel/infiniband/include \ -Iinclude \ $$(if $$(KBUILD_SRC),-Iinclude2 -I$$(srctree)/include)' \ -- MST From mst at mellanox.co.il Tue May 23 13:13:15 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 23 May 2006 23:13:15 +0300 Subject: [openib-general] Re: Plans for libibverbs 1.1 In-Reply-To: References: <20060523173047.GC3377@mellanox.co.il> <20060523193616.GD3377@mellanox.co.il> <20060523195554.GF3377@mellanox.co.il> Message-ID: <20060523201315.GH3377@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Plans for libibverbs 1.1 > > Michael> Couldn't parse this. Could you explain? > > Just that git-svn doesn't really help all that much if the canonical > upstream repository is still svn. You can't do anything beyond the > svn model of linear history anyway. What is meant by canonical repository? I assume development will be done in git and svn will track some git tree since early tasters seem to like it. -- MST From rdreier at cisco.com Tue May 23 13:17:14 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 13:17:14 -0700 Subject: [openib-general] Re: Plans for libibverbs 1.1 In-Reply-To: <20060523201315.GH3377@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 23 May 2006 23:13:15 +0300") References: <20060523173047.GC3377@mellanox.co.il> <20060523193616.GD3377@mellanox.co.il> <20060523195554.GF3377@mellanox.co.il> <20060523201315.GH3377@mellanox.co.il> Message-ID: Michael> What is meant by canonical repository? I assume Michael> development will be done in git and svn will track some Michael> git tree since early tasters seem to like it. Yeah, I guess that could work. But it doesn't really address the issue of where to put libibverbs 1.0 vs. 1.1 in svn. (And I'm not sure git-svn can be used to keep two svn trees in sync from the same git repository -- can it?) - R. 
From mst at mellanox.co.il Tue May 23 13:24:56 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 23 May 2006 23:24:56 +0300 Subject: [openib-general] Re: Plans for libibverbs 1.1 In-Reply-To: References: <20060523173047.GC3377@mellanox.co.il> <20060523193616.GD3377@mellanox.co.il> <20060523195554.GF3377@mellanox.co.il> <20060523201315.GH3377@mellanox.co.il> Message-ID: <20060523202456.GJ3377@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Plans for libibverbs 1.1 > > Michael> What is meant by canonical repository? I assume > Michael> development will be done in git and svn will track some > Michael> git tree since early tasters seem to like it. > > Yeah, I guess that could work. But it doesn't really address the > issue of where to put libibverbs 1.0 vs. 1.1 in svn. Right. > (And I'm not > sure git-svn can be used to keep two svn trees in sync from the same > git repository -- can it?) Assuming development is done in git, won't we just have two git trees - for 1.0 and for 1.1? -- MST From paul.lundin at gmail.com Tue May 23 13:24:20 2006 From: paul.lundin at gmail.com (Paul) Date: Tue, 23 May 2006 16:24:20 -0400 Subject: [openib-general] Re: Compilation issues on rhel4 u3 ppc64 sysfs.o In-Reply-To: <20060523200919.GG3377@mellanox.co.il> References: <20060523200919.GG3377@mellanox.co.il> Message-ID: Michael, Thanks. I will try this if the build scripts from the OFED tarball dont work. (I was unaware that such a thing existed.) Thanks. On 5/23/06, Michael S. Tsirkin wrote: > > Quoting r. Paul Lundin : > > Subject: Compilation issues on rhel4 u3 ppc64 sysfs.o > > > > Hi All, > > I just started working with openIB in the past week. I am having an > issue getting the kernel modules to compile with the stock rhel4 u3 kernel. 
> I have applied the patches found at > https://openib.org/svn/gen2/branches/backport/2.6.9_U3/ and followed the > instructions from > https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet > > Note these instructions are for 2.6.16 > > > but I have been getting the following error: > > > > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/built-in.o > > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/built-in.o > > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64 > /drivers/infiniband/core/index.o > > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/addr.o > > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.o > > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c: In > function `ib_cm_cleanup': > > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c:3367: > warning: implicit declaration of function `idr_destroy' > > After you apply > linux_idr_6554_to_2_6_13.patch > > a file ./include/linux/idr.h will be created for you. > > You should build with directory ./include/linux first on include list > for this to take effect. > > something like > > make -C $(KSRC) SUBDIRS="$(CWD)/$(SRC)linux-kernel/infiniband" \ > LINUXINCLUDE='-I$(CWD)/include \ > -I$(CWD)/$(SRC)linux-kernel/infiniband/include \ > -Iinclude \ > $$(if $$(KBUILD_SRC),-Iinclude2 -I$$(srctree)/include)' \ > > > -- > MST >
From rdreier at cisco.com Tue May 23 13:26:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 13:26:07 -0700 Subject: [openib-general] Re: Plans for libibverbs 1.1 In-Reply-To: <20060523202456.GJ3377@mellanox.co.il> (Michael S.
Tsirkin's message of "Tue, 23 May 2006 23:24:56 +0300") References: <20060523173047.GC3377@mellanox.co.il> <20060523193616.GD3377@mellanox.co.il> <20060523195554.GF3377@mellanox.co.il> <20060523201315.GH3377@mellanox.co.il> <20060523202456.GJ3377@mellanox.co.il> Message-ID: Michael> Assuming development is done in git, won't we just have Michael> two git trees - for 1.0 and for 1.1? I think that misses the point of git. By keeping multiple branches in the same git repository then we have full history information, merging patches to both branches becomes easier, etc, etc. - R. From mst at mellanox.co.il Tue May 23 13:33:39 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 23 May 2006 23:33:39 +0300 Subject: [openib-general] Re: Plans for libibverbs 1.1 In-Reply-To: References: <20060523173047.GC3377@mellanox.co.il> <20060523193616.GD3377@mellanox.co.il> <20060523195554.GF3377@mellanox.co.il> <20060523201315.GH3377@mellanox.co.il> <20060523202456.GJ3377@mellanox.co.il> Message-ID: <20060523203339.GK3377@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [openib-general] Re: Plans for libibverbs 1.1 > > Michael> Assuming development is done in git, won't we just have > Michael> two git trees - for 1.0 and for 1.1? > > I think that misses the point of git. By keeping multiple branches in > the same git repository then we have full history information, merging > patches to both branches becomes easier, etc, etc. Right. So we need 1.0 and 1.1 branches in the same git repository, and export each of these to an svn branch. -- MST From mst at mellanox.co.il Tue May 23 13:34:08 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 23 May 2006 23:34:08 +0300 Subject: [openib-general] Re: Compilation issues on rhel4 u3 ppc64 sysfs.o In-Reply-To: References: <20060523200919.GG3377@mellanox.co.il> Message-ID: <20060523203408.GL3377@mellanox.co.il> They should work, that's what they do internally. Quoting r. 
Paul : Subject: Re: Compilation issues on rhel4 u3 ppc64 sysfs.o Michael, Thanks. I will try this if the build scripts from the OFED tarball dont work. (I was unaware that such a thing existed.) Thanks. On 5/23/06, Michael S. Tsirkin wrote: Quoting r. Paul Lundin : > Subject: Compilation issues on rhel4 u3 ppc64 sysfs.o > > Hi All, > I just started working with openIB in the past week. I am having an issue getting the kernel modules to compile with the stock rhel4 u3 kernel. I have applied the patches found at https://openib.org/svn/gen2/branches/backport/2.6.9_U3/ and followed the instructions from https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet Note these instructions are for 2.6.16 > but I have been getting the following error: > > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/built-in.o > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/built-in.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/index.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/addr.o > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.o > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c: In function `ib_cm_cleanup': > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c:3367: warning: implicit declaration of function `idr_destroy' After you apply linux_idr_6554_to_2_6_13.patch a file ./include/linux/idr.h will be created for you. You should build with directory ./include/linux first on include list for this to take effect.
something like make -C $(KSRC) SUBDIRS="$(CWD)/$(SRC)linux-kernel/infiniband" \ LINUXINCLUDE='-I$(CWD)/include \ -I$(CWD)/$(SRC)linux-kernel/infiniband/include \ -Iinclude \ $$(if $$(KBUILD_SRC),-Iinclude2 -I$$(srctree)/include)' \ -- MST -- MST From rdreier at cisco.com Tue May 23 14:05:14 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 14:05:14 -0700 Subject: [openib-general] [git pull] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This contains fixes for the new ipath driver. The changes are a little bit bigger than I would prefer at this stage in the release cycle, but since the ipath driver is new in 2.6.17, there's no risk of regressions against 2.6.16 ;) Bryan O'Sullivan: IB/ipath: fix spinlock recursion bug IB/ipath: don't modify QP if changes fail IB/ipath: fix reporting of driver version to userspace IB/ipath: replace uses of LIST_POISON IB/ipath: fix NULL dereference during cleanup IB/ipath: enable GPIO interrupt on HT-460 IB/ipath: enable PE800 receive interrupts on user ports IB/ipath: register as IB device owner IB/ipath: fix null deref during rdma ops IB/ipath: deref correct pointer when using kernel SMA drivers/infiniband/hw/ipath/ipath_driver.c | 22 ++++----- drivers/infiniband/hw/ipath/ipath_eeprom.c | 7 +-- drivers/infiniband/hw/ipath/ipath_file_ops.c | 6 ++ drivers/infiniband/hw/ipath/ipath_ht400.c | 21 +++++++- drivers/infiniband/hw/ipath/ipath_init_chip.c | 1 drivers/infiniband/hw/ipath/ipath_kernel.h | 2 - drivers/infiniband/hw/ipath/ipath_keys.c | 6 -- drivers/infiniband/hw/ipath/ipath_layer.c | 12 +++-- drivers/infiniband/hw/ipath/ipath_pe800.c | 2 + drivers/infiniband/hw/ipath/ipath_qp.c | 64 +++++++++++++------------ drivers/infiniband/hw/ipath/ipath_rc.c | 15 +++--- 
drivers/infiniband/hw/ipath/ipath_ruc.c | 2 - drivers/infiniband/hw/ipath/ipath_verbs.c | 7 ++- 13 files changed, 92 insertions(+), 75 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 3697eda..dddcdae 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -1905,19 +1905,19 @@ static void __exit infinipath_cleanup(vo } else ipath_dbg("irq is 0, not doing free_irq " "for unit %u\n", dd->ipath_unit); - dd->pcidev = NULL; - } - /* - * we check for NULL here, because it's outside the kregbase - * check, and we need to call it after the free_irq. Thus - * it's possible that the function pointers were never - * initialized. - */ - if (dd->ipath_f_cleanup) - /* clean up chip-specific stuff */ - dd->ipath_f_cleanup(dd); + /* + * we check for NULL here, because it's outside + * the kregbase check, and we need to call it + * after the free_irq. Thus it's possible that + * the function pointers were never initialized. + */ + if (dd->ipath_f_cleanup) + /* clean up chip-specific stuff */ + dd->ipath_f_cleanup(dd); + dd->pcidev = NULL; + } spin_lock_irqsave(&ipath_devs_lock, flags); } diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c index f11a900..a2f1cea 100644 --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c @@ -505,11 +505,10 @@ static u8 flash_csum(struct ipath_flash * ipath_get_guid - get the GUID from the i2c device * @dd: the infinipath device * - * When we add the multi-chip support, we will probably have to add - * the ability to use the number of guids field, and get the guid from - * the first chip's flash, to use for all of them. + * We have the capability to use the ipath_nguid field, and get + * the guid from the first chip's flash, to use for all of them. 
*/ -void ipath_get_guid(struct ipath_devdata *dd) +void ipath_get_eeprom_info(struct ipath_devdata *dd) { void *buf; struct ipath_flash *ifp; diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index c347191..ada267e 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -139,7 +139,7 @@ static int ipath_get_base_info(struct ip kinfo->spi_piosize = dd->ipath_ibmaxlen; kinfo->spi_mtu = dd->ipath_ibmaxlen; /* maxlen, not ibmtu */ kinfo->spi_port = pd->port_port; - kinfo->spi_sw_version = IPATH_USER_SWVERSION; + kinfo->spi_sw_version = IPATH_KERN_SWVERSION; kinfo->spi_hw_version = dd->ipath_revision; if (copy_to_user(ubase, kinfo, sizeof(*kinfo))) @@ -1224,6 +1224,10 @@ static unsigned int ipath_poll(struct fi if (tail == head) { set_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag); + if(dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */ + (void)ipath_write_ureg(dd, ur_rcvhdrhead, + dd->ipath_rhdrhead_intr_off + | head, pd->port_port); poll_wait(fp, &pd->port_wait, pt); if (test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) { diff --git a/drivers/infiniband/hw/ipath/ipath_ht400.c b/drivers/infiniband/hw/ipath/ipath_ht400.c index 4652435..fac0a2b 100644 --- a/drivers/infiniband/hw/ipath/ipath_ht400.c +++ b/drivers/infiniband/hw/ipath/ipath_ht400.c @@ -607,7 +607,12 @@ static int ipath_ht_boardname(struct ipa case 4: /* Ponderosa is one of the bringup boards */ n = "Ponderosa"; break; - case 5: /* HT-460 original production board */ + case 5: + /* + * HT-460 original production board; two production levels, with + * different serial number ranges. See ipath_ht_early_init() for + * case where we enable IPATH_GPIO_INTR for later serial # range. 
+ */ n = "InfiniPath_HT-460"; break; case 6: @@ -642,7 +647,7 @@ static int ipath_ht_boardname(struct ipa if (n) snprintf(name, namelen, "%s", n); - if (dd->ipath_majrev != 3 || dd->ipath_minrev != 2) { + if (dd->ipath_majrev != 3 || (dd->ipath_minrev < 2 || dd->ipath_minrev > 3)) { /* * This version of the driver only supports the HT-400 * Rev 3.2 @@ -1520,6 +1525,18 @@ static int ipath_ht_early_init(struct ip */ ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, INFINIPATH_S_ABORT); + + ipath_get_eeprom_info(dd); + if(dd->ipath_boardrev == 5 && dd->ipath_serial[0] == '1' && + dd->ipath_serial[1] == '2' && dd->ipath_serial[2] == '8') { + /* + * Later production HT-460 has same changes as HT-465, so + * can use GPIO interrupts. They have serial #'s starting + * with 128, rather than 112. + */ + dd->ipath_flags |= IPATH_GPIO_INTR; + dd->ipath_flags &= ~IPATH_POLL_RX_INTR; + } return 0; } diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 16f640e..dc83250 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -879,7 +879,6 @@ int ipath_init_chip(struct ipath_devdata done: if (!ret) { - ipath_get_guid(dd); *dd->ipath_statusp |= IPATH_STATUS_CHIP_PRESENT; if (!dd->ipath_f_intrsetup(dd)) { /* now we can enable all interrupts from the chip */ diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index e6507f8..5d92d57 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -650,7 +650,7 @@ u32 __iomem *ipath_getpiobuf(struct ipat void ipath_init_pe800_funcs(struct ipath_devdata *); /* init HT-400-specific func */ void ipath_init_ht400_funcs(struct ipath_devdata *); -void ipath_get_guid(struct ipath_devdata *); +void ipath_get_eeprom_info(struct ipath_devdata *); u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg); /* diff --git 
a/drivers/infiniband/hw/ipath/ipath_keys.c b/drivers/infiniband/hw/ipath/ipath_keys.c index aa33b0e..5ae8761 100644 --- a/drivers/infiniband/hw/ipath/ipath_keys.c +++ b/drivers/infiniband/hw/ipath/ipath_keys.c @@ -136,9 +136,7 @@ int ipath_lkey_ok(struct ipath_lkey_tabl ret = 1; goto bail; } - spin_lock(&rkt->lock); mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))]; - spin_unlock(&rkt->lock); if (unlikely(mr == NULL || mr->lkey != sge->lkey)) { ret = 0; goto bail; @@ -184,8 +182,6 @@ bail: * @acc: access flags * * Return 1 if successful, otherwise 0. - * - * The QP r_rq.lock should be held. */ int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss, u32 len, u64 vaddr, u32 rkey, int acc) @@ -196,9 +192,7 @@ int ipath_rkey_ok(struct ipath_ibdev *de size_t off; int ret; - spin_lock(&rkt->lock); mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))]; - spin_unlock(&rkt->lock); if (unlikely(mr == NULL || mr->lkey != rkey)) { ret = 0; goto bail; diff --git a/drivers/infiniband/hw/ipath/ipath_layer.c b/drivers/infiniband/hw/ipath/ipath_layer.c index 9cb5258..9ec4ac7 100644 --- a/drivers/infiniband/hw/ipath/ipath_layer.c +++ b/drivers/infiniband/hw/ipath/ipath_layer.c @@ -872,12 +872,13 @@ static void copy_io(u32 __iomem *piobuf, update_sge(ss, len); length -= len; } + /* Update address before sending packet. */ + update_sge(ss, length); /* must flush early everything before trigger word */ ipath_flush_wc(); __raw_writel(last, piobuf); /* be sure trigger word is written */ ipath_flush_wc(); - update_sge(ss, length); } /** @@ -943,17 +944,18 @@ int ipath_verbs_send(struct ipath_devdat if (likely(ss->num_sge == 1 && len <= ss->sge.length && !((unsigned long)ss->sge.vaddr & (sizeof(u32) - 1)))) { u32 w; + u32 *addr = (u32 *) ss->sge.vaddr; + /* Update address before sending packet. */ + update_sge(ss, len); /* Need to round up for the last dword in the packet. 
*/ w = (len + 3) >> 2; - __iowrite32_copy(piobuf, ss->sge.vaddr, w - 1); + __iowrite32_copy(piobuf, addr, w - 1); /* must flush early everything before trigger word */ ipath_flush_wc(); - __raw_writel(((u32 *) ss->sge.vaddr)[w - 1], - piobuf + w - 1); + __raw_writel(addr[w - 1], piobuf + w - 1); /* be sure trigger word is written */ ipath_flush_wc(); - update_sge(ss, len); ret = 0; goto bail; } diff --git a/drivers/infiniband/hw/ipath/ipath_pe800.c b/drivers/infiniband/hw/ipath/ipath_pe800.c index 6318067..02e8c75 100644 --- a/drivers/infiniband/hw/ipath/ipath_pe800.c +++ b/drivers/infiniband/hw/ipath/ipath_pe800.c @@ -1180,6 +1180,8 @@ static int ipath_pe_early_init(struct ip */ dd->ipath_rhdrhead_intr_off = 1ULL<<32; + ipath_get_eeprom_info(dd); + return 0; } diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c index 1889071..9f8855d 100644 --- a/drivers/infiniband/hw/ipath/ipath_qp.c +++ b/drivers/infiniband/hw/ipath/ipath_qp.c @@ -375,10 +375,10 @@ static void ipath_error_qp(struct ipath_ spin_lock(&dev->pending_lock); /* XXX What if its already removed by the timeout code? 
*/ - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); - if (qp->piowait.next != LIST_POISON1) - list_del(&qp->piowait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); + if (!list_empty(&qp->piowait)) + list_del_init(&qp->piowait); spin_unlock(&dev->pending_lock); wc.status = IB_WC_WR_FLUSH_ERR; @@ -427,6 +427,7 @@ static void ipath_error_qp(struct ipath_ int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask) { + struct ipath_ibdev *dev = to_idev(ibqp->device); struct ipath_qp *qp = to_iqp(ibqp); enum ib_qp_state cur_state, new_state; unsigned long flags; @@ -443,6 +444,19 @@ int ipath_modify_qp(struct ib_qp *ibqp, attr_mask)) goto inval; + if (attr_mask & IB_QP_AV) + if (attr->ah_attr.dlid == 0 || + attr->ah_attr.dlid >= IPS_MULTICAST_LID_BASE) + goto inval; + + if (attr_mask & IB_QP_PKEY_INDEX) + if (attr->pkey_index >= ipath_layer_get_npkeys(dev->dd)) + goto inval; + + if (attr_mask & IB_QP_MIN_RNR_TIMER) + if (attr->min_rnr_timer > 31) + goto inval; + switch (new_state) { case IB_QPS_RESET: ipath_reset_qp(qp); @@ -457,13 +471,8 @@ int ipath_modify_qp(struct ib_qp *ibqp, } - if (attr_mask & IB_QP_PKEY_INDEX) { - struct ipath_ibdev *dev = to_idev(ibqp->device); - - if (attr->pkey_index >= ipath_layer_get_npkeys(dev->dd)) - goto inval; + if (attr_mask & IB_QP_PKEY_INDEX) qp->s_pkey_index = attr->pkey_index; - } if (attr_mask & IB_QP_DEST_QPN) qp->remote_qpn = attr->dest_qp_num; @@ -479,12 +488,8 @@ int ipath_modify_qp(struct ib_qp *ibqp, if (attr_mask & IB_QP_ACCESS_FLAGS) qp->qp_access_flags = attr->qp_access_flags; - if (attr_mask & IB_QP_AV) { - if (attr->ah_attr.dlid == 0 || - attr->ah_attr.dlid >= IPS_MULTICAST_LID_BASE) - goto inval; + if (attr_mask & IB_QP_AV) qp->remote_ah_attr = attr->ah_attr; - } if (attr_mask & IB_QP_PATH_MTU) qp->path_mtu = attr->path_mtu; @@ -499,11 +504,8 @@ int ipath_modify_qp(struct ib_qp *ibqp, qp->s_rnr_retry_cnt = qp->s_rnr_retry; } - if (attr_mask & 
IB_QP_MIN_RNR_TIMER) { - if (attr->min_rnr_timer > 31) - goto inval; + if (attr_mask & IB_QP_MIN_RNR_TIMER) qp->s_min_rnr_timer = attr->min_rnr_timer; - } if (attr_mask & IB_QP_QKEY) qp->qkey = attr->qkey; @@ -710,10 +712,8 @@ struct ib_qp *ipath_create_qp(struct ib_ init_attr->qp_type == IB_QPT_RC ? ipath_do_rc_send : ipath_do_uc_send, (unsigned long)qp); - qp->piowait.next = LIST_POISON1; - qp->piowait.prev = LIST_POISON2; - qp->timerwait.next = LIST_POISON1; - qp->timerwait.prev = LIST_POISON2; + INIT_LIST_HEAD(&qp->piowait); + INIT_LIST_HEAD(&qp->timerwait); qp->state = IB_QPS_RESET; qp->s_wq = swq; qp->s_size = init_attr->cap.max_send_wr + 1; @@ -734,7 +734,7 @@ struct ib_qp *ipath_create_qp(struct ib_ ipath_reset_qp(qp); /* Tell the core driver that the kernel SMA is present. */ - if (qp->ibqp.qp_type == IB_QPT_SMI) + if (init_attr->qp_type == IB_QPT_SMI) ipath_layer_set_verbs_flags(dev->dd, IPATH_VERBS_KERNEL_SMA); break; @@ -783,10 +783,10 @@ int ipath_destroy_qp(struct ib_qp *ibqp) /* Make sure the QP isn't on the timeout list. */ spin_lock_irqsave(&dev->pending_lock, flags); - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); - if (qp->piowait.next != LIST_POISON1) - list_del(&qp->piowait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); + if (!list_empty(&qp->piowait)) + list_del_init(&qp->piowait); spin_unlock_irqrestore(&dev->pending_lock, flags); /* @@ -855,10 +855,10 @@ void ipath_sqerror_qp(struct ipath_qp *q spin_lock(&dev->pending_lock); /* XXX What if its already removed by the timeout code? 
*/ - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); - if (qp->piowait.next != LIST_POISON1) - list_del(&qp->piowait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); + if (!list_empty(&qp->piowait)) + list_del_init(&qp->piowait); spin_unlock(&dev->pending_lock); ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1); diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index a4055ca..493b182 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -57,7 +57,7 @@ static void ipath_init_restart(struct ip qp->s_len = wqe->length - len; dev = to_idev(qp->ibqp.device); spin_lock(&dev->pending_lock); - if (qp->timerwait.next == LIST_POISON1) + if (list_empty(&qp->timerwait)) list_add_tail(&qp->timerwait, &dev->pending[dev->pending_index]); spin_unlock(&dev->pending_lock); @@ -356,7 +356,7 @@ static inline int ipath_make_rc_req(stru if ((int)(qp->s_psn - qp->s_next_psn) > 0) qp->s_next_psn = qp->s_psn; spin_lock(&dev->pending_lock); - if (qp->timerwait.next == LIST_POISON1) + if (list_empty(&qp->timerwait)) list_add_tail(&qp->timerwait, &dev->pending[dev->pending_index]); spin_unlock(&dev->pending_lock); @@ -726,8 +726,8 @@ void ipath_restart_rc(struct ipath_qp *q */ dev = to_idev(qp->ibqp.device); spin_lock(&dev->pending_lock); - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); spin_unlock(&dev->pending_lock); if (wqe->wr.opcode == IB_WR_RDMA_READ) @@ -886,8 +886,8 @@ static int do_rc_ack(struct ipath_qp *qp * just won't find anything to restart if we ACK everything. 
*/ spin_lock(&dev->pending_lock); - if (qp->timerwait.next != LIST_POISON1) - list_del(&qp->timerwait); + if (!list_empty(&qp->timerwait)) + list_del_init(&qp->timerwait); spin_unlock(&dev->pending_lock); /* @@ -1194,8 +1194,7 @@ static inline void ipath_rc_rcv_resp(str IB_WR_RDMA_READ)) goto ack_done; spin_lock(&dev->pending_lock); - if (qp->s_rnr_timeout == 0 && - qp->timerwait.next != LIST_POISON1) + if (qp->s_rnr_timeout == 0 && !list_empty(&qp->timerwait)) list_move_tail(&qp->timerwait, &dev->pending[dev->pending_index]); spin_unlock(&dev->pending_lock); diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index eb81424..d38f4f3 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -435,7 +435,7 @@ void ipath_no_bufs_available(struct ipat unsigned long flags; spin_lock_irqsave(&dev->pending_lock, flags); - if (qp->piowait.next == LIST_POISON1) + if (list_empty(&qp->piowait)) list_add_tail(&qp->piowait, &dev->piowait); spin_unlock_irqrestore(&dev->pending_lock, flags); /* diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index cb9e387..28fdbda 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -464,7 +464,7 @@ static void ipath_ib_timer(void *arg) last = &dev->pending[dev->pending_index]; while (!list_empty(last)) { qp = list_entry(last->next, struct ipath_qp, timerwait); - list_del(&qp->timerwait); + list_del_init(&qp->timerwait); qp->timer_next = resend; resend = qp; atomic_inc(&qp->refcount); @@ -474,7 +474,7 @@ static void ipath_ib_timer(void *arg) qp = list_entry(last->next, struct ipath_qp, timerwait); if (--qp->s_rnr_timeout == 0) { do { - list_del(&qp->timerwait); + list_del_init(&qp->timerwait); tasklet_hi_schedule(&qp->s_task); if (list_empty(last)) break; @@ -554,7 +554,7 @@ static int ipath_ib_piobufavail(void *ar while (!list_empty(&dev->piowait)) { qp = 
list_entry(dev->piowait.next, struct ipath_qp, piowait); - list_del(&qp->piowait); + list_del_init(&qp->piowait); tasklet_hi_schedule(&qp->s_task); } spin_unlock_irqrestore(&dev->pending_lock, flags); @@ -951,6 +951,7 @@ static void *ipath_register_ib_device(in idev->dd = dd; strlcpy(dev->name, "ipath%d", IB_DEVICE_NAME_MAX); + dev->owner = THIS_MODULE; dev->node_guid = ipath_layer_get_guid(dd); dev->uverbs_abi_ver = IPATH_UVERBS_ABI_VERSION; dev->uverbs_cmd_mask = From rdreier at cisco.com Tue May 23 14:09:31 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 May 2006 14:09:31 -0700 Subject: [openib-general] Re: [PATCH 1 of 10] ipath - fix spinlock recursion bug In-Reply-To: (Bryan O'Sullivan's message of "Tue, 23 May 2006 11:32:29 -0700") References: Message-ID: Thanks, I've put 1 through 10 into my git tree and asked Linus to pull. BTW, I just tried SRP with 2.6.17-rc4 + my for-2.6.18 tree + all of these patches, and immediately after connecting to a storage target I get the following: Kernel BUG at drivers/infiniband/hw/ipath/ipath_layer.c:761 invalid opcode: 0000 [1] SMP CPU 0 Modules linked in: ib_srp ib_cm ib_sa ib_mad ib_ipath ib_core ipv6 thermal fan button processor ac battery nfs lockd nfs_acl sunrpc dm_mod ide_generic ide_disk ide_cd cdrom e1000 amd74xx shpchp generic pci_hotplug ipath_core parport_pc parport psmouse ohci_hcd ide_core ehci_hcd serio_raw pcspkr Pid: 3623, comm: udevd Not tainted 2.6.17-rc4 #4 RIP: 0010:[] {:ipath_core:ipath_verbs_send+364} RSP: 0000:ffffffff804e1e28 EFLAGS: 00010246 RAX: ffff8101a02b4148 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff8101a02b4148 RSI: ffff8101a02b40b0 RDI: ffffc200001c8020 RBP: 0000000000000000 R08: ffff8101a02b4148 R09: 0000000000000002 R10: ffff8101a02b41b0 R11: 000000021e056480 R12: 0000000000000000 R13: ffffc200001c8020 R14: 0000000000000040 R15: 0000000000000010 FS: 00002b8024bbcae0(0000) GS:ffffffff80537000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 
000000008005003b CR2: 00002b8024513350 CR3: 00000001a0075000 CR4: 00000000000006e0 Process udevd (pid: 3623, threadinfo ffff81021cc8e000, task ffff8101a02b8770) Stack: 0000000000000092 ffff8101a02b4148 ffff8101a02b4098 0000000016000003 ffff8101a02b4000 000000000000000a 0000000000000000 0000000000000014 ffff8101a02b40a0 ffffffff881e29ec Call Trace: {:ib_ipath:ipath_do_rc_send+384} {:ib_srp:srp_completion+631} {tasklet_hi_action+96} {__do_softirq+86} {call_softirq+30} {do_softirq+44} {do_IRQ+65} {ret_from_intr+0} {__handle_mm_fault+1278} {do_page_fault+796} {do_page_fault+767} {datagram_poll+0} {error_exit+0} Code: 0f 0b 68 22 e9 07 88 c2 f9 02 eb 07 44 39 f5 41 0f 47 ee 48 RIP {:ipath_core:ipath_verbs_send+364} RSP From bos at pathscale.com Tue May 23 14:26:51 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Tue, 23 May 2006 14:26:51 -0700 Subject: [openib-general] Re: [PATCH 1 of 10] ipath - fix spinlock recursion bug In-Reply-To: References: Message-ID: <1148419611.22550.11.camel@chalcedony.pathscale.com> On Tue, 2006-05-23 at 14:09 -0700, Roland Dreier wrote: > Thanks, I've put 1 through 10 into my git tree and asked Linus to pull. Thanks. > BTW, I just tried SRP with 2.6.17-rc4 + my for-2.6.18 tree + all of > these patches, and immediately after connecting to a storage target I > get the following: Yes, I have another large pile of fixes to sort out. Unfortunately, all of them depend on some "code motion" driver changes that, in an ideal world, should be deferred until 2.6.18. Regenerating and testing them against 2.6.17, without the code motion, is a big pain in the big painful body region. How do you feel about taking one code motion patch for 2.6.17? :-) (Bryan O'Sullivan's message of "Tue, 23 May 2006 14:26:51 -0700") References: <1148419611.22550.11.camel@chalcedony.pathscale.com> Message-ID: Bryan> How do you feel about taking one code motion patch for Bryan> 2.6.17? :-) It's probably OK as long as it's pure code motion. 
In other words separate the actual fixes from moving code around. What I want to avoid is the giant combo patch that does several different things, because if someone later bisects a regression back to that patch, we're kind of screwed... - R. From panda at cse.ohio-state.edu Tue May 23 15:29:16 2006 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Tue, 23 May 2006 18:29:16 -0400 (EDT) Subject: [openib-general] Announcing the Release of MVAPICH2 0.9.3 with multi-threading support and anonymous SVN access In-Reply-To: from "Roland Dreier" at May 22, 2006 09:00:33 AM Message-ID: <200605232229.k4NMTGjU005974@xi.cse.ohio-state.edu> Roland, Thanks for your question. First of all, the two uDAPL implementations are different. MVAPICH2 code is mostly same for both these implementations. Also, these numbers have been taken on two different platforms. Even though the processor speed is the same on both platforms, other components (incuding the HCAs) might have some impact. This issue needs some further in-depth study while trying to run both Linux and Solaris on the the same platform with the same set of HCAs. If some people in the community would like to carry out this in-depth study, we will be happy to extend help. Thanks, DK > > - Solaris uDAPL/IBTL on Opteron with PCI-Ex and IBA-SDR: > > Two-sided operations: > > - 5.41 microsec one-way latency (4 bytes) > > > - OpenIB/Gen2 uDAPL on Opteron with PCI-Ex and IBA-SDR: > > Two-sided operations: > > - 3.61 microsec one-way latency (4 bytes) > > Just out of curiousity, do you have any idea why Solaris uDAPL does so > much worse than Linux uDAPL on the same hardware? > > - R. 
> From hycsw at ca.sandia.gov Tue May 23 16:19:25 2006 From: hycsw at ca.sandia.gov (helen chen) Date: 23 May 2006 16:19:25 -0700 Subject: [openib-general] NFS/RDMA for Linux: client and server update release 5 In-Reply-To: <7.0.1.0.2.20060522161137.04202e30@netapp.com> References: <7.0.1.0.2.20060522161137.04202e30@netapp.com> Message-ID: <1148426365.1575.10.camel@shuttle> Hi Tom, I have downloaded your release 5 of the NFS/RDMA and am having trouble mounting the rdma nfs, the "./nfsrdmamount -o rdma on16-ib:/mnt/rdma /mnt/rdma" command never returned. and the dmesg for client and server are: ------ demsg from client ----- RPCRDMA Module Init, register RPC RDMA transport Defaults: MaxRequests 50 MaxInlineRead 1024 MaxInlineWrite 1024 Padding 0 Memreg 5 RPC: Registered rdma transport module. RPC: Registered rdma transport module. RPC: xprt_setup_rdma: 140.221.134.221:2049 nfs: server on16-ib not responding, timed out Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: [<0000000000000000>] PGD a9f2b067 PUD a8ca2067 PMD 0 Oops: 0010 [1] PREEMPT SMP CPU 1 Modules linked in: xprtrdma ib_srp iscsi_tcp scsi_transport_iscsi scsi_mod Pid: 346, comm: ib_cm/1 Not tainted 2.6.16.16 #4 RIP: 0010:[<0000000000000000>] [<0000000000000000>] RSP: 0018:ffff8100af5a1c30 EFLAGS: 00010246 RAX: ffff8100aeff2400 RBX: ffff8100aeff2400 RCX: ffff8100afc9e458 RDX: 0000000000000000 RSI: ffff8100af5a1d48 RDI: ffff8100aeff2440 RBP: ffff8100aeff2440 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000000 R12: ffff8100aeff2500 R13: 00000000ffffff99 R14: ffff8100af5a1d48 R15: ffffffff8036c72c FS: 0000000000505ae0(0000) GS:ffff810003ce25c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000ad587000 CR4: 00000000000006a0 Process ib_cm/1 (pid: 346, threadinfo ffff8100af5a0000, task ffff8100afea8100) Stack: ffffffff8802a331 ffff8100aeff2500 0000000000000001 ffff8100aeff2440 
ffffffff804011fd 0000000000000000 ffffffff8802a343 ffff8100afdd6100 ffffffff80364ee4 0000000000000100 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: Bad RIP value. RIP [<0000000000000000>] RSP CR2: 0000000000000000 ------dmesg from server ------ nfsd: request from insecure port 140.221.134.220, port=32768! svc_rdma_recvfrom: transport ffff81007e8f2800 is closing svc_rdma_put: Destroying transport ffff81007e8f2800, cm_id=ffff81007e945200, sk_flags=154, sk_inuse=0 Did I forget to configure necessary components into my kernel? Thanks, Helen On Mon, 2006-05-22 at 13:25, Talpey, Thomas wrote: > Network Appliance is pleased to announce release 5 of the NFS/RDMA > client and server for Linux 2.6.16.16. This update to the April 19 release > adds improved server parallel performance and fixes various issues. This > code supports both Infiniband and iWARP transports. > > > > > > Comments and feedback welcome. We're especially interested in > successful test reports! Thanks. > > Tom Talpey, for the various NFS/RDMA projects. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From xma at us.ibm.com Tue May 23 16:31:48 2006 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 23 May 2006 16:31:48 -0700 Subject: [openib-general] different send and receive CQs In-Reply-To: <200605231821.k4NILlRT032327@robert.bartonsoftware.com> Message-ID: Eric, I have no problem with splitting CQ, you can refer my IPoIB splitting CQ patch. Could you share your code here so we can give you some suggestions? Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... 
From devesh28 at gmail.com Tue May 23 21:29:21 2006
From: devesh28 at gmail.com (Devesh Sharma)
Date: Wed, 24 May 2006 09:59:21 +0530
Subject: [openib-general] krping test utility
Message-ID: <309a667c0605232129x66d8cc5ek1bd05d22c7e1db7@mail.gmail.com>

Hello all,

In the krping test utility, get_dma_mr is called with access permissions
IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE|IB_ACCESS_REMOTE_READ.
But the lkey we get from get_dma_mr is similar to the reserved lkey, with
which only local operations are allowed, so this seems to violate that
restriction.

From ogerlitz at voltaire.com Tue May 23 23:08:52 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 24 May 2006 09:08:52 +0300
Subject: [openib-general] different send and receive CQs
In-Reply-To: <200605231821.k4NILlRT032327@robert.bartonsoftware.com>
References: <200605231821.k4NILlRT032327@robert.bartonsoftware.com>
Message-ID: <4473F874.6010804@voltaire.com>

Eric Barton wrote:
> In my ULP (lustre networking) I maintain a common pool of send descriptors and
> per-connection receive descriptors. So it seems reasonable to have a single CQ
> for all sends and one CQ per-connection for receives.

Please note that since completions on each CQ make the HCA generate an
interrupt, for which a SW handler needs some CPU to run on, multiple CQs
scale up to the number of CPUs in the system. Beyond that your code will
perform quite badly. So it's bad both for the client side (connecting to
multiple OSTs) and for the server side (connecting to $K-order clients).

An easy solution to the issue of a single CQ having RX completions from
multiple connections (QPs) is to have the structure pointed to by the
cookie carry some TAG (pointer) relating it to the relevant connection.
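The cookie-plus-tag idea can be sketched in plain user-space C. This is only an illustration under stated assumptions, not code from lustre or from the verbs layer: `struct conn`, `struct rx_desc`, `struct completion`, `rx_cookie()` and `handle_rx_completion()` are all hypothetical names, and `struct completion` stands in for a real work completion by keeping only a 64-bit `wr_id` cookie.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state. */
struct conn {
    int id;
    unsigned rx_completions;
};

/* Hypothetical receive descriptor. Its address is what gets posted as
 * the work-request cookie, and it carries the TAG back to its owner. */
struct rx_desc {
    struct conn *conn;   /* the tag: which connection owns this buffer */
    char buf[64];
};

/* Simplified stand-in for a work completion: only the cookie is kept. */
struct completion {
    uint64_t wr_id;
};

/* Turn a descriptor pointer into the 64-bit cookie posted with a receive. */
static uint64_t rx_cookie(struct rx_desc *d)
{
    return (uint64_t)(uintptr_t)d;
}

/* Handler for a CQ shared by many connections: recover the descriptor
 * from the cookie, then follow the tag to the owning connection. */
static struct conn *handle_rx_completion(const struct completion *wc)
{
    struct rx_desc *d = (struct rx_desc *)(uintptr_t)wc->wr_id;

    assert(d != NULL && d->conn != NULL);
    d->conn->rx_completions++;
    return d->conn;
}
```

With this layout the shared handler never needs to know how many connections feed the CQ; each completion identifies its own connection.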
some approaches for multiple CQs are: + separate RX and TX CQs + have multiple (#CPUs) threads being waken up by the CQ interrupt handler, where each (takes a lock...,) poll a completion and handles it. Or. From eitan at mellanox.co.il Tue May 23 23:48:13 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 24 May 2006 09:48:13 +0300 Subject: [openib-general] RE: [PATCH] OpenSM/complib: Restore cl_mem* routines as deprecatedrather than removing them altogether Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3023686FE@mtlexch01.mtl.com> OK Thanks. This will give us some time to find and remove all these gracefully. > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, May 23, 2006 7:03 PM > To: openib-general at openib.org > Cc: Eitan Zahavi > Subject: [PATCH] OpenSM/complib: Restore cl_mem* routines as deprecatedrather > than removing them altogether > > OpenSM/complib: Restore cl_mem* routines as deprecated rather than > removing them altogether > > Signed-off-by: Hal Rosenstock > > Note: If this approach is acceptable, I will be doing the same with > cl_malloc, cl_zalloc, cl_free, and friends. > > Index: include/complib/cl_memory.h > =================================================================== > --- include/complib/cl_memory.h (revision 7432) > +++ include/complib/cl_memory.h (working copy) > @@ -436,7 +436,7 @@ cl_malloc( > * environments. > * > * SEE ALSO > -* Memory Management, cl_free, cl_zalloc > +* Memory Management, cl_free, cl_zalloc, cl_memset, cl_memclr, cl_memcpy, > cl_memcmp > **********/ > > > @@ -467,7 +467,7 @@ cl_zalloc( > * environments. 
> * > * SEE ALSO > -* Memory Management, cl_free, cl_malloc > +* Memory Management, cl_free, cl_malloc, cl_memset, cl_memclr, cl_memcpy, > cl_memcmp > **********/ > > > @@ -502,6 +502,142 @@ cl_free( > **********/ > > > +/****f* Public: Memory Management/cl_memset > +* NAME > +* cl_memset > +* > +* DESCRIPTION > +* The cl_memset function sets every byte in a memory range to a given value. > +* > +* SYNOPSIS > +*/ > +void __attribute__((deprecated)) > +cl_memset( > + IN void* const p_memory, > + IN const uint8_t fill, > + IN const size_t count ); > +/* > +* PARAMETERS > +* p_memory > +* [in] Pointer to a memory block. > +* > +* fill > +* [in] Byte value with which to fill the memory. > +* > +* count > +* [in] Number of bytes to set. > +* > +* RETURN VALUE > +* This function does not return a value. > +* > +* SEE ALSO > +* Memory Management, cl_memclr, cl_memcpy, cl_memcmp > +**********/ > + > + > +/****f* Public: Memory Management/cl_memclr > +* NAME > +* cl_memclr > +* > +* DESCRIPTION > +* The cl_memclr function sets every byte in a memory range to zero. > +* > +* SYNOPSIS > +*/ > +static inline void __attribute__((deprecated)) > +cl_memclr( > + IN void* const p_memory, > + IN const size_t count ) > +{ > + memset( p_memory, 0, count ); > +} > +/* > +* PARAMETERS > +* p_memory > +* [in] Pointer to a memory block. > +* > +* count > +* [in] Number of bytes to set. > +* > +* RETURN VALUE > +* This function does not return a value. > +* > +* SEE ALSO > +* Memory Management, cl_memset, cl_memcpy, cl_memcmp > +**********/ > + > + > +/****f* Public: Memory Management/cl_memcpy > +* NAME > +* cl_memcpy > +* > +* DESCRIPTION > +* The cl_memcpy function copies a given number of bytes from > +* one buffer to another. > +* > +* SYNOPSIS > +*/ > +void __attribute__((deprecated)) * > +cl_memcpy( > + IN void* const p_dest, > + IN const void* const p_src, > + IN const size_t count ); > +/* > +* PARAMETERS > +* p_dest > +* [in] Pointer to the buffer being copied to. 
> +* > +* p_src > +* [in] Pointer to the buffer being copied from. > +* > +* count > +* [in] Number of bytes to copy from the source buffer to the > +* destination buffer. > +* > +* RETURN VALUE > +* This function does not return a value. > +* > +* SEE ALSO > +* Memory Management, cl_memset, cl_memclr, cl_memcmp > +**********/ > + > + > +/****f* Public: Memory Management/cl_memcmp > +* NAME > +* cl_memcmp > +* > +* DESCRIPTION > +* The cl_memcmp function compares two memory buffers. > +* > +* SYNOPSIS > +*/ > +int32_t __attribute__((deprecated)) > +cl_memcmp( > + IN const void* const p_mem, > + IN const void* const p_ref, > + IN const size_t count ); > +/* > +* PARAMETERS > +* p_mem > +* [in] Pointer to a memory block being compared. > +* > +* p_ref > +* [in] Pointer to the reference memory block to compare against. > +* > +* count > +* [in] Number of bytes to compare. > +* > +* RETURN VALUES > +* Returns less than zero if p_mem is less than p_ref. > +* > +* Returns greater than zero if p_mem is greater than p_ref. > +* > +* Returns zero if the two memory regions are the identical. 
> +* > +* SEE ALSO > +* Memory Management, cl_memset, cl_memclr, cl_memcpy > +**********/ > + > /****f* Public: Memory Management/cl_get_pagesize > * NAME > * cl_get_pagesize > Index: complib/cl_memory_osd.c > =================================================================== > --- complib/cl_memory_osd.c (revision 7432) > +++ complib/cl_memory_osd.c (working copy) > @@ -69,3 +69,30 @@ __cl_free_priv( > free( p_memory ); > } > > +void > +cl_memset( > + IN void* const p_memory, > + IN const uint8_t fill, > + IN const size_t count ) > +{ > + memset( p_memory, fill, count ); > +} > + > +void* > +cl_memcpy( > + IN void* const p_dest, > + IN const void* const p_src, > + IN const size_t count ) > +{ > + return( memcpy( p_dest, p_src, count ) ); > +} > + > +int32_t > +cl_memcmp( > + IN const void* const p_mem, > + IN const void* const p_ref, > + IN const size_t count ) > +{ > + return( memcmp( p_mem, p_ref, count ) ); > +} > + > Index: complib/libosmcomp.map > =================================================================== > --- complib/libosmcomp.map (revision 7432) > +++ complib/libosmcomp.map (working copy) > @@ -1,4 +1,4 @@ > -OSMCOMP_1.0 { > +OSMCOMP_1.1 { > global: > cl_async_proc_construct; > cl_async_proc_init; > @@ -87,6 +87,9 @@ OSMCOMP_1.0 { > __cl_find_mem; > __cl_free_trk; > __cl_free_ntrk; > + cl_memset; > + cl_memcpy; > + cl_memcmp; > __cl_perf_run_calibration; > __cl_perf_construct; > __cl_perf_init; > Index: complib/libosmcomp.ver > =================================================================== > --- complib/libosmcomp.ver (revision 7432) > +++ complib/libosmcomp.ver (working copy) > @@ -6,4 +6,4 @@ > # API_REV - advance on any added API > # RUNNING_REV - advance any change to the vendor files > # AGE - number of backward versions the API still supports > -LIBVERSION=1:0:0 > +LIBVERSION=1:1:0 > From eeb at bartonsoftware.com Wed May 24 01:28:42 2006 From: eeb at bartonsoftware.com (Eric Barton) Date: Wed, 24 May 2006 09:28:42 +0100 Subject: 
[openib-general] different send and receive CQs In-Reply-To: Message-ID: <00fc01c67f0c$10e97bf0$0281a8c0@ebpc> > Are you not seeing any completions when you poll the CQ, or are you > not getting completion events? Some things to check would be that you > are requesting notification on all the CQs you want events on, Doh! I must have been having a bit of a Homer moment... BTW, Or Gerlitz reckons there is a performance penalty for using multiple CQs. The reason I'm interested in separate CQs is to avoid CQ overflow as I add connections. On other stacks (Voltaire, Cisco, Silverstorm) I size a single CQ large enough for 'n' connections (i.e. cluster size - 1), but that means I have to refuse connections when 'n' have been established. In one stack it also stressed vmalloc() and prevented me from using a single whole-memory mapping. Is there a consensus? Cheers, Eric From ogerlitz at voltaire.com Wed May 24 01:32:12 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 24 May 2006 11:32:12 +0300 Subject: [openib-general] Re: [PATCH 2/2] port the fmr pool to use the max_map_per_fmr device attribute In-Reply-To: References: Message-ID: <44741A0C.3010807@voltaire.com> Roland Dreier wrote: > > + struct ib_device_attr device_attr; > > How big is struct ib_device_attr? I've usually been reluctant to put > this type of thing on the stack to avoid bloating stack usage too > much. Oh, its 168 bytes on x86_64, I will fix it to be allocated dynamically. Or. 
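The stack-versus-heap point above can be sketched as a user-space analog. This is a hedged illustration only: `struct toy_device_attr`, `toy_query_device()` and `toy_get_max_maps()` are hypothetical names, the struct size and the value 32 are made up, and `malloc()`/`free()` stand in for whatever allocator the real patch would use in kernel context.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a large attribute structure that would be
 * wasteful to place on a small kernel stack. */
struct toy_device_attr {
    int max_map_per_fmr;
    char other_fields[160];
};

/* Hypothetical query routine: fills in the attributes, returns 0 on success. */
static int toy_query_device(struct toy_device_attr *attr)
{
    attr->max_map_per_fmr = 32;   /* made-up value */
    return 0;
}

/* Allocate the attribute struct on the heap, copy out the one field the
 * caller needs, and free it again before returning. */
static int toy_get_max_maps(int *out)
{
    struct toy_device_attr *attr = malloc(sizeof(*attr));
    int ret;

    if (!attr)
        return -ENOMEM;

    ret = toy_query_device(attr);
    if (ret == 0)
        *out = attr->max_map_per_fmr;

    free(attr);
    return ret;
}
```

The cost is one extra failure path (allocation failure), which the caller has to handle.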
From ogerlitz at voltaire.com Wed May 24 01:46:19 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 24 May 2006 11:46:19 +0300
Subject: [openib-general] [PATCH 1/2] mthca support for max_map_per_fmr device attribute
In-Reply-To:
References: <4472EDC9.8080706@voltaire.com>
Message-ID: <44741D5B.8090001@voltaire.com>

Roland Dreier wrote:
> Or> When "requests" come fast enough, there's a window in time
> Or> when there's an unmapping of N FMRs running at batch, but out
> Or> of the remaining N FMRs some are already dirty and can't be
> Or> used to serve a credit. So the app fails temporarily... So,
> Or> setting the watermark to 0.5N might solve this, but since
> Or> enlarging the number of remaps is trivial, i'd like to do it
> Or> first.
>
> I don't quite understand how increasing the max remap count really
> helps you that much. Increasing it would just make this failure less
> frequent, but it would still occur, right?

Increasing the max remap count --really-- helps me because it takes much
longer for free FMRs to become dirty, and this window is enough for the
batch unmap to complete. So in practice there are always N free FMRs with
the scheme suggested above (allocate 2N with the watermark at N, publish N
to upper layers).

Indeed, the code cannot --count-- on that, so when iSER gets an -EAGAIN
return code from ib_fmr_pool_map_phys() it tries again later. The current
retry scheme just tries over and over (you cannot see it easily from the
iser code; it's related to the interaction with libiscsi). I have on my
TODO an item to register a flush callback with the pool, suspend the iser
TX flow when getting EAGAIN from fmr_map, and resume TX from the flush
callback.

Or.
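The suspend-on-EAGAIN, resume-from-flush-callback plan described in this message can be modeled with a toy pool. This is a sketch under assumptions, not iSER or FMR pool code: every `toy_*` name is hypothetical, the pool tracks only a free count, and "flush" simply returns a number of entries to the pool before invoking the registered callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool: only a count of free entries plus a flush callback. */
struct toy_pool {
    int free_cnt;
    void (*flush_cb)(void *arg);
    void *flush_arg;
};

/* Hypothetical transmit flow that consumes one pool entry per request. */
struct toy_tx {
    struct toy_pool *pool;
    bool suspended;
    int pending;   /* requests queued but not yet mapped */
    int sent;      /* requests mapped and sent */
};

/* Take one free entry, or report -EAGAIN when none is available. */
static int toy_pool_map(struct toy_pool *p)
{
    if (p->free_cnt == 0)
        return -EAGAIN;
    p->free_cnt--;
    return 0;
}

/* Drain the queue until it is empty or the pool runs dry. On -EAGAIN the
 * flow parks itself instead of retrying in a loop. */
static void toy_tx_run(struct toy_tx *tx)
{
    while (!tx->suspended && tx->pending > 0) {
        if (toy_pool_map(tx->pool) == -EAGAIN) {
            tx->suspended = true;
            break;
        }
        tx->pending--;
        tx->sent++;
    }
}

/* Flush callback: entries are free again, so resume the parked flow. */
static void toy_tx_resume(void *arg)
{
    struct toy_tx *tx = arg;

    tx->suspended = false;
    toy_tx_run(tx);
}

/* Model of a batch unmap completing: return entries, then notify. */
static void toy_pool_flush(struct toy_pool *p, int reclaimed)
{
    p->free_cnt += reclaimed;
    if (p->flush_cb)
        p->flush_cb(p->flush_arg);
}
```

Compared with retrying over and over, the flow does no work between the failed map and the flush notification.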
From ogerlitz at voltaire.com Wed May 24 02:01:32 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 24 May 2006 12:01:32 +0300
Subject: [openib-general] Re: [PATCH 1/2] mthca support for max_map_per_fmr device attribute
In-Reply-To:
References:
Message-ID: <447420EC.90403@voltaire.com>

Roland Dreier wrote:
> Or> Also, if the patch makes sense and the memfree issue is
> Or> resolved, i'd like to change the name of the device attribute
> Or> from max_map_per_fmr to max_remaps_per_fmr, i can resend this
> Or> patch series with this fix.
>
> The patch makes sense, although of course you need to make sure you
> understand and handle the mem-free case as well if you want it
> applied.

+ /* FMR can be remapped 2^B - 1 times where B < 32 is the number of
+  * bits which are not used for MPT addressing */
+ max_map_per_fmr = (1 << (32 - long_log2(mdev->limits.num_mpts))) - 1;

OK, fair enough, I will need at least some help getting started... Can you
comment on whether the above calculation is indeed broken under memfree?
If yes, is it broken under both Arbel and Sinai? Where should I look: in
the driver, or in the PRM?

> I'm not sure changing to max_remaps_per_fmr is really
> clearer, since the value counts the first mapping of the FMR (which is
> not a remapping). But I guess I could be convinced if more people
> think it's clearer.

"Remaps" means all the maps except the first one; that is what the "-1"
in the calculation above stands for. Let me know which makes more sense to
you (or to others, if they choose to respond): calling it max_maps_per_fmr
(which also counts the first map) or calling it max_remaps_per_fmr (which
does not count the first map).

Or.
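The arithmetic in the quoted hunk can be checked in user space. The following is a minimal sketch, not the driver code: `ilog2_u32()` is a hypothetical stand-in for the kernel's `long_log2()`, `max_map_per_fmr()` is a made-up wrapper around the quoted expression, and any MPT count passed to it is an illustrative input. As the thread notes, whether the formula holds on mem-free HCAs is still an open question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's long_log2():
 * returns floor(log2(v)) for v > 0. */
static unsigned ilog2_u32(uint32_t v)
{
    unsigned r = 0;

    assert(v > 0);
    while (v >>= 1)
        r++;
    return r;
}

/* The quoted formula: with B = 32 - log2(num_mpts) key bits not used for
 * MPT addressing, the result is 2^B - 1.
 * Assumes num_mpts is a power of two and at least 2, so that the shift
 * count stays below 32. */
static uint32_t max_map_per_fmr(uint32_t num_mpts)
{
    assert(num_mpts >= 2);
    return (1u << (32 - ilog2_u32(num_mpts))) - 1;
}
```

For example, an assumed table of 2^17 MPT entries leaves 15 unused bits, so the expression evaluates to 2^15 - 1 = 32767. Doubling the table size halves the number of unused key values.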
From ogerlitz at voltaire.com Wed May 24 02:26:19 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 24 May 2006 12:26:19 +0300 Subject: [openib-general] different send and receive CQs In-Reply-To: <00fc01c67f0c$10e97bf0$0281a8c0@ebpc> References: <00fc01c67f0c$10e97bf0$0281a8c0@ebpc> Message-ID: <447426BB.8050102@voltaire.com> Eric Barton wrote: > BTW, Or Gerlitz reckons there is a performance penalty for using multiple > CQs. The reason I'm interested in separate CQs is to avoid CQ overflow as I > add connections. On other stacks (Voltaire, Cisco, Silverstorm) I size a > single CQ large enough for 'n' connections (i.e. cluster size - 1), but that > means I have to refuse connections when 'n' have been established. Talking about CQ wrt adding connections here's my take: the max CQ size (reported by struct ib_device_attr->max_cqe of ib_query_device) is 128K (this is on memfull HCA, you would need to check the memfree HCA). So when the number of RX credits per connection is low it allows for many-K connections to use the same CQ (eg eight credits allow for 120K connections which is much more then the ~48K limit on LMC0 IB clusters size...). If you need more connections (QPs) than a single CQ can carry, create another one and attach it to new QPs. The CQ callback gets the CQ pointer as its first element, so you need not change you polling/arming logic. Also note that a 128K entries CQ consumes about 4MB (Roland can you confirm?) of the HCA attached memory (or host memory for memfree), so per my taste, coding apps for the cq_resize is kind of over doing. > In one stack it also stressed vmalloc() and prevented me from using a > single whole-memory mapping. Is there a chance that you are confusing CQs with QPs? Before implementing FMR scheme for the voltaire NAL, you were creating a giant QP for which the gen1 driver was allocating the host side memory using vmalloc, so it could not allocate more then ~300 QPs. 
With the mthca driver you should be able to allocate a CQ with the maximum allowed size (and if not it will be fixed...) Or. From mst at mellanox.co.il Wed May 24 02:37:19 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 24 May 2006 12:37:19 +0300 Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor In-Reply-To: References: <20060518153254.GF30211@mellanox.co.il> Message-ID: <20060524093719.GB21266@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] mthca: fix posting lists of 256 entries for tavor > > Thanks, applied. > BTW, srq will have the same problem in tavor, won't it? Both kernel and userspace code look quite similiar. -- MST From halr at voltaire.com Wed May 24 03:17:40 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 May 2006 06:17:40 -0400 Subject: [openib-general] Re: [PATCH] opensm: fix byte ordering in ib_member_get/set_sl_flow_hop() In-Reply-To: <20060522151020.GJ30176@sashak.voltaire.com> References: <20060522151020.GJ30176@sashak.voltaire.com> Message-ID: <1148465854.4470.115742.camel@hal.voltaire.com> On Mon, 2006-05-22 at 11:10, Sasha Khapyorsky wrote: > This fixes net/host byte ordering in ib_member_get/set_sl_flow_hop() > functions. > > Signed-off-by: Sasha Khapyorsky Thanks. Applied to trunk only. -- Hal From Thomas.Talpey at netapp.com Wed May 24 04:25:47 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 24 May 2006 07:25:47 -0400 Subject: [openib-general] NFS/RDMA for Linux: client and server update release 5 In-Reply-To: <1148426365.1575.10.camel@shuttle> References: <7.0.1.0.2.20060522161137.04202e30@netapp.com> <1148426365.1575.10.camel@shuttle> Message-ID: <7.0.1.0.2.20060524071711.0421e2d8@netapp.com> [Cutting down the reply list to more relevant parties...] It's hard to say what is crashing, but I suspect the CM code, due to the process context being ib_cm. Is there some reason you're not getting symbols in the stack trace? 
If you could feed this oops text to ksymoops it will give us more information. In any case, it appears the connection is succeeding at the server, but the client RPC code isn't being signalled that it has done so. Perhaps this is due to a lost reply, but the NFS code hasn't actually started to do anything. So, I would look for IB-level issues. Is the client running the current OpenFabrics svn top-of-tree? Let's take this offline to diagnose, unless someone has an idea why the CM would be failing. The ksymoops analysis would help. Tom. At 07:19 PM 5/23/2006, helen chen wrote: >Hi Tom, > >I have downloaded your release 5 of the NFS/RDMA and am having trouble >mounting the rdma nfs, the >"./nfsrdmamount -o rdma on16-ib:/mnt/rdma /mnt/rdma" command never >returned. and the dmesg for client and server are: > >------ demsg from client ----- >RPCRDMA Module Init, register RPC RDMA transport >Defaults: > MaxRequests 50 > MaxInlineRead 1024 > MaxInlineWrite 1024 > Padding 0 > Memreg 5 >RPC: Registered rdma transport module. >RPC: Registered rdma transport module. 
>RPC: xprt_setup_rdma: 140.221.134.221:2049 >nfs: server on16-ib not responding, timed out >Unable to handle kernel NULL pointer dereference at 0000000000000000 >RIP: >[<0000000000000000>] >PGD a9f2b067 PUD a8ca2067 PMD 0 >Oops: 0010 [1] PREEMPT SMP >CPU 1 >Modules linked in: xprtrdma ib_srp iscsi_tcp scsi_transport_iscsi >scsi_mod >Pid: 346, comm: ib_cm/1 Not tainted 2.6.16.16 #4 >RIP: 0010:[<0000000000000000>] [<0000000000000000>] >RSP: 0018:ffff8100af5a1c30 EFLAGS: 00010246 >RAX: ffff8100aeff2400 RBX: ffff8100aeff2400 RCX: ffff8100afc9e458 >RDX: 0000000000000000 RSI: ffff8100af5a1d48 RDI: ffff8100aeff2440 >RBP: ffff8100aeff2440 R08: 0000000000000000 R09: 0000000000000000 >R10: 0000000000000003 R11: 0000000000000000 R12: ffff8100aeff2500 >R13: 00000000ffffff99 R14: ffff8100af5a1d48 R15: ffffffff8036c72c >FS: 0000000000505ae0(0000) GS:ffff810003ce25c0(0000) >knlGS:0000000000000000 >CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b >CR2: 0000000000000000 CR3: 00000000ad587000 CR4: 00000000000006a0 >Process ib_cm/1 (pid: 346, threadinfo ffff8100af5a0000, task >ffff8100afea8100) >Stack: ffffffff8802a331 ffff8100aeff2500 0000000000000001 >ffff8100aeff2440 > ffffffff804011fd 0000000000000000 ffffffff8802a343 >ffff8100afdd6100 > ffffffff80364ee4 0000000000000100 >Call Trace: [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] > >Code: Bad RIP value. >RIP [<0000000000000000>] RSP >CR2: 0000000000000000 > >------dmesg from server ------ >nfsd: request from insecure port 140.221.134.220, port=32768! >svc_rdma_recvfrom: transport ffff81007e8f2800 is closing >svc_rdma_put: Destroying transport ffff81007e8f2800, >cm_id=ffff81007e945200, sk_flags=154, sk_inuse=0 > >Did I forget to configure necessary components into my kernel? > >Thanks, >Helen > >On Mon, 2006-05-22 at 13:25, Talpey, Thomas wrote: >> Network Appliance is pleased to announce release 5 of the NFS/RDMA >> client and server for Linux 2.6.16.16. 
This update to the April 19 release >> adds improved server parallel performance and fixes various issues. This >> code supports both Infiniband and iWARP transports. >> >> Comments and feedback welcome. We're especially interested in >> successful test reports! Thanks. >> >> Tom Talpey, for the various NFS/RDMA projects. >> >> _______________________________________________ >> openib-general mailing list >> openib-general at openib.org >> http://openib.org/mailman/listinfo/openib-general >> >> To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general >> From glebn at voltaire.com Wed May 24 04:30:48 2006 From: glebn at voltaire.com (Gleb Natapov) Date: Wed, 24 May 2006 14:30:48 +0300 Subject: [openib-general] Plans for libibverbs 1.1 In-Reply-To: References: Message-ID: <20060524113048.GF8694@minantech.com> On Tue, May 23, 2006 at 09:57:03AM -0700, Roland Dreier wrote: > I'm planning on branching the libibverbs tree so that I can open a 1.1 > development branch where ABI/API stability is not a requirement. My > current plan is to copy the current src/userspace/libibverbs tree in > svn to src/userspace/libibverbs-1.0. The libibverbs-1.0 tree would be > used for stable maintenance (only changes that preserve ABI and API > stability will be accepted), and the libibverbs tree would be used for > new development. > > I would expect a libibverbs 1.1-pre1 snapshot release shortly, with > the goal of a full stable libibverbs 1.1 release in 3 or 4 months. > > So far I have the changes below queued up for the new libibverbs 1.1 > tree. The main changes are getting rid of libsysfs use, and removing > the deprecated ib_XXX symbols. > What about the madvise patch? Is it scheduled to go into 1.1? -- Gleb.
From Thomas.Talpey at netapp.com Wed May 24 05:24:07 2006 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 24 May 2006 08:24:07 -0400 Subject: [openib-general] NFS/RDMA for Linux: client and server update release 5 In-Reply-To: <7.0.1.0.2.20060524071711.0421e2d8@netapp.com> References: <7.0.1.0.2.20060522161137.04202e30@netapp.com> <1148426365.1575.10.camel@shuttle> <7.0.1.0.2.20060524071711.0421e2d8@netapp.com> Message-ID: <7.0.1.0.2.20060524081758.0462efd8@netapp.com> OBTW, I just noticed that your server printed the message: nfsd: request from insecure port 140.221.134.220, port=32768! This means the /mnt/rdma export isn't configured with "insecure", and causes the server to close the connection. Because the IB CM does not allow the client to use so-called secure ports (< 1024), you need to set this flag on any RDMA exports, this is mentioned in our README. The jury is out on whether it's worth implementing the source port emulation in the IB CM. The problem is that to do so requires the CM to interface with the local IP port space, or manage one of its own. So for now, NFS/RDMA just recommends using the exports flag. Frankly, it provides no additional security, and is misnamed... Tom. At 07:25 AM 5/24/2006, Talpey, Thomas wrote: >[Cutting down the reply list to more relevant parties...] > >It's hard to say what is crashing, but I suspect the CM code, due >to the process context being ib_cm. Is there some reason you're >not getting symbols in the stack trace? If you could feed this oops >text to ksymoops it will give us more information. > >In any case, it appears the connection is succeeding at the server, >but the client RPC code isn't being signalled that it has done so. >Perhaps this is due to a lost reply, but the NFS code hasn't actually >started to do anything. So, I would look for IB-level issues. Is the >client running the current OpenFabrics svn top-of-tree? 
> >Let's take this offline to diagnose, unless someone has an idea why >the CM would be failing. The ksymoops analysis would help. > >Tom. > > > >At 07:19 PM 5/23/2006, helen chen wrote: >>Hi Tom, >> >>I have downloaded your release 5 of the NFS/RDMA and am having trouble >>mounting the rdma nfs, the >>"./nfsrdmamount -o rdma on16-ib:/mnt/rdma /mnt/rdma" command never >>returned. and the dmesg for client and server are: >> >>------ demsg from client ----- >>RPCRDMA Module Init, register RPC RDMA transport >>Defaults: >> MaxRequests 50 >> MaxInlineRead 1024 >> MaxInlineWrite 1024 >> Padding 0 >> Memreg 5 >>RPC: Registered rdma transport module. >>RPC: Registered rdma transport module. >>RPC: xprt_setup_rdma: 140.221.134.221:2049 >>nfs: server on16-ib not responding, timed out >>Unable to handle kernel NULL pointer dereference at 0000000000000000 >>RIP: >>[<0000000000000000>] >>PGD a9f2b067 PUD a8ca2067 PMD 0 >>Oops: 0010 [1] PREEMPT SMP >>CPU 1 >>Modules linked in: xprtrdma ib_srp iscsi_tcp scsi_transport_iscsi >>scsi_mod >>Pid: 346, comm: ib_cm/1 Not tainted 2.6.16.16 #4 >>RIP: 0010:[<0000000000000000>] [<0000000000000000>] >>RSP: 0018:ffff8100af5a1c30 EFLAGS: 00010246 >>RAX: ffff8100aeff2400 RBX: ffff8100aeff2400 RCX: ffff8100afc9e458 >>RDX: 0000000000000000 RSI: ffff8100af5a1d48 RDI: ffff8100aeff2440 >>RBP: ffff8100aeff2440 R08: 0000000000000000 R09: 0000000000000000 >>R10: 0000000000000003 R11: 0000000000000000 R12: ffff8100aeff2500 >>R13: 00000000ffffff99 R14: ffff8100af5a1d48 R15: ffffffff8036c72c >>FS: 0000000000505ae0(0000) GS:ffff810003ce25c0(0000) >>knlGS:0000000000000000 >>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b >>CR2: 0000000000000000 CR3: 00000000ad587000 CR4: 00000000000006a0 >>Process ib_cm/1 (pid: 346, threadinfo ffff8100af5a0000, task >>ffff8100afea8100) >>Stack: ffffffff8802a331 ffff8100aeff2500 0000000000000001 >>ffff8100aeff2440 >> ffffffff804011fd 0000000000000000 ffffffff8802a343 >>ffff8100afdd6100 >> ffffffff80364ee4 0000000000000100 
>>Call Trace: [] [] >> [] [] [] >> [] [] [] >> [] [] [] >> [] [] [] >> [] [] [] >> [] [] [] >> [] [] [] >> [] [] [] >> [] >> >>Code: Bad RIP value. >>RIP [<0000000000000000>] RSP >>CR2: 0000000000000000 >> >>------dmesg from server ------ >>nfsd: request from insecure port 140.221.134.220, port=32768! >>svc_rdma_recvfrom: transport ffff81007e8f2800 is closing >>svc_rdma_put: Destroying transport ffff81007e8f2800, >>cm_id=ffff81007e945200, sk_flags=154, sk_inuse=0 >> >>Did I forget to configure necessary components into my kernel? >> >>Thanks, >>Helen >> >>On Mon, 2006-05-22 at 13:25, Talpey, Thomas wrote: >>> Network Appliance is pleased to announce release 5 of the NFS/RDMA >>> client and server for Linux 2.6.16.16. This update to the April 19 release >>> adds improved server parallel performance and fixes various issues. This >>> code supports both Infiniband and iWARP transports. >>> >>> Comments and feedback welcome. We're especially interested in >>> successful test reports! Thanks. >>> >>> Tom Talpey, for the various NFS/RDMA projects. >>> From mst at mellanox.co.il Wed May 24 06:12:58 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 24 May 2006 16:12:58 +0300 Subject: [openib-general] Re: [PATCH 1/2] mthca support for max_map_per_fmr device attribute In-Reply-To: <44741D5B.8090001@voltaire.com> References: <4472EDC9.8080706@voltaire.com> <44741D5B.8090001@voltaire.com> Message-ID: <20060524131258.GM21266@mellanox.co.il> Quoting r. Or Gerlitz : > Subject: Re: [PATCH 1/2] mthca support for max_map_per_fmr device attribute > > Roland Dreier wrote: > > Or> When "requests" come fast enough, there's a window in time > > Or> when there's an unmapping of N FMRs running as a batch, but out > > Or> of the remaining N FMRs some are already dirty and can't be > > Or> used to serve a credit. So the app fails temporarily... So, > > Or> setting the watermark to 0.5N might solve this, but since > > Or> enlarging the number of remaps is trivial, I'd like to do it > > Or> first. You will still be limited in the number of remaps on the memfree architecture. And the memory registration code is hard enough to stress test. So I have to say setting the watermark at 0.5N makes much more sense to me. -- MST From arne.redlich at xiranet.com Wed May 24 06:23:00 2006 From: arne.redlich at xiranet.com (Arne Redlich) Date: Wed, 24 May 2006 15:23:00 +0200 Subject: [openib-general] Re: Fwd: [Bug 91] sizeof(srp_indirect_buf) wrongon 64-bit platforms In-Reply-To: (Roland Dreier's message of "Tue, 23 May 2006 09:06:15 -0700") References: <877j4dnrn6.fsf@confield.dd.xiranet.com> <20060523090340.GA12666@mellanox.co.il> <87mzd9yod0.fsf@confield.dd.xiranet.com> Message-ID: <87irnvpnez.fsf@confield.dd.xiranet.com> Roland Dreier writes: > >>> I'm afraid it *does* have an effect, unfortunately. > > Hmm, go ahead and forward the fix from 2.6.17 to the stable team for > kernel 2.6.16 if this bug affects your target. > > Thanks, > Roland It doesn't affect our target during regular operation, as the indirect descriptor table is always included entirely in the IU, so RDMA reading it isn't necessary.
I just stumbled accross this issue while enforcing an RDMA read of the descriptor table for testing purposes - so having the fix queued for 2.6.17 is sufficient for us. Do you (or anyone else?) want (me) to forward the fix to the stable team anyway? Arne From mst at mellanox.co.il Wed May 24 06:37:28 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 24 May 2006 16:37:28 +0300 Subject: [openib-general] ipoib use of multicast module on trunk causes kernel oops on 2.6.16 Message-ID: <20060524133728.GN21266@mellanox.co.il> Hi! Looks like moving ipoib to the new multicast module caused some instability. See below. ----- Forwarded message from Ali Ayoub ----- > -----Original Message----- > From: Michael S. Tsirkin > To: Ali Ayoub > > Quoting r. Ali Ayoub : > > Subject: [gen2 trunk] kernel oops on 2.6.16 > > > > The last trunk build causes kernel oops on 2.6.16 while restarting the driver. > > > > (the previous build -rev 7422- works fine) > > > > > > > > May 24 16:00:40 sw037 kernel: Unable to handle kernel paging request at > ffffffff8804bb17 RIP: > > May 24 16:00:40 sw037 kernel: [] > > May 24 16:00:40 sw037 kernel: PGD 103027 PUD 105027 PMD 17f69d067 PTE 0 > > May 24 16:00:40 sw037 kernel: Oops: 0000 [1] SMP > > May 24 16:00:40 sw037 kernel: CPU 1 > > May 24 16:00:40 sw037 kernel: Modules linked in: ib_sa ib_uverbs ib_umad > ib_mthca ib_mad ib_core > > May 24 16:00:40 sw037 kernel: Pid: 4355, comm: modprobe Not tainted > 2.6.16 #9 > > May 24 16:00:40 sw037 kernel: RIP: 0010:[] > [] > > May 24 16:00:40 sw037 kernel: RSP: 0000:ffff810179ca7d40 EFLAGS: > 00010246 > > May 24 16:00:40 sw037 kernel: RAX: 0000000000000005 RBX: > ffff810179ca7df0 RCX: ffffffff88045b49 > > May 24 16:00:40 sw037 kernel: RDX: ffff81017c0f1760 RSI: > 0000000000000000 RDI: 00000000fffffffc > > May 24 16:00:40 sw037 kernel: RBP: ffff810179ca7da8 R08: > ffff81017a83fb68 R09: ffff81017c54cc40 > > May 24 16:00:40 sw037 kernel: R10: ffff81017c8c4848 R11: > 0000000000000020 R12: 
ffff81017e21a3b8 > > May 24 16:00:40 sw037 kernel: R13: 00000000fffffffc R14: > 0000000000000000 R15: 0000000000000080 > > May 24 16:00:40 sw037 kernel: FS: 0000000000000000(0000) > GS:ffff81017fc772a8(0000) knlGS:0000000000000000 > > May 24 16:00:40 sw037 ifdown: Interface not available and no > configuration found. > > May 24 16:00:40 sw037 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 000000008005003b > > May 24 16:00:40 sw037 kernel: CR2: ffffffff8804bb17 CR3: > 000000017af3f000 CR4: 00000000000006e0 > > May 24 16:00:40 sw037 kernel: Process modprobe (pid: 4355, threadinfo > ffff810179ca6000, task ffff81017f9209a0) > > May 24 16:00:40 sw037 kernel: Stack: ffffffff88045b95 ffff81017c54cc38 > ffff81017c54c000 ffff810179ca7d98 > > May 24 16:00:40 sw037 kernel: ffffffff80173b26 ffff810179ca7d98 > ffffffff801736ad 0000000000000000 > > May 24 16:00:40 sw037 kernel: ffff81017fc00040 ffff81017c8c4840 > > May 24 16:00:40 sw037 kernel: Call Trace: > {:ib_sa:ib_sa_mcmember_rec_callback+76} > > May 24 16:00:40 sw037 kernel: > {cache_free_debugcheck+568} > {poison_obj+58} > > May 24 16:00:40 sw037 kernel: > {:ib_sa:send_handler+80} > {:ib_mad:ib_unregister_mad_agent+359} > > May 24 16:00:40 sw037 kernel: > {:ib_sa:free_sm_ah+0} > {:ib_sa:ib_sa_remove_one+80} > > May 24 16:00:40 sw037 kernel: > {:ib_core:ib_unregister_client+72} > > May 24 16:00:40 sw037 kernel: > {:ib_sa:ib_sa_cleanup+16} > {sys_delete_module+513} > > May 24 16:00:40 sw037 kernel: {__up_write+293} > {system_call+126} > > May 24 16:00:40 sw037 kernel: > > May 24 16:00:40 sw037 kernel: Code: Bad RIP value. > > May 24 16:00:40 sw037 kernel: RIP [] RSP > > > May 24 16:00:40 sw037 kernel: CR2: ffffffff8804bb17 > > May 24 16:00:40 sw037 kernel: BUG: modprobe/4355, lock held at task > exit time! > > May 24 16:00:40 sw037 kernel: [ffffffff8800c9e0] {device_mutex} > > May 24 16:00:40 sw037 kernel: .. held by: modprobe: 4355 > [ffff81017f9209a0, 118] > > May 24 16:00:40 sw037 kernel: ... 
acquired at: > ib_unregister_client+0x1a/0x108 [ib_core] > > May 24 16:00:42 sw037 ifdown: Interface not available and no > configuration found. > > May 24 16:00:42 sw037 ifdown: Interface not available and no > > > > > > Must be ipoib multicast change by Sean. Please try reverting r7401. Right, reverting to 7401 solved the problem. -- MST From halr at voltaire.com Wed May 24 06:33:54 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 May 2006 09:33:54 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t Fix NULL ptr issue Message-ID: <1148477633.4470.119597.camel@hal.voltaire.com> OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t Fix NULL ptr issue Signed-off-by: Hal Rosenstock Index: opensm/osm_ucast_updn.c =================================================================== --- opensm/osm_ucast_updn.c (revision 7435) +++ opensm/osm_ucast_updn.c (working copy) @@ -121,10 +121,12 @@ __updn_create_updn_next_step_t(IN updn_s p_next_step = (updn_next_step_t*) cl_zalloc(sizeof(*p_next_step)); CL_ASSERT (p_next_step != NULL); - p_next_step->state = state; - p_next_step->p_sw = p_sw; + if (p_next_step) + { + p_next_step->state = state; + p_next_step->p_sw = p_sw; + } return p_next_step; - } /********************************************************************** From halr at voltaire.com Wed May 24 06:58:39 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 May 2006 09:58:39 -0400 Subject: [openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c::OpenSM/osm_sa_mcmember_record.c Return 0 when no table can be allocated Message-ID: <1148478598.4470.119901.camel@hal.voltaire.com> OpenSM/osm_sa_mcmember_record.c Return 0 when no used MLID table can be allocated Signed-off-by: Hal Rosenstock Index: opensm/osm_sa_mcmember_record.c =================================================================== --- opensm/osm_sa_mcmember_record.c (revision 7441) +++ opensm/osm_sa_mcmember_record.c (working copy) @@ -326,6 +326,8 
@@ __get_new_mlid( /* track all used mlids in the array (by mlid index) */ used_mlids_array = (uint8_t *)cl_zalloc(sizeof(uint8_t)*max_num_mlids); + if (!used_mlids_array) + return 0; /* scan all available multicast groups in the DB and fill in the table */ while( p_mgrp != (osm_mgrp_t*)cl_qmap_end( &p_subn->mgrp_mlid_tbl ) ) From ogerlitz at voltaire.com Wed May 24 07:14:58 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 24 May 2006 17:14:58 +0300 Subject: [openib-general] [PATCH] IB/iser remove usage of ISCSI_VERSION_STR Message-ID: <20060524141458.GA6483@voltaire.com> Concatenate iSER's DRV_VER instead of ISCSI_VERSION_STR to the name field of the scsi host template. Signed-off-by: Or Gerlitz --- b/drivers/infiniband/ulp/iser/iscsi_iser.c 2006-05-24 16:41:56.936748675 +0300 +++ linux-2.6.16/drivers/infiniband/ulp/iser/iscsi_iser.c 2006-05-24 16:38:07.011802445 +0300 @@ -679,8 +679,7 @@ iscsi_iser_ep_disconnect(__u64 ep_handle } static struct scsi_host_template iscsi_iser_sht = { - .name = "iSCSI Initiator over iSER, v." - ISCSI_VERSION_STR, + .name = "iSCSI Initiator over iSER, v." DRV_VER, .queuecommand = iscsi_queuecommand, .can_queue = ISCSI_XMIT_CMDS_MAX - 1, .sg_tablesize = ISCSI_ISER_SG_TABLESIZE, From rdreier at cisco.com Wed May 24 07:23:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 07:23:33 -0700 Subject: [openib-general] Plans for libibverbs 1.1 In-Reply-To: <20060524113048.GF8694@minantech.com> (Gleb Natapov's message of "Wed, 24 May 2006 14:30:48 +0300") References: <20060524113048.GF8694@minantech.com> Message-ID: Gleb> What about madvice patch? Is it scheduled to go into 1.1? Yes, in some form. - R. 
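The two OpenSM patches earlier in this digest (the __updn_create_updn_next_step_t fix and the __get_new_mlid fix) share one pattern: test the result of a zeroing allocation before touching it, and hand the caller a sentinel on failure. A minimal standalone sketch of that pattern, assuming plain calloc() in place of complib's cl_zalloc() and a hypothetical helper name first_free_slot(); it is an illustration only, not the OpenSM code:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, loosely modeled on the __get_new_mlid() patch:
 * return the lowest index in [1, max_slots) that does not appear in
 * used[], or 0 when no index is free or the scratch table cannot be
 * allocated. Index 0 doubles as the "nothing available" sentinel.
 */
static size_t first_free_slot(const size_t *used, size_t n_used,
                              size_t max_slots)
{
    uint8_t *table;
    size_t i, result = 0;

    /* calloc() stands in for cl_zalloc(): zeroed memory or NULL. */
    table = calloc(max_slots, sizeof(*table));
    if (!table)
        return 0;   /* the same early-out the patch adds */

    /* Mark every in-range index that is already taken. */
    for (i = 0; i < n_used; i++)
        if (used[i] < max_slots)
            table[used[i]] = 1;

    /* Pick the first unmarked index, skipping the sentinel. */
    for (i = 1; i < max_slots; i++) {
        if (!table[i]) {
            result = i;
            break;
        }
    }

    free(table);
    return result;
}
```

Returning 0 only works because 0 is never a valid answer here; a caller has to treat it as "try again later or fail the request", which is also what the patch relies on.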
From rdreier at cisco.com Wed May 24 07:24:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 07:24:07 -0700 Subject: [openib-general] Re: Fwd: [Bug 91] sizeof(srp_indirect_buf) wrongon 64-bit platforms In-Reply-To: <87irnvpnez.fsf@confield.dd.xiranet.com> (Arne Redlich's message of "Wed, 24 May 2006 15:23:00 +0200") References: <877j4dnrn6.fsf@confield.dd.xiranet.com> <20060523090340.GA12666@mellanox.co.il> <87mzd9yod0.fsf@confield.dd.xiranet.com> <87irnvpnez.fsf@confield.dd.xiranet.com> Message-ID: Arne> Do you (or anyone else?) want (me) to forward the fix to the Arne> stable team anyway? I don't see any reason, given that it has no practical impact right now. - R. From rdreier at cisco.com Wed May 24 07:25:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 07:25:32 -0700 Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor In-Reply-To: <20060524093719.GB21266@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 24 May 2006 12:37:19 +0300") References: <20060518153254.GF30211@mellanox.co.il> <20060524093719.GB21266@mellanox.co.il> Message-ID: Michael> BTW, srq will have the same problem in tavor, won't it? Michael> Both kernel and userspace code look quite similar. I'll check. - R. From rdreier at cisco.com Wed May 24 07:26:16 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 07:26:16 -0700 Subject: [openib-general] Re: [PATCH 1/2] mthca support for max_map_per_fmr device attribute In-Reply-To: <447420EC.90403@voltaire.com> (Or Gerlitz's message of "Wed, 24 May 2006 12:01:32 +0300") References: <447420EC.90403@voltaire.com> Message-ID: Or> OK, fair enough, I will need at least some kickoff Or> help... Can you comment on whether the above calculation is indeed Or> broken under memfree? If yes, is it broken under both Or> Arbel and Sinai? Where should I look: in the driver, or Or> in the PRM? Sorry, I don't have time to look back at all the details now.
You will have to read the relevant PRMs and code to figure it out. - R. From rdreier at cisco.com Wed May 24 07:30:27 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 07:30:27 -0700 Subject: [openib-general] [PATCH] IB/iser remove usage of ISCSI_VERSION_STR In-Reply-To: <20060524141458.GA6483@voltaire.com> (Or Gerlitz's message of "Wed, 24 May 2006 17:14:58 +0300") References: <20060524141458.GA6483@voltaire.com> Message-ID: Thanks, applied and pushed out. From rdreier at cisco.com Wed May 24 07:32:32 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 07:32:32 -0700 Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor In-Reply-To: (Roland Dreier's message of "Wed, 24 May 2006 07:25:32 -0700") References: <20060518153254.GF30211@mellanox.co.il> <20060524093719.GB21266@mellanox.co.il> Message-ID: BTW, any comment on this: http://openib.org/bugzilla/show_bug.cgi?id=94 Is this the same bug? - R. From paul.lundin at gmail.com Wed May 24 07:42:32 2006 From: paul.lundin at gmail.com (Paul) Date: Wed, 24 May 2006 10:42:32 -0400 Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o In-Reply-To: References: Message-ID: Scott, That worked. Thanks for the info. Regards. On 5/23/06, Scott Weitzenkamp (sweitzen) wrote: > > No clue, I know if you grab OFED 1.0 rc4 tarball and run install.sh, it > should work. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > ------------------------------ > *From:* Paul [mailto:paul.lundin at gmail.com] > *Sent:* Tuesday, May 23, 2006 12:42 PM > *To:* Scott Weitzenkamp (sweitzen) > *Cc:* openib-general at openib.org > *Subject:* Re: [openib-general] Compilation issues on rhel4 u3 ppc64 > sysfs.o > > Scott, > Thanks for the confirmation and the quick reply. Any ideas as to what > might be causing the error in question ? > > Regards. 
> > On 5/23/06, Scott Weitzenkamp (sweitzen) wrote: > > > > OFED 1.0 rc4 does compile and run on RHEL4 U3 ppc64. > > > > Scott Weitzenkamp > > SQA and Release Manager > > Server Virtualization Business Unit > > Cisco Systems > > > > > > ------------------------------ > > *From:* openib-general-bounces at openib.org [mailto: > > openib-general-bounces at openib.org] *On Behalf Of *Paul Lundin > > *Sent:* Tuesday, May 23, 2006 12:34 PM > > *To:* openib-general at openib.org > > *Subject:* [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o > > > > Hi All, > > I just started working with openIB in the past week. I am having an > > issue getting the kernel modules to compile with the stock rhel4 u3 kernel. > > I have applied the patches found at https://openib.org/svn/gen2/branches/backport/2.6.9_U3/ > > and followed the instructions from https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet > > but I have been getting the following error: > > > > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/built-in.o > > LD /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/built-in.o > > CC [M] /usr/src/kernels/2.6.9- 34.EL-ppc64 > > /drivers/infiniband/core/index.o > > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/addr.o > > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.o > > /usr/src/kernels/2.6.9- 34.EL-ppc64/drivers/infiniband/core/cm.c: In > > function `ib_cm_cleanup': > > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/cm.c:3367: > > warning: implicit declaration of function `idr_destroy' > > CC [M] /usr/src/kernels/2.6.9- 34.EL-ppc64 > > /drivers/infiniband/core/packer.o > > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64 > > /drivers/infiniband/core/ud_header.o > > CC [M] /usr/src/kernels/2.6.9-34.EL-ppc64 > > /drivers/infiniband/core/verbs.o > > CC [M] /usr/src/kernels/2.6.9- 34.EL-ppc64 > > /drivers/infiniband/core/sysfs.o > > 
/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: > > error: unknown field `uevent' specified in initializer > > /usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.c:693: > > warning: initialization from incompatible pointer type > > make[2]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core/sysfs.o] > > Error 1 > > make[1]: *** [/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband/core] > > Error 2 > > make: *** [_module_/usr/src/kernels/2.6.9-34.EL-ppc64/drivers/infiniband] > > Error 2 > > make: Leaving directory `/usr/src/kernels/2.6.9-34.EL-ppc64' > > > > Any help would be appreciated. As noted this is on a ppc64 machine. The > > rhel4 u3 install does *NOT* configure openIB by default like it does on > > intel architectures. I was wondering if openIB has been tested at all on > > ppc64 and if this was even possible at this point. > > > > Regards. > > Paul > > From mst at mellanox.co.il Wed May 24 07:46:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 24 May 2006 17:46:30 +0300 Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor In-Reply-To: References: <20060518153254.GF30211@mellanox.co.il> <20060524093719.GB21266@mellanox.co.il> Message-ID: <20060524144630.GR21266@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] mthca: fix posting lists of 256 entries for tavor > > Michael> BTW, srq will have the same problem in tavor, won't it? > Michael> Both kernel and userspace code look quite similar. > > I'll check. Yes, just got a report that posting a list of 256 entries on an SRQ on tavor fails. It's the same problem.
-- MST From bos at pathscale.com Wed May 24 07:45:58 2006 From: bos at pathscale.com (Bryan O'Sullivan) Date: Wed, 24 May 2006 07:45:58 -0700 Subject: [openib-general] Re: [PATCH 1 of 10] ipath - fix spinlock recursion bug In-Reply-To: References: <1148419611.22550.11.camel@chalcedony.pathscale.com> Message-ID: <1148481958.5652.27.camel@chalcedony.pathscale.com> On Tue, 2006-05-23 at 14:31 -0700, Roland Dreier wrote: > It's probably OK as long as it's pure code motion. I'll recheck and make sure that it is before I send you anything. Thanks. > What I want to > avoid is the giant combo patch that does several different things, > because if someone later bisects a regression back to that patch, > we're kind of screwed... Yeah, I've been doing some educating lately about that :-) References: <20060518153254.GF30211@mellanox.co.il> <20060524093719.GB21266@mellanox.co.il> Message-ID: <20060524144742.GS21266@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Re: [PATCH] mthca: fix posting lists of 256 entries for tavor > > BTW, any comment on this: > > http://openib.org/bugzilla/show_bug.cgi?id=94 > > Is this the same bug? No idea - the site seems to be down :) -- MST From rdreier at cisco.com Wed May 24 07:52:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 07:52:13 -0700 Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor In-Reply-To: <20060524144742.GS21266@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 24 May 2006 17:47:42 +0300") References: <20060518153254.GF30211@mellanox.co.il> <20060524093719.GB21266@mellanox.co.il> <20060524144742.GS21266@mellanox.co.il> Message-ID: Michael> No idea - the site seems to be down :) It's working from here -- must be an issue in your network. 
Anyway the report is: ************************************************************* Host Architecture : x86_64 Linux Distribution: Fedora Core release 4 (Stentz) Kernel Version : 2.6.11-1.1369_FC4smp Memory size : 4071672 kB Driver Version : OFED-1.0-rc5-pre5 HCA ID(s) : mthca0 HCA model(s) : 25208 FW version(s) : 4.7.600 Board(s) : MT_00A0010001 ************************************************************* posting a list of multiples of 256 WR to SRQ or QP may be corrupted. The WR list that is being posted may be posted to a different QP than the QP number of the QP handle. test to reproduce it: qp_test daemon: qp_test --daemon client: qp_test --thread=15 --oust=256 --srq CLIENT SR 1 1 or qp_test --thread=15 --oust=256 CLIENT SR 1 1 From swise at opengridcomputing.com Wed May 24 07:53:21 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 24 May 2006 09:53:21 -0500 Subject: [openib-general] krping test utility In-Reply-To: <309a667c0605232129x66d8cc5ek1bd05d22c7e1db7@mail.gmail.com> References: <309a667c0605232129x66d8cc5ek1bd05d22c7e1db7@mail.gmail.com> Message-ID: <1148482401.3942.2.camel@stevo-desktop> On Wed, 2006-05-24 at 09:59 +0530, Devesh Sharma wrote: > Hello all, > > In the krping test utility get_dma_mr is called with access > premissions IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE| > IB_ACCESS_REMOTE_READ, But the lkey we get from get_dma_mr is similar > to reserved lkey with which only Local operations are allowed, but > here it seems violating that statement. What exactly do you think is incorrect? I don't understand your question. Steve. From mst at mellanox.co.il Wed May 24 07:59:34 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 24 May 2006 17:59:34 +0300 Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor In-Reply-To: References: <20060518153254.GF30211@mellanox.co.il> <20060524093719.GB21266@mellanox.co.il> <20060524144742.GS21266@mellanox.co.il> Message-ID: <20060524145934.GU21266@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] mthca: fix posting lists of 256 entries for tavor > > Michael> No idea - the site seems to be down :) > > It's working from here -- must be an issue in your network. > > Anyway the report is: > > ************************************************************* > Host Architecture : x86_64 > Linux Distribution: Fedora Core release 4 (Stentz) > Kernel Version : 2.6.11-1.1369_FC4smp > Memory size : 4071672 kB > Driver Version : OFED-1.0-rc5-pre5 > HCA ID(s) : mthca0 > HCA model(s) : 25208 > FW version(s) : 4.7.600 > Board(s) : MT_00A0010001 > ************************************************************* > > posting a list of multiples of 256 WR to SRQ or QP may be corrupted. > The WR list that is being posted may be posted to a different QP than > the QP > number of the QP handle. > > test to reproduce it: qp_test > daemon: > qp_test --daemon > client: > qp_test --thread=15 --oust=256 --srq CLIENT SR 1 1 > or > qp_test --thread=15 --oust=256 CLIENT SR 1 1 > Yep, looks like it. Testing the following patch now, will let you know. Index: openib/src/userspace/libmthca/src/srq.c =================================================================== --- openib/src/userspace/libmthca/src/srq.c (revision 7317) +++ openib/src/userspace/libmthca/src/srq.c (working copy) @@ -97,24 +97,7 @@ first_ind = srq->first_free; - for (nreq = 0; wr; ++nreq, wr = wr->next) { - if (nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB) { - nreq = 0; - - doorbell[0] = htonl(first_ind << srq->wqe_shift); - doorbell[1] = htonl(srq->srqn << 8); - - /* - * Make sure that descriptors are written - * before doorbell is rung. 
- */ - mb(); - - mthca_write64(doorbell, to_mctx(ibsrq->context), MTHCA_RECV_DOORBELL); - - first_ind = srq->first_free; - } - + for (nreq = 0; wr; wr = wr->next) { ind = srq->first_free; if (ind < 0) { @@ -172,6 +155,23 @@ srq->wrid[ind] = wr->wr_id; srq->first_free = next_ind; + + if (++nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB) { + nreq = 0; + + doorbell[0] = htonl(first_ind << srq->wqe_shift); + doorbell[1] = htonl(srq->srqn << 8); + + /* + * Make sure that descriptors are written + * before doorbell is rung. + */ + mb(); + + mthca_write64(doorbell, to_mctx(ibsrq->context), MTHCA_RECV_DOORBELL); + + first_ind = srq->first_free; + } } if (nreq) { -- MST From jackm at mellanox.co.il Wed May 24 08:06:46 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Wed, 24 May 2006 18:06:46 +0300 Subject: [openib-general] which dapl/udapl changes in trunk should be imported into OFED branch? (patch enclosed) Message-ID: <200605241806.48546.jackm@mellanox.co.il> Hi, Below is a patch file of differences between the OFED dapl library and the openib main trunk dapl library. Please indicate which of the dapl library changes are necessary for the Intel MPI to work correctly in OFED. Thanks! 
- Jack ------------------------ Index: test/dapltest/test/dapl_server.c =================================================================== --- test/dapltest/test/dapl_server.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ test/dapltest/test/dapl_server.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -50,7 +50,7 @@ DT_cs_Server (Params_t * params_ptr) Started_server_t *temp_list = NULL; Started_server_t *pre_list = NULL; unsigned char *buffp = NULL; - unsigned char *module = "DT_cs_Server"; + char *module = "DT_cs_Server"; DAT_DTO_COOKIE dto_cookie; DAT_DTO_COMPLETION_EVENT_DATA dto_stat; @@ -842,7 +842,7 @@ send_control_data ( Per_Server_Data_t *ps_ptr, Per_Test_Data_t *pt_ptr) { - unsigned char *module = "send_control_data"; + char *module = "send_control_data"; DAT_DTO_COOKIE dto_cookie; DAT_DTO_COMPLETION_EVENT_DATA dto_stat; Index: test/dapltest/test/dapl_bpool.c =================================================================== --- test/dapltest/test/dapl_bpool.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ test/dapltest/test/dapl_bpool.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -56,7 +56,7 @@ DT_BpoolAlloc ( DAT_BOOLEAN enable_rdma_write, DAT_BOOLEAN enable_rdma_read) { - unsigned char *module = "DT_BpoolAlloc"; + char *module = "DT_BpoolAlloc"; unsigned char *alloc_ptr = 0; Bpool *bpool_ptr = 0; DAT_COUNT alloc_size; @@ -254,7 +254,7 @@ DT_Bpool_Destroy (Per_Test_Data_t * pt_p DT_Tdep_Print_Head *phead, Bpool * bpool_ptr) { - unsigned char *module = "DT_Bpool_Destroy"; + char *module = "DT_Bpool_Destroy"; bool rval = true; if (bpool_ptr) Index: test/dapltest/test/dapl_test_util.c =================================================================== --- test/dapltest/test/dapl_test_util.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ test/dapltest/test/dapl_test_util.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -38,7 +38,7 @@ DT_query ( Per_Test_Data_t *pt_ptr, DAT_IA_HANDLE ia_handle, DAT_EP_HANDLE 
ep_handle) { - unsigned char *module = "DT_query"; + char *module = "DT_query"; DAT_EVD_HANDLE async_evd_hdl; /* not used */ DAT_EP_PARAM ep_params; DAT_RETURN ret; Index: test/dapltest/test/dapl_client.c =================================================================== --- test/dapltest/test/dapl_client.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ test/dapltest/test/dapl_client.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -57,7 +57,7 @@ DT_cs_Client (Params_t * params_ptr, Performance_Cmd_t *Performance_Cmd = NULL; Bpool *bpool = NULL; DAT_IA_ADDRESS_PTR server_netaddr = NULL; - unsigned char *module = "DT_cs_Client"; + char *module = "DT_cs_Client"; unsigned int did_connect = 0; unsigned int retry_cnt = 0; DAT_DTO_COOKIE dto_cookie; Index: test/dapltest/test/dapl_transaction_util.c =================================================================== --- test/dapltest/test/dapl_transaction_util.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ test/dapltest/test/dapl_transaction_util.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -641,7 +641,7 @@ DT_handle_rdma_op (DT_Tdep_Print_Head *p */ bool DT_check_params (Per_Test_Data_t *pt_ptr, - unsigned char *module) + char *module) { Transaction_Cmd_t * cmd = &pt_ptr->Params.u.Transaction_Cmd; unsigned long num_recvs = 0U; Index: test/dapltest/test/dapl_transaction_test.c =================================================================== --- test/dapltest/test/dapl_transaction_test.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ test/dapltest/test/dapl_transaction_test.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -972,38 +972,42 @@ retry: } /* end foreach op */ /* - * Send our memory info (synchronously) + * Send our memory info. The client performs the first send to comply + * with the iWARP MPA protocol's "Connection Startup Rules". */ DT_Tdep_PT_Debug (1,(phead,"Test[" F64x "]: Sending %s Memory Info\n", test_ptr->base_port, test_ptr->is_server ? 
"Server" : "Client")); - /* post the send buffer */ - if (!DT_post_send_buffer (phead, + if (!test_ptr->is_server ) { + + /* post the send buffer */ + if (!DT_post_send_buffer (phead, test_ptr->ep_context[i].ep_handle, test_ptr->ep_context[i].bp, RMI_SEND_BUFFER_ID, buff_size)) - { - /* error message printed by DT_post_send_buffer */ - goto test_failure; - } - /* reap the send and verify it */ - dto_cookie.as_64 = LZERO; - dto_cookie.as_ptr = - (DAT_PVOID) DT_Bpool_GetBuffer ( - test_ptr->ep_context[i].bp, - RMI_SEND_BUFFER_ID); - if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || - !DT_dto_check ( phead, + { + /* error message printed by DT_post_send_buffer */ + goto test_failure; + } + /* reap the send and verify it */ + dto_cookie.as_64 = LZERO; + dto_cookie.as_ptr = + (DAT_PVOID) DT_Bpool_GetBuffer ( + test_ptr->ep_context[i].bp, + RMI_SEND_BUFFER_ID); + if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || + !DT_dto_check ( phead, &dto_stat, test_ptr->ep_context[i].ep_handle, buff_size, dto_cookie, test_ptr->is_server ? "Client_Mem_Info_Send" : "Server_Mem_Info_Send")) - { - goto test_failure; + { + goto test_failure; + } } /* @@ -1029,6 +1033,36 @@ retry: goto test_failure; } + if (test_ptr->is_server ) { + /* post the send buffer */ + if (!DT_post_send_buffer (phead, + test_ptr->ep_context[i].ep_handle, + test_ptr->ep_context[i].bp, + RMI_SEND_BUFFER_ID, + buff_size)) + { + /* error message printed by DT_post_send_buffer */ + goto test_failure; + } + /* reap the send and verify it */ + dto_cookie.as_64 = LZERO; + dto_cookie.as_ptr = + (DAT_PVOID) DT_Bpool_GetBuffer ( + test_ptr->ep_context[i].bp, + RMI_SEND_BUFFER_ID); + if (!DT_dto_event_wait (phead, test_ptr->reqt_evd_hdl, &dto_stat) || + !DT_dto_check ( phead, + &dto_stat, + test_ptr->ep_context[i].ep_handle, + buff_size, + dto_cookie, + test_ptr->is_server ? 
"Client_Mem_Info_Send" + : "Server_Mem_Info_Send")) + { + goto test_failure; + } + } + /* * Extract what we need */ Index: test/dapltest/include/dapl_proto.h =================================================================== --- test/dapltest/include/dapl_proto.h (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ test/dapltest/include/dapl_proto.h (.../trunk/src/userspace/dapl) (revision 7464) @@ -524,8 +524,8 @@ bool DT_handle_rdma_op (DT_Td int op_indx, bool poll); -bool DT_check_params (Per_Test_Data_t *pt_ptr, - unsigned char *module); +bool DT_check_params (Per_Test_Data_t *pt_ptr, + char *module); void DT_Test_Error (void); Index: test/dtest/dtest.c =================================================================== --- test/dtest/dtest.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ test/dtest/dtest.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -33,6 +33,7 @@ */ #include #include +#include #include #include #include @@ -40,6 +41,7 @@ #include #include #include +#include #ifndef DAPL_PROVIDER #define DAPL_PROVIDER "OpenIB-ib0" @@ -546,7 +548,7 @@ send_msg( void *dat if ((event.event_data.dto_completion_event_data.transfered_length != size ) || (event.event_data.dto_completion_event_data.user_cookie.as_64 != 0xaaaa )) { - fprintf(stderr, "%d: ERROR: DTO len %d or cookie %x\n", + fprintf(stderr, "%d: ERROR: DTO len %d or cookie " PRIx64 "\n", getpid(), event.event_data.dto_completion_event_data.transfered_length, event.event_data.dto_completion_event_data.user_cookie.as_64 ); @@ -833,7 +835,7 @@ connect_ep( char *hostname, int conn_id sizeof( DAT_RMR_TRIPLET )) || (event.event_data.dto_completion_event_data.user_cookie.as_64 != recv_msg_index) ) { - fprintf(stderr,"ERR recv event: len=%d cookie=%d expected %d/%d\n", + fprintf(stderr,"ERR recv event: len=%d cookie=" PRIx64 " expected %d/%d\n", (int)event.event_data.dto_completion_event_data.transfered_length, (int)event.event_data.dto_completion_event_data.user_cookie.as_64, 
sizeof(DAT_RMR_TRIPLET), recv_msg_index ); @@ -1045,7 +1047,7 @@ do_rdma_write_with_msg( ) if ( (event.event_data.dto_completion_event_data.transfered_length != sizeof( DAT_RMR_TRIPLET )) || (event.event_data.dto_completion_event_data.user_cookie.as_64 != recv_msg_index) ) { + - fprintf(stderr,"unexpected event data for receive: len=%d cookie=%d exp %d/%d\n", + fprintf(stderr,"unexpected event data for receive: len=%d cookie=" PRIx64 " exp %d/%d\n", (int)event.event_data.dto_completion_event_data.transfered_length, (int)event.event_data.dto_completion_event_data.user_cookie.as_64, sizeof(DAT_RMR_TRIPLET), recv_msg_index ); @@ -1155,7 +1157,7 @@ do_rdma_read_with_msg( ) } if ((event.event_data.dto_completion_event_data.transfered_length != buf_len ) || (event.event_data.dto_completion_event_data.user_cookie.as_64 != 0x9999 )) { - fprintf(stderr, "%d: ERROR: DTO len %d or cookie %x\n", + fprintf(stderr, "%d: ERROR: DTO len %d or cookie " PRIx64 "\n", getpid(), event.event_data.dto_completion_event_data.transfered_length, event.event_data.dto_completion_event_data.user_cookie.as_64 ); @@ -1237,7 +1239,7 @@ do_rdma_read_with_msg( ) if ( (event.event_data.dto_completion_event_data.transfered_length != sizeof( DAT_RMR_TRIPLET )) || (event.event_data.dto_completion_event_data.user_cookie.as_64 != recv_msg_index) ) { - fprintf(stderr,"unexpected event data for receive: len=%d cookie=%d exp %d/%d\n", + fprintf(stderr,"unexpected event data for receive: len=%d cookie=" PRIx64 " exp %d/%d\n", (int)event.event_data.dto_completion_event_data.transfered_length, (int)event.event_data.dto_completion_event_data.user_cookie.as_64, sizeof(DAT_RMR_TRIPLET), recv_msg_index ); @@ -1272,9 +1274,9 @@ do_ping_pong_msg( ) DAT_DTO_COOKIE cookie; DAT_LMR_TRIPLET l_iov; DAT_RETURN ret; - int i; - unsigned char *snd_buf; - unsigned char *rcv_buf; + int i; + char *snd_buf; + char *rcv_buf; printf("\n %d PING DATA with SEND MSG\n\n",getpid()); @@ -1389,7 +1391,7 @@ do_ping_pong_msg( ) != buf_len) 
|| (event.event_data.dto_completion_event_data.user_cookie.as_64 != burst_msg_index) ) { - fprintf(stderr,"ERR: recv event: len=%d cookie=%d exp %d/%d\n", + fprintf(stderr,"ERR: recv event: len=%d cookie=" PRIx64 " exp %d/%d\n", (int)event.event_data.dto_completion_event_data.transfered_length, (int)event.event_data.dto_completion_event_data.user_cookie.as_64, buf_len, burst_msg_index ); Index: dapl/common/dapl_ep_create.c =================================================================== --- dapl/common/dapl_ep_create.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ dapl/common/dapl_ep_create.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -310,7 +310,10 @@ dapl_ep_create ( * * N.B. This should really be done by a util routine. */ - dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); + if (connect_evd_handle != DAT_HANDLE_NULL) + { + dapl_os_atomic_inc (& ((DAPL_EVD *)connect_evd_handle)->evd_ref_count); + } /* Optional handles */ if (recv_evd_handle != DAT_HANDLE_NULL) { Index: dapl/common/dapl_ep_util.c =================================================================== --- dapl/common/dapl_ep_util.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ dapl/common/dapl_ep_util.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -39,6 +39,7 @@ #include "dapl_cookie.h" #include "dapl_adapter_util.h" #include "dapl_evd_util.h" +#include "dapl_cr_util.h" /* for callback routine */ /* * Local definitions @@ -570,6 +571,65 @@ bail: #endif /* DAPL_DBG_IO_TRC */ /* + * Generate a disconnect event on abruct close for older verbs providers + * that do not do it automatically. + */ + +void +dapl_ep_legacy_post_disconnect( + DAPL_EP *ep_ptr, + DAT_CLOSE_FLAGS disconnect_flags) +{ + ib_cm_events_t ib_cm_event; + DAPL_CR *cr_ptr; + + /* + * Acquire the lock and make sure we didn't get a callback + * that cleaned up. 
+ */ + dapl_os_lock ( &ep_ptr->header.lock ); + if (disconnect_flags == DAT_CLOSE_ABRUPT_FLAG && + ep_ptr->param.ep_state == DAT_EP_STATE_DISCONNECT_PENDING ) + { + /* + * If this is an ABRUPT close, the provider will not generate + * a disconnect message so we do it manually here. Just invoke + * the CM callback as it will clean up the appropriate + * data structures, reset the state, and generate the event + * on the way out. Obtain the provider dependent cm_event to + * pass into the callback for a disconnect. + */ + ib_cm_event = dapls_ib_get_cm_event (DAT_CONNECTION_EVENT_DISCONNECTED); + + cr_ptr = ep_ptr->cr_ptr; + dapl_os_unlock ( &ep_ptr->header.lock ); + + if (cr_ptr != NULL) + { + dapl_dbg_log (DAPL_DBG_TYPE_API | DAPL_DBG_TYPE_CM, + " dapl_ep_disconnect force callback on EP %p CM handle %x\n", + ep_ptr, cr_ptr->ib_cm_handle); + + dapls_cr_callback (cr_ptr->ib_cm_handle, + ib_cm_event, + NULL, + cr_ptr->sp_ptr); + } + else + { + dapl_evd_connection_callback (ep_ptr->cm_handle, + ib_cm_event, + NULL, + (void *) ep_ptr); + } + } + else + { + dapl_os_unlock ( &ep_ptr->header.lock ); + } +} + +/* * Local variables: * c-indent-level: 4 * c-basic-offset: 4 Index: dapl/common/dapl_ep_util.h =================================================================== --- dapl/common/dapl_ep_util.h (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ dapl/common/dapl_ep_util.h (.../trunk/src/userspace/dapl) (revision 7464) @@ -77,4 +77,9 @@ DAT_RETURN_SUBTYPE dapls_ep_state_subtype( IN DAPL_EP *ep_ptr ); +extern void +dapl_ep_legacy_post_disconnect( + DAPL_EP *ep_ptr, + DAT_CLOSE_FLAGS disconnect_flags); + #endif /* _DAPL_EP_UTIL_H_ */ Index: dapl/common/dapl_init.h =================================================================== --- dapl/common/dapl_init.h (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ dapl/common/dapl_init.h (.../trunk/src/userspace/dapl) (revision 7464) @@ -48,4 +48,10 @@ extern void DAT_PROVIDER_FINI_FUNC_NAME ( IN const 
DAT_PROVIDER_INFO * ); +extern void +dapl_init ( void ) ; + +extern void +dapl_fini ( void ) ; + #endif Index: dapl/common/dapl_ep_disconnect.c =================================================================== --- dapl/common/dapl_ep_disconnect.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ dapl/common/dapl_ep_disconnect.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -42,7 +42,6 @@ #include "dapl_sp_util.h" #include "dapl_evd_util.h" #include "dapl_adapter_util.h" -#include "dapl_cr_util.h" /* for callback routine */ /* * dapl_ep_disconnect @@ -70,8 +69,6 @@ dapl_ep_disconnect ( { DAPL_EP *ep_ptr; DAPL_EVD *evd_ptr; - DAPL_CR *cr_ptr; - ib_cm_events_t ib_cm_event; DAT_RETURN dat_status; dapl_dbg_log (DAPL_DBG_TYPE_API | DAPL_DBG_TYPE_CM, @@ -175,51 +172,6 @@ dapl_ep_disconnect ( dapl_os_unlock ( &ep_ptr->header.lock ); dat_status = dapls_ib_disconnect ( ep_ptr, disconnect_flags ); - /* - * Reacquire the lock and make sure we didn't get a callback - * that cleaned up. - */ - dapl_os_lock ( &ep_ptr->header.lock ); - if (disconnect_flags == DAT_CLOSE_ABRUPT_FLAG && - ep_ptr->param.ep_state == DAT_EP_STATE_DISCONNECT_PENDING ) - { - /* - * If this is an ABRUPT close, the provider will not generate - * a disconnect message so we do it manually here. Just invoke - * the CM callback as it will clean up the appropriate - * data structures, reset the state, and generate the event - * on the way out. Obtain the provider dependent cm_event to - * pass into the callback for a disconnect. 
- */ - ib_cm_event = dapls_ib_get_cm_event (DAT_CONNECTION_EVENT_DISCONNECTED); - - cr_ptr = ep_ptr->cr_ptr; - dapl_os_unlock ( &ep_ptr->header.lock ); - - if (cr_ptr != NULL) - { - dapl_dbg_log (DAPL_DBG_TYPE_API | DAPL_DBG_TYPE_CM, - " dapl_ep_disconnect force callback on EP %p CM handle %x\n", - ep_ptr, cr_ptr->ib_cm_handle); - - dapls_cr_callback (cr_ptr->ib_cm_handle, - ib_cm_event, - NULL, - cr_ptr->sp_ptr); - } - else - { - dapl_evd_connection_callback (ep_ptr->cm_handle, - ib_cm_event, - NULL, - (void *) ep_ptr); - } - } - else - { - dapl_os_unlock ( &ep_ptr->header.lock ); - } - bail: dapl_dbg_log (DAPL_DBG_TYPE_RTN | DAPL_DBG_TYPE_CM, "dapl_ep_disconnect () returns 0x%x\n", Index: dapl/openib/dapl_ib_cm.c =================================================================== --- dapl/openib/dapl_ib_cm.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ dapl/openib/dapl_ib_cm.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -52,6 +52,7 @@ #include "dapl.h" #include "dapl_adapter_util.h" +#include "dapl_ep_util.h" #include "dapl_evd_util.h" #include "dapl_cr_util.h" #include "dapl_name_service.h" @@ -689,6 +690,8 @@ dapls_ib_disconnect ( ep_ptr->cm_handle, status); } + dapl_ep_legacy_post_disconnect(ep_ptr, close_flags) + return DAT_SUCCESS; } Index: dapl/openib_cma/dapl_ib_cm.c =================================================================== --- dapl/openib_cma/dapl_ib_cm.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ dapl/openib_cma/dapl_ib_cm.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -287,14 +287,24 @@ static void dapli_cm_active_cb(struct da NULL, conn->ep); break; case RDMA_CM_EVENT_REJECTED: + { + ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; + dapl_dbg_log( DAPL_DBG_TYPE_WARN, " dapli_cm_active_handler: REJECTED reason=%d\n", 
event->status); - dapl_evd_connection_callback(conn, IB_CME_DESTINATION_REJECT, - NULL, conn->ep); + + dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep); + break; - + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM, @@ -383,6 +393,14 @@ static void dapli_cm_passive_cb(struct d break; case RDMA_CM_EVENT_REJECTED: + { + ib_cm_events_t cm_event; + + /* no device type specified so assume IB for now */ + if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else + cm_event = IB_CME_DESTINATION_REJECT; dapl_dbg_log( DAPL_DBG_TYPE_WARN, @@ -397,10 +415,11 @@ static void dapli_cm_passive_cb(struct d &ipaddr->dst_addr)->sin_addr.s_addr), ntohs(((struct sockaddr_in *) &ipaddr->dst_addr)->sin_port)); - - dapls_cr_callback(conn, IB_CME_DESTINATION_REJECT, - NULL, conn->sp); + + dapls_cr_callback(conn, cm_event, NULL, conn->sp); + break; + } case RDMA_CM_EVENT_ESTABLISHED: dapl_dbg_log(DAPL_DBG_TYPE_CM, Index: dapl/openib_cma/dapl_ib_cq.c =================================================================== --- dapl/openib_cma/dapl_ib_cq.c (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ dapl/openib_cma/dapl_ib_cq.c (.../trunk/src/userspace/dapl) (revision 7464) @@ -481,7 +481,6 @@ dapls_ib_wait_object_wait(IN ib_wait_obj { struct dapl_evd *evd_ptr; struct ibv_cq *ibv_cq = NULL; - void *ibv_ctx = NULL; int status = 0; int timeout_ms = -1; struct pollfd cq_fd = { @@ -518,8 +517,8 @@ dapls_ib_wait_object_wait(IN ib_wait_obj status = errno; dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " cq_object_wait: RET evd %p ibv_cq %p ibv_ctx %p %s\n", - evd_ptr, ibv_cq,ibv_ctx,strerror(errno)); + " cq_object_wait: RET evd %p ibv_cq %p %s\n", + evd_ptr, ibv_cq, strerror(errno)); return(dapl_convert_errno(status,"cq_wait_object_wait")); Index: configure.in =================================================================== --- configure.in (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ configure.in 
(.../trunk/src/userspace/dapl) (revision 7464) @@ -1,11 +1,11 @@ dnl Process this file with autoconf to produce a configure script. AC_PREREQ(2.57) -AC_INIT(libdat, 1.2.0, dapl-devel at lists.sourceforge.net) +AC_INIT(dapl, 1.2.0, dapl-devel at lists.sourceforge.net) AC_CONFIG_SRCDIR([dat/udat/udat.c]) AC_CONFIG_AUX_DIR(config) AM_CONFIG_HEADER(config.h) -AM_INIT_AUTOMAKE(libdat, 1.2.0) +AM_INIT_AUTOMAKE(dapl, 1.2.0) AM_PROG_LIBTOOL Index: Makefile.am =================================================================== --- Makefile.am (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ Makefile.am (.../trunk/src/userspace/dapl) (revision 7464) @@ -350,5 +350,5 @@ EXTRA_DIST = dat/common/dat_dictionary.h dapl/udapl/libdaplscm.map \ libdat.spec.in -dist-hook: libdat.spec LICENSE*.txt - cp libdat.spec LICENSE*.txt $(distdir) +dist-hook: libdat.spec + cp libdat.spec $(distdir) Index: libdat.spec.in =================================================================== --- libdat.spec.in (.../branches/1.0/src/userspace/dapl) (revision 7464) +++ libdat.spec.in (.../trunk/src/userspace/dapl) (revision 7464) @@ -36,7 +36,7 @@ %define rel %{?CUSTOM_RELEASE} %{!?CUSTOM_RELEASE:%RELEASE} Summary: Userspace DAT and DAPL API. 
-Name: libdat +Name: dapl Version: %ver Release: %rel License: Dual GPL/BSD/CPL @@ -44,7 +44,6 @@ Group: System Environment/Libraries BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) Source: http://openib.org/downloads/%{name}-%{version}.tar.gz Url: http://openib.org/ -BuildRequires: libibverbs-devel librdmacm-devel %description Along with the OpenIB kernel drivers, libdat and libdapl provides a userspace @@ -53,7 +52,7 @@ RDMA API that supports DAT 1.2 specifica %package devel Summary: Development files for the libdat and libdapl libraries Group: System Environment/Libraries -Requires: %{name} = %{version}-%{release} +Requires: %{name} = %{version}-%{release} %description devel Static libraries and header files for the libdat and libdapl library. @@ -79,8 +78,8 @@ rm -rf $RPM_BUILD_ROOT %files %defattr(-,root,root) %{_libdir}/libda*.so.* -%config(noreplace) %{_sysconfdir}/dat.conf -%doc AUTHORS COPYING README LICENSE*.txt +%{_sysconfdir}/dat.conf +%doc AUTHORS COPYING ChangeLog NEWS README %files devel %defattr(-,root,root,-) @@ -96,3 +95,4 @@ rm -rf $RPM_BUILD_ROOT %{_includedir}/dat/udat.h %{_includedir}/dat/udat_redirection.h %{_includedir}/dat/udat_vendor_specific.h +%{_sysconfdir}/dat.conf From dotanb at mellanox.co.il Wed May 24 08:10:45 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 24 May 2006 18:10:45 +0300 Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor In-Reply-To: <20060524145934.GU21266@mellanox.co.il> References: <20060518153254.GF30211@mellanox.co.il> <20060524145934.GU21266@mellanox.co.il> Message-ID: <200605241810.45801.dotanb@mellanox.co.il> On Wednesday 24 May 2006 17:59, Michael S. Tsirkin wrote: > Quoting r. Roland Dreier : > > Subject: Re: [PATCH] mthca: fix posting lists of 256 entries for tavor > > > > Michael> No idea - the site seems to be down :) > > > > It's working from here -- must be an issue in your network. > > yes. 
I opened a bug on this issue (posting 256 WR to SRQ) and on the previous
issue (posting 256 WR to QP).

Dotan

From mst at mellanox.co.il Wed May 24 08:26:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 24 May 2006 18:26:55 +0300
Subject: [openib-general] [PATCH] libmthca: fix posting 256 WRs to SRQ in tavor
Message-ID: <20060524152655.GW21266@mellanox.co.il>

libmthca: fix posting 256 WRs to SRQ in tavor.
Same bug as for regular RQ really, so same fix.

Signed-off-by: Michael S. Tsirkin

Index: openib/src/userspace/libmthca/src/srq.c
===================================================================
--- openib/src/userspace/libmthca/src/srq.c	(revision 7317)
+++ openib/src/userspace/libmthca/src/srq.c	(working copy)
@@ -97,24 +97,7 @@ int mthca_tavor_post_srq_recv(struct ibv
 
 	first_ind = srq->first_free;
 
-	for (nreq = 0; wr; ++nreq, wr = wr->next) {
-		if (nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB) {
-			nreq = 0;
-
-			doorbell[0] = htonl(first_ind << srq->wqe_shift);
-			doorbell[1] = htonl(srq->srqn << 8);
-
-			/*
-			 * Make sure that descriptors are written
-			 * before doorbell is rung.
-			 */
-			mb();
-
-			mthca_write64(doorbell, to_mctx(ibsrq->context), MTHCA_RECV_DOORBELL);
-
-			first_ind = srq->first_free;
-		}
-
+	for (nreq = 0; wr; wr = wr->next) {
 		ind = srq->first_free;
 
 		if (ind < 0) {
@@ -172,6 +155,23 @@ int mthca_tavor_post_srq_recv(struct ibv
 
 		srq->wrid[ind] = wr->wr_id;
 		srq->first_free = next_ind;
+
+		if (++nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB) {
+			nreq = 0;
+
+			doorbell[0] = htonl(first_ind << srq->wqe_shift);
+			doorbell[1] = htonl(srq->srqn << 8);
+
+			/*
+			 * Make sure that descriptors are written
+			 * before doorbell is rung.
+			 */
+			mb();
+
+			mthca_write64(doorbell, to_mctx(ibsrq->context), MTHCA_RECV_DOORBELL);
+
+			first_ind = srq->first_free;
+		}
 	}
 
 	if (nreq) {

--
MST

From mst at mellanox.co.il Wed May 24 08:27:07 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin)
Date: Wed, 24 May 2006 18:27:07 +0300
Subject: [openib-general] [PATCH] mthca: fix posting list of 256 to SRQ
Message-ID: <20060524152707.GX21266@mellanox.co.il>

Here's the same patch for kernel.

---

Fix posting WQE lists of length 256 to SRQ for tavor.
This is really the same bug we fixed for non-shared RQ.

Signed-off-by: Michael S. Tsirkin

Index: linux-2.6.16/drivers/infiniband/hw/mthca/mthca_srq.c
===================================================================
--- linux-2.6.16.orig/drivers/infiniband/hw/mthca/mthca_srq.c	2006-05-24 18:07:24.000000000 +0300
+++ linux-2.6.16/drivers/infiniband/hw/mthca/mthca_srq.c	2006-05-24 18:07:19.000000000 +0300
@@ -490,26 +490,7 @@ int mthca_tavor_post_srq_recv(struct ib_
 
 	first_ind = srq->first_free;
 
-	for (nreq = 0; wr; ++nreq, wr = wr->next) {
-		if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) {
-			nreq = 0;
-
-			doorbell[0] = cpu_to_be32(first_ind << srq->wqe_shift);
-			doorbell[1] = cpu_to_be32(srq->srqn << 8);
-
-			/*
-			 * Make sure that descriptors are written
-			 * before doorbell is rung.
-			 */
-			wmb();
-
-			mthca_write64(doorbell,
-				      dev->kar + MTHCA_RECEIVE_DOORBELL,
-				      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
-
-			first_ind = srq->first_free;
-		}
-
+	for (nreq = 0; wr; wr = wr->next) {
 		ind = srq->first_free;
 
 		if (ind < 0) {
@@ -569,6 +550,25 @@ int mthca_tavor_post_srq_recv(struct ib_
 
 		srq->wrid[ind] = wr->wr_id;
 		srq->first_free = next_ind;
+
+		if (unlikely(++nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) {
+			nreq = 0;
+
+			doorbell[0] = cpu_to_be32(first_ind << srq->wqe_shift);
+			doorbell[1] = cpu_to_be32(srq->srqn << 8);
+
+			/*
+			 * Make sure that descriptors are written
+			 * before doorbell is rung.
+			 */
+			wmb();
+
+			mthca_write64(doorbell,
+				      dev->kar + MTHCA_RECEIVE_DOORBELL,
+				      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+
+			first_ind = srq->first_free;
+		}
 	}
 
 	if (likely(nreq)) {

--
MST

From mst at mellanox.co.il Wed May 24 08:28:28 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 24 May 2006 18:28:28 +0300
Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor
In-Reply-To: <20060524144630.GR21266@mellanox.co.il>
References: <20060518153254.GF30211@mellanox.co.il> <20060524093719.GB21266@mellanox.co.il> <20060524144630.GR21266@mellanox.co.il>
Message-ID: <20060524152828.GY21266@mellanox.co.il>

Quoting r. Michael S. Tsirkin :
> Subject: Re: [PATCH] mthca: fix posting lists of 256 entries for tavor
>
> Quoting r. Roland Dreier :
> > Subject: Re: [PATCH] mthca: fix posting lists of 256 entries for tavor
> >
> > Michael> BTW, srq will have the same problem in tavor, won't it?
> > Michael> Both kernel and userspace code look quite similar.
> >
> > I'll check.
>
> Yes, just got a report that posting list of 256 entries on SRQ on tavor fails.
> It's the same problem.

I've just tested and posted a fix.
Please queue for 2.6.17 and libibverbs 1.0.

--
MST

From Thomas.Talpey at netapp.com Wed May 24 08:27:47 2006
From: Thomas.Talpey at netapp.com (Talpey, Thomas)
Date: Wed, 24 May 2006 11:27:47 -0400
Subject: [openib-general] Re: [PATCH] mthca: fix posting lists of 256 entries for tavor
In-Reply-To:
References: <20060518153254.GF30211@mellanox.co.il> <20060524093719.GB21266@mellanox.co.il> <20060524144742.GS21266@mellanox.co.il>
Message-ID: <7.0.1.0.2.20060524112643.0462efd8@netapp.com>

At 10:52 AM 5/24/2006, Roland Dreier wrote:
> Michael> No idea - the site seems to be down :)
>
>It's working from here -- must be an issue in your network.
>

I saw the same error, but adding "www." to the "openib.org" url fixes it.

Tom.
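
For readers following the posting-list fix in this thread, here is a minimal sketch of why list lengths that are exact multiples of 256 hit the bug, and why moving the test to the bottom of the loop avoids it. It is not the driver code: post_old(), post_new(), final_doorbell_word() and queue_number_seen() are hypothetical helpers that keep only the counting logic from the patch. The encoding of the final doorbell as (srqn << 8) | nreq is an assumption, inferred from the in-loop doorbell writing srqn << 8 with the low byte left at zero; the doorbell rung after the loop is not shown in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror MTHCA_TAVOR_MAX_WQES_PER_RECV_DB from the patch. */
#define MAX_WQES_PER_RECV_DB 256

struct db_stats {
	int in_loop_doorbells;	/* doorbells rung inside the loop */
	int final_nreq;		/* count left over for the doorbell after the loop */
};

/* Pre-patch loop shape: the limit is tested before the next WR is handled. */
static struct db_stats post_old(int num_wr)
{
	struct db_stats s = { 0, 0 };
	int nreq, i;

	for (nreq = 0, i = 0; i < num_wr; ++nreq, ++i) {
		if (nreq == MAX_WQES_PER_RECV_DB) {
			nreq = 0;
			s.in_loop_doorbells++;
		}
		/* ... the WQE for this WR would be built here ... */
	}
	s.final_nreq = nreq;
	return s;
}

/* Post-patch loop shape: the limit is tested right after a WR is handled. */
static struct db_stats post_new(int num_wr)
{
	struct db_stats s = { 0, 0 };
	int nreq = 0, i;

	for (i = 0; i < num_wr; ++i) {
		/* ... the WQE for this WR would be built here ... */
		if (++nreq == MAX_WQES_PER_RECV_DB) {
			nreq = 0;
			s.in_loop_doorbells++;
		}
	}
	s.final_nreq = nreq;
	return s;
}

/* ASSUMPTION: the final doorbell packs the count into the low 8 bits. */
static uint32_t final_doorbell_word(uint32_t srqn, int nreq)
{
	assert(nreq >= 0);
	return (srqn << 8) | (uint32_t)nreq;
}

/* Queue number as it would be read back from such a doorbell word. */
static uint32_t queue_number_seen(uint32_t word)
{
	return word >> 8;
}
```

With a list of 256 WRs, post_old(256) never reaches the in-loop test with nreq equal to 256, so it falls out of the loop with final_nreq at 256. Under the assumed encoding that count no longer fits in 8 bits and spills into the queue-number field, which would be consistent with the report of WRs landing on a different QP. post_new(256) rings one full doorbell inside the loop and leaves nothing over.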
>Anyway the report is: > >************************************************************* >Host Architecture : x86_64 >Linux Distribution: Fedora Core release 4 (Stentz) >Kernel Version : 2.6.11-1.1369_FC4smp >Memory size : 4071672 kB >Driver Version : OFED-1.0-rc5-pre5 >HCA ID(s) : mthca0 >HCA model(s) : 25208 >FW version(s) : 4.7.600 >Board(s) : MT_00A0010001 >************************************************************* > >posting a list of multiples of 256 WR to SRQ or QP may be corrupted. >The WR list that is being posted may be posted to a different QP than >the QP >number of the QP handle. > >test to reproduce it: qp_test >daemon: >qp_test --daemon >client: >qp_test --thread=15 --oust=256 --srq CLIENT SR 1 1 > or >qp_test --thread=15 --oust=256 CLIENT SR 1 1 >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From eitan at mellanox.co.il Wed May 24 08:33:39 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 24 May 2006 18:33:39 +0300 Subject: [openib-general] RE: [PATCH] OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix NULL ptr issue Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30236870B@mtlexch01.mtl.com> Hi Hal You missed the line that asserts on null p_next_step just before the code you changed ... EZ Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. 
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, May 24, 2006 4:34 PM > To: openib-general at openib.org > Cc: Eitan Zahavi > Subject: [PATCH] OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix > NULL ptr issue > > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t Fix NULL ptr > issue > > Signed-off-by: Hal Rosenstock > > Index: opensm/osm_ucast_updn.c > =================================================================== > --- opensm/osm_ucast_updn.c (revision 7435) > +++ opensm/osm_ucast_updn.c (working copy) > @@ -121,10 +121,12 @@ __updn_create_updn_next_step_t(IN updn_s > p_next_step = (updn_next_step_t*) cl_zalloc(sizeof(*p_next_step)); > CL_ASSERT (p_next_step != NULL); > > - p_next_step->state = state; > - p_next_step->p_sw = p_sw; > + if (p_next_step) > + { > + p_next_step->state = state; > + p_next_step->p_sw = p_sw; > + } > return p_next_step; > - > } > > /********************************************************************** > From eitan at mellanox.co.il Wed May 24 08:34:13 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 24 May 2006 18:34:13 +0300 Subject: [openib-general] RE: [PATCH]OpenSM/osm_sa_mcmember_record.c::OpenSM/osm_sa_mcmember_record.c Return 0when no table can be allocated Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30236870C@mtlexch01.mtl.com> Looks right EZ > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, May 24, 2006 4:59 PM > To: openib-general at openib.org > Cc: Eitan Zahavi > Subject: > [PATCH]OpenSM/osm_sa_mcmember_record.c::OpenSM/osm_sa_mcmember_record > .c Return 0when no table can be allocated > > OpenSM/osm_sa_mcmember_record.c Return 0 when no used MLID table can be > allocated > > Signed-off-by: Hal Rosenstock > > Index: opensm/osm_sa_mcmember_record.c > =================================================================== > --- opensm/osm_sa_mcmember_record.c (revision 
7441) > +++ opensm/osm_sa_mcmember_record.c (working copy) > @@ -326,6 +326,8 @@ __get_new_mlid( > /* track all used mlids in the array (by mlid index) */ > used_mlids_array = > (uint8_t *)cl_zalloc(sizeof(uint8_t)*max_num_mlids); > + if (!used_mlids_array) > + return 0; > > /* scan all available multicast groups in the DB and fill in the table */ > while( p_mgrp != (osm_mgrp_t*)cl_qmap_end( &p_subn->mgrp_mlid_tbl ) ) > From rdreier at cisco.com Wed May 24 08:37:59 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 08:37:59 -0700 Subject: [openib-general] krping test utility References: <309a667c0605232129x66d8cc5ek1bd05d22c7e1db7@mail.gmail.com> Message-ID: Devesh> Hello all, In the krping test utility get_dma_mr is called Devesh> with access premissions Devesh> IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE|IB_ACCESS_REMOTE_READ, Devesh> But the lkey we get from get_dma_mr is similar to reserved Devesh> lkey with which only Local operations are allowed, but Devesh> here it seems violating that statement. No, ib_get_dma_mr() returns an L_Key/R_Key with exactly the permissions requested. - R. From halr at voltaire.com Wed May 24 08:31:33 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 May 2006 11:31:33 -0400 Subject: [openib-general] RE: [PATCH] OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix NULL ptr issue In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30236870B@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236870B@mtlexch01.mtl.com> Message-ID: <1148484692.4470.121786.camel@hal.voltaire.com> On Wed, 2006-05-24 at 11:33, Eitan Zahavi wrote: > Hi Hal > > You missed the line that asserts on null p_next_step just before the > code you changed . But isn't CL_ASSERT a debug compile time thing so it's needed when it's built without debug ? -- Hal > . > > EZ > > Eitan Zahavi > Senior Engineering Director, Software Architect > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. 
Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Wednesday, May 24, 2006 4:34 PM > > To: openib-general at openib.org > > Cc: Eitan Zahavi > > Subject: [PATCH] > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix > > NULL ptr issue > > > > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t Fix NULL ptr > > issue > > > > Signed-off-by: Hal Rosenstock > > > > Index: opensm/osm_ucast_updn.c > > =================================================================== > > --- opensm/osm_ucast_updn.c (revision 7435) > > +++ opensm/osm_ucast_updn.c (working copy) > > @@ -121,10 +121,12 @@ __updn_create_updn_next_step_t(IN updn_s > > p_next_step = (updn_next_step_t*) cl_zalloc(sizeof(*p_next_step)); > > CL_ASSERT (p_next_step != NULL); > > > > - p_next_step->state = state; > > - p_next_step->p_sw = p_sw; > > + if (p_next_step) > > + { > > + p_next_step->state = state; > > + p_next_step->p_sw = p_sw; > > + } > > return p_next_step; > > - > > } > > > > > /********************************************************************** > > > From dotanb at mellanox.co.il Wed May 24 08:51:57 2006 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 24 May 2006 18:51:57 +0300 Subject: [openib-general] [IPoIB] executing iperf over IPoIB causes to multicast (IP) packets to be recieved out-of-order Message-ID: <200605241851.57398.dotanb@mellanox.co.il> Hi. When executing iperf over IPoIB with multicast IP packets, some packets are received out of order.
Here are my machine attributes: ************************************************************* Host Architecture : x86_64 Linux Distribution: Fedora Core release 4 (Stentz) Kernel Version : 2.6.11-1.1369_FC4smp Memory size : 4071672 kB Driver Version : openib_gen2-20060524-1700 (REV=7460) HCA ID(s) : mthca0 HCA model(s) : 25208 FW version(s) : 4.7.600 Board(s) : MT_00A0010001 ************************************************************* here are the iperf command line: sender command: ./iperf -c 10.4.3.86 -u -T 3 -t 400 -i 2 -b 1000M -l 100 receiver command: ./iperf -s -u -B 10.4.3.86 -i 2 here is the output of the receiver execution: Execute the receiver over IPoIB ------------------------------------------------------------ Server listening on UDP port 5001 Binding to local address 11.4.3.86 Receiving 1470 byte datagrams UDP buffer size: 132 KByte (default) ------------------------------------------------------------ [ 3] local 11.4.3.86 port 5001 connected with 11.4.3.87 port 33334 [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams [ 3] 0.0- 2.0 sec 34.3 MBytes 144 Mbits/sec 0.003 ms 232/359959 (0.064%) [ 3] 0.0- 2.0 sec 45 datagrams received out-of-order [ 3] 2.0- 4.0 sec 34.3 MBytes 144 Mbits/sec 0.004 ms 0/359391 (0%) [ 3] 2.0- 4.0 sec 43 datagrams received out-of-order [ 3] 4.0- 6.0 sec 34.3 MBytes 144 Mbits/sec 0.003 ms 0/359470 (0%) [ 3] 4.0- 6.0 sec 31 datagrams received out-of-order [ 3] 6.0- 8.0 sec 34.3 MBytes 144 Mbits/sec 0.003 ms 0/359195 (0%) [ 3] 6.0- 8.0 sec 38 datagrams received out-of-order [ 3] 8.0-10.0 sec 34.3 MBytes 144 Mbits/sec 0.003 ms 0/359379 (0%) [ 3] 8.0-10.0 sec 48 datagrams received out-of-order Waiting for server threads to complete. Interrupt again to force quit. 
[ 3] 0.0-21.2 sec 364 MBytes 144 Mbits/sec 0.005 ms 232/3813508 (0.0061%) [ 3] 0.0-21.2 sec 595 datagrams received out-of-order Execute the receiver over IP ------------------------------------------------------------ Server listening on UDP port 5001 Binding to local address 10.4.3.86 Receiving 1470 byte datagrams UDP buffer size: 132 KByte (default) ------------------------------------------------------------ [ 3] local 10.4.3.86 port 5001 connected with 10.4.3.87 port 33334 [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams [ 3] 0.0- 2.0 sec 14.4 MBytes 60.2 Mbits/sec 0.034 ms 355/150976 (0.24%) [ 3] 2.0- 4.0 sec 14.4 MBytes 60.2 Mbits/sec 0.010 ms 0/150599 (0%) [ 3] 4.0- 6.0 sec 14.4 MBytes 60.2 Mbits/sec 0.012 ms 0/150597 (0%) [ 3] 6.0- 8.0 sec 14.4 MBytes 60.2 Mbits/sec 0.011 ms 0/150599 (0%) [ 3] 8.0-10.0 sec 14.4 MBytes 60.2 Mbits/sec 0.009 ms 0/150598 (0%) [ 3] 10.0-12.0 sec 14.4 MBytes 60.2 Mbits/sec 0.010 ms 0/150584 (0%) [ 3] 12.0-14.0 sec 14.4 MBytes 60.2 Mbits/sec 0.009 ms 0/150583 (0%) [ 3] 14.0-16.0 sec 14.4 MBytes 60.2 Mbits/sec 0.019 ms 0/150601 (0%) [ 3] 16.0-18.0 sec 14.4 MBytes 60.2 Mbits/sec 0.009 ms 0/150576 (0%) [ 3] 18.0-20.0 sec 14.4 MBytes 60.2 Mbits/sec 0.021 ms 0/150583 (0%) [ 3] 20.0-22.0 sec 14.4 MBytes 60.2 Mbits/sec 0.010 ms 0/150591 (0%) Waiting for server threads to complete. Interrupt again to force quit. [ 3] 22.0-24.0 sec 14.4 MBytes 60.2 Mbits/sec 0.028 ms 0/150582 (0%) did anyone notice this issue before? 
thanks Dotan From bugzilla-daemon at openib.org Wed May 24 09:04:26 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Wed, 24 May 2006 09:04:26 -0700 (PDT) Subject: [openib-general] [Bug 95] New: Stack seems to reorder multicast entries Message-ID: <20060524160426.C829322859E@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=95 Summary: Stack seems to reorder multicast entries Product: OpenFabrics Linux Version: 1.0rc3 Platform: All OS/Version: Other Status: NEW Severity: normal Priority: P2 Component: IPoIB AssignedTo: bugzilla at openib.org ReportedBy: jim at pantasys.com When running iperf to test multicast performance, we have seen many out-of-order errors when the receive node is running OFED RC3 or RC4. Case 1: IBGD (or gen2) sending to gen2 receive node Receive node: - iperf -s -u -B 224.0.6.66 -i 2 Send node: - Add multicast route to interface - iperf -c 224.0.6.66 -u -T 3 -t 400 -i 2 -b 100M -l 100 The receive node should report no lost or out-of-order frames. Case 2: same sender, using OFED RC3 or RC4 receive node Run the same test as above. You will see many multicast frames reported out of order. We have reproduced this problem at two different sites. In all cases we are using SUSE10 as the OS hosting the receive stack (gen2 or OFED). From rdreier at cisco.com Wed May 24 08:59:23 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 08:59:23 -0700 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <20060524133728.GN21266@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 24 May 2006 16:37:28 +0300") References: <20060524133728.GN21266@mellanox.co.il> Message-ID: What is this test doing exactly?
One thing that seems strange is that I don't see ib_multicast here: > Modules linked in: ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core It looks like ib_multicast may be unloaded without canceling all of its SA queries, and ib_sa ends up calling back into an unloaded module. - R. From halr at voltaire.com Wed May 24 08:59:15 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 May 2006 11:59:15 -0400 Subject: [openib-general] [IPoIB] executing iperf over IPoIB causes to multicast (IP) packets to be recieved out-of-order In-Reply-To: <200605241851.57398.dotanb@mellanox.co.il> References: <200605241851.57398.dotanb@mellanox.co.il> Message-ID: <1148486354.4470.122283.camel@hal.voltaire.com> On Wed, 2006-05-24 at 11:51, Dotan Barak wrote: > Hi. > > when executing iperf over IPoIB with multicast IP packets, there are some packets that being received out of order. With IPmc, there is no ordering guarantee. I'm not sure what your network configuration is exactly and whether this could be related or whether a dropped packet is being reported as out of order. 
-- Hal > Here are my machine attributes: > ************************************************************* > Host Architecture : x86_64 > Linux Distribution: Fedora Core release 4 (Stentz) > Kernel Version : 2.6.11-1.1369_FC4smp > Memory size : 4071672 kB > Driver Version : openib_gen2-20060524-1700 (REV=7460) > HCA ID(s) : mthca0 > HCA model(s) : 25208 > FW version(s) : 4.7.600 > Board(s) : MT_00A0010001 > ************************************************************* > > here are the iperf command line: > sender command: ./iperf -c 10.4.3.86 -u -T 3 -t 400 -i 2 -b 1000M -l 100 > receiver command: ./iperf -s -u -B 10.4.3.86 -i 2 > > here is the output of the receiver execution: > > Execute the receiver over IPoIB > ------------------------------------------------------------ > Server listening on UDP port 5001 > Binding to local address 11.4.3.86 > Receiving 1470 byte datagrams > UDP buffer size: 132 KByte (default) > ------------------------------------------------------------ > [ 3] local 11.4.3.86 port 5001 connected with 11.4.3.87 port 33334 > [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams > [ 3] 0.0- 2.0 sec 34.3 MBytes 144 Mbits/sec 0.003 ms 232/359959 (0.064%) > [ 3] 0.0- 2.0 sec 45 datagrams received out-of-order > [ 3] 2.0- 4.0 sec 34.3 MBytes 144 Mbits/sec 0.004 ms 0/359391 (0%) > [ 3] 2.0- 4.0 sec 43 datagrams received out-of-order > [ 3] 4.0- 6.0 sec 34.3 MBytes 144 Mbits/sec 0.003 ms 0/359470 (0%) > [ 3] 4.0- 6.0 sec 31 datagrams received out-of-order > [ 3] 6.0- 8.0 sec 34.3 MBytes 144 Mbits/sec 0.003 ms 0/359195 (0%) > [ 3] 6.0- 8.0 sec 38 datagrams received out-of-order > [ 3] 8.0-10.0 sec 34.3 MBytes 144 Mbits/sec 0.003 ms 0/359379 (0%) > [ 3] 8.0-10.0 sec 48 datagrams received out-of-order > Waiting for server threads to complete. Interrupt again to force quit. 
> [ 3] 0.0-21.2 sec 364 MBytes 144 Mbits/sec 0.005 ms 232/3813508 (0.0061%) > [ 3] 0.0-21.2 sec 595 datagrams received out-of-order > > Execute the receiver over IP > ------------------------------------------------------------ > Server listening on UDP port 5001 > Binding to local address 10.4.3.86 > Receiving 1470 byte datagrams > UDP buffer size: 132 KByte (default) > ------------------------------------------------------------ > [ 3] local 10.4.3.86 port 5001 connected with 10.4.3.87 port 33334 > [ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams > [ 3] 0.0- 2.0 sec 14.4 MBytes 60.2 Mbits/sec 0.034 ms 355/150976 (0.24%) > [ 3] 2.0- 4.0 sec 14.4 MBytes 60.2 Mbits/sec 0.010 ms 0/150599 (0%) > [ 3] 4.0- 6.0 sec 14.4 MBytes 60.2 Mbits/sec 0.012 ms 0/150597 (0%) > [ 3] 6.0- 8.0 sec 14.4 MBytes 60.2 Mbits/sec 0.011 ms 0/150599 (0%) > [ 3] 8.0-10.0 sec 14.4 MBytes 60.2 Mbits/sec 0.009 ms 0/150598 (0%) > [ 3] 10.0-12.0 sec 14.4 MBytes 60.2 Mbits/sec 0.010 ms 0/150584 (0%) > [ 3] 12.0-14.0 sec 14.4 MBytes 60.2 Mbits/sec 0.009 ms 0/150583 (0%) > [ 3] 14.0-16.0 sec 14.4 MBytes 60.2 Mbits/sec 0.019 ms 0/150601 (0%) > [ 3] 16.0-18.0 sec 14.4 MBytes 60.2 Mbits/sec 0.009 ms 0/150576 (0%) > [ 3] 18.0-20.0 sec 14.4 MBytes 60.2 Mbits/sec 0.021 ms 0/150583 (0%) > [ 3] 20.0-22.0 sec 14.4 MBytes 60.2 Mbits/sec 0.010 ms 0/150591 (0%) > Waiting for server threads to complete. Interrupt again to force quit. > [ 3] 22.0-24.0 sec 14.4 MBytes 60.2 Mbits/sec 0.028 ms 0/150582 (0%) > > > did anyone notice this issue before? > > thanks > Dotan From mst at mellanox.co.il Wed May 24 09:16:36 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 24 May 2006 19:16:36 +0300 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: References: <20060524133728.GN21266@mellanox.co.il> Message-ID: <20060524161636.GB21266@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 > > What is this test doing exactly? > > One thing that seems strange is that I don't see ib_multicast here: > > > Modules linked in: ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core > > It looks like ib_multicast may be unloaded without canceling all of > its SA queries, and ib_sa ends up calling back into an unloaded > module. I think it's not a specific test - Ali was simply doing things and unloaded the driver in the process. -- MST From mst at mellanox.co.il Wed May 24 09:22:42 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 24 May 2006 19:22:42 +0300 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: References: <20060524133728.GN21266@mellanox.co.il> Message-ID: <20060524162242.GC21266@mellanox.co.il> Quoting r. Roland Dreier : > It looks like ib_multicast may be unloaded without canceling all of > its SA queries, and ib_sa ends up calling back into an unloaded > module. Looks like it. Note it's not enough to cancel the query; you must wait for it to complete as well. Cancelling outstanding queries properly is one of the harder things in the ipoib multicast code. -- MST From mst at mellanox.co.il Wed May 24 09:29:00 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 24 May 2006 19:29:00 +0300 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: References: <20060524133728.GN21266@mellanox.co.il> Message-ID: <20060524162900.GA24314@mellanox.co.il> Quoting r.
Roland Dreier : > Subject: Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 > > What is this test doing exactly? Ali here says he's just bringing the ib0 up, and then unloads the module. Linux is sending broadcasts out all the time, so ... -- MST From mshefty at ichips.intel.com Wed May 24 09:44:59 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 24 May 2006 09:44:59 -0700 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <20060524162242.GC21266@mellanox.co.il> References: <20060524133728.GN21266@mellanox.co.il> <20060524162242.GC21266@mellanox.co.il> Message-ID: <44748D8B.1000508@ichips.intel.com> Michael S. Tsirkin wrote: > Looks like this. Note its not enough to cancel query, you must wait for it > to complete as well. > Cancelling outstanding queries properly is one of the harder things in ipoib > multicast code. I will look at this. The multicast module should have handled this. What modules were unloaded and being unloaded when this occurred? - Sean From mst at mellanox.co.il Wed May 24 10:01:18 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 24 May 2006 20:01:18 +0300 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <44748D8B.1000508@ichips.intel.com> References: <20060524133728.GN21266@mellanox.co.il> <20060524162242.GC21266@mellanox.co.il> <44748D8B.1000508@ichips.intel.com> Message-ID: <20060524170118.GF21266@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 > > Michael S. Tsirkin wrote: > >Looks like this. Note its not enough to cancel query, you must wait for it > >to complete as well. > >Cancelling outstanding queries properly is one of the harder things in > >ipoib > >multicast code. > > I will look at this. The multicast module should have handled this. 
> > What modules were unloaded and being unloaded when this occurred? I think this can be seen in oops. -- MST From mst at mellanox.co.il Wed May 24 10:02:13 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 24 May 2006 20:02:13 +0300 Subject: [openib-general] Re: [IPoIB] executing iperf over IPoIB causes to multicast (IP) packets to be recieved out-of-order In-Reply-To: <1148486354.4470.122283.camel@hal.voltaire.com> References: <200605241851.57398.dotanb@mellanox.co.il> <1148486354.4470.122283.camel@hal.voltaire.com> Message-ID: <20060524170213.GG21266@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [IPoIB] executing iperf over IPoIB causes to multicast (IP) packets to be recieved out-of-order > > On Wed, 2006-05-24 at 11:51, Dotan Barak wrote: > > Hi. > > > > when executing iperf over IPoIB with multicast IP packets, there are some packets that being received out of order. > > With IPmc, there is no ordering guarantee. I'm not sure what your > network configuration is exactly and whether this could be related or > whether a dropped packet is being reported as out of order. This is on back to back. I don't think hardware does this - this is ipoib software thing. -- MST From rdreier at cisco.com Wed May 24 10:01:28 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 10:01:28 -0700 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <44748D8B.1000508@ichips.intel.com> (Sean Hefty's message of "Wed, 24 May 2006 09:44:59 -0700") References: <20060524133728.GN21266@mellanox.co.il> <20060524162242.GC21266@mellanox.co.il> <44748D8B.1000508@ichips.intel.com> Message-ID: It's not completely trivial to reproduce. I tried loading and unloading ib_ipoib a few times, then I tried to load ib_ipoib and unload ib_mthca, I tried pinging in between loading and unloading, and I didn't see any crashes. - R. 
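[Editorial note] The point raised in this thread, that cancelling a query is not enough and the client must also wait for any in-flight completion callback before it goes away, can be sketched in userspace C. This is only an illustration under stated assumptions: it uses POSIX threads rather than kernel primitives, and every name in it (query_tracker, tracker_wait_idle, provider_thread, run_demo) is hypothetical and not part of ib_sa, ib_multicast, or ipoib.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical tracker for outstanding queries; not a real ib_sa structure. */
struct query_tracker {
    pthread_mutex_t lock;
    pthread_cond_t  idle;        /* signalled when outstanding drops to 0 */
    int             outstanding; /* queries whose callback has not returned */
    bool            canceled;
};

static void tracker_init(struct query_tracker *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->idle, NULL);
    t->outstanding = 0;
    t->canceled = false;
}

static void tracker_query_start(struct query_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    t->outstanding++;
    pthread_mutex_unlock(&t->lock);
}

/* Called by the provider after the completion callback has returned. */
static void tracker_query_done(struct query_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    if (--t->outstanding == 0)
        pthread_cond_broadcast(&t->idle);
    pthread_mutex_unlock(&t->lock);
}

/* Cancel only sets a flag; a callback may already be running. */
static void tracker_cancel_all(struct query_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    t->canceled = true;
    pthread_mutex_unlock(&t->lock);
}

static bool tracker_is_canceled(struct query_tracker *t)
{
    bool c;
    pthread_mutex_lock(&t->lock);
    c = t->canceled;
    pthread_mutex_unlock(&t->lock);
    return c;
}

/* The step that is easy to forget: block until every callback returned. */
static void tracker_wait_idle(struct query_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    while (t->outstanding > 0)
        pthread_cond_wait(&t->idle, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

/* Hypothetical client standing in for a module that can be unloaded. */
struct client {
    struct query_tracker tracker;
    bool unloaded;        /* set once the client has been torn down */
    int  late_callbacks;  /* callbacks that ran after teardown (must be 0) */
    int  handled;         /* results processed because no cancel was seen */
};

/* Stands in for the provider delivering one query completion. */
static void *provider_thread(void *arg)
{
    struct client *c = arg;

    if (c->unloaded)
        c->late_callbacks++;          /* would be a call into freed code */
    else if (!tracker_is_canceled(&c->tracker))
        c->handled++;                 /* normal result processing */
    tracker_query_done(&c->tracker);
    return NULL;
}

/* Returns the number of callbacks that ran after teardown, or -1 on error. */
int run_demo(void)
{
    struct client c = { .unloaded = false, .late_callbacks = 0, .handled = 0 };
    pthread_t tid;

    tracker_init(&c.tracker);
    tracker_query_start(&c.tracker);
    if (pthread_create(&tid, NULL, provider_thread, &c) != 0)
        return -1;

    tracker_cancel_all(&c.tracker);   /* step 1: cancel */
    tracker_wait_idle(&c.tracker);    /* step 2: wait for completion */
    c.unloaded = true;                /* only now is teardown safe */

    pthread_join(tid, NULL);
    pthread_mutex_destroy(&c.tracker.lock);
    pthread_cond_destroy(&c.tracker.idle);
    return c.late_callbacks;
}
```

In this sketch, dropping the tracker_wait_idle() call would allow the unloaded flag to be set while the callback is still pending, which is the userspace analogue of ib_sa calling back into an unloaded module.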
From rdreier at cisco.com Wed May 24 10:02:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 10:02:07 -0700 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <20060524170118.GF21266@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 24 May 2006 20:01:18 +0300") References: <20060524133728.GN21266@mellanox.co.il> <20060524162242.GC21266@mellanox.co.il> <44748D8B.1000508@ichips.intel.com> <20060524170118.GF21266@mellanox.co.il> Message-ID: Michael> I think this can be seen in oops. Not which modules were being unloaded... From rdreier at cisco.com Wed May 24 10:03:38 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 10:03:38 -0700 Subject: [openib-general] Re: [IPoIB] executing iperf over IPoIB causes to multicast (IP) packets to be recieved out-of-order In-Reply-To: <20060524170213.GG21266@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 24 May 2006 20:02:13 +0300") References: <200605241851.57398.dotanb@mellanox.co.il> <1148486354.4470.122283.camel@hal.voltaire.com> <20060524170213.GG21266@mellanox.co.il> Message-ID: Michael> This is on back to back. I don't think hardware does this Michael> - this is ipoib software thing. It seems to be in the stack above IPoIB. IPoIB posts sends and collects receive completions exactly in the order they happen. From mshefty at ichips.intel.com Wed May 24 10:06:02 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 24 May 2006 10:06:02 -0700 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <20060524170118.GF21266@mellanox.co.il> References: <20060524133728.GN21266@mellanox.co.il> <20060524162242.GC21266@mellanox.co.il> <44748D8B.1000508@ichips.intel.com> <20060524170118.GF21266@mellanox.co.il> Message-ID: <4474927A.5070303@ichips.intel.com> Michael S. Tsirkin wrote: >>What modules were unloaded and being unloaded when this occurred? 
> > I think this can be seen in oops. I'm guessing that ipoib and ib_multicast were unloaded, and the crash occurred unloading ib_sa. Is this correct? - Sean From mshefty at ichips.intel.com Wed May 24 10:10:31 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 24 May 2006 10:10:31 -0700 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <20060524162900.GA24314@mellanox.co.il> References: <20060524133728.GN21266@mellanox.co.il> <20060524162900.GA24314@mellanox.co.il> Message-ID: <44749387.3090908@ichips.intel.com> Michael S. Tsirkin wrote: > Ali here says he's just bringing the ib0 up, and then unloads the module. > Linux is sending broadcasts out all the time, so ... Were ipoib and ib_multicast already loaded, and he was just doing an ifconfig? Did he unload both ipoib and ib_multicast, or just ipoib? - Sean From Don.Dhondt at Bull.com Wed May 24 10:28:19 2006 From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com) Date: Wed, 24 May 2006 10:28:19 -0700 Subject: [openib-general] Running MVAPICH2 with SLURM Process Manager Message-ID: We are running mvapich2-0.9.3-RC0 with OFED1.0 RC4 and have had good results. We would like to use the SLURM resource manager with this combination rather than MPD but it does not appear to be one of the choices available. Does anyone have any experience in this area? ./configure --prefix=${PREFIX} ${MULTI_THREAD} \ --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd \ --disable-romio --without-mpe 2>&1 |tee config-mine.log --with-pm=mpd We would have liked to have seen an option for slurm. Regards, Donald Dhondt GCOS 8 Communications Solutions Project Manager Bull HN Information Systems Inc. 13430 N. Black Canyon Hwy., Phoenix, AZ 85029 Work (602) 862-5245 Fax (602) 862-4290 From mst at mellanox.co.il Wed May 24 10:30:19 2006 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 24 May 2006 20:30:19 +0300 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <44749387.3090908@ichips.intel.com> References: <20060524133728.GN21266@mellanox.co.il> <20060524162900.GA24314@mellanox.co.il> <44749387.3090908@ichips.intel.com> Message-ID: <20060524173019.GA25186@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 > > Michael S. Tsirkin wrote: > >Ali here says he's just bringing the ib0 up, and then unloads the module. > >Linux is sending broadcasts out all the time, so ... > > Were ipoib and ib_multicast already loaded, and he was just doing an > ifconfig? Did he unload both ipoib and ib_multicast, or just ipoib? modprobe -r on ipoib which I think unloads ipoib and then multicast. -- MST From mst at mellanox.co.il Wed May 24 10:36:04 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 24 May 2006 20:36:04 +0300 Subject: [openib-general] Re: Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: References: <20060524133728.GN21266@mellanox.co.il> <20060524162242.GC21266@mellanox.co.il> <44748D8B.1000508@ichips.intel.com> Message-ID: <20060524173604.GB25186@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 > > It's not completely trivial to reproduce. I tried loading and > unloading ib_ipoib a few times, then I tried to load ib_ipoib and > unload ib_mthca, I tried pinging in between loading and unloading, and > I didn't see any crashes. Maybe SM was down. 
-- MST From halr at voltaire.com Wed May 24 10:44:48 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 May 2006 13:44:48 -0400 Subject: [openib-general] [PATCH] OpenSM/memory allocation: Deprecate cl_malloc/zalloc/free and use malloc/free directly Message-ID: <1148492674.4470.124126.camel@hal.voltaire.com> OpenSM/memory allocation: Deprecate cl_malloc/zalloc/free and friends, and use malloc/free directly Signed-off-by: Hal Rosenstock Index: osm/include/opensm/osm_port.h =================================================================== --- osm/include/opensm/osm_port.h (revision 7470) +++ osm/include/opensm/osm_port.h (working copy) @@ -50,9 +50,9 @@ #ifndef _OSM_PORT_H_ #define _OSM_PORT_H_ +#include #include #include -#include #include #include #include @@ -1374,7 +1374,7 @@ osm_port_delete( IN OUT osm_port_t** const pp_port ) { osm_port_destroy( *pp_port ); - cl_free( *pp_port ); + free( *pp_port ); *pp_port = NULL; } /* Index: osm/include/opensm/osm_rand_fwd_tbl.h =================================================================== --- osm/include/opensm/osm_rand_fwd_tbl.h (revision 7470) +++ osm/include/opensm/osm_rand_fwd_tbl.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -50,8 +50,8 @@ #ifndef _OSM_RAND_FWD_TBL_H_ #define _OSM_RAND_FWD_TBL_H_ +#include #include -#include #include #ifdef __cplusplus @@ -125,7 +125,7 @@ osm_rand_tbl_delete( /* TO DO - This is a place holder function only! 
*/ - cl_free( *pp_tbl ); + free( *pp_tbl ); *pp_tbl = NULL; } /* Index: osm/include/complib/cl_memory.h =================================================================== --- osm/include/complib/cl_memory.h (revision 7470) +++ osm/include/complib/cl_memory.h (working copy) @@ -96,7 +96,7 @@ BEGIN_C_DECLS * * SYNOPSIS */ -void +void __attribute__((deprecated)) __cl_mem_track( IN const boolean_t start ); /* @@ -135,7 +135,7 @@ __cl_mem_track( * * SYNOPSIS */ -void +void __attribute__((deprecated)) cl_mem_display( void ); /* * RETURN VALUE @@ -162,7 +162,7 @@ cl_mem_display( void ); * * SYNOPSIS */ -boolean_t +boolean_t __attribute__((deprecated)) cl_mem_check( void ); /* * RETURN VALUE @@ -189,7 +189,7 @@ cl_mem_check( void ); * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * __cl_malloc_trk( IN const char* const p_file_name, IN const int32_t line_num, @@ -232,7 +232,7 @@ __cl_malloc_trk( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * __cl_zalloc_trk( IN const char* const p_file_name, IN const int32_t line_num, @@ -274,7 +274,7 @@ __cl_zalloc_trk( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * __cl_malloc_ntrk( IN const size_t size ); /* @@ -308,7 +308,7 @@ __cl_malloc_ntrk( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * __cl_zalloc_ntrk( IN const size_t bytes ); /* @@ -341,7 +341,7 @@ __cl_zalloc_ntrk( * * SYNOPSIS */ -void +void __attribute__((deprecated)) __cl_free_trk( IN const char* const p_file_name, IN const int32_t line_num, @@ -384,7 +384,7 @@ __cl_free_trk( * * SYNOPSIS */ -void +void __attribute__((deprecated)) __cl_free_ntrk( IN void* const p_memory ); /* @@ -418,7 +418,7 @@ __cl_free_ntrk( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * cl_malloc( IN const size_t size ); /* @@ -449,7 +449,7 @@ cl_malloc( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * cl_zalloc( IN const size_t size ); /* @@ -480,7 +480,7 @@ cl_zalloc( * * SYNOPSIS */ -void +void __attribute__((deprecated)) cl_free( IN 
void* const p_memory ); /* Index: osm/include/complib/cl_memtrack.h =================================================================== --- osm/include/complib/cl_memtrack.h (revision 7470) +++ osm/include/complib/cl_memtrack.h (working copy) @@ -79,7 +79,7 @@ typedef struct _cl_mem_tracker /* List to manage free headers. */ cl_qlist_t free_hdr_list; -} cl_mem_tracker_t; +} cl_mem_tracker_t __attribute__((deprecated)); #define FILE_NAME_LENGTH 64 @@ -93,7 +93,7 @@ typedef struct _cl_malloc_hdr char file_name[FILE_NAME_LENGTH]; int32_t line_num; -} cl_malloc_hdr_t; +} cl_malloc_hdr_t __attribute__((deprecated)); extern cl_mem_tracker_t *gp_mem_tracker; Index: osm/include/vendor/osm_vendor_mlx_svc.h =================================================================== --- osm/include/vendor/osm_vendor_mlx_svc.h (revision 7470) +++ osm/include/vendor/osm_vendor_mlx_svc.h (working copy) @@ -40,7 +40,6 @@ #include #include #include -#include #include #ifdef __cplusplus @@ -191,9 +190,10 @@ osmv_mad_copy(IN const ib_mad_t *p_mad) uint8_t *p_copy; CL_ASSERT(p_mad); - p_copy = cl_zalloc(MAD_BLOCK_SIZE); + p_copy = malloc(MAD_BLOCK_SIZE); if (NULL != p_copy) { + memset(p_copy, 0, MAD_BLOCK_SIZE); memcpy(p_copy, p_mad, MAD_BLOCK_SIZE); } Index: osm/libvendor/osm_vendor_mlx_ts.c =================================================================== --- osm/libvendor/osm_vendor_mlx_ts.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_ts.c (working copy) @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -58,7 +59,6 @@ #include #include -#include #include typedef struct _osmv_TOPSPIN_transport_mgr_ { @@ -187,7 +187,7 @@ osmv_transport_init(IN osm_bind_info_t * char device_file[16]; int device_fd; int ts_ioctl_ret; - osmv_TOPSPIN_transport_mgr_t* p_mgr = cl_zalloc(sizeof(osmv_TOPSPIN_transport_mgr_t)); + osmv_TOPSPIN_transport_mgr_t* p_mgr = malloc(sizeof(osmv_TOPSPIN_transport_mgr_t)); int qpn; if (!p_mgr) @@ -195,6 +195,8 @@ osmv_transport_init(IN 
osm_bind_info_t * return IB_INSUFFICIENT_MEMORY; } + memset(p_mgr, 0, sizeof(osmv_TOPSPIN_transport_mgr_t)); + /* open TopSpin file device */ /* HACK: assume last char in hostid is the HCA index */ sprintf(device_file, "/dev/ts_ua%u", hca_idx); @@ -414,7 +416,7 @@ osmv_transport_done(IN const osm_bind_ha /* seems the only way to abort a blocking read is to make it read something */ __osm_transport_gen_dummy_mad(p_bo); cl_thread_destroy(&(p_tpot_mgr->receiver)); - cl_free(p_tpot_mgr); + free(p_tpot_mgr); } static void Index: osm/libvendor/osm_vendor_mtl_transaction_mgr.c =================================================================== --- osm/libvendor/osm_vendor_mtl_transaction_mgr.c (revision 7470) +++ osm/libvendor/osm_vendor_mtl_transaction_mgr.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -40,7 +40,7 @@ #endif /* HAVE_CONFIG_H */ #include -#include +#include #include #include #include @@ -279,7 +279,7 @@ osm_transaction_mgr_init( IN osm_vendor_ CL_ASSERT( p_vend->p_transaction_mgr == NULL ); (osm_transaction_mgr_t*)p_vend->p_transaction_mgr = - (osm_transaction_mgr_t * ) cl_malloc( sizeof( osm_transaction_mgr_t ) ); + (osm_transaction_mgr_t * ) malloc( sizeof( osm_transaction_mgr_t ) ); trans_mgr_p = (osm_transaction_mgr_t*)p_vend->p_transaction_mgr; @@ -289,12 +289,12 @@ osm_transaction_mgr_init( IN osm_vendor_ /* initialize the qlist */ trans_mgr_p->madw_reqs_list_p = - ( cl_qlist_t * ) cl_malloc( sizeof( cl_qlist_t ) ); + ( cl_qlist_t * ) malloc( sizeof( cl_qlist_t ) ); cl_qlist_init( trans_mgr_p->madw_reqs_list_p); /* initialize the qmap */ trans_mgr_p->madw_by_tid_map_p = - ( cl_qmap_t * ) cl_malloc( sizeof( cl_qmap_t ) ); + ( cl_qmap_t * ) malloc( sizeof( cl_qmap_t ) ); cl_qmap_init( trans_mgr_p->madw_by_tid_map_p ); /* create the timer used by the madw_req_list */ @@ -352,12 +352,12 @@ osm_transaction_mgr_destroy ( IN osm_ven p_map_item = &(osm_madw_req_p->map_item); cl_qmap_remove_item(trans_mgr_p->madw_by_tid_map_p, p_map_item); /* free the item */ - cl_free(osm_madw_req_p); + free(osm_madw_req_p); p_list_item = cl_qlist_remove_head(trans_mgr_p->madw_reqs_list_p); } /* free the qlist and qmap */ - cl_free(trans_mgr_p->madw_reqs_list_p ); - cl_free(trans_mgr_p->madw_by_tid_map_p ); + free(trans_mgr_p->madw_reqs_list_p ); + free(trans_mgr_p->madw_by_tid_map_p ); /* reliease and destroy the lock */ cl_spinlock_release( &trans_mgr_p->transaction_mgr_lock ); cl_spinlock_destroy( &(trans_mgr_p->transaction_mgr_lock) ); @@ -365,7 +365,7 @@ osm_transaction_mgr_destroy ( IN osm_ven cl_timer_trim(&trans_mgr_p->madw_list_timer, 1); cl_timer_destroy( &trans_mgr_p->madw_list_timer ); /* free the transaction_manager object */ - cl_free(trans_mgr_p); + free(trans_mgr_p); trans_mgr_p = NULL; } OSM_LOG_EXIT( p_vend->p_log ); @@ -398,7 +398,7 @@ 
osm_transaction_mgr_insert_madw( IN osm_ timeout = (uint64_t)(p_vend->timeout) * 1000; /* change the miliseconds value of timeout to microseconds. */ waking_time = timeout + cl_get_time_stamp(); - osm_madw_req_p = (osm_madw_req_t *)cl_malloc( sizeof (osm_madw_req_t) ); + osm_madw_req_p = (osm_madw_req_t *)malloc( sizeof (osm_madw_req_t) ); osm_madw_req_p->p_madw = p_madw; osm_madw_req_p->waking_time = waking_time; @@ -476,7 +476,7 @@ osm_transaction_mgr_erase_madw( IN osm_v "Removed TID:<0x%"PRIx64">.\n", p_mad->trans_id ); /* free the item */ - cl_free(osm_madw_req_p); + free(osm_madw_req_p); } else { Index: osm/libvendor/osm_pkt_randomizer.c =================================================================== --- osm/libvendor/osm_pkt_randomizer.c (revision 7470) +++ osm/libvendor/osm_pkt_randomizer.c (working copy) @@ -58,8 +58,6 @@ #include #endif -#include - /********************************************************************** * Return TRUE if the path is in a fault path, and FALSE otherwise. 
* By in a fault path the meaning is that there is a path in the fault @@ -284,7 +282,7 @@ osm_pkt_randomizer_init( OSM_LOG_ENTER( p_log, osm_pkt_randomizer_init ); - *pp_pkt_randomizer = cl_zalloc( sizeof( osm_pkt_randomizer_t ) ); + *pp_pkt_randomizer = malloc( sizeof( osm_pkt_randomizer_t ) ); if ( *pp_pkt_randomizer == NULL ) { res = IB_INSUFFICIENT_MEMORY; @@ -317,14 +315,17 @@ osm_pkt_randomizer_init( /* allocate the fault_dr_paths variable */ /* It is the number of the paths that will be saved as fault = osm_pkt_num_unstable_links */ - (*pp_pkt_randomizer)->fault_dr_paths = cl_zalloc( sizeof( osm_dr_path_t ) * - (*pp_pkt_randomizer)->osm_pkt_num_unstable_links ); + (*pp_pkt_randomizer)->fault_dr_paths = malloc( sizeof( osm_dr_path_t ) * + (*pp_pkt_randomizer)->osm_pkt_num_unstable_links ); if ( (*pp_pkt_randomizer)->fault_dr_paths == NULL ) { res = IB_INSUFFICIENT_MEMORY; goto Exit; } + memset( (*pp_pkt_randomizer)->fault_dr_paths, 0, + sizeof( osm_dr_path_t ) * (*pp_pkt_randomizer)->osm_pkt_num_unstable_links ); + Exit: OSM_LOG_EXIT( p_log ); return (res); @@ -341,8 +342,8 @@ osm_pkt_randomizer_destroy( if ( *pp_pkt_randomizer != NULL ) { - cl_free( (*pp_pkt_randomizer)->fault_dr_paths ); - cl_free( *pp_pkt_randomizer ); + free( (*pp_pkt_randomizer)->fault_dr_paths ); + free( *pp_pkt_randomizer ); } OSM_LOG_EXIT( p_log ); } Index: osm/libvendor/osm_vendor_mlx_hca.c =================================================================== --- osm/libvendor/osm_vendor_mlx_hca.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_hca.c (working copy) @@ -46,8 +46,8 @@ #include #include #include -#include #include +#include #include /******************************************************************************** @@ -130,8 +130,8 @@ __osm_vendor_get_ca_ids( IN osm_vendor_t /* allocate and really call - user of this function needs to deallocate it */ *p_hca_ids = - ( VAPI_hca_id_t * ) cl_malloc( *p_num_guids * - sizeof( VAPI_hca_id_t ) ); + ( VAPI_hca_id_t * ) malloc( 
*p_num_guids * + sizeof( VAPI_hca_id_t ) ); /* now call it really */ vapi_res = EVAPI_list_hcas( *p_num_guids, p_num_guids, *p_hca_ids ); @@ -221,15 +221,15 @@ __osm_ca_info_init( IN osm_vendor_t * co memcpy( &( p_ca_info->guid ), hca_cap.node_guid, 8 * sizeof( u_int8_t ) ); p_ca_info->attr_size = 1; p_ca_info->p_attr = - ( ib_ca_attr_t * ) cl_malloc( sizeof( ib_ca_attr_t ) ); + ( ib_ca_attr_t * ) malloc( sizeof( ib_ca_attr_t ) ); memcpy( &( p_ca_info->p_attr->ca_guid ), hca_cap.node_guid, 8 * sizeof( u_int8_t ) ); /* now obtain the attributes of the ports */ p_ca_info->p_attr->num_ports = hca_cap.phys_port_num; p_ca_info->p_attr->p_port_attr = - ( ib_port_attr_t * ) cl_malloc( hca_cap.phys_port_num * - sizeof( ib_port_attr_t ) ); + ( ib_port_attr_t * ) malloc( hca_cap.phys_port_num * + sizeof( ib_port_attr_t ) ); for( port_num = 0; port_num < p_ca_info->p_attr->num_ports; port_num++ ) { @@ -250,7 +250,7 @@ __osm_ca_info_init( IN osm_vendor_t * co VAPI_query_hca_gid_tbl( hca_hndl, port_num + 1, 0, &maxNumGids, NULL ); p_port_gid = - ( IB_gid_t * ) cl_malloc( maxNumGids * sizeof( IB_gid_t ) ); + ( IB_gid_t * ) malloc( maxNumGids * sizeof( IB_gid_t ) ); vapi_res = VAPI_query_hca_gid_tbl( hca_hndl, port_num + 1, maxNumGids, @@ -270,7 +270,7 @@ __osm_ca_info_init( IN osm_vendor_t * co p_ca_info->p_attr->p_port_attr[port_num].link_state = hca_port.state; p_ca_info->p_attr->p_port_attr[port_num].sm_lid = hca_port.sm_lid; - cl_free( p_port_gid ); + free( p_port_gid ); } status = IB_SUCCESS; @@ -299,14 +299,14 @@ osm_ca_info_destroy( IN osm_vendor_t * c { if(0 != p_ca->p_attr->num_ports) { - cl_free( p_ca->p_attr->p_port_attr ); + free( p_ca->p_attr->p_port_attr ); } - cl_free( p_ca->p_attr); + free( p_ca->p_attr); } } - cl_free( p_ca_info ); + free( p_ca_info ); OSM_LOG_EXIT( p_vend->p_log ); } @@ -349,7 +349,7 @@ osm_vendor_get_all_port_attr( IN osm_ven } /* Allocate an array big enough to hold the ca info objects*/ - p_ca_infos = cl_zalloc( ca_count * sizeof( 
osm_ca_info_t ) ); + p_ca_infos = malloc( ca_count * sizeof( osm_ca_info_t ) ); if( p_ca_infos == NULL ) { osm_log( p_vend->p_log, OSM_LOG_ERROR, @@ -358,6 +358,8 @@ osm_vendor_get_all_port_attr( IN osm_ven goto Exit; } + memset( p_ca_infos, 0, ca_count * sizeof( osm_ca_info_t ) ); + /* * For each CA, retrieve the CA info attributes */ @@ -409,7 +411,7 @@ osm_vendor_get_all_port_attr( IN osm_ven Exit: if( p_ca_ids ) - cl_free( p_ca_ids ); + free( p_ca_ids ); if ( p_ca_infos ) { @@ -504,7 +506,7 @@ osm_vendor_get_guid_ca_and_port( VAPI_query_hca_gid_tbl( hca_hndl, portIdx + 1, 0, &maxNumGids, NULL ); p_port_gid = - ( IB_gid_t * ) cl_malloc( maxNumGids * sizeof( IB_gid_t ) ); + ( IB_gid_t * ) malloc( maxNumGids * sizeof( IB_gid_t ) ); /* get the port guid */ vapi_res = @@ -533,7 +535,7 @@ osm_vendor_get_guid_ca_and_port( goto Exit; } - cl_free( p_port_gid ); + free( p_port_gid ); p_port_gid = NULL; } /* ALL PORTS */ } /* all HCAs */ @@ -546,9 +548,9 @@ osm_vendor_get_guid_ca_and_port( Exit: if( p_ca_ids != NULL ) - cl_free( p_ca_ids ); + free( p_ca_ids ); if( p_port_gid != NULL ) - cl_free( p_port_gid ); + free( p_port_gid ); OSM_LOG_EXIT( p_vend->p_log ); return ( status ); } Index: osm/libvendor/osm_vendor_mlx_sa.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sa.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_sa.c (working copy) @@ -40,8 +40,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -181,7 +181,7 @@ __osmv_sa_mad_rcv_cb( Exit: /* free the copied query request if found */ - if (p_query_req_copy) cl_free(p_query_req_copy); + if (p_query_req_copy) free(p_query_req_copy); /* put back the request madw */ if (p_req_madw) @@ -227,7 +227,7 @@ __osmv_sa_mad_err_cb( if ((p_query_req_copy->flags & OSM_SA_FLAGS_SYNC) == OSM_SA_FLAGS_SYNC) cl_event_signal( &p_bind->sync_event ); - if (p_query_req_copy) cl_free(p_query_req_copy); + if (p_query_req_copy) 
free(p_query_req_copy); OSM_LOG_EXIT( p_bind->p_log ); } @@ -289,7 +289,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( /* allocate the attributes */ p_attr_array = - (ib_port_attr_t *)cl_malloc(sizeof(ib_port_attr_t)*num_ports); + (ib_port_attr_t *)malloc(sizeof(ib_port_attr_t)*num_ports); /* obtain the attributes */ status = osm_vendor_get_all_port_attr(p_vend, p_attr_array, &num_ports); @@ -300,7 +300,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( "Fail to get port attributes (error: %s)\n", ib_get_err_str(status) ); - cl_free(p_attr_array); + free(p_attr_array); goto Exit; } @@ -321,7 +321,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( } } - cl_free(p_attr_array); + free(p_attr_array); Exit: OSM_LOG_EXIT( p_vend->p_log ); @@ -361,7 +361,7 @@ osmv_bind_sa( /* allocate the new sa bind info */ p_sa_bind_info = - (osmv_sa_bind_info_t *)cl_malloc(sizeof(osmv_sa_bind_info_t)); + (osmv_sa_bind_info_t *)malloc(sizeof(osmv_sa_bind_info_t)); if (! p_sa_bind_info) { osm_log( p_log, OSM_LOG_ERROR, @@ -389,7 +389,7 @@ osmv_bind_sa( if (p_sa_bind_info->h_bind == OSM_BIND_INVALID_HANDLE) { - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; osm_log( p_log, OSM_LOG_ERROR, "osmv_bind_sa: ERR 0506: " @@ -406,7 +406,7 @@ osmv_bind_sa( &p_sa_bind_info->sm_lid); if (status != IB_SUCCESS) { - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; osm_log( p_log, OSM_LOG_ERROR, "osmv_bind_sa: ERR 0507: " @@ -424,7 +424,7 @@ osmv_bind_sa( "cl_init_event failed: %s\n", ib_get_err_str(cl_status) ); - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; } @@ -586,7 +586,7 @@ __osmv_send_sa_req( To store on the MADW we cast it into what opensm has: p_madw->context.arb_context.context1 */ - p_query_req_copy = cl_malloc(sizeof(*p_query_req_copy)); + p_query_req_copy = malloc(sizeof(*p_query_req_copy)); *p_query_req_copy = *p_query_req; p_madw->context.arb_context.context1 = 
p_query_req_copy; Index: osm/libvendor/osm_vendor_mlx_hca_pfs.c =================================================================== --- osm/libvendor/osm_vendor_mlx_hca_pfs.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_hca_pfs.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -43,8 +43,8 @@ #undef IN #undef OUT #include -#include #include +#include #include #include #include @@ -516,14 +516,14 @@ __osm_ca_info_init( IN osm_vendor_t * co /* set size of attributes and allocate them */ p_ca_info->attr_size = 1; - p_ca_info->p_attr = ( ib_ca_attr_t * ) cl_malloc( sizeof( ib_ca_attr_t ) ); + p_ca_info->p_attr = ( ib_ca_attr_t * ) malloc( sizeof( ib_ca_attr_t ) ); p_ca_info->p_attr->ca_guid = p_ca_info->guid; p_ca_info->p_attr->num_ports = pfs_ca_info.num_ports; /* now obtain the attributes of the ports */ p_ca_info->p_attr->p_port_attr = - ( ib_port_attr_t * ) cl_malloc( pfs_ca_info.num_ports * sizeof( ib_port_attr_t ) ); + ( ib_port_attr_t * ) malloc( pfs_ca_info.num_ports * sizeof( ib_port_attr_t ) ); /* get all the ports info */ for( port_num = 1; port_num <= pfs_ca_info.num_ports; port_num++ ) @@ -581,14 +581,14 @@ osm_ca_info_destroy( IN osm_vendor_t * c { if(0 != p_ca->p_attr->num_ports) { - cl_free( p_ca->p_attr->p_port_attr ); + free( p_ca->p_attr->p_port_attr ); } - cl_free( p_ca->p_attr); + free( p_ca->p_attr); } } - cl_free( p_ca_info ); + free( p_ca_info ); OSM_LOG_EXIT( p_vend->p_log ); } @@ -628,7 +628,7 @@ osm_vendor_get_all_port_attr( IN osm_ven } /* Allocate an array big enough to hold the ca info objects*/ - p_ca_infos = cl_zalloc( ca_count * sizeof( osm_ca_info_t ) ); + p_ca_infos = malloc( ca_count * sizeof( osm_ca_info_t ) ); if( p_ca_infos == NULL ) { osm_log( p_vend->p_log, 
OSM_LOG_ERROR, @@ -637,6 +637,8 @@ osm_vendor_get_all_port_attr( IN osm_ven goto Exit; } + memset( p_ca_infos, 0, ca_count * sizeof( osm_ca_info_t ) ); + /* * For each CA, retrieve the CA info attributes */ Index: osm/libvendor/osm_vendor_ibumad_sa.c =================================================================== --- osm/libvendor/osm_vendor_ibumad_sa.c (revision 7470) +++ osm/libvendor/osm_vendor_ibumad_sa.c (working copy) @@ -38,13 +38,12 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include #include -#include - #define MAX_PORTS 64 /***************************************************************************** @@ -180,7 +179,7 @@ __osmv_sa_mad_rcv_cb( Exit: /* free the copied query request if found */ - if (p_query_req_copy) cl_free(p_query_req_copy); + if (p_query_req_copy) free(p_query_req_copy); OSM_LOG_EXIT( p_bind->p_log ); } @@ -221,7 +220,7 @@ __osmv_sa_mad_err_cb( if ((p_query_req_copy->flags & OSM_SA_FLAGS_SYNC) == OSM_SA_FLAGS_SYNC) cl_event_signal( &p_bind->sync_event ); - if (p_query_req_copy) cl_free(p_query_req_copy); + if (p_query_req_copy) free(p_query_req_copy); OSM_LOG_EXIT( p_bind->p_log ); } @@ -282,7 +281,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( /* allocate the attributes */ p_attr_array = - (ib_port_attr_t *)cl_malloc(sizeof(ib_port_attr_t)*num_ports); + (ib_port_attr_t *)malloc(sizeof(ib_port_attr_t)*num_ports); /* obtain the attributes */ status = osm_vendor_get_all_port_attr(p_vend, p_attr_array, &num_ports); @@ -293,7 +292,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( "Failed to get port attributes (error: %s)\n", ib_get_err_str(status) ); - cl_free(p_attr_array); + free(p_attr_array); goto Exit; } @@ -314,7 +313,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( } } - cl_free(p_attr_array); + free(p_attr_array); Exit: OSM_LOG_EXIT( p_vend->p_log ); @@ -354,7 +353,7 @@ osmv_bind_sa( /* allocate the new sa bind info */ p_sa_bind_info = - (osmv_sa_bind_info_t *)cl_malloc(sizeof(osmv_sa_bind_info_t)); + 
(osmv_sa_bind_info_t *)malloc(sizeof(osmv_sa_bind_info_t)); if (! p_sa_bind_info) { osm_log( p_log, OSM_LOG_ERROR, @@ -382,7 +381,7 @@ osmv_bind_sa( if (p_sa_bind_info->h_bind == OSM_BIND_INVALID_HANDLE) { - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; osm_log( p_log, OSM_LOG_ERROR, "osmv_bind_sa: ERR 5506: " @@ -399,7 +398,7 @@ osmv_bind_sa( &p_sa_bind_info->sm_lid); if (status != IB_SUCCESS) { - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; osm_log( p_log, OSM_LOG_ERROR, "osmv_bind_sa: ERR 5507: " @@ -417,7 +416,7 @@ osmv_bind_sa( "cl_init_event failed: %s\n", ib_get_err_str(cl_status) ); - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; } @@ -576,7 +575,7 @@ __osmv_send_sa_req( To store on the MADW we cast it into what opensm has: p_madw->context.ni_context.node_guid */ - p_query_req_copy = cl_malloc(sizeof(*p_query_req_copy)); + p_query_req_copy = malloc(sizeof(*p_query_req_copy)); *p_query_req_copy = *p_query_req; p_madw->context.ni_context.node_guid = (ib_net64_t)(long)p_query_req_copy; Index: osm/libvendor/osm_vendor_mlx_txn.c =================================================================== --- osm/libvendor/osm_vendor_mlx_txn.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_txn.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -38,7 +38,7 @@ # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include @@ -86,12 +86,13 @@ osmv_txn_init(IN osm_bind_handle_t osm_log( p_bo->p_vendor->p_log, OSM_LOG_DEBUG, "Starting transaction 0x%llX (key=0x%llX)\n", tid, key); - p_txn = cl_zalloc(sizeof(osmv_txn_ctx_t)); + p_txn = malloc(sizeof(osmv_txn_ctx_t)); if (! 
p_txn) { return IB_INSUFFICIENT_MEMORY; } + memset(p_txn, 0, sizeof(osmv_txn_ctx_t)); p_txn->p_log = p_bo->txn_mgr.p_log; p_txn->tid = tid; p_txn->key = key; @@ -113,7 +114,7 @@ osmv_txn_init(IN osm_bind_handle_t return IB_SUCCESS; insert_txn_failed: - cl_free(p_txn); + free(p_txn); OSM_LOG_EXIT( p_bo->p_vendor->p_log ); return st; @@ -132,13 +133,15 @@ osmv_txn_init_rmpp_sender(IN osm_bind_ha osmv_txn_remove_timeout_ev(h_bind, osmv_txn_get_key(p_txn)); p_txn->rmpp_txfr.rmpp_state = OSMV_TXN_RMPP_SENDER; - p_txn->rmpp_txfr.p_rmpp_send_ctx = cl_zalloc(sizeof(osmv_rmpp_send_ctx_t)); + p_txn->rmpp_txfr.p_rmpp_send_ctx = malloc(sizeof(osmv_rmpp_send_ctx_t)); if (!p_txn->rmpp_txfr.p_rmpp_send_ctx) { return IB_INSUFFICIENT_MEMORY; } + memset(p_txn->rmpp_txfr.p_rmpp_send_ctx, 0, sizeof(osmv_rmpp_send_ctx_t)); + st = osmv_rmpp_send_ctx_init(p_txn->rmpp_txfr.p_rmpp_send_ctx, (void*)p_madw->p_mad, p_madw->mad_size, @@ -171,7 +174,7 @@ osmv_txn_init_rmpp_receiver(IN osm_bind_ p_txn->rmpp_txfr.rmpp_state = OSMV_TXN_RMPP_RECEIVER; p_txn->rmpp_txfr.is_rmpp_init_by_peer = is_init_by_peer; - p_txn->rmpp_txfr.p_rmpp_recv_ctx = cl_zalloc(sizeof(osmv_rmpp_recv_ctx_t)); + p_txn->rmpp_txfr.p_rmpp_recv_ctx = malloc(sizeof(osmv_rmpp_recv_ctx_t)); if (!p_txn->rmpp_txfr.p_rmpp_recv_ctx) { @@ -180,6 +183,8 @@ osmv_txn_init_rmpp_receiver(IN osm_bind_ return IB_INSUFFICIENT_MEMORY; } + memset(p_txn->rmpp_txfr.p_rmpp_recv_ctx, 0, sizeof(osmv_rmpp_recv_ctx_t)); + st = osmv_rmpp_recv_ctx_init(p_txn->rmpp_txfr.p_rmpp_recv_ctx,p_txn->p_log); return st; @@ -271,7 +276,7 @@ osmv_txn_done(IN osm_bind_handle_t h_bin osmv_rmpp_recv_ctx_done(p_ctx->rmpp_txfr.p_rmpp_recv_ctx); } - cl_free(p_ctx); + free(p_ctx); OSM_LOG_EXIT(p_bo->p_vendor->p_log); } @@ -327,12 +332,14 @@ osmv_txnmgr_init(IN osmv_txn_mgr_t *p_tx { cl_status_t cl_st = CL_SUCCESS; - p_tx_mgr->p_event_wheel = cl_zalloc(sizeof(cl_event_wheel_t)); + p_tx_mgr->p_event_wheel = malloc(sizeof(cl_event_wheel_t)); if (!p_tx_mgr->p_event_wheel) { 
return IB_INSUFFICIENT_MEMORY; } + memset(p_tx_mgr->p_event_wheel, 0, sizeof(cl_event_wheel_t)); + cl_event_wheel_construct(p_tx_mgr->p_event_wheel); /* NOTE! We are using an extended constructor. @@ -342,18 +349,20 @@ osmv_txnmgr_init(IN osmv_txn_mgr_t *p_tx cl_st = cl_event_wheel_init_ex(p_tx_mgr->p_event_wheel, p_log, p_lock); if (cl_st != CL_SUCCESS) { - cl_free(p_tx_mgr->p_event_wheel); + free(p_tx_mgr->p_event_wheel); return (ib_api_status_t)cl_st; } - p_tx_mgr->p_txn_map = cl_zalloc(sizeof(cl_qmap_t)); + p_tx_mgr->p_txn_map = malloc(sizeof(cl_qmap_t)); if (!p_tx_mgr->p_txn_map) { cl_event_wheel_destroy(p_tx_mgr->p_event_wheel); - cl_free(p_tx_mgr->p_event_wheel); + free(p_tx_mgr->p_event_wheel); return IB_INSUFFICIENT_MEMORY; } + memset(p_tx_mgr->p_txn_map, 0, sizeof(cl_qmap_t)); + cl_qmap_init(p_tx_mgr->p_txn_map); p_tx_mgr->p_log = p_log; @@ -366,10 +375,10 @@ osmv_txnmgr_done(IN osm_bind_handle_t osmv_bind_obj_t* p_bo = (osmv_bind_obj_t*)h_bind; __osmv_txn_all_done(h_bind); - cl_free(p_bo->txn_mgr.p_txn_map); + free(p_bo->txn_mgr.p_txn_map); cl_event_wheel_destroy(p_bo->txn_mgr.p_event_wheel); - cl_free(p_bo->txn_mgr.p_event_wheel); + free(p_bo->txn_mgr.p_event_wheel); } ib_api_status_t @@ -430,7 +439,7 @@ __osmv_txnmgr_insert_txn(IN osmv_txn_mgr CL_ASSERT(p_txn); key = osmv_txn_get_key(p_txn); - p_obj = cl_zalloc(sizeof(cl_map_obj_t)); + p_obj = malloc(sizeof(cl_map_obj_t)); if (NULL == p_obj) return IB_INSUFFICIENT_MEMORY; @@ -438,6 +447,8 @@ __osmv_txnmgr_insert_txn(IN osmv_txn_mgr "__osmv_txnmgr_insert_txn: " "Inserting key: 0x%llX to map ptr:%p\n", key, p_tx_mgr->p_txn_map ); + memset(p_obj, 0, sizeof(cl_map_obj_t)); + cl_qmap_set_obj(p_obj,p_txn); /* assuming lookup with this key was made and the result was IB_NOT_FOUND */ cl_qmap_insert(p_tx_mgr->p_txn_map, key, &p_obj->item); @@ -484,7 +495,7 @@ __osmv_txnmgr_remove_txn(IN osmv_txn_mg p_obj = PARENT_STRUCT(p_item, cl_map_obj_t,item); *pp_txn = cl_qmap_obj(p_obj); - cl_free(p_obj); + free(p_obj); 
OSM_LOG_EXIT(p_tx_mgr->p_log); return IB_SUCCESS; @@ -506,7 +517,7 @@ __osmv_txn_all_done(osm_bind_handle_t p_obj = PARENT_STRUCT(p_item,cl_map_obj_t,item); p_txn = (osmv_txn_ctx_t*)cl_qmap_obj(p_obj); osmv_txn_done(h_bind, osmv_txn_get_key(p_txn), FALSE); - cl_free(p_obj); + free(p_obj); /* assuming osmv_txn_done has removed the txn from the map */ p_item = cl_qmap_head(p_bo->txn_mgr.p_txn_map); } Index: osm/libvendor/osm_vendor_ibumad.c =================================================================== --- osm/libvendor/osm_vendor_ibumad.c (revision 7470) +++ osm/libvendor/osm_vendor_ibumad.c (working copy) @@ -58,12 +58,12 @@ #ifdef OSM_VENDOR_INTF_OPENIB #include +#include #include #include #include #include -#include #include #include #include @@ -507,7 +507,7 @@ osm_vendor_new( goto Exit; } - p_vend = cl_zalloc( sizeof(*p_vend) ); + p_vend = malloc( sizeof(*p_vend) ); if( p_vend == NULL ) { osm_log( p_log, OSM_LOG_ERROR, @@ -516,8 +516,10 @@ osm_vendor_new( goto Exit; } + memset( p_vend, 0, sizeof(*p_vend) ); + if (osm_vendor_init( p_vend, p_log, timeout ) < 0) { - cl_free( p_vend ); + free( p_vend ); p_vend = NULL; } @@ -550,7 +552,7 @@ osm_vendor_delete( cl_event_destroy( &p_ur->signal ); cl_spinlock_destroy( &(*pp_vend)->cb_lock ); cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock ); - cl_free( *pp_vend ); + free( *pp_vend ); *pp_vend = NULL; } @@ -826,13 +828,14 @@ osm_vendor_bind( goto Exit; } - if (!(p_bind = cl_zalloc( sizeof(*p_bind) ))) { + if (!(p_bind = malloc( sizeof(*p_bind) ))) { osm_log( p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_bind: ERR 5425: " "Unable to allocate internal bind object\n" ); goto Exit; } + memset( p_bind, 0, sizeof(*p_bind) ); p_bind->p_vend = p_vend; p_bind->port_id = umad_port_id; p_bind->client_context = context; @@ -880,7 +883,7 @@ osm_vendor_bind( "osm_vendor_bind: ERR 5426: " "Unable to register class %u version %u\n", p_user_bind->mad_class, p_user_bind->class_version); - cl_free(p_bind); + free(p_bind); p_bind = 0; 
goto Exit; } @@ -892,7 +895,7 @@ osm_vendor_bind( "bad agent id %u or duplicate agent for class %u vers %u\n", p_bind->agent_id, p_user_bind->mad_class, p_user_bind->class_version); - cl_free(p_bind); + free(p_bind); p_bind = 0; goto Exit; } @@ -909,7 +912,7 @@ osm_vendor_bind( "osm_vendor_bind: ERR 5428: " "Unable to register class 1 version %u\n", p_user_bind->class_version); - cl_free(p_bind); + free(p_bind); p_bind = 0; goto Exit; } @@ -920,7 +923,7 @@ osm_vendor_bind( "osm_vendor_bind: ERR 5429: " "bad agent id %u or duplicate agent for class 1 vers %u\n", p_bind->agent_id1, p_user_bind->class_version); - cl_free(p_bind); + free(p_bind); p_bind = 0; goto Exit; } Index: osm/libvendor/osm_vendor_mlx_sar.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sar.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_sar.c (working copy) @@ -38,10 +38,10 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include -#include ib_api_status_t osmv_rmpp_sar_init(osmv_rmpp_sar_t* p_sar, void* p_arbt_mad, @@ -161,21 +161,10 @@ osmv_rmpp_sar_reassemble_arbt_mad(osmv_r p_mad= (char*)p_mad+space_left; } - cl_free(buf_tmp); - cl_free(p_obj); + free(buf_tmp); + free(p_obj); } return IB_SUCCESS; } - - - - - - - - - - - Index: osm/libvendor/osm_vendor_mlx_sim.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sim.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_sim.c (working copy) @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -58,7 +59,6 @@ #include #include -#include /* the simulator messages definition */ #include @@ -149,7 +149,7 @@ osmv_transport_init(IN osm_bind_info_t * IN osmv_bind_obj_t *p_bo) { ibms_conn_handle_t conHdl; /* the connection we talk to the simulator through */ - osmv_ibms_transport_mgr_t* p_mgr = cl_zalloc(sizeof(osmv_ibms_transport_mgr_t)); + osmv_ibms_transport_mgr_t* p_mgr = 
malloc(sizeof(osmv_ibms_transport_mgr_t)); int qpn; int ibms_status; uint64_t port_guid; @@ -159,6 +159,8 @@ osmv_transport_init(IN osm_bind_info_t * return IB_INSUFFICIENT_MEMORY; } + memset(p_mgr, 0, sizeof(osmv_ibms_transport_mgr_t)); + /* create the client socket connected to the simulator */ /* also perform the "connect" message - such that we validate the target guid */ @@ -343,7 +345,7 @@ osmv_transport_done(IN const osm_bind_ha ibms_disconnect(p_tpot_mgr->conHdl); /* seems the only way to abort a blocking read is to make it read something */ - cl_free(p_tpot_mgr); + free(p_tpot_mgr); } static void Index: osm/libvendor/osm_vendor_umadt.c =================================================================== --- osm/libvendor/osm_vendor_umadt.c (revision 7470) +++ osm/libvendor/osm_vendor_umadt.c (working copy) @@ -59,11 +59,11 @@ #ifdef OSM_VENDOR_INTF_UMADT +#include #include #include #include -#include #include #include #include @@ -153,9 +153,11 @@ osm_vendor_new( OSM_LOG_ENTER( p_log, osm_vendor_new ); - p_umadt_obj = cl_zalloc(sizeof(umadt_obj_t)); + p_umadt_obj = malloc(sizeof(umadt_obj_t)); if( p_umadt_obj ) { + memset( p_umadt_obj, 0, sizeof(umadt_obj_t) ); + status = osm_vendor_init( (osm_vendor_t*)p_umadt_obj, p_log, timeout ); if( status != IB_SUCCESS ) @@ -201,7 +203,7 @@ osm_vendor_delete( p_mad_bind_info = (mad_bind_info_t*)p_list_item; } dlclose(p_umadt_obj->umadt_handle); - cl_free(p_umadt_obj); + free(p_umadt_obj); *pp_vend = NULL; OSM_LOG_EXIT( p_umadt_obj->p_log ); @@ -354,8 +356,8 @@ osm_vendor_get_ports( } pPortAttributesList = - ( IB_PORT_ATTRIBUTES * ) cl_zalloc ( caAttributes. - PortAttributesListSize ); + ( IB_PORT_ATTRIBUTES * ) malloc ( caAttributes. 
+ PortAttributesListSize ); if ( pPortAttributesList == NULL ) { @@ -389,7 +391,7 @@ osm_vendor_get_ports( pPortAttributesList = pPortAttributesList->Next; p_port_guid++; } - cl_free (caAttributes.PortAttributesList); + free (caAttributes.PortAttributesList); p_umadt_obj->IbtInterface.Vpi.CloseCA ( caHandle ); free_guids = free_guids - caAttributes.Ports ; @@ -441,12 +443,15 @@ osm_vendor_get( p_vend_wrap->direction = SEND; return( (ib_mad_t*)&p_madt_struct->IBMad ); #endif /* 0 */ - p_mad = (ib_mad_t*)cl_zalloc( mad_size ); - if ( !p_mad) + p_mad = (ib_mad_t*)malloc( mad_size ); + if ( !p_mad ) { p_vend_wrap->p_madt_struct = NULL; return NULL; } + + memset(p_mad, 0, mad_size); + p_vend_wrap->p_madt_struct = NULL; p_vend_wrap->direction = SEND; p_vend_wrap->size =mad_size; @@ -489,7 +494,7 @@ osm_vendor_put( /* For a send the PostSend released the MAD with Umadt. Simply dealloacte the */ /* local memory that was allocated on the osm_vendor_get() call. */ /* */ - cl_free(p_mad); + free(p_mad); #if 0 Status = p_umadt_obj->uMadtInterface.uMadtReleaseSendMad(p_mad_bind_info->umadt_handle, p_vend_wrap->p_madt_struct); @@ -584,17 +589,21 @@ osm_vendor_send( p_mad->trans_id = cl_ntoh64(p_mad->trans_id)<<24; /* */ - /* Creat a transaction context for this send and save the TID and client context. */ + /* Create a transaction context for this send and save the TID and client context. 
*/ /* */ if ( resp_expected ) { - p_trans_context = cl_zalloc(sizeof(trans_context_t)); + p_trans_context = malloc(sizeof(trans_context_t)); CL_ASSERT(p_trans_context); - p_trans_context->trans_id = p_mad->trans_id; - p_trans_context->context = transaction_context; - p_trans_context->sent_time = cl_get_time_stamp(); + if (p_trans_context) + { + memset(p_trans_context, 0, sizeof(trans_context_t)); + p_trans_context->trans_id = p_mad->trans_id; + p_trans_context->context = transaction_context; + p_trans_context->sent_time = cl_get_time_stamp(); + } cl_spinlock_acquire(&p_mad_bind_info->trans_ctxt_lock); cl_qlist_insert_tail(&p_mad_bind_info->trans_ctxt_list, @@ -774,9 +783,12 @@ osm_vendor_bind( CL_ASSERT( mad_recv_callback ); /* Allocate memory for registering the handle. */ - p_mad_bind_info = (mad_bind_info_t*)cl_zalloc(sizeof(*p_mad_bind_info)); - - p_umadt_reg_class = &p_mad_bind_info->umadt_reg_class ; + p_mad_bind_info = (mad_bind_info_t*)malloc(sizeof(*p_mad_bind_info)); + if (p_mad_bind_info) + { + memset(p_mad_bind_info, 0, sizeof(*p_mad_bind_info)); + p_umadt_reg_class = &p_mad_bind_info->umadt_reg_class; + } p_umadt_reg_class->PortGuid = cl_ntoh64( p_osm_bind_info->port_guid ); p_umadt_reg_class->ClassId = p_osm_bind_info->mad_class; p_umadt_reg_class->ClassVersion = p_osm_bind_info->class_version; @@ -797,7 +809,7 @@ osm_vendor_bind( &p_mad_bind_info->umadt_handle); if (Status != FSUCCESS) { - cl_free(p_mad_bind_info); + free(p_mad_bind_info); OSM_LOG_EXIT( p_umadt_obj->p_log ); return( OSM_BIND_INVALID_HANDLE ); } @@ -875,7 +887,7 @@ osm_vendor_unbind(IN osm_bind_handle_t h p_next_list_item = cl_qlist_next(p_list_item); cl_qlist_remove_item(&p_mad_bind_info->trans_ctxt_list, p_list_item); - cl_free(p_list_item); + free(p_list_item); p_list_item = p_next_list_item; } cl_spinlock_release(&p_mad_bind_info->trans_ctxt_lock); @@ -887,12 +899,12 @@ osm_vendor_unbind(IN osm_bind_handle_t h p_next_list_item = cl_qlist_next(p_list_item); 
cl_qlist_remove_item(&p_mad_bind_info->timeout_list, p_list_item); - cl_free(p_list_item); + free(p_list_item); p_list_item = p_next_list_item; } cl_spinlock_release(&p_mad_bind_info->timeout_list_lock); - cl_free(p_mad_bind_info); + free(p_mad_bind_info); } /********************************************************************** @@ -1025,7 +1037,7 @@ __mad_recv_processor( transaction_context =((trans_context_t*)p_list_item)->context; cl_qlist_remove_item(&p_mad_bind_info->trans_ctxt_list, p_list_item); - cl_free(p_list_item); + free(p_list_item); } cl_spinlock_release(&p_mad_bind_info->trans_ctxt_lock); ((ib_mad_t*)p_osm_madw->p_mad)->trans_id = cl_ntoh64(p_osm_madw->p_mad->trans_id>>24); @@ -1137,7 +1149,7 @@ __osm_vendor_timer_callback( p_next_list_item = cl_qlist_next(p_list_item); cl_qlist_remove_item(&p_mad_bind_info->timeout_list, p_list_item); - cl_free(p_list_item); + free(p_list_item); p_list_item = p_next_list_item; } Index: osm/libvendor/osm_vendor_mlx_rmpp_ctx.c =================================================================== --- osm/libvendor/osm_vendor_mlx_rmpp_ctx.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_rmpp_ctx.c (working copy) @@ -38,10 +38,10 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include -#include #include #include @@ -91,7 +91,7 @@ osmv_rmpp_send_ctx_done(IN osmv_rmpp_sen CL_ASSERT(p_ctx); cl_event_destroy(&p_ctx->event); osmv_rmpp_sar_done(&p_ctx->sar); - cl_free(p_ctx); + free(p_ctx); } @@ -211,9 +211,10 @@ osmv_rmpp_recv_ctx_init(osmv_rmpp_recv_c p_ctx->is_sa_mad = FALSE; - p_ctx->p_rbuf = cl_zalloc(sizeof(cl_qlist_t)); + p_ctx->p_rbuf = malloc(sizeof(cl_qlist_t)); if (p_ctx->p_rbuf) { + memset(p_ctx->p_rbuf, 0, sizeof(cl_qlist_t)); cl_qlist_init(p_ctx->p_rbuf); p_ctx->expected_seg = 1; } else st= IB_INSUFFICIENT_MEMORY; @@ -237,16 +238,16 @@ osmv_rmpp_recv_ctx_done(IN osmv_rmpp_rec p_obj = PARENT_STRUCT(p_list_item,cl_list_obj_t,list_item); - cl_free(cl_qlist_obj(p_obj)); - cl_free(p_obj); + 
free(cl_qlist_obj(p_obj)); + free(p_obj); p_list_item = cl_qlist_remove_head(p_ctx->p_rbuf); } osmv_rmpp_sar_done(&p_ctx->sar); - cl_free(p_ctx->p_rbuf); - cl_free(p_ctx); + free(p_ctx->p_rbuf); + free(p_ctx); } @@ -260,20 +261,22 @@ osmv_rmpp_recv_ctx_store_mad_seg(IN osmv OSM_LOG_ENTER(p_recv_ctx->p_log, osmv_rmpp_recv_ctx_store_mad_seg); CL_ASSERT(p_recv_ctx); - p_list_mad= cl_zalloc(MAD_BLOCK_SIZE); + p_list_mad= malloc(MAD_BLOCK_SIZE); if (NULL == p_list_mad) { return IB_INSUFFICIENT_MEMORY; } - memcpy(p_list_mad,p_mad,MAD_BLOCK_SIZE); + memset(p_list_mad, 0, MAD_BLOCK_SIZE); + memcpy(p_list_mad, p_mad, MAD_BLOCK_SIZE); - p_obj = cl_zalloc(sizeof(cl_list_obj_t)); + p_obj = malloc(sizeof(cl_list_obj_t)); if (NULL == p_obj) { - cl_free(p_list_mad); + free(p_list_mad); return IB_INSUFFICIENT_MEMORY; } - cl_qlist_set_obj(p_obj,p_list_mad); + memset(p_obj, 0, sizeof(cl_list_obj_t)); + cl_qlist_set_obj(p_obj, p_list_mad); cl_qlist_insert_tail(p_recv_ctx->p_rbuf,&p_obj->list_item); Index: osm/libvendor/osm_vendor_mtl_hca_guid.c =================================================================== --- osm/libvendor/osm_vendor_mtl_hca_guid.c (revision 7470) +++ osm/libvendor/osm_vendor_mtl_hca_guid.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -42,10 +42,10 @@ #if defined(OSM_VENDOR_INTF_MTL) | defined(OSM_VENDOR_INTF_TS) #undef IN #undef OUT +#include #include #include #include -#include #include #include @@ -148,8 +148,8 @@ __osm_vendor_get_ca_ids( IN osm_vendor_t /* allocate and really call - user of this function needs to deallocate it */ *p_hca_ids = - ( VAPI_hca_id_t * ) cl_malloc( *p_num_guids * - sizeof( VAPI_hca_id_t ) ); + ( VAPI_hca_id_t * ) malloc( *p_num_guids * + sizeof( VAPI_hca_id_t ) ); /* now call it really */ vapi_res = EVAPI_list_hcas( *p_num_guids, p_num_guids, *p_hca_ids ); @@ -239,15 +239,15 @@ __osm_ca_info_init( IN osm_vendor_t * co memcpy( &( p_ca_info->guid ), hca_cap.node_guid, 8 * sizeof( u_int8_t ) ); p_ca_info->attr_size = 1; p_ca_info->p_attr = - ( ib_ca_attr_t * ) cl_malloc( sizeof( ib_ca_attr_t ) ); + ( ib_ca_attr_t * ) malloc( sizeof( ib_ca_attr_t ) ); memcpy( &( p_ca_info->p_attr->ca_guid ), hca_cap.node_guid, 8 * sizeof( u_int8_t ) ); /* now obtain the attributes of the ports */ p_ca_info->p_attr->num_ports = hca_cap.phys_port_num; p_ca_info->p_attr->p_port_attr = - ( ib_port_attr_t * ) cl_malloc( hca_cap.phys_port_num * - sizeof( ib_port_attr_t ) ); + ( ib_port_attr_t * ) malloc( hca_cap.phys_port_num * + sizeof( ib_port_attr_t ) ); for( port_num = 0; port_num < p_ca_info->p_attr->num_ports; port_num++ ) { @@ -268,7 +268,7 @@ __osm_ca_info_init( IN osm_vendor_t * co VAPI_query_hca_gid_tbl( hca_hndl, port_num + 1, 0, &maxNumGids, NULL ); p_port_gid = - ( IB_gid_t * ) cl_malloc( maxNumGids * sizeof( IB_gid_t ) ); + ( IB_gid_t * ) malloc( maxNumGids * sizeof( IB_gid_t ) ); vapi_res = VAPI_query_hca_gid_tbl( hca_hndl, port_num + 1, maxNumGids, @@ -288,7 +288,7 @@ __osm_ca_info_init( IN osm_vendor_t * co p_ca_info->p_attr->p_port_attr[port_num].link_state = hca_port.state; p_ca_info->p_attr->p_port_attr[port_num].sm_lid = hca_port.sm_lid; - cl_free( p_port_gid ); + free( p_port_gid ); } status = IB_SUCCESS; @@ -309,12 +309,12 @@ osm_ca_info_destroy( IN osm_vendor_t * 
c
 {
 if( p_ca_info->p_attr->num_ports )
 {
- cl_free( p_ca_info->p_attr->p_port_attr );
+ free( p_ca_info->p_attr->p_port_attr );
 }
- cl_free( p_ca_info->p_attr );
+ free( p_ca_info->p_attr );
 }
- cl_free( p_ca_info );
+ free( p_ca_info );
 OSM_LOG_EXIT( p_vend->p_log );
 }
@@ -359,7 +359,7 @@ osm_vendor_get_all_port_attr( IN osm_ven
 }
 /* we keep track of all the CAs in this info array */
- p_vend->p_ca_info = cl_zalloc( ca_count * sizeof( *p_vend->p_ca_info ) );
+ p_vend->p_ca_info = malloc( ca_count * sizeof( *p_vend->p_ca_info ) );
 if( p_vend->p_ca_info == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -368,6 +368,7 @@ osm_vendor_get_all_port_attr( IN osm_ven
 goto Exit;
 }
+ memset( p_vend->p_ca_info, 0, ca_count * sizeof( *p_vend->p_ca_info ) );
 p_vend->ca_count = ca_count;
 /*
@@ -433,7 +434,7 @@ osm_vendor_get_all_port_attr( IN osm_ven
 *p_num_ports = total_ports;
 if( p_ca_ids )
- cl_free( p_ca_ids );
+ free( p_ca_ids );
 OSM_LOG_EXIT( p_vend->p_log );
 return ( status );
@@ -521,7 +522,7 @@ osm_vendor_get_guid_ca_and_port( IN osm_
 VAPI_query_hca_gid_tbl( hca_hndl, portIdx + 1, 0, &maxNumGids, NULL );
 p_port_gid =
- ( IB_gid_t * ) cl_malloc( maxNumGids * sizeof( IB_gid_t ) );
+ ( IB_gid_t * ) malloc( maxNumGids * sizeof( IB_gid_t ) );
 /* get the port guid */
 vapi_res =
@@ -549,7 +550,7 @@ osm_vendor_get_guid_ca_and_port( IN osm_
 goto Exit;
 }
- cl_free( p_port_gid );
+ free( p_port_gid );
 p_port_gid = NULL;
 } /* ALL PORTS */
 } /* all HCAs */
@@ -562,9 +563,9 @@ osm_vendor_get_guid_ca_and_port( IN osm_
 Exit:
 if( p_ca_ids != NULL )
- cl_free( p_ca_ids );
+ free( p_ca_ids );
 if( p_port_gid != NULL )
- cl_free( p_port_gid );
+ free( p_port_gid );
 OSM_LOG_EXIT( p_vend->p_log );
 return ( status );
 }
Index: osm/libvendor/osm_vendor_test.c
===================================================================
--- osm/libvendor/osm_vendor_test.c (revision 7470)
+++ osm/libvendor/osm_vendor_test.c (working copy)
@@ -56,8 +56,8 @@
 #ifdef OSM_VENDOR_INTF_TEST
+#include
 #include
-#include
 #include
 #include
 #include
@@ -89,7 +89,7 @@ osm_vendor_delete(
 CL_ASSERT( pp_vend );
 osm_vendor_destroy( *pp_vend );
- cl_free( *pp_vend );
+ free( *pp_vend );
 *pp_vend = NULL;
 }
@@ -125,9 +125,11 @@ osm_vendor_new(
 CL_ASSERT( p_log );
- p_vend = cl_zalloc( sizeof(*p_vend) );
+ p_vend = malloc( sizeof(*p_vend) );
 if( p_vend != NULL )
 {
+ memset( p_vend, 0, sizeof(*p_vend) );
+
 status = osm_vendor_init( p_vend, p_log, timeout );
 if( status != IB_SUCCESS )
 {
@@ -158,12 +160,15 @@ osm_vendor_get(
 /* Simply malloc the MAD off the heap. */
- p_mad = (ib_mad_t*)cl_zalloc( size );
+ p_mad = (ib_mad_t*)malloc( size );
 osm_log( p_vend->p_log, OSM_LOG_VERBOSE,
 "osm_vendor_get: "
 "MAD %p.\n", p_mad );
+ if (p_mad)
+ memset( p_mad, 0, size );
+
 OSM_LOG_EXIT( p_vend->p_log );
 return( p_mad );
 }
@@ -191,7 +196,7 @@ osm_vendor_put(
 /* Return the MAD to the heap. */
- cl_free( p_mad );
+ free( p_mad );
 OSM_LOG_EXIT( p_vend->p_log );
 }
@@ -249,9 +254,10 @@ osm_vendor_bind(
 UNUSED_PARAM( mad_recv_callback );
 UNUSED_PARAM( context );
- h_bind = (osm_bind_handle_t)cl_zalloc(sizeof(*h_bind) );
+ h_bind = (osm_bind_handle_t)malloc(sizeof(*h_bind) );
 if( h_bind != NULL )
 {
+ memset(h_bind, 0, sizeof(*h_bind));
 h_bind->p_vend = p_vend;
 h_bind->port_guid = p_bind_info->port_guid;
 h_bind->mad_class = p_bind_info->mad_class;
Index: osm/libvendor/osm_vendor_mlx_ibmgt.c
===================================================================
--- osm/libvendor/osm_vendor_mlx_ibmgt.c (revision 7470)
+++ osm/libvendor/osm_vendor_mlx_ibmgt.c (working copy)
@@ -46,9 +46,9 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -127,11 +127,12 @@ osmv_transport_init(IN osm_bind_info_t *
 osm_log(p_bo->p_vendor->p_log, OSM_LOG_DEBUG,
 "osmv_transport_init: first bind() for the vendor\n");
 p_bo->p_vendor->p_transport_info
- = (osmv_IBMGT_transport_info_t*) cl_zalloc(sizeof(osmv_IBMGT_transport_info_t));
+ = (osmv_IBMGT_transport_info_t*)
malloc(sizeof(osmv_IBMGT_transport_info_t));
 if (NULL == p_bo->p_vendor->p_transport_info)
 {
 return IB_INSUFFICIENT_MEMORY;
 }
+ memset(p_bo->p_vendor->p_transport_info, 0, sizeof(osmv_IBMGT_transport_info_t));
 p_tpot_info = (osmv_IBMGT_transport_info_t*)(p_bo->p_vendor->p_transport_info);
 p_tpot_info->smi_h = 0xffffffff;
@@ -155,17 +156,19 @@ osmv_transport_init(IN osm_bind_info_t *
 /* allocate transport mgr */
- p_mgr = cl_zalloc(sizeof(osmv_IBMGT_transport_mgr_t));
+ p_mgr = malloc(sizeof(osmv_IBMGT_transport_mgr_t));
 if (NULL == p_mgr)
 {
- cl_free(p_tpot_info);
+ free(p_tpot_info);
 osm_log(p_bo->p_vendor->p_log, OSM_LOG_ERROR,
 "osmv_transport_init: ERR 7201: "
 "alloc failed \n");
 return IB_INSUFFICIENT_MEMORY;
 }
- p_bo->p_transp_mgr = p_mgr ;
+ memset(p_mgr, 0, sizeof(osmv_IBMGT_transport_mgr_t));
+
+ p_bo->p_transp_mgr = p_mgr;
 switch ( p_info->mad_class )
 {
 case IB_MCLASS_SUBN_LID:
@@ -198,7 +201,7 @@ osmv_transport_init(IN osm_bind_info_t *
 "osmv_transport_init: ERR 7202: "
 "IB_MGT_get_handle for smi failed \n");
 st = IB_ERROR;
- cl_free(p_mgr);
+ free(p_mgr);
 goto Exit;
 }
@@ -213,12 +216,12 @@ osmv_transport_init(IN osm_bind_info_t *
 "osmv_transport_init: ERR 7203: "
 "IB_MGT_bind_sm failed \n");
 st = IB_ERROR;
- cl_free( p_mgr);
+ free( p_mgr);
 goto Exit;
 }
 /* init smi list */
- p_tpot_info->p_smi_list = cl_zalloc(sizeof(cl_qlist_t));
+ p_tpot_info->p_smi_list = malloc(sizeof(cl_qlist_t));
 if (NULL == p_tpot_info->p_smi_list)
 {
 osm_log(p_bo->p_vendor->p_log, OSM_LOG_ERROR,
@@ -226,9 +229,10 @@ osmv_transport_init(IN osm_bind_info_t *
 "alloc failed \n");
 IB_MGT_unbind_sm(p_tpot_info->smi_h);
 IB_MGT_release_handle(p_tpot_info->smi_h);
- cl_free(p_mgr);
+ free(p_mgr);
 return IB_INSUFFICIENT_MEMORY;
 }
+ memset(p_tpot_info->p_smi_list, 0, sizeof(cl_qlist_t));
 cl_qlist_init(p_tpot_info->p_smi_list);
 osm_log(p_bo->p_vendor->p_log, OSM_LOG_DEBUG,
@@ -248,17 +252,19 @@ osmv_transport_init(IN osm_bind_info_t *
 );
 IB_MGT_unbind_sm(p_tpot_info->smi_h);
 IB_MGT_release_handle(p_tpot_info->smi_h);
- cl_free(p_tpot_info->p_smi_list);
- cl_free(p_mgr);
+ free(p_tpot_info->p_smi_list);
+ free(p_mgr);
 st= IB_ERROR;
 goto Exit;
 }
 }
 /* insert to list of smi's - for raising callbacks later on */
- p_obj = cl_zalloc(sizeof(cl_list_obj_t));
- cl_qlist_set_obj(p_obj,p_bo);
- cl_qlist_insert_tail(p_tpot_info->p_smi_list,&p_obj->list_item);
+ p_obj = malloc(sizeof(cl_list_obj_t));
+ if (p_obj)
+ memset(p_obj, 0, sizeof(cl_list_obj_t));
+ cl_qlist_set_obj(p_obj, p_bo);
+ cl_qlist_insert_tail(p_tpot_info->p_smi_list, &p_obj->list_item);
 break;
@@ -278,7 +284,7 @@ osmv_transport_init(IN osm_bind_info_t *
 "osmv_transport_init: ERR 7207: "
 "IB_MGT_get_handle for gsi failed \n");
 st = IB_ERROR;
- cl_free(p_mgr);
+ free(p_mgr);
 goto Exit;
 }
 }
@@ -293,21 +299,24 @@ osmv_transport_init(IN osm_bind_info_t *
 "osmv_transport_init: ERR 7208: "
 "IB_MGT_bind_gsi_class failed \n");
 st = IB_ERROR;
- cl_free( p_mgr);
+ free( p_mgr);
 goto Exit;
 }
- p_tpot_info->gsi_mgmt_lists[p_info->mad_class] = cl_zalloc(sizeof(cl_qlist_t));
+ p_tpot_info->gsi_mgmt_lists[p_info->mad_class] = malloc(sizeof(cl_qlist_t));
 if (NULL == p_tpot_info->gsi_mgmt_lists[p_info->mad_class])
 {
 IB_MGT_unbind_gsi_class(p_tpot_info->gsi_h,p_info->mad_class);
- cl_free(p_mgr);
+ free(p_mgr);
 return IB_INSUFFICIENT_MEMORY;
 }
+ memset(p_tpot_info->gsi_mgmt_lists[p_info->mad_class], 0, sizeof(cl_qlist_t));
 cl_qlist_init(p_tpot_info->gsi_mgmt_lists[p_info->mad_class]);
 }
 /* insert to list of smi's - for raising callbacks later on */
- p_obj = cl_zalloc(sizeof(cl_list_obj_t));
+ p_obj = malloc(sizeof(cl_list_obj_t));
+ if (p_obj)
+ memset(p_obj, 0, sizeof(cl_list_obj_t));
 cl_qlist_set_obj(p_obj,p_bo);
 cl_qlist_insert_tail(p_tpot_info->gsi_mgmt_lists[p_info->mad_class],&p_obj->list_item);
@@ -322,8 +331,8 @@ osmv_transport_init(IN osm_bind_info_t *
 if (ret != IB_SUCCESS)
 {
 IB_MGT_unbind_gsi_class(p_tpot_info->gsi_h,p_mgr->mgmt_class);
-
cl_free(p_tpot_info->gsi_mgmt_lists[p_mgr->mgmt_class]);
- cl_free(p_mgr);
+ free(p_tpot_info->gsi_mgmt_lists[p_mgr->mgmt_class]);
+ free(p_mgr);
 st= IB_ERROR;
 goto Exit;
 }
@@ -334,7 +343,7 @@ osmv_transport_init(IN osm_bind_info_t *
 osm_log(p_log, OSM_LOG_ERROR,
 "osmv_transport_init: ERR 7209: unrecognized mgmt class \n" );
 st = IB_ERROR;
- cl_free( p_mgr);
+ free( p_mgr);
 goto Exit;
 }
@@ -523,12 +532,12 @@ osmv_transport_done(IN const osm_bind_ha
 CL_ASSERT(p_item != cl_qlist_end(p_list));
 cl_qlist_remove_item(p_list,p_item);
- if (p_obj) cl_free(p_obj);
+ if (p_obj) free(p_obj);
 /* no one is binded to smi anymore - we can free the list, unbind & realease the hndl*/
 if (cl_is_qlist_empty(p_list) == TRUE)
 {
- cl_free(p_list);
+ free(p_list);
 p_list = NULL;
 ret = IB_MGT_unbind_sm(p_tpot_info->smi_h);
@@ -566,12 +575,12 @@ osmv_transport_done(IN const osm_bind_ha
 CL_ASSERT(p_item != cl_qlist_end(p_list));
 cl_qlist_remove_item(p_list,p_item);
- if (p_obj) cl_free(p_obj);
+ if (p_obj) free(p_obj);
 /* no one is binded to this class anymore - we can free the list and unbind this class*/
 if (cl_is_qlist_empty(p_list) == TRUE)
 {
- cl_free(p_list);
+ free(p_list);
 p_list = NULL;
 ret = IB_MGT_unbind_gsi_class(p_tpot_info->gsi_h,p_mgr->mgmt_class);
@@ -604,7 +613,7 @@ osmv_transport_done(IN const osm_bind_ha
 }
 }/* end switch */
- cl_free(p_mgr);
+ free(p_mgr);
 }
Index: osm/libvendor/osm_vendor_mlx_hca_sim.c
===================================================================
--- osm/libvendor/osm_vendor_mlx_hca_sim.c (revision 7470)
+++ osm/libvendor/osm_vendor_mlx_hca_sim.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
 * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
 * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
 *
@@ -44,8 +44,8 @@
 #undef OUT
 #include
-#include
 #include
+#include
 #include
 #include
 #include
@@ -560,14 +560,14 @@ __osm_ca_info_init( IN osm_vendor_t * co
 /* set size of attributes and allocate them */
 p_ca_info->attr_size = 1;
- p_ca_info->p_attr = ( ib_ca_attr_t * ) cl_malloc( sizeof( ib_ca_attr_t ) );
+ p_ca_info->p_attr = ( ib_ca_attr_t * ) malloc( sizeof( ib_ca_attr_t ) );
 p_ca_info->p_attr->ca_guid = p_ca_info->guid;
 p_ca_info->p_attr->num_ports = sim_ca_info.num_ports;
 /* now obtain the attributes of the ports */
 p_ca_info->p_attr->p_port_attr =
- ( ib_port_attr_t * ) cl_malloc( sim_ca_info.num_ports * sizeof( ib_port_attr_t ) );
+ ( ib_port_attr_t * ) malloc( sim_ca_info.num_ports * sizeof( ib_port_attr_t ) );
 /* get all the ports info */
 for( port_num = 1; port_num <= sim_ca_info.num_ports; port_num++ )
@@ -625,14 +625,14 @@ osm_ca_info_destroy( IN osm_vendor_t * c
 {
 if(0 != p_ca->p_attr->num_ports)
 {
- cl_free( p_ca->p_attr->p_port_attr );
+ free( p_ca->p_attr->p_port_attr );
 }
- cl_free( p_ca->p_attr);
+ free( p_ca->p_attr);
 }
 }
- cl_free( p_ca_info );
+ free( p_ca_info );
 OSM_LOG_EXIT( p_vend->p_log );
 }
@@ -671,7 +671,7 @@ osm_vendor_get_all_port_attr( IN osm_ven
 }
 /* Allocate an array big enough to hold the ca info objects*/
- p_ca_infos = cl_zalloc( ca_count * sizeof( osm_ca_info_t ) );
+ p_ca_infos = malloc( ca_count * sizeof( osm_ca_info_t ) );
 if( p_ca_infos == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -680,6 +680,8 @@ osm_vendor_get_all_port_attr( IN osm_ven
 goto Exit;
 }
+ memset( p_ca_infos, 0, ca_count * sizeof( osm_ca_info_t ) );
+
 /*
 * For each CA, retrieve the CA info attributes
 */
Index: osm/libvendor/osm_vendor_ts.c
===================================================================
--- osm/libvendor/osm_vendor_ts.c (revision 7470)
+++ osm/libvendor/osm_vendor_ts.c (working copy)
@@ -40,10 +40,10 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
 #include
-#include
 #include
 #include
@@ -227,7 +227,7 @@
osm_vendor_delete( IN osm_vendor_t ** co
 CL_ASSERT( pp_vend );
 osm_vendor_destroy( *pp_vend );
- cl_free( *pp_vend );
+ free( *pp_vend );
 *pp_vend = NULL;
 }
@@ -272,9 +272,11 @@ osm_vendor_new(
 CL_ASSERT( p_log );
- p_vend = cl_zalloc( sizeof( *p_vend ) );
+ p_vend = malloc( sizeof( *p_vend ) );
 if( p_vend != NULL )
 {
+ memset( p_vend, 0, sizeof( *p_vend ) );
+
 status = osm_vendor_init( p_vend, p_log, timeout );
 if( status != IB_SUCCESS )
 {
@@ -717,7 +719,7 @@ osm_vendor_get( IN osm_bind_handle_t h_b
 p_vw->size = mad_size;
 /* allocate it */
- p_mad = ( ib_mad_t * ) cl_zalloc( p_vw->size );
+ p_mad = ( ib_mad_t * ) malloc( p_vw->size );
 if( p_mad == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -774,7 +776,7 @@ osm_vendor_put(
 */
 /* free the mad but the wrapper is part of the madw object */
- cl_free( p_vw->p_mad_buf );
+ free( p_vw->p_mad_buf );
 p_vw->p_mad_buf = NULL;
 p_madw = PARENT_STRUCT( p_vw, osm_madw_t, vend_wrap);
 p_madw->p_mad = NULL;
Index: osm/libvendor/osm_vendor_mlx_anafa.c
===================================================================
--- osm/libvendor/osm_vendor_mlx_anafa.c (revision 7470)
+++ osm/libvendor/osm_vendor_mlx_anafa.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
 * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
 * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
 *
@@ -40,6 +40,7 @@
 #endif /* HAVE_CONFIG_H */
 #include
+#include
 #include
 #include
 #include
@@ -55,7 +56,6 @@
 #include
 #include
-#include
 #include
 /**
@@ -86,8 +86,9 @@ osm_vendor_new (IN osm_log_t * const p_l
 CL_ASSERT (p_log);
- p_vend = cl_zalloc (sizeof (*p_vend));
+ p_vend = malloc (sizeof (*p_vend));
 if (p_vend != NULL) {
+ memset(p_vend, 0, sizeof (*p_vend));
 status = osm_vendor_init (p_vend, p_log, timeout);
 if (status != IB_SUCCESS) {
 osm_vendor_delete (&p_vend);
@@ -134,14 +135,14 @@ osm_vendor_delete (IN osm_vendor_t ** co
 __osm_vendor_internal_unbind (bind_h);
- cl_free (p_obj);
+ free (p_obj);
 /*removing from list */
 p_item = cl_qlist_remove_head (&((*pp_vend)->bind_handles));
 }
 }
 if (NULL != ((*pp_vend)->p_transport_info)) {
- cl_free ((*pp_vend)->p_transport_info);
+ free ((*pp_vend)->p_transport_info);
 (*pp_vend)->p_transport_info = NULL;
 }
@@ -150,7 +151,7 @@ osm_vendor_delete (IN osm_vendor_t ** co
 osm_pkt_randomizer_destroy (&((*pp_vend)->p_pkt_randomizer), p_log);
- cl_free (*pp_vend);
+ free (*pp_vend);
 *pp_vend = NULL;
 OSM_LOG_EXIT (p_log);
@@ -177,11 +178,13 @@ osm_vendor_init (IN osm_vendor_t * const
 p_vend->ttime_timeout = timeout * OSMV_TXN_TIMEOUT_FACTOR;
 p_vend->p_transport_info = (osmv_TOPSPIN_ANAFA_transport_info_t *)
- cl_zalloc (sizeof (osmv_TOPSPIN_ANAFA_transport_info_t));
+ malloc (sizeof (osmv_TOPSPIN_ANAFA_transport_info_t));
 if (!p_vend->p_transport_info) {
 return IB_ERROR;
 }
+ memset(p_vend->p_transport_info, 0, sizeof (osmv_TOPSPIN_ANAFA_transport_info_t));
+
 /* update the run_randomizer flag */
 if (getenv ("OSM_PKT_DROP_RATE") != NULL &&
 atol (getenv ("OSM_PKT_DROP_RATE")) != 0) {
@@ -247,7 +250,7 @@ osm_vendor_bind (IN osm_vendor_t * const
 return OSM_BIND_INVALID_HANDLE;
 }
- p_bo = cl_zalloc (sizeof (osmv_bind_obj_t));
+ p_bo = malloc (sizeof (osmv_bind_obj_t));
 if (NULL == p_bo) {
 osm_log (p_bo->p_vendor->p_log, OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 7403: "
@@ -255,6 +258,7 @@ osm_vendor_bind (IN osm_vendor_t * const
 return OSM_BIND_INVALID_HANDLE;
 }
+ memset (p_bo, 0, sizeof (osmv_bind_obj_t));
 p_bo->p_vendor = p_vend;
 p_bo->recv_cb = mad_recv_callback;
 p_bo->send_err_cb = send_err_callback;
@@ -276,7 +280,7 @@ osm_vendor_bind (IN osm_vendor_t * const
 osm_log (p_bo->p_vendor->p_log, OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 7405: "
 "could not initialize the spinlock ...\n");
- cl_free (p_bo);
+ free (p_bo);
 return OSM_BIND_INVALID_HANDLE;
 }
@@ -288,7 +292,7 @@ osm_vendor_bind (IN osm_vendor_t * const
 "osm_vendor_bind: ERR 7406: "
 "osmv_txnmgr_init failed \n");
 cl_spinlock_destroy (&p_bo->lock);
- cl_free (p_bo);
+ free (p_bo);
 return OSM_BIND_INVALID_HANDLE;
 }
@@ -300,12 +304,12 @@ osm_vendor_bind (IN osm_vendor_t * const
 "osmv_transport_init failed \n");
 osmv_txnmgr_done ((osm_bind_handle_t) p_bo);
 cl_spinlock_destroy (&p_bo->lock);
- cl_free (p_bo);
+ free (p_bo);
 return OSM_BIND_INVALID_HANDLE;
 }
 /* insert bind handle into db */
- p_obj = cl_zalloc (sizeof (cl_list_obj_t));
+ p_obj = malloc (sizeof (cl_list_obj_t));
 if (NULL == p_obj) {
 osm_log (p_bo->p_vendor->p_log, OSM_LOG_ERROR,
@@ -315,9 +319,11 @@ osm_vendor_bind (IN osm_vendor_t * const
 osmv_transport_done (p_bo->p_transp_mgr);
 osmv_txnmgr_done ((osm_bind_handle_t) p_bo);
 cl_spinlock_destroy (&p_bo->lock);
- cl_free (p_bo);
+ free (p_bo);
 return OSM_BIND_INVALID_HANDLE;
 }
+ if (p_obj)
+ memset (p_obj, 0, sizeof (cl_list_obj_t));
 cl_qlist_set_obj (p_obj, p_bo);
 cl_qlist_insert_head (&p_vend->bind_handles, &p_obj->list_item);
@@ -357,7 +363,7 @@ osm_vendor_unbind (IN osm_bind_handle_t
 CL_ASSERT (p_item != cl_qlist_end (p_bh_list));
 cl_qlist_remove_item (p_bh_list, p_item);
- cl_free (p_obj);
+ free (p_obj);
 __osm_vendor_internal_unbind (h_bind);
@@ -391,7 +397,7 @@ osm_vendor_get (IN osm_bind_handle_t h_b
 }
 /* allocate it */
- p_mad = (ib_mad_t *) cl_zalloc (act_mad_size);
+ p_mad = (ib_mad_t *) malloc (act_mad_size);
 if (p_mad == NULL) {
 osm_log (p_vend->p_log, OSM_LOG_ERROR,
 "osm_vendor_get: ERR 7409: "
@@ -399,6 +405,8
@@ osm_vendor_get (IN osm_bind_handle_t h_b
 goto Exit;
 }
+ memset (p_mad, 0, act_mad_size);
+
 if (osm_log_get_level (p_vend->p_log) >= OSM_LOG_DEBUG) {
 osm_log (p_vend->p_log, OSM_LOG_DEBUG,
 "osm_vendor_get: "
@@ -547,7 +555,7 @@ osm_vendor_put (IN osm_bind_handle_t h_b
 "osm_vendor_put: "
 "Retiring MAD %p.\n", p_vw->p_mad);
 }
- cl_free (p_vw->p_mad);
+ free (p_vw->p_mad);
 p_vw->p_mad = NULL;
 OSM_LOG_EXIT (p_vend->p_log);
@@ -656,7 +664,7 @@ __osm_vendor_internal_unbind (osm_bind_h
 the client - and the client might use them
 cl_spinlock_destroy(&p_bo->lock);
- cl_free(p_bo);
+ free(p_bo);
 */
 OSM_LOG_EXIT (p_log);
Index: osm/libvendor/osm_vendor_mtl.c
===================================================================
--- osm/libvendor/osm_vendor_mtl.c (revision 7470)
+++ osm/libvendor/osm_vendor_mtl.c (working copy)
@@ -43,8 +43,8 @@
 #ifdef OSM_VENDOR_INTF_MTL
+#include
 #include
-#include
 #include
 #include
 /* HACK - I do not know how to prevent complib from loading kernel H files */
@@ -306,7 +306,7 @@ osm_vendor_delete( IN osm_vendor_t ** co
 CL_ASSERT( pp_vend );
 osm_vendor_destroy( *pp_vend );
- cl_free( *pp_vend );
+ free( *pp_vend );
 *pp_vend = NULL;
 }
@@ -337,7 +337,7 @@ osm_vendor_init( IN osm_vendor_t * const
 */
 ib_mgt_hdl_p = ( osm_vendor_mgt_bind_t * )
- cl_malloc( sizeof( osm_vendor_mgt_bind_t ) );
+ malloc( sizeof( osm_vendor_mgt_bind_t ) );
 if( ib_mgt_hdl_p == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -376,9 +376,10 @@ osm_vendor_new(
 CL_ASSERT( p_log );
- p_vend = cl_zalloc( sizeof( *p_vend ) );
+ p_vend = malloc( sizeof( *p_vend ) );
 if( p_vend != NULL )
 {
+ memset( p_vend, 0, sizeof( *p_vend ) );
 status = osm_vendor_init( p_vend, p_log, timeout );
 if( status != IB_SUCCESS )
 {
@@ -675,7 +676,7 @@ osm_vendor_bind( IN osm_vendor_t * const
 }
 /* create the bind object tracking this binding */
- p_bind = (osm_mtl_bind_info_t *)cl_malloc( sizeof(osm_mtl_bind_info_t) );
+ p_bind = (osm_mtl_bind_info_t *)malloc( sizeof(osm_mtl_bind_info_t) );
 memset(p_bind, 0, sizeof(osm_mtl_bind_info_t));
 if( p_bind == NULL )
 {
@@ -736,7 +737,7 @@ osm_vendor_bind( IN osm_vendor_t * const
 &( ib_mgt_hdl_p->smi_mads_hdl ) );
 if( IB_MGT_OK != mgt_ret )
 {
- cl_free( p_bind );
+ free( p_bind );
 p_bind = NULL;
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 3C16: "
@@ -748,7 +749,7 @@ osm_vendor_bind( IN osm_vendor_t * const
 mgt_ret = IB_MGT_bind_sm( ib_mgt_hdl_p->smi_mads_hdl );
 if( IB_MGT_OK != mgt_ret )
 {
- cl_free( p_bind );
+ free( p_bind );
 p_bind = NULL;
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 3C17: "
@@ -790,7 +791,7 @@ osm_vendor_bind( IN osm_vendor_t * const
 &( ib_mgt_hdl_p->gsi_mads_hdl ) );
 if( IB_MGT_OK != mgt_ret )
 {
- cl_free( p_bind );
+ free( p_bind );
 p_bind = NULL;
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 3C20: "
@@ -804,7 +805,7 @@ osm_vendor_bind( IN osm_vendor_t * const
 p_user_bind->mad_class );
 if( IB_MGT_OK != mgt_ret )
 {
- cl_free( p_bind );
+ free( p_bind );
 p_bind = NULL;
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 3C22: "
@@ -837,7 +838,7 @@ osm_vendor_bind( IN osm_vendor_t * const
 if( IB_MGT_OK != mgt_ret )
 {
- cl_free( p_bind );
+ free( p_bind );
 p_bind = NULL;
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 3C23: "
@@ -875,7 +876,7 @@ osm_vendor_get( IN osm_bind_handle_t h_b
 p_vw->size = MAD_BLOCK_SIZE;
 /* allocate it */
- mad_p = ( ib_mad_t * ) cl_zalloc( p_vw->size );
+ mad_p = ( ib_mad_t * ) malloc( p_vw->size );
 if( mad_p == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -931,7 +932,7 @@ osm_vendor_put( IN osm_bind_handle_t h_b
 */
 /* free the mad but the wrapper is part of the madw object */
- cl_free( p_vw->mad_buf_p );
+ free( p_vw->mad_buf_p );
 p_vw->mad_buf_p = NULL;
 p_madw = PARENT_STRUCT( p_vw, osm_madw_t, vend_wrap);
 p_madw->p_mad = NULL;
Index: osm/libvendor/osm_vendor_al.c
===================================================================
--- osm/libvendor/osm_vendor_al.c (revision 7470)
+++
osm/libvendor/osm_vendor_al.c (working copy)
@@ -59,8 +59,8 @@
 #ifdef OSM_VENDOR_INTF_AL
+#include
 #include
-#include
 #include
 #include
 #include
@@ -415,7 +415,7 @@ osm_vendor_new(
 OSM_LOG_ENTER( p_log, osm_vendor_new );
- p_vend = cl_zalloc( sizeof(*p_vend) );
+ p_vend = malloc( sizeof(*p_vend) );
 if( p_vend == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -424,10 +424,12 @@ osm_vendor_new(
 goto Exit;
 }
+ memset( p_vend, 0, sizeof(*p_vend) );
+
 status = osm_vendor_init( p_vend, p_log, timeout );
 if( status != IB_SUCCESS )
 {
- cl_free( p_vend );
+ free( p_vend );
 p_vend = NULL;
 }
@@ -444,7 +446,7 @@ osm_vendor_delete(
 {
 /* TO DO - fill this in */
 ib_close_al( (*pp_vend)->h_al );
- cl_free( *pp_vend );
+ free( *pp_vend );
 *pp_vend = NULL;
 }
@@ -483,7 +485,7 @@ __osm_ca_info_init(
 CL_ASSERT( p_ca_info->attr_size );
- p_ca_info->p_attr = cl_malloc( p_ca_info->attr_size );
+ p_ca_info->p_attr = malloc( p_ca_info->attr_size );
 if( p_ca_info->p_attr == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -519,9 +521,9 @@ osm_ca_info_destroy(
 OSM_LOG_ENTER( p_vend->p_log, osm_ca_info_destroy );
 if( p_ca_info->p_attr )
- cl_free( p_ca_info->p_attr );
+ free( p_ca_info->p_attr );
- cl_free( p_ca_info );
+ free( p_ca_info );
 OSM_LOG_EXIT( p_vend->p_log );
 }
@@ -540,10 +542,12 @@ osm_ca_info_new(
 CL_ASSERT( ca_guid );
- p_ca_info = cl_zalloc( sizeof(*p_ca_info) );
+ p_ca_info = malloc( sizeof(*p_ca_info) );
 if( p_ca_info == NULL )
 goto Exit;
+ memset( p_ca_info, 0, sizeof(*p_ca_info) );
+
 status = __osm_ca_info_init( p_vend, p_ca_info, ca_guid );
 if( status != IB_SUCCESS )
 {
@@ -591,7 +595,7 @@ __osm_vendor_get_ca_guids(
 goto Exit;
 }
- *p_guids = cl_malloc( *p_num_guids * sizeof(**p_guids) );
+ *p_guids = malloc( *p_num_guids * sizeof(**p_guids) );
 if( *p_guids == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -681,7 +685,7 @@ osm_vendor_get_all_port_attr(
 */
 status = __osm_vendor_get_ca_guids( p_vend, &p_ca_guid, &ca_count );
- p_vend->p_ca_info = cl_zalloc( ca_count
* sizeof(*p_vend->p_ca_info) );
+ p_vend->p_ca_info = malloc( ca_count * sizeof(*p_vend->p_ca_info) );
 if( p_vend->p_ca_info == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -690,6 +694,7 @@ osm_vendor_get_all_port_attr(
 goto Exit;
 }
+ memset( p_vend->p_ca_info, 0, ca_count * sizeof(*p_vend->p_ca_info) );
 p_vend->ca_count = ca_count;
 /*
@@ -748,7 +753,7 @@ osm_vendor_get_all_port_attr(
 Exit:
 if( p_ca_guid )
- cl_free( p_ca_guid );
+ free( p_ca_guid );
 OSM_LOG_EXIT( p_vend->p_log );
 return( status );
@@ -1003,7 +1008,7 @@ osm_vendor_bind(
 }
 }
- p_bind = cl_zalloc( sizeof(*p_bind) );
+ p_bind = malloc( sizeof(*p_bind) );
 if( p_bind == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -1012,6 +1017,7 @@ osm_vendor_bind(
 goto Exit;
 }
+ memset( p_bind, 0, sizeof(*p_bind) );
 p_bind->p_vend = p_vend;
 p_bind->client_context = context;
 p_bind->port_num = osm_vendor_get_port_num( p_vend, port_guid );
@@ -1055,7 +1061,7 @@ osm_vendor_bind(
 if( status != IB_SUCCESS )
 {
- cl_free( p_bind );
+ free( p_bind );
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 3B19: "
 "Unable to get QP handle (%s).\n",
@@ -1088,7 +1094,7 @@ osm_vendor_bind(
 if( status != IB_SUCCESS )
 {
- cl_free( p_bind );
+ free( p_bind );
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 3B21: "
 "Unable to register QP0 MAD service (%s).\n",
Index: osm/libvendor/osm_vendor_mlx_ts_anafa.c
===================================================================
--- osm/libvendor/osm_vendor_mlx_ts_anafa.c (revision 7470)
+++ osm/libvendor/osm_vendor_mlx_ts_anafa.c (working copy)
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -60,7 +61,6 @@
 #include
 #include
-#include
 #include
 static void
@@ -186,11 +186,13 @@ osmv_transport_init (
 (osmv_TOPSPIN_ANAFA_transport_info_t *) p_bo->p_vendor->
 p_transport_info;
- p_mgr = cl_zalloc (sizeof (osmv_TOPSPIN_ANAFA_transport_mgr_t));
+ p_mgr = malloc (sizeof (osmv_TOPSPIN_ANAFA_transport_mgr_t));
 if (!p_mgr) {
 return
IB_INSUFFICIENT_MEMORY;
 }
+ memset(p_mgr, 0, sizeof (osmv_TOPSPIN_ANAFA_transport_mgr_t));
+
 /* open TopSpin file device */
 device_fd = open (device_file, O_RDWR);
 if (device_fd < 0) {
@@ -355,7 +357,7 @@ osmv_transport_done (IN const osm_bind_h
 /* pthread_cancel (p_tpot_mgr->receiver.osd.id); */
 cl_thread_destroy (&(p_tpot_mgr->receiver));
- cl_free (p_tpot_mgr);
+ free (p_tpot_mgr);
 }
 static void
Index: osm/libvendor/osm_vendor_mlx.c
===================================================================
--- osm/libvendor/osm_vendor_mlx.c (revision 7470)
+++ osm/libvendor/osm_vendor_mlx.c (working copy)
@@ -38,6 +38,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
 #include
@@ -45,7 +46,6 @@
 #include
 #include
 #include
-#include
 /**
 * FORWARD REFERENCES
@@ -78,9 +78,11 @@ osm_vendor_new(
 CL_ASSERT( p_log );
- p_vend = cl_zalloc( sizeof( *p_vend ) );
+ p_vend = malloc( sizeof( *p_vend ) );
 if ( p_vend != NULL )
 {
+ memset( p_vend, 0, sizeof( *p_vend ) );
+
 status = osm_vendor_init( p_vend, p_log, timeout );
 if ( status != IB_SUCCESS )
 {
@@ -126,14 +128,14 @@ osm_vendor_delete( IN osm_vendor_t ** co
 __osm_vendor_internal_unbind(bind_h);
- cl_free(p_obj);
+ free(p_obj);
 /*removing from list */
 p_item = cl_qlist_remove_head(&((*pp_vend)->bind_handles));
 }
 if (NULL != ((*pp_vend)->p_transport_info))
 {
- cl_free((*pp_vend)->p_transport_info);
+ free((*pp_vend)->p_transport_info);
 (*pp_vend)->p_transport_info = NULL;
 }
@@ -141,7 +143,7 @@ osm_vendor_delete( IN osm_vendor_t ** co
 if ( (*pp_vend)->run_randomizer == TRUE )
 osm_pkt_randomizer_destroy( &((*pp_vend)->p_pkt_randomizer), p_log );
- cl_free( *pp_vend );
+ free( *pp_vend );
 *pp_vend = NULL;
 OSM_LOG_EXIT( p_log );
@@ -223,7 +225,7 @@ osm_vendor_bind(
 return OSM_BIND_INVALID_HANDLE;
 }
- p_bo = cl_zalloc(sizeof(osmv_bind_obj_t));
+ p_bo = malloc(sizeof(osmv_bind_obj_t));
 if (NULL == p_bo)
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -231,6 +233,7 @@ osm_vendor_bind(
 return OSM_BIND_INVALID_HANDLE;
 }
+
memset(p_bo, 0, sizeof(osmv_bind_obj_t));
 p_bo->p_vendor = p_vend;
 p_bo->recv_cb = mad_recv_callback;
 p_bo->send_err_cb = send_err_callback;
@@ -257,7 +260,7 @@ osm_vendor_bind(
 "Fail to find port number of port guid:0x%016"PRIx64"\n",
 p_bind_info->port_guid );
- cl_free(p_bo);
+ free(p_bo);
 return OSM_BIND_INVALID_HANDLE;
 }
@@ -275,7 +278,7 @@ osm_vendor_bind(
 osm_log(p_bo->p_vendor->p_log,OSM_LOG_ERROR,
 "osm_vendor_bind: ERR 7305: "
 "could not initialize the spinlock ...\n");
- cl_free(p_bo);
+ free(p_bo);
 return OSM_BIND_INVALID_HANDLE;
 }
@@ -287,7 +290,7 @@ osm_vendor_bind(
 "osm_vendor_bind: ERR 7306: "
 "osmv_txnmgr_init failed \n");
 cl_spinlock_destroy(&p_bo->lock);
- cl_free(p_bo);
+ free(p_bo);
 return OSM_BIND_INVALID_HANDLE;
 }
@@ -299,12 +302,12 @@ osm_vendor_bind(
 "osmv_transport_init failed \n");
 osmv_txnmgr_done((osm_bind_handle_t) p_bo);
 cl_spinlock_destroy(&p_bo->lock);
- cl_free(p_bo);
+ free(p_bo);
 return OSM_BIND_INVALID_HANDLE;
 }
 /* insert bind handle into db */
- p_obj = cl_zalloc(sizeof(cl_list_obj_t));
+ p_obj = malloc(sizeof(cl_list_obj_t));
 if (NULL == p_obj)
 {
@@ -315,9 +318,10 @@ osm_vendor_bind(
 osmv_transport_done(p_bo->p_transp_mgr);
 osmv_txnmgr_done((osm_bind_handle_t) p_bo);
 cl_spinlock_destroy(&p_bo->lock);
- cl_free(p_bo);
+ free(p_bo);
 return OSM_BIND_INVALID_HANDLE;
 }
+ memset(p_obj, 0, sizeof(cl_list_obj_t));
 cl_qlist_set_obj(p_obj, p_bo);
 cl_qlist_insert_head(&p_vend->bind_handles,&p_obj->list_item);
@@ -357,7 +361,7 @@ osm_vendor_unbind(IN osm_bind_handle_t
 CL_ASSERT(p_item != cl_qlist_end(p_bh_list));
 cl_qlist_remove_item(p_bh_list,p_item);
- if (p_obj) cl_free(p_obj);
+ if (p_obj) free(p_obj);
 if (h_bind != 0)
 {
@@ -398,7 +402,7 @@ osm_vendor_get( IN osm_bind_handle_t h_b
 }
 /* allocate it */
- p_mad = ( ib_mad_t * ) cl_zalloc( act_mad_size );
+ p_mad = ( ib_mad_t * ) malloc( act_mad_size );
 if ( p_mad == NULL )
 {
 osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -407,6 +411,8 @@ osm_vendor_get( IN osm_bind_handle_t h_b
 goto Exit;
 }
+
memset( p_mad, 0, act_mad_size );
+
 if ( osm_log_get_level( p_vend->p_log ) >= OSM_LOG_DEBUG )
 {
 osm_log( p_vend->p_log, OSM_LOG_DEBUG,
@@ -583,7 +589,7 @@ osm_vendor_put(
 "osm_vendor_put: "
 "Retiring MAD %p.\n", p_vw->p_mad );
 }
- cl_free( p_vw->p_mad );
+ free( p_vw->p_mad );
 p_vw->p_mad = NULL;
 OSM_LOG_EXIT( p_vend->p_log );
@@ -708,7 +714,7 @@ __osm_vendor_internal_unbind(osm_bind_ha
 the client - and the client might use them
 cl_spinlock_destroy(&p_bo->lock);
- cl_free(p_bo);
+ free(p_bo);
 */
 OSM_LOG_EXIT(p_log);
Index: osm/libvendor/osm_vendor_mlx_hca_anafa.c
===================================================================
--- osm/libvendor/osm_vendor_mlx_hca_anafa.c (revision 7470)
+++ osm/libvendor/osm_vendor_mlx_hca_anafa.c (working copy)
@@ -43,11 +43,11 @@
 #undef IN
 #undef OUT
+#include
 #include
 #include
 #include
-#include
 #include
 #include
@@ -110,7 +110,7 @@ __osm_ca_info_init (IN osm_vendor_t * co
 p_ca_info->attr.num_ports = 1;
 p_ca_info->attr.p_port_attr =
- (ib_port_attr_t *) cl_malloc (1 * sizeof (ib_port_attr_t));
+ (ib_port_attr_t *) malloc (1 * sizeof (ib_port_attr_t));
 port_info.port = 1;
 ioctl_ret =
Index: osm/complib/cl_timer.c
===================================================================
--- osm/complib/cl_timer.c (revision 7470)
+++ osm/complib/cl_timer.c (working copy)
@@ -48,9 +48,9 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -83,9 +83,11 @@ __cl_timer_prov_create( void )
 {
 CL_ASSERT( gp_timer_prov == NULL );
- gp_timer_prov = cl_zalloc( sizeof(cl_timer_prov_t) );
+ gp_timer_prov = malloc( sizeof(cl_timer_prov_t) );
 if( !gp_timer_prov )
 return( CL_INSUFFICIENT_MEMORY );
+ else
+ memset( gp_timer_prov, 0, sizeof(cl_timer_prov_t) );
 cl_qlist_init( &gp_timer_prov->queue );
@@ -122,7 +124,7 @@ __cl_timer_prov_destroy( void )
 pthread_cond_destroy( &gp_timer_prov->cond );
 /* Free the memory and reset the global pointer.
*/
- cl_free( gp_timer_prov );
+ free( gp_timer_prov );
 gp_timer_prov = NULL;
 }
Index: osm/complib/cl_dispatcher.c
===================================================================
--- osm/complib/cl_dispatcher.c (revision 7470)
+++ osm/complib/cl_dispatcher.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
 * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
 * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
 *
@@ -51,7 +51,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
-#include
+#include
 #include
 #include
@@ -161,7 +161,7 @@ cl_disp_shutdown(
 /* Free all registration info. */
 while( !cl_is_qlist_empty( &p_disp->reg_list ) )
- cl_free( cl_qlist_remove_head( &p_disp->reg_list ) );
+ free( cl_qlist_remove_head( &p_disp->reg_list ) );
 }
 /********************************************************************
@@ -253,12 +253,16 @@ cl_disp_register(
 }
 /* Get a registration info from the pool. */
- p_reg = (cl_disp_reg_info_t*)cl_zalloc( sizeof(cl_disp_reg_info_t) );
+ p_reg = (cl_disp_reg_info_t*)malloc( sizeof(cl_disp_reg_info_t) );
 if( !p_reg )
 {
 cl_spinlock_release( &p_disp->lock );
 return( NULL );
 }
+ else
+ {
+ memset( p_reg, 0, sizeof(cl_disp_reg_info_t) );
+ }
 p_reg->p_disp = p_disp;
 p_reg->ref_cnt = 0;
@@ -276,7 +280,7 @@ cl_disp_register(
 status = cl_ptr_vector_set( &p_disp->reg_vec, msg_id, p_reg );
 if( status != CL_SUCCESS )
 {
- cl_free( p_reg );
+ free( p_reg );
 cl_spinlock_release( &p_disp->lock );
 return( NULL );
 }
@@ -323,7 +327,7 @@ cl_disp_unregister(
 /* Remove the registrant from the list.
*/
 cl_qlist_remove_item( &p_disp->reg_list, (cl_list_item_t*)p_reg );
 /* Return the registration info to the pool */
- cl_free( p_reg );
+ free( p_reg );
 cl_spinlock_release( &p_disp->lock );
 }
Index: osm/complib/cl_ptr_vector.c
===================================================================
--- osm/complib/cl_ptr_vector.c (revision 7470)
+++ osm/complib/cl_ptr_vector.c (working copy)
@@ -51,9 +51,9 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
-#include
 void
@@ -113,7 +113,7 @@ cl_ptr_vector_destroy(
 /* Destroy the page vector. */
 if( p_vector->p_ptr_array )
 {
- cl_free( (void*)p_vector->p_ptr_array );
+ free( (void*)p_vector->p_ptr_array );
 p_vector->p_ptr_array = NULL;
 }
 }
@@ -214,9 +214,11 @@ cl_ptr_vector_set_capacity(
 }
 /* Allocate our pointer array. */
- p_new_ptr_array = cl_zalloc( new_capacity * sizeof(void*) );
+ p_new_ptr_array = malloc( new_capacity * sizeof(void*) );
 if( !p_new_ptr_array )
 return( CL_INSUFFICIENT_MEMORY );
+ else
+ memset( p_new_ptr_array, 0, new_capacity * sizeof(void*) );
 if( p_vector->p_ptr_array )
 {
@@ -225,7 +227,7 @@ cl_ptr_vector_set_capacity(
 p_vector->capacity * sizeof(void*) );
 /* Free the old pointer array. */
- cl_free( (void*)p_vector->p_ptr_array );
+ free( (void*)p_vector->p_ptr_array );
 }
 /* Set the new array. */
Index: osm/complib/cl_perf.c
===================================================================
--- osm/complib/cl_perf.c (revision 7470)
+++ osm/complib/cl_perf.c (working copy)
@@ -51,6 +51,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 /*
@@ -64,7 +65,6 @@
 #include
 #include
-#include
@@ -108,10 +108,13 @@ __cl_perf_init(
 /* Allocate an array of counters.
*/
 p_perf->size = num_counters;
 p_perf->data_array = (cl_perf_data_t*)
- cl_zalloc( sizeof(cl_perf_data_t) * num_counters );
+ malloc( sizeof(cl_perf_data_t) * num_counters );
 if( !p_perf->data_array )
 return( CL_INSUFFICIENT_MEMORY );
+ else
+ memset( p_perf->data_array, 0,
+ sizeof(cl_perf_data_t) * num_counters );
 /* Initialize the user's counters. */
 for( i = 0; i < num_counters; i++ )
@@ -223,7 +226,7 @@ __cl_perf_destroy(
 for( i = 0; i < p_perf->size; i++ )
 cl_spinlock_destroy( &p_perf->data_array[i].lock );
- cl_free( p_perf->data_array );
+ free( p_perf->data_array );
 p_perf->data_array = NULL;
 p_perf->state = CL_UNINITIALIZED;
Index: osm/complib/cl_threadpool.c
===================================================================
--- osm/complib/cl_threadpool.c (revision 7470)
+++ osm/complib/cl_threadpool.c (working copy)
@@ -51,10 +51,10 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
 #include
-#include
 void
@@ -151,7 +151,7 @@ cl_thread_pool_init(
 for( i = 0; i < count; i++ )
 {
 /* Create a new thread. */
- p_thread = (cl_thread_t*)cl_malloc( sizeof(cl_thread_t) );
+ p_thread = (cl_thread_t*)malloc( sizeof(cl_thread_t) );
 if( !p_thread )
 {
 cl_thread_pool_destroy( p_thread_pool );
@@ -229,7 +229,7 @@ cl_thread_pool_destroy(
 p_thread = (cl_thread_t*)cl_list_remove_head( &p_thread_pool->thread_list );
 cl_thread_destroy( p_thread );
- cl_free( p_thread );
+ free( p_thread );
 }
 }
Index: osm/complib/cl_vector.c
===================================================================
--- osm/complib/cl_vector.c (revision 7470)
+++ osm/complib/cl_vector.c (working copy)
@@ -51,9 +51,9 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
-#include
 /*
@@ -316,12 +316,12 @@ cl_vector_destroy(
 /* Deallocate the pages */
 while( !cl_is_qlist_empty( &p_vector->alloc_list ) )
- cl_free( cl_qlist_remove_head( &p_vector->alloc_list ) );
+ free( cl_qlist_remove_head( &p_vector->alloc_list ) );
 /* Destroy the page vector.
*/
 if( p_vector->p_ptr_array )
 {
- cl_free( p_vector->p_ptr_array );
+ free( p_vector->p_ptr_array );
 p_vector->p_ptr_array = NULL;
 }
 }
@@ -406,9 +406,11 @@ cl_vector_set_capacity(
 }
 /* Allocate our pointer array. */
- p_new_ptr_array = cl_zalloc( new_capacity * sizeof(void*) );
+ p_new_ptr_array = malloc( new_capacity * sizeof(void*) );
 if( !p_new_ptr_array )
 return( CL_INSUFFICIENT_MEMORY );
+ else
+ memset( p_new_ptr_array, 0, new_capacity * sizeof(void*) );
 if( p_vector->p_ptr_array )
 {
@@ -417,7 +419,7 @@ cl_vector_set_capacity(
 p_vector->capacity * sizeof(void*) );
 /* Free the old pointer array. */
- cl_free( p_vector->p_ptr_array );
+ free( p_vector->p_ptr_array );
 }
 /* Set the new array. */
@@ -431,9 +433,11 @@ cl_vector_set_capacity(
 /* Determine the allocation size for the new array elements. */
 alloc_size = new_elements * p_vector->element_size;
- p_buf = (cl_list_item_t*)cl_zalloc( alloc_size + sizeof(cl_list_item_t) );
+ p_buf = (cl_list_item_t*)malloc( alloc_size + sizeof(cl_list_item_t) );
 if( !p_buf )
 return( CL_INSUFFICIENT_MEMORY );
+ else
+ memset( p_buf, 0, alloc_size + sizeof(cl_list_item_t) );
 cl_qlist_insert_tail( &p_vector->alloc_list, p_buf );
 /* Advance the buffer pointer past the list item. */
Index: osm/complib/cl_event_wheel.c
===================================================================
--- osm/complib/cl_event_wheel.c (revision 7470)
+++ osm/complib/cl_event_wheel.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
 * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
 * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
 *
@@ -40,7 +40,7 @@
 #endif /* HAVE_CONFIG_H */
 #include
-#include
+#include
 #include
 #include
@@ -130,7 +130,7 @@ __cl_event_wheel_callback( IN void* cont
 cl_qlist_remove_head(&p_event_wheel->events_wheel);
 /* delete the event info object - allocated by cl_event_wheel_reg */
- cl_free(p_event);
+ free(p_event);
 }
 else
 {
@@ -330,7 +330,7 @@ cl_event_wheel_destroy(
 /* remove it from the map */
 p_map_item = &(p_event->map_item);
 cl_qmap_remove_item(&p_event_wheel->events_map, p_map_item);
- cl_free(p_event); /* allocated by cl_event_wheel_reg */
+ free(p_event); /* allocated by cl_event_wheel_reg */
 p_list_item = cl_qlist_remove_head(&p_event_wheel->events_wheel);
 }
@@ -387,7 +387,7 @@ cl_event_wheel_reg(
 {
 /* make a new one */
 p_event = (cl_event_wheel_reg_info_t *)
- cl_malloc( sizeof (cl_event_wheel_reg_info_t) );
+ malloc( sizeof (cl_event_wheel_reg_info_t) );
 p_event->num_regs = 0;
 }
@@ -504,7 +504,7 @@ cl_event_wheel_unreg(
 "Removed key:0x%"PRIx64"\n", key );
 /* free the item */
- cl_free(p_event);
+ free(p_event);
 }
 else
 {
Index: osm/complib/cl_pool.c
===================================================================
--- osm/complib/cl_pool.c (revision 7470)
+++ osm/complib/cl_pool.c (working copy)
@@ -52,12 +52,12 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
 #include
 #include
 #include
-#include
 #include
@@ -116,11 +116,14 @@ cl_qcpool_init(
 * Allocate the array of component sizes and component pointers all
 * in one allocation.
 */
- p_pool->component_sizes = (size_t*)cl_zalloc(
+ p_pool->component_sizes = (size_t*)malloc(
 (sizeof(size_t) + sizeof(void*)) * num_components );
 if( !p_pool->component_sizes )
 return( CL_INSUFFICIENT_MEMORY );
+ else
+ memset( p_pool->component_sizes, 0,
+ (sizeof(size_t) + sizeof(void*)) * num_components );
 /* Calculate the pointer to the array of pointers, used for callbacks. */
 p_pool->p_components =
@@ -213,11 +216,11 @@ cl_qcpool_destroy(
 /* Free all alocated memory blocks.
*/
 while( !cl_is_qlist_empty( &p_pool->alloc_list ) )
- cl_free( cl_qlist_remove_head( &p_pool->alloc_list ) );
+ free( cl_qlist_remove_head( &p_pool->alloc_list ) );
 if( p_pool->component_sizes )
 {
- cl_free( p_pool->component_sizes );
+ free( p_pool->component_sizes );
 p_pool->component_sizes = NULL;
 }
 }
@@ -256,11 +259,14 @@ cl_qcpool_grow(
 /* Allocate the buffer for the new objects. */
 p_objects = (uint8_t*)
- cl_zalloc( sizeof(cl_list_item_t) + (obj_size * obj_count) );
+ malloc( sizeof(cl_list_item_t) + (obj_size * obj_count) );
 /* Make sure the allocation succeeded. */
 if( !p_objects )
 return( CL_INSUFFICIENT_MEMORY );
+ else
+ memset( p_objects, 0,
+ sizeof(cl_list_item_t) + (obj_size * obj_count) );
 /* Insert the allocation in our list. */
 cl_qlist_insert_tail( &p_pool->alloc_list, (cl_list_item_t*)p_objects );
Index: osm/osmtest/osmtest.c
===================================================================
--- osm/osmtest/osmtest.c (revision 7470)
+++ osm/osmtest/osmtest.c (working copy)
@@ -64,7 +64,6 @@
 #include
 #endif
 #include
-#include
 #include "osmtest.h"
 #ifndef __WIN__
@@ -460,21 +459,21 @@ osmtest_destroy( IN osmtest_t * const p_
 {
 p_item = p_next_item;
 p_next_item = cl_qmap_next( p_item );
- cl_free( p_item );
+ free( p_item );
 }
 p_next_item = cl_qmap_head( &p_osmt->exp_subn.mgrp_mlid_tbl );
 while( p_next_item != cl_qmap_end( &p_osmt->exp_subn.mgrp_mlid_tbl ) )
 {
 p_item = p_next_item;
 p_next_item = cl_qmap_next( p_item );
- cl_free( p_item );
+ free( p_item );
 }
 p_next_item = cl_qmap_head( &p_osmt->exp_subn.node_guid_tbl );
 while( p_next_item != cl_qmap_end( &p_osmt->exp_subn.node_guid_tbl ) )
 {
 p_item = p_next_item;
 p_next_item = cl_qmap_next( p_item );
- cl_free( p_item );
+ free( p_item );
 }
 p_next_item = cl_qmap_head( &p_osmt->exp_subn.node_lid_tbl );
@@ -482,7 +481,7 @@ osmtest_destroy( IN osmtest_t * const p_
 {
 p_item = p_next_item;
 p_next_item = cl_qmap_next( p_item );
- cl_free( p_item );
+ free( p_item );
 }
 p_next_item = cl_qmap_head(
&p_osmt->exp_subn.path_tbl );
@@ -490,14 +489,14 @@ osmtest_destroy( IN osmtest_t * const p_
 {
 p_item = p_next_item;
 p_next_item = cl_qmap_next( p_item );
- cl_free( p_item );
+ free( p_item );
 }
 p_next_item = cl_qmap_head( &p_osmt->exp_subn.port_key_tbl );
 while( p_next_item != cl_qmap_end( &p_osmt->exp_subn.port_key_tbl ) )
 {
 p_item = p_next_item;
 p_next_item = cl_qmap_next( p_item );
- cl_free( p_item );
+ free( p_item );
 }
 osm_log_destroy( &p_osmt->log );
Index: osm/osmtest/include/osmtest_subnet.h
===================================================================
--- osm/osmtest/include/osmtest_subnet.h (revision 7470)
+++ osm/osmtest/include/osmtest_subnet.h (working copy)
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2006 Voltaire, Inc. All rights reserved.
 * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
 * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
 *
@@ -47,7 +48,7 @@
 #ifndef _OSMTEST_SUBNET_H_
 #define _OSMTEST_SUBNET_H_
-#include
+#include
 #include
 #include
 #include
@@ -121,14 +122,16 @@ node_new( void )
 {
 node_t *p_obj;
- p_obj = cl_zalloc( sizeof( *p_obj ) );
+ p_obj = malloc( sizeof( *p_obj ) );
+ if (p_obj)
+ memset( p_obj, 0, sizeof( *p_obj ) );
 return ( p_obj );
 }
 static inline void
 node_delete( IN node_t * p_obj )
 {
- cl_free( p_obj );
+ free( p_obj );
 }
 /****s* Subnet Database/port_t
@@ -179,14 +182,16 @@ port_new( void )
 {
 port_t *p_obj;
- p_obj = cl_zalloc( sizeof( *p_obj ) );
+ p_obj = malloc( sizeof( *p_obj ) );
+ if (p_obj)
+ memset( p_obj, 0, sizeof( *p_obj ) );
 return ( p_obj );
 }
 static inline void
 port_delete( IN port_t * p_obj )
 {
- cl_free( p_obj );
+ free( p_obj );
 }
 static inline uint64_t
@@ -268,14 +273,16 @@ path_new( void )
 {
 path_t *p_obj;
- p_obj = cl_zalloc( sizeof( *p_obj ) );
+ p_obj = malloc( sizeof( *p_obj ) );
+ if (p_obj)
+ memset( p_obj, 0, sizeof( *p_obj ) );
 return ( p_obj );
 }
 static inline void
 path_delete( IN path_t * p_obj )
 {
- cl_free( p_obj );
+ free( p_obj );
 }
 /****s* Subnet
Database/subnet_t
Index: osm/osmtest/osmt_service.c
===================================================================
--- osm/osmtest/osmt_service.c (revision 7470)
+++ osm/osmtest/osmt_service.c (working copy)
@@ -60,7 +60,6 @@
 #include
 #include
 #include
-#include
 #include "osmtest.h"
 ib_api_status_t
@@ -1055,7 +1054,7 @@ osmt_get_all_services_and_check_names( I
 OSM_LOG_ENTER(&p_osmt->log, osmt_get_all_services_and_check_names );
 /* Prepare tracker for the checked names */
- p_checked_names = (uint8_t*)cl_malloc(sizeof(uint8_t)*num_of_valid_names);
+ p_checked_names = (uint8_t*)malloc(sizeof(uint8_t)*num_of_valid_names);
 for (j = 0 ; j < num_of_valid_names ; j++)
 {
 p_checked_names[j] = 0;
Index: osm/osmtest/main.c
===================================================================
--- osm/osmtest/main.c (revision 7470)
+++ osm/osmtest/main.c (working copy)
@@ -1,6 +1,6 @@
 /*
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
 * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
 *
 * This software is available to you under a choice of one of two
 * licenses.
You may choose to be licensed under the terms of the GNU
@@ -47,7 +47,6 @@
 #include
 #include
 #include
-#include
 #include "osmtest.h"
 /********************************************************************
@@ -120,7 +119,6 @@ show_usage( )
 " -d0 - Unused.\n"
 " -d1 - Do not scan/compare path records.\n"
 " -d2 - Force log flushing after each log message.\n"
- " -d3 - Use mem tracking.\n"
 " Without -d, no debug options are enabled\n\n" );
 printf( "-m \n"
 "--max_lid \n"
@@ -307,7 +305,6 @@ main( int argc,
 uint32_t log_flags = OSM_LOG_ERROR | OSM_LOG_INFO;
 int32_t vendor_debug=0;
 char flow_name[64];
- boolean_t mem_track = FALSE;
 uint32_t next_option;
 const char *const short_option = "f:l:m:M:d:g:s:t:i:pcvVh";
@@ -559,9 +556,7 @@ main( int argc,
 opt.force_log_flush = TRUE;
 break;
 case 3:
- printf( "Use Mem Tracking\n" );
- mem_track = TRUE;
- break;
+ /* Used to be memory tracking */
 default:
 printf( "Unknown value %ld (ignored)\n", strtol( optarg, NULL, 0 ) );
 break;
@@ -591,7 +586,6 @@ main( int argc,
 printf( "\tFlow = %s\n", flow_name );
- if (mem_track) __cl_mem_track(TRUE);
 if (vendor_debug) osm_vendor_set_debug(osm_test.p_vendor, vendor_debug);
@@ -634,8 +628,6 @@ main( int argc,
 }
 osmtest_destroy( &osm_test );
- if (mem_track) cl_mem_display();
-
 complib_exit();
 Exit:
Index: osm/osmtest/osmt_multicast.c
===================================================================
--- osm/osmtest/osmt_multicast.c (revision 7470)
+++ osm/osmtest/osmt_multicast.c (working copy)
@@ -53,7 +53,6 @@
 #include
 #include
 #include
-#include
 #include
 #include "osmtest.h"
@@ -157,7 +156,7 @@ osmt_query_mcast( IN osmtest_t * const p
 p_item = p_next_item;
 p_next_item = cl_qmap_next( p_item );
 cl_qmap_remove_item(&p_osmt->exp_subn.mgrp_mlid_tbl,p_item);
- cl_free( p_item );
+ free( p_item );
 }
@@ -197,7 +196,7 @@ osmt_query_mcast( IN osmtest_t * const p
 status = IB_ERROR;
 goto Exit;
 }
- p_mgrp = (osmtest_mgrp_t*)cl_malloc( sizeof(*p_mgrp) );
+ p_mgrp = (osmtest_mgrp_t*)malloc( sizeof(*p_mgrp) );
 if (!p_mgrp)
 {
 osm_log( &p_osmt->log, OSM_LOG_ERROR,
Index: osm/opensm/osm_port.c
===================================================================
--- osm/opensm/osm_port.c (revision 7470)
+++ osm/opensm/osm_port.c (working copy)
@@ -53,7 +53,6 @@
 #include
 #include
-#include
 #include
 #include
 #include
@@ -88,7 +87,7 @@ osm_physp_destroy(
 /* free the SL2VL Tables */
 num_slvl = cl_ptr_vector_get_size(&p_physp->slvl_by_port);
 for (i = 0; i < num_slvl; i++)
- cl_free(cl_ptr_vector_get(&p_physp->slvl_by_port, i));
+ free(cl_ptr_vector_get(&p_physp->slvl_by_port, i));
 cl_ptr_vector_destroy(&p_physp->slvl_by_port);
 /* free the P_Key Tables */
@@ -142,7 +141,9 @@ osm_physp_init(
 cl_ptr_vector_init( &p_physp->slvl_by_port, num_slvl, 1);
 for (i = 0; i < num_slvl; i++)
 {
- p_slvl = (ib_slvl_table_t *)cl_zalloc(sizeof(ib_slvl_table_t));
+ p_slvl = (ib_slvl_table_t *)malloc(sizeof(ib_slvl_table_t));
+ if (p_slvl)
+ memset(p_slvl, 0, sizeof(ib_slvl_table_t));
 cl_ptr_vector_set(&p_physp->slvl_by_port, i, p_slvl);
 }
@@ -238,9 +239,12 @@ osm_port_new(
 */
 size = p_ni->num_ports;
- p_port = cl_zalloc( sizeof(*p_port) + sizeof(void *) * size );
+ p_port = malloc( sizeof(*p_port) + sizeof(void *) * size );
 if( p_port != NULL )
+ {
+ memset( p_port, 0, sizeof(*p_port) + sizeof(void *) * size );
 osm_port_init( p_port, p_ni, p_parent_node );
+ }
 return( p_port );
 }
@@ -706,7 +710,7 @@ osm_physp_replace_dr_path_with_alternate
 BFS from OSM port until we find the target physp but avoid going through mapped ports
 */
- p_nextPortsList = (cl_list_t*)cl_malloc(sizeof(cl_list_t));
+ p_nextPortsList = (cl_list_t*)malloc(sizeof(cl_list_t));
 cl_list_construct( p_nextPortsList );
 cl_list_init( p_nextPortsList, 10 );
@@ -741,7 +745,7 @@ osm_physp_replace_dr_path_with_alternate
 {
 next_list_is_full = FALSE;
 p_currPortsList = p_nextPortsList;
- p_nextPortsList = (cl_list_t*)cl_malloc(sizeof(cl_list_t));
+ p_nextPortsList = (cl_list_t*)malloc(sizeof(cl_list_t));
 cl_list_construct( p_nextPortsList );
 cl_list_init( p_nextPortsList, 10 );
 p_physp = (osm_physp_t*)cl_list_remove_head( p_currPortsList );
@@ -806,13 +810,13 @@ osm_physp_replace_dr_path_with_alternate
 }
 }
 cl_list_destroy( p_currPortsList );
- cl_free(p_currPortsList);
+ free(p_currPortsList);
 }
 /* cleanup */
 Exit:
 cl_list_destroy( p_nextPortsList );
- cl_free( p_nextPortsList );
+ free( p_nextPortsList );
 cl_map_destroy( &physp_map );
 cl_map_destroy( &visited_map );
 }
Index: osm/opensm/osm_state_mgr.c
===================================================================
--- osm/opensm/osm_state_mgr.c (revision 7470)
+++ osm/opensm/osm_state_mgr.c (working copy)
@@ -54,7 +54,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -1086,7 +1085,7 @@ osm_topology_file_create(
 CL_PLOCK_ACQUIRE( p_mgr->p_lock );
 file_name =
- ( char * )cl_malloc( strlen( p_mgr->p_subn->opt.dump_files_dir ) + 12 );
+ ( char * )malloc( strlen( p_mgr->p_subn->opt.dump_files_dir ) + 12 );
 CL_ASSERT( file_name );
@@ -1232,7 +1231,7 @@ osm_topology_file_create(
 fclose( rc );
 Exit:
- cl_free( file_name );
+ free( file_name );
 OSM_LOG_EXIT( p_mgr->p_log );
 }
@@ -1450,7 +1449,7 @@ __process_idle_time_queue_done(
 }
- cl_free( p_process_item );
+ free( p_process_item );
 OSM_LOG_EXIT( p_mgr->p_log );
 return;
@@ -2976,7 +2975,7 @@ osm_state_mgr_process_idle(
 OSM_LOG_ENTER( p_mgr->p_log, osm_state_mgr_process_idle );
- p_idle_item = cl_zalloc( sizeof( osm_idle_item_t ) );
+ p_idle_item = malloc( sizeof( osm_idle_item_t ) );
 if( p_idle_item == NULL )
 {
 osm_log( p_mgr->p_log, OSM_LOG_ERROR,
@@ -2985,6 +2984,7 @@ osm_state_mgr_process_idle(
 return IB_ERROR;
 }
+ memset( p_idle_item, 0, sizeof( osm_idle_item_t ) );
 p_idle_item->pfn_start = pfn_start;
 p_idle_item->pfn_done = pfn_done;
 p_idle_item->context1 = context1;
Index: osm/opensm/osm_subnet.c
===================================================================
--- osm/opensm/osm_subnet.c (revision 7470)
+++ osm/opensm/osm_subnet.c (working copy)
@@ -51,8 +51,8 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
-#include
 #include
 #include
 #include
@@ -137,7 +137,7 @@ osm_subn_destroy(
 {
 p_rsm = p_next_rsm;
 p_next_rsm = (osm_remote_sm_t*)cl_qmap_next( &p_rsm->map_item );
- cl_free( p_rsm );
+ free( p_rsm );
 }
 p_next_prtn = (osm_prtn_t*)cl_qmap_head( &p_subn->prtn_pkey_tbl );
@@ -634,7 +634,7 @@ __osm_subn_opts_unpack_charp(
 p_key, p_val_str);
 printf(buff);
 cl_log_event("OpenSM", LOG_INFO, buff, NULL, 0);
- *p_val = (char *)cl_malloc( strlen(p_val_str) +1 );
+ *p_val = (char *)malloc( strlen(p_val_str) +1 );
 strcpy( *p_val, p_val_str);
 }
 }
Index: osm/opensm/osm_db_files.c
===================================================================
--- osm/opensm/osm_db_files.c (revision 7470)
+++ osm/opensm/osm_db_files.c (working copy)
@@ -50,7 +50,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
@@ -146,8 +145,8 @@ osm_db_domain_destroy(
 cl_spinlock_destroy( &p_domain_imp->lock );
 st_free_table( p_domain_imp->p_hash );
- cl_free( p_domain_imp->file_name );
- cl_free( p_domain_imp );
+ free( p_domain_imp->file_name );
+ free( p_domain_imp );
 }
 /***************************************************************************
@@ -161,10 +160,10 @@ osm_db_destroy(
 while ((p_domain = cl_list_remove_head( &p_db->domains )) != NULL )
 {
 osm_db_domain_destroy( p_domain );
- cl_free( p_domain );
+ free( p_domain );
 }
 cl_list_destroy( &p_db->domains );
- cl_free( p_db->p_db_imp );
+ free( p_db->p_db_imp );
 }
 /***************************************************************************
@@ -179,7 +178,7 @@ osm_db_init(
 OSM_LOG_ENTER( p_log, osm_db_init );
- p_db_imp = (osm_db_imp_t *)cl_malloc(sizeof(osm_db_imp_t));
+ p_db_imp = (osm_db_imp_t *)malloc(sizeof(osm_db_imp_t));
 CL_ASSERT( p_db_imp != NULL);
 p_db_imp->db_dir_name = getenv("OSM_CACHE_DIR");
@@ -233,18 +232,18 @@ osm_db_domain_init(
 OSM_LOG_ENTER( p_log, osm_db_domain_init );
 /* allocate a new domain object */
- p_domain = (osm_db_domain_t *)cl_malloc(sizeof(osm_db_domain_t));
+ p_domain = (osm_db_domain_t *)malloc(sizeof(osm_db_domain_t));
 CL_ASSERT( p_domain != NULL );
 p_domain_imp =
- (osm_db_domain_imp_t *)cl_malloc(sizeof(osm_db_domain_imp_t));
+ (osm_db_domain_imp_t *)malloc(sizeof(osm_db_domain_imp_t));
 CL_ASSERT( p_domain_imp != NULL );
 dir_name_len = strlen(((osm_db_imp_t*)p_db->p_db_imp)->db_dir_name);
 /* set the domain file name */
 p_domain_imp->file_name =
- (char *)cl_malloc(sizeof(char)*(dir_name_len) + strlen(domain_name) + 2);
+ (char *)malloc(sizeof(char)*(dir_name_len) + strlen(domain_name) + 2);
 CL_ASSERT(p_domain_imp->file_name != NULL);
 strcpy(p_domain_imp->file_name,((osm_db_imp_t*)p_db->p_db_imp)->db_dir_name);
 strcat(p_domain_imp->file_name,domain_name);
@@ -257,8 +256,8 @@ osm_db_domain_init(
 "osm_db_domain_init: ERR 6102: "
 " Failed to open the db file:%s\n",
 p_domain_imp->file_name);
- cl_free(p_domain_imp);
- cl_free(p_domain);
+ free(p_domain_imp);
+ free(p_domain);
 p_domain = NULL;
 goto Exit;
 }
@@ -364,19 +363,19 @@ osm_db_restore(
 goto EndParsing;
 }
- p_key = (char *)cl_malloc(sizeof(char)*(strlen(p_first_word) + 1));
+ p_key = (char *)malloc(sizeof(char)*(strlen(p_first_word) + 1));
 strcpy(p_key, p_first_word);
 p_rest_of_line = strtok_r(NULL, "\n", &p_last);
 if (p_rest_of_line != NULL)
 {
 p_accum_val =
- (char*)cl_malloc(sizeof(char)*(strlen(p_rest_of_line) + 1));
+ (char*)malloc(sizeof(char)*(strlen(p_rest_of_line) + 1));
 strcpy(p_accum_val, p_rest_of_line);
 }
 else
 {
- p_accum_val = (char*)cl_malloc(2);
+ p_accum_val = (char*)malloc(2);
 strcpy(p_accum_val, "\0");
 }
 }
@@ -429,9 +428,9 @@ osm_db_restore(
 /* accumulate into the value */
 p_prev_val = p_accum_val;
 p_accum_val =
- (char *)cl_malloc(strlen(p_prev_val) + strlen(sLine) + 1);
+ (char *)malloc(strlen(p_prev_val) + strlen(sLine) + 1);
 strcpy(p_accum_val, p_prev_val);
- cl_free(p_prev_val);
+ free(p_prev_val);
 strcat(p_accum_val, sLine);
 }
 } /* in key */
@@ -473,7 +472,7 @@ osm_db_store(
 p_domain_imp = (osm_db_domain_imp_t *)p_domain->p_domain_imp;
 p_tmp_file_name =
- (char *)cl_malloc(sizeof(char)*(strlen(p_domain_imp->file_name)+8));
+ (char *)malloc(sizeof(char)*(strlen(p_domain_imp->file_name)+8));
 strcpy(p_tmp_file_name, p_domain_imp->file_name);
 strcat(p_tmp_file_name,".tmp");
@@ -514,7 +513,7 @@ osm_db_store(
 }
 Exit:
 cl_spinlock_release( &p_domain_imp->lock );
- cl_free(p_tmp_file_name);
+ free(p_tmp_file_name);
 OSM_LOG_EXIT( p_log );
 return status;
 }
@@ -526,8 +525,8 @@ osm_db_store(
 int
 __osm_clear_tbl_entry(st_data_t key, st_data_t val, st_data_t arg)
 {
- cl_free((char*)key);
- cl_free((char*)val);
+ free((char*)key);
+ free((char*)val);
 return ST_DELETE;
 }
@@ -625,17 +624,18 @@ osm_db_update(
 else
 {
 /* need to allocate the key */
- p_new_key = cl_malloc(sizeof(char)*(strlen(p_key) + 1));
+ p_new_key = malloc(sizeof(char)*(strlen(p_key) + 1));
 strcpy(p_new_key, p_key);
 }
 /* need to arange a new copy of the value */
- p_new_val = cl_malloc(sizeof(char)*(strlen(p_val) + 1));
+ p_new_val = malloc(sizeof(char)*(strlen(p_val) + 1));
 strcpy(p_new_val, p_val);
 st_insert(p_domain_imp->p_hash, (st_data_t)p_new_key, (st_data_t)p_new_val);
- if (p_prev_val) cl_free(p_prev_val);
+ if (p_prev_val)
+ free(p_prev_val);
 cl_spinlock_release( &p_domain_imp->lock );
@@ -674,8 +674,8 @@ osm_db_delete(
 }
 else
 {
- cl_free(p_key);
- cl_free(p_prev_val);
+ free(p_key);
+ free(p_prev_val);
 res = 0;
 }
 }
Index: osm/opensm/osm_node.c
===================================================================
--- osm/opensm/osm_node.c (revision 7470)
+++ osm/opensm/osm_node.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
 * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
 * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
 *
@@ -51,7 +51,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
-#include
+#include
 #include
 #include
 #include
@@ -120,9 +120,10 @@ osm_node_new(
 */
 size = p_ni->num_ports;
- p_node = cl_zalloc( sizeof(*p_node) + sizeof(osm_physp_t) * size );
+ p_node = malloc( sizeof(*p_node) + sizeof(osm_physp_t) * size );
 if( p_node != NULL )
 {
+ memset( p_node, 0, sizeof(*p_node) + sizeof(osm_physp_t) * size );
 p_node->node_info = *p_ni;
 p_node->physp_tbl_size = size + 1;
@@ -174,7 +175,7 @@ osm_node_delete(
 IN OUT osm_node_t** const p_node )
 {
 osm_node_destroy( *p_node );
- cl_free( *p_node );
+ free( *p_node );
 *p_node = NULL;
 }
Index: osm/opensm/osm_mcm_info.c
===================================================================
--- osm/opensm/osm_mcm_info.c (revision 7470)
+++ osm/opensm/osm_mcm_info.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
 * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
 * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
 *
@@ -51,7 +51,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
-#include
+#include
 #include
 /**********************************************************************
@@ -82,9 +82,10 @@ osm_mcm_info_new(
 {
 osm_mcm_info_t* p_mcm;
- p_mcm = (osm_mcm_info_t*)cl_zalloc( sizeof(*p_mcm) );
+ p_mcm = (osm_mcm_info_t*)malloc( sizeof(*p_mcm) );
 if( p_mcm )
 {
+ memset(p_mcm, 0, sizeof(*p_mcm) );
 osm_mcm_info_init( p_mcm, mlid );
 }
@@ -98,6 +99,6 @@ osm_mcm_info_delete(
 IN osm_mcm_info_t* const p_mcm )
 {
 osm_mcm_info_destroy( p_mcm );
- cl_free( p_mcm );
+ free( p_mcm );
 }
Index: osm/opensm/osm_inform.c
===================================================================
--- osm/opensm/osm_inform.c (revision 7470)
+++ osm/opensm/osm_inform.c (working copy)
@@ -49,9 +49,9 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -83,7 +83,7 @@ void
 osm_infr_destroy(
 IN osm_infr_t* const p_infr )
 {
- cl_free(p_infr);
+ free(p_infr);
 }
 /**********************************************************************
@@ -112,7 +112,7 @@ osm_infr_new(
 CL_ASSERT(p_infr_rec);
- p_infr = (osm_infr_t*)cl_malloc( sizeof(osm_infr_t) );
+ p_infr = (osm_infr_t*)malloc( sizeof(osm_infr_t) );
 if( p_infr )
 {
 osm_infr_construct( p_infr );
Index: osm/opensm/osm_service.c
===================================================================
--- osm/opensm/osm_service.c (revision 7470)
+++ osm/opensm/osm_service.c (working copy)
@@ -49,9 +49,8 @@
 # include
 #endif /* HAVE_CONFIG_H */
-#include
+#include
 #include
-#include
 #include
 #include
@@ -70,7 +69,7 @@ void
 osm_svcr_destroy(
 IN osm_svcr_t* const p_svcr )
 {
- cl_free( p_svcr);
+ free( p_svcr);
 }
 /**********************************************************************
@@ -102,7 +101,7 @@ osm_svcr_new(
 CL_ASSERT(p_svc_rec);
- p_svcr = (osm_svcr_t*)cl_malloc( sizeof(*p_svcr) );
+ p_svcr = (osm_svcr_t*)malloc( sizeof(*p_svcr) );
 if( p_svcr )
 {
 osm_svcr_construct( p_svcr );
Index: osm/opensm/osm_switch.c
===================================================================
--- osm/opensm/osm_switch.c (revision 7470)
+++ osm/opensm/osm_switch.c (working copy)
@@ -51,8 +51,8 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
-#include
 #include
 #include
 #include
@@ -104,13 +104,15 @@ osm_switch_init(
 status = osm_fwd_tbl_init( &p_sw->fwd_tbl, p_si );
- p_sw->p_prof = cl_zalloc( sizeof(*p_sw->p_prof) * num_ports );
+ p_sw->p_prof = malloc( sizeof(*p_sw->p_prof) * num_ports );
 if( p_sw->p_prof == NULL )
 {
 status = IB_INSUFFICIENT_MEMORY;
 goto Exit;
 }
+ memset( p_sw->p_prof, 0, sizeof(*p_sw->p_prof) * num_ports );
+
 status = osm_mcast_tbl_init( &p_sw->mcast_tbl, osm_node_get_num_physp( p_node ), cl_ntoh16( p_si->mcast_cap ) );
 if( status != IB_SUCCESS )
@@ -131,7 +133,7 @@ osm_switch_destroy(
 {
 /* free memory to avoid leaks */
 osm_mcast_tbl_destroy( &p_sw->mcast_tbl );
- cl_free( p_sw->p_prof );
+ free( p_sw->p_prof );
 osm_fwd_tbl_destroy( &p_sw->fwd_tbl );
 osm_lid_matrix_destroy( &p_sw->lmx );
 }
@@ -143,7 +145,7 @@ osm_switch_delete(
 IN OUT osm_switch_t** const pp_sw )
 {
 osm_switch_destroy( *pp_sw );
- cl_free( *pp_sw );
+ free( *pp_sw );
 *pp_sw = NULL;
 }
@@ -157,9 +159,10 @@ osm_switch_new(
 ib_api_status_t status;
 osm_switch_t *p_sw;
- p_sw = (osm_switch_t*)cl_zalloc( sizeof(*p_sw) );
+ p_sw = (osm_switch_t*)malloc( sizeof(*p_sw) );
 if( p_sw )
 {
+ memset( p_sw, 0, sizeof(*p_sw) );
 status = osm_switch_init( p_sw, p_node, p_madw );
 if( status != IB_SUCCESS )
 osm_switch_delete( &p_sw );
Index: osm/opensm/osm_sminfo_rcv.c
===================================================================
--- osm/opensm/osm_sminfo_rcv.c (revision 7470)
+++ osm/opensm/osm_sminfo_rcv.c (working copy)
@@ -51,9 +51,9 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -638,7 +638,7 @@ __osm_sminfo_rcv_process_get_response(
 p_sm = (osm_remote_sm_t*)cl_qmap_get( p_sm_tbl, port_guid );
 if( p_sm == (osm_remote_sm_t*)cl_qmap_end(
p_sm_tbl ) )
 {
- p_sm = cl_malloc( sizeof(*p_sm) );
+ p_sm = malloc( sizeof(*p_sm) );
 if( p_sm == NULL )
 {
 osm_log( p_rcv->p_log, OSM_LOG_ERROR,
Index: osm/opensm/osm_multicast.c
===================================================================
--- osm/opensm/osm_multicast.c (revision 7470)
+++ osm/opensm/osm_multicast.c (working copy)
@@ -49,8 +49,8 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
-#include
 #include
 #include
 #include
@@ -107,7 +107,7 @@ osm_mgrp_destroy(
 /* destroy the mtree_node structure */
 osm_mtree_destroy(p_mgrp->p_root);
- cl_free(p_mgrp);
+ free(p_mgrp);
 }
 /**********************************************************************
@@ -135,7 +135,7 @@ osm_mgrp_new(
 {
 osm_mgrp_t* p_mgrp;
- p_mgrp = (osm_mgrp_t*)cl_malloc( sizeof(*p_mgrp) );
+ p_mgrp = (osm_mgrp_t*)malloc( sizeof(*p_mgrp) );
 if( p_mgrp )
 osm_mgrp_init( p_mgrp, mlid );
Index: osm/opensm/osm_mtree.c
===================================================================
--- osm/opensm/osm_mtree.c (revision 7470)
+++ osm/opensm/osm_mtree.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
 * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
 * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
 *
@@ -50,7 +50,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
-#include
+#include
 #include
 #include
@@ -83,8 +83,8 @@ osm_mtree_node_new(
 {
 osm_mtree_node_t *p_mtn;
- p_mtn = cl_malloc( sizeof(osm_mtree_node_t) +
- sizeof(void*) * (osm_switch_get_num_ports( p_sw ) - 1) );
+ p_mtn = malloc( sizeof(osm_mtree_node_t) +
+ sizeof(void*) * (osm_switch_get_num_ports( p_sw ) - 1) );
 if( p_mtn != NULL )
 osm_mtree_node_init( p_mtn, p_sw );
@@ -109,7 +109,7 @@ osm_mtree_destroy(
 (p_mtn->child_array[i] != OSM_MTREE_LEAF) )
 osm_mtree_destroy(p_mtn->child_array[i]);
- cl_free( p_mtn );
+ free( p_mtn );
 }
 /**********************************************************************
Index: osm/opensm/osm_mcast_mgr.c
===================================================================
--- osm/opensm/osm_mcast_mgr.c (revision 7470)
+++ osm/opensm/osm_mcast_mgr.c (working copy)
@@ -51,9 +51,9 @@
 #endif /* HAVE_CONFIG_H */
 #include
+#include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -88,9 +88,12 @@ __osm_mcast_work_obj_new(
 qlist. see cl_qlist_insert_tail(): CL_ASSERT(p_list_item->p_list != p_list)
 */
- p_obj = cl_zalloc( sizeof( *p_obj ) );
+ p_obj = malloc( sizeof( *p_obj ) );
 if( p_obj )
+ {
+ memset( p_obj, 0, sizeof( *p_obj ) );
 p_obj->p_port = (osm_port_t*)p_port;
+ }
 return( p_obj );
 }
@@ -101,7 +104,7 @@ static void
 __osm_mcast_work_obj_delete(
 IN osm_mcast_work_obj_t* p_wobj )
 {
- cl_free( p_wobj );
+ free( p_wobj );
 }
 /**********************************************************************
@@ -123,7 +126,7 @@ __osm_mcast_mgr_purge_tree_node(
 }
- cl_free( p_mtn );
+ free( p_mtn );
 }
 /**********************************************************************
@@ -738,7 +741,7 @@ __osm_mcast_mgr_branch(
 TO DO - this list array could probably be moved inside the switch element to save on malloc thrashing.
*/ - list_array = cl_zalloc( sizeof(cl_qlist_t) * max_children ); + list_array = malloc( sizeof(cl_qlist_t) * max_children ); if( list_array == NULL ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, @@ -748,6 +751,8 @@ __osm_mcast_mgr_branch( goto Exit; } + memset( list_array, 0, sizeof(cl_qlist_t) * max_children ); + for( i = 0; i < max_children; i++ ) cl_qlist_init( &list_array[i] ); @@ -875,7 +880,7 @@ __osm_mcast_mgr_branch( } } - cl_free( list_array ); + free( list_array ); Exit: OSM_LOG_EXIT( p_mgr->p_log ); return( p_mtn ); @@ -1395,7 +1400,7 @@ osm_mcast_mgr_dump_mcast_routes( goto Exit; file_name = - (char*)cl_malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); + (char*)malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); CL_ASSERT(file_name); @@ -1457,7 +1462,7 @@ osm_mcast_mgr_dump_mcast_routes( Exit: if (file_name) - cl_free(file_name); + free(file_name); OSM_LOG_EXIT( p_mgr->p_log ); } @@ -1639,7 +1644,7 @@ osm_mcast_mgr_process_mgrp_cb( memcpy(&mlid, &p_ctxt->mlid, sizeof(mlid)); /* we can destroy the context now */ - cl_free(p_ctxt); + free(p_ctxt); /* we need a lock to make sure the p_mgrp is not change other ways */ CL_PLOCK_EXCL_ACQUIRE( p_mgr->p_lock ); Index: osm/opensm/osm_sm.c =================================================================== --- osm/opensm/osm_sm.c (revision 7470) +++ osm/opensm/osm_sm.c (working copy) @@ -55,8 +55,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -261,7 +261,7 @@ osm_sm_destroy( cl_event_destroy( &p_sm->subnet_up_event ); if( p_sm->p_report_buf != NULL ) - cl_free( p_sm->p_report_buf ); + free( p_sm->p_report_buf ); osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format Waived */ OSM_LOG_EXIT( p_sm->p_log ); @@ -295,7 +295,7 @@ osm_sm_init( p_sm->p_disp = p_disp; p_sm->p_lock = p_lock; - p_sm->p_report_buf = cl_malloc( OSM_REPORT_BUF_SIZE ); + p_sm->p_report_buf = malloc( OSM_REPORT_BUF_SIZE ); if( p_sm->p_report_buf == NULL ) { osm_log( p_sm->p_log, 
OSM_LOG_ERROR, @@ -596,7 +596,7 @@ __osm_sm_mgrp_connect( * isn't busy trying to do something else. */ ctx2 = - ( osm_mcast_mgr_ctxt_t * ) cl_malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); + ( osm_mcast_mgr_ctxt_t * ) malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); memcpy( &ctx2->mlid, &p_mgrp->mlid, sizeof( p_mgrp->mlid ) ); ctx2->req_type = req_type; ctx2->port_guid = port_guid; @@ -629,7 +629,7 @@ __osm_sm_mgrp_disconnect( * isn't busy trying to do something else. */ ctx2 = - ( osm_mcast_mgr_ctxt_t * ) cl_malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); + ( osm_mcast_mgr_ctxt_t * ) malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); memcpy( &ctx2->mlid, &p_mgrp->mlid, sizeof( p_mgrp->mlid ) ); ctx2->req_type = OSM_MCAST_REQ_TYPE_LEAVE; ctx2->port_guid = port_guid; Index: osm/opensm/osm_lin_fwd_tbl.c =================================================================== --- osm/opensm/osm_lin_fwd_tbl.c (revision 7470) +++ osm/opensm/osm_lin_fwd_tbl.c (working copy) @@ -51,8 +51,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -78,7 +78,7 @@ osm_lin_tbl_new( so add 1 to the end of the range here for this assert. 
*/ CL_ASSERT( size <= IB_LID_UCAST_END_HO + 1 ); - p_tbl = (osm_lin_fwd_tbl_t*)cl_malloc( + p_tbl = (osm_lin_fwd_tbl_t*)malloc( __osm_lin_tbl_compute_obj_size( size ) ); /* @@ -98,6 +98,6 @@ void osm_lin_tbl_delete( IN osm_lin_fwd_tbl_t** const pp_tbl ) { - cl_free( *pp_tbl ); + free( *pp_tbl ); *pp_tbl = NULL; } Index: osm/opensm/osm_prtn.c =================================================================== --- osm/opensm/osm_prtn.c (revision 7470) +++ osm/opensm/osm_prtn.c (working copy) @@ -49,12 +49,12 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include #include -#include #include #include #include @@ -73,9 +73,11 @@ osm_prtn_t* osm_prtn_new( IN const char *name, IN const uint16_t pkey ) { - osm_prtn_t *p = cl_zalloc(sizeof(*p)); + osm_prtn_t *p = malloc(sizeof(*p)); if (!p) return NULL; + + memset(p, 0, sizeof(*p)); p->pkey = pkey; cl_map_construct(&p->full_guid_tbl); cl_map_init(&p->full_guid_tbl, 32); @@ -99,7 +101,7 @@ void osm_prtn_delete( cl_map_destroy(&p->full_guid_tbl); cl_map_remove_all(&p->part_guid_tbl); cl_map_destroy(&p->part_guid_tbl); - cl_free(p); + free(p); *pp_prtn = NULL; } Index: osm/opensm/osm_ucast_mgr.c =================================================================== --- osm/opensm/osm_ucast_mgr.c (revision 7470) +++ osm/opensm/osm_ucast_mgr.c (working copy) @@ -55,9 +55,9 @@ #endif /* HAVE_CONFIG_H */ #include +#include #include #include -#include #include #include #include @@ -231,7 +231,7 @@ osm_ucast_mgr_dump_ucast_routes( goto Exit; file_name = - (char*)cl_malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 10); + (char*)malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 10); CL_ASSERT(file_name); @@ -332,7 +332,7 @@ osm_ucast_mgr_dump_ucast_routes( Exit: if (file_name) - cl_free(file_name); + free(file_name); OSM_LOG_EXIT( p_mgr->p_log ); } @@ -642,7 +642,7 @@ __osm_ucast_mgr_process_port( OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_process_port ); - remote_sys_guids = cl_zalloc( sizeof(uint64_t) * 
lids_per_port ); + remote_sys_guids = malloc( sizeof(uint64_t) * lids_per_port ); if( remote_sys_guids == NULL ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, @@ -651,7 +651,9 @@ __osm_ucast_mgr_process_port( goto Exit; } - remote_node_guids = cl_zalloc( sizeof(uint64_t) * lids_per_port ); + memset( remote_sys_guids, 0, sizeof(uint64_t) * lids_per_port ); + + remote_node_guids = malloc( sizeof(uint64_t) * lids_per_port ); if( remote_node_guids == NULL ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, @@ -660,6 +662,8 @@ __osm_ucast_mgr_process_port( goto Exit; } + memset(remote_node_guids, 0, sizeof(uint64_t) * lids_per_port ); + osm_port_get_lid_range_ho( p_port, &min_lid_ho, &max_lid_ho ); /* If the lids are zero - then there was some problem with the initialization. @@ -789,9 +793,9 @@ __osm_ucast_mgr_process_port( Exit: if (remote_sys_guids) - cl_free(remote_sys_guids); + free(remote_sys_guids); if (remote_node_guids) - cl_free(remote_node_guids); + free(remote_node_guids); OSM_LOG_EXIT( p_mgr->p_log ); } Index: osm/opensm/osm_ucast_updn.c =================================================================== --- osm/opensm/osm_ucast_updn.c (revision 7470) +++ osm/opensm/osm_ucast_updn.c (working copy) @@ -50,7 +50,7 @@ # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include #include @@ -118,13 +118,16 @@ __updn_create_updn_next_step_t(IN updn_s { updn_next_step_t *p_next_step; - p_next_step = (updn_next_step_t*) cl_zalloc(sizeof(*p_next_step)); + p_next_step = (updn_next_step_t*) malloc(sizeof(*p_next_step)); CL_ASSERT (p_next_step != NULL); + if (p_next_step) + { + memset(p_next_step, 0, sizeof(*p_next_step)); + p_next_step->state = state; + p_next_step->p_sw = p_sw; + } - p_next_step->state = state; - p_next_step->p_sw = p_sw; return p_next_step; - } /********************************************************************** @@ -142,7 +145,7 @@ __updn_update_rank( p_updn_rank = (updn_rank_t*) cl_qmap_get(p_guid_rank_tbl, guid_index); if (p_updn_rank == 
(updn_rank_t*) cl_qmap_end(p_guid_rank_tbl)) { - p_updn_rank = (updn_rank_t*) cl_malloc(sizeof(updn_rank_t)); + p_updn_rank = (updn_rank_t*) malloc(sizeof(updn_rank_t)); CL_ASSERT (p_updn_rank); p_updn_rank->rank = rank; @@ -181,7 +184,7 @@ __updn_bfs_by_node(IN osm_subn_t *p_subn OSM_LOG_ENTER( &(osm.log), __updn_bfs_by_node); /* Init the list pointers */ - p_nextList = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_nextList ); cl_list_init( p_nextList, 10 ); p_currList = p_nextList; @@ -293,7 +296,7 @@ __updn_bfs_by_node(IN osm_subn_t *p_subn "__updn_bfs_by_node:" "Starting a new iteration with %d elements in current list\n", cl_list_count(p_currList)); /* Init the switch directed list */ - p_nextList = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_nextList ); cl_list_init( p_nextList, 10 ); /* Go over all current list items till it's empty */ @@ -433,19 +436,19 @@ __updn_bfs_by_node(IN osm_subn_t *p_subn } } } - cl_free (p_updn_switch); + free (p_updn_switch); p_updn_switch = (updn_next_step_t*)cl_list_remove_head( p_currList ); } /* Cleanup p_currList */ cl_list_destroy( p_currList ); - cl_free (p_currList); + free (p_currList); /* Reassign p_currList to p_nextList */ p_currList = p_nextList; } /* Cleanup p_currList - Had the pointer to cl_list_t */ cl_list_destroy( p_currList ); - cl_free (p_currList); + free (p_currList); osm_log(&(osm.log), OSM_LOG_VERBOSE, "__updn_bfs_by_node:" "BFS the subnet ]\n"); @@ -471,22 +474,22 @@ updn_destroy( "guid = 0x%" PRIx64 " rank = %u\n", cl_ntoh64(cl_qmap_key(p_map_item)), ((updn_rank_t *)p_map_item)->rank); cl_qmap_remove_item( &p_updn->guid_rank_tbl, p_map_item); - cl_free( (updn_rank_t *)p_map_item); + free( (updn_rank_t *)p_map_item); p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl); } /* free the array of guids */ if (p_updn->updn_ucast_reg_inputs.guid_list) - 
cl_free(p_updn->updn_ucast_reg_inputs.guid_list); + free(p_updn->updn_ucast_reg_inputs.guid_list); /* destroy the list of root nodes */ while ((p_guid_list_item = cl_list_remove_head( p_updn->p_root_nodes ))) - cl_free( p_guid_list_item ); + free( p_guid_list_item ); cl_list_remove_all( p_updn->p_root_nodes ); cl_list_destroy( p_updn->p_root_nodes ); - cl_free ( p_updn->p_root_nodes ); - cl_free (p_updn); + free ( p_updn->p_root_nodes ); + free (p_updn); } updn_t* @@ -495,7 +498,9 @@ updn_construct(void) updn_t* p_updn; OSM_LOG_ENTER( &(osm.log), updn_construct); - p_updn = cl_zalloc(sizeof(updn_t)); + p_updn = malloc(sizeof(updn_t)); + if (p_updn) + memset(p_updn, 0, sizeof(updn_t)); OSM_LOG_EXIT( &(osm.log) ); return(p_updn); } @@ -522,7 +527,7 @@ updn_init( } p_updn->state = UPDN_INIT; cl_qmap_init( &p_updn->guid_rank_tbl); - p_list = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_list = (cl_list_t*)malloc(sizeof(cl_list_t)); if (!p_list) { status = IB_ERROR; @@ -563,7 +568,7 @@ updn_init( /* Skip Empty Lines anywhere in the file - only one char means the Null termination */ if (strlen(line) > 1) { - p_tmp = cl_malloc(sizeof(uint64_t)); + p_tmp = malloc(sizeof(uint64_t)); *p_tmp = strtoull(line, NULL, 16); cl_list_insert_tail(osm.p_updn_ucast_routing->p_root_nodes, p_tmp); } @@ -636,7 +641,7 @@ updn_subn_rank( "Ranking starts from GUID 0x%" PRIx64 "\n", root_guid); /* Init the list pointers */ - p_nextList = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_nextList ); cl_list_init( p_nextList, 10 ); p_currList = p_nextList; @@ -691,7 +696,7 @@ updn_subn_rank( while (!cl_is_list_empty(p_currList)) { rank++; - p_nextList = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_nextList ); cl_list_init( p_nextList, 10 ); p_physp = (osm_physp_t*)cl_list_remove_head( p_currList ); @@ -749,7 +754,7 @@ updn_subn_rank( } /* First free the 
allocation of cl_list pointer then reallocate */ cl_list_destroy( p_currList ); - cl_free(p_currList); + free(p_currList); /* p_currList is empty - need to assign it to p_nextList */ p_currList = p_nextList; } @@ -759,7 +764,7 @@ updn_subn_rank( "BFS the subnet ]\n"); cl_list_destroy( p_currList ); - cl_free(p_currList); + free(p_currList); /* Print Summary of ranking */ osm_log(&(osm.log), OSM_LOG_VERBOSE, @@ -901,7 +906,7 @@ osm_subn_calc_up_down_min_hop_table( "guid = 0x%" PRIx64 " rank = %u\n", cl_ntoh64(cl_qmap_key(p_map_item)), ((updn_rank_t *)p_map_item)->rank); cl_qmap_remove_item( &p_updn->guid_rank_tbl, p_map_item); - cl_free( (updn_rank_t *)p_map_item); + free( (updn_rank_t *)p_map_item); p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl); } @@ -954,15 +959,18 @@ void __osm_updn_convert_list2array(IN up p_updn->updn_ucast_reg_inputs.num_guids = cl_list_count( p_updn->p_root_nodes); if (p_updn->updn_ucast_reg_inputs.guid_list) - cl_free(p_updn->updn_ucast_reg_inputs.guid_list); - p_updn->updn_ucast_reg_inputs.guid_list = (uint64_t *)cl_zalloc( + free(p_updn->updn_ucast_reg_inputs.guid_list); + p_updn->updn_ucast_reg_inputs.guid_list = (uint64_t *)malloc( p_updn->updn_ucast_reg_inputs.num_guids*sizeof(uint64_t)); + if (p_updn->updn_ucast_reg_inputs.guid_list) + memset(p_updn->updn_ucast_reg_inputs.guid_list, 0, + p_updn->updn_ucast_reg_inputs.num_guids*sizeof(uint64_t)); if (!cl_is_list_empty(p_updn->p_root_nodes)) { while( (p_guid = (uint64_t*)cl_list_remove_head(p_updn->p_root_nodes)) ) { p_updn->updn_ucast_reg_inputs.guid_list[i] = *p_guid; - cl_free(p_guid); + free(p_guid); i++; } max_num = i; @@ -1033,7 +1041,7 @@ osm_updn_find_root_nodes_by_min_hop( OUT cl_map_init( &ca_by_lid_map, 10 ); /* EZ: - p_ca_list = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_ca_list = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_ca_list ); cl_list_init( p_ca_list, 10 ); */ @@ -1052,7 +1060,7 @@ osm_updn_find_root_nodes_by_min_hop( OUT self_lid_ho = 
cl_ntoh16( osm_physp_get_base_lid(p_physp) ); numCas++; /* EZ: - self = cl_malloc(sizeof(uint16_t)); + self = malloc(sizeof(uint16_t)); *self = self_lid_ho; cl_list_insert_tail(p_ca_list, self); */ @@ -1120,7 +1128,7 @@ osm_updn_find_root_nodes_by_min_hop( OUT if ( p_updn_hist == (updn_hist_t*)cl_qmap_end( &min_hop_hist)) { /* New entry in the histogram , first create it */ - p_updn_hist = (updn_hist_t*) cl_malloc(sizeof(updn_hist_t)); + p_updn_hist = (updn_hist_t*) malloc(sizeof(updn_hist_t)); CL_ASSERT (p_updn_hist); p_updn_hist->bar_value = 1; cl_qmap_insert(&min_hop_hist, (uint64_t)hop_val, &p_updn_hist->map_item); @@ -1176,14 +1184,14 @@ osm_updn_find_root_nodes_by_min_hop( OUT while ( p_updn_hist != (updn_hist_t*)cl_qmap_end( &min_hop_hist ) ) { cl_qmap_remove_item( &min_hop_hist, (cl_map_item_t*)p_updn_hist ); - cl_free( p_updn_hist ); + free( p_updn_hist ); p_updn_hist = (updn_hist_t*) cl_qmap_head( &min_hop_hist ); } /* If thd conditions are valid insert the root node to the list */ if ( (numHopBarsOverThd1 == 1) && (numHopBarsOverThd2 == 1) ) { - p_guid = cl_malloc(sizeof(uint64_t)); + p_guid = malloc(sizeof(uint64_t)); *p_guid = cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)); osm_log (&(osm.log), OSM_LOG_DEBUG, "osm_updn_find_root_nodes_by_min_hop: " Index: osm/opensm/osm_mcast_tbl.c =================================================================== --- osm/opensm/osm_mcast_tbl.c (revision 7470) +++ osm/opensm/osm_mcast_tbl.c (working copy) @@ -51,8 +51,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -102,12 +102,14 @@ osm_mcast_tbl_init( since it is (and must be) defined that way the table structure in order to create a pointer to a two dimensional array. 
*/ - p_tbl->p_mask_tbl = cl_zalloc( p_tbl->num_entries * - (IB_MCAST_POSITION_MAX + 1) * IB_MCAST_MASK_SIZE / 8 ); + p_tbl->p_mask_tbl = malloc( p_tbl->num_entries * + (IB_MCAST_POSITION_MAX + 1) * IB_MCAST_MASK_SIZE / 8 ); if( p_tbl->p_mask_tbl == NULL ) return( IB_INSUFFICIENT_MEMORY ); + memset(p_tbl->p_mask_tbl, 0, + p_tbl->num_entries * (IB_MCAST_POSITION_MAX + 1) * IB_MCAST_MASK_SIZE / 8 ); return( IB_SUCCESS ); } @@ -117,7 +119,7 @@ void osm_mcast_tbl_destroy( IN osm_mcast_tbl_t* const p_tbl ) { - cl_free( p_tbl->p_mask_tbl ); + free( p_tbl->p_mask_tbl ); } /********************************************************************** Index: osm/opensm/osm_pkey.c =================================================================== --- osm/opensm/osm_pkey.c (revision 7470) +++ osm/opensm/osm_pkey.c (working copy) @@ -52,7 +52,6 @@ #include #include #include -#include #include #include #include @@ -81,12 +80,12 @@ void osm_pkey_tbl_destroy( num_blocks = (uint16_t)(cl_ptr_vector_get_size( &p_pkey_tbl->blocks )); for (i = 0; i < num_blocks; i++) - cl_free(cl_ptr_vector_get( &p_pkey_tbl->blocks, i )); + free(cl_ptr_vector_get( &p_pkey_tbl->blocks, i )); cl_ptr_vector_destroy( &p_pkey_tbl->blocks ); num_blocks = (uint16_t)(cl_ptr_vector_get_size( &p_pkey_tbl->new_blocks )); for (i = 0; i < num_blocks; i++) - cl_free(cl_ptr_vector_get( &p_pkey_tbl->new_blocks, i )); + free(cl_ptr_vector_get( &p_pkey_tbl->new_blocks, i )); cl_ptr_vector_destroy( &p_pkey_tbl->new_blocks ); cl_map_remove_all( &p_pkey_tbl->keys ); @@ -120,9 +119,10 @@ void osm_pkey_tbl_sync_new_blocks( if ( b < new_blocks ) p_new_block = cl_ptr_vector_get(&p_pkey_tbl->new_blocks, b); else { - p_new_block = (ib_pkey_table_t *)cl_zalloc(sizeof(*p_new_block)); + p_new_block = (ib_pkey_table_t *)malloc(sizeof(*p_new_block)); if (!p_new_block) break; + memset(p_new_block, 0, sizeof(*p_new_block)); cl_ptr_vector_set(&((osm_pkey_tbl_t *)p_pkey_tbl)->new_blocks, b, p_new_block); } memcpy(p_new_block, p_block, 
sizeof(*p_new_block)); @@ -150,7 +150,9 @@ int osm_pkey_tbl_set( if ( !p_pkey_block ) { - p_pkey_block = (ib_pkey_table_t *)cl_zalloc(sizeof(ib_pkey_table_t)); + p_pkey_block = (ib_pkey_table_t *)malloc(sizeof(ib_pkey_table_t)); + if (p_pkey_block) + memset(p_pkey_block, 0, sizeof(ib_pkey_table_t)); cl_ptr_vector_set( &p_pkey_tbl->blocks, block, p_pkey_block ); } Index: osm/opensm/osm_mcm_port.c =================================================================== --- osm/opensm/osm_mcm_port.c (revision 7470) +++ osm/opensm/osm_mcm_port.c (working copy) @@ -51,8 +51,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include /********************************************************************** @@ -105,7 +105,7 @@ osm_mcm_port_new( { osm_mcm_port_t* p_mcm; - p_mcm = cl_malloc( sizeof(*p_mcm) ); + p_mcm = malloc( sizeof(*p_mcm) ); if( p_mcm ) { osm_mcm_port_init( p_mcm, p_port_gid, @@ -124,5 +124,5 @@ osm_mcm_port_delete( CL_ASSERT( p_mcm ); osm_mcm_port_destroy( p_mcm ); - cl_free( p_mcm ); + free( p_mcm ); } Index: osm/opensm/osm_log.c =================================================================== --- osm/opensm/osm_log.c (revision 7470) +++ osm/opensm/osm_log.c (working copy) @@ -58,7 +58,6 @@ #include #include #include -#include #ifndef WIN32 #include @@ -142,15 +141,6 @@ osm_log( /* SYS messages go to the log anyways */ if (p_log->level & verbosity) { -#ifdef _MEM_DEBUG_MODE_ - /* If we are running in MEM_DEBUG_MODE then - the cl_mem_check will be called on every run */ - if (cl_mem_check() == FALSE) - { - fprintf( p_log->out_port, "*** MEMORY ERROR!!! 
***\n" ); - CL_ASSERT(0); - } -#endif va_start( args, p_str ); vsprintf( buffer, p_str, args ); Index: osm/opensm/osm_db_pack.c =================================================================== --- osm/opensm/osm_db_pack.c (revision 7470) +++ osm/opensm/osm_db_pack.c (working copy) @@ -40,9 +40,9 @@ #endif /* HAVE_CONFIG_H */ #include -#include #include #include + static inline void __osm_pack_guid(uint64_t guid, char *p_guid_str) { @@ -110,7 +110,7 @@ osm_db_guid2lid_guids( while ( (p_key = cl_list_remove_head( &keys )) != NULL ) { - p_guid_elem = (osm_db_guid_elem_t*)cl_malloc(sizeof(osm_db_guid_elem_t)); + p_guid_elem = (osm_db_guid_elem_t*)malloc(sizeof(osm_db_guid_elem_t)); CL_ASSERT( p_guid_elem != NULL ); p_guid_elem->guid = __osm_unpack_guid(p_key); Index: osm/opensm/main.c =================================================================== --- osm/opensm/main.c (revision 7470) +++ osm/opensm/main.c (working copy) @@ -59,7 +59,6 @@ #include #include #include -#include #include #include #include @@ -292,7 +291,6 @@ show_usage(void) " -d1 - Force single threaded dispatching\n" " -d2 - Force log flushing after each log message\n" " -d3 - Disable multicast support\n" - " -d4 - Put OpenSM in memory tracking mode\n" " -d10 - Put OpenSM in testability mode\n" " Without -d, no debug options are enabled\n\n" ); printf( "-h\n" @@ -518,7 +516,6 @@ main( uint32_t log_flags = OSM_LOG_DEFAULT_LEVEL; uint32_t temp, dbg_lvl; boolean_t run_once_flag = FALSE; - boolean_t mem_track = FALSE; int32_t vendor_debug = 0; uint32_t next_option; #if 0 @@ -692,10 +689,10 @@ main( printf(" Debug mode: Disable multicast support\n"); opt.disable_multicast = TRUE; } - else if(dbg_lvl == 4) - { - mem_track = TRUE; - } + /* + * NOTE: Debug level 4 used to be used for memory tracking + * but this is now deprecated + */ else if(dbg_lvl == 5) { vendor_debug++; @@ -825,9 +822,6 @@ main( /* Done with options description */ printf("-------------------------------------------------\n"); - if 
(mem_track) - __cl_mem_track(TRUE); - opt.log_flags = log_flags; if (vendor_debug) @@ -952,8 +946,6 @@ main( Exit: osm_opensm_destroy( &osm ); - if (mem_track) cl_mem_display(); - complib_exit(); exit( 0 ); Index: osm/opensm/osm_sa_mcmember_record.c =================================================================== --- osm/opensm/osm_sa_mcmember_record.c (revision 7470) +++ osm/opensm/osm_sa_mcmember_record.c (working copy) @@ -55,9 +55,9 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include -#include #include #include #include @@ -325,7 +325,9 @@ __get_new_mlid( /* track all used mlids in the array (by mlid index) */ used_mlids_array = - (uint8_t *)cl_zalloc(sizeof(uint8_t)*max_num_mlids); + (uint8_t *)malloc(sizeof(uint8_t)*max_num_mlids); + if (used_mlids_array) + memset(used_mlids_array, 0, sizeof(uint8_t)*max_num_mlids); if (!used_mlids_array) return 0; @@ -383,7 +385,7 @@ __get_new_mlid( mlid = 0; } - cl_free(used_mlids_array); + free(used_mlids_array); Exit: OSM_LOG_EXIT(p_rcv->p_log); Index: osm/opensm/osm_drop_mgr.c =================================================================== --- osm/opensm/osm_drop_mgr.c (revision 7470) +++ osm/opensm/osm_drop_mgr.c (working copy) @@ -53,7 +53,6 @@ #include #include -#include #include #include #include @@ -196,7 +195,7 @@ __osm_drop_mgr_remove_port( "__osm_drop_mgr_remove_port: " "Cleaned sm for port guid\n" ); - cl_free(p_sm); + free(p_sm); } osm_port_get_lid_range_ho( p_port, &min_lid_ho, &max_lid_ho ); Index: osm/opensm/osm_lid_mgr.c =================================================================== --- osm/opensm/osm_lid_mgr.c (revision 7470) +++ osm/opensm/osm_lid_mgr.c (working copy) @@ -90,9 +90,9 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include -#include #include #include #include @@ -137,7 +137,7 @@ osm_lid_mgr_destroy( p_item = cl_qlist_remove_head( &p_mgr->free_ranges ); while ( p_item != cl_qlist_end( &p_mgr->free_ranges ) ) { - cl_free((osm_lid_mgr_range_t *)p_item); 
+ free((osm_lid_mgr_range_t *)p_item); p_item = cl_qlist_remove_head( &p_mgr->free_ranges ); } OSM_LOG_EXIT( p_mgr->p_log ); @@ -399,7 +399,7 @@ __osm_lid_mgr_init_sweep( p_item = cl_qlist_remove_head( &p_mgr->free_ranges ); while ( p_item != cl_qlist_end( &p_mgr->free_ranges ) ) { - cl_free( (osm_lid_mgr_range_t *)p_item ); + free( (osm_lid_mgr_range_t *)p_item ); p_item = cl_qlist_remove_head( &p_mgr->free_ranges ); } @@ -417,7 +417,7 @@ __osm_lid_mgr_init_sweep( "__osm_lid_mgr_init_sweep: " "Skipping all lids as we are reassigning them\n"); p_range = - (osm_lid_mgr_range_t *)cl_malloc(sizeof(osm_lid_mgr_range_t)); + (osm_lid_mgr_range_t *)malloc(sizeof(osm_lid_mgr_range_t)); p_range->min_lid = 1; goto AfterScanningLids; } @@ -596,7 +596,7 @@ __osm_lid_mgr_init_sweep( else { p_range = - (osm_lid_mgr_range_t *)cl_malloc(sizeof(osm_lid_mgr_range_t)); + (osm_lid_mgr_range_t *)malloc(sizeof(osm_lid_mgr_range_t)); p_range->min_lid = lid; p_range->max_lid = lid; } @@ -622,7 +622,7 @@ __osm_lid_mgr_init_sweep( if (!p_range) { p_range = - (osm_lid_mgr_range_t *)cl_malloc(sizeof(osm_lid_mgr_range_t)); + (osm_lid_mgr_range_t *)malloc(sizeof(osm_lid_mgr_range_t)); /* The p_range can be NULL in one of 2 cases: 1. If max_defined_lid == 0. In this case, we want the entire range. From mshefty at ichips.intel.com Wed May 24 10:52:33 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 24 May 2006 10:52:33 -0700 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <20060524173604.GB25186@mellanox.co.il> References: <20060524133728.GN21266@mellanox.co.il> <20060524162242.GC21266@mellanox.co.il> <44748D8B.1000508@ichips.intel.com> <20060524173604.GB25186@mellanox.co.il> Message-ID: <44749D61.1060506@ichips.intel.com> Michael S. Tsirkin wrote: >>It's not completely trivial to reproduce. 
I tried loading and
>>unloading ib_ipoib a few times, then I tried to load ib_ipoib and
>>unload ib_mthca, I tried pinging in between loading and unloading, and
>>I didn't see any crashes.
>
>
> Maybe SM was down.

I've tested loading, ifconfig, unloading a substantial number of times with the SM up and down, and I can't reproduce this. I'll continue to look into this, but if you can provide any more information about the test setup, it would be helpful.

- Sean

From paul.lundin at gmail.com Wed May 24 10:59:31 2006
From: paul.lundin at gmail.com (Paul)
Date: Wed, 24 May 2006 13:59:31 -0400
Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o
In-Reply-To:
References:
Message-ID:

Scott,
Upon further inspection the build.sh and install.sh scripts built 32bit libraries and binaries. If I export CFLAGS (and the like) to include -m64 then the build dies while looking for a 64bit libsysfs. rhel4 u3 does not include a ppc64 sysfsutils, nor have I been able to find an actual 64bit version of it. Is there a workaround for getting things to build actual ppc64 binaries/libraries ?

The actual error is:
checking for dlsym in -ldl... yes
checking for pthread_mutex_init in -lpthread... yes
checking for sysfs_open_class in -lsysfs... no
configure: error: sysfs_open_class() not found. libibverbs requires libsysfs.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sweitzen at cisco.com Wed May 24 12:24:14 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 24 May 2006 12:24:14 -0700
Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o
Message-ID:

I know Vlad made some changes for rc5 in this area, at least for libsdp, not sure if other libs got changed as well.
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

________________________________

From: Paul [mailto:paul.lundin at gmail.com]
Sent: Wednesday, May 24, 2006 11:00 AM
To: Scott Weitzenkamp (sweitzen)
Cc: openib-general at openib.org
Subject: Re: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o

Scott,
Upon further inspection the build.sh and install.sh scripts built 32bit libraries and binaries. If I export CFLAGS (and the like) to include -m64 then the build dies while looking for a 64bit libsysfs. rhel4 u3 does not include a ppc64 sysfsutils, nor have I been able to find an actual 64bit version of it. Is there a workaround for getting things to build actual ppc64 binaries/libraries ?

The actual error is:
checking for dlsym in -ldl... yes
checking for pthread_mutex_init in -lpthread... yes
checking for sysfs_open_class in -lsysfs... no
configure: error: sysfs_open_class() not found. libibverbs requires libsysfs.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From paul.lundin at gmail.com Wed May 24 12:34:17 2006
From: paul.lundin at gmail.com (Paul)
Date: Wed, 24 May 2006 15:34:17 -0400
Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs
Message-ID:

I don't see an rc5 @ https://openfabrics.org/svn/gen2/branches/1.0/ofed/releases/

Has anybody gotten the build.sh or install.sh to produce 64bit stuff on ppc64 ? (show of hands ? any voodoo required ?)

On 5/24/06, Scott Weitzenkamp (sweitzen) wrote:
>
> I know Vlad made some changes for rc5 in this area, at least for libsdp,
> not sure if other libs got changed as well.
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
>
> ------------------------------
> *From:* Paul [mailto:paul.lundin at gmail.com]
> *Sent:* Wednesday, May 24, 2006 11:00 AM
>
> *To:* Scott Weitzenkamp (sweitzen)
> *Cc:* openib-general at openib.org
> *Subject:* Re: [openib-general] Compilation issues on rhel4 u3 ppc64
> sysfs.o
>
> Scott,
> Upon further inspection the build.sh and install.sh scripts built
> 32bit libraries and binaries. If I export CFLAGS (and the like) to include
> -m64 then the build dies while looking for a 64bit libsysfs. rhel4 u3 does
> not include a ppc64 sysfsutils, nor have I been able to find an actual 64bit
> version of it. Is there a workaround for getting things to build actual
> ppc64 binaries/libraries ?
>
> The actual error is:
> checking for dlsym in -ldl... yes
> checking for pthread_mutex_init in -lpthread... yes
> checking for sysfs_open_class in -lsysfs... no
> configure: error: sysfs_open_class() not found. libibverbs requires
> libsysfs.
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From eitan at mellanox.co.il Wed May 24 13:41:42 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Wed, 24 May 2006 23:41:42 +0300
Subject: [openib-general] RE: [PATCH]OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix NULL ptr issue
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30236870E@mtlexch01.mtl.com>

> But isn't CL_ASSERT a debug compile time thing so it's needed when it's
> built without debug ?

Yes it is - but I do not think it is required.

> >
> > You missed the line that asserts on null p_next_step just before the
> > code you changed .
>
>
> -- Hal
>
> > .
> >
> > EZ
> >
> > Eitan Zahavi
> > Senior Engineering Director, Software Architect
> > Mellanox Technologies LTD
> > Tel:+972-4-9097208
> > Fax:+972-4-9593245
> > P.O.
Box 586 Yokneam 20692 ISRAEL > > > > > > > -----Original Message----- > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > Sent: Wednesday, May 24, 2006 4:34 PM > > > To: openib-general at openib.org > > > Cc: Eitan Zahavi > > > Subject: [PATCH] > > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix > > > NULL ptr issue > > > > > > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t Fix NULL ptr > > > issue > > > > > > Signed-off-by: Hal Rosenstock > > > > > > Index: opensm/osm_ucast_updn.c > > > =================================================================== > > > --- opensm/osm_ucast_updn.c (revision 7435) > > > +++ opensm/osm_ucast_updn.c (working copy) > > > @@ -121,10 +121,12 @@ __updn_create_updn_next_step_t(IN updn_s > > > p_next_step = (updn_next_step_t*) cl_zalloc(sizeof(*p_next_step)); > > > CL_ASSERT (p_next_step != NULL); > > > > > > - p_next_step->state = state; > > > - p_next_step->p_sw = p_sw; > > > + if (p_next_step) > > > + { > > > + p_next_step->state = state; > > > + p_next_step->p_sw = p_sw; > > > + } > > > return p_next_step; > > > - > > > } > > > > > > > > /********************************************************************** > > > > > From xma at us.ibm.com Wed May 24 13:42:50 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 14:42:50 -0600 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: I have changed the order of these patches in order to make the splitting easy. 
diff -urpN infiniband-split-cq/ulp/ipoib/ipoib.h infiniband-ah/ulp/ipoib/ipoib.h --- infiniband-split-cq/ulp/ipoib/ipoib.h 2006-05-23 10:07:15.000000000 -0700 +++ infiniband-ah/ulp/ipoib/ipoib.h 2006-05-23 10:09:05.000000000 -0700 @@ -86,7 +86,6 @@ enum { IPOIB_PKEY_STOP = 4, IPOIB_FLAG_SUBINTERFACE = 5, IPOIB_MCAST_RUN = 6, - IPOIB_STOP_REAPER = 7, IPOIB_MCAST_STARTED = 8, IPOIB_MAX_BACKOFF_SECONDS = 16, @@ -147,7 +146,6 @@ struct ipoib_dev_priv { struct work_struct mcast_task; struct work_struct flush_task; struct work_struct restart_task; - struct work_struct ah_reap_task; struct ib_device *ca; u8 port; @@ -197,7 +195,6 @@ struct ipoib_ah { struct ib_ah *ah; struct list_head list; struct kref ref; - unsigned last_send; }; struct ipoib_path { @@ -263,7 +260,6 @@ int ipoib_add_pkey_attr(struct net_devic void ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_ah *address, u32 qpn); -void ipoib_reap_ah(void *dev_ptr); void ipoib_flush_paths(struct net_device *dev); struct ipoib_dev_priv *ipoib_intf_alloc(const char *format); diff -urpN infiniband-split-cq/ulp/ipoib/ipoib_ib.c infiniband-ah/ulp/ipoib/ipoib_ib.c --- infiniband-split-cq/ulp/ipoib/ipoib_ib.c 2006-05-23 10:07:51.000000000 -0700 +++ infiniband-ah/ulp/ipoib/ipoib_ib.c 2006-05-23 10:14:08.000000000 -0700 @@ -65,7 +65,6 @@ struct ipoib_ah *ipoib_create_ah(struct return NULL; ah->dev = dev; - ah->last_send = 0; kref_init(&ah->ref); ah->ah = ib_create_ah(pd, attr); @@ -83,17 +82,9 @@ void ipoib_free_ah(struct kref *kref) struct ipoib_ah *ah = container_of(kref, struct ipoib_ah, ref); struct ipoib_dev_priv *priv = netdev_priv(ah->dev); - unsigned long flags; - - if ((int) priv->tx_tail - (int) ah->last_send >= 0) { - ipoib_dbg(priv, "Freeing ah %p\n", ah->ah); - ib_destroy_ah(ah->ah); - kfree(ah); - } else { - spin_lock_irqsave(&priv->lock, flags); - list_add_tail(&ah->list, &priv->dead_ahs); - spin_unlock_irqrestore(&priv->lock, flags); - } + ipoib_dbg(priv, "Freeing ah %p\n", ah->ah); + 
ib_destroy_ah(ah->ah); + kfree(ah); } static int ipoib_ib_post_receive(struct net_device *dev, int id) @@ -344,13 +335,16 @@ void ipoib_send(struct net_device *dev, struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_tx_buf *tx_req; dma_addr_t addr; + int err; + kref_get(&address->ref); if (skb->len > dev->mtu + INFINIBAND_ALEN) { ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n", skb->len, dev->mtu + INFINIBAND_ALEN); ++priv->stats.tx_dropped; ++priv->stats.tx_errors; dev_kfree_skb_any(skb); + kref_put(&address->ref, ipoib_free_ah); return; } @@ -370,8 +364,10 @@ void ipoib_send(struct net_device *dev, DMA_TO_DEVICE); pci_unmap_addr_set(tx_req, mapping, addr); - if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), - address->ah, qpn, addr, skb->len))) { + err = post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), + address->ah, qpn, addr, skb->len); + kref_put(&address->ref, ipoib_free_ah); + if (unlikely(err)) { ipoib_warn(priv, "post_send failed\n"); ++priv->stats.tx_errors; dma_unmap_single(priv->ca->dma_device, addr, skb->len, @@ -380,7 +376,6 @@ void ipoib_send(struct net_device *dev, } else { dev->trans_start = jiffies; - address->last_send = priv->tx_head; ++priv->tx_head; if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) { @@ -390,38 +385,6 @@ void ipoib_send(struct net_device *dev, } } -static void __ipoib_reap_ah(struct net_device *dev) -{ - struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_ah *ah, *tah; - LIST_HEAD(remove_list); - - spin_lock_irq(&priv->lock); - list_for_each_entry_safe(ah, tah, &priv->dead_ahs, list) - if ((int) priv->tx_tail - (int) ah->last_send >= 0) { - list_del(&ah->list); - list_add_tail(&ah->list, &remove_list); - } - spin_unlock_irq(&priv->lock); - - list_for_each_entry_safe(ah, tah, &remove_list, list) { - ipoib_dbg(priv, "Reaping ah %p\n", ah->ah); - ib_destroy_ah(ah->ah); - kfree(ah); - } -} - -void ipoib_reap_ah(void *dev_ptr) -{ - struct net_device *dev = 
dev_ptr; - struct ipoib_dev_priv *priv = netdev_priv(dev); - - __ipoib_reap_ah(dev); - - if (!test_bit(IPOIB_STOP_REAPER, &priv->flags)) - queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, HZ); -} - int ipoib_ib_dev_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -440,9 +403,6 @@ int ipoib_ib_dev_open(struct net_device return -1; } - clear_bit(IPOIB_STOP_REAPER, &priv->flags); - queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, HZ); - set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); return 0; @@ -580,24 +540,6 @@ timeout: if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE)) ipoib_warn(priv, "Failed to modify QP to RESET state\n"); - /* Wait for all AHs to be reaped */ - set_bit(IPOIB_STOP_REAPER, &priv->flags); - cancel_delayed_work(&priv->ah_reap_task); - flush_workqueue(ipoib_workqueue); - - begin = jiffies; - - while (!list_empty(&priv->dead_ahs)) { - __ipoib_reap_ah(dev); - - if (time_after(jiffies, begin + HZ)) { - ipoib_warn(priv, "timing out; will leak address handles\n"); - break; - } - - msleep(1); - } - return 0; } diff -urpN infiniband-split-cq/ulp/ipoib/ipoib_main.c infiniband-ah/ulp/ipoib/ipoib_main.c --- infiniband-split-cq/ulp/ipoib/ipoib_main.c 2006-05-22 08:48:47.000000000 -0700 +++ infiniband-ah/ulp/ipoib/ipoib_main.c 2006-05-23 09:31:49.000000000 -0700 @@ -957,7 +957,6 @@ static void ipoib_setup(struct net_devic INIT_WORK(&priv->mcast_task, ipoib_mcast_join_task, priv->dev); INIT_WORK(&priv->flush_task, ipoib_ib_dev_flush, priv->dev); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task, priv->dev); - INIT_WORK(&priv->ah_reap_task, ipoib_reap_ah, priv->dev); } struct ipoib_dev_priv *ipoib_intf_alloc(const char *name) Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: infiniband-ah.patch Type: application/octet-stream Size: 6022 bytes Desc: not available URL: From rdreier at cisco.com Wed May 24 13:45:45 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 13:45:45 -0700 Subject: [openib-general] Re: [PATCH] mthca: fix posting list of 256 to SRQ In-Reply-To: <20060524152707.GX21266@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 24 May 2006 18:27:07 +0300") References: <20060524152707.GX21266@mellanox.co.il> Message-ID: Thanks, applied. From rdreier at cisco.com Wed May 24 13:50:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 13:50:37 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: (Shirley Ma's message of "Wed, 24 May 2006 14:42:50 -0600") References: Message-ID: NAK to this patch. Not only is it a step backwards in performance -- you've essentially added two (expensive) atomic operations for every packet sent -- but the patch is actually wrong: > + err = post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), > + address->ah, qpn, addr, skb->len); > + kref_put(&address->ref, ipoib_free_ah); The whole point of the complexity in AH handling in IPoIB is that AHs cannot be freed until the driver knows that all sends referring to them have _completed_. As you've written your patch, an AH can easily be freed before the HCA has a chance to execute the corresponding send request. - R. From mst at mellanox.co.il Wed May 24 13:52:38 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 24 May 2006 23:52:38 +0300 Subject: [openib-general] Re: Compilation issues on rhel4 u3 ppc64 sysfs.o In-Reply-To: References: Message-ID: <20060524205238.GC25817@mellanox.co.il> Quoting r. Paul : > Subject: Re: Compilation issues on rhel4 u3 ppc64 sysfs.o > > Scott, > Upon further inspection the build.sh and install.sh scripts built 32bit libraries and binaries.
If I export CFLAGS (and the like) to include -m64 then the build dies while looking for a 64bit libsysfs. rhel4 u3 does not include a ppc64 sysfsutils, nor have I been able to find an actual 64bit version of it. Is there a workaround for getting things to build actual ppc64 binaries/libraries ? > > The actual error is: > checking for dlsym in -ldl... yes > checking for pthread_mutex_init in -lpthread... yes > checking for sysfs_open_class in -lsysfs... no > configure: error: sysfs_open_class() not found. libibverbs requires libsysfs. You need the 64 bit version of libsysfs - verbs require this. -- MST From xma at us.ibm.com Wed May 24 14:00:06 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 14:00:06 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: Roland Dreier wrote on 05/24/2006 01:50:37 PM: > NAK to this patch. Not only is is a step backwards in performance -- > you've essentially added two (expensive) atomic operations for every > packet sent My observation is the atomic operation is not that expensive. > -- but the patch is actually wrong: > > > + err = post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), > > + address->ah, qpn, addr, skb->len); > > + kref_put(&address->ref, ipoib_free_ah); > > The whole point of the complexity in AH handling in IPoIB is that AHs > cannot be freed until the driver knows that all sends referring to > them have _completed_. As you've written your patch, an AH can easily > be freed before the HCA has a chance to execute the corresponding send > request. > > - R. I thought the path holding another AH reference to prevent it to be freed? Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Wed May 24 14:03:34 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 14:03:34 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: (Shirley Ma's message of "Wed, 24 May 2006 14:00:06 -0700") References: Message-ID: Shirley> My observation is the atomic operation is not that Shirley> expensive. It's just about the worst thing to do. For example, on x86/x86-64 an instruction with the lock; prefix is quite slow. If you look at an instruction level profile you can see that quite clearly. Shirley> I thought the path holding another AH reference to Shirley> prevent it to be freed? If that were true then why would we want to reference count sends at all? The whole point is that a path might be destroyed before the send is executed. - R. From paul.lundin at gmail.com Wed May 24 14:25:20 2006 From: paul.lundin at gmail.com (Paul) Date: Wed, 24 May 2006 17:25:20 -0400 Subject: [openib-general] Re: Compilation issues on rhel4 u3 ppc64 sysfs.o In-Reply-To: <20060524205238.GC25817@mellanox.co.il> References: <20060524205238.GC25817@mellanox.co.il> Message-ID: Yeah. I got it (had to build it myself from the src rpm). I now have a functioning 64bit openIB install. Currently running into some issues with open-mpi .... but I am taking that up with the open-mpi mailling list. Thanks for the help guys. On 5/24/06, Michael S. Tsirkin wrote: > > Quoting r. Paul : > > Subject: Re: Compilation issues on rhel4 u3 ppc64 sysfs.o > > > > Scott, > > Upon further inspection the build.sh and install.sh scripts built > 32bit libraries and binaries. If I export CFLAGS (and the like) to include > -m64 then the build dies while looking for a 64bit libsysfs. rhel4 u3 does > not include a ppc64 sysfsutils, nor have I been able to find an actual 64bit > version of it. Is there a workaround for getting things to build actual > ppc64 binaries/libraries ? 
> > > > The actual error is: > > checking for dlsym in -ldl... yes > > checking for pthread_mutex_init in -lpthread... yes > > checking for sysfs_open_class in -lsysfs... no > > configure: error: sysfs_open_class() not found. libibverbs requires > libsysfs. > > You need the 64 bit version of libsysfs - verbs require this. > > -- > MST > -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Wed May 24 14:34:53 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 24 May 2006 17:34:53 -0400 Subject: [openib-general] RE: [PATCH]OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix NULL ptr issue In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30236870E@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236870E@mtlexch01.mtl.com> Message-ID: <1148506296.4470.128436.camel@hal.voltaire.com> On Wed, 2006-05-24 at 16:41, Eitan Zahavi wrote: > > But isn't CL_ASSERT a debug compile time thing so it's needed when > it's > > built without debug ? > Yes it is - but I do not think it is required. I don't understand what you wrote. Does "yes it is" mean that you agree it is a debug compile time thing ? Why is it not required ? What if the memory allocation fails ? -- Hal > > > > > > > You missed the line that asserts on null p_next_step just before the > > > code you changed . > > > > > > -- Hal > > > > > . > > > > > > EZ > > > > > > Eitan Zahavi > > > Senior Engineering Director, Software Architect > > > Mellanox Technologies LTD > > > Tel:+972-4-9097208 > > > Fax:+972-4-9593245 > > > P.O. 
Box 586 Yokneam 20692 ISRAEL > > > > > > > > > > -----Original Message----- > > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > > Sent: Wednesday, May 24, 2006 4:34 PM > > > > To: openib-general at openib.org > > > > Cc: Eitan Zahavi > > > > Subject: [PATCH] > > > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix > > > > NULL ptr issue > > > > > > > > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t Fix NULL > ptr > > > > issue > > > > > > > > Signed-off-by: Hal Rosenstock > > > > > > > > Index: opensm/osm_ucast_updn.c > > > > > =================================================================== > > > > --- opensm/osm_ucast_updn.c (revision 7435) > > > > +++ opensm/osm_ucast_updn.c (working copy) > > > > @@ -121,10 +121,12 @@ __updn_create_updn_next_step_t(IN updn_s > > > > p_next_step = (updn_next_step_t*) > cl_zalloc(sizeof(*p_next_step)); > > > > CL_ASSERT (p_next_step != NULL); > > > > > > > > - p_next_step->state = state; > > > > - p_next_step->p_sw = p_sw; > > > > + if (p_next_step) > > > > + { > > > > + p_next_step->state = state; > > > > + p_next_step->p_sw = p_sw; > > > > + } > > > > return p_next_step; > > > > - > > > > } > > > > > > > > > > > > /********************************************************************** > > > > > > > From sean.hefty at intel.com Wed May 24 14:48:17 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 24 May 2006 14:48:17 -0700 Subject: [openib-general] [PATCH] ib_multicast: set member state correctly Message-ID: Once a join request completes, we need to set the state to MCAST_MEMBER. Otherwise, we won't try to leave the group when ib_free_multicast() is called, which will leave the port in the multicast group. Signed-off-by: Sean Hefty --- This fix shouldn't have any effect on stability. 
Index: multicast.c =================================================================== --- multicast.c (revision 7442) +++ multicast.c (working copy) @@ -332,6 +332,7 @@ static int send_leave(struct mcast_group static void join_group(struct mcast_group *group, struct mcast_member *member, u8 join_state) { + member->state = MCAST_MEMBER; adjust_membership(group, join_state, 1); group->rec.join_state |= join_state; member->multicast.rec = group->rec; From xma at us.ibm.com Wed May 24 14:50:31 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 14:50:31 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: Roland, Roland Dreier wrote on 05/24/2006 02:03:34 PM: > Shirley> My observation is the atomic operation is not that > Shirley> expensive. > > It's just about the worst thing to do. For example, on x86/x86-64 an > instruction with the lock; prefix is quite slow. If you look at an > instruction level profile you can see that quite clearly. Compared to have a single thread handling AH, I don't think this atomic operation is expensive. > Shirley> I thought the path holding another AH reference to > Shirley> prevent it to be freed? > > If that were true then why would we want to reference count sends at > all? The whole point is that a path might be destroyed before the > send is executed. > > - R. It is true for unicast, it has a reference count before ipoib_send(). I need to look at multicast. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Wed May 24 15:01:01 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 15:01:01 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: (Shirley Ma's message of "Wed, 24 May 2006 14:50:31 -0700") References: Message-ID: Shirley> Compared to have a single thread handling AH, I don't Shirley> think this atomic operation is expensive. But freeing AHs is something that happens infrequently and can be done asynchronously. You're replacing that cost with two atomic operations per send packet! Shirley> It is true for unicast, it has a reference count before Shirley> ipoib_send(). I need to look at multicast. But can you guarantee that the AH stays around until after the send completes (which could be an arbitrarily long delay)? - R. From jlentini at netapp.com Wed May 24 15:22:24 2006 From: jlentini at netapp.com (James Lentini) Date: Wed, 24 May 2006 18:22:24 -0400 (EDT) Subject: [openib-general] Re: which dapl/udapl changes in trunk should be imported into OFED branch? (patch enclosed) In-Reply-To: <200605241806.48546.jackm@mellanox.co.il> References: <200605241806.48546.jackm@mellanox.co.il> Message-ID: On Wed, 24 May 2006, Jack Morgenstein wrote: > Hi, > > Below is a patch file of differences between the OFED dapl library > and the openib main trunk dapl library. > > Please indicate which of the dapl library changes are necessary for > the Intel MPI to work correctly in OFED. How recent is the ucm code in OFED? From xma at us.ibm.com Wed May 24 15:59:15 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 15:59:15 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: Roland, Roland Dreier wrote on 05/24/2006 03:01:01 PM: > Shirley> Compared to have a single thread handling AH, I don't > Shirley> think this atomic operation is expensive. 
> > But freeing AHs is something that happens infrequently and can be done > asynchronously. You're replacing that cost with two atomic operations > per send packet! No, actually it didn't free during sending during my test. > Shirley> It is true for unicast, it has a reference count before > Shirley> ipoib_send(). I need to look at multicast. > > But can you guarantee that the AH stays around until after the send > completes (which could be an arbitrarily long delay)? > > - R. I checked neigh_add_path(), for unicast it is true always. See code below. static void neigh_add_path(..) { ... if (path->ah) { kref_get(&path->ah->ref); neigh->ah = path->ah; ipoib_send(dev, skb, path->ah... } Please correct me if I am wrong. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From rdreier at cisco.com Wed May 24 16:07:58 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 16:07:58 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: (Shirley Ma's message of "Wed, 24 May 2006 15:59:15 -0700") References: Message-ID: > > But freeing AHs is something that happens infrequently and can be done > > asynchronously. You're replacing that cost with two atomic operations > > per send packet! > No, actually it didn't free during sending during my test. Sorry, I don't understand that answer at all. It doesn't seem to be responding to my point at all. To reiterate: freeing AHs is a rare, "slow path" operation that can be done asynchronously. It is not a good tradeoff to do two atomic_t operations for every sent packet, just to avoid occasionally reaping AHs in process context. > > But can you guarantee that the AH stays around until after the send > > completes (which could be an arbitrarily long delay)? > I checked neigh_add_path(), for unicast it is true always.
See code below. > > static void neigh_add_path(..) > { > ... > if (path->ah) { > kref_get(&path->ah->ref); > neigh->ah = path->ah; > ipoib_send(dev, skb, path->ah... > } Again, I don't understand how this is a response at all. The AH cannot be freed until after the send operation is actually fully completed, which could be a long time after ib_post_send() returns. If an AH is freed after ipoib_send() returns but before the send is executed, then the HCA may use stale data, which could lead to a send error. To summarize: the patch is broken (leads to incorrect lifetimes for AHs), and in any case makes the send fast path slower. - R. From mshefty at ichips.intel.com Wed May 24 16:38:45 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 24 May 2006 16:38:45 -0700 Subject: [openib-general] ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <20060524133728.GN21266@mellanox.co.il> References: <20060524133728.GN21266@mellanox.co.il> Message-ID: <4474EE85.2000409@ichips.intel.com> Michael S. Tsirkin wrote: >>>The last trunk build causes kernel oops on 2.6.16 while restarting the driver. >>> >>>(the previous build -rev 7422- works fine) Note that ipoib moved to ib_multicast in rev 7401. >>>May 24 16:00:40 sw037 kernel: Modules linked in: ib_sa ib_uverbs ib_umad >> >>ib_mthca ib_mad ib_core Can you please verify that ib_multicast was loaded prior to the crash? >>{:ib_sa:ib_sa_mcmember_rec_callback+76} I've been unable to reproduce this after 100 or so tests of doing an ifconfig while unloading the modules. I've tried with the SM both up and down, and with ping running at the same time. Reviewing the code, the multicast module should cancel all SA queries and wait for them to complete before unloading. (Even if it didn't perform the cancel, it should still wait for any outstanding SA query to complete.) > Right, reverting to 7401 solved the problem. Are you able to duplicate the problem with the latest svn rev? 
- Sean From xma at us.ibm.com Wed May 24 16:42:04 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 16:42:04 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: Roland Dreier wrote on 05/24/2006 04:07:58 PM: > To reiterate: freeing AHs is a rare, > "slow path" operation that can be done asynchronously. It is not a > good tradeoff to do two atomic_t operations for every sent packet, > just to avoid occasionally reaping AHs in process context. I don't think two atomic operations are that expensive compared to reaping AHs in process context, according to the test results and profiling data. Or we can use RCU instead. > > > But can you guarantee that the AH stays around until after the send > > > completes (which could be an arbitrarily long delay)? > > > I checked neigh_add_path(), for unicast it is true always. See code below. > > > > static void neigh_add_path(..) > > { > > ... > > if (path->ah) { > > kref_get(&path->ah->ref); > > neigh->ah = path->ah; > > ipoib_send(dev, skb, path->ah... > > } > > Again, I don't understand how this is a response at all. The AH > cannot be freed until after the send operation is actually fully > completed, which could be a long time after ib_post_send() returns. > > If an AH is freed after ipoib_send() returns but before the send is > executed, then the HCA may use stale data, which could lead to a send > error. > > To summarize: the patch is broken (leads to incorrect lifetimes for > AHs), and in any case makes the send fast path slower. > > - R. That's a valid point. This problem will be addressed in the next tx_ring removal patch. There, kref_put is called in ipoib_ib_handle_send_wc(). Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mshefty at ichips.intel.com Wed May 24 16:43:58 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 24 May 2006 16:43:58 -0700 Subject: [openib-general] ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <4474EE85.2000409@ichips.intel.com> References: <20060524133728.GN21266@mellanox.co.il> <4474EE85.2000409@ichips.intel.com> Message-ID: <4474EFBE.2060905@ichips.intel.com> Sean Hefty wrote: > Reviewing the code, the multicast module should cancel all SA queries > and wait for them to complete before unloading. (Even if it didn't > perform the cancel, it should still wait for any outstanding SA query to > complete.) As a thought, is there any chance this crash is related to the ib_sa (and ib_cm, rdma_cm, ib_addr) interface, module unload issue that was reported? - Sean From rdreier at cisco.com Wed May 24 16:50:13 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 16:50:13 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: (Shirley Ma's message of "Wed, 24 May 2006 16:42:04 -0700") References: Message-ID: Shirley> I don't think two atomic operation is that expensive Shirley> compare to reaping AHs in process context according to Shirley> the test results and profiling data. Or we can use RCU Shirley> instead. What is the cost of reaping AHs? Under most workloads I would expect AHs essentially never get freed. Are you saying that you see __ipoib_reap_ah() show up in your profiles? If so then we should fix the underlying cause of that rather than slowing down the ipoib fast path. 
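As an aside for readers following the AH reaping discussion: the test used in __ipoib_reap_ah(), (int) priv->tx_tail - (int) ah->last_send >= 0, compares two free-running unsigned counters by their difference, so it keeps giving the expected answer after the counters wrap. Below is a minimal user-space sketch of that comparison and of a deferred "dead list" walk, under stated assumptions: the names demo_ah, demo_seq_reached and demo_reap are hypothetical and do not appear in ipoib, the destroyed flag merely stands in for ib_destroy_ah() plus kfree(), there is no locking, and the comparison is written with unsigned arithmetic instead of the driver's signed casts. It is an illustration of the idea, not the driver's code.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for an address handle waiting on the dead list. */
struct demo_ah {
	uint32_t last_send;     /* send counter value recorded at last use */
	struct demo_ah *next;   /* singly linked "dead" list */
	int destroyed;          /* set here instead of ib_destroy_ah()/kfree() */
};

/*
 * Wraparound-safe "tail has reached last_send" test.  Same intent as
 * (int) tail - (int) last_send >= 0, but expressed with unsigned
 * arithmetic so the result is fully defined by the C standard.
 */
static int demo_seq_reached(uint32_t tail, uint32_t last_send)
{
	return (uint32_t)(tail - last_send) < 0x80000000u;
}

/*
 * Walk the dead list, unlink and mark every entry whose last_send has
 * been reached by tail, and return how many entries were reaped.
 * Entries that are not yet safe stay on the list for a later pass.
 */
static unsigned demo_reap(struct demo_ah **dead, uint32_t tail)
{
	unsigned reaped = 0;
	struct demo_ah **pp = dead;

	while (*pp) {
		struct demo_ah *ah = *pp;

		if (demo_seq_reached(tail, ah->last_send)) {
			*pp = ah->next;         /* unlink from the dead list */
			ah->next = NULL;
			ah->destroyed = 1;
			++reaped;
		} else {
			pp = &ah->next;         /* keep it, move on */
		}
	}
	return reaped;
}
```

A caller would invoke demo_reap() periodically with the current completion counter; an entry recorded just before the counter wrapped (for example last_send = 0xfffffffe) is still reaped once the tail has advanced past the wrap (for example tail = 2).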
BTW, since it is allowed to call ib_destroy_ah() with a lock held (cf Documentation/infiniband/core_locking.txt) then I think the patch below is a reasonable cleanup: --- infiniband/ulp/ipoib/ipoib_ib.c (revision 7485) +++ infiniband/ulp/ipoib/ipoib_ib.c (working copy) @@ -380,15 +380,10 @@ static void __ipoib_reap_ah(struct net_d list_for_each_entry_safe(ah, tah, &priv->dead_ahs, list) if ((int) priv->tx_tail - (int) ah->last_send >= 0) { list_del(&ah->list); - list_add_tail(&ah->list, &remove_list); + ib_destroy_ah(ah->ah); + kfree(ah); } spin_unlock_irq(&priv->lock); - - list_for_each_entry_safe(ah, tah, &remove_list, list) { - ipoib_dbg(priv, "Reaping ah %p\n", ah->ah); - ib_destroy_ah(ah->ah); - kfree(ah); - } } void ipoib_reap_ah(void *dev_ptr) From xma at us.ibm.com Wed May 24 17:07:42 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 17:07:42 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: Roland, My idea is to remove this AH reap thread. We can use RCU to do the same work without lots of coding. Do you agree? And in the reap AH code, tx_tail/tx_head isn't consistently protected by tx_lock. It uses priv->lock. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Wed May 24 17:11:12 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 17:11:12 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: (Shirley Ma's message of "Wed, 24 May 2006 17:07:42 -0700") References: Message-ID: Shirley> Roland, My idea is to remove this AH reap thread. We can Shirley> use RCU to do the same work without lots of coding. Do Shirley> you agree? No, I don't see how that will help. How does RCU know when it's safe to free an AH? 
Shirley> And in the reap AH code, tx_tail/tx_head isn't Shirley> consistently protected by tx_lock. It uses priv->lock. Hmm, that may be a bug. I'll take a look. - R. From xma at us.ibm.com Wed May 24 17:46:37 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 17:46:37 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: Roland Dreier wrote on 05/24/2006 05:11:12 PM: > Shirley> Roland, My idea is to remove this AH reap thread. We can > Shirley> use RCU to do the same work without lots of coding. Do > Shirley> you agree? > > No, I don't see how that will help. How does RCU know when it's safe > to free an AH? With the tx_ring removal patch, RCU can be done. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From rdreier at cisco.com Wed May 24 17:52:31 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 17:52:31 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: (Shirley Ma's message of "Wed, 24 May 2006 17:46:37 -0700") References: Message-ID: Shirley> With tx_ring removal patch, RCU can be done. OK, I guess I'll wait and see. But to be honest I don't see how RCU helps anything. - R. From xma at us.ibm.com Wed May 24 17:54:57 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 17:54:57 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: Roland Dreier wrote on 05/24/2006 05:52:31 PM: > Shirley> With tx_ring removal patch, RCU can be done. > > OK, I guess I'll wait and see. But to be honest I don't see how RCU > helps anything. > > - R. I will continue and submit the tx_ring patch with the atomic operation for you to review; let's discuss the AH reap solution later.
Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Wed May 24 18:35:05 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 May 2006 18:35:05 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: (Roland Dreier's message of "Wed, 24 May 2006 17:11:12 -0700") References: Message-ID: Shirley> And in the reap AH code, tx_tail/tx_head isn't Shirley> consistently protected by tx_lock. It uses priv->lock. Roland> Hmm, that may be a bug. I'll take a look. Something like this (untested still) should fix things up: --- infiniband/ulp/ipoib/ipoib_ib.c (revision 7485) +++ infiniband/ulp/ipoib/ipoib_ib.c (working copy) @@ -84,15 +84,9 @@ void ipoib_free_ah(struct kref *kref) unsigned long flags; - if ((int) priv->tx_tail - (int) ah->last_send >= 0) { - ipoib_dbg(priv, "Freeing ah %p\n", ah->ah); - ib_destroy_ah(ah->ah); - kfree(ah); - } else { - spin_lock_irqsave(&priv->lock, flags); - list_add_tail(&ah->list, &priv->dead_ahs); - spin_unlock_irqrestore(&priv->lock, flags); - } + spin_lock_irqsave(&priv->lock, flags); + list_add_tail(&ah->list, &priv->dead_ahs); + spin_unlock_irqrestore(&priv->lock, flags); } static int ipoib_ib_post_receive(struct net_device *dev, int id) @@ -376,19 +370,16 @@ static void __ipoib_reap_ah(struct net_d struct ipoib_ah *ah, *tah; LIST_HEAD(remove_list); - spin_lock_irq(&priv->lock); + spin_lock_irq(&priv->tx_lock); + spin_lock(&priv->lock); list_for_each_entry_safe(ah, tah, &priv->dead_ahs, list) if ((int) priv->tx_tail - (int) ah->last_send >= 0) { list_del(&ah->list); - list_add_tail(&ah->list, &remove_list); + ib_destroy_ah(ah->ah); + kfree(ah); } - spin_unlock_irq(&priv->lock); - - list_for_each_entry_safe(ah, tah, &remove_list, list) { - ipoib_dbg(priv, "Reaping ah %p\n", ah->ah); - ib_destroy_ah(ah->ah); - 
kfree(ah); - } + spin_unlock(&priv->lock); + spin_unlock_irq(&priv->tx_lock); } void ipoib_reap_ah(void *dev_ptr) From xma at us.ibm.com Wed May 24 19:11:57 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 19:11:57 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: Message-ID: Roland, Here is the tx_ring removal patch for you to review. diff -urpN infiniband-ah/ulp/ipoib/ipoib.h infiniband-tx/ulp/ipoib/ipoib.h --- infiniband-ah/ulp/ipoib/ipoib.h 2006-05-23 10:09:05.000000000 -0700 +++ infiniband-tx/ulp/ipoib/ipoib.h 2006-05-24 11:45:52.000000000 -0700 @@ -114,11 +114,19 @@ struct ipoib_rx_buf { dma_addr_t mapping; }; -struct ipoib_tx_buf { - struct sk_buff *skb; - DECLARE_PCI_UNMAP_ADDR(mapping) +struct ipoib_skb_prv { + dma_addr_t addr; + struct ipoib_ah *ah; + struct sk_buff *skb; + struct list_head list; }; +#define IPOIB_SKB_PRV_ADDR(skb) (((struct ipoib_skb_prv *)(skb)->cb)->addr) +#define IPOIB_SKB_PRV_AH(skb) (((struct ipoib_skb_prv *)(skb)->cb)->ah) +#define IPOIB_SKB_PRV_SKB(skb) (((struct ipoib_skb_prv *)(skb)->cb)->skb) +#define IPOIB_SKB_PRV_LIST(skb) (((struct ipoib_skb_prv *)(skb)->cb)->list) + + /* * Device private locking: tx_lock protects members used in TX fast * path (and we use LLTX so upper layers don't do extra locking). 
@@ -166,12 +174,11 @@ struct ipoib_dev_priv { struct ipoib_rx_buf *rx_ring; - spinlock_t tx_lock ____cacheline_aligned_in_smp; - struct ipoib_tx_buf *tx_ring; - unsigned tx_head; - unsigned tx_tail; + spinlock_t tx_lock; struct ib_sge tx_sge; struct ib_send_wr tx_wr; + spinlock_t slist_lock; + struct list_head send_list; struct list_head dead_ahs; diff -urpN infiniband-ah/ulp/ipoib/ipoib_ib.c infiniband-tx/ulp/ipoib/ipoib_ib.c --- infiniband-ah/ulp/ipoib/ipoib_ib.c 2006-05-23 10:14:08.000000000 -0700 +++ infiniband-tx/ulp/ipoib/ipoib_ib.c 2006-05-24 11:58:46.000000000 -0700 @@ -243,45 +243,36 @@ static void ipoib_ib_handle_send_wc(stru struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id; - struct ipoib_tx_buf *tx_req; - unsigned long flags; - - ipoib_dbg_data(priv, "called: id %d, op %d, status: %d\n", - wr_id, wc->opcode, wc->status); - - if (wr_id >= ipoib_sendq_size) { - ipoib_warn(priv, "completion event with wrid %d (> %d)\n", - wr_id, ipoib_sendq_size); - return; - } - - ipoib_dbg_data(priv, "send complete, wrid %d\n", wr_id); + struct sk_buff *skb; + unsigned long wr_id = wc->wr_id; - tx_req = &priv->tx_ring[wr_id]; - - dma_unmap_single(priv->ca->dma_device, - pci_unmap_addr(tx_req, mapping), - tx_req->skb->len, - DMA_TO_DEVICE); + skb = (struct sk_buff *)wr_id; + kref_put(&IPOIB_SKB_PRV_AH(skb)->ref, ipoib_free_ah); + if (IS_ERR(skb) || skb != IPOIB_SKB_PRV_SKB(skb)) { + ipoib_warn(priv, "send completion event with corrupted wrid\n"); + return; + } + list_del(&IPOIB_SKB_PRV_LIST(skb)); + + ipoib_dbg_data(priv, "send complete, wrid %lu\n", wr_id); + + dma_unmap_single(priv->ca->dma_device, + IPOIB_SKB_PRV_ADDR(skb), + skb->len, + DMA_TO_DEVICE); + ++priv->stats.tx_packets; - priv->stats.tx_bytes += tx_req->skb->len; - - dev_kfree_skb_any(tx_req->skb); - - spin_lock_irqsave(&priv->tx_lock, flags); - ++priv->tx_tail; - if (netif_queue_stopped(dev) && - priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) - 
netif_wake_queue(dev); - spin_unlock_irqrestore(&priv->tx_lock, flags); - - if (wc->status != IB_WC_SUCCESS && - wc->status != IB_WC_WR_FLUSH_ERR) - ipoib_warn(priv, "failed send event " - "(status=%d, wrid=%d vend_err %x)\n", - wc->status, wr_id, wc->vendor_err); + priv->stats.tx_bytes += skb->len; + dev_kfree_skb_any(skb); + + if (netif_queue_stopped(dev)) + netif_wake_queue(dev); + if (wc->status != IB_WC_SUCCESS && + wc->status != IB_WC_WR_FLUSH_ERR) + ipoib_warn(priv, "failed send event " + "(status=%d, wrid=%lu vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); } void ipoib_ib_send_completion(struct ib_cq *cq, void *dev_ptr) @@ -313,7 +304,7 @@ void ipoib_ib_recv_completion(struct ib_ } static inline int post_send(struct ipoib_dev_priv *priv, - unsigned int wr_id, + unsigned long wr_id, struct ib_ah *address, u32 qpn, dma_addr_t addr, int len) { @@ -333,8 +324,9 @@ void ipoib_send(struct net_device *dev, struct ipoib_ah *address, u32 qpn) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_tx_buf *tx_req; dma_addr_t addr; + unsigned long wr_id; + unsigned long flags; int err; kref_get(&address->ref); @@ -350,38 +342,31 @@ void ipoib_send(struct net_device *dev, ipoib_dbg_data(priv, "sending packet, length=%d address=%p qpn=0x%06x\n", skb->len, address, qpn); - - /* - * We put the skb into the tx_ring _before_ we call post_send() - * because it's entirely possible that the completion handler will - * run before we execute anything after the post_send(). That - * means we have to make sure everything is properly recorded and - * our state is consistent before we call post_send(). 
- */ - tx_req = &priv->tx_ring[priv->tx_head & (ipoib_sendq_size - 1)]; - tx_req->skb = skb; addr = dma_map_single(priv->ca->dma_device, skb->data, skb->len, DMA_TO_DEVICE); - pci_unmap_addr_set(tx_req, mapping, addr); - - err = post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), - address->ah, qpn, addr, skb->len); - kref_put(&address->ref, ipoib_free_ah); - if (unlikely(err)) { - ipoib_warn(priv, "post_send failed\n"); - ++priv->stats.tx_errors; - dma_unmap_single(priv->ca->dma_device, addr, skb->len, - DMA_TO_DEVICE); - dev_kfree_skb_any(skb); - } else { - dev->trans_start = jiffies; - - ++priv->tx_head; - - if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) { - ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); + wr_id = (unsigned long)skb; + err = post_send(priv, wr_id, address->ah, qpn, addr, skb->len); + if (!err) { + dev->trans_start = jiffies; + IPOIB_SKB_PRV_ADDR(skb) = addr; + IPOIB_SKB_PRV_AH(skb) = address; + IPOIB_SKB_PRV_SKB(skb) = skb; + spin_lock_irqsave(&priv->slist_lock, flags); + list_add_tail(&IPOIB_SKB_PRV_LIST(skb), &priv->send_list); + spin_unlock_irqrestore(&priv->slist_lock, flags); + return; + } else { + if (!netif_queue_stopped(dev)) { netif_stop_queue(dev); + ipoib_warn(priv, "stopping kernel net queue\n"); } + dma_unmap_single(priv->ca->dma_device, addr, skb->len, + DMA_TO_DEVICE); + ipoib_warn(priv, "post_send failed\n"); + ++priv->stats.tx_dropped; + ++priv->stats.tx_errors; + dev_kfree_skb_any(skb); + kref_put(&address->ref, ipoib_free_ah); } } @@ -480,7 +465,9 @@ int ipoib_ib_dev_stop(struct net_device struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; unsigned long begin; - struct ipoib_tx_buf *tx_req; + unsigned long flags; + struct ipoib_skb_prv *cb, *tcb; + struct sk_buff *skb; int i; clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); @@ -496,25 +483,25 @@ int ipoib_ib_dev_stop(struct net_device /* Wait for all sends and receives to complete */ begin = jiffies; - while (priv->tx_head != 
priv->tx_tail || recvs_pending(dev)) { + while (!list_empty(&priv->send_list) || recvs_pending(dev)) { if (time_after(jiffies, begin + 5 * HZ)) { - ipoib_warn(priv, "timing out; %d sends %d receives not completed\n", - priv->tx_head - priv->tx_tail, recvs_pending(dev)); + ipoib_warn(priv, "timing out; %d receives not completed\n", + recvs_pending(dev)); /* * assume the HW is wedged and just free up * all our pending work requests. */ - while ((int) priv->tx_tail - (int) priv->tx_head < 0) { - tx_req = &priv->tx_ring[priv->tx_tail & - (ipoib_sendq_size - 1)]; - dma_unmap_single(priv->ca->dma_device, - pci_unmap_addr(tx_req, mapping), - tx_req->skb->len, - DMA_TO_DEVICE); - dev_kfree_skb_any(tx_req->skb); - ++priv->tx_tail; - } + spin_lock_irqsave(&priv->slist_lock, flags); + list_for_each_entry_safe(cb, tcb, &priv->send_list, + list) { + skb = cb->skb; + dma_unmap_single(priv->ca->dma_device, + IPOIB_SKB_PRV_ADDR(skb), + skb->len, DMA_TO_DEVICE); + dev_kfree_skb_any(skb); + } + spin_unlock_irqrestore(&priv->slist_lock, flags); for (i = 0; i < ipoib_recvq_size; ++i) if (priv->rx_ring[i].skb) { diff -urpN infiniband-ah/ulp/ipoib/ipoib_main.c infiniband-tx/ulp/ipoib/ipoib_main.c --- infiniband-ah/ulp/ipoib/ipoib_main.c 2006-05-23 09:31:49.000000000 -0700 +++ infiniband-tx/ulp/ipoib/ipoib_main.c 2006-05-24 11:47:06.000000000 -0700 @@ -708,9 +708,7 @@ static void ipoib_timeout(struct net_dev ipoib_warn(priv, "transmit timeout: latency %d msecs\n", jiffies_to_msecs(jiffies - dev->trans_start)); - ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail %u\n", - netif_queue_stopped(dev), - priv->tx_head, priv->tx_tail); + ipoib_warn(priv, "queue stopped %d\n", netif_queue_stopped(dev)); /* XXX reset QP, etc. 
*/ } @@ -846,7 +844,7 @@ int ipoib_dev_init(struct net_device *de { struct ipoib_dev_priv *priv = netdev_priv(dev); - /* Allocate RX/TX "rings" to hold queued skbs */ + /* Allocate RX "rings" to hold queued skbs */ priv->rx_ring = kzalloc(ipoib_recvq_size * sizeof *priv->rx_ring, GFP_KERNEL); if (!priv->rx_ring) { @@ -855,24 +853,11 @@ int ipoib_dev_init(struct net_device *de goto out; } - priv->tx_ring = kzalloc(ipoib_sendq_size * sizeof *priv->tx_ring, - GFP_KERNEL); - if (!priv->tx_ring) { - printk(KERN_WARNING "%s: failed to allocate TX ring (%d entries)\n", - ca->name, ipoib_sendq_size); - goto out_rx_ring_cleanup; - } - - /* priv->tx_head & tx_tail are already 0 */ - if (ipoib_ib_dev_init(dev, ca, port)) - goto out_tx_ring_cleanup; + goto out_rx_ring_cleanup; return 0; -out_tx_ring_cleanup: - kfree(priv->tx_ring); - out_rx_ring_cleanup: kfree(priv->rx_ring); @@ -896,10 +881,8 @@ void ipoib_dev_cleanup(struct net_device ipoib_ib_dev_cleanup(dev); kfree(priv->rx_ring); - kfree(priv->tx_ring); priv->rx_ring = NULL; - priv->tx_ring = NULL; } static void ipoib_setup(struct net_device *dev) @@ -944,6 +927,7 @@ static void ipoib_setup(struct net_devic spin_lock_init(&priv->lock); spin_lock_init(&priv->tx_lock); + spin_lock_init(&priv->slist_lock); mutex_init(&priv->mcast_mutex); mutex_init(&priv->vlan_mutex); @@ -952,6 +936,7 @@ static void ipoib_setup(struct net_devic INIT_LIST_HEAD(&priv->child_intfs); INIT_LIST_HEAD(&priv->dead_ahs); INIT_LIST_HEAD(&priv->multicast_list); + INIT_LIST_HEAD(&priv->send_list); INIT_WORK(&priv->pkey_task, ipoib_pkey_poll, priv->dev); INIT_WORK(&priv->mcast_task, ipoib_mcast_join_task, priv->dev); Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... 
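For readers following the patch above, here is a minimal userspace sketch of its central idea: the 64-bit work-request ID carries the buffer's own address, and the completion side checks a self-pointer stored with the buffer before using it. This is not the driver code. `struct pkt`, `struct pkt_prv`, `pkt_to_wr_id()` and `wr_id_to_pkt()` are hypothetical names standing in for `struct sk_buff`, `struct ipoib_skb_prv` and the `IPOIB_SKB_PRV_*` macros.

```c
#include <stddef.h>
#include <stdint.h>

struct pkt;

/* Hypothetical per-packet private data (the patch keeps this in skb->cb). */
struct pkt_prv {
    uint64_t    dma_addr;  /* mapping to undo on completion */
    struct pkt *self;      /* back-pointer used as a sanity check */
};

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
    struct pkt_prv prv;
    size_t         len;
};

/* Send side: record the self-pointer, then use the address as the cookie. */
uint64_t pkt_to_wr_id(struct pkt *p)
{
    p->prv.self = p;
    return (uint64_t)(uintptr_t)p;
}

/* Completion side: recover the packet, or NULL if the cookie looks wrong. */
struct pkt *wr_id_to_pkt(uint64_t wr_id)
{
    struct pkt *p = (struct pkt *)(uintptr_t)wr_id;

    if (p == NULL || p->prv.self != p)
        return NULL;
    return p;
}
```

Note that the self-pointer check reads through the recovered pointer, so it can only detect a cookie that was cleared or overwritten in a still-valid buffer; it cannot make an arbitrary bad address safe to dereference.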
URL: From ianjiang.ict at gmail.com Wed May 24 19:19:56 2006 From: ianjiang.ict at gmail.com (Ian Jiang) Date: Thu, 25 May 2006 10:19:56 +0800 Subject: [openib-general] [Question]Fast Registration Work Request and Fast Memory Regions Message-ID: <7b2fa1820605241919ncdee1f7lc40ecd7020a5e09@mail.gmail.com> Hi all! The IBA spec. 1.2 defines a Fast Registration Work Request, and in my opinion it should be used together with the Allocate L Key verb to get a fast registration. Fast Memory Regions are used in the ULPs of the IB stack, such as kDAPL and iSER. The two confuse me very much. I don't think they are the same thing, but I can't tell the difference between them. Would anybody give an explanation of how they work and where they are usually used? Thanks very much and best wishes! -- Ian Jiang From panda at cse.ohio-state.edu Wed May 24 20:24:47 2006 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Wed, 24 May 2006 23:24:47 -0400 (EDT) Subject: [openib-general] Running MVAPICH2 with SLURM Process Manager In-Reply-To: from "Don.Dhondt@Bull.com" at May 24, 2006 10:28:19 AM Message-ID: <200605250324.k4P3OleE004439@xi.cse.ohio-state.edu> Hi Don, > We are running mvapich2-0.9.3-RC0 with OFED1.0 RC4 and have had good > results. Thanks for doing this testing. Glad to know that it works with OFED1.0 RC4. Please note that we made a formal release of MVAPICH2-0.9.3 during the weekend. > We would like to use the SLURM resource manager with this combination > rather than MPD > but it does not appear to be one of the choices available. Does anyone > have any > experience in this area? > > ./configure --prefix=${PREFIX} ${MULTI_THREAD} \ > --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd \ > --disable-romio --without-mpe 2>&1 |tee config-mine.log > > --with-pm=mpd > We would have liked to have seen an option for slurm. Thanks for this suggestion. We have not tested MVAPICH2 with SLURM.
To the best of our knowledge, SLURM works with MPICH2/MPD. Thus, there should not be a problem for MVAPICH2 to work with SLURM. (I believe some of the MVAPICH/MVAPICH2 users do so.) We are taking a look at it and will get back to you. Best Regards, DK > Regards, > Donald Dhondt > GCOS 8 Communications Solutions Project Manager > Bull HN Information Systems Inc. > 13430 N. Black Canyon Hwy., Phoenix, AZ 85029 > Work (602) 862-5245 Fax (602) 862-4290 From xma at us.ibm.com Wed May 24 21:48:38 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 21:48:38 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: Message-ID: Oops, I missed one pair of spin_lock_irqsave()/spin_unlock_irqrestore() to protect send_list in ipoib_ib_handle_send_wc(). Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From xma at us.ibm.com Wed May 24 22:01:33 2006 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 24 May 2006 22:01:33 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: Message-ID: Here is the new patch.
diff -urpN infiniband-ah/ulp/ipoib/ipoib.h infiniband-tx/ulp/ipoib/ipoib.h --- infiniband-ah/ulp/ipoib/ipoib.h 2006-05-23 10:09:05.000000000 -0700 +++ infiniband-tx/ulp/ipoib/ipoib.h 2006-05-24 11:45:52.000000000 -0700 @@ -114,11 +114,19 @@ struct ipoib_rx_buf { dma_addr_t mapping; }; -struct ipoib_tx_buf { - struct sk_buff *skb; - DECLARE_PCI_UNMAP_ADDR(mapping) +struct ipoib_skb_prv { + dma_addr_t addr; + struct ipoib_ah *ah; + struct sk_buff *skb; + struct list_head list; }; +#define IPOIB_SKB_PRV_ADDR(skb) (((struct ipoib_skb_prv *)(skb)->cb)->addr) +#define IPOIB_SKB_PRV_AH(skb) (((struct ipoib_skb_prv *)(skb)->cb)->ah) +#define IPOIB_SKB_PRV_SKB(skb) (((struct ipoib_skb_prv *)(skb)->cb)->skb) +#define IPOIB_SKB_PRV_LIST(skb) (((struct ipoib_skb_prv *)(skb)->cb)->list) + + /* * Device private locking: tx_lock protects members used in TX fast * path (and we use LLTX so upper layers don't do extra locking). @@ -166,12 +174,11 @@ struct ipoib_dev_priv { struct ipoib_rx_buf *rx_ring; - spinlock_t tx_lock ____cacheline_aligned_in_smp; - struct ipoib_tx_buf *tx_ring; - unsigned tx_head; - unsigned tx_tail; + spinlock_t tx_lock; struct ib_sge tx_sge; struct ib_send_wr tx_wr; + spinlock_t slist_lock; + struct list_head send_list; struct list_head dead_ahs; diff -urpN infiniband-ah/ulp/ipoib/ipoib_ib.c infiniband-tx/ulp/ipoib/ipoib_ib.c --- infiniband-ah/ulp/ipoib/ipoib_ib.c 2006-05-23 10:14:08.000000000 -0700 +++ infiniband-tx/ulp/ipoib/ipoib_ib.c 2006-05-24 14:57:33.000000000 -0700 @@ -243,45 +243,39 @@ static void ipoib_ib_handle_send_wc(stru struct ib_wc *wc) { struct ipoib_dev_priv *priv = netdev_priv(dev); - unsigned int wr_id = wc->wr_id; - struct ipoib_tx_buf *tx_req; + struct sk_buff *skb; unsigned long flags; + unsigned long wr_id = wc->wr_id; - ipoib_dbg_data(priv, "called: id %d, op %d, status: %d\n", - wr_id, wc->opcode, wc->status); - - if (wr_id >= ipoib_sendq_size) { - ipoib_warn(priv, "completion event with wrid %d (> %d)\n", - wr_id, 
ipoib_sendq_size); - return; - } - - ipoib_dbg_data(priv, "send complete, wrid %d\n", wr_id); - - tx_req = &priv->tx_ring[wr_id]; - - dma_unmap_single(priv->ca->dma_device, - pci_unmap_addr(tx_req, mapping), - tx_req->skb->len, - DMA_TO_DEVICE); + skb = (struct sk_buff *)wr_id; + kref_put(&IPOIB_SKB_PRV_AH(skb)->ref, ipoib_free_ah); + if (IS_ERR(skb) || skb != IPOIB_SKB_PRV_SKB(skb)) { + ipoib_warn(priv, "send completion event with corrupted wrid\n"); + return; + } + spin_lock_irqsave(&priv->slist_lock, flags); + list_del(&IPOIB_SKB_PRV_LIST(skb)); + spin_unlock_irqrestore(&priv->slist_lock, flags); + + ipoib_dbg_data(priv, "send complete, wrid %lu\n", wr_id); + + dma_unmap_single(priv->ca->dma_device, + IPOIB_SKB_PRV_ADDR(skb), + skb->len, + DMA_TO_DEVICE); + ++priv->stats.tx_packets; - priv->stats.tx_bytes += tx_req->skb->len; - - dev_kfree_skb_any(tx_req->skb); - - spin_lock_irqsave(&priv->tx_lock, flags); - ++priv->tx_tail; - if (netif_queue_stopped(dev) && - priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) - netif_wake_queue(dev); - spin_unlock_irqrestore(&priv->tx_lock, flags); - - if (wc->status != IB_WC_SUCCESS && - wc->status != IB_WC_WR_FLUSH_ERR) - ipoib_warn(priv, "failed send event " - "(status=%d, wrid=%d vend_err %x)\n", - wc->status, wr_id, wc->vendor_err); + priv->stats.tx_bytes += skb->len; + dev_kfree_skb_any(skb); + + if (netif_queue_stopped(dev)) + netif_wake_queue(dev); + if (wc->status != IB_WC_SUCCESS && + wc->status != IB_WC_WR_FLUSH_ERR) + ipoib_warn(priv, "failed send event " + "(status=%d, wrid=%lu vend_err %x)\n", + wc->status, wr_id, wc->vendor_err); } void ipoib_ib_send_completion(struct ib_cq *cq, void *dev_ptr) @@ -313,7 +307,7 @@ void ipoib_ib_recv_completion(struct ib_ } static inline int post_send(struct ipoib_dev_priv *priv, - unsigned int wr_id, + unsigned long wr_id, struct ib_ah *address, u32 qpn, dma_addr_t addr, int len) { @@ -333,8 +327,8 @@ void ipoib_send(struct net_device *dev, struct ipoib_ah *address, u32 qpn) 
{ struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ipoib_tx_buf *tx_req; dma_addr_t addr; + unsigned long wr_id; int err; kref_get(&address->ref); @@ -350,38 +344,31 @@ void ipoib_send(struct net_device *dev, ipoib_dbg_data(priv, "sending packet, length=%d address=%p qpn=0x%06x\n", skb->len, address, qpn); - - /* - * We put the skb into the tx_ring _before_ we call post_send() - * because it's entirely possible that the completion handler will - * run before we execute anything after the post_send(). That - * means we have to make sure everything is properly recorded and - * our state is consistent before we call post_send(). - */ - tx_req = &priv->tx_ring[priv->tx_head & (ipoib_sendq_size - 1)]; - tx_req->skb = skb; addr = dma_map_single(priv->ca->dma_device, skb->data, skb->len, DMA_TO_DEVICE); - pci_unmap_addr_set(tx_req, mapping, addr); - - err = post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), - address->ah, qpn, addr, skb->len); - kref_put(&address->ref, ipoib_free_ah); - if (unlikely(err)) { - ipoib_warn(priv, "post_send failed\n"); - ++priv->stats.tx_errors; - dma_unmap_single(priv->ca->dma_device, addr, skb->len, - DMA_TO_DEVICE); - dev_kfree_skb_any(skb); - } else { - dev->trans_start = jiffies; - - ++priv->tx_head; - - if (priv->tx_head - priv->tx_tail == ipoib_sendq_size) { - ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); + wr_id = (unsigned long)skb; + err = post_send(priv, wr_id, address->ah, qpn, addr, skb->len); + if (!err) { + dev->trans_start = jiffies; + IPOIB_SKB_PRV_ADDR(skb) = addr; + IPOIB_SKB_PRV_AH(skb) = address; + IPOIB_SKB_PRV_SKB(skb) = skb; + spin_lock(&priv->slist_lock); + list_add_tail(&IPOIB_SKB_PRV_LIST(skb), &priv->send_list); + spin_unlock(&priv->slist_lock); + return; + } else { + if (!netif_queue_stopped(dev)) { netif_stop_queue(dev); + ipoib_warn(priv, "stopping kernel net queue\n"); } + dma_unmap_single(priv->ca->dma_device, addr, skb->len, + DMA_TO_DEVICE); + ipoib_warn(priv, "post_send 
failed\n"); + ++priv->stats.tx_dropped; + ++priv->stats.tx_errors; + dev_kfree_skb_any(skb); + kref_put(&address->ref, ipoib_free_ah); } } @@ -480,7 +467,9 @@ int ipoib_ib_dev_stop(struct net_device struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; unsigned long begin; - struct ipoib_tx_buf *tx_req; + unsigned long flags; + struct ipoib_skb_prv *cb, *tcb; + struct sk_buff *skb; int i; clear_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); @@ -496,25 +485,25 @@ int ipoib_ib_dev_stop(struct net_device /* Wait for all sends and receives to complete */ begin = jiffies; - while (priv->tx_head != priv->tx_tail || recvs_pending(dev)) { + while (!list_empty(&priv->send_list) || recvs_pending(dev)) { if (time_after(jiffies, begin + 5 * HZ)) { - ipoib_warn(priv, "timing out; %d sends %d receives not completed\n", - priv->tx_head - priv->tx_tail, recvs_pending(dev)); + ipoib_warn(priv, "timing out; %d receives not completed\n", + recvs_pending(dev)); /* * assume the HW is wedged and just free up * all our pending work requests. 
*/ - while ((int) priv->tx_tail - (int) priv->tx_head < 0) { - tx_req = &priv->tx_ring[priv->tx_tail & - (ipoib_sendq_size - 1)]; - dma_unmap_single(priv->ca->dma_device, - pci_unmap_addr(tx_req, mapping), - tx_req->skb->len, - DMA_TO_DEVICE); - dev_kfree_skb_any(tx_req->skb); - ++priv->tx_tail; - } + spin_lock_irqsave(&priv->slist_lock, flags); + list_for_each_entry_safe(cb, tcb, &priv->send_list, + list) { + skb = cb->skb; + dma_unmap_single(priv->ca->dma_device, + IPOIB_SKB_PRV_ADDR(skb), + skb->len, DMA_TO_DEVICE); + dev_kfree_skb_any(skb); + } + spin_unlock_irqrestore(&priv->slist_lock, flags); for (i = 0; i < ipoib_recvq_size; ++i) if (priv->rx_ring[i].skb) { diff -urpN infiniband-ah/ulp/ipoib/ipoib_main.c infiniband-tx/ulp/ipoib/ipoib_main.c --- infiniband-ah/ulp/ipoib/ipoib_main.c 2006-05-23 09:31:49.000000000 -0700 +++ infiniband-tx/ulp/ipoib/ipoib_main.c 2006-05-24 11:47:06.000000000 -0700 @@ -708,9 +708,7 @@ static void ipoib_timeout(struct net_dev ipoib_warn(priv, "transmit timeout: latency %d msecs\n", jiffies_to_msecs(jiffies - dev->trans_start)); - ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail %u\n", - netif_queue_stopped(dev), - priv->tx_head, priv->tx_tail); + ipoib_warn(priv, "queue stopped %d\n", netif_queue_stopped(dev)); /* XXX reset QP, etc. 
*/ } @@ -846,7 +844,7 @@ int ipoib_dev_init(struct net_device *de { struct ipoib_dev_priv *priv = netdev_priv(dev); - /* Allocate RX/TX "rings" to hold queued skbs */ + /* Allocate RX "rings" to hold queued skbs */ priv->rx_ring = kzalloc(ipoib_recvq_size * sizeof *priv->rx_ring, GFP_KERNEL); if (!priv->rx_ring) { @@ -855,24 +853,11 @@ int ipoib_dev_init(struct net_device *de goto out; } - priv->tx_ring = kzalloc(ipoib_sendq_size * sizeof *priv->tx_ring, - GFP_KERNEL); - if (!priv->tx_ring) { - printk(KERN_WARNING "%s: failed to allocate TX ring (%d entries)\n", - ca->name, ipoib_sendq_size); - goto out_rx_ring_cleanup; - } - - /* priv->tx_head & tx_tail are already 0 */ - if (ipoib_ib_dev_init(dev, ca, port)) - goto out_tx_ring_cleanup; + goto out_rx_ring_cleanup; return 0; -out_tx_ring_cleanup: - kfree(priv->tx_ring); - out_rx_ring_cleanup: kfree(priv->rx_ring); @@ -896,10 +881,8 @@ void ipoib_dev_cleanup(struct net_device ipoib_ib_dev_cleanup(dev); kfree(priv->rx_ring); - kfree(priv->tx_ring); priv->rx_ring = NULL; - priv->tx_ring = NULL; } static void ipoib_setup(struct net_device *dev) @@ -944,6 +927,7 @@ static void ipoib_setup(struct net_devic spin_lock_init(&priv->lock); spin_lock_init(&priv->tx_lock); + spin_lock_init(&priv->slist_lock); mutex_init(&priv->mcast_mutex); mutex_init(&priv->vlan_mutex); @@ -952,6 +936,7 @@ static void ipoib_setup(struct net_devic INIT_LIST_HEAD(&priv->child_intfs); INIT_LIST_HEAD(&priv->dead_ahs); INIT_LIST_HEAD(&priv->multicast_list); + INIT_LIST_HEAD(&priv->send_list); INIT_WORK(&priv->pkey_task, ipoib_pkey_poll, priv->dev); INIT_WORK(&priv->mcast_task, ipoib_mcast_join_task, priv->dev); Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... 
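The revised patch above takes slist_lock around every send_list update: the add in ipoib_send(), the delete in the completion handler, and the drain in ipoib_ib_dev_stop(). A rough userspace sketch of that pattern follows, assuming a C11 atomic flag as a stand-in for the kernel spinlock. `struct pending`, `pending_add_tail()`, `pending_del()` and `pending_drain()` are illustrative names, not kernel APIs, and nothing here disables interrupts the way spin_lock_irqsave() does.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Intrusive doubly linked list node, in the spirit of struct list_head. */
struct node {
    struct node *prev, *next;
};

/* Pending-send list plus the lock that guards it (hypothetical type). */
struct pending {
    atomic_flag lock;
    struct node head;   /* sentinel: an empty list points at itself */
};

static void pending_lock(struct pending *q)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void pending_unlock(struct pending *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

void pending_init(struct pending *q)
{
    atomic_flag_clear(&q->lock);
    q->head.prev = q->head.next = &q->head;
}

/* Send path: queue a node at the tail. */
void pending_add_tail(struct pending *q, struct node *n)
{
    pending_lock(q);
    n->prev = q->head.prev;
    n->next = &q->head;
    q->head.prev->next = n;
    q->head.prev = n;
    pending_unlock(q);
}

/* Completion path: unlink one node. */
void pending_del(struct pending *q, struct node *n)
{
    pending_lock(q);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
    pending_unlock(q);
}

/* Shutdown path: unlink everything still queued and return the count. */
size_t pending_drain(struct pending *q)
{
    size_t count = 0;

    pending_lock(q);
    while (q->head.next != &q->head) {
        struct node *n = q->head.next;

        q->head.next = n->next;
        n->next->prev = &q->head;
        n->prev = n->next = NULL;
        count++;
    }
    pending_unlock(q);
    return count;
}

int pending_empty(struct pending *q)
{
    int empty;

    pending_lock(q);
    empty = (q->head.next == &q->head);
    pending_unlock(q);
    return empty;
}
```

The point of the sketch is only that all three paths go through the same lock, so an add, a delete and a drain can never interleave on the list pointers.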
URL: From tziporet at mellanox.co.il Wed May 24 22:44:21 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Thu, 25 May 2006 08:44:21 +0300 Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs In-Reply-To: References: Message-ID: <44754435.6090506@mellanox.co.il> Paul wrote: > I dont see an rc5 @ > https://openfabrics.org/svn/gen2/branches/1.0/ofed/releases/ > > Has anybody gotten the build.sh or install.sh to produce 64bit stuff > on ppc64 ? (show of hands ? any voodoo required ?) We succeeded to compile everything on PPC64 on the RC5 tag we opened. It is not yet published (should be today). Instruction how to compile PPC64 with lib64 will be in the release mail. Tziporet From eitan at mellanox.co.il Wed May 24 22:46:38 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 25 May 2006 08:46:38 +0300 Subject: [openib-general] RE: [PATCH]OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix NULL ptrissue Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368717@mtlexch01.mtl.com> Sorry for my abbreviated late night style. By "yes it is" I meant that the CL_ASSERT is really mapping to no op in non debug version. By "I do not think it is required" I mean that if we have the new "if" we do not need the CL_ASSERT. Eitan > RE:[PATCH]OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix NULL > ptrissue > > On Wed, 2006-05-24 at 16:41, Eitan Zahavi wrote: > > > But isn't CL_ASSERT a debug compile time thing so it's needed when > > it's > > > built without debug ? > > Yes it is - but I do not think it is required. > > I don't understand what you wrote. > > Does "yes it is" mean that you agree it is a debug compile time thing ? > > Why is it not required ? What if the memory allocation fails ? > > -- Hal > > > > > > > > > > > You missed the line that asserts on null p_next_step just before the > > > > code you changed . > > > > > > > > > -- Hal > > > > > > > . 
> > > > > > > > EZ > > > > > > > > Eitan Zahavi > > > > Senior Engineering Director, Software Architect > > > > Mellanox Technologies LTD > > > > Tel:+972-4-9097208 > > > > Fax:+972-4-9593245 > > > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > > > > > > > > > -----Original Message----- > > > > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > > > > Sent: Wednesday, May 24, 2006 4:34 PM > > > > > To: openib-general at openib.org > > > > > Cc: Eitan Zahavi > > > > > Subject: [PATCH] > > > > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_tFix > > > > > NULL ptr issue > > > > > > > > > > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t Fix NULL > > ptr > > > > > issue > > > > > > > > > > Signed-off-by: Hal Rosenstock > > > > > > > > > > Index: opensm/osm_ucast_updn.c > > > > > > > =================================================================== > > > > > --- opensm/osm_ucast_updn.c (revision 7435) > > > > > +++ opensm/osm_ucast_updn.c (working copy) > > > > > @@ -121,10 +121,12 @@ __updn_create_updn_next_step_t(IN updn_s > > > > > p_next_step = (updn_next_step_t*) > > cl_zalloc(sizeof(*p_next_step)); > > > > > CL_ASSERT (p_next_step != NULL); > > > > > > > > > > - p_next_step->state = state; > > > > > - p_next_step->p_sw = p_sw; > > > > > + if (p_next_step) > > > > > + { > > > > > + p_next_step->state = state; > > > > > + p_next_step->p_sw = p_sw; > > > > > + } > > > > > return p_next_step; > > > > > - > > > > > } > > > > > > > > > > > > > > > > /********************************************************************** > > > > > > > > > From devesh28 at gmail.com Wed May 24 23:00:54 2006 From: devesh28 at gmail.com (Devesh Sharma) Date: Thu, 25 May 2006 11:30:54 +0530 Subject: [openib-general] krping test utility In-Reply-To: References: <309a667c0605232129x66d8cc5ek1bd05d22c7e1db7@mail.gmail.com> Message-ID: <309a667c0605242300t11cf8671i66521842dcdf1715@mail.gmail.com> On 5/24/06, Roland Dreier wrote: > > Devesh> Hello all, In the krping test 
utility get_dma_mr is called > Devesh> with access permissions > Devesh> > IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE|IB_ACCESS_REMOTE_READ, > Devesh> But the lkey we get from get_dma_mr is similar to reserved > Devesh> lkey with which only Local operations are allowed, but > Devesh> here it seems violating that statement. > > No, ib_get_dma_mr() returns an L_Key/R_Key with exactly the > permissions requested. From previous discussions on the OpenIB forum about ib_get_dma_mr, I had the impression that the lkey returned from ib_get_dma_mr is similar to the Reserved L_Key specified in the IBTA spec. But the usage in krping also requests remote read/write permissions, so this is confusing me. Please clarify. - R. > From jackm at mellanox.co.il Thu May 25 00:24:19 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 25 May 2006 10:24:19 +0300 Subject: [openib-general] Re: which dapl/udapl changes in trunk should be imported into OFED branch? (patch enclosed) In-Reply-To: References: <200605241806.48546.jackm@mellanox.co.il> Message-ID: <200605251024.19848.jackm@mellanox.co.il> On Thursday 25 May 2006 01:22, James Lentini wrote: > On Wed, 24 May 2006, Jack Morgenstein wrote: > > Hi, > > > > Below is a patch file of differences between the OFED dapl library > > and the openib main trunk dapl library. > > > > Please indicate which of the dapl library changes are necessary for > > the Intel MPI to work correctly in OFED. > > How recent is the ucm code in OFED? I'm not really sure what you mean when you say "ucm code", so I'll try to cover all the bases. PLEASE respond ASAP, since this is holding up release of OFED RC5. We need to build TODAY for testing over our weekend (which is Friday and Saturday). Userspace: 1. libibcm: trunk most recent (includes "ib_xxxx" to "ibv_xxx" name changes). 2. librdmacm: trunk rev 7079 (May 10). Subsequent fixes to cmatose and rping are not included.
Also, the OFED librdmacm.spec.in file has REVISION 2, while the trunk has REVISION 1 -- we'll fix this. Kernel: 1. core/ucm.c : We've taken the 2.6.17 GIT version (last updated May 11, with fixes for race conditions (SVN 7119). The GIT version differs a bit from the trunk svn 7119 version, in that changes to cm.c for supporting SDP, which is not yet in the GIT. For supporting SDP, we take cm.c from the trunk. Patch below gives the diff between the GIT version and the version used in OFED (a parameter was added to ib_cm_listen): Index: openib_branch1.0/drivers/infiniband/core/ucm.c =================================================================== --- openib_branch1.0.orig/drivers/infiniband/core/ucm.c +++ openib_branch1.0/drivers/infiniband/core/ucm.c @@ -742,7 +742,8 @@ static ssize_t ib_ucm_listen(struct ib_u if (IS_ERR(ctx)) return PTR_ERR(ctx); - result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask); + result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask, + NULL); ib_ucm_ctx_put(ctx); return result; } From jackm at mellanox.co.il Thu May 25 00:33:23 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 25 May 2006 10:33:23 +0300 Subject: [openib-general] ucm: 2.6.17 GIT vs openib trunk issues in file ucm.c Message-ID: <200605251033.23776.jackm@mellanox.co.il> Hi, Two questions: 1. I notice that in the GIT version of core/ucm.c, the trunk updates below were not taken into the 2.6.17 GIT. Is there a reason? ------------------------------------------------------------------------ r5643 | sean.hefty | 2006-03-07 02:09:48 +0200 (Tue, 07 Mar 2006) | 4 lines Changed paths: M /gen2/trunk/src/linux-kernel/infiniband/core/ucm.c Convert file mutex from a semaphore to a real mutex for 2.6.16. 
Signed-off-by: Sean Hefty ------------------------------------------------------------------------ ------------------------------------------------------------------------ r5944 | roland | 2006-03-22 03:02:22 +0200 (Wed, 22 Mar 2006) | 4 lines Changed paths: M /gen2/trunk/src/linux-kernel/infiniband/core/ucm.c file_mutex is really a struct mutex, so use mutex_init() instead of init_MUTEX(). Signed-off-by: Roland Dreier ------------------------------------------------------------------------ 2. Missing mutex protection in ucm.c for list_empty call. Is there a reason? GIT: static unsigned int ib_ucm_poll(struct file *filp, struct poll_table_struct *wait) { struct ib_ucm_file *file = filp->private_data; unsigned int mask = 0; poll_wait(filp, &file->poll_wait, wait); if (!list_empty(&file->events)) mask = POLLIN | POLLRDNORM; return mask; } Trunk: static unsigned int ib_ucm_poll(struct file *filp, struct poll_table_struct *wait) { struct ib_ucm_file *file = filp->private_data; unsigned int mask = 0; poll_wait(filp, &file->poll_wait, wait); mutex_lock(&file->file_mutex); if (!list_empty(&file->events)) mask = POLLIN | POLLRDNORM; mutex_unlock(&file->file_mutex); return mask; } From vlad at mellanox.co.il Thu May 25 02:48:59 2006 From: vlad at mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 25 May 2006 12:48:59 +0300 Subject: [openib-general] Compilation issues on rhel4 u3 ppc64 sysfs.o In-Reply-To: References: Message-ID: <44757D8B.8090901@mellanox.co.il> In OFED-1.0-rc5 all binaries and libraries will be compiled on ppc64 with the -m64 flag. This requires the 64-bit sysfsutils and sysfsutils-devel RPMs to be installed (in order to build libibverbs). The 64-bit pciutils and pciutils-devel packages are also required for the tvflash package. libsdp will be built as both 32-bit and 64-bit libraries.
Note: in order to build sysfsutils 64-bit RPM run: CC="gcc -m64" rpmbuild --rebuild sysfsutils-1.3.0-1.2.1.src.rpm (This was tested on Fedora C4 PPC64) Regards, Vladimir Scott Weitzenkamp (sweitzen) wrote: > I know Vlad made some changes for rc5 in this area, at least for > libsdp, not sure if other libs got changed as well. > > Scott Weitzenkamp > SQA and Release Manager > Server Virtualization Business Unit > Cisco Systems > > > ------------------------------------------------------------------------ > *From:* Paul [mailto:paul.lundin at gmail.com] > *Sent:* Wednesday, May 24, 2006 11:00 AM > *To:* Scott Weitzenkamp (sweitzen) > *Cc:* openib-general at openib.org > *Subject:* Re: [openib-general] Compilation issues on rhel4 u3 > ppc64 sysfs.o > > Scott, > Upon further inspection the build.sh and install.sh scripts > built 32bit libraries and binaries. If I export CFLAGS (and the > like) to include -m64 then the build dies while looking for a > 64bit libsysfs. rhel4 u3 does not include a ppc64 sysfsutils, nor > have I been able to find an actual 64bit version of it. Is there a > workaround for getting things to build actual ppc64 > binaries/libraries ? > > The actual error is: > checking for dlsym in -ldl... yes > checking for pthread_mutex_init in -lpthread... yes > checking for sysfs_open_class in -lsysfs... no > configure: error: sysfs_open_class() not found. libibverbs > requires libsysfs. 
> From halr at voltaire.com Thu May 25 03:25:52 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 25 May 2006 06:25:52 -0400 Subject: [openib-general] [PATCHv2] OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t: Fix NULL ptr issue Message-ID: <1148552750.4470.143245.camel@hal.voltaire.com> OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t: Fix NULL ptr issue in non debug builds Signed-off-by: Hal Rosenstock Index: opensm/osm_ucast_updn.c =================================================================== --- opensm/osm_ucast_updn.c (revision 7435) +++ opensm/osm_ucast_updn.c (working copy) @@ -119,12 +119,12 @@ __updn_create_updn_next_step_t(IN updn_s updn_next_step_t *p_next_step; p_next_step = (updn_next_step_t*) cl_zalloc(sizeof(*p_next_step)); - CL_ASSERT (p_next_step != NULL); - - p_next_step->state = state; - p_next_step->p_sw = p_sw; + if (p_next_step) + { + p_next_step->state = state; + p_next_step->p_sw = p_sw; + } return p_next_step; - } /********************************************************************** From halr at voltaire.com Thu May 25 04:13:31 2006 From:
halr at voltaire.com (Hal Rosenstock) Date: 25 May 2006 07:13:31 -0400 Subject: [openib-general] [PATCHv2] OpenSM/memory allocation: Deprecate cl_malloc/zalloc/free and use malloc/free directly Message-ID: <1148555599.4470.144184.camel@hal.voltaire.com> OpenSM/memory allocation: Deprecate cl_malloc/zalloc/free and friends, and use malloc/free directly Signed-off-by: Hal Rosenstock Index: osm/include/opensm/osm_port.h =================================================================== --- osm/include/opensm/osm_port.h (revision 7470) +++ osm/include/opensm/osm_port.h (working copy) @@ -50,9 +50,9 @@ #ifndef _OSM_PORT_H_ #define _OSM_PORT_H_ +#include #include #include -#include #include #include #include @@ -1374,7 +1374,7 @@ osm_port_delete( IN OUT osm_port_t** const pp_port ) { osm_port_destroy( *pp_port ); - cl_free( *pp_port ); + free( *pp_port ); *pp_port = NULL; } /* Index: osm/include/opensm/osm_rand_fwd_tbl.h =================================================================== --- osm/include/opensm/osm_rand_fwd_tbl.h (revision 7470) +++ osm/include/opensm/osm_rand_fwd_tbl.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -50,8 +50,8 @@ #ifndef _OSM_RAND_FWD_TBL_H_ #define _OSM_RAND_FWD_TBL_H_ +#include #include -#include #include #ifdef __cplusplus @@ -125,7 +125,7 @@ osm_rand_tbl_delete( /* TO DO - This is a place holder function only! 
*/ - cl_free( *pp_tbl ); + free( *pp_tbl ); *pp_tbl = NULL; } /* Index: osm/include/complib/cl_memory.h =================================================================== --- osm/include/complib/cl_memory.h (revision 7470) +++ osm/include/complib/cl_memory.h (working copy) @@ -96,7 +96,7 @@ BEGIN_C_DECLS * * SYNOPSIS */ -void +void __attribute__((deprecated)) __cl_mem_track( IN const boolean_t start ); /* @@ -135,7 +135,7 @@ __cl_mem_track( * * SYNOPSIS */ -void +void __attribute__((deprecated)) cl_mem_display( void ); /* * RETURN VALUE @@ -162,7 +162,7 @@ cl_mem_display( void ); * * SYNOPSIS */ -boolean_t +boolean_t __attribute__((deprecated)) cl_mem_check( void ); /* * RETURN VALUE @@ -189,7 +189,7 @@ cl_mem_check( void ); * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * __cl_malloc_trk( IN const char* const p_file_name, IN const int32_t line_num, @@ -232,7 +232,7 @@ __cl_malloc_trk( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * __cl_zalloc_trk( IN const char* const p_file_name, IN const int32_t line_num, @@ -274,7 +274,7 @@ __cl_zalloc_trk( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * __cl_malloc_ntrk( IN const size_t size ); /* @@ -308,7 +308,7 @@ __cl_malloc_ntrk( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * __cl_zalloc_ntrk( IN const size_t bytes ); /* @@ -341,7 +341,7 @@ __cl_zalloc_ntrk( * * SYNOPSIS */ -void +void __attribute__((deprecated)) __cl_free_trk( IN const char* const p_file_name, IN const int32_t line_num, @@ -384,7 +384,7 @@ __cl_free_trk( * * SYNOPSIS */ -void +void __attribute__((deprecated)) __cl_free_ntrk( IN void* const p_memory ); /* @@ -418,7 +418,7 @@ __cl_free_ntrk( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * cl_malloc( IN const size_t size ); /* @@ -449,7 +449,7 @@ cl_malloc( * * SYNOPSIS */ -void* +void __attribute__((deprecated)) * cl_zalloc( IN const size_t size ); /* @@ -480,7 +480,7 @@ cl_zalloc( * * SYNOPSIS */ -void +void __attribute__((deprecated)) cl_free( IN 
void* const p_memory ); /* Index: osm/include/complib/cl_memtrack.h =================================================================== --- osm/include/complib/cl_memtrack.h (revision 7470) +++ osm/include/complib/cl_memtrack.h (working copy) @@ -79,7 +79,7 @@ typedef struct _cl_mem_tracker /* List to manage free headers. */ cl_qlist_t free_hdr_list; -} cl_mem_tracker_t; +} cl_mem_tracker_t __attribute__((deprecated)); #define FILE_NAME_LENGTH 64 @@ -93,7 +93,7 @@ typedef struct _cl_malloc_hdr char file_name[FILE_NAME_LENGTH]; int32_t line_num; -} cl_malloc_hdr_t; +} cl_malloc_hdr_t __attribute__((deprecated)); extern cl_mem_tracker_t *gp_mem_tracker; Index: osm/include/vendor/osm_vendor_mlx_svc.h =================================================================== --- osm/include/vendor/osm_vendor_mlx_svc.h (revision 7470) +++ osm/include/vendor/osm_vendor_mlx_svc.h (working copy) @@ -40,7 +40,6 @@ #include #include #include -#include #include #ifdef __cplusplus @@ -191,9 +190,10 @@ osmv_mad_copy(IN const ib_mad_t *p_mad) uint8_t *p_copy; CL_ASSERT(p_mad); - p_copy = cl_zalloc(MAD_BLOCK_SIZE); + p_copy = malloc(MAD_BLOCK_SIZE); if (NULL != p_copy) { + memset(p_copy, 0, MAD_BLOCK_SIZE); memcpy(p_copy, p_mad, MAD_BLOCK_SIZE); } Index: osm/libvendor/osm_vendor_mlx_ts.c =================================================================== --- osm/libvendor/osm_vendor_mlx_ts.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_ts.c (working copy) @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -58,7 +59,6 @@ #include #include -#include #include typedef struct _osmv_TOPSPIN_transport_mgr_ { @@ -187,7 +187,7 @@ osmv_transport_init(IN osm_bind_info_t * char device_file[16]; int device_fd; int ts_ioctl_ret; - osmv_TOPSPIN_transport_mgr_t* p_mgr = cl_zalloc(sizeof(osmv_TOPSPIN_transport_mgr_t)); + osmv_TOPSPIN_transport_mgr_t* p_mgr = malloc(sizeof(osmv_TOPSPIN_transport_mgr_t)); int qpn; if (!p_mgr) @@ -195,6 +195,8 @@ osmv_transport_init(IN 
osm_bind_info_t * return IB_INSUFFICIENT_MEMORY; } + memset(p_mgr, 0, sizeof(osmv_TOPSPIN_transport_mgr_t)); + /* open TopSpin file device */ /* HACK: assume last char in hostid is the HCA index */ sprintf(device_file, "/dev/ts_ua%u", hca_idx); @@ -414,7 +416,7 @@ osmv_transport_done(IN const osm_bind_ha /* seems the only way to abort a blocking read is to make it read something */ __osm_transport_gen_dummy_mad(p_bo); cl_thread_destroy(&(p_tpot_mgr->receiver)); - cl_free(p_tpot_mgr); + free(p_tpot_mgr); } static void Index: osm/libvendor/osm_vendor_mtl_transaction_mgr.c =================================================================== --- osm/libvendor/osm_vendor_mtl_transaction_mgr.c (revision 7470) +++ osm/libvendor/osm_vendor_mtl_transaction_mgr.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -40,7 +40,7 @@ #endif /* HAVE_CONFIG_H */ #include -#include +#include #include #include #include @@ -279,7 +279,7 @@ osm_transaction_mgr_init( IN osm_vendor_ CL_ASSERT( p_vend->p_transaction_mgr == NULL ); (osm_transaction_mgr_t*)p_vend->p_transaction_mgr = - (osm_transaction_mgr_t * ) cl_malloc( sizeof( osm_transaction_mgr_t ) ); + (osm_transaction_mgr_t * ) malloc( sizeof( osm_transaction_mgr_t ) ); trans_mgr_p = (osm_transaction_mgr_t*)p_vend->p_transaction_mgr; @@ -289,12 +289,12 @@ osm_transaction_mgr_init( IN osm_vendor_ /* initialize the qlist */ trans_mgr_p->madw_reqs_list_p = - ( cl_qlist_t * ) cl_malloc( sizeof( cl_qlist_t ) ); + ( cl_qlist_t * ) malloc( sizeof( cl_qlist_t ) ); cl_qlist_init( trans_mgr_p->madw_reqs_list_p); /* initialize the qmap */ trans_mgr_p->madw_by_tid_map_p = - ( cl_qmap_t * ) cl_malloc( sizeof( cl_qmap_t ) ); + ( cl_qmap_t * ) malloc( sizeof( cl_qmap_t ) ); cl_qmap_init( trans_mgr_p->madw_by_tid_map_p ); /* create the timer used by the madw_req_list */ @@ -352,12 +352,12 @@ osm_transaction_mgr_destroy ( IN osm_ven p_map_item = &(osm_madw_req_p->map_item); cl_qmap_remove_item(trans_mgr_p->madw_by_tid_map_p, p_map_item); /* free the item */ - cl_free(osm_madw_req_p); + free(osm_madw_req_p); p_list_item = cl_qlist_remove_head(trans_mgr_p->madw_reqs_list_p); } /* free the qlist and qmap */ - cl_free(trans_mgr_p->madw_reqs_list_p ); - cl_free(trans_mgr_p->madw_by_tid_map_p ); + free(trans_mgr_p->madw_reqs_list_p ); + free(trans_mgr_p->madw_by_tid_map_p ); /* reliease and destroy the lock */ cl_spinlock_release( &trans_mgr_p->transaction_mgr_lock ); cl_spinlock_destroy( &(trans_mgr_p->transaction_mgr_lock) ); @@ -365,7 +365,7 @@ osm_transaction_mgr_destroy ( IN osm_ven cl_timer_trim(&trans_mgr_p->madw_list_timer, 1); cl_timer_destroy( &trans_mgr_p->madw_list_timer ); /* free the transaction_manager object */ - cl_free(trans_mgr_p); + free(trans_mgr_p); trans_mgr_p = NULL; } OSM_LOG_EXIT( p_vend->p_log ); @@ -398,7 +398,7 @@ 
osm_transaction_mgr_insert_madw( IN osm_ timeout = (uint64_t)(p_vend->timeout) * 1000; /* change the miliseconds value of timeout to microseconds. */ waking_time = timeout + cl_get_time_stamp(); - osm_madw_req_p = (osm_madw_req_t *)cl_malloc( sizeof (osm_madw_req_t) ); + osm_madw_req_p = (osm_madw_req_t *)malloc( sizeof (osm_madw_req_t) ); osm_madw_req_p->p_madw = p_madw; osm_madw_req_p->waking_time = waking_time; @@ -476,7 +476,7 @@ osm_transaction_mgr_erase_madw( IN osm_v "Removed TID:<0x%"PRIx64">.\n", p_mad->trans_id ); /* free the item */ - cl_free(osm_madw_req_p); + free(osm_madw_req_p); } else { Index: osm/libvendor/osm_pkt_randomizer.c =================================================================== --- osm/libvendor/osm_pkt_randomizer.c (revision 7470) +++ osm/libvendor/osm_pkt_randomizer.c (working copy) @@ -58,8 +58,6 @@ #include #endif -#include - /********************************************************************** * Return TRUE if the path is in a fault path, and FALSE otherwise. 
* By in a fault path the meaning is that there is a path in the fault @@ -284,7 +282,7 @@ osm_pkt_randomizer_init( OSM_LOG_ENTER( p_log, osm_pkt_randomizer_init ); - *pp_pkt_randomizer = cl_zalloc( sizeof( osm_pkt_randomizer_t ) ); + *pp_pkt_randomizer = malloc( sizeof( osm_pkt_randomizer_t ) ); if ( *pp_pkt_randomizer == NULL ) { res = IB_INSUFFICIENT_MEMORY; @@ -317,14 +315,17 @@ osm_pkt_randomizer_init( /* allocate the fault_dr_paths variable */ /* It is the number of the paths that will be saved as fault = osm_pkt_num_unstable_links */ - (*pp_pkt_randomizer)->fault_dr_paths = cl_zalloc( sizeof( osm_dr_path_t ) * - (*pp_pkt_randomizer)->osm_pkt_num_unstable_links ); + (*pp_pkt_randomizer)->fault_dr_paths = malloc( sizeof( osm_dr_path_t ) * + (*pp_pkt_randomizer)->osm_pkt_num_unstable_links ); if ( (*pp_pkt_randomizer)->fault_dr_paths == NULL ) { res = IB_INSUFFICIENT_MEMORY; goto Exit; } + memset( (*pp_pkt_randomizer)->fault_dr_paths, 0, + sizeof( osm_dr_path_t ) * (*pp_pkt_randomizer)->osm_pkt_num_unstable_links ); + Exit: OSM_LOG_EXIT( p_log ); return (res); @@ -341,8 +342,8 @@ osm_pkt_randomizer_destroy( if ( *pp_pkt_randomizer != NULL ) { - cl_free( (*pp_pkt_randomizer)->fault_dr_paths ); - cl_free( *pp_pkt_randomizer ); + free( (*pp_pkt_randomizer)->fault_dr_paths ); + free( *pp_pkt_randomizer ); } OSM_LOG_EXIT( p_log ); } Index: osm/libvendor/osm_vendor_mlx_hca.c =================================================================== --- osm/libvendor/osm_vendor_mlx_hca.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_hca.c (working copy) @@ -46,8 +46,8 @@ #include #include #include -#include #include +#include #include /******************************************************************************** @@ -130,8 +130,8 @@ __osm_vendor_get_ca_ids( IN osm_vendor_t /* allocate and really call - user of this function needs to deallocate it */ *p_hca_ids = - ( VAPI_hca_id_t * ) cl_malloc( *p_num_guids * - sizeof( VAPI_hca_id_t ) ); + ( VAPI_hca_id_t * ) malloc( 
*p_num_guids * + sizeof( VAPI_hca_id_t ) ); /* now call it really */ vapi_res = EVAPI_list_hcas( *p_num_guids, p_num_guids, *p_hca_ids ); @@ -221,15 +221,15 @@ __osm_ca_info_init( IN osm_vendor_t * co memcpy( &( p_ca_info->guid ), hca_cap.node_guid, 8 * sizeof( u_int8_t ) ); p_ca_info->attr_size = 1; p_ca_info->p_attr = - ( ib_ca_attr_t * ) cl_malloc( sizeof( ib_ca_attr_t ) ); + ( ib_ca_attr_t * ) malloc( sizeof( ib_ca_attr_t ) ); memcpy( &( p_ca_info->p_attr->ca_guid ), hca_cap.node_guid, 8 * sizeof( u_int8_t ) ); /* now obtain the attributes of the ports */ p_ca_info->p_attr->num_ports = hca_cap.phys_port_num; p_ca_info->p_attr->p_port_attr = - ( ib_port_attr_t * ) cl_malloc( hca_cap.phys_port_num * - sizeof( ib_port_attr_t ) ); + ( ib_port_attr_t * ) malloc( hca_cap.phys_port_num * + sizeof( ib_port_attr_t ) ); for( port_num = 0; port_num < p_ca_info->p_attr->num_ports; port_num++ ) { @@ -250,7 +250,7 @@ __osm_ca_info_init( IN osm_vendor_t * co VAPI_query_hca_gid_tbl( hca_hndl, port_num + 1, 0, &maxNumGids, NULL ); p_port_gid = - ( IB_gid_t * ) cl_malloc( maxNumGids * sizeof( IB_gid_t ) ); + ( IB_gid_t * ) malloc( maxNumGids * sizeof( IB_gid_t ) ); vapi_res = VAPI_query_hca_gid_tbl( hca_hndl, port_num + 1, maxNumGids, @@ -270,7 +270,7 @@ __osm_ca_info_init( IN osm_vendor_t * co p_ca_info->p_attr->p_port_attr[port_num].link_state = hca_port.state; p_ca_info->p_attr->p_port_attr[port_num].sm_lid = hca_port.sm_lid; - cl_free( p_port_gid ); + free( p_port_gid ); } status = IB_SUCCESS; @@ -299,14 +299,14 @@ osm_ca_info_destroy( IN osm_vendor_t * c { if(0 != p_ca->p_attr->num_ports) { - cl_free( p_ca->p_attr->p_port_attr ); + free( p_ca->p_attr->p_port_attr ); } - cl_free( p_ca->p_attr); + free( p_ca->p_attr); } } - cl_free( p_ca_info ); + free( p_ca_info ); OSM_LOG_EXIT( p_vend->p_log ); } @@ -349,7 +349,7 @@ osm_vendor_get_all_port_attr( IN osm_ven } /* Allocate an array big enough to hold the ca info objects*/ - p_ca_infos = cl_zalloc( ca_count * sizeof( 
osm_ca_info_t ) ); + p_ca_infos = malloc( ca_count * sizeof( osm_ca_info_t ) ); if( p_ca_infos == NULL ) { osm_log( p_vend->p_log, OSM_LOG_ERROR, @@ -358,6 +358,8 @@ osm_vendor_get_all_port_attr( IN osm_ven goto Exit; } + memset( p_ca_infos, 0, ca_count * sizeof( osm_ca_info_t ) ); + /* * For each CA, retrieve the CA info attributes */ @@ -409,7 +411,7 @@ osm_vendor_get_all_port_attr( IN osm_ven Exit: if( p_ca_ids ) - cl_free( p_ca_ids ); + free( p_ca_ids ); if ( p_ca_infos ) { @@ -504,7 +506,7 @@ osm_vendor_get_guid_ca_and_port( VAPI_query_hca_gid_tbl( hca_hndl, portIdx + 1, 0, &maxNumGids, NULL ); p_port_gid = - ( IB_gid_t * ) cl_malloc( maxNumGids * sizeof( IB_gid_t ) ); + ( IB_gid_t * ) malloc( maxNumGids * sizeof( IB_gid_t ) ); /* get the port guid */ vapi_res = @@ -533,7 +535,7 @@ osm_vendor_get_guid_ca_and_port( goto Exit; } - cl_free( p_port_gid ); + free( p_port_gid ); p_port_gid = NULL; } /* ALL PORTS */ } /* all HCAs */ @@ -546,9 +548,9 @@ osm_vendor_get_guid_ca_and_port( Exit: if( p_ca_ids != NULL ) - cl_free( p_ca_ids ); + free( p_ca_ids ); if( p_port_gid != NULL ) - cl_free( p_port_gid ); + free( p_port_gid ); OSM_LOG_EXIT( p_vend->p_log ); return ( status ); } Index: osm/libvendor/osm_vendor_mlx_sa.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sa.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_sa.c (working copy) @@ -40,8 +40,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -181,7 +181,7 @@ __osmv_sa_mad_rcv_cb( Exit: /* free the copied query request if found */ - if (p_query_req_copy) cl_free(p_query_req_copy); + if (p_query_req_copy) free(p_query_req_copy); /* put back the request madw */ if (p_req_madw) @@ -227,7 +227,7 @@ __osmv_sa_mad_err_cb( if ((p_query_req_copy->flags & OSM_SA_FLAGS_SYNC) == OSM_SA_FLAGS_SYNC) cl_event_signal( &p_bind->sync_event ); - if (p_query_req_copy) cl_free(p_query_req_copy); + if (p_query_req_copy) 
free(p_query_req_copy); OSM_LOG_EXIT( p_bind->p_log ); } @@ -289,7 +289,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( /* allocate the attributes */ p_attr_array = - (ib_port_attr_t *)cl_malloc(sizeof(ib_port_attr_t)*num_ports); + (ib_port_attr_t *)malloc(sizeof(ib_port_attr_t)*num_ports); /* obtain the attributes */ status = osm_vendor_get_all_port_attr(p_vend, p_attr_array, &num_ports); @@ -300,7 +300,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( "Fail to get port attributes (error: %s)\n", ib_get_err_str(status) ); - cl_free(p_attr_array); + free(p_attr_array); goto Exit; } @@ -321,7 +321,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( } } - cl_free(p_attr_array); + free(p_attr_array); Exit: OSM_LOG_EXIT( p_vend->p_log ); @@ -361,7 +361,7 @@ osmv_bind_sa( /* allocate the new sa bind info */ p_sa_bind_info = - (osmv_sa_bind_info_t *)cl_malloc(sizeof(osmv_sa_bind_info_t)); + (osmv_sa_bind_info_t *)malloc(sizeof(osmv_sa_bind_info_t)); if (! p_sa_bind_info) { osm_log( p_log, OSM_LOG_ERROR, @@ -389,7 +389,7 @@ osmv_bind_sa( if (p_sa_bind_info->h_bind == OSM_BIND_INVALID_HANDLE) { - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; osm_log( p_log, OSM_LOG_ERROR, "osmv_bind_sa: ERR 0506: " @@ -406,7 +406,7 @@ osmv_bind_sa( &p_sa_bind_info->sm_lid); if (status != IB_SUCCESS) { - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; osm_log( p_log, OSM_LOG_ERROR, "osmv_bind_sa: ERR 0507: " @@ -424,7 +424,7 @@ osmv_bind_sa( "cl_init_event failed: %s\n", ib_get_err_str(cl_status) ); - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; } @@ -586,7 +586,7 @@ __osmv_send_sa_req( To store on the MADW we cast it into what opensm has: p_madw->context.arb_context.context1 */ - p_query_req_copy = cl_malloc(sizeof(*p_query_req_copy)); + p_query_req_copy = malloc(sizeof(*p_query_req_copy)); *p_query_req_copy = *p_query_req; p_madw->context.arb_context.context1 = 
p_query_req_copy; Index: osm/libvendor/osm_vendor_mlx_hca_pfs.c =================================================================== --- osm/libvendor/osm_vendor_mlx_hca_pfs.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_hca_pfs.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -43,8 +43,8 @@ #undef IN #undef OUT #include -#include #include +#include #include #include #include @@ -516,14 +516,14 @@ __osm_ca_info_init( IN osm_vendor_t * co /* set size of attributes and allocate them */ p_ca_info->attr_size = 1; - p_ca_info->p_attr = ( ib_ca_attr_t * ) cl_malloc( sizeof( ib_ca_attr_t ) ); + p_ca_info->p_attr = ( ib_ca_attr_t * ) malloc( sizeof( ib_ca_attr_t ) ); p_ca_info->p_attr->ca_guid = p_ca_info->guid; p_ca_info->p_attr->num_ports = pfs_ca_info.num_ports; /* now obtain the attributes of the ports */ p_ca_info->p_attr->p_port_attr = - ( ib_port_attr_t * ) cl_malloc( pfs_ca_info.num_ports * sizeof( ib_port_attr_t ) ); + ( ib_port_attr_t * ) malloc( pfs_ca_info.num_ports * sizeof( ib_port_attr_t ) ); /* get all the ports info */ for( port_num = 1; port_num <= pfs_ca_info.num_ports; port_num++ ) @@ -581,14 +581,14 @@ osm_ca_info_destroy( IN osm_vendor_t * c { if(0 != p_ca->p_attr->num_ports) { - cl_free( p_ca->p_attr->p_port_attr ); + free( p_ca->p_attr->p_port_attr ); } - cl_free( p_ca->p_attr); + free( p_ca->p_attr); } } - cl_free( p_ca_info ); + free( p_ca_info ); OSM_LOG_EXIT( p_vend->p_log ); } @@ -628,7 +628,7 @@ osm_vendor_get_all_port_attr( IN osm_ven } /* Allocate an array big enough to hold the ca info objects*/ - p_ca_infos = cl_zalloc( ca_count * sizeof( osm_ca_info_t ) ); + p_ca_infos = malloc( ca_count * sizeof( osm_ca_info_t ) ); if( p_ca_infos == NULL ) { osm_log( p_vend->p_log, 
OSM_LOG_ERROR, @@ -637,6 +637,8 @@ osm_vendor_get_all_port_attr( IN osm_ven goto Exit; } + memset( p_ca_infos, 0, ca_count * sizeof( osm_ca_info_t ) ); + /* * For each CA, retrieve the CA info attributes */ Index: osm/libvendor/osm_vendor_ibumad_sa.c =================================================================== --- osm/libvendor/osm_vendor_ibumad_sa.c (revision 7470) +++ osm/libvendor/osm_vendor_ibumad_sa.c (working copy) @@ -38,13 +38,12 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include #include -#include - #define MAX_PORTS 64 /***************************************************************************** @@ -180,7 +179,7 @@ __osmv_sa_mad_rcv_cb( Exit: /* free the copied query request if found */ - if (p_query_req_copy) cl_free(p_query_req_copy); + if (p_query_req_copy) free(p_query_req_copy); OSM_LOG_EXIT( p_bind->p_log ); } @@ -221,7 +220,7 @@ __osmv_sa_mad_err_cb( if ((p_query_req_copy->flags & OSM_SA_FLAGS_SYNC) == OSM_SA_FLAGS_SYNC) cl_event_signal( &p_bind->sync_event ); - if (p_query_req_copy) cl_free(p_query_req_copy); + if (p_query_req_copy) free(p_query_req_copy); OSM_LOG_EXIT( p_bind->p_log ); } @@ -282,7 +281,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( /* allocate the attributes */ p_attr_array = - (ib_port_attr_t *)cl_malloc(sizeof(ib_port_attr_t)*num_ports); + (ib_port_attr_t *)malloc(sizeof(ib_port_attr_t)*num_ports); /* obtain the attributes */ status = osm_vendor_get_all_port_attr(p_vend, p_attr_array, &num_ports); @@ -293,7 +292,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( "Failed to get port attributes (error: %s)\n", ib_get_err_str(status) ); - cl_free(p_attr_array); + free(p_attr_array); goto Exit; } @@ -314,7 +313,7 @@ __osmv_get_lid_and_sm_lid_by_port_guid( } } - cl_free(p_attr_array); + free(p_attr_array); Exit: OSM_LOG_EXIT( p_vend->p_log ); @@ -354,7 +353,7 @@ osmv_bind_sa( /* allocate the new sa bind info */ p_sa_bind_info = - (osmv_sa_bind_info_t *)cl_malloc(sizeof(osmv_sa_bind_info_t)); + 
(osmv_sa_bind_info_t *)malloc(sizeof(osmv_sa_bind_info_t)); if (! p_sa_bind_info) { osm_log( p_log, OSM_LOG_ERROR, @@ -382,7 +381,7 @@ osmv_bind_sa( if (p_sa_bind_info->h_bind == OSM_BIND_INVALID_HANDLE) { - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; osm_log( p_log, OSM_LOG_ERROR, "osmv_bind_sa: ERR 5506: " @@ -399,7 +398,7 @@ osmv_bind_sa( &p_sa_bind_info->sm_lid); if (status != IB_SUCCESS) { - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; osm_log( p_log, OSM_LOG_ERROR, "osmv_bind_sa: ERR 5507: " @@ -417,7 +416,7 @@ osmv_bind_sa( "cl_init_event failed: %s\n", ib_get_err_str(cl_status) ); - cl_free(p_sa_bind_info); + free(p_sa_bind_info); p_sa_bind_info = OSM_BIND_INVALID_HANDLE; } @@ -576,7 +575,7 @@ __osmv_send_sa_req( To store on the MADW we cast it into what opensm has: p_madw->context.ni_context.node_guid */ - p_query_req_copy = cl_malloc(sizeof(*p_query_req_copy)); + p_query_req_copy = malloc(sizeof(*p_query_req_copy)); *p_query_req_copy = *p_query_req; p_madw->context.ni_context.node_guid = (ib_net64_t)(long)p_query_req_copy; Index: osm/libvendor/osm_vendor_mlx_txn.c =================================================================== --- osm/libvendor/osm_vendor_mlx_txn.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_txn.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -38,7 +38,7 @@ # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include @@ -86,12 +86,13 @@ osmv_txn_init(IN osm_bind_handle_t osm_log( p_bo->p_vendor->p_log, OSM_LOG_DEBUG, "Starting transaction 0x%llX (key=0x%llX)\n", tid, key); - p_txn = cl_zalloc(sizeof(osmv_txn_ctx_t)); + p_txn = malloc(sizeof(osmv_txn_ctx_t)); if (! 
p_txn) { return IB_INSUFFICIENT_MEMORY; } + memset(p_txn, 0, sizeof(osmv_txn_ctx_t)); p_txn->p_log = p_bo->txn_mgr.p_log; p_txn->tid = tid; p_txn->key = key; @@ -113,7 +114,7 @@ osmv_txn_init(IN osm_bind_handle_t return IB_SUCCESS; insert_txn_failed: - cl_free(p_txn); + free(p_txn); OSM_LOG_EXIT( p_bo->p_vendor->p_log ); return st; @@ -132,13 +133,15 @@ osmv_txn_init_rmpp_sender(IN osm_bind_ha osmv_txn_remove_timeout_ev(h_bind, osmv_txn_get_key(p_txn)); p_txn->rmpp_txfr.rmpp_state = OSMV_TXN_RMPP_SENDER; - p_txn->rmpp_txfr.p_rmpp_send_ctx = cl_zalloc(sizeof(osmv_rmpp_send_ctx_t)); + p_txn->rmpp_txfr.p_rmpp_send_ctx = malloc(sizeof(osmv_rmpp_send_ctx_t)); if (!p_txn->rmpp_txfr.p_rmpp_send_ctx) { return IB_INSUFFICIENT_MEMORY; } + memset(p_txn->rmpp_txfr.p_rmpp_send_ctx, 0, sizeof(osmv_rmpp_send_ctx_t)); + st = osmv_rmpp_send_ctx_init(p_txn->rmpp_txfr.p_rmpp_send_ctx, (void*)p_madw->p_mad, p_madw->mad_size, @@ -171,7 +174,7 @@ osmv_txn_init_rmpp_receiver(IN osm_bind_ p_txn->rmpp_txfr.rmpp_state = OSMV_TXN_RMPP_RECEIVER; p_txn->rmpp_txfr.is_rmpp_init_by_peer = is_init_by_peer; - p_txn->rmpp_txfr.p_rmpp_recv_ctx = cl_zalloc(sizeof(osmv_rmpp_recv_ctx_t)); + p_txn->rmpp_txfr.p_rmpp_recv_ctx = malloc(sizeof(osmv_rmpp_recv_ctx_t)); if (!p_txn->rmpp_txfr.p_rmpp_recv_ctx) { @@ -180,6 +183,8 @@ osmv_txn_init_rmpp_receiver(IN osm_bind_ return IB_INSUFFICIENT_MEMORY; } + memset(p_txn->rmpp_txfr.p_rmpp_recv_ctx, 0, sizeof(osmv_rmpp_recv_ctx_t)); + st = osmv_rmpp_recv_ctx_init(p_txn->rmpp_txfr.p_rmpp_recv_ctx,p_txn->p_log); return st; @@ -271,7 +276,7 @@ osmv_txn_done(IN osm_bind_handle_t h_bin osmv_rmpp_recv_ctx_done(p_ctx->rmpp_txfr.p_rmpp_recv_ctx); } - cl_free(p_ctx); + free(p_ctx); OSM_LOG_EXIT(p_bo->p_vendor->p_log); } @@ -327,12 +332,14 @@ osmv_txnmgr_init(IN osmv_txn_mgr_t *p_tx { cl_status_t cl_st = CL_SUCCESS; - p_tx_mgr->p_event_wheel = cl_zalloc(sizeof(cl_event_wheel_t)); + p_tx_mgr->p_event_wheel = malloc(sizeof(cl_event_wheel_t)); if (!p_tx_mgr->p_event_wheel) { 
return IB_INSUFFICIENT_MEMORY; } + memset(p_tx_mgr->p_event_wheel, 0, sizeof(cl_event_wheel_t)); + cl_event_wheel_construct(p_tx_mgr->p_event_wheel); /* NOTE! We are using an extended constructor. @@ -342,18 +349,20 @@ osmv_txnmgr_init(IN osmv_txn_mgr_t *p_tx cl_st = cl_event_wheel_init_ex(p_tx_mgr->p_event_wheel, p_log, p_lock); if (cl_st != CL_SUCCESS) { - cl_free(p_tx_mgr->p_event_wheel); + free(p_tx_mgr->p_event_wheel); return (ib_api_status_t)cl_st; } - p_tx_mgr->p_txn_map = cl_zalloc(sizeof(cl_qmap_t)); + p_tx_mgr->p_txn_map = malloc(sizeof(cl_qmap_t)); if (!p_tx_mgr->p_txn_map) { cl_event_wheel_destroy(p_tx_mgr->p_event_wheel); - cl_free(p_tx_mgr->p_event_wheel); + free(p_tx_mgr->p_event_wheel); return IB_INSUFFICIENT_MEMORY; } + memset(p_tx_mgr->p_txn_map, 0, sizeof(cl_qmap_t)); + cl_qmap_init(p_tx_mgr->p_txn_map); p_tx_mgr->p_log = p_log; @@ -366,10 +375,10 @@ osmv_txnmgr_done(IN osm_bind_handle_t osmv_bind_obj_t* p_bo = (osmv_bind_obj_t*)h_bind; __osmv_txn_all_done(h_bind); - cl_free(p_bo->txn_mgr.p_txn_map); + free(p_bo->txn_mgr.p_txn_map); cl_event_wheel_destroy(p_bo->txn_mgr.p_event_wheel); - cl_free(p_bo->txn_mgr.p_event_wheel); + free(p_bo->txn_mgr.p_event_wheel); } ib_api_status_t @@ -430,7 +439,7 @@ __osmv_txnmgr_insert_txn(IN osmv_txn_mgr CL_ASSERT(p_txn); key = osmv_txn_get_key(p_txn); - p_obj = cl_zalloc(sizeof(cl_map_obj_t)); + p_obj = malloc(sizeof(cl_map_obj_t)); if (NULL == p_obj) return IB_INSUFFICIENT_MEMORY; @@ -438,6 +447,8 @@ __osmv_txnmgr_insert_txn(IN osmv_txn_mgr "__osmv_txnmgr_insert_txn: " "Inserting key: 0x%llX to map ptr:%p\n", key, p_tx_mgr->p_txn_map ); + memset(p_obj, 0, sizeof(cl_map_obj_t)); + cl_qmap_set_obj(p_obj,p_txn); /* assuming lookup with this key was made and the result was IB_NOT_FOUND */ cl_qmap_insert(p_tx_mgr->p_txn_map, key, &p_obj->item); @@ -484,7 +495,7 @@ __osmv_txnmgr_remove_txn(IN osmv_txn_mg p_obj = PARENT_STRUCT(p_item, cl_map_obj_t,item); *pp_txn = cl_qmap_obj(p_obj); - cl_free(p_obj); + free(p_obj); 
OSM_LOG_EXIT(p_tx_mgr->p_log); return IB_SUCCESS; @@ -506,7 +517,7 @@ __osmv_txn_all_done(osm_bind_handle_t p_obj = PARENT_STRUCT(p_item,cl_map_obj_t,item); p_txn = (osmv_txn_ctx_t*)cl_qmap_obj(p_obj); osmv_txn_done(h_bind, osmv_txn_get_key(p_txn), FALSE); - cl_free(p_obj); + free(p_obj); /* assuming osmv_txn_done has removed the txn from the map */ p_item = cl_qmap_head(p_bo->txn_mgr.p_txn_map); } Index: osm/libvendor/osm_vendor_ibumad.c =================================================================== --- osm/libvendor/osm_vendor_ibumad.c (revision 7470) +++ osm/libvendor/osm_vendor_ibumad.c (working copy) @@ -58,12 +58,12 @@ #ifdef OSM_VENDOR_INTF_OPENIB #include +#include #include #include #include #include -#include #include #include #include @@ -507,7 +507,7 @@ osm_vendor_new( goto Exit; } - p_vend = cl_zalloc( sizeof(*p_vend) ); + p_vend = malloc( sizeof(*p_vend) ); if( p_vend == NULL ) { osm_log( p_log, OSM_LOG_ERROR, @@ -516,8 +516,10 @@ osm_vendor_new( goto Exit; } + memset( p_vend, 0, sizeof(*p_vend) ); + if (osm_vendor_init( p_vend, p_log, timeout ) < 0) { - cl_free( p_vend ); + free( p_vend ); p_vend = NULL; } @@ -550,7 +552,7 @@ osm_vendor_delete( cl_event_destroy( &p_ur->signal ); cl_spinlock_destroy( &(*pp_vend)->cb_lock ); cl_spinlock_destroy( &(*pp_vend)->match_tbl_lock ); - cl_free( *pp_vend ); + free( *pp_vend ); *pp_vend = NULL; } @@ -826,13 +828,14 @@ osm_vendor_bind( goto Exit; } - if (!(p_bind = cl_zalloc( sizeof(*p_bind) ))) { + if (!(p_bind = malloc( sizeof(*p_bind) ))) { osm_log( p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_bind: ERR 5425: " "Unable to allocate internal bind object\n" ); goto Exit; } + memset( p_bind, 0, sizeof(*p_bind) ); p_bind->p_vend = p_vend; p_bind->port_id = umad_port_id; p_bind->client_context = context; @@ -880,7 +883,7 @@ osm_vendor_bind( "osm_vendor_bind: ERR 5426: " "Unable to register class %u version %u\n", p_user_bind->mad_class, p_user_bind->class_version); - cl_free(p_bind); + free(p_bind); p_bind = 0; 
goto Exit; } @@ -892,7 +895,7 @@ osm_vendor_bind( "bad agent id %u or duplicate agent for class %u vers %u\n", p_bind->agent_id, p_user_bind->mad_class, p_user_bind->class_version); - cl_free(p_bind); + free(p_bind); p_bind = 0; goto Exit; } @@ -909,7 +912,7 @@ osm_vendor_bind( "osm_vendor_bind: ERR 5428: " "Unable to register class 1 version %u\n", p_user_bind->class_version); - cl_free(p_bind); + free(p_bind); p_bind = 0; goto Exit; } @@ -920,7 +923,7 @@ osm_vendor_bind( "osm_vendor_bind: ERR 5429: " "bad agent id %u or duplicate agent for class 1 vers %u\n", p_bind->agent_id1, p_user_bind->class_version); - cl_free(p_bind); + free(p_bind); p_bind = 0; goto Exit; } Index: osm/libvendor/osm_vendor_mlx_sar.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sar.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_sar.c (working copy) @@ -38,10 +38,10 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include -#include ib_api_status_t osmv_rmpp_sar_init(osmv_rmpp_sar_t* p_sar, void* p_arbt_mad, @@ -161,21 +161,10 @@ osmv_rmpp_sar_reassemble_arbt_mad(osmv_r p_mad= (char*)p_mad+space_left; } - cl_free(buf_tmp); - cl_free(p_obj); + free(buf_tmp); + free(p_obj); } return IB_SUCCESS; } - - - - - - - - - - - Index: osm/libvendor/osm_vendor_mlx_sim.c =================================================================== --- osm/libvendor/osm_vendor_mlx_sim.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_sim.c (working copy) @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -58,7 +59,6 @@ #include #include -#include /* the simulator messages definition */ #include @@ -149,7 +149,7 @@ osmv_transport_init(IN osm_bind_info_t * IN osmv_bind_obj_t *p_bo) { ibms_conn_handle_t conHdl; /* the connection we talk to the simulator through */ - osmv_ibms_transport_mgr_t* p_mgr = cl_zalloc(sizeof(osmv_ibms_transport_mgr_t)); + osmv_ibms_transport_mgr_t* p_mgr = 
malloc(sizeof(osmv_ibms_transport_mgr_t)); int qpn; int ibms_status; uint64_t port_guid; @@ -159,6 +159,8 @@ osmv_transport_init(IN osm_bind_info_t * return IB_INSUFFICIENT_MEMORY; } + memset(p_mgr, 0, sizeof(osmv_ibms_transport_mgr_t)); + /* create the client socket connected to the simulator */ /* also perform the "connect" message - such that we validate the target guid */ @@ -343,7 +345,7 @@ osmv_transport_done(IN const osm_bind_ha ibms_disconnect(p_tpot_mgr->conHdl); /* seems the only way to abort a blocking read is to make it read something */ - cl_free(p_tpot_mgr); + free(p_tpot_mgr); } static void Index: osm/libvendor/osm_vendor_umadt.c =================================================================== --- osm/libvendor/osm_vendor_umadt.c (revision 7470) +++ osm/libvendor/osm_vendor_umadt.c (working copy) @@ -59,11 +59,11 @@ #ifdef OSM_VENDOR_INTF_UMADT +#include #include #include #include -#include #include #include #include @@ -153,9 +153,11 @@ osm_vendor_new( OSM_LOG_ENTER( p_log, osm_vendor_new ); - p_umadt_obj = cl_zalloc(sizeof(umadt_obj_t)); + p_umadt_obj = malloc(sizeof(umadt_obj_t)); if( p_umadt_obj ) { + memset( p_umadt_obj, 0, sizeof(umadt_obj_t) ); + status = osm_vendor_init( (osm_vendor_t*)p_umadt_obj, p_log, timeout ); if( status != IB_SUCCESS ) @@ -201,7 +203,7 @@ osm_vendor_delete( p_mad_bind_info = (mad_bind_info_t*)p_list_item; } dlclose(p_umadt_obj->umadt_handle); - cl_free(p_umadt_obj); + free(p_umadt_obj); *pp_vend = NULL; OSM_LOG_EXIT( p_umadt_obj->p_log ); @@ -354,8 +356,8 @@ osm_vendor_get_ports( } pPortAttributesList = - ( IB_PORT_ATTRIBUTES * ) cl_zalloc ( caAttributes. - PortAttributesListSize ); + ( IB_PORT_ATTRIBUTES * ) malloc ( caAttributes. 
+ PortAttributesListSize ); if ( pPortAttributesList == NULL ) { @@ -389,7 +391,7 @@ osm_vendor_get_ports( pPortAttributesList = pPortAttributesList->Next; p_port_guid++; } - cl_free (caAttributes.PortAttributesList); + free (caAttributes.PortAttributesList); p_umadt_obj->IbtInterface.Vpi.CloseCA ( caHandle ); free_guids = free_guids - caAttributes.Ports ; @@ -441,12 +443,15 @@ osm_vendor_get( p_vend_wrap->direction = SEND; return( (ib_mad_t*)&p_madt_struct->IBMad ); #endif /* 0 */ - p_mad = (ib_mad_t*)cl_zalloc( mad_size ); - if ( !p_mad) + p_mad = (ib_mad_t*)malloc( mad_size ); + if ( !p_mad ) { p_vend_wrap->p_madt_struct = NULL; return NULL; } + + memset(p_mad, 0, mad_size); + p_vend_wrap->p_madt_struct = NULL; p_vend_wrap->direction = SEND; p_vend_wrap->size =mad_size; @@ -489,7 +494,7 @@ osm_vendor_put( /* For a send the PostSend released the MAD with Umadt. Simply dealloacte the */ /* local memory that was allocated on the osm_vendor_get() call. */ /* */ - cl_free(p_mad); + free(p_mad); #if 0 Status = p_umadt_obj->uMadtInterface.uMadtReleaseSendMad(p_mad_bind_info->umadt_handle, p_vend_wrap->p_madt_struct); @@ -584,14 +589,15 @@ osm_vendor_send( p_mad->trans_id = cl_ntoh64(p_mad->trans_id)<<24; /* */ - /* Creat a transaction context for this send and save the TID and client context. */ + /* Create a transaction context for this send and save the TID and client context. */ /* */ if ( resp_expected ) { - p_trans_context = cl_zalloc(sizeof(trans_context_t)); + p_trans_context = malloc(sizeof(trans_context_t)); CL_ASSERT(p_trans_context); + memset(p_trans_context, 0, sizeof(trans_context_t)); p_trans_context->trans_id = p_mad->trans_id; p_trans_context->context = transaction_context; p_trans_context->sent_time = cl_get_time_stamp(); @@ -774,9 +780,12 @@ osm_vendor_bind( CL_ASSERT( mad_recv_callback ); /* Allocate memory for registering the handle. 
*/ - p_mad_bind_info = (mad_bind_info_t*)cl_zalloc(sizeof(*p_mad_bind_info)); - - p_umadt_reg_class = &p_mad_bind_info->umadt_reg_class ; + p_mad_bind_info = (mad_bind_info_t*)malloc(sizeof(*p_mad_bind_info)); + if (p_mad_bind_info) + { + memset(p_mad_bind_info, 0, sizeof(*p_mad_bind_info)); + p_umadt_reg_class = &p_mad_bind_info->umadt_reg_class; + } p_umadt_reg_class->PortGuid = cl_ntoh64( p_osm_bind_info->port_guid ); p_umadt_reg_class->ClassId = p_osm_bind_info->mad_class; p_umadt_reg_class->ClassVersion = p_osm_bind_info->class_version; @@ -797,7 +806,7 @@ osm_vendor_bind( &p_mad_bind_info->umadt_handle); if (Status != FSUCCESS) { - cl_free(p_mad_bind_info); + free(p_mad_bind_info); OSM_LOG_EXIT( p_umadt_obj->p_log ); return( OSM_BIND_INVALID_HANDLE ); } @@ -875,7 +884,7 @@ osm_vendor_unbind(IN osm_bind_handle_t h p_next_list_item = cl_qlist_next(p_list_item); cl_qlist_remove_item(&p_mad_bind_info->trans_ctxt_list, p_list_item); - cl_free(p_list_item); + free(p_list_item); p_list_item = p_next_list_item; } cl_spinlock_release(&p_mad_bind_info->trans_ctxt_lock); @@ -887,12 +896,12 @@ osm_vendor_unbind(IN osm_bind_handle_t h p_next_list_item = cl_qlist_next(p_list_item); cl_qlist_remove_item(&p_mad_bind_info->timeout_list, p_list_item); - cl_free(p_list_item); + free(p_list_item); p_list_item = p_next_list_item; } cl_spinlock_release(&p_mad_bind_info->timeout_list_lock); - cl_free(p_mad_bind_info); + free(p_mad_bind_info); } /********************************************************************** @@ -1025,7 +1034,7 @@ __mad_recv_processor( transaction_context =((trans_context_t*)p_list_item)->context; cl_qlist_remove_item(&p_mad_bind_info->trans_ctxt_list, p_list_item); - cl_free(p_list_item); + free(p_list_item); } cl_spinlock_release(&p_mad_bind_info->trans_ctxt_lock); ((ib_mad_t*)p_osm_madw->p_mad)->trans_id = cl_ntoh64(p_osm_madw->p_mad->trans_id>>24); @@ -1137,7 +1146,7 @@ __osm_vendor_timer_callback( p_next_list_item = cl_qlist_next(p_list_item); 
cl_qlist_remove_item(&p_mad_bind_info->timeout_list, p_list_item); - cl_free(p_list_item); + free(p_list_item); p_list_item = p_next_list_item; } Index: osm/libvendor/osm_vendor_mlx_rmpp_ctx.c =================================================================== --- osm/libvendor/osm_vendor_mlx_rmpp_ctx.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_rmpp_ctx.c (working copy) @@ -38,10 +38,10 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include -#include #include #include @@ -91,7 +91,7 @@ osmv_rmpp_send_ctx_done(IN osmv_rmpp_sen CL_ASSERT(p_ctx); cl_event_destroy(&p_ctx->event); osmv_rmpp_sar_done(&p_ctx->sar); - cl_free(p_ctx); + free(p_ctx); } @@ -211,9 +211,10 @@ osmv_rmpp_recv_ctx_init(osmv_rmpp_recv_c p_ctx->is_sa_mad = FALSE; - p_ctx->p_rbuf = cl_zalloc(sizeof(cl_qlist_t)); + p_ctx->p_rbuf = malloc(sizeof(cl_qlist_t)); if (p_ctx->p_rbuf) { + memset(p_ctx->p_rbuf, 0, sizeof(cl_qlist_t)); cl_qlist_init(p_ctx->p_rbuf); p_ctx->expected_seg = 1; } else st= IB_INSUFFICIENT_MEMORY; @@ -237,16 +238,16 @@ osmv_rmpp_recv_ctx_done(IN osmv_rmpp_rec p_obj = PARENT_STRUCT(p_list_item,cl_list_obj_t,list_item); - cl_free(cl_qlist_obj(p_obj)); - cl_free(p_obj); + free(cl_qlist_obj(p_obj)); + free(p_obj); p_list_item = cl_qlist_remove_head(p_ctx->p_rbuf); } osmv_rmpp_sar_done(&p_ctx->sar); - cl_free(p_ctx->p_rbuf); - cl_free(p_ctx); + free(p_ctx->p_rbuf); + free(p_ctx); } @@ -260,20 +261,22 @@ osmv_rmpp_recv_ctx_store_mad_seg(IN osmv OSM_LOG_ENTER(p_recv_ctx->p_log, osmv_rmpp_recv_ctx_store_mad_seg); CL_ASSERT(p_recv_ctx); - p_list_mad= cl_zalloc(MAD_BLOCK_SIZE); + p_list_mad= malloc(MAD_BLOCK_SIZE); if (NULL == p_list_mad) { return IB_INSUFFICIENT_MEMORY; } - memcpy(p_list_mad,p_mad,MAD_BLOCK_SIZE); + memset(p_list_mad, 0, MAD_BLOCK_SIZE); + memcpy(p_list_mad, p_mad, MAD_BLOCK_SIZE); - p_obj = cl_zalloc(sizeof(cl_list_obj_t)); + p_obj = malloc(sizeof(cl_list_obj_t)); if (NULL == p_obj) { - cl_free(p_list_mad); + free(p_list_mad); return 
IB_INSUFFICIENT_MEMORY; } - cl_qlist_set_obj(p_obj,p_list_mad); + memset(p_obj, 0, sizeof(cl_list_obj_t)); + cl_qlist_set_obj(p_obj, p_list_mad); cl_qlist_insert_tail(p_recv_ctx->p_rbuf,&p_obj->list_item); Index: osm/libvendor/osm_vendor_mtl_hca_guid.c =================================================================== --- osm/libvendor/osm_vendor_mtl_hca_guid.c (revision 7470) +++ osm/libvendor/osm_vendor_mtl_hca_guid.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -42,10 +42,10 @@ #if defined(OSM_VENDOR_INTF_MTL) | defined(OSM_VENDOR_INTF_TS) #undef IN #undef OUT +#include #include #include #include -#include #include #include @@ -148,8 +148,8 @@ __osm_vendor_get_ca_ids( IN osm_vendor_t /* allocate and really call - user of this function needs to deallocate it */ *p_hca_ids = - ( VAPI_hca_id_t * ) cl_malloc( *p_num_guids * - sizeof( VAPI_hca_id_t ) ); + ( VAPI_hca_id_t * ) malloc( *p_num_guids * + sizeof( VAPI_hca_id_t ) ); /* now call it really */ vapi_res = EVAPI_list_hcas( *p_num_guids, p_num_guids, *p_hca_ids ); @@ -239,15 +239,15 @@ __osm_ca_info_init( IN osm_vendor_t * co memcpy( &( p_ca_info->guid ), hca_cap.node_guid, 8 * sizeof( u_int8_t ) ); p_ca_info->attr_size = 1; p_ca_info->p_attr = - ( ib_ca_attr_t * ) cl_malloc( sizeof( ib_ca_attr_t ) ); + ( ib_ca_attr_t * ) malloc( sizeof( ib_ca_attr_t ) ); memcpy( &( p_ca_info->p_attr->ca_guid ), hca_cap.node_guid, 8 * sizeof( u_int8_t ) ); /* now obtain the attributes of the ports */ p_ca_info->p_attr->num_ports = hca_cap.phys_port_num; p_ca_info->p_attr->p_port_attr = - ( ib_port_attr_t * ) cl_malloc( hca_cap.phys_port_num * - sizeof( ib_port_attr_t ) ); + ( ib_port_attr_t * ) malloc( hca_cap.phys_port_num * + sizeof( ib_port_attr_t ) ); for( port_num = 
0; port_num < p_ca_info->p_attr->num_ports; port_num++ ) { @@ -268,7 +268,7 @@ __osm_ca_info_init( IN osm_vendor_t * co VAPI_query_hca_gid_tbl( hca_hndl, port_num + 1, 0, &maxNumGids, NULL ); p_port_gid = - ( IB_gid_t * ) cl_malloc( maxNumGids * sizeof( IB_gid_t ) ); + ( IB_gid_t * ) malloc( maxNumGids * sizeof( IB_gid_t ) ); vapi_res = VAPI_query_hca_gid_tbl( hca_hndl, port_num + 1, maxNumGids, @@ -288,7 +288,7 @@ __osm_ca_info_init( IN osm_vendor_t * co p_ca_info->p_attr->p_port_attr[port_num].link_state = hca_port.state; p_ca_info->p_attr->p_port_attr[port_num].sm_lid = hca_port.sm_lid; - cl_free( p_port_gid ); + free( p_port_gid ); } status = IB_SUCCESS; @@ -309,12 +309,12 @@ osm_ca_info_destroy( IN osm_vendor_t * c { if( p_ca_info->p_attr->num_ports ) { - cl_free( p_ca_info->p_attr->p_port_attr ); + free( p_ca_info->p_attr->p_port_attr ); } - cl_free( p_ca_info->p_attr ); + free( p_ca_info->p_attr ); } - cl_free( p_ca_info ); + free( p_ca_info ); OSM_LOG_EXIT( p_vend->p_log ); } @@ -359,7 +359,7 @@ osm_vendor_get_all_port_attr( IN osm_ven } /* we keep track of all the CAs in this info array */ - p_vend->p_ca_info = cl_zalloc( ca_count * sizeof( *p_vend->p_ca_info ) ); + p_vend->p_ca_info = malloc( ca_count * sizeof( *p_vend->p_ca_info ) ); if( p_vend->p_ca_info == NULL ) { osm_log( p_vend->p_log, OSM_LOG_ERROR, @@ -368,6 +368,7 @@ osm_vendor_get_all_port_attr( IN osm_ven goto Exit; } + memset( p_vend->p_ca_info, 0, ca_count * sizeof( *p_vend->p_ca_info ) ); p_vend->ca_count = ca_count; /* @@ -433,7 +434,7 @@ osm_vendor_get_all_port_attr( IN osm_ven *p_num_ports = total_ports; if( p_ca_ids ) - cl_free( p_ca_ids ); + free( p_ca_ids ); OSM_LOG_EXIT( p_vend->p_log ); return ( status ); @@ -521,7 +522,7 @@ osm_vendor_get_guid_ca_and_port( IN osm_ VAPI_query_hca_gid_tbl( hca_hndl, portIdx + 1, 0, &maxNumGids, NULL ); p_port_gid = - ( IB_gid_t * ) cl_malloc( maxNumGids * sizeof( IB_gid_t ) ); + ( IB_gid_t * ) malloc( maxNumGids * sizeof( IB_gid_t ) ); /* get the port 
guid */ vapi_res = @@ -549,7 +550,7 @@ osm_vendor_get_guid_ca_and_port( IN osm_ goto Exit; } - cl_free( p_port_gid ); + free( p_port_gid ); p_port_gid = NULL; } /* ALL PORTS */ } /* all HCAs */ @@ -562,9 +563,9 @@ osm_vendor_get_guid_ca_and_port( IN osm_ Exit: if( p_ca_ids != NULL ) - cl_free( p_ca_ids ); + free( p_ca_ids ); if( p_port_gid != NULL ) - cl_free( p_port_gid ); + free( p_port_gid ); OSM_LOG_EXIT( p_vend->p_log ); return ( status ); } Index: osm/libvendor/osm_vendor_test.c =================================================================== --- osm/libvendor/osm_vendor_test.c (revision 7470) +++ osm/libvendor/osm_vendor_test.c (working copy) @@ -56,8 +56,8 @@ #ifdef OSM_VENDOR_INTF_TEST +#include #include -#include #include #include #include @@ -89,7 +89,7 @@ osm_vendor_delete( CL_ASSERT( pp_vend ); osm_vendor_destroy( *pp_vend ); - cl_free( *pp_vend ); + free( *pp_vend ); *pp_vend = NULL; } @@ -125,9 +125,11 @@ osm_vendor_new( CL_ASSERT( p_log ); - p_vend = cl_zalloc( sizeof(*p_vend) ); + p_vend = malloc( sizeof(*p_vend) ); if( p_vend != NULL ) { + memset( p_vend, 0, sizeof(*p_vend) ); + status = osm_vendor_init( p_vend, p_log, timeout ); if( status != IB_SUCCESS ) { @@ -158,12 +160,15 @@ osm_vendor_get( /* Simply malloc the MAD off the heap. */ - p_mad = (ib_mad_t*)cl_zalloc( size ); + p_mad = (ib_mad_t*)malloc( size ); osm_log( p_vend->p_log, OSM_LOG_VERBOSE, "osm_vendor_get: " "MAD %p.\n", p_mad ); + if (p_mad) + memset( p_mad, 0, size ); + OSM_LOG_EXIT( p_vend->p_log ); return( p_mad ); } @@ -191,7 +196,7 @@ osm_vendor_put( /* Return the MAD to the heap. 
*/ - cl_free( p_mad ); + free( p_mad ); OSM_LOG_EXIT( p_vend->p_log ); } @@ -249,9 +254,10 @@ osm_vendor_bind( UNUSED_PARAM( mad_recv_callback ); UNUSED_PARAM( context ); - h_bind = (osm_bind_handle_t)cl_zalloc(sizeof(*h_bind) ); + h_bind = (osm_bind_handle_t)malloc(sizeof(*h_bind) ); if( h_bind != NULL ) { + memset(h_bind, 0, sizeof(*h_bind)); h_bind->p_vend = p_vend; h_bind->port_guid = p_bind_info->port_guid; h_bind->mad_class = p_bind_info->mad_class; Index: osm/libvendor/osm_vendor_mlx_ibmgt.c =================================================================== --- osm/libvendor/osm_vendor_mlx_ibmgt.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_ibmgt.c (working copy) @@ -46,9 +46,9 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include -#include #include #include #include @@ -127,11 +127,12 @@ osmv_transport_init(IN osm_bind_info_t * osm_log(p_bo->p_vendor->p_log, OSM_LOG_DEBUG, "osmv_transport_init: first bind() for the vendor\n"); p_bo->p_vendor->p_transport_info - = (osmv_IBMGT_transport_info_t*) cl_zalloc(sizeof(osmv_IBMGT_transport_info_t)); + = (osmv_IBMGT_transport_info_t*) malloc(sizeof(osmv_IBMGT_transport_info_t)); if (NULL == p_bo->p_vendor->p_transport_info) { return IB_INSUFFICIENT_MEMORY; } + memset(p_bo->p_vendor->p_transport_info, 0, sizeof(osmv_IBMGT_transport_info_t)); p_tpot_info = (osmv_IBMGT_transport_info_t*)(p_bo->p_vendor->p_transport_info); p_tpot_info->smi_h = 0xffffffff; @@ -155,17 +156,19 @@ osmv_transport_init(IN osm_bind_info_t * /* allocate transport mgr */ - p_mgr = cl_zalloc(sizeof(osmv_IBMGT_transport_mgr_t)); + p_mgr = malloc(sizeof(osmv_IBMGT_transport_mgr_t)); if (NULL == p_mgr) { - cl_free(p_tpot_info); + free(p_tpot_info); osm_log(p_bo->p_vendor->p_log, OSM_LOG_ERROR, "osmv_transport_init: ERR 7201: " "alloc failed \n"); return IB_INSUFFICIENT_MEMORY; } - p_bo->p_transp_mgr = p_mgr ; + memset(p_mgr, 0, sizeof(osmv_IBMGT_transport_mgr_t)); + + p_bo->p_transp_mgr = p_mgr; switch ( p_info->mad_class ) { case 
IB_MCLASS_SUBN_LID: @@ -198,7 +201,7 @@ osmv_transport_init(IN osm_bind_info_t * "osmv_transport_init: ERR 7202: " "IB_MGT_get_handle for smi failed \n"); st = IB_ERROR; - cl_free(p_mgr); + free(p_mgr); goto Exit; } @@ -213,12 +216,12 @@ osmv_transport_init(IN osm_bind_info_t * "osmv_transport_init: ERR 7203: " "IB_MGT_bind_sm failed \n"); st = IB_ERROR; - cl_free( p_mgr); + free( p_mgr); goto Exit; } /* init smi list */ - p_tpot_info->p_smi_list = cl_zalloc(sizeof(cl_qlist_t)); + p_tpot_info->p_smi_list = malloc(sizeof(cl_qlist_t)); if (NULL == p_tpot_info->p_smi_list) { osm_log(p_bo->p_vendor->p_log, OSM_LOG_ERROR, @@ -226,9 +229,10 @@ osmv_transport_init(IN osm_bind_info_t * "alloc failed \n"); IB_MGT_unbind_sm(p_tpot_info->smi_h); IB_MGT_release_handle(p_tpot_info->smi_h); - cl_free(p_mgr); + free(p_mgr); return IB_INSUFFICIENT_MEMORY; } + memset(p_tpot_info->p_smi_list, 0, sizeof(cl_qlist_t)); cl_qlist_init(p_tpot_info->p_smi_list); osm_log(p_bo->p_vendor->p_log, OSM_LOG_DEBUG, @@ -248,17 +252,19 @@ osmv_transport_init(IN osm_bind_info_t * ); IB_MGT_unbind_sm(p_tpot_info->smi_h); IB_MGT_release_handle(p_tpot_info->smi_h); - cl_free(p_tpot_info->p_smi_list); - cl_free(p_mgr); + free(p_tpot_info->p_smi_list); + free(p_mgr); st= IB_ERROR; goto Exit; } } /* insert to list of smi's - for raising callbacks later on */ - p_obj = cl_zalloc(sizeof(cl_list_obj_t)); - cl_qlist_set_obj(p_obj,p_bo); - cl_qlist_insert_tail(p_tpot_info->p_smi_list,&p_obj->list_item); + p_obj = malloc(sizeof(cl_list_obj_t)); + if (p_obj) + memset(p_obj, 0, sizeof(cl_list_obj_t)); + cl_qlist_set_obj(p_obj, p_bo); + cl_qlist_insert_tail(p_tpot_info->p_smi_list, &p_obj->list_item); break; @@ -278,7 +284,7 @@ osmv_transport_init(IN osm_bind_info_t * "osmv_transport_init: ERR 7207: " "IB_MGT_get_handle for gsi failed \n"); st = IB_ERROR; - cl_free(p_mgr); + free(p_mgr); goto Exit; } } @@ -293,21 +299,24 @@ osmv_transport_init(IN osm_bind_info_t * "osmv_transport_init: ERR 7208: " 
"IB_MGT_bind_gsi_class failed \n"); st = IB_ERROR; - cl_free( p_mgr); + free( p_mgr); goto Exit; } - p_tpot_info->gsi_mgmt_lists[p_info->mad_class] = cl_zalloc(sizeof(cl_qlist_t)); + p_tpot_info->gsi_mgmt_lists[p_info->mad_class] = malloc(sizeof(cl_qlist_t)); if (NULL == p_tpot_info->gsi_mgmt_lists[p_info->mad_class]) { IB_MGT_unbind_gsi_class(p_tpot_info->gsi_h,p_info->mad_class); - cl_free(p_mgr); + free(p_mgr); return IB_INSUFFICIENT_MEMORY; } + memset(p_tpot_info->gsi_mgmt_lists[p_info->mad_class], 0, sizeof(cl_qlist_t)); cl_qlist_init(p_tpot_info->gsi_mgmt_lists[p_info->mad_class]); } /* insert to list of smi's - for raising callbacks later on */ - p_obj = cl_zalloc(sizeof(cl_list_obj_t)); + p_obj = malloc(sizeof(cl_list_obj_t)); + if (p_obj) + memset(p_obj, 0, sizeof(cl_list_obj_t)); cl_qlist_set_obj(p_obj,p_bo); cl_qlist_insert_tail(p_tpot_info->gsi_mgmt_lists[p_info->mad_class],&p_obj->list_item); @@ -322,8 +331,8 @@ osmv_transport_init(IN osm_bind_info_t * if (ret != IB_SUCCESS) { IB_MGT_unbind_gsi_class(p_tpot_info->gsi_h,p_mgr->mgmt_class); - cl_free(p_tpot_info->gsi_mgmt_lists[p_mgr->mgmt_class]); - cl_free(p_mgr); + free(p_tpot_info->gsi_mgmt_lists[p_mgr->mgmt_class]); + free(p_mgr); st= IB_ERROR; goto Exit; } @@ -334,7 +343,7 @@ osmv_transport_init(IN osm_bind_info_t * osm_log(p_log, OSM_LOG_ERROR, "osmv_transport_init: ERR 7209: unrecognized mgmt class \n" ); st = IB_ERROR; - cl_free( p_mgr); + free( p_mgr); goto Exit; } @@ -523,12 +532,12 @@ osmv_transport_done(IN const osm_bind_ha CL_ASSERT(p_item != cl_qlist_end(p_list)); cl_qlist_remove_item(p_list,p_item); - if (p_obj) cl_free(p_obj); + if (p_obj) free(p_obj); /* no one is binded to smi anymore - we can free the list, unbind & realease the hndl*/ if (cl_is_qlist_empty(p_list) == TRUE) { - cl_free(p_list); + free(p_list); p_list = NULL; ret = IB_MGT_unbind_sm(p_tpot_info->smi_h); @@ -566,12 +575,12 @@ osmv_transport_done(IN const osm_bind_ha CL_ASSERT(p_item != cl_qlist_end(p_list)); 
cl_qlist_remove_item(p_list,p_item); - if (p_obj) cl_free(p_obj); + if (p_obj) free(p_obj); /* no one is binded to this class anymore - we can free the list and unbind this class*/ if (cl_is_qlist_empty(p_list) == TRUE) { - cl_free(p_list); + free(p_list); p_list = NULL; ret = IB_MGT_unbind_gsi_class(p_tpot_info->gsi_h,p_mgr->mgmt_class); @@ -604,7 +613,7 @@ osmv_transport_done(IN const osm_bind_ha } }/* end switch */ - cl_free(p_mgr); + free(p_mgr); } Index: osm/libvendor/osm_vendor_mlx_hca_sim.c =================================================================== --- osm/libvendor/osm_vendor_mlx_hca_sim.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_hca_sim.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -44,8 +44,8 @@ #undef OUT #include -#include #include +#include #include #include #include @@ -560,14 +560,14 @@ __osm_ca_info_init( IN osm_vendor_t * co /* set size of attributes and allocate them */ p_ca_info->attr_size = 1; - p_ca_info->p_attr = ( ib_ca_attr_t * ) cl_malloc( sizeof( ib_ca_attr_t ) ); + p_ca_info->p_attr = ( ib_ca_attr_t * ) malloc( sizeof( ib_ca_attr_t ) ); p_ca_info->p_attr->ca_guid = p_ca_info->guid; p_ca_info->p_attr->num_ports = sim_ca_info.num_ports; /* now obtain the attributes of the ports */ p_ca_info->p_attr->p_port_attr = - ( ib_port_attr_t * ) cl_malloc( sim_ca_info.num_ports * sizeof( ib_port_attr_t ) ); + ( ib_port_attr_t * ) malloc( sim_ca_info.num_ports * sizeof( ib_port_attr_t ) ); /* get all the ports info */ for( port_num = 1; port_num <= sim_ca_info.num_ports; port_num++ ) @@ -625,14 +625,14 @@ osm_ca_info_destroy( IN osm_vendor_t * c { if(0 != p_ca->p_attr->num_ports) { - cl_free( p_ca->p_attr->p_port_attr ); + free( p_ca->p_attr->p_port_attr ); } - cl_free( 
p_ca->p_attr); + free( p_ca->p_attr); } } - cl_free( p_ca_info ); + free( p_ca_info ); OSM_LOG_EXIT( p_vend->p_log ); } @@ -671,7 +671,7 @@ osm_vendor_get_all_port_attr( IN osm_ven } /* Allocate an array big enough to hold the ca info objects*/ - p_ca_infos = cl_zalloc( ca_count * sizeof( osm_ca_info_t ) ); + p_ca_infos = malloc( ca_count * sizeof( osm_ca_info_t ) ); if( p_ca_infos == NULL ) { osm_log( p_vend->p_log, OSM_LOG_ERROR, @@ -680,6 +680,8 @@ osm_vendor_get_all_port_attr( IN osm_ven goto Exit; } + memset( p_ca_infos, 0, ca_count * sizeof( osm_ca_info_t ) ); + /* * For each CA, retrieve the CA info attributes */ Index: osm/libvendor/osm_vendor_ts.c =================================================================== --- osm/libvendor/osm_vendor_ts.c (revision 7470) +++ osm/libvendor/osm_vendor_ts.c (working copy) @@ -40,10 +40,10 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include -#include #include #include @@ -227,7 +227,7 @@ osm_vendor_delete( IN osm_vendor_t ** co CL_ASSERT( pp_vend ); osm_vendor_destroy( *pp_vend ); - cl_free( *pp_vend ); + free( *pp_vend ); *pp_vend = NULL; } @@ -272,9 +272,11 @@ osm_vendor_new( CL_ASSERT( p_log ); - p_vend = cl_zalloc( sizeof( *p_vend ) ); + p_vend = malloc( sizeof( *p_vend ) ); if( p_vend != NULL ) { + memset( p_vend, 0, sizeof( *p_vend ) ); + status = osm_vendor_init( p_vend, p_log, timeout ); if( status != IB_SUCCESS ) { @@ -717,7 +719,7 @@ osm_vendor_get( IN osm_bind_handle_t h_b p_vw->size = mad_size; /* allocate it */ - p_mad = ( ib_mad_t * ) cl_zalloc( p_vw->size ); + p_mad = ( ib_mad_t * ) malloc( p_vw->size ); if( p_mad == NULL ) { osm_log( p_vend->p_log, OSM_LOG_ERROR, @@ -774,7 +776,7 @@ osm_vendor_put( */ /* free the mad but the wrapper is part of the madw object */ - cl_free( p_vw->p_mad_buf ); + free( p_vw->p_mad_buf ); p_vw->p_mad_buf = NULL; p_madw = PARENT_STRUCT( p_vw, osm_madw_t, vend_wrap); p_madw->p_mad = NULL; Index: osm/libvendor/osm_vendor_mlx_anafa.c 
=================================================================== --- osm/libvendor/osm_vendor_mlx_anafa.c (revision 7470) +++ osm/libvendor/osm_vendor_mlx_anafa.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. * @@ -40,6 +40,7 @@ #endif /* HAVE_CONFIG_H */ #include +#include #include #include #include @@ -55,7 +56,6 @@ #include #include -#include #include /** @@ -86,8 +86,9 @@ osm_vendor_new (IN osm_log_t * const p_l CL_ASSERT (p_log); - p_vend = cl_zalloc (sizeof (*p_vend)); + p_vend = malloc (sizeof (*p_vend)); if (p_vend != NULL) { + memset(p_vend, 0, sizeof (*p_vend)); status = osm_vendor_init (p_vend, p_log, timeout); if (status != IB_SUCCESS) { osm_vendor_delete (&p_vend); @@ -134,14 +135,14 @@ osm_vendor_delete (IN osm_vendor_t ** co __osm_vendor_internal_unbind (bind_h); - cl_free (p_obj); - /*removing from list */ + free (p_obj); + /* removing from list */ p_item = cl_qlist_remove_head (&((*pp_vend)->bind_handles)); } } if (NULL != ((*pp_vend)->p_transport_info)) { - cl_free ((*pp_vend)->p_transport_info); + free ((*pp_vend)->p_transport_info); (*pp_vend)->p_transport_info = NULL; } @@ -150,7 +151,7 @@ osm_vendor_delete (IN osm_vendor_t ** co osm_pkt_randomizer_destroy (&((*pp_vend)->p_pkt_randomizer), p_log); - cl_free (*pp_vend); + free (*pp_vend); *pp_vend = NULL; OSM_LOG_EXIT (p_log); @@ -177,11 +178,13 @@ osm_vendor_init (IN osm_vendor_t * const p_vend->ttime_timeout = timeout * OSMV_TXN_TIMEOUT_FACTOR; p_vend->p_transport_info = (osmv_TOPSPIN_ANAFA_transport_info_t *) - cl_zalloc (sizeof (osmv_TOPSPIN_ANAFA_transport_info_t)); + malloc (sizeof (osmv_TOPSPIN_ANAFA_transport_info_t)); if (!p_vend->p_transport_info) { return IB_ERROR; } + memset(p_vend->p_transport_info, 0, sizeof 
(osmv_TOPSPIN_ANAFA_transport_info_t)); + /* update the run_randomizer flag */ if (getenv ("OSM_PKT_DROP_RATE") != NULL && atol (getenv ("OSM_PKT_DROP_RATE")) != 0) { @@ -247,7 +250,7 @@ osm_vendor_bind (IN osm_vendor_t * const return OSM_BIND_INVALID_HANDLE; } - p_bo = cl_zalloc (sizeof (osmv_bind_obj_t)); + p_bo = malloc (sizeof (osmv_bind_obj_t)); if (NULL == p_bo) { osm_log (p_bo->p_vendor->p_log, OSM_LOG_ERROR, "osm_vendor_bind: ERR 7403: " @@ -255,6 +258,7 @@ osm_vendor_bind (IN osm_vendor_t * const return OSM_BIND_INVALID_HANDLE; } + memset (p_bo, 0, sizeof (osmv_bind_obj_t)); p_bo->p_vendor = p_vend; p_bo->recv_cb = mad_recv_callback; p_bo->send_err_cb = send_err_callback; @@ -276,7 +280,7 @@ osm_vendor_bind (IN osm_vendor_t * const osm_log (p_bo->p_vendor->p_log, OSM_LOG_ERROR, "osm_vendor_bind: ERR 7405: " "could not initialize the spinlock ...\n"); - cl_free (p_bo); + free (p_bo); return OSM_BIND_INVALID_HANDLE; } @@ -288,7 +292,7 @@ osm_vendor_bind (IN osm_vendor_t * const "osm_vendor_bind: ERR 7406: " "osmv_txnmgr_init failed \n"); cl_spinlock_destroy (&p_bo->lock); - cl_free (p_bo); + free (p_bo); return OSM_BIND_INVALID_HANDLE; } @@ -300,12 +304,12 @@ osm_vendor_bind (IN osm_vendor_t * const "osmv_transport_init failed \n"); osmv_txnmgr_done ((osm_bind_handle_t) p_bo); cl_spinlock_destroy (&p_bo->lock); - cl_free (p_bo); + free (p_bo); return OSM_BIND_INVALID_HANDLE; } /* insert bind handle into db */ - p_obj = cl_zalloc (sizeof (cl_list_obj_t)); + p_obj = malloc (sizeof (cl_list_obj_t)); if (NULL == p_obj) { osm_log (p_bo->p_vendor->p_log, OSM_LOG_ERROR, @@ -315,9 +319,11 @@ osm_vendor_bind (IN osm_vendor_t * const osmv_transport_done (p_bo->p_transp_mgr); osmv_txnmgr_done ((osm_bind_handle_t) p_bo); cl_spinlock_destroy (&p_bo->lock); - cl_free (p_bo); + free (p_bo); return OSM_BIND_INVALID_HANDLE; } + if (p_obj) + memset (p_obj, 0, sizeof (cl_list_obj_t)); cl_qlist_set_obj (p_obj, p_bo); cl_qlist_insert_head (&p_vend->bind_handles, 
&p_obj->list_item); @@ -357,7 +363,7 @@ osm_vendor_unbind (IN osm_bind_handle_t CL_ASSERT (p_item != cl_qlist_end (p_bh_list)); cl_qlist_remove_item (p_bh_list, p_item); - cl_free (p_obj); + free (p_obj); __osm_vendor_internal_unbind (h_bind); @@ -391,7 +397,7 @@ osm_vendor_get (IN osm_bind_handle_t h_b } /* allocate it */ - p_mad = (ib_mad_t *) cl_zalloc (act_mad_size); + p_mad = (ib_mad_t *) malloc (act_mad_size); if (p_mad == NULL) { osm_log (p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_get: ERR 7409: " @@ -399,6 +405,8 @@ osm_vendor_get (IN osm_bind_handle_t h_b goto Exit; } + memset (p_mad, 0, act_mad_size); + if (osm_log_get_level (p_vend->p_log) >= OSM_LOG_DEBUG) { osm_log (p_vend->p_log, OSM_LOG_DEBUG, "osm_vendor_get: " @@ -547,7 +555,7 @@ osm_vendor_put (IN osm_bind_handle_t h_b "osm_vendor_put: " "Retiring MAD %p.\n", p_vw->p_mad); } - cl_free (p_vw->p_mad); + free (p_vw->p_mad); p_vw->p_mad = NULL; OSM_LOG_EXIT (p_vend->p_log); @@ -656,7 +664,7 @@ __osm_vendor_internal_unbind (osm_bind_h the client - and the client might use them cl_spinlock_destroy(&p_bo->lock); - cl_free(p_bo); + free(p_bo); */ OSM_LOG_EXIT (p_log); Index: osm/libvendor/osm_vendor_mtl.c =================================================================== --- osm/libvendor/osm_vendor_mtl.c (revision 7470) +++ osm/libvendor/osm_vendor_mtl.c (working copy) @@ -43,8 +43,8 @@ #ifdef OSM_VENDOR_INTF_MTL +#include #include -#include #include #include /* HACK - I do not know how to prevent complib from loading kernel H files */ @@ -306,7 +306,7 @@ osm_vendor_delete( IN osm_vendor_t ** co CL_ASSERT( pp_vend ); osm_vendor_destroy( *pp_vend ); - cl_free( *pp_vend ); + free( *pp_vend ); *pp_vend = NULL; } @@ -337,7 +337,7 @@ osm_vendor_init( IN osm_vendor_t * const */ ib_mgt_hdl_p = ( osm_vendor_mgt_bind_t * ) - cl_malloc( sizeof( osm_vendor_mgt_bind_t ) ); + malloc( sizeof( osm_vendor_mgt_bind_t ) ); if( ib_mgt_hdl_p == NULL ) { osm_log( p_vend->p_log, OSM_LOG_ERROR, @@ -376,9 +376,10 @@ 
osm_vendor_new( CL_ASSERT( p_log ); - p_vend = cl_zalloc( sizeof( *p_vend ) ); + p_vend = malloc( sizeof( *p_vend ) ); if( p_vend != NULL ) { + memset( p_vend, 0, sizeof( *p_vend ) ); status = osm_vendor_init( p_vend, p_log, timeout ); if( status != IB_SUCCESS ) { @@ -675,7 +676,7 @@ osm_vendor_bind( IN osm_vendor_t * const } /* create the bind object tracking this binding */ - p_bind = (osm_mtl_bind_info_t *)cl_malloc( sizeof(osm_mtl_bind_info_t) ); + p_bind = (osm_mtl_bind_info_t *)malloc( sizeof(osm_mtl_bind_info_t) ); memset(p_bind, 0, sizeof(osm_mtl_bind_info_t)); if( p_bind == NULL ) { @@ -736,7 +737,7 @@ osm_vendor_bind( IN osm_vendor_t * const &( ib_mgt_hdl_p->smi_mads_hdl ) ); if( IB_MGT_OK != mgt_ret ) { - cl_free( p_bind ); + free( p_bind ); p_bind = NULL; osm_log( p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_bind: ERR 3C16: " @@ -748,7 +749,7 @@ osm_vendor_bind( IN osm_vendor_t * const mgt_ret = IB_MGT_bind_sm( ib_mgt_hdl_p->smi_mads_hdl ); if( IB_MGT_OK != mgt_ret ) { - cl_free( p_bind ); + free( p_bind ); p_bind = NULL; osm_log( p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_bind: ERR 3C17: " @@ -790,7 +791,7 @@ osm_vendor_bind( IN osm_vendor_t * const &( ib_mgt_hdl_p->gsi_mads_hdl ) ); if( IB_MGT_OK != mgt_ret ) { - cl_free( p_bind ); + free( p_bind ); p_bind = NULL; osm_log( p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_bind: ERR 3C20: " @@ -804,7 +805,7 @@ osm_vendor_bind( IN osm_vendor_t * const p_user_bind->mad_class ); if( IB_MGT_OK != mgt_ret ) { - cl_free( p_bind ); + free( p_bind ); p_bind = NULL; osm_log( p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_bind: ERR 3C22: " @@ -837,7 +838,7 @@ osm_vendor_bind( IN osm_vendor_t * const if( IB_MGT_OK != mgt_ret ) { - cl_free( p_bind ); + free( p_bind ); p_bind = NULL; osm_log( p_vend->p_log, OSM_LOG_ERROR, "osm_vendor_bind: ERR 3C23: " @@ -875,7 +876,7 @@ osm_vendor_get( IN osm_bind_handle_t h_b p_vw->size = MAD_BLOCK_SIZE; /* allocate it */ - mad_p = ( ib_mad_t * ) cl_zalloc( p_vw->size ); + mad_p = ( ib_mad_t * ) 
malloc( p_vw->size );
   if( mad_p == NULL )
   {
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -931,7 +932,7 @@ osm_vendor_put( IN osm_bind_handle_t h_b
   */
   /* free the mad but the wrapper is part of the madw object */
-  cl_free( p_vw->mad_buf_p );
+  free( p_vw->mad_buf_p );
   p_vw->mad_buf_p = NULL;
   p_madw = PARENT_STRUCT( p_vw, osm_madw_t, vend_wrap);
   p_madw->p_mad = NULL;
Index: osm/libvendor/osm_vendor_al.c
===================================================================
--- osm/libvendor/osm_vendor_al.c (revision 7470)
+++ osm/libvendor/osm_vendor_al.c (working copy)
@@ -59,8 +59,8 @@
 #ifdef OSM_VENDOR_INTF_AL
+#include
 #include
-#include
 #include
 #include
 #include
@@ -415,7 +415,7 @@ osm_vendor_new(
   OSM_LOG_ENTER( p_log, osm_vendor_new );
-  p_vend = cl_zalloc( sizeof(*p_vend) );
+  p_vend = malloc( sizeof(*p_vend) );
   if( p_vend == NULL )
   {
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -424,10 +424,12 @@ osm_vendor_new(
     goto Exit;
   }
+  memset( p_vend, 0, sizeof(*p_vend) );
+
   status = osm_vendor_init( p_vend, p_log, timeout );
   if( status != IB_SUCCESS )
   {
-    cl_free( p_vend );
+    free( p_vend );
     p_vend = NULL;
   }
@@ -444,7 +446,7 @@ osm_vendor_delete(
 {
   /* TO DO - fill this in */
   ib_close_al( (*pp_vend)->h_al );
-  cl_free( *pp_vend );
+  free( *pp_vend );
   *pp_vend = NULL;
 }
@@ -483,7 +485,7 @@ __osm_ca_info_init(
   CL_ASSERT( p_ca_info->attr_size );
-  p_ca_info->p_attr = cl_malloc( p_ca_info->attr_size );
+  p_ca_info->p_attr = malloc( p_ca_info->attr_size );
   if( p_ca_info->p_attr == NULL )
   {
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -519,9 +521,9 @@ osm_ca_info_destroy(
   OSM_LOG_ENTER( p_vend->p_log, osm_ca_info_destroy );
   if( p_ca_info->p_attr )
-    cl_free( p_ca_info->p_attr );
+    free( p_ca_info->p_attr );
-  cl_free( p_ca_info );
+  free( p_ca_info );
   OSM_LOG_EXIT( p_vend->p_log );
 }
@@ -540,10 +542,12 @@ osm_ca_info_new(
   CL_ASSERT( ca_guid );
-  p_ca_info = cl_zalloc( sizeof(*p_ca_info) );
+  p_ca_info = malloc( sizeof(*p_ca_info) );
   if( p_ca_info == NULL )
     goto Exit;
+  memset( p_ca_info, 0, sizeof(*p_ca_info) );
+
   status = __osm_ca_info_init( p_vend, p_ca_info, ca_guid );
   if( status != IB_SUCCESS )
   {
@@ -591,7 +595,7 @@ __osm_vendor_get_ca_guids(
     goto Exit;
   }
-  *p_guids = cl_malloc( *p_num_guids * sizeof(**p_guids) );
+  *p_guids = malloc( *p_num_guids * sizeof(**p_guids) );
   if( *p_guids == NULL )
   {
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -681,7 +685,7 @@ osm_vendor_get_all_port_attr(
   */
   status = __osm_vendor_get_ca_guids( p_vend, &p_ca_guid, &ca_count );
-  p_vend->p_ca_info = cl_zalloc( ca_count * sizeof(*p_vend->p_ca_info) );
+  p_vend->p_ca_info = malloc( ca_count * sizeof(*p_vend->p_ca_info) );
   if( p_vend->p_ca_info == NULL )
   {
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -690,6 +694,7 @@ osm_vendor_get_all_port_attr(
     goto Exit;
   }
+  memset( p_vend->p_ca_info, 0, ca_count * sizeof(*p_vend->p_ca_info) );
   p_vend->ca_count = ca_count;
   /*
@@ -748,7 +753,7 @@ osm_vendor_get_all_port_attr(
  Exit:
   if( p_ca_guid )
-    cl_free( p_ca_guid );
+    free( p_ca_guid );
   OSM_LOG_EXIT( p_vend->p_log );
   return( status );
@@ -1003,7 +1008,7 @@ osm_vendor_bind(
     }
   }
-  p_bind = cl_zalloc( sizeof(*p_bind) );
+  p_bind = malloc( sizeof(*p_bind) );
   if( p_bind == NULL )
   {
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -1012,6 +1017,7 @@ osm_vendor_bind(
     goto Exit;
   }
+  memset( p_bind, 0, sizeof(*p_bind) );
   p_bind->p_vend = p_vend;
   p_bind->client_context = context;
   p_bind->port_num = osm_vendor_get_port_num( p_vend, port_guid );
@@ -1055,7 +1061,7 @@ osm_vendor_bind(
   if( status != IB_SUCCESS )
   {
-    cl_free( p_bind );
+    free( p_bind );
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
              "osm_vendor_bind: ERR 3B19: "
              "Unable to get QP handle (%s).\n",
@@ -1088,7 +1094,7 @@ osm_vendor_bind(
   if( status != IB_SUCCESS )
   {
-    cl_free( p_bind );
+    free( p_bind );
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
              "osm_vendor_bind: ERR 3B21: "
              "Unable to register QP0 MAD service (%s).\n",
Index: osm/libvendor/osm_vendor_mlx_ts_anafa.c
===================================================================
--- osm/libvendor/osm_vendor_mlx_ts_anafa.c (revision 7470)
+++ osm/libvendor/osm_vendor_mlx_ts_anafa.c (working copy)
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -60,7 +61,6 @@
 #include
 #include
-#include
 #include
 static void
@@ -186,11 +186,13 @@ osmv_transport_init (
     (osmv_TOPSPIN_ANAFA_transport_info_t *) p_bo->p_vendor->
     p_transport_info;
-  p_mgr = cl_zalloc (sizeof (osmv_TOPSPIN_ANAFA_transport_mgr_t));
+  p_mgr = malloc (sizeof (osmv_TOPSPIN_ANAFA_transport_mgr_t));
   if (!p_mgr) {
     return IB_INSUFFICIENT_MEMORY;
   }
+  memset(p_mgr, 0, sizeof (osmv_TOPSPIN_ANAFA_transport_mgr_t));
+
   /* open TopSpin file device */
   device_fd = open (device_file, O_RDWR);
   if (device_fd < 0) {
@@ -355,7 +357,7 @@ osmv_transport_done (IN const osm_bind_h
   /* pthread_cancel (p_tpot_mgr->receiver.osd.id); */
   cl_thread_destroy (&(p_tpot_mgr->receiver));
-  cl_free (p_tpot_mgr);
+  free (p_tpot_mgr);
 }
 static void
Index: osm/libvendor/osm_vendor_mlx.c
===================================================================
--- osm/libvendor/osm_vendor_mlx.c (revision 7470)
+++ osm/libvendor/osm_vendor_mlx.c (working copy)
@@ -38,6 +38,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
 #include
@@ -45,7 +46,6 @@
 #include
 #include
 #include
-#include
 /**
  * FORWARD REFERENCES
@@ -78,9 +78,11 @@ osm_vendor_new(
   CL_ASSERT( p_log );
-  p_vend = cl_zalloc( sizeof( *p_vend ) );
+  p_vend = malloc( sizeof( *p_vend ) );
   if ( p_vend != NULL )
   {
+    memset( p_vend, 0, sizeof( *p_vend ) );
+
     status = osm_vendor_init( p_vend, p_log, timeout );
     if ( status != IB_SUCCESS )
     {
@@ -126,14 +128,14 @@ osm_vendor_delete( IN osm_vendor_t ** co
     __osm_vendor_internal_unbind(bind_h);
-    cl_free(p_obj);
+    free(p_obj);
     /*removing from list */
     p_item = cl_qlist_remove_head(&((*pp_vend)->bind_handles));
   }
   if (NULL != ((*pp_vend)->p_transport_info))
   {
-    cl_free((*pp_vend)->p_transport_info);
+    free((*pp_vend)->p_transport_info);
     (*pp_vend)->p_transport_info = NULL;
   }
@@ -141,7 +143,7 @@ osm_vendor_delete( IN osm_vendor_t ** co
   if ( (*pp_vend)->run_randomizer == TRUE )
     osm_pkt_randomizer_destroy( &((*pp_vend)->p_pkt_randomizer), p_log );
-  cl_free( *pp_vend );
+  free( *pp_vend );
   *pp_vend = NULL;
   OSM_LOG_EXIT( p_log );
@@ -223,7 +225,7 @@ osm_vendor_bind(
     return OSM_BIND_INVALID_HANDLE;
   }
-  p_bo = cl_zalloc(sizeof(osmv_bind_obj_t));
+  p_bo = malloc(sizeof(osmv_bind_obj_t));
   if (NULL == p_bo)
   {
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -231,6 +233,7 @@ osm_vendor_bind(
     return OSM_BIND_INVALID_HANDLE;
   }
+  memset(p_bo, 0, sizeof(osmv_bind_obj_t));
   p_bo->p_vendor = p_vend;
   p_bo->recv_cb = mad_recv_callback;
   p_bo->send_err_cb = send_err_callback;
@@ -257,7 +260,7 @@ osm_vendor_bind(
              "Fail to find port number of port guid:0x%016"PRIx64"\n",
              p_bind_info->port_guid );
-    cl_free(p_bo);
+    free(p_bo);
     return OSM_BIND_INVALID_HANDLE;
   }
@@ -275,7 +278,7 @@ osm_vendor_bind(
     osm_log(p_bo->p_vendor->p_log,OSM_LOG_ERROR,
             "osm_vendor_bind: ERR 7305: "
             "could not initialize the spinlock ...\n");
-    cl_free(p_bo);
+    free(p_bo);
     return OSM_BIND_INVALID_HANDLE;
   }
@@ -287,7 +290,7 @@ osm_vendor_bind(
             "osm_vendor_bind: ERR 7306: "
             "osmv_txnmgr_init failed \n");
     cl_spinlock_destroy(&p_bo->lock);
-    cl_free(p_bo);
+    free(p_bo);
     return OSM_BIND_INVALID_HANDLE;
   }
@@ -299,12 +302,12 @@ osm_vendor_bind(
             "osmv_transport_init failed \n");
     osmv_txnmgr_done((osm_bind_handle_t) p_bo);
     cl_spinlock_destroy(&p_bo->lock);
-    cl_free(p_bo);
+    free(p_bo);
     return OSM_BIND_INVALID_HANDLE;
   }
   /* insert bind handle into db */
-  p_obj = cl_zalloc(sizeof(cl_list_obj_t));
+  p_obj = malloc(sizeof(cl_list_obj_t));
   if (NULL == p_obj)
   {
@@ -315,9 +318,10 @@ osm_vendor_bind(
     osmv_transport_done(p_bo->p_transp_mgr);
     osmv_txnmgr_done((osm_bind_handle_t) p_bo);
     cl_spinlock_destroy(&p_bo->lock);
-    cl_free(p_bo);
+    free(p_bo);
     return OSM_BIND_INVALID_HANDLE;
   }
+  memset(p_obj, 0, sizeof(cl_list_obj_t));
   cl_qlist_set_obj(p_obj, p_bo);
   cl_qlist_insert_head(&p_vend->bind_handles,&p_obj->list_item);
@@ -357,7 +361,7 @@ osm_vendor_unbind(IN osm_bind_handle_t
   CL_ASSERT(p_item != cl_qlist_end(p_bh_list));
   cl_qlist_remove_item(p_bh_list,p_item);
-  if (p_obj) cl_free(p_obj);
+  if (p_obj) free(p_obj);
   if (h_bind != 0)
   {
@@ -398,7 +402,7 @@ osm_vendor_get( IN osm_bind_handle_t h_b
   }
   /* allocate it */
-  p_mad = ( ib_mad_t * ) cl_zalloc( act_mad_size );
+  p_mad = ( ib_mad_t * ) malloc( act_mad_size );
   if ( p_mad == NULL )
   {
     osm_log( p_vend->p_log, OSM_LOG_ERROR,
@@ -407,6 +411,8 @@ osm_vendor_get( IN osm_bind_handle_t h_b
     goto Exit;
   }
+  memset( p_mad, 0, act_mad_size );
+
   if ( osm_log_get_level( p_vend->p_log ) >= OSM_LOG_DEBUG )
   {
     osm_log( p_vend->p_log, OSM_LOG_DEBUG,
@@ -583,7 +589,7 @@ osm_vendor_put(
              "osm_vendor_put: " "Retiring MAD %p.\n", p_vw->p_mad );
   }
-  cl_free( p_vw->p_mad );
+  free( p_vw->p_mad );
   p_vw->p_mad = NULL;
   OSM_LOG_EXIT( p_vend->p_log );
@@ -708,7 +714,7 @@ __osm_vendor_internal_unbind(osm_bind_ha
      the client - and the client might use them
      cl_spinlock_destroy(&p_bo->lock);
-     cl_free(p_bo);
+     free(p_bo);
   */
   OSM_LOG_EXIT(p_log);
Index: osm/libvendor/osm_vendor_mlx_hca_anafa.c
===================================================================
--- osm/libvendor/osm_vendor_mlx_hca_anafa.c (revision 7470)
+++ osm/libvendor/osm_vendor_mlx_hca_anafa.c (working copy)
@@ -43,11 +43,11 @@
 #undef IN
 #undef OUT
+#include
 #include
 #include
 #include
-#include
 #include
 #include
@@ -110,7 +110,7 @@ __osm_ca_info_init (IN osm_vendor_t * co
   p_ca_info->attr.num_ports = 1;
   p_ca_info->attr.p_port_attr =
-    (ib_port_attr_t *) cl_malloc (1 * sizeof (ib_port_attr_t));
+    (ib_port_attr_t *) malloc (1 * sizeof (ib_port_attr_t));
   port_info.port = 1;
   ioctl_ret =
Index: osm/complib/cl_timer.c
===================================================================
--- osm/complib/cl_timer.c (revision 7470)
+++ osm/complib/cl_timer.c (working copy)
@@ -48,9 +48,9 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -83,9 +83,11 @@ __cl_timer_prov_create( void )
 {
   CL_ASSERT( gp_timer_prov == NULL );
-  gp_timer_prov = cl_zalloc( sizeof(cl_timer_prov_t) );
+  gp_timer_prov = malloc( sizeof(cl_timer_prov_t) );
   if( !gp_timer_prov )
     return( CL_INSUFFICIENT_MEMORY );
+  else
+    memset( gp_timer_prov, 0, sizeof(cl_timer_prov_t) );
   cl_qlist_init( &gp_timer_prov->queue );
@@ -122,7 +124,7 @@ __cl_timer_prov_destroy( void )
   pthread_cond_destroy( &gp_timer_prov->cond );
   /* Free the memory and reset the global pointer. */
-  cl_free( gp_timer_prov );
+  free( gp_timer_prov );
   gp_timer_prov = NULL;
 }
Index: osm/complib/cl_dispatcher.c
===================================================================
--- osm/complib/cl_dispatcher.c (revision 7470)
+++ osm/complib/cl_dispatcher.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
@@ -51,7 +51,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
-#include
+#include
 #include
 #include
@@ -161,7 +161,7 @@ cl_disp_shutdown(
   /* Free all registration info. */
   while( !cl_is_qlist_empty( &p_disp->reg_list ) )
-    cl_free( cl_qlist_remove_head( &p_disp->reg_list ) );
+    free( cl_qlist_remove_head( &p_disp->reg_list ) );
 }
 /********************************************************************
@@ -253,12 +253,16 @@ cl_disp_register(
   }
   /* Get a registration info from the pool. */
-  p_reg = (cl_disp_reg_info_t*)cl_zalloc( sizeof(cl_disp_reg_info_t) );
+  p_reg = (cl_disp_reg_info_t*)malloc( sizeof(cl_disp_reg_info_t) );
   if( !p_reg )
   {
     cl_spinlock_release( &p_disp->lock );
     return( NULL );
   }
+  else
+  {
+    memset( p_reg, 0, sizeof(cl_disp_reg_info_t) );
+  }
   p_reg->p_disp = p_disp;
   p_reg->ref_cnt = 0;
@@ -276,7 +280,7 @@ cl_disp_register(
   status = cl_ptr_vector_set( &p_disp->reg_vec, msg_id, p_reg );
   if( status != CL_SUCCESS )
   {
-    cl_free( p_reg );
+    free( p_reg );
     cl_spinlock_release( &p_disp->lock );
     return( NULL );
   }
@@ -323,7 +327,7 @@ cl_disp_unregister(
   /* Remove the registrant from the list. */
   cl_qlist_remove_item( &p_disp->reg_list, (cl_list_item_t*)p_reg );
   /* Return the registration info to the pool */
-  cl_free( p_reg );
+  free( p_reg );
   cl_spinlock_release( &p_disp->lock );
 }
Index: osm/complib/cl_ptr_vector.c
===================================================================
--- osm/complib/cl_ptr_vector.c (revision 7470)
+++ osm/complib/cl_ptr_vector.c (working copy)
@@ -51,9 +51,9 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
-#include
 void
@@ -113,7 +113,7 @@ cl_ptr_vector_destroy(
   /* Destroy the page vector. */
   if( p_vector->p_ptr_array )
   {
-    cl_free( (void*)p_vector->p_ptr_array );
+    free( (void*)p_vector->p_ptr_array );
     p_vector->p_ptr_array = NULL;
   }
 }
@@ -214,9 +214,11 @@ cl_ptr_vector_set_capacity(
   }
   /* Allocate our pointer array. */
-  p_new_ptr_array = cl_zalloc( new_capacity * sizeof(void*) );
+  p_new_ptr_array = malloc( new_capacity * sizeof(void*) );
   if( !p_new_ptr_array )
     return( CL_INSUFFICIENT_MEMORY );
+  else
+    memset( p_new_ptr_array, 0, new_capacity * sizeof(void*) );
   if( p_vector->p_ptr_array )
   {
@@ -225,7 +227,7 @@ cl_ptr_vector_set_capacity(
       p_vector->capacity * sizeof(void*) );
     /* Free the old pointer array. */
-    cl_free( (void*)p_vector->p_ptr_array );
+    free( (void*)p_vector->p_ptr_array );
   }
   /* Set the new array. */
Index: osm/complib/cl_perf.c
===================================================================
--- osm/complib/cl_perf.c (revision 7470)
+++ osm/complib/cl_perf.c (working copy)
@@ -51,6 +51,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 /*
@@ -64,7 +65,6 @@
 #include
 #include
-#include
@@ -108,10 +108,13 @@ __cl_perf_init(
   /* Allocate an array of counters. */
   p_perf->size = num_counters;
   p_perf->data_array = (cl_perf_data_t*)
-    cl_zalloc( sizeof(cl_perf_data_t) * num_counters );
+    malloc( sizeof(cl_perf_data_t) * num_counters );
   if( !p_perf->data_array )
     return( CL_INSUFFICIENT_MEMORY );
+  else
+    memset( p_perf->data_array, 0,
+            sizeof(cl_perf_data_t) * num_counters );
   /* Initialize the user's counters. */
   for( i = 0; i < num_counters; i++ )
@@ -223,7 +226,7 @@ __cl_perf_destroy(
   for( i = 0; i < p_perf->size; i++ )
     cl_spinlock_destroy( &p_perf->data_array[i].lock );
-  cl_free( p_perf->data_array );
+  free( p_perf->data_array );
   p_perf->data_array = NULL;
   p_perf->state = CL_UNINITIALIZED;
Index: osm/complib/cl_threadpool.c
===================================================================
--- osm/complib/cl_threadpool.c (revision 7470)
+++ osm/complib/cl_threadpool.c (working copy)
@@ -51,10 +51,10 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
 #include
-#include
 void
@@ -151,7 +151,7 @@ cl_thread_pool_init(
   for( i = 0; i < count; i++ )
   {
     /* Create a new thread. */
-    p_thread = (cl_thread_t*)cl_malloc( sizeof(cl_thread_t) );
+    p_thread = (cl_thread_t*)malloc( sizeof(cl_thread_t) );
     if( !p_thread )
     {
       cl_thread_pool_destroy( p_thread_pool );
@@ -229,7 +229,7 @@ cl_thread_pool_destroy(
     p_thread =
       (cl_thread_t*)cl_list_remove_head( &p_thread_pool->thread_list );
     cl_thread_destroy( p_thread );
-    cl_free( p_thread );
+    free( p_thread );
   }
 }
Index: osm/complib/cl_vector.c
===================================================================
--- osm/complib/cl_vector.c (revision 7470)
+++ osm/complib/cl_vector.c (working copy)
@@ -51,9 +51,9 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
-#include
 /*
@@ -316,12 +316,12 @@ cl_vector_destroy(
   /* Deallocate the pages */
   while( !cl_is_qlist_empty( &p_vector->alloc_list ) )
-    cl_free( cl_qlist_remove_head( &p_vector->alloc_list ) );
+    free( cl_qlist_remove_head( &p_vector->alloc_list ) );
   /* Destroy the page vector. */
   if( p_vector->p_ptr_array )
   {
-    cl_free( p_vector->p_ptr_array );
+    free( p_vector->p_ptr_array );
     p_vector->p_ptr_array = NULL;
   }
 }
@@ -406,9 +406,11 @@ cl_vector_set_capacity(
   }
   /* Allocate our pointer array. */
-  p_new_ptr_array = cl_zalloc( new_capacity * sizeof(void*) );
+  p_new_ptr_array = malloc( new_capacity * sizeof(void*) );
   if( !p_new_ptr_array )
     return( CL_INSUFFICIENT_MEMORY );
+  else
+    memset( p_new_ptr_array, 0, new_capacity * sizeof(void*) );
   if( p_vector->p_ptr_array )
   {
@@ -417,7 +419,7 @@ cl_vector_set_capacity(
       p_vector->capacity * sizeof(void*) );
     /* Free the old pointer array. */
-    cl_free( p_vector->p_ptr_array );
+    free( p_vector->p_ptr_array );
   }
   /* Set the new array. */
@@ -431,9 +433,11 @@ cl_vector_set_capacity(
   /* Determine the allocation size for the new array elements. */
   alloc_size = new_elements * p_vector->element_size;
-  p_buf = (cl_list_item_t*)cl_zalloc( alloc_size + sizeof(cl_list_item_t) );
+  p_buf = (cl_list_item_t*)malloc( alloc_size + sizeof(cl_list_item_t) );
   if( !p_buf )
     return( CL_INSUFFICIENT_MEMORY );
+  else
+    memset( p_buf, 0, alloc_size + sizeof(cl_list_item_t) );
   cl_qlist_insert_tail( &p_vector->alloc_list, p_buf );
   /* Advance the buffer pointer past the list item. */
Index: osm/complib/cl_event_wheel.c
===================================================================
--- osm/complib/cl_event_wheel.c (revision 7470)
+++ osm/complib/cl_event_wheel.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
@@ -40,7 +40,7 @@
 #endif /* HAVE_CONFIG_H */
 #include
-#include
+#include
 #include
 #include
@@ -130,7 +130,7 @@ __cl_event_wheel_callback( IN void* cont
     cl_qlist_remove_head(&p_event_wheel->events_wheel);
     /* delete the event info object - allocated by cl_event_wheel_reg */
-    cl_free(p_event);
+    free(p_event);
   }
   else
   {
@@ -330,7 +330,7 @@ cl_event_wheel_destroy(
     /* remove it from the map */
     p_map_item = &(p_event->map_item);
     cl_qmap_remove_item(&p_event_wheel->events_map, p_map_item);
-    cl_free(p_event); /* allocated by cl_event_wheel_reg */
+    free(p_event); /* allocated by cl_event_wheel_reg */
     p_list_item = cl_qlist_remove_head(&p_event_wheel->events_wheel);
   }
@@ -387,7 +387,7 @@ cl_event_wheel_reg(
   {
     /* make a new one */
     p_event = (cl_event_wheel_reg_info_t *)
-      cl_malloc( sizeof (cl_event_wheel_reg_info_t) );
+      malloc( sizeof (cl_event_wheel_reg_info_t) );
     p_event->num_regs = 0;
   }
@@ -504,7 +504,7 @@ cl_event_wheel_unreg(
              "Removed key:0x%"PRIx64"\n", key );
     /* free the item */
-    cl_free(p_event);
+    free(p_event);
   }
   else
   {
Index: osm/complib/cl_pool.c
===================================================================
--- osm/complib/cl_pool.c (revision 7470)
+++ osm/complib/cl_pool.c (working copy)
@@ -52,12 +52,12 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
 #include
 #include
 #include
 #include
-#include
 #include
@@ -116,11 +116,14 @@ cl_qcpool_init(
    * Allocate the array of component sizes and component pointers all
    * in one allocation.
    */
-  p_pool->component_sizes = (size_t*)cl_zalloc(
+  p_pool->component_sizes = (size_t*)malloc(
     (sizeof(size_t) + sizeof(void*)) * num_components );
   if( !p_pool->component_sizes )
     return( CL_INSUFFICIENT_MEMORY );
+  else
+    memset( p_pool->component_sizes, 0,
+            (sizeof(size_t) + sizeof(void*)) * num_components );
   /* Calculate the pointer to the array of pointers, used for callbacks. */
   p_pool->p_components =
@@ -213,11 +216,11 @@ cl_qcpool_destroy(
   /* Free all alocated memory blocks. */
   while( !cl_is_qlist_empty( &p_pool->alloc_list ) )
-    cl_free( cl_qlist_remove_head( &p_pool->alloc_list ) );
+    free( cl_qlist_remove_head( &p_pool->alloc_list ) );
   if( p_pool->component_sizes )
   {
-    cl_free( p_pool->component_sizes );
+    free( p_pool->component_sizes );
     p_pool->component_sizes = NULL;
   }
 }
@@ -256,11 +259,14 @@ cl_qcpool_grow(
   /* Allocate the buffer for the new objects. */
   p_objects = (uint8_t*)
-    cl_zalloc( sizeof(cl_list_item_t) + (obj_size * obj_count) );
+    malloc( sizeof(cl_list_item_t) + (obj_size * obj_count) );
   /* Make sure the allocation succeeded. */
   if( !p_objects )
     return( CL_INSUFFICIENT_MEMORY );
+  else
+    memset( p_objects, 0,
+            sizeof(cl_list_item_t) + (obj_size * obj_count) );
   /* Insert the allocation in our list. */
   cl_qlist_insert_tail( &p_pool->alloc_list, (cl_list_item_t*)p_objects );
Index: osm/osmtest/osmtest.c
===================================================================
--- osm/osmtest/osmtest.c (revision 7470)
+++ osm/osmtest/osmtest.c (working copy)
@@ -64,7 +64,6 @@
 #include
 #endif
 #include
-#include
 #include "osmtest.h"
 #ifndef __WIN__
@@ -460,21 +459,21 @@ osmtest_destroy( IN osmtest_t * const p_
   {
     p_item = p_next_item;
     p_next_item = cl_qmap_next( p_item );
-    cl_free( p_item );
+    free( p_item );
   }
   p_next_item = cl_qmap_head( &p_osmt->exp_subn.mgrp_mlid_tbl );
   while( p_next_item != cl_qmap_end( &p_osmt->exp_subn.mgrp_mlid_tbl ) )
   {
     p_item = p_next_item;
     p_next_item = cl_qmap_next( p_item );
-    cl_free( p_item );
+    free( p_item );
   }
   p_next_item = cl_qmap_head( &p_osmt->exp_subn.node_guid_tbl );
   while( p_next_item != cl_qmap_end( &p_osmt->exp_subn.node_guid_tbl ) )
   {
     p_item = p_next_item;
     p_next_item = cl_qmap_next( p_item );
-    cl_free( p_item );
+    free( p_item );
   }
   p_next_item = cl_qmap_head( &p_osmt->exp_subn.node_lid_tbl );
@@ -482,7 +481,7 @@ osmtest_destroy( IN osmtest_t * const p_
   {
     p_item = p_next_item;
     p_next_item = cl_qmap_next( p_item );
-    cl_free( p_item );
+    free( p_item );
   }
   p_next_item = cl_qmap_head( &p_osmt->exp_subn.path_tbl );
@@ -490,14 +489,14 @@ osmtest_destroy( IN osmtest_t * const p_
   {
     p_item = p_next_item;
     p_next_item = cl_qmap_next( p_item );
-    cl_free( p_item );
+    free( p_item );
   }
   p_next_item = cl_qmap_head( &p_osmt->exp_subn.port_key_tbl );
   while( p_next_item != cl_qmap_end( &p_osmt->exp_subn.port_key_tbl ) )
   {
     p_item = p_next_item;
     p_next_item = cl_qmap_next( p_item );
-    cl_free( p_item );
+    free( p_item );
   }
   osm_log_destroy( &p_osmt->log );
Index: osm/osmtest/include/osmtest_subnet.h
===================================================================
--- osm/osmtest/include/osmtest_subnet.h (revision 7470)
+++ osm/osmtest/include/osmtest_subnet.h (working copy)
@@ -1,4 +1,5 @@
 /*
+ * Copyright (c) 2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
@@ -47,7 +48,7 @@
 #ifndef _OSMTEST_SUBNET_H_
 #define _OSMTEST_SUBNET_H_
-#include
+#include
 #include
 #include
 #include
@@ -121,14 +122,16 @@ node_new( void )
 {
   node_t *p_obj;
-  p_obj = cl_zalloc( sizeof( *p_obj ) );
+  p_obj = malloc( sizeof( *p_obj ) );
+  if (p_obj)
+    memset( p_obj, 0, sizeof( *p_obj ) );
   return ( p_obj );
 }
 static inline void
 node_delete( IN node_t * p_obj )
 {
-  cl_free( p_obj );
+  free( p_obj );
 }
 /****s* Subnet Database/port_t
@@ -179,14 +182,16 @@ port_new( void )
 {
   port_t *p_obj;
-  p_obj = cl_zalloc( sizeof( *p_obj ) );
+  p_obj = malloc( sizeof( *p_obj ) );
+  if (p_obj)
+    memset( p_obj, 0, sizeof( *p_obj ) );
   return ( p_obj );
 }
 static inline void
 port_delete( IN port_t * p_obj )
 {
-  cl_free( p_obj );
+  free( p_obj );
 }
 static inline uint64_t
@@ -268,14 +273,16 @@ path_new( void )
 {
   path_t *p_obj;
-  p_obj = cl_zalloc( sizeof( *p_obj ) );
+  p_obj = malloc( sizeof( *p_obj ) );
+  if (p_obj)
+    memset( p_obj, 0, sizeof( *p_obj ) );
   return ( p_obj );
 }
 static inline void
 path_delete( IN path_t * p_obj )
 {
-  cl_free( p_obj );
+  free( p_obj );
 }
 /****s* Subnet Database/subnet_t
Index: osm/osmtest/osmt_service.c
===================================================================
--- osm/osmtest/osmt_service.c (revision 7470)
+++ osm/osmtest/osmt_service.c (working copy)
@@ -60,7 +60,6 @@
 #include
 #include
 #include
-#include
 #include "osmtest.h"
 ib_api_status_t
@@ -1055,7 +1054,7 @@ osmt_get_all_services_and_check_names( I
   OSM_LOG_ENTER(&p_osmt->log, osmt_get_all_services_and_check_names );
   /* Prepare tracker for the checked names */
-  p_checked_names = (uint8_t*)cl_malloc(sizeof(uint8_t)*num_of_valid_names);
+  p_checked_names = (uint8_t*)malloc(sizeof(uint8_t)*num_of_valid_names);
   for (j = 0 ; j < num_of_valid_names ; j++)
   {
     p_checked_names[j] = 0;
Index: osm/osmtest/main.c
===================================================================
--- osm/osmtest/main.c (revision 7470)
+++ osm/osmtest/main.c (working copy)
@@ -1,6 +1,6 @@
 /*
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
- * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses. You may choose to be licensed under the terms of the GNU
@@ -47,7 +47,6 @@
 #include
 #include
 #include
-#include
 #include "osmtest.h"
 /********************************************************************
@@ -120,7 +119,6 @@ show_usage( )
           " -d0 - Unused.\n"
           " -d1 - Do not scan/compare path records.\n"
           " -d2 - Force log flushing after each log message.\n"
-          " -d3 - Use mem tracking.\n"
           " Without -d, no debug options are enabled\n\n" );
   printf( "-m \n"
           "--max_lid \n"
@@ -307,7 +305,6 @@ main( int argc,
   uint32_t log_flags = OSM_LOG_ERROR | OSM_LOG_INFO;
   int32_t vendor_debug=0;
   char flow_name[64];
-  boolean_t mem_track = FALSE;
   uint32_t next_option;
   const char *const short_option = "f:l:m:M:d:g:s:t:i:pcvVh";
@@ -559,9 +556,7 @@ main( int argc,
         opt.force_log_flush = TRUE;
         break;
       case 3:
-        printf( "Use Mem Tracking\n" );
-        mem_track = TRUE;
-        break;
+        /* Used to be memory tracking */
       default:
         printf( "Unknown value %ld (ignored)\n", strtol( optarg, NULL, 0 ) );
         break;
@@ -591,7 +586,6 @@ main( int argc,
   printf( "\tFlow = %s\n", flow_name );
-  if (mem_track) __cl_mem_track(TRUE);
   if (vendor_debug)
     osm_vendor_set_debug(osm_test.p_vendor, vendor_debug);
@@ -634,8 +628,6 @@ main( int argc,
   }
   osmtest_destroy( &osm_test );
-  if (mem_track) cl_mem_display();
-
   complib_exit();
  Exit:
Index: osm/osmtest/osmt_multicast.c
===================================================================
--- osm/osmtest/osmt_multicast.c (revision 7470)
+++ osm/osmtest/osmt_multicast.c (working copy)
@@ -53,7 +53,6 @@
 #include
 #include
 #include
-#include
 #include
 #include "osmtest.h"
@@ -157,7 +156,7 @@ osmt_query_mcast( IN osmtest_t * const p
     p_item = p_next_item;
     p_next_item = cl_qmap_next( p_item );
     cl_qmap_remove_item(&p_osmt->exp_subn.mgrp_mlid_tbl,p_item);
-    cl_free( p_item );
+    free( p_item );
   }
@@ -197,7 +196,7 @@ osmt_query_mcast( IN osmtest_t * const p
       status = IB_ERROR;
       goto Exit;
     }
-    p_mgrp = (osmtest_mgrp_t*)cl_malloc( sizeof(*p_mgrp) );
+    p_mgrp = (osmtest_mgrp_t*)malloc( sizeof(*p_mgrp) );
     if (!p_mgrp)
     {
       osm_log( &p_osmt->log, OSM_LOG_ERROR,
Index: osm/opensm/osm_port.c
===================================================================
--- osm/opensm/osm_port.c (revision 7470)
+++ osm/opensm/osm_port.c (working copy)
@@ -53,7 +53,6 @@
 #include
 #include
-#include
 #include
 #include
 #include
@@ -88,7 +87,7 @@ osm_physp_destroy(
   /* free the SL2VL Tables */
   num_slvl = cl_ptr_vector_get_size(&p_physp->slvl_by_port);
   for (i = 0; i < num_slvl; i++)
-    cl_free(cl_ptr_vector_get(&p_physp->slvl_by_port, i));
+    free(cl_ptr_vector_get(&p_physp->slvl_by_port, i));
   cl_ptr_vector_destroy(&p_physp->slvl_by_port);
   /* free the P_Key Tables */
@@ -142,7 +141,9 @@ osm_physp_init(
   cl_ptr_vector_init( &p_physp->slvl_by_port, num_slvl, 1);
   for (i = 0; i < num_slvl; i++)
   {
-    p_slvl = (ib_slvl_table_t *)cl_zalloc(sizeof(ib_slvl_table_t));
+    p_slvl = (ib_slvl_table_t *)malloc(sizeof(ib_slvl_table_t));
+    if (p_slvl)
+      memset(p_slvl, 0, sizeof(ib_slvl_table_t));
     cl_ptr_vector_set(&p_physp->slvl_by_port, i, p_slvl);
   }
@@ -238,9 +239,12 @@ osm_port_new(
   */
   size = p_ni->num_ports;
-  p_port = cl_zalloc( sizeof(*p_port) + sizeof(void *) * size );
+  p_port = malloc( sizeof(*p_port) + sizeof(void *) * size );
   if( p_port != NULL )
+  {
+    memset( p_port, 0, sizeof(*p_port) + sizeof(void *) * size );
     osm_port_init( p_port, p_ni, p_parent_node );
+  }
   return( p_port );
 }
@@ -706,7 +710,7 @@ osm_physp_replace_dr_path_with_alternate
     BFS from OSM port until we find the target physp but avoid
     going through mapped ports
   */
-  p_nextPortsList = (cl_list_t*)cl_malloc(sizeof(cl_list_t));
+  p_nextPortsList = (cl_list_t*)malloc(sizeof(cl_list_t));
   cl_list_construct( p_nextPortsList );
   cl_list_init( p_nextPortsList, 10 );
@@ -741,7 +745,7 @@ osm_physp_replace_dr_path_with_alternate
   {
     next_list_is_full = FALSE;
     p_currPortsList = p_nextPortsList;
-    p_nextPortsList = (cl_list_t*)cl_malloc(sizeof(cl_list_t));
+    p_nextPortsList = (cl_list_t*)malloc(sizeof(cl_list_t));
     cl_list_construct( p_nextPortsList );
     cl_list_init( p_nextPortsList, 10 );
     p_physp = (osm_physp_t*)cl_list_remove_head( p_currPortsList );
@@ -806,13 +810,13 @@ osm_physp_replace_dr_path_with_alternate
       }
     }
     cl_list_destroy( p_currPortsList );
-    cl_free(p_currPortsList);
+    free(p_currPortsList);
   }
   /* cleanup */
  Exit:
   cl_list_destroy( p_nextPortsList );
-  cl_free( p_nextPortsList );
+  free( p_nextPortsList );
   cl_map_destroy( &physp_map );
   cl_map_destroy( &visited_map );
 }
Index: osm/opensm/osm_state_mgr.c
===================================================================
--- osm/opensm/osm_state_mgr.c (revision 7470)
+++ osm/opensm/osm_state_mgr.c (working copy)
@@ -54,7 +54,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -1086,7 +1085,7 @@ osm_topology_file_create(
   CL_PLOCK_ACQUIRE( p_mgr->p_lock );
   file_name =
-    ( char * )cl_malloc( strlen( p_mgr->p_subn->opt.dump_files_dir ) + 12 );
+    ( char * )malloc( strlen( p_mgr->p_subn->opt.dump_files_dir ) + 12 );
   CL_ASSERT( file_name );
@@ -1232,7 +1231,7 @@ osm_topology_file_create(
   fclose( rc );
  Exit:
-  cl_free( file_name );
+  free( file_name );
   OSM_LOG_EXIT( p_mgr->p_log );
 }
@@ -1450,7 +1449,7 @@ __process_idle_time_queue_done(
   }
-  cl_free( p_process_item );
+  free( p_process_item );
   OSM_LOG_EXIT( p_mgr->p_log );
   return;
@@ -2976,7 +2975,7 @@ osm_state_mgr_process_idle(
   OSM_LOG_ENTER( p_mgr->p_log, osm_state_mgr_process_idle );
-  p_idle_item = cl_zalloc( sizeof( osm_idle_item_t ) );
+  p_idle_item = malloc( sizeof( osm_idle_item_t ) );
   if( p_idle_item == NULL )
   {
     osm_log( p_mgr->p_log, OSM_LOG_ERROR,
@@ -2985,6 +2984,7 @@ osm_state_mgr_process_idle(
     return IB_ERROR;
   }
+  memset( p_idle_item, 0, sizeof( osm_idle_item_t ) );
   p_idle_item->pfn_start = pfn_start;
   p_idle_item->pfn_done = pfn_done;
   p_idle_item->context1 = context1;
Index: osm/opensm/osm_subnet.c
===================================================================
--- osm/opensm/osm_subnet.c (revision 7470)
+++ osm/opensm/osm_subnet.c (working copy)
@@ -51,8 +51,8 @@
 # include
 #endif /* HAVE_CONFIG_H */
+#include
 #include
-#include
 #include
 #include
 #include
@@ -137,7 +137,7 @@ osm_subn_destroy(
   {
     p_rsm = p_next_rsm;
     p_next_rsm = (osm_remote_sm_t*)cl_qmap_next( &p_rsm->map_item );
-    cl_free( p_rsm );
+    free( p_rsm );
   }
   p_next_prtn = (osm_prtn_t*)cl_qmap_head( &p_subn->prtn_pkey_tbl );
@@ -634,7 +634,7 @@ __osm_subn_opts_unpack_charp(
             p_key, p_val_str);
     printf(buff);
     cl_log_event("OpenSM", LOG_INFO, buff, NULL, 0);
-    *p_val = (char *)cl_malloc( strlen(p_val_str) +1 );
+    *p_val = (char *)malloc( strlen(p_val_str) +1 );
     strcpy( *p_val, p_val_str);
   }
 }
Index: osm/opensm/osm_db_files.c
===================================================================
--- osm/opensm/osm_db_files.c (revision 7470)
+++ osm/opensm/osm_db_files.c (working copy)
@@ -50,7 +50,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
@@ -146,8 +145,8 @@ osm_db_domain_destroy(
   cl_spinlock_destroy( &p_domain_imp->lock );
   st_free_table( p_domain_imp->p_hash );
-  cl_free( p_domain_imp->file_name );
-  cl_free( p_domain_imp );
+  free( p_domain_imp->file_name );
+  free( p_domain_imp );
 }
 /***************************************************************************
@@ -161,10 +160,10 @@ osm_db_destroy(
   while ((p_domain = cl_list_remove_head( &p_db->domains )) != NULL )
   {
     osm_db_domain_destroy( p_domain );
-    cl_free( p_domain );
+    free( p_domain );
   }
   cl_list_destroy( &p_db->domains );
-  cl_free( p_db->p_db_imp );
+  free( p_db->p_db_imp );
 }
 /***************************************************************************
@@ -179,7 +178,7 @@ osm_db_init(
   OSM_LOG_ENTER( p_log, osm_db_init );
-  p_db_imp = (osm_db_imp_t *)cl_malloc(sizeof(osm_db_imp_t));
+  p_db_imp = (osm_db_imp_t *)malloc(sizeof(osm_db_imp_t));
   CL_ASSERT( p_db_imp != NULL);
   p_db_imp->db_dir_name = getenv("OSM_CACHE_DIR");
@@ -233,18 +232,18 @@ osm_db_domain_init(
   OSM_LOG_ENTER( p_log, osm_db_domain_init );
   /* allocate a new domain object */
-  p_domain = (osm_db_domain_t *)cl_malloc(sizeof(osm_db_domain_t));
+  p_domain = (osm_db_domain_t *)malloc(sizeof(osm_db_domain_t));
   CL_ASSERT( p_domain != NULL );
   p_domain_imp =
-    (osm_db_domain_imp_t *)cl_malloc(sizeof(osm_db_domain_imp_t));
+    (osm_db_domain_imp_t *)malloc(sizeof(osm_db_domain_imp_t));
   CL_ASSERT( p_domain_imp != NULL );
   dir_name_len = strlen(((osm_db_imp_t*)p_db->p_db_imp)->db_dir_name);
   /* set the domain file name */
   p_domain_imp->file_name =
-    (char *)cl_malloc(sizeof(char)*(dir_name_len) + strlen(domain_name) + 2);
+    (char *)malloc(sizeof(char)*(dir_name_len) + strlen(domain_name) + 2);
   CL_ASSERT(p_domain_imp->file_name != NULL);
   strcpy(p_domain_imp->file_name,((osm_db_imp_t*)p_db->p_db_imp)->db_dir_name);
   strcat(p_domain_imp->file_name,domain_name);
@@ -257,8 +256,8 @@ osm_db_domain_init(
              "osm_db_domain_init: ERR 6102: "
              " Failed to open the db file:%s\n",
              p_domain_imp->file_name);
-    cl_free(p_domain_imp);
-    cl_free(p_domain);
+    free(p_domain_imp);
+    free(p_domain);
     p_domain = NULL;
     goto Exit;
   }
@@ -364,19 +363,19 @@ osm_db_restore(
           goto EndParsing;
         }
-        p_key = (char *)cl_malloc(sizeof(char)*(strlen(p_first_word) + 1));
+        p_key = (char *)malloc(sizeof(char)*(strlen(p_first_word) + 1));
         strcpy(p_key, p_first_word);
         p_rest_of_line = strtok_r(NULL, "\n", &p_last);
         if (p_rest_of_line != NULL)
         {
           p_accum_val =
-            (char*)cl_malloc(sizeof(char)*(strlen(p_rest_of_line) + 1));
+            (char*)malloc(sizeof(char)*(strlen(p_rest_of_line) + 1));
           strcpy(p_accum_val, p_rest_of_line);
         }
         else
         {
-          p_accum_val = (char*)cl_malloc(2);
+          p_accum_val = (char*)malloc(2);
           strcpy(p_accum_val, "\0");
         }
       }
@@ -429,9 +428,9 @@ osm_db_restore(
         /* accumulate into the value */
         p_prev_val = p_accum_val;
         p_accum_val =
-          (char *)cl_malloc(strlen(p_prev_val) + strlen(sLine) + 1);
+          (char *)malloc(strlen(p_prev_val) + strlen(sLine) + 1);
         strcpy(p_accum_val, p_prev_val);
-        cl_free(p_prev_val);
+        free(p_prev_val);
         strcat(p_accum_val, sLine);
       }
     } /* in key */
@@ -473,7 +472,7 @@ osm_db_store(
   p_domain_imp = (osm_db_domain_imp_t *)p_domain->p_domain_imp;
   p_tmp_file_name =
-    (char *)cl_malloc(sizeof(char)*(strlen(p_domain_imp->file_name)+8));
+    (char *)malloc(sizeof(char)*(strlen(p_domain_imp->file_name)+8));
   strcpy(p_tmp_file_name, p_domain_imp->file_name);
   strcat(p_tmp_file_name,".tmp");
@@ -514,7 +513,7 @@ osm_db_store(
   }
  Exit:
   cl_spinlock_release( &p_domain_imp->lock );
-  cl_free(p_tmp_file_name);
+  free(p_tmp_file_name);
   OSM_LOG_EXIT( p_log );
   return status;
 }
@@ -526,8 +525,8 @@ osm_db_store(
 int
 __osm_clear_tbl_entry(st_data_t key, st_data_t val, st_data_t arg)
 {
-  cl_free((char*)key);
-  cl_free((char*)val);
+  free((char*)key);
+  free((char*)val);
   return ST_DELETE;
 }
@@ -625,17 +624,18 @@ osm_db_update(
   else
   {
     /* need to allocate the key */
-    p_new_key = cl_malloc(sizeof(char)*(strlen(p_key) + 1));
+    p_new_key = malloc(sizeof(char)*(strlen(p_key) + 1));
     strcpy(p_new_key, p_key);
   }
   /* need to arange a new copy of the value */
-  p_new_val = cl_malloc(sizeof(char)*(strlen(p_val) + 1));
+  p_new_val = malloc(sizeof(char)*(strlen(p_val) + 1));
   strcpy(p_new_val, p_val);
   st_insert(p_domain_imp->p_hash, (st_data_t)p_new_key, (st_data_t)p_new_val);
-  if (p_prev_val) cl_free(p_prev_val);
+  if (p_prev_val)
+    free(p_prev_val);
   cl_spinlock_release( &p_domain_imp->lock );
@@ -674,8 +674,8 @@ osm_db_delete(
     }
     else
     {
-      cl_free(p_key);
-      cl_free(p_prev_val);
+      free(p_key);
+      free(p_prev_val);
       res = 0;
     }
   }
Index: osm/opensm/osm_node.c
===================================================================
--- osm/opensm/osm_node.c (revision 7470)
+++ osm/opensm/osm_node.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
@@ -51,7 +51,7 @@
 # include
 #endif /* HAVE_CONFIG_H */
-#include
+#include
 #include
 #include
 #include
@@ -120,9 +120,10 @@ osm_node_new(
   */
   size = p_ni->num_ports;
-  p_node = cl_zalloc( sizeof(*p_node) + sizeof(osm_physp_t) * size );
+  p_node = malloc( sizeof(*p_node) + sizeof(osm_physp_t) * size );
   if( p_node != NULL )
   {
+    memset( p_node, 0, sizeof(*p_node) + sizeof(osm_physp_t) * size );
     p_node->node_info = *p_ni;
     p_node->physp_tbl_size = size + 1;
@@ -174,7 +175,7 @@ osm_node_delete(
   IN OUT osm_node_t** const p_node )
 {
   osm_node_destroy( *p_node );
-  cl_free( *p_node );
+  free( *p_node );
   *p_node = NULL;
 }
Index: osm/opensm/osm_mcm_info.c
===================================================================
--- osm/opensm/osm_mcm_info.c (revision 7470)
+++ osm/opensm/osm_mcm_info.c (working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved.
  * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
* @@ -51,7 +51,7 @@ # include #endif /* HAVE_CONFIG_H */ -#include +#include #include /********************************************************************** @@ -82,9 +82,10 @@ osm_mcm_info_new( { osm_mcm_info_t* p_mcm; - p_mcm = (osm_mcm_info_t*)cl_zalloc( sizeof(*p_mcm) ); + p_mcm = (osm_mcm_info_t*)malloc( sizeof(*p_mcm) ); if( p_mcm ) { + memset(p_mcm, 0, sizeof(*p_mcm) ); osm_mcm_info_init( p_mcm, mlid ); } @@ -98,6 +99,6 @@ osm_mcm_info_delete( IN osm_mcm_info_t* const p_mcm ) { osm_mcm_info_destroy( p_mcm ); - cl_free( p_mcm ); + free( p_mcm ); } Index: osm/opensm/osm_inform.c =================================================================== --- osm/opensm/osm_inform.c (revision 7470) +++ osm/opensm/osm_inform.c (working copy) @@ -49,9 +49,9 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include -#include #include #include #include @@ -83,7 +83,7 @@ void osm_infr_destroy( IN osm_infr_t* const p_infr ) { - cl_free(p_infr); + free(p_infr); } /********************************************************************** @@ -112,7 +112,7 @@ osm_infr_new( CL_ASSERT(p_infr_rec); - p_infr = (osm_infr_t*)cl_malloc( sizeof(osm_infr_t) ); + p_infr = (osm_infr_t*)malloc( sizeof(osm_infr_t) ); if( p_infr ) { osm_infr_construct( p_infr ); Index: osm/opensm/osm_service.c =================================================================== --- osm/opensm/osm_service.c (revision 7470) +++ osm/opensm/osm_service.c (working copy) @@ -49,9 +49,8 @@ # include #endif /* HAVE_CONFIG_H */ -#include +#include #include -#include #include #include @@ -70,7 +69,7 @@ void osm_svcr_destroy( IN osm_svcr_t* const p_svcr ) { - cl_free( p_svcr); + free( p_svcr); } /********************************************************************** @@ -102,7 +101,7 @@ osm_svcr_new( CL_ASSERT(p_svc_rec); - p_svcr = (osm_svcr_t*)cl_malloc( sizeof(*p_svcr) ); + p_svcr = (osm_svcr_t*)malloc( sizeof(*p_svcr) ); if( p_svcr ) { osm_svcr_construct( p_svcr ); Index: osm/opensm/osm_switch.c 
=================================================================== --- osm/opensm/osm_switch.c (revision 7470) +++ osm/opensm/osm_switch.c (working copy) @@ -51,8 +51,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -104,13 +104,15 @@ osm_switch_init( status = osm_fwd_tbl_init( &p_sw->fwd_tbl, p_si ); - p_sw->p_prof = cl_zalloc( sizeof(*p_sw->p_prof) * num_ports ); + p_sw->p_prof = malloc( sizeof(*p_sw->p_prof) * num_ports ); if( p_sw->p_prof == NULL ) { status = IB_INSUFFICIENT_MEMORY; goto Exit; } + memset( p_sw->p_prof, 0, sizeof(*p_sw->p_prof) * num_ports ); + status = osm_mcast_tbl_init( &p_sw->mcast_tbl, osm_node_get_num_physp( p_node ), cl_ntoh16( p_si->mcast_cap ) ); if( status != IB_SUCCESS ) @@ -131,7 +133,7 @@ osm_switch_destroy( { /* free memory to avoid leaks */ osm_mcast_tbl_destroy( &p_sw->mcast_tbl ); - cl_free( p_sw->p_prof ); + free( p_sw->p_prof ); osm_fwd_tbl_destroy( &p_sw->fwd_tbl ); osm_lid_matrix_destroy( &p_sw->lmx ); } @@ -143,7 +145,7 @@ osm_switch_delete( IN OUT osm_switch_t** const pp_sw ) { osm_switch_destroy( *pp_sw ); - cl_free( *pp_sw ); + free( *pp_sw ); *pp_sw = NULL; } @@ -157,9 +159,10 @@ osm_switch_new( ib_api_status_t status; osm_switch_t *p_sw; - p_sw = (osm_switch_t*)cl_zalloc( sizeof(*p_sw) ); + p_sw = (osm_switch_t*)malloc( sizeof(*p_sw) ); if( p_sw ) { + memset( p_sw, 0, sizeof(*p_sw) ); status = osm_switch_init( p_sw, p_node, p_madw ); if( status != IB_SUCCESS ) osm_switch_delete( &p_sw ); Index: osm/opensm/osm_sminfo_rcv.c =================================================================== --- osm/opensm/osm_sminfo_rcv.c (revision 7470) +++ osm/opensm/osm_sminfo_rcv.c (working copy) @@ -51,9 +51,9 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include -#include #include #include #include @@ -638,7 +638,7 @@ __osm_sminfo_rcv_process_get_response( p_sm = (osm_remote_sm_t*)cl_qmap_get( p_sm_tbl, port_guid ); if( p_sm == (osm_remote_sm_t*)cl_qmap_end( 
p_sm_tbl ) ) { - p_sm = cl_malloc( sizeof(*p_sm) ); + p_sm = malloc( sizeof(*p_sm) ); if( p_sm == NULL ) { osm_log( p_rcv->p_log, OSM_LOG_ERROR, Index: osm/opensm/osm_multicast.c =================================================================== --- osm/opensm/osm_multicast.c (revision 7470) +++ osm/opensm/osm_multicast.c (working copy) @@ -49,8 +49,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -107,7 +107,7 @@ osm_mgrp_destroy( /* destroy the mtree_node structure */ osm_mtree_destroy(p_mgrp->p_root); - cl_free(p_mgrp); + free(p_mgrp); } /********************************************************************** @@ -135,7 +135,7 @@ osm_mgrp_new( { osm_mgrp_t* p_mgrp; - p_mgrp = (osm_mgrp_t*)cl_malloc( sizeof(*p_mgrp) ); + p_mgrp = (osm_mgrp_t*)malloc( sizeof(*p_mgrp) ); if( p_mgrp ) osm_mgrp_init( p_mgrp, mlid ); Index: osm/opensm/osm_mtree.c =================================================================== --- osm/opensm/osm_mtree.c (revision 7470) +++ osm/opensm/osm_mtree.c (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. * Copyright (c) 1996-2003 Intel Corporation. All rights reserved. 
* @@ -50,7 +50,7 @@ # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include @@ -83,8 +83,8 @@ osm_mtree_node_new( { osm_mtree_node_t *p_mtn; - p_mtn = cl_malloc( sizeof(osm_mtree_node_t) + - sizeof(void*) * (osm_switch_get_num_ports( p_sw ) - 1) ); + p_mtn = malloc( sizeof(osm_mtree_node_t) + + sizeof(void*) * (osm_switch_get_num_ports( p_sw ) - 1) ); if( p_mtn != NULL ) osm_mtree_node_init( p_mtn, p_sw ); @@ -109,7 +109,7 @@ osm_mtree_destroy( (p_mtn->child_array[i] != OSM_MTREE_LEAF) ) osm_mtree_destroy(p_mtn->child_array[i]); - cl_free( p_mtn ); + free( p_mtn ); } /********************************************************************** Index: osm/opensm/osm_mcast_mgr.c =================================================================== --- osm/opensm/osm_mcast_mgr.c (revision 7470) +++ osm/opensm/osm_mcast_mgr.c (working copy) @@ -51,9 +51,9 @@ #endif /* HAVE_CONFIG_H */ #include +#include #include #include -#include #include #include #include @@ -88,9 +88,12 @@ __osm_mcast_work_obj_new( qlist. see cl_qlist_insert_tail(): CL_ASSERT(p_list_item->p_list != p_list) */ - p_obj = cl_zalloc( sizeof( *p_obj ) ); + p_obj = malloc( sizeof( *p_obj ) ); if( p_obj ) + { + memset( p_obj, 0, sizeof( *p_obj ) ); p_obj->p_port = (osm_port_t*)p_port; + } return( p_obj ); } @@ -101,7 +104,7 @@ static void __osm_mcast_work_obj_delete( IN osm_mcast_work_obj_t* p_wobj ) { - cl_free( p_wobj ); + free( p_wobj ); } /********************************************************************** @@ -123,7 +126,7 @@ __osm_mcast_mgr_purge_tree_node( } - cl_free( p_mtn ); + free( p_mtn ); } /********************************************************************** @@ -738,7 +741,7 @@ __osm_mcast_mgr_branch( TO DO - this list array could probably be moved inside the switch element to save on malloc thrashing. 
*/ - list_array = cl_zalloc( sizeof(cl_qlist_t) * max_children ); + list_array = malloc( sizeof(cl_qlist_t) * max_children ); if( list_array == NULL ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, @@ -748,6 +751,8 @@ __osm_mcast_mgr_branch( goto Exit; } + memset( list_array, 0, sizeof(cl_qlist_t) * max_children ); + for( i = 0; i < max_children; i++ ) cl_qlist_init( &list_array[i] ); @@ -875,7 +880,7 @@ __osm_mcast_mgr_branch( } } - cl_free( list_array ); + free( list_array ); Exit: OSM_LOG_EXIT( p_mgr->p_log ); return( p_mtn ); @@ -1395,7 +1400,7 @@ osm_mcast_mgr_dump_mcast_routes( goto Exit; file_name = - (char*)cl_malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); + (char*)malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); CL_ASSERT(file_name); @@ -1457,7 +1462,7 @@ osm_mcast_mgr_dump_mcast_routes( Exit: if (file_name) - cl_free(file_name); + free(file_name); OSM_LOG_EXIT( p_mgr->p_log ); } @@ -1639,7 +1644,7 @@ osm_mcast_mgr_process_mgrp_cb( memcpy(&mlid, &p_ctxt->mlid, sizeof(mlid)); /* we can destroy the context now */ - cl_free(p_ctxt); + free(p_ctxt); /* we need a lock to make sure the p_mgrp is not change other ways */ CL_PLOCK_EXCL_ACQUIRE( p_mgr->p_lock ); Index: osm/opensm/osm_sm.c =================================================================== --- osm/opensm/osm_sm.c (revision 7470) +++ osm/opensm/osm_sm.c (working copy) @@ -55,8 +55,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -261,7 +261,7 @@ osm_sm_destroy( cl_event_destroy( &p_sm->subnet_up_event ); if( p_sm->p_report_buf != NULL ) - cl_free( p_sm->p_report_buf ); + free( p_sm->p_report_buf ); osm_log( p_sm->p_log, OSM_LOG_SYS, "Exiting SM\n" ); /* Format Waived */ OSM_LOG_EXIT( p_sm->p_log ); @@ -295,7 +295,7 @@ osm_sm_init( p_sm->p_disp = p_disp; p_sm->p_lock = p_lock; - p_sm->p_report_buf = cl_malloc( OSM_REPORT_BUF_SIZE ); + p_sm->p_report_buf = malloc( OSM_REPORT_BUF_SIZE ); if( p_sm->p_report_buf == NULL ) { osm_log( p_sm->p_log, 
OSM_LOG_ERROR, @@ -596,7 +596,7 @@ __osm_sm_mgrp_connect( * isn't busy trying to do something else. */ ctx2 = - ( osm_mcast_mgr_ctxt_t * ) cl_malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); + ( osm_mcast_mgr_ctxt_t * ) malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); memcpy( &ctx2->mlid, &p_mgrp->mlid, sizeof( p_mgrp->mlid ) ); ctx2->req_type = req_type; ctx2->port_guid = port_guid; @@ -629,7 +629,7 @@ __osm_sm_mgrp_disconnect( * isn't busy trying to do something else. */ ctx2 = - ( osm_mcast_mgr_ctxt_t * ) cl_malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); + ( osm_mcast_mgr_ctxt_t * ) malloc( sizeof( osm_mcast_mgr_ctxt_t ) ); memcpy( &ctx2->mlid, &p_mgrp->mlid, sizeof( p_mgrp->mlid ) ); ctx2->req_type = OSM_MCAST_REQ_TYPE_LEAVE; ctx2->port_guid = port_guid; Index: osm/opensm/osm_lin_fwd_tbl.c =================================================================== --- osm/opensm/osm_lin_fwd_tbl.c (revision 7470) +++ osm/opensm/osm_lin_fwd_tbl.c (working copy) @@ -51,8 +51,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -78,7 +78,7 @@ osm_lin_tbl_new( so add 1 to the end of the range here for this assert. 
*/ CL_ASSERT( size <= IB_LID_UCAST_END_HO + 1 ); - p_tbl = (osm_lin_fwd_tbl_t*)cl_malloc( + p_tbl = (osm_lin_fwd_tbl_t*)malloc( __osm_lin_tbl_compute_obj_size( size ) ); /* @@ -98,6 +98,6 @@ void osm_lin_tbl_delete( IN osm_lin_fwd_tbl_t** const pp_tbl ) { - cl_free( *pp_tbl ); + free( *pp_tbl ); *pp_tbl = NULL; } Index: osm/opensm/osm_prtn.c =================================================================== --- osm/opensm/osm_prtn.c (revision 7470) +++ osm/opensm/osm_prtn.c (working copy) @@ -49,12 +49,12 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include #include #include -#include #include #include #include @@ -73,9 +73,11 @@ osm_prtn_t* osm_prtn_new( IN const char *name, IN const uint16_t pkey ) { - osm_prtn_t *p = cl_zalloc(sizeof(*p)); + osm_prtn_t *p = malloc(sizeof(*p)); if (!p) return NULL; + + memset(p, 0, sizeof(*p)); p->pkey = pkey; cl_map_construct(&p->full_guid_tbl); cl_map_init(&p->full_guid_tbl, 32); @@ -99,7 +101,7 @@ void osm_prtn_delete( cl_map_destroy(&p->full_guid_tbl); cl_map_remove_all(&p->part_guid_tbl); cl_map_destroy(&p->part_guid_tbl); - cl_free(p); + free(p); *pp_prtn = NULL; } Index: osm/opensm/osm_ucast_mgr.c =================================================================== --- osm/opensm/osm_ucast_mgr.c (revision 7470) +++ osm/opensm/osm_ucast_mgr.c (working copy) @@ -55,9 +55,9 @@ #endif /* HAVE_CONFIG_H */ #include +#include #include #include -#include #include #include #include @@ -231,7 +231,7 @@ osm_ucast_mgr_dump_ucast_routes( goto Exit; file_name = - (char*)cl_malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 10); + (char*)malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 10); CL_ASSERT(file_name); @@ -332,7 +332,7 @@ osm_ucast_mgr_dump_ucast_routes( Exit: if (file_name) - cl_free(file_name); + free(file_name); OSM_LOG_EXIT( p_mgr->p_log ); } @@ -642,7 +642,7 @@ __osm_ucast_mgr_process_port( OSM_LOG_ENTER( p_mgr->p_log, __osm_ucast_mgr_process_port ); - remote_sys_guids = cl_zalloc( sizeof(uint64_t) * 
lids_per_port ); + remote_sys_guids = malloc( sizeof(uint64_t) * lids_per_port ); if( remote_sys_guids == NULL ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, @@ -651,7 +651,9 @@ __osm_ucast_mgr_process_port( goto Exit; } - remote_node_guids = cl_zalloc( sizeof(uint64_t) * lids_per_port ); + memset( remote_sys_guids, 0, sizeof(uint64_t) * lids_per_port ); + + remote_node_guids = malloc( sizeof(uint64_t) * lids_per_port ); if( remote_node_guids == NULL ) { osm_log( p_mgr->p_log, OSM_LOG_ERROR, @@ -660,6 +662,8 @@ __osm_ucast_mgr_process_port( goto Exit; } + memset(remote_node_guids, 0, sizeof(uint64_t) * lids_per_port ); + osm_port_get_lid_range_ho( p_port, &min_lid_ho, &max_lid_ho ); /* If the lids are zero - then there was some problem with the initialization. @@ -789,9 +793,9 @@ __osm_ucast_mgr_process_port( Exit: if (remote_sys_guids) - cl_free(remote_sys_guids); + free(remote_sys_guids); if (remote_node_guids) - cl_free(remote_node_guids); + free(remote_node_guids); OSM_LOG_EXIT( p_mgr->p_log ); } Index: osm/opensm/osm_ucast_updn.c =================================================================== --- osm/opensm/osm_ucast_updn.c (revision 7470) +++ osm/opensm/osm_ucast_updn.c (working copy) @@ -50,7 +50,7 @@ # include #endif /* HAVE_CONFIG_H */ -#include +#include #include #include #include @@ -118,13 +118,15 @@ __updn_create_updn_next_step_t(IN updn_s { updn_next_step_t *p_next_step; - p_next_step = (updn_next_step_t*) cl_zalloc(sizeof(*p_next_step)); - CL_ASSERT (p_next_step != NULL); + p_next_step = (updn_next_step_t*) malloc(sizeof(*p_next_step)); + if (p_next_step) + { + memset(p_next_step, 0, sizeof(*p_next_step)); + p_next_step->state = state; + p_next_step->p_sw = p_sw; + } - p_next_step->state = state; - p_next_step->p_sw = p_sw; return p_next_step; - } /********************************************************************** @@ -142,7 +144,7 @@ __updn_update_rank( p_updn_rank = (updn_rank_t*) cl_qmap_get(p_guid_rank_tbl, guid_index); if (p_updn_rank == 
(updn_rank_t*) cl_qmap_end(p_guid_rank_tbl)) { - p_updn_rank = (updn_rank_t*) cl_malloc(sizeof(updn_rank_t)); + p_updn_rank = (updn_rank_t*) malloc(sizeof(updn_rank_t)); CL_ASSERT (p_updn_rank); p_updn_rank->rank = rank; @@ -181,7 +183,7 @@ __updn_bfs_by_node(IN osm_subn_t *p_subn OSM_LOG_ENTER( &(osm.log), __updn_bfs_by_node); /* Init the list pointers */ - p_nextList = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_nextList ); cl_list_init( p_nextList, 10 ); p_currList = p_nextList; @@ -293,7 +295,7 @@ __updn_bfs_by_node(IN osm_subn_t *p_subn "__updn_bfs_by_node:" "Starting a new iteration with %d elements in current list\n", cl_list_count(p_currList)); /* Init the switch directed list */ - p_nextList = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_nextList ); cl_list_init( p_nextList, 10 ); /* Go over all current list items till it's empty */ @@ -433,19 +435,19 @@ __updn_bfs_by_node(IN osm_subn_t *p_subn } } } - cl_free (p_updn_switch); + free (p_updn_switch); p_updn_switch = (updn_next_step_t*)cl_list_remove_head( p_currList ); } /* Cleanup p_currList */ cl_list_destroy( p_currList ); - cl_free (p_currList); + free (p_currList); /* Reassign p_currList to p_nextList */ p_currList = p_nextList; } /* Cleanup p_currList - Had the pointer to cl_list_t */ cl_list_destroy( p_currList ); - cl_free (p_currList); + free (p_currList); osm_log(&(osm.log), OSM_LOG_VERBOSE, "__updn_bfs_by_node:" "BFS the subnet ]\n"); @@ -471,22 +473,22 @@ updn_destroy( "guid = 0x%" PRIx64 " rank = %u\n", cl_ntoh64(cl_qmap_key(p_map_item)), ((updn_rank_t *)p_map_item)->rank); cl_qmap_remove_item( &p_updn->guid_rank_tbl, p_map_item); - cl_free( (updn_rank_t *)p_map_item); + free( (updn_rank_t *)p_map_item); p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl); } /* free the array of guids */ if (p_updn->updn_ucast_reg_inputs.guid_list) - 
cl_free(p_updn->updn_ucast_reg_inputs.guid_list); + free(p_updn->updn_ucast_reg_inputs.guid_list); /* destroy the list of root nodes */ while ((p_guid_list_item = cl_list_remove_head( p_updn->p_root_nodes ))) - cl_free( p_guid_list_item ); + free( p_guid_list_item ); cl_list_remove_all( p_updn->p_root_nodes ); cl_list_destroy( p_updn->p_root_nodes ); - cl_free ( p_updn->p_root_nodes ); - cl_free (p_updn); + free ( p_updn->p_root_nodes ); + free (p_updn); } updn_t* @@ -495,7 +497,9 @@ updn_construct(void) updn_t* p_updn; OSM_LOG_ENTER( &(osm.log), updn_construct); - p_updn = cl_zalloc(sizeof(updn_t)); + p_updn = malloc(sizeof(updn_t)); + if (p_updn) + memset(p_updn, 0, sizeof(updn_t)); OSM_LOG_EXIT( &(osm.log) ); return(p_updn); } @@ -522,7 +526,7 @@ updn_init( } p_updn->state = UPDN_INIT; cl_qmap_init( &p_updn->guid_rank_tbl); - p_list = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_list = (cl_list_t*)malloc(sizeof(cl_list_t)); if (!p_list) { status = IB_ERROR; @@ -563,7 +567,7 @@ updn_init( /* Skip Empty Lines anywhere in the file - only one char means the Null termination */ if (strlen(line) > 1) { - p_tmp = cl_malloc(sizeof(uint64_t)); + p_tmp = malloc(sizeof(uint64_t)); *p_tmp = strtoull(line, NULL, 16); cl_list_insert_tail(osm.p_updn_ucast_routing->p_root_nodes, p_tmp); } @@ -636,7 +640,7 @@ updn_subn_rank( "Ranking starts from GUID 0x%" PRIx64 "\n", root_guid); /* Init the list pointers */ - p_nextList = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_nextList ); cl_list_init( p_nextList, 10 ); p_currList = p_nextList; @@ -691,7 +695,7 @@ updn_subn_rank( while (!cl_is_list_empty(p_currList)) { rank++; - p_nextList = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_nextList = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_nextList ); cl_list_init( p_nextList, 10 ); p_physp = (osm_physp_t*)cl_list_remove_head( p_currList ); @@ -749,7 +753,7 @@ updn_subn_rank( } /* First free the 
allocation of cl_list pointer then reallocate */ cl_list_destroy( p_currList ); - cl_free(p_currList); + free(p_currList); /* p_currList is empty - need to assign it to p_nextList */ p_currList = p_nextList; } @@ -759,7 +763,7 @@ updn_subn_rank( "BFS the subnet ]\n"); cl_list_destroy( p_currList ); - cl_free(p_currList); + free(p_currList); /* Print Summary of ranking */ osm_log(&(osm.log), OSM_LOG_VERBOSE, @@ -901,7 +905,7 @@ osm_subn_calc_up_down_min_hop_table( "guid = 0x%" PRIx64 " rank = %u\n", cl_ntoh64(cl_qmap_key(p_map_item)), ((updn_rank_t *)p_map_item)->rank); cl_qmap_remove_item( &p_updn->guid_rank_tbl, p_map_item); - cl_free( (updn_rank_t *)p_map_item); + free( (updn_rank_t *)p_map_item); p_map_item = cl_qmap_head( &p_updn->guid_rank_tbl); } @@ -954,15 +958,18 @@ void __osm_updn_convert_list2array(IN up p_updn->updn_ucast_reg_inputs.num_guids = cl_list_count( p_updn->p_root_nodes); if (p_updn->updn_ucast_reg_inputs.guid_list) - cl_free(p_updn->updn_ucast_reg_inputs.guid_list); - p_updn->updn_ucast_reg_inputs.guid_list = (uint64_t *)cl_zalloc( + free(p_updn->updn_ucast_reg_inputs.guid_list); + p_updn->updn_ucast_reg_inputs.guid_list = (uint64_t *)malloc( p_updn->updn_ucast_reg_inputs.num_guids*sizeof(uint64_t)); + if (p_updn->updn_ucast_reg_inputs.guid_list) + memset(p_updn->updn_ucast_reg_inputs.guid_list, 0, + p_updn->updn_ucast_reg_inputs.num_guids*sizeof(uint64_t)); if (!cl_is_list_empty(p_updn->p_root_nodes)) { while( (p_guid = (uint64_t*)cl_list_remove_head(p_updn->p_root_nodes)) ) { p_updn->updn_ucast_reg_inputs.guid_list[i] = *p_guid; - cl_free(p_guid); + free(p_guid); i++; } max_num = i; @@ -1033,7 +1040,7 @@ osm_updn_find_root_nodes_by_min_hop( OUT cl_map_init( &ca_by_lid_map, 10 ); /* EZ: - p_ca_list = (cl_list_t*)cl_malloc(sizeof(cl_list_t)); + p_ca_list = (cl_list_t*)malloc(sizeof(cl_list_t)); cl_list_construct( p_ca_list ); cl_list_init( p_ca_list, 10 ); */ @@ -1052,7 +1059,7 @@ osm_updn_find_root_nodes_by_min_hop( OUT self_lid_ho = 
cl_ntoh16( osm_physp_get_base_lid(p_physp) ); numCas++; /* EZ: - self = cl_malloc(sizeof(uint16_t)); + self = malloc(sizeof(uint16_t)); *self = self_lid_ho; cl_list_insert_tail(p_ca_list, self); */ @@ -1120,7 +1127,7 @@ osm_updn_find_root_nodes_by_min_hop( OUT if ( p_updn_hist == (updn_hist_t*)cl_qmap_end( &min_hop_hist)) { /* New entry in the histogram , first create it */ - p_updn_hist = (updn_hist_t*) cl_malloc(sizeof(updn_hist_t)); + p_updn_hist = (updn_hist_t*) malloc(sizeof(updn_hist_t)); CL_ASSERT (p_updn_hist); p_updn_hist->bar_value = 1; cl_qmap_insert(&min_hop_hist, (uint64_t)hop_val, &p_updn_hist->map_item); @@ -1176,14 +1183,14 @@ osm_updn_find_root_nodes_by_min_hop( OUT while ( p_updn_hist != (updn_hist_t*)cl_qmap_end( &min_hop_hist ) ) { cl_qmap_remove_item( &min_hop_hist, (cl_map_item_t*)p_updn_hist ); - cl_free( p_updn_hist ); + free( p_updn_hist ); p_updn_hist = (updn_hist_t*) cl_qmap_head( &min_hop_hist ); } /* If thd conditions are valid insert the root node to the list */ if ( (numHopBarsOverThd1 == 1) && (numHopBarsOverThd2 == 1) ) { - p_guid = cl_malloc(sizeof(uint64_t)); + p_guid = malloc(sizeof(uint64_t)); *p_guid = cl_ntoh64(osm_node_get_node_guid(p_sw->p_node)); osm_log (&(osm.log), OSM_LOG_DEBUG, "osm_updn_find_root_nodes_by_min_hop: " Index: osm/opensm/osm_mcast_tbl.c =================================================================== --- osm/opensm/osm_mcast_tbl.c (revision 7470) +++ osm/opensm/osm_mcast_tbl.c (working copy) @@ -51,8 +51,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include #include #include @@ -102,12 +102,14 @@ osm_mcast_tbl_init( since it is (and must be) defined that way the table structure in order to create a pointer to a two dimensional array. 
*/ - p_tbl->p_mask_tbl = cl_zalloc( p_tbl->num_entries * - (IB_MCAST_POSITION_MAX + 1) * IB_MCAST_MASK_SIZE / 8 ); + p_tbl->p_mask_tbl = malloc( p_tbl->num_entries * + (IB_MCAST_POSITION_MAX + 1) * IB_MCAST_MASK_SIZE / 8 ); if( p_tbl->p_mask_tbl == NULL ) return( IB_INSUFFICIENT_MEMORY ); + memset(p_tbl->p_mask_tbl, 0, + p_tbl->num_entries * (IB_MCAST_POSITION_MAX + 1) * IB_MCAST_MASK_SIZE / 8 ); return( IB_SUCCESS ); } @@ -117,7 +119,7 @@ void osm_mcast_tbl_destroy( IN osm_mcast_tbl_t* const p_tbl ) { - cl_free( p_tbl->p_mask_tbl ); + free( p_tbl->p_mask_tbl ); } /********************************************************************** Index: osm/opensm/osm_pkey.c =================================================================== --- osm/opensm/osm_pkey.c (revision 7470) +++ osm/opensm/osm_pkey.c (working copy) @@ -52,7 +52,6 @@ #include #include #include -#include #include #include #include @@ -81,12 +80,12 @@ void osm_pkey_tbl_destroy( num_blocks = (uint16_t)(cl_ptr_vector_get_size( &p_pkey_tbl->blocks )); for (i = 0; i < num_blocks; i++) - cl_free(cl_ptr_vector_get( &p_pkey_tbl->blocks, i )); + free(cl_ptr_vector_get( &p_pkey_tbl->blocks, i )); cl_ptr_vector_destroy( &p_pkey_tbl->blocks ); num_blocks = (uint16_t)(cl_ptr_vector_get_size( &p_pkey_tbl->new_blocks )); for (i = 0; i < num_blocks; i++) - cl_free(cl_ptr_vector_get( &p_pkey_tbl->new_blocks, i )); + free(cl_ptr_vector_get( &p_pkey_tbl->new_blocks, i )); cl_ptr_vector_destroy( &p_pkey_tbl->new_blocks ); cl_map_remove_all( &p_pkey_tbl->keys ); @@ -120,9 +119,10 @@ void osm_pkey_tbl_sync_new_blocks( if ( b < new_blocks ) p_new_block = cl_ptr_vector_get(&p_pkey_tbl->new_blocks, b); else { - p_new_block = (ib_pkey_table_t *)cl_zalloc(sizeof(*p_new_block)); + p_new_block = (ib_pkey_table_t *)malloc(sizeof(*p_new_block)); if (!p_new_block) break; + memset(p_new_block, 0, sizeof(*p_new_block)); cl_ptr_vector_set(&((osm_pkey_tbl_t *)p_pkey_tbl)->new_blocks, b, p_new_block); } memcpy(p_new_block, p_block, 
sizeof(*p_new_block)); @@ -150,7 +150,9 @@ int osm_pkey_tbl_set( if ( !p_pkey_block ) { - p_pkey_block = (ib_pkey_table_t *)cl_zalloc(sizeof(ib_pkey_table_t)); + p_pkey_block = (ib_pkey_table_t *)malloc(sizeof(ib_pkey_table_t)); + if (p_pkey_block) + memset(p_pkey_block, 0, sizeof(ib_pkey_table_t)); cl_ptr_vector_set( &p_pkey_tbl->blocks, block, p_pkey_block ); } Index: osm/opensm/osm_mcm_port.c =================================================================== --- osm/opensm/osm_mcm_port.c (revision 7470) +++ osm/opensm/osm_mcm_port.c (working copy) @@ -51,8 +51,8 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include -#include #include /********************************************************************** @@ -105,7 +105,7 @@ osm_mcm_port_new( { osm_mcm_port_t* p_mcm; - p_mcm = cl_malloc( sizeof(*p_mcm) ); + p_mcm = malloc( sizeof(*p_mcm) ); if( p_mcm ) { osm_mcm_port_init( p_mcm, p_port_gid, @@ -124,5 +124,5 @@ osm_mcm_port_delete( CL_ASSERT( p_mcm ); osm_mcm_port_destroy( p_mcm ); - cl_free( p_mcm ); + free( p_mcm ); } Index: osm/opensm/osm_log.c =================================================================== --- osm/opensm/osm_log.c (revision 7470) +++ osm/opensm/osm_log.c (working copy) @@ -58,7 +58,6 @@ #include #include #include -#include #ifndef WIN32 #include @@ -142,15 +141,6 @@ osm_log( /* SYS messages go to the log anyways */ if (p_log->level & verbosity) { -#ifdef _MEM_DEBUG_MODE_ - /* If we are running in MEM_DEBUG_MODE then - the cl_mem_check will be called on every run */ - if (cl_mem_check() == FALSE) - { - fprintf( p_log->out_port, "*** MEMORY ERROR!!! 
***\n" ); - CL_ASSERT(0); - } -#endif va_start( args, p_str ); vsprintf( buffer, p_str, args ); Index: osm/opensm/osm_db_pack.c =================================================================== --- osm/opensm/osm_db_pack.c (revision 7470) +++ osm/opensm/osm_db_pack.c (working copy) @@ -40,9 +40,9 @@ #endif /* HAVE_CONFIG_H */ #include -#include #include #include + static inline void __osm_pack_guid(uint64_t guid, char *p_guid_str) { @@ -110,7 +110,7 @@ osm_db_guid2lid_guids( while ( (p_key = cl_list_remove_head( &keys )) != NULL ) { - p_guid_elem = (osm_db_guid_elem_t*)cl_malloc(sizeof(osm_db_guid_elem_t)); + p_guid_elem = (osm_db_guid_elem_t*)malloc(sizeof(osm_db_guid_elem_t)); CL_ASSERT( p_guid_elem != NULL ); p_guid_elem->guid = __osm_unpack_guid(p_key); Index: osm/opensm/main.c =================================================================== --- osm/opensm/main.c (revision 7470) +++ osm/opensm/main.c (working copy) @@ -59,7 +59,6 @@ #include #include #include -#include #include #include #include @@ -292,7 +291,6 @@ show_usage(void) " -d1 - Force single threaded dispatching\n" " -d2 - Force log flushing after each log message\n" " -d3 - Disable multicast support\n" - " -d4 - Put OpenSM in memory tracking mode\n" " -d10 - Put OpenSM in testability mode\n" " Without -d, no debug options are enabled\n\n" ); printf( "-h\n" @@ -518,7 +516,6 @@ main( uint32_t log_flags = OSM_LOG_DEFAULT_LEVEL; uint32_t temp, dbg_lvl; boolean_t run_once_flag = FALSE; - boolean_t mem_track = FALSE; int32_t vendor_debug = 0; uint32_t next_option; #if 0 @@ -692,10 +689,10 @@ main( printf(" Debug mode: Disable multicast support\n"); opt.disable_multicast = TRUE; } - else if(dbg_lvl == 4) - { - mem_track = TRUE; - } + /* + * NOTE: Debug level 4 used to be used for memory tracking + * but this is now deprecated + */ else if(dbg_lvl == 5) { vendor_debug++; @@ -825,9 +822,6 @@ main( /* Done with options description */ printf("-------------------------------------------------\n"); - if 
(mem_track) - __cl_mem_track(TRUE); - opt.log_flags = log_flags; if (vendor_debug) @@ -952,8 +946,6 @@ main( Exit: osm_opensm_destroy( &osm ); - if (mem_track) cl_mem_display(); - complib_exit(); exit( 0 ); Index: osm/opensm/osm_sa_mcmember_record.c =================================================================== --- osm/opensm/osm_sa_mcmember_record.c (revision 7470) +++ osm/opensm/osm_sa_mcmember_record.c (working copy) @@ -55,9 +55,9 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include -#include #include #include #include @@ -325,7 +325,9 @@ __get_new_mlid( /* track all used mlids in the array (by mlid index) */ used_mlids_array = - (uint8_t *)cl_zalloc(sizeof(uint8_t)*max_num_mlids); + (uint8_t *)malloc(sizeof(uint8_t)*max_num_mlids); + if (used_mlids_array) + memset(used_mlids_array, 0, sizeof(uint8_t)*max_num_mlids); if (!used_mlids_array) return 0; @@ -383,7 +385,7 @@ __get_new_mlid( mlid = 0; } - cl_free(used_mlids_array); + free(used_mlids_array); Exit: OSM_LOG_EXIT(p_rcv->p_log); Index: osm/opensm/osm_drop_mgr.c =================================================================== --- osm/opensm/osm_drop_mgr.c (revision 7470) +++ osm/opensm/osm_drop_mgr.c (working copy) @@ -53,7 +53,6 @@ #include #include -#include #include #include #include @@ -196,7 +195,7 @@ __osm_drop_mgr_remove_port( "__osm_drop_mgr_remove_port: " "Cleaned sm for port guid\n" ); - cl_free(p_sm); + free(p_sm); } osm_port_get_lid_range_ho( p_port, &min_lid_ho, &max_lid_ho ); Index: osm/opensm/osm_lid_mgr.c =================================================================== --- osm/opensm/osm_lid_mgr.c (revision 7470) +++ osm/opensm/osm_lid_mgr.c (working copy) @@ -90,9 +90,9 @@ # include #endif /* HAVE_CONFIG_H */ +#include #include #include -#include #include #include #include @@ -137,7 +137,7 @@ osm_lid_mgr_destroy( p_item = cl_qlist_remove_head( &p_mgr->free_ranges ); while ( p_item != cl_qlist_end( &p_mgr->free_ranges ) ) { - cl_free((osm_lid_mgr_range_t *)p_item); 
+ free((osm_lid_mgr_range_t *)p_item); p_item = cl_qlist_remove_head( &p_mgr->free_ranges ); } OSM_LOG_EXIT( p_mgr->p_log ); @@ -399,7 +399,7 @@ __osm_lid_mgr_init_sweep( p_item = cl_qlist_remove_head( &p_mgr->free_ranges ); while ( p_item != cl_qlist_end( &p_mgr->free_ranges ) ) { - cl_free( (osm_lid_mgr_range_t *)p_item ); + free( (osm_lid_mgr_range_t *)p_item ); p_item = cl_qlist_remove_head( &p_mgr->free_ranges ); } @@ -417,7 +417,7 @@ __osm_lid_mgr_init_sweep( "__osm_lid_mgr_init_sweep: " "Skipping all lids as we are reassigning them\n"); p_range = - (osm_lid_mgr_range_t *)cl_malloc(sizeof(osm_lid_mgr_range_t)); + (osm_lid_mgr_range_t *)malloc(sizeof(osm_lid_mgr_range_t)); p_range->min_lid = 1; goto AfterScanningLids; } @@ -596,7 +596,7 @@ __osm_lid_mgr_init_sweep( else { p_range = - (osm_lid_mgr_range_t *)cl_malloc(sizeof(osm_lid_mgr_range_t)); + (osm_lid_mgr_range_t *)malloc(sizeof(osm_lid_mgr_range_t)); p_range->min_lid = lid; p_range->max_lid = lid; } @@ -622,7 +622,7 @@ __osm_lid_mgr_init_sweep( if (!p_range) { p_range = - (osm_lid_mgr_range_t *)cl_malloc(sizeof(osm_lid_mgr_range_t)); + (osm_lid_mgr_range_t *)malloc(sizeof(osm_lid_mgr_range_t)); /* The p_range can be NULL in one of 2 cases: 1. If max_defined_lid == 0. In this case, we want the entire range. From eitan at mellanox.co.il Thu May 25 06:21:24 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 25 May 2006 16:21:24 +0300 Subject: [openib-general] RE: [PATCHv2] OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t:Fix NULL ptr issue Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368722@mtlexch01.mtl.com> Thanks Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. 
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Thursday, May 25, 2006 1:26 PM > To: openib-general at openib.org > Cc: Eitan Zahavi > Subject: [PATCHv2] > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t:Fix NULL ptr issue > > OpenSM/osm_ucast_updn.c::__updn_create_updn_next_step_t: Fix NULL ptr > issue in non debug builds > > Signed-off-by: Hal Rosenstock > > Index: opensm/osm_ucast_updn.c > =================================================================== > --- opensm/osm_ucast_updn.c (revision 7435) > +++ opensm/osm_ucast_updn.c (working copy) > @@ -119,12 +119,12 @@ __updn_create_updn_next_step_t(IN updn_s > updn_next_step_t *p_next_step; > > p_next_step = (updn_next_step_t*) cl_zalloc(sizeof(*p_next_step)); > - CL_ASSERT (p_next_step != NULL); > - > - p_next_step->state = state; > - p_next_step->p_sw = p_sw; > + if (p_next_step) > + { > + p_next_step->state = state; > + p_next_step->p_sw = p_sw; > + } > return p_next_step; > - > } > > /********************************************************************** > From jlentini at netapp.com Thu May 25 06:36:09 2006 From: jlentini at netapp.com (James Lentini) Date: Thu, 25 May 2006 09:36:09 -0400 (EDT) Subject: [openib-general] Re: which dapl/udapl changes in trunk should be imported into OFED branch? (patch enclosed) In-Reply-To: <200605251024.19848.jackm@mellanox.co.il> References: <200605241806.48546.jackm@mellanox.co.il> <200605251024.19848.jackm@mellanox.co.il> Message-ID: On Thu, 25 May 2006, Jack Morgenstein wrote: > On Thursday 25 May 2006 01:22, James Lentini wrote: > > On Wed, 24 May 2006, Jack Morgenstein wrote: > > > Hi, > > > > > > Below is a patch file of differences between the OFED dapl library > > > and the openib main trunk dapl library. > > > > > > Please indicate which of the dapl library changes are necessary for > > > the Intel MPI to work correctly in OFED. > > > > How recent is the ucm code in OFED? 
>
> I'm not really sure what you mean when you say "ucm code", so I'll try to
> cover all the bases.
>
> PLEASE respond ASAP, since this is holding up release of OFED RC5. We need to
> build TODAY for testing over our weekend (which is Friday and Saturday).
>
> Userspace:
>
> 1. libibcm: trunk most recent (includes "ib_xxxx" to "ibv_xxx" name changes).
> 2. libibrdma: trunk rev 7079 (May 10). subsequent fixes to cmatose and rping
>    not included. Also, the OFED librdmacm.spec.in file has REVISION 2, while
>    the trunk has REVISION 1 -- we'll fix this.

With librdmacm you should be using the current sources in the trunk
(revision 7141, May 12)

> Kernel:

As long as the libibcm and librdmacm are working with your kernel,
uDAPL should work with the kernel.

I'm not familiar with what is needed by Intel MPI. My
understanding is that Intel MPI works with revision 7141, but you
should confirm that with Arlin.

From eli at mellanox.co.il Thu May 25 06:37:14 2006
From: eli at mellanox.co.il (Eli Cohen)
Date: Thu, 25 May 2006 16:37:14 +0300
Subject: [openib-general] Re: [IPoIB] executing iperf over IPoIB causes to multicast (IP) packets to be recieved out-of-order
In-Reply-To: <20060524170213.GG21266@mellanox.co.il>
References: <200605241851.57398.dotanb@mellanox.co.il> <1148486354.4470.122283.camel@hal.voltaire.com> <20060524170213.GG21266@mellanox.co.il>
Message-ID: <1148564234.9387.6.camel@mtls03.yok.mtl.com>

> This is on back to back. I don't think hardware does this - this is
> ipoib software thing.
>

I saw this problem with unicast UD datagrams as well. This was on
Fedora C4 kernel 2.6.11-1.1369_FC4smp. I verified that the packets
arrived in order just before calling netif_rx_ni() by peeking into the
ip and udp layers. After that I tried this on kernel 2.6.16.17 and
there were no out of order reports. So I guess this was a Linux
networking stack problem that was resolved in newer kernels.
From tziporet at mellanox.co.il Thu May 25 07:11:52 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 25 May 2006 17:11:52 +0300
Subject: [openib-general] OFED-1.0-rc5 is available
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7058@mtlexch01.mtl.com>

Hi All,

We have prepared OFED 1.0 RC5.

Release location: https://openib.org/svn/gen2/branches/1.0/ofed/releases
File: OFED-1.0-rc5.tgz

BUILD_ID:
OFED-1.0-rc5

openib-1.0 (REV=7490)
# User space
https://openib.org/svn/gen2/branches/1.0/src/userspace
# Kernel space
https://openib.org/svn/gen2/branches/1.0/ofed/tags/rc5/linux-kernel
Git:
ref: refs/heads/for-2.6.17

# MPI
mpi_osu-0.9.7-mlx2.1.0.tgz
openmpi-1.1a7-1.src.rpm
mpitests-1.0-0.src.rpm

OSes:
* RH EL4 up2: 2.6.9-22.ELsmp
* RH EL4 up3: 2.6.9-34.ELsmp
* Fedora C4: 2.6.11-1.1369_FC4
* SLES10 beta 10: 2.6.16-20-smp
* SUSE 10 Pro: 2.6.13-15-smp
* kernel.org: 2.6.16.x

Systems:
* x86_64
* x86
* ia64
* ppc64

Main changes from RC4:
1. SDP - new code is now available - see details and limitations below
2. SRP - is now available on RHEL4 up2 and up3
3. Open MPI - new package based on 1.1a7-1
4. OSU-MPI - SRQ changed to be run time option
5. Added support for building RPMs by non-root user (using build.sh script)
6. PPC64: all binaries and libraries will be compiled 64-bit.
   libsdp will be compiled 32-bit also.
   MPI OSU and Open MPI are compiling on PPC64
7. Installation enhancements:
   a. Removing link from /usr/bin to /bin. Instead added
      /etc/profile.d/ofed.sh and /etc/profile.d/ofed.csh in order to
      update PATH variable with /bin. To update the PATH variable
      logout/login or 'source /etc/profile.d/ofed.(c)sh' is required.
   b. Added /etc/ld.so.conf.d/ofed.conf, instead of updating the system
      ld.so.conf.
8. ipath driver compiled on all systems (64-bit only).
9. Bug Fixes.

Package limitations:
1. iSER is working on SuSE SLES 10 Beta8 only

SDP Details:

Current SDP limitations:
* SDP currently does not support sending/receiving out of band data (MSG_OOB).
* Generally, SDP supports only SOL_SOCKET socket options.
* The following options can be set but actual support is missing:
  o SO_KEEPALIVE - no keepalives are sent
  o SO_OOBINLINE - out of band data is not supported
* SDP currently supports setting the following SOL_TCP socket options:
  o TCP_NODELAY, TCP_CORK - but actual support for these options is
    still missing
* SDP currently does not handle Zcopy mode messages correctly and does
  not set MaxAdverts properly in HH/HAH messages.
* SDP currently does not support IPv6 addressing.

The following applications have been tested to work with SDP:
* iperf
* net_perf
* netpipe
* ttcp
* ssh
* mySQL
* Java apps over GIJ (GNU libgcj)

The following applications are known not to work:
* telnet
* ftp
* apache web server

SDP performance:

TCP STREAM TEST to 11.4.3.68
Recv   Send   Send                         Utilization     Service Demand
Socket Socket Message Elapsed              Send    Recv    Send    Recv
Size   Size   Size    Time     Throughput  local   remote  local   remote
bytes  bytes  bytes   secs.    MBytes /s   % T     % T     us/KB   us/KB

118784 118784 118784  10.00    801.49      70.19   71.39   3.421   3.480

OFED components tested by Mellanox:
* Verbs over mthca
* IPoIB
* OpenSM
* OSU-MPI
* SRP
* SDP
* IB administration utils (ibutils)

Please send us any issues you encounter and/or test results.

Thanks
Tziporet & Vlad

Tziporet Koren
Software Director
Mellanox Technologies
mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380

From mst at mellanox.co.il Thu May 25 08:18:31 2006
From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 25 May 2006 18:18:31 +0300 Subject: [openib-general] [PATCH] CMA: fix port 2 loopback problems In-Reply-To: <4460EC93.4090307@ichips.intel.com> References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <4460EC93.4090307@ichips.intel.com> Message-ID: <20060525151831.GW21266@mellanox.co.il> Quoting r. Sean Hefty : > Until the underlying IB stack supports loopback connections on a non-active > port, my thinking is to have the RDMA CM select the first active port when > connecting in loopback. OK. How's this? --- Fix CMA for loopback configurations: in cma_bind_loopback, make sure sa query is performed from an active port. Signed-off-by: Ali Ayoub Signed-off-by: Michael S. Tsirkin Index: openib_gen2/drivers/infiniband/core/cma.c =================================================================== --- openib_gen2.orig/drivers/infiniband/core/cma.c 2006-05-25 16:40:46.000000000 +0300 +++ openib_gen2/drivers/infiniband/core/cma.c 2006-05-25 19:27:19.000000000 +0300 @@ -1272,28 +1272,39 @@ EXPORT_SYMBOL(rdma_resolve_route); static int cma_bind_loopback(struct rdma_id_private *id_priv) { struct cma_device *cma_dev; + struct ib_port_attr port_attr; union ib_gid *gid; u16 pkey; int ret; + u8 p; mutex_lock(&lock); - if (list_empty(&dev_list)) { + list_for_each_entry(cma_dev, &dev_list, list) + for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p) + if (!ib_query_port (cma_dev->device, p, &port_attr) && + port_attr.state == IB_PORT_ACTIVE) + goto port_found; + + if (!list_empty(&dev_list)) { + p = 1; + cma_dev = list_entry(dev_list.next, struct cma_device, list); + } else { ret = -ENODEV; goto out; } - cma_dev = list_entry(dev_list.next, struct cma_device, list); +port_found: gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr); - ret = ib_get_cached_gid(cma_dev->device, 1, 0, gid); + ret = 
ib_get_cached_gid(cma_dev->device, p, 0, gid); if (ret) goto out; - ret = ib_get_cached_pkey(cma_dev->device, 1, 0, &pkey); + ret = ib_get_cached_pkey(cma_dev->device, p, 0, &pkey); if (ret) goto out; ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey); - id_priv->id.port_num = 1; + id_priv->id.port_num = p; cma_attach_to_dev(id_priv, cma_dev); out: mutex_unlock(&lock); -- MST From jackm at mellanox.co.il Thu May 25 08:22:52 2006 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 25 May 2006 18:22:52 +0300 Subject: [openib-general] libibcm and librdmacm are broken following removal of libsysfs from libibverbs Message-ID: <200605251822.52340.jackm@mellanox.co.il> Roland's patch removing libsysfs dependency from libibverbs (SVN 7484) has resulted in libibcm and librdmacm having compilation errors. Cause: These libs still depend on libsysfs, and leaned on libibverbs also depending on libsysfs specifically, userspace file verbs.h no longer does #include If libibcm and librdmacm continue to depend on libsysfs, they must do this include themselves (say in files cm.c and cma.c). I tried that, but this did not fix the problem entirely. 
Still get the following: make[2]: Entering directory `/tmp/openib_gen2-20060525-1700_check/src/userspace/librdmacm' /bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -D_GNU_SOURCE -g -O2 -L../libibverbs/src -libverbs -o examples/ucmatose cmatose.o ./src/librdmacm.la gcc -g -Wall -D_GNU_SOURCE -g -O2 -o examples/.libs/ucmatose cmatose.o -L/tmp/openib_gen2-20060525-1700_check/src/userspace/libibverbs/src ./src/.libs/librdmacm.so /tmp/openib_gen2-20060525-1700_check/src/userspace/libibverbs/src/.libs/libibverbs.so -lpthread -ldl ./src/.libs/librdmacm.so: undefined reference to `_dlist_mark_move' ./src/.libs/librdmacm.so: undefined reference to `dlist_destroy' ./src/.libs/librdmacm.so: undefined reference to `sysfs_get_mnt_path' ./src/.libs/librdmacm.so: undefined reference to `dlist_push' ./src/.libs/librdmacm.so: undefined reference to `dlist_start' ./src/.libs/librdmacm.so: undefined reference to `dlist_new' ./src/.libs/librdmacm.so: undefined reference to `sysfs_read_attribute_value' collect2: ld returned 1 exit status Please coordinate the libsysfs dependency changes, and please fix the trunk (include fixes and make fixes). Thanks. From rdreier at cisco.com Thu May 25 08:31:51 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 08:31:51 -0700 Subject: [openib-general] Re: ucm: 2.6.17 GIT vs openib trunk issues in file ucm.c In-Reply-To: <200605251033.23776.jackm@mellanox.co.il> (Jack Morgenstein's message of "Thu, 25 May 2006 10:33:23 +0300") References: <200605251033.23776.jackm@mellanox.co.il> Message-ID: Jack> 1. I notice that in the GIT version of core/ucm.c, the trunk Jack> updates below were not taken into the 2.6.17 GIT. Is there Jack> a reason? "Because no one sent the patches to me" is the short answer. - R. From mst at mellanox.co.il Thu May 25 08:39:22 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin)
Date: Thu, 25 May 2006 18:39:22 +0300
Subject: [openib-general] Re: ucm: 2.6.17 GIT vs openib trunk issues in file ucm.c
In-Reply-To:
References: <200605251033.23776.jackm@mellanox.co.il>
Message-ID: <20060525153922.GY21266@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: ucm: 2.6.17 GIT vs openib trunk issues in file ucm.c
>
>     Jack> 1. I notice that in the GIT version of core/ucm.c, the trunk
>     Jack> updates below were not taken into the 2.6.17 GIT. Is there
>     Jack> a reason?
>
> "Because no one sent the patches to me" is the short answer.

Right :) But these things make sense for 2.6.17, don't they?

-- 
MST

From jackm at mellanox.co.il Thu May 25 08:41:16 2006
From: jackm at mellanox.co.il (Jack Morgenstein)
Date: Thu, 25 May 2006 18:41:16 +0300
Subject: [openib-general] Re: which dapl/udapl changes in trunk should be imported into OFED branch? (patch enclosed)
In-Reply-To:
References: <200605241806.48546.jackm@mellanox.co.il> <200605251024.19848.jackm@mellanox.co.il>
Message-ID: <200605251841.16994.jackm@mellanox.co.il>

On Thursday 25 May 2006 16:36, James Lentini wrote:
> On Thu, 25 May 2006, Jack Morgenstein wrote:
> > 1. libibcm: trunk most recent (includes "ib_xxxx" to "ibv_xxx" name
> > changes). 2. libibrdma: trunk rev 7079 (May 10). subsequent fixes to
> > cmatose and rping not included. Also, the OFED librdmacm.spec.in file
> > has REVISION 2, while the trunk has REVISION 1 -- we'll fix this.
>
> With librdmacm you should be using the current sources in the trunk
> (revision 7141, May 12)

There were no librdmacm changes on May 11 and 12, so we're OK (i.e., there
were no librdmacm checkins between 7079 and 7141; the next most recent
librdmacm checkin was 7298).

>
> > Kernel:
>
> As long as the libibcm and librdmacm are working with your kernel,
> uDAPL should work with the kernel.
>
> I'm not familiar with what is needed by Intel MPI. My
> understanding is that Intel MPI works with revision 7141, but you
> should confirm that with Arlin.

Arlin, could you please indicate which is the earliest revision that Intel MPI
works with?

- Jack

From rdreier at cisco.com Thu May 25 08:46:21 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 25 May 2006 08:46:21 -0700
Subject: [openib-general] Re: ucm: 2.6.17 GIT vs openib trunk issues in file ucm.c
In-Reply-To: <20060525153922.GY21266@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 25 May 2006 18:39:22 +0300")
References: <200605251033.23776.jackm@mellanox.co.il> <20060525153922.GY21266@mellanox.co.il>
Message-ID:

    Michael> Right :) But these things make sense for 2.6.17, don't
    Michael> they?

I think semaphore->mutex conversions can wait for 2.6.18.

I don't know if the other change is a bug or not -- maybe Sean remembers?

 - R.

From mst at mellanox.co.il Thu May 25 08:53:22 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 25 May 2006 18:53:22 +0300
Subject: [openib-general] Re: ucm: 2.6.17 GIT vs openib trunk issues in file ucm.c
In-Reply-To:
References: <200605251033.23776.jackm@mellanox.co.il> <20060525153922.GY21266@mellanox.co.il>
Message-ID: <20060525155322.GZ21266@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: ucm: 2.6.17 GIT vs openib trunk issues in file ucm.c
>
>     Michael> Right :) But these things make sense for 2.6.17, don't
>     Michael> they?
>
> I think semaphore->mutex conversions can wait for 2.6.18.

OK. Sean, could you generate the patches please?

> I don't know if the other change is a bug or not -- maybe Sean remembers?

It does look like one ...

-- 
MST

From Don.Dhondt at Bull.com Thu May 25 08:55:00 2006
From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com)
Date: Thu, 25 May 2006 08:55:00 -0700
Subject: [openib-general] Running MVAPICH2 with SLURM Process Manager
In-Reply-To: <200605250324.k4P3OleE004439@xi.cse.ohio-state.edu>
Message-ID:

Thanks for the reply Dr. Panda. I did notice the formal release on the
openib forum and we will move to that release today.
We made a couple of attempts at rebuilding mvapich2 and our symptoms
changed. Maybe for the better, but still not good results. In our last
attempt we disabled the compile option "USE_MPD_RING"
(HAVE_MPD_RING=""). It seemed to get further but then failed with a
"cannot create cq" error message. We are obviously failing now in the
InfiniBand code. The perplexing thing is that the applications work
when run with mpiexec (outside of slurm) and have the MPD daemons
running.

The latest suggestion from LLNL is to make sure we have unlimited max
locked memory for our MPI tasks with:

srun sh -c 'ulimit -l'

Below are the latest "traces" of the error.

> (2) We tried building without USE_MPD_RING and the test now fails in
> MPI_Init:
> -----------------
> 1: slurmd[molson]: task_pre_launch: 3.0, task 1
> 1: In: PMI_Init
> 1: In: PMI_Get_rank
> 1: In: PMI_Get_size
> 1: In: PMI_Get_appnum
> 1: In: PMI_Get_id_length_max
> 1: In: PMI_Get_id
> 1: In: PMI_KVS_Get_name_length_max
> 1: In: PMI_KVS_Get_my_name
> 1: cannot create cq
> 1: Fail to init hca
> 1: Fatal error in MPI_Init: Other MPI error, error stack:
> 1: MPIR_Init_thread(225): Initialization failed
> 1: MPID_Init(81)........: channel initialization failed
> -------------
> We checked the Troubleshooting section of the mvapich2 document and
> followed the suggestions for these errors, but it did not help.

Dhabaleswar Panda
05/24/2006 08:24 PM
Please respond to panda at cse.ohio-state.edu
To: Don.Dhondt at Bull.com
cc: openib-general at openib.org, mvapich-discuss at cse.ohio-state.edu
Subject: Re: [openib-general] Running MVAPICH2 with SLURM Process Manager

Hi Don,

> We are running mvapich2-0.9.3-RC0 with OFED1.0 RC4 and have had good
> results.

Thanks for doing this testing. Glad to know that it works with OFED1.0
RC4. Please note that we made a formal release of MVAPICH2-0.9.3
during the weekend.

> We would like to use the SLURM resource manager with this combination
> rather than MPD but it does not appear to be one of the choices
> available. Does anyone have any experience in this area?
>
> ./configure --prefix=${PREFIX} ${MULTI_THREAD} \
>   --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd \
>   --disable-romio --without-mpe 2>&1 |tee config-mine.log
>
> --with-pm=mpd
> We would have liked to have seen an option for slurm.

Thanks for this suggestion. We have not tested MVAPICH2 with SLURM. To
the best of our knowledge, SLURM works with MPICH2/MPD. Thus, there
should not be a problem for MVAPICH2 to work with SLURM. (I believe
some of the MVAPICH/MVAPICH2 users do so.) We are taking a look at it
and will get back to you.

Best Regards,

DK

> Regards,
> Donald Dhondt
> GCOS 8 Communications Solutions Project Manager
> Bull HN Information Systems Inc.
> 13430 N. Black Canyon Hwy., Phoenix, AZ 85029
> Work (602) 862-5245 Fax (602) 862-4290

From mst at mellanox.co.il Thu May 25 08:56:38 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 25 May 2006 18:56:38 +0300
Subject: [openib-general] Re: libibcm and librdmacm are broken following removal of libsysfs from libibverbs
In-Reply-To: <200605251822.52340.jackm@mellanox.co.il>
References: <200605251822.52340.jackm@mellanox.co.il>
Message-ID: <20060525155638.GB21266@mellanox.co.il>

Quoting r. Jack Morgenstein :
> Subject: libibcm and librdmacm are broken following removal of libsysfs from libibverbs
>
> Roland's patch removing libsysfs dependency from libibverbs (SVN 7484) has
> resulted in libibcm and librdmacm having compilation errors.
>
> Cause: These libs still depend on libsysfs, and leaned on libibverbs also
> depending on libsysfs

Best thing would be to remove the libsysfs dependency. It's deprecated,
and doesn't really add much of value compared to plain fopen/fscanf.
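
For illustration only, here is a minimal sketch of what the plain
fopen/fscanf approach could look like. It is not taken from libibcm or
librdmacm. It assumes sysfs is mounted at /sys, so that the libibcm
attribute would be /sys/class/infiniband_cm/abi_version, and the helper
name read_abi_version() is hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of libibcm/librdmacm): read one decimal
 * integer from a sysfs-style attribute file using only stdio, with no
 * libsysfs dependency.
 *
 * Returns 0 and stores the value in *abi_ver on success; returns -1 if
 * the file cannot be opened or does not begin with an integer.
 */
int read_abi_version(const char *path, int *abi_ver)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	/* fscanf returns the number of fields converted; expect exactly 1. */
	ok = (fscanf(f, "%d", abi_ver) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

A caller would presumably pass the abi_version path and then compare the
result against the library's minimum and maximum supported ABI versions,
much as the existing check_abi_version() does with the value it gets
from libsysfs.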
-- MST From rdreier at cisco.com Thu May 25 09:24:01 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 09:24:01 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: (Shirley Ma's message of "Wed, 24 May 2006 22:01:33 -0700") References: Message-ID: This also looks like a step backwards to me. You are replacing a cache-friendly array with a cache-unfriendly linked list, which also requires two more lock/unlock operations in the fast path. In a more general note, you are definitely generating some creative ideas about IPoIB but you need to work on presentation and communication if you want your patches integrated. For example, all you said about this patch is: > Here is the tx_ring removal patch for you to review. If you want this to be merged then you need to provide a changelog entry that explains _what_ the patch does, _how_ it does it, and most importantly _why_ the patch is an improvement. This is especially important for these patches, which naively at least look like they are making things worse. Also, improving the quality of your patches would be helpful. For example, in this patch you delete the comments: > - /* > - * We put the skb into the tx_ring _before_ we call post_send() > - * because it's entirely possible that the completion handler will > - * run before we execute anything after the post_send(). That > - * means we have to make sure everything is properly recorded and > - * our state is consistent before we call post_send(). 
> - */ and then a few lines further down you create exactly the bug it described: > + err = post_send(priv, wr_id, address->ah, qpn, addr, skb->len); > + if (!err) { > + dev->trans_start = jiffies; > + IPOIB_SKB_PRV_ADDR(skb) = addr; > + IPOIB_SKB_PRV_AH(skb) = address; > + IPOIB_SKB_PRV_SKB(skb) = skb; > + spin_lock(&priv->slist_lock); > + list_add_tail(&IPOIB_SKB_PRV_LIST(skb), &priv->send_list); > + spin_unlock(&priv->slist_lock); > + return; This means I have to read your patches very carefully, because they often introduce races or other bugs. Finally, keeping strictly to the "one idea per patch" rule and holding back from the temptation to fiddle with unrelated things would be nice. In this case you have: > - spinlock_t tx_lock ____cacheline_aligned_in_smp; > + spinlock_t tx_lock; That ____cacheline_aligned_in_smp annotation is something that you added earlier in the patch series (with no explanation), and now you're removing it (again with no explanation). This sort of churn just makes the patches bigger and harder to review and merge. As I said you are generating very creative ideas for IPoIB. Perhaps you can find someone to help you polish the presentation so that we can use them? Thanks, Roland From rdreier at cisco.com Thu May 25 09:28:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 09:28:24 -0700 Subject: [openib-general] Re: libibcm and librdmacm are broken following removal of libsysfs from libibverbs In-Reply-To: <200605251822.52340.jackm@mellanox.co.il> (Jack Morgenstein's message of "Thu, 25 May 2006 18:22:52 +0300") References: <200605251822.52340.jackm@mellanox.co.il> Message-ID: Jack> Cause: These libs still depend on libsysfs, and leaned on Jack> libibverbs also depending on libsysfs Jack> If libibcm and librdmacm continue to depend on libsysfs, Jack> they must do this include themselves (say in files cm.c and Jack> cma.c). Yes. Jack> I tried that, but this did not fix the problem entirely. 
Jack> Still get the following: Jack> ./src/.libs/librdmacm.so: undefined reference to Jack> `sysfs_read_attribute_value' This is because they also have to link with libsysfs too. The best thing to do would be to remove usage of libsysfs entirely, but for now I just checked in the following change which fixes the build (I also fixed the libraries so they work with libsysfs 2.0, which no longer has sysfs_read_attribute_value()). - R. --- libibcm/configure.in (revision 7485) +++ libibcm/configure.in (working copy) @@ -25,6 +25,8 @@ AC_CHECK_SIZEOF(long) dnl Checks for libraries if test "$disable_libcheck" != "yes" then +AC_CHECK_LIB(sysfs, sysfs_open_class, [], + AC_MSG_ERROR([sysfs_open_class() not found. libibcm requires libsysfs.])) AC_CHECK_LIB(ibverbs, ibv_get_device_list, [], AC_MSG_ERROR([ibv_get_device_list() not found. libibcm requires libibverbs.])) #AC_CHECK_LIB(rdmacm, rdma_create_id, [], @@ -34,6 +36,8 @@ fi dnl Checks for header files. if test "$disable_libcheck" != "yes" then +AC_CHECK_HEADER(sysfs/libsysfs.h, [], + AC_MSG_ERROR([ not found. libibcm requires libsysfs.])) AC_CHECK_HEADER(infiniband/verbs.h, [], AC_MSG_ERROR([ not found. 
Is libibverbs installed?])) AC_CHECK_HEADER(infiniband/marshall.h, [], --- libibcm/src/cm.c (revision 7485) +++ libibcm/src/cm.c (working copy) @@ -50,6 +50,8 @@ #include #include +#include + #include #include #include @@ -115,8 +117,9 @@ static struct dlist *device_list; static int check_abi_version(void) { char path[256]; - char val[16]; + struct sysfs_attribute *attr; int abi_ver; + int ret = -1; if (sysfs_get_mnt_path(path, sizeof path)) { fprintf(stderr, PFX "couldn't find sysfs mount.\n"); @@ -124,20 +127,32 @@ static int check_abi_version(void) } strncat(path, "/class/infiniband_cm/abi_version", sizeof path); - if (sysfs_read_attribute_value(path, val, sizeof val)) { - fprintf(stderr, PFX "couldn't read ucm ABI version.\n"); + + attr = sysfs_open_attribute(path); + if (!attr) { + fprintf(stderr, PFX "couldn't open ucm ABI version.\n"); return -1; } - abi_ver = strtol(val, NULL, 10); + if (sysfs_read_attribute(attr)) { + fprintf(stderr, PFX "couldn't read ucm ABI version.\n"); + goto out; + } + + abi_ver = strtol(attr->value, NULL, 10); if (abi_ver < IB_USER_CM_MIN_ABI_VERSION || abi_ver > IB_USER_CM_MAX_ABI_VERSION) { fprintf(stderr, PFX "kernel ABI version %d " "doesn't match library version %d.\n", abi_ver, IB_USER_CM_MAX_ABI_VERSION); - return -1; + goto out; } - return 0; + + ret = 0; + +out: + sysfs_close_attribute(attr); + return ret; } static uint64_t get_device_guid(struct sysfs_class_device *ibdev) --- librdmacm/configure.in (revision 7485) +++ librdmacm/configure.in (working copy) @@ -25,6 +25,8 @@ AC_CHECK_SIZEOF(long) dnl Checks for libraries if test "$disable_libcheck" != "yes" then +AC_CHECK_LIB(sysfs, sysfs_open_class, [], + AC_MSG_ERROR([sysfs_open_class() not found. librdmacm requires libsysfs.])) AC_CHECK_LIB(ibverbs, ibv_get_device_list, [], AC_MSG_ERROR([ibv_get_device_list() not found. librdmacm requires libibverbs.])) fi @@ -32,6 +34,8 @@ fi dnl Checks for header files. 
if test "$disable_libcheck" != "yes" then +AC_CHECK_HEADER(sysfs/libsysfs.h, [], + AC_MSG_ERROR([ not found. librdmacm requires libsysfs.])) AC_CHECK_HEADER(infiniband/verbs.h, [], AC_MSG_ERROR([ not found. Is libibverbs installed?])) fi --- librdmacm/src/cma.c (revision 7485) +++ librdmacm/src/cma.c (working copy) @@ -49,6 +49,8 @@ #include #include +#include + #include #include #include @@ -140,7 +142,8 @@ static void ucma_cleanup(void) static int check_abi_version(void) { char path[256]; - char val[16]; + struct sysfs_attribute *attr; + int ret = -1; if (sysfs_get_mnt_path(path, sizeof path)) { fprintf(stderr, "librdmacm: couldn't find sysfs mount.\n"); @@ -148,17 +151,33 @@ static int check_abi_version(void) } strncat(path, "/class/misc/rdma_cm/abi_version", sizeof path); - if (!sysfs_read_attribute_value(path, val, sizeof val)) - abi_ver = strtol(val, NULL, 10); + + attr = sysfs_open_attribute(path); + if (!attr) { + fprintf(stderr, "librdmacm: couldn't open rdma_cm ABI version.\n"); + return -ENOSYS; + } + + if (sysfs_read_attribute(attr)) { + fprintf(stderr, "librdmacm: couldn't read rdma_cm ABI version.\n"); + goto out; + } + + abi_ver = strtol(attr->value, NULL, 10); if (abi_ver < RDMA_USER_CM_MIN_ABI_VERSION || abi_ver > RDMA_USER_CM_MAX_ABI_VERSION) { fprintf(stderr, "librdmacm: kernel ABI version %d " "doesn't match library version %d.\n", abi_ver, RDMA_USER_CM_MAX_ABI_VERSION); - return -ENOSYS; + goto out; } - return 0; + + ret = 0; + +out: + sysfs_close_attribute(attr); + return ret; } static int ucma_init(void) From hycsw at ca.sandia.gov Thu May 25 09:30:21 2006 From: hycsw at ca.sandia.gov (helen chen) Date: 25 May 2006 09:30:21 -0700 Subject: [openib-general] NFS/RDMA for Linux: client and server update release 5 In-Reply-To: <1148426365.1575.10.camel@shuttle> References: <7.0.1.0.2.20060522161137.04202e30@netapp.com> <1148426365.1575.10.camel@shuttle> Message-ID: <1148574621.1573.15.camel@shuttle> Tom, I modified the exports file to add the 
insecure option but am still having the identical problem. The client dmesg showed identical crash dump, but the server demesg recorded no errors. Thanks, Helen ------------------------------------------------------- On Tue, 2006-05-23 at 16:19, helen chen wrote: > Hi Tom, > > I have downloaded your release 5 of the NFS/RDMA and am having trouble > mounting the rdma nfs, the > "./nfsrdmamount -o rdma on16-ib:/mnt/rdma /mnt/rdma" command never > returned. and the dmesg for client and server are: > > ------ demsg from client ----- > RPCRDMA Module Init, register RPC RDMA transport > Defaults: > MaxRequests 50 > MaxInlineRead 1024 > MaxInlineWrite 1024 > Padding 0 > Memreg 5 > RPC: Registered rdma transport module. > RPC: Registered rdma transport module. > RPC: xprt_setup_rdma: 140.221.134.221:2049 > nfs: server on16-ib not responding, timed out > Unable to handle kernel NULL pointer dereference at 0000000000000000 > RIP: > [<0000000000000000>] > PGD a9f2b067 PUD a8ca2067 PMD 0 > Oops: 0010 [1] PREEMPT SMP > CPU 1 > Modules linked in: xprtrdma ib_srp iscsi_tcp scsi_transport_iscsi > scsi_mod > Pid: 346, comm: ib_cm/1 Not tainted 2.6.16.16 #4 > RIP: 0010:[<0000000000000000>] [<0000000000000000>] > RSP: 0018:ffff8100af5a1c30 EFLAGS: 00010246 > RAX: ffff8100aeff2400 RBX: ffff8100aeff2400 RCX: ffff8100afc9e458 > RDX: 0000000000000000 RSI: ffff8100af5a1d48 RDI: ffff8100aeff2440 > RBP: ffff8100aeff2440 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000003 R11: 0000000000000000 R12: ffff8100aeff2500 > R13: 00000000ffffff99 R14: ffff8100af5a1d48 R15: ffffffff8036c72c > FS: 0000000000505ae0(0000) GS:ffff810003ce25c0(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 0000000000000000 CR3: 00000000ad587000 CR4: 00000000000006a0 > Process ib_cm/1 (pid: 346, threadinfo ffff8100af5a0000, task > ffff8100afea8100) > Stack: ffffffff8802a331 ffff8100aeff2500 0000000000000001 > ffff8100aeff2440 > ffffffff804011fd 0000000000000000 
ffffffff8802a343 > ffff8100afdd6100 > ffffffff80364ee4 0000000000000100 > Call Trace: [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] [] [] > [] > > Code: Bad RIP value. > RIP [<0000000000000000>] RSP > CR2: 0000000000000000 > > ------dmesg from server ------ > nfsd: request from insecure port 140.221.134.220, port=32768! > svc_rdma_recvfrom: transport ffff81007e8f2800 is closing > svc_rdma_put: Destroying transport ffff81007e8f2800, > cm_id=ffff81007e945200, sk_flags=154, sk_inuse=0 > > Did I forget to configure necessary components into my kernel? > > Thanks, > Helen > > On Mon, 2006-05-22 at 13:25, Talpey, Thomas wrote: > > Network Appliance is pleased to announce release 5 of the NFS/RDMA > > client and server for Linux 2.6.16.16. This update to the April 19 release > > adds improved server parallel performance and fixes various issues. This > > code supports both Infiniband and iWARP transports. > > > > > > > > > > > > Comments and feedback welcome. We're especially interested in > > successful test reports! Thanks. > > > > Tom Talpey, for the various NFS/RDMA projects. > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From weiny2 at llnl.gov Thu May 25 09:30:51 2006 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 25 May 2006 09:30:51 -0700 Subject: [openib-general] Running MVAPICH2 with SLURM Process Manager In-Reply-To: References: <200605250324.k4P3OleE004439@xi.cse.ohio-state.edu> Message-ID: <20060525093051.3b3c9f9d.weiny2@llnl.gov> On Thu, 25 May 2006 08:55:00 -0700 Don.Dhondt at Bull.com wrote: > > We made a couple attempts at rebuilding mvapich2 and our symptoms > changed. Maybe > for the better, but still not good results. 
In our last attempt we > disabled the compile option > "USE_MPD_RING" (HAVE_MPD_RING=""). It seemed to get further but then > failed with a > "cannot create cq" error message. We are obviously failing now in the > infiniband code. > The perplexing thing is that the applications work when run with > mpiexec (outside of slurm) > and have the MPD deamons running. > > The latest suggestion from LLNL is to make sure we have unlimited max > locked > memory for our MPI tasks with: > > srun sh -c 'ulimit -l' > Here at LLNL we are only running MVAPICH but yes we have had to run our slurm with "ulimit -l unlimited". I have been meaning to ping the list to see if anyone else has this issue and if there are any ideas on how to keep a limit on max locked memory while still allowing MVAPICH to run. Ira Weiny weiny2 at llnl.gov From ardavis at ichips.intel.com Thu May 25 09:30:56 2006 From: ardavis at ichips.intel.com (Arlin Davis) Date: Thu, 25 May 2006 09:30:56 -0700 Subject: [openib-general] Re: which dapl/udapl changes in trunk should be imported into OFED branch? (patch enclosed) In-Reply-To: <200605251841.16994.jackm@mellanox.co.il> References: <200605241806.48546.jackm@mellanox.co.il> <200605251024.19848.jackm@mellanox.co.il> <200605251841.16994.jackm@mellanox.co.il> Message-ID: <4475DBC0.7060801@ichips.intel.com> Jack Morgenstein wrote: >> >>I'm not familiar with what is needed by Intel MPI. My >>understanding is that Intel MPI works with revision 7141, but you >>should confirm that with Arlin. >> >> > >Arlin, could you please indicate which is the earliest revision that Intel MPI >works with? > > Yes, 7141 works with Intel MPI. 
>- Jack From rdreier at cisco.com Thu May 25 09:34:51 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 09:34:51 -0700 Subject: [openib-general] Running MVAPICH2 with SLURM Process Manager In-Reply-To: <20060525093051.3b3c9f9d.weiny2@llnl.gov> (Ira Weiny's message of "Thu, 25 May 2006 09:30:51 -0700") References: <200605250324.k4P3OleE004439@xi.cse.ohio-state.edu> <20060525093051.3b3c9f9d.weiny2@llnl.gov> Message-ID: Ira> I have been meaning to ping the list to see if anyone else Ira> has this issue and if there are any ideas on how to keep a Ira> limit on max locked memory while still allowing MVAPICH to Ira> run. Not really -- registering memory for use with RDMA requires that it be locked, and MVAPICH wants to register a lot of memory. I guess setting a large limit would work too, but I don't know exactly how large the limit would have to be -- and it would probably depend on what MPI codes you're running. - R. From surs at cse.ohio-state.edu Thu May 25 09:58:13 2006 From: surs at cse.ohio-state.edu (Sayantan Sur) Date: Thu, 25 May 2006 12:58:13 -0400 Subject: [openib-general] Running MVAPICH2 with SLURM Process Manager In-Reply-To: References: <200605250324.k4P3OleE004439@xi.cse.ohio-state.edu> <20060525093051.3b3c9f9d.weiny2@llnl.gov> Message-ID: <20060525165809.GB2835@cse.ohio-state.edu> * On May,3 Roland Dreier wrote : > Ira> I have been meaning to ping the list to see if anyone else > Ira> has this issue and if there are any ideas on how to keep a > Ira> limit on max locked memory while still allowing MVAPICH to > Ira> run. > > Not really -- registering memory for use with RDMA requires that it be > locked, and MVAPICH wants to register a lot of memory.
I guess > setting a large limit would work too, but I don't know exactly how > large the limit would have to be -- and it would probably depend on > what MPI codes you're running. The amount of registered memory in MVAPICH is very much application dependent and easily tunable at runtime. There are three main parameters which you can pass to MVAPICH to control the amount of registered memory. VIADEV_DREG_CACHE_LIMIT: Controls the limit of pages reserved for large messages. VIADEV_VBUF_POOL_SIZE: Controls the number of buffers allocated for small messages. VIADEV_NUM_RDMA_BUFFER: Controls the number of RDMA buffers for the scalable RDMA channel for small messages. All these parameters are explained in detail in our user guide: http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html#x1-540009 Based on the tuning of the MVAPICH library and the application codes you are running, you may decide on a lower ulimit than just "unlimited". Thanks, Sayantan. -- http://www.cse.ohio-state.edu/~surs From sean.hefty at intel.com Thu May 25 10:03:23 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 25 May 2006 10:03:23 -0700 Subject: [openib-general] [PATCH] git ucm for 2.6.18: convert semaphore to mutex In-Reply-To: Message-ID: >I think semaphore->mutex conversions can wait for 2.6.18. I don't >know if the other change is a bug or not -- maybe Sean remembers? The other change was an attempt to fix an issue that Arlin was hitting with uDAPL, but I don't think that it's needed. Please add the patch below to the for-2.6.18 branch, which will bring the ucm up to date. --- Convert semaphore in ib_ucm_file to a real mutex.
Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index fefc9b6..67caf36 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -64,7 +64,7 @@ struct ib_ucm_device { }; struct ib_ucm_file { - struct semaphore mutex; + struct mutex file_mutex; struct file *filp; struct ib_ucm_device *device; @@ -153,7 +153,7 @@ static void ib_ucm_cleanup_events(struct { struct ib_ucm_event *uevent; - down(&ctx->file->mutex); + mutex_lock(&ctx->file->file_mutex); list_del(&ctx->file_list); while (!list_empty(&ctx->events)) { @@ -168,7 +168,7 @@ static void ib_ucm_cleanup_events(struct kfree(uevent); } - up(&ctx->file->mutex); + mutex_unlock(&ctx->file->file_mutex); } static struct ib_ucm_context *ib_ucm_ctx_alloc(struct ib_ucm_file *file) @@ -375,11 +375,11 @@ static int ib_ucm_event_handler(struct i if (result) goto err2; - down(&ctx->file->mutex); + mutex_lock(&ctx->file->file_mutex); list_add_tail(&uevent->file_list, &ctx->file->events); list_add_tail(&uevent->ctx_list, &ctx->events); wake_up_interruptible(&ctx->file->poll_wait); - up(&ctx->file->mutex); + mutex_unlock(&ctx->file->file_mutex); return 0; err2: @@ -405,7 +405,7 @@ static ssize_t ib_ucm_event(struct ib_uc if (copy_from_user(&cmd, inbuf, sizeof(cmd))) return -EFAULT; - down(&file->mutex); + mutex_lock(&file->file_mutex); while (list_empty(&file->events)) { if (file->filp->f_flags & O_NONBLOCK) { @@ -420,9 +420,9 @@ static ssize_t ib_ucm_event(struct ib_uc prepare_to_wait(&file->poll_wait, &wait, TASK_INTERRUPTIBLE); - up(&file->mutex); + mutex_unlock(&file->file_mutex); schedule(); - down(&file->mutex); + mutex_lock(&file->file_mutex); finish_wait(&file->poll_wait, &wait); } @@ -482,7 +482,7 @@ static ssize_t ib_ucm_event(struct ib_uc kfree(uevent->info); kfree(uevent); done: - up(&file->mutex); + mutex_unlock(&file->file_mutex); return result; } @@ -501,9 +501,9 @@ static ssize_t ib_ucm_create_id(struct i if (copy_from_user(&cmd, inbuf, 
sizeof(cmd))) return -EFAULT; - down(&file->mutex); + mutex_lock(&file->file_mutex); ctx = ib_ucm_ctx_alloc(file); - up(&file->mutex); + mutex_unlock(&file->file_mutex); if (!ctx) return -ENOMEM; @@ -1159,10 +1159,8 @@ static unsigned int ib_ucm_poll(struct f poll_wait(filp, &file->poll_wait, wait); - down(&file->mutex); if (!list_empty(&file->events)) mask = POLLIN | POLLRDNORM; - up(&file->mutex); return mask; } @@ -1179,7 +1177,7 @@ static int ib_ucm_open(struct inode *ino INIT_LIST_HEAD(&file->ctxs); init_waitqueue_head(&file->poll_wait); - init_MUTEX(&file->mutex); + mutex_init(&file->file_mutex); filp->private_data = file; file->filp = filp; @@ -1193,11 +1191,11 @@ static int ib_ucm_close(struct inode *in struct ib_ucm_file *file = filp->private_data; struct ib_ucm_context *ctx; - down(&file->mutex); + mutex_lock(&file->file_mutex); while (!list_empty(&file->ctxs)) { ctx = list_entry(file->ctxs.next, struct ib_ucm_context, file_list); - up(&file->mutex); + mutex_unlock(&file->file_mutex); mutex_lock(&ctx_id_mutex); idr_remove(&ctx_id_table, ctx->id); @@ -1207,9 +1205,9 @@ static int ib_ucm_close(struct inode *in ib_ucm_cleanup_events(ctx); kfree(ctx); - down(&file->mutex); + mutex_lock(&file->file_mutex); } - up(&file->mutex); + mutex_unlock(&file->file_mutex); kfree(file); return 0; } From hycsw at ca.sandia.gov Thu May 25 10:54:08 2006 From: hycsw at ca.sandia.gov (helen chen) Date: 25 May 2006 10:54:08 -0700 Subject: [openib-general] NFS/RDMA for Linux: client and server update release 5 In-Reply-To: <7.0.1.0.2.20060524071711.0421e2d8@netapp.com> References: <7.0.1.0.2.20060522161137.04202e30@netapp.com> <1148426365.1575.10.camel@shuttle> <7.0.1.0.2.20060524071711.0421e2d8@netapp.com> Message-ID: <1148579648.1573.18.camel@shuttle> Tom, Please review the attached ksymoops output. Helen On Wed, 2006-05-24 at 04:25, Talpey, Thomas wrote: > [Cutting down the reply list to more relevant parties...] 
> > It's hard to say what is crashing, but I suspect the CM code, due > to the process context being ib_cm. Is there some reason you're > not getting symbols in the stack trace? If you could feed this oops > text to ksymoops it will give us more information. > > In any case, it appears the connection is succeeding at the server, > but the client RPC code isn't being signalled that it has done so. > Perhaps this is due to a lost reply, but the NFS code hasn't actually > started to do anything. So, I would look for IB-level issues. Is the > client running the current OpenFabrics svn top-of-tree? > > Let's take this offline to diagnose, unless someone has an idea why > the CM would be failing. The ksymoops analysis would help. > > Tom. > > > > At 07:19 PM 5/23/2006, helen chen wrote: > >Hi Tom, > > > >I have downloaded your release 5 of the NFS/RDMA and am having trouble > >mounting the rdma nfs, the > >"./nfsrdmamount -o rdma on16-ib:/mnt/rdma /mnt/rdma" command never > >returned. and the dmesg for client and server are: > > > >------ demsg from client ----- > >RPCRDMA Module Init, register RPC RDMA transport > >Defaults: > > MaxRequests 50 > > MaxInlineRead 1024 > > MaxInlineWrite 1024 > > Padding 0 > > Memreg 5 > >RPC: Registered rdma transport module. > >RPC: Registered rdma transport module. 
> >RPC: xprt_setup_rdma: 140.221.134.221:2049 > >nfs: server on16-ib not responding, timed out > >Unable to handle kernel NULL pointer dereference at 0000000000000000 > >RIP: > >[<0000000000000000>] > >PGD a9f2b067 PUD a8ca2067 PMD 0 > >Oops: 0010 [1] PREEMPT SMP > >CPU 1 > >Modules linked in: xprtrdma ib_srp iscsi_tcp scsi_transport_iscsi > >scsi_mod > >Pid: 346, comm: ib_cm/1 Not tainted 2.6.16.16 #4 > >RIP: 0010:[<0000000000000000>] [<0000000000000000>] > >RSP: 0018:ffff8100af5a1c30 EFLAGS: 00010246 > >RAX: ffff8100aeff2400 RBX: ffff8100aeff2400 RCX: ffff8100afc9e458 > >RDX: 0000000000000000 RSI: ffff8100af5a1d48 RDI: ffff8100aeff2440 > >RBP: ffff8100aeff2440 R08: 0000000000000000 R09: 0000000000000000 > >R10: 0000000000000003 R11: 0000000000000000 R12: ffff8100aeff2500 > >R13: 00000000ffffff99 R14: ffff8100af5a1d48 R15: ffffffff8036c72c > >FS: 0000000000505ae0(0000) GS:ffff810003ce25c0(0000) > >knlGS:0000000000000000 > >CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > >CR2: 0000000000000000 CR3: 00000000ad587000 CR4: 00000000000006a0 > >Process ib_cm/1 (pid: 346, threadinfo ffff8100af5a0000, task > >ffff8100afea8100) > >Stack: ffffffff8802a331 ffff8100aeff2500 0000000000000001 > >ffff8100aeff2440 > > ffffffff804011fd 0000000000000000 ffffffff8802a343 > >ffff8100afdd6100 > > ffffffff80364ee4 0000000000000100 > >Call Trace: [] [] > > [] [] [] > > [] [] [] > > [] [] [] > > [] [] [] > > [] [] [] > > [] [] [] > > [] [] [] > > [] [] [] > > [] > > > >Code: Bad RIP value. > >RIP [<0000000000000000>] RSP > >CR2: 0000000000000000 > > > >------dmesg from server ------ > >nfsd: request from insecure port 140.221.134.220, port=32768! > >svc_rdma_recvfrom: transport ffff81007e8f2800 is closing > >svc_rdma_put: Destroying transport ffff81007e8f2800, > >cm_id=ffff81007e945200, sk_flags=154, sk_inuse=0 > > > >Did I forget to configure necessary components into my kernel? 
> > > >Thanks, > >Helen > > > >On Mon, 2006-05-22 at 13:25, Talpey, Thomas wrote: > >> Network Appliance is pleased to announce release 5 of the NFS/RDMA > >> client and server for Linux 2.6.16.16. This update to the April 19 release > >> adds improved server parallel performance and fixes various issues. This > >> code supports both Infiniband and iWARP transports. > >> > >> > >> > >> > > > >> > >> Comments and feedback welcome. We're especially interested in > >> successful test reports! Thanks. > >> > >> Tom Talpey, for the various NFS/RDMA projects. > >> > > -------------- next part -------------- ksymoops 2.4.9 on x86_64 2.6.16.16. Options used -v /usr/src/linux-2.6.16.16/vmlinux (specified) -K (specified) -l /proc/modules (default) -o /lib/modules/2.6.16.16/ (default) -m /usr/src/linux-2.6.16.16/System.map (specified) No modules in ksyms, skipping objects No ksyms, skipping lsmod Warning (compare_maps): vmlinux symbol __crc_I_BDEV not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_SELECT_DRIVE not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_____request_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc____pskb_trim not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___alloc_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___alloc_percpu not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___alloc_skb not found in System.map.
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bdevname not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bforget not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bio_clone not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_and not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_andnot not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_complement not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_empty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_equal not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_full not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_intersects not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_or not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_shift_left not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_shift_right not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_subset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_weight not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bitmap_xor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___blk_put_request not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___blockdev_direct_IO not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___bread not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___breadahead not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___break_lease not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___brelse not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___builtin_strlen not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___check_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___clear_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___const_udelay not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___cpufreq_driver_target not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___create_workqueue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___delay not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___dev_get_by_index not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___dev_get_by_name not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___dev_remove_pack not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___down_failed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___down_failed_interruptible not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___down_failed_trylock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___down_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___down_read_trylock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___down_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___down_write_trylock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___downgrade_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___dst_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___elv_add_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___find_get_block not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___free_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___generic_file_aio_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___generic_unplug_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___get_free_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___get_user_1 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___get_user_2 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___get_user_4 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___get_user_8 not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___getblk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___handle_mm_fault not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_abort not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_bad_drive not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_check not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_end not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_good_drive not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_host_off not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_host_on not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_lostirq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_off not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_off_quietly not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_on not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_dma_timeout not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_error not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ide_pci_register_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___inet_lookup_listener not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___inet_twsk_hashdance not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___inet_twsk_kill not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___init_timer_base not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___inode_dir_notify not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___insert_inode_hash not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___invalidate_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ioremap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___iowrite32_copy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ip_route_output_key not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ip_select_ident not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___kfifo_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___kfifo_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___kfree_skb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___kill_fasync not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___kmalloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___lock_buffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___lock_page not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___mark_inode_dirty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___memcpy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___memcpy_fromio not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___memcpy_toio not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___mod_page_state_offset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___mod_timer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___module_put_and_exit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___mutex_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___ndelay not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___neigh_event_send not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___neigh_for_each_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___net_timestamp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___netdev_watchdog_up not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___nla_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___nla_reserve not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___page_cache_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___page_symlink not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___pagevec_lru_add not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___pagevec_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___pci_register_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___printk_ratelimit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___pskb_pull_tail not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___put_user_1 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___put_user_2 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___put_user_4 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___put_user_8 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___read_lock_failed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___release_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___request_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___rpc_wait_for_completion_task not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___rta_fill not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___scm_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___scm_send not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___serio_register_driver not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___serio_register_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___serio_unregister_port_delayed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___set_page_dirty_buffers not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___set_page_dirty_nobuffers not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___set_personality not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___sk_stream_mem_reclaim not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___skb_checksum_complete not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___skb_linearize not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___strncpy_from_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___supported_pte_mask not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___symbol_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___symbol_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___tasklet_hi_schedule not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___tasklet_schedule not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___udelay not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___up_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___up_wakeup not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___up_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___usb_get_extra_descriptor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___user_walk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___user_walk_fd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___vm_enough_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___vmalloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___vmalloc_node not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___wait_on_bit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___wait_on_bit_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___wait_on_buffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___wake_up not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___wake_up_bit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___wake_up_sync not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc___write_lock_failed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__atomic_dec_and_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__cpu_pda not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__ctype not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__read_lock not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__read_lock_bh not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__read_lock_irq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__read_lock_irqsave not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__read_trylock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__read_unlock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__read_unlock_bh not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__read_unlock_irq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__read_unlock_irqrestore not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_lock_bh not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_lock_irq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_lock_irqsave not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_trylock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_trylock_bh not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_unlock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_unlock_bh not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_unlock_irq not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__spin_unlock_irqrestore not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__write_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__write_lock_bh not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__write_lock_irq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__write_lock_irqsave not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__write_trylock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__write_unlock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__write_unlock_bh not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__write_unlock_irq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc__write_unlock_irqrestore not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_acquire_global_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_add not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_generate_event not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_get_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_get_power not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_get_status not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_receive_event not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_register_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_set_power not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_start not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_bus_unregister_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_clear_event not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_dbg_layer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_dbg_level not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_disable_event not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_disabled not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_enable_event not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_enable_gpe not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_enter_sleep_state not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_enter_sleep_state_s4bios not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_evaluate_integer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_evaluate_object not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_evaluate_reference not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_extract_package not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_fadt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_fadt_is_v1 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_child not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_current_resources not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_devices not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_firmware_table not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_handle not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_name not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_next_object not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_object_info not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_parent not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_pci_id not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_pci_rootbridge_handle not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_physical_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_sleep_type_data not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_table not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_get_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_install_address_space_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_install_fixed_event_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_install_gpe_block not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_install_gpe_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_install_notify_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_create_semaphore not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_delete_semaphore not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_map_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_printf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_queue_for_execution not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_read_pci_configuration not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_read_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_signal not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_signal_semaphore not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_sleep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_stall not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_unmap_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_wait_events_complete not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_wait_semaphore not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_os_write_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_pci_irq_enable not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_pci_register_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_pci_unregister_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_register_gsi not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_register_ioapic not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_release_global_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_remove_address_space_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_remove_fixed_event_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_remove_gpe_block not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_remove_gpe_handler not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_remove_notify_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_resource_to_address64 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_root_dir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_set_current_resources not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_set_gpe_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_set_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_specific_hotkey_enabled not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_strict not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_unregister_ioapic not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_walk_namespace not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acpi_walk_resources not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_acquire_console_sem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_add_disk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_add_disk_randomness not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_add_preempt_count not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_add_taint not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_add_to_page_cache not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_add_uevent_var not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_add_wait_queue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_add_wait_queue_exclusive not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_adjust_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp3_generic_cleanup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp3_generic_configure not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp3_generic_fetch_size not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp3_generic_sizes not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp3_generic_tlbflush not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_add_bridge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_alloc_bridge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_allocate_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_backend_acquire not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_backend_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_bind_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_bridge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_bridges not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_collect_device_status not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_copy_info not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_create_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_device_command not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_enable not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_find_bridge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_free_key not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_free_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_alloc_by_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_alloc_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_create_gatt_table not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_destroy_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_enable not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_free_by_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_free_gatt_table not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_insert_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_mask_memory not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_generic_remove_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_memory_reserved not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_num_entries not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_off not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_put_bridge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_remove_bridge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_try_unsupported_boot not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_agp_unbind_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_aio_complete not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_aio_put_req not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_alloc_buffer_head not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_alloc_chrdev_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_alloc_disk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_alloc_disk_node not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_alloc_etherdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_alloc_fcdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_alloc_netdev not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_alloc_page_buffers not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_alloc_tty_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_allocate_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_allow_signal not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_anon_transport_class_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_anon_transport_class_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_arch_acpi_processor_init_pdc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_arp_broken_ops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_arp_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_arp_find not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_arp_rcv not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_arp_send not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_arp_tbl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_arp_xmit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_add_attrs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_add_class_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_add_class_device_adapter not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_class_device_del not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_classdev_to_container not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_device_trigger not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_find_class_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_remove_attrs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_remove_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_trigger not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_attribute_container_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_auth_domain_find not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_auth_domain_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_auth_domain_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_auth_unix_add_addr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_auth_unix_forget_old not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_auth_unix_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_autoremove_wake_function not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_avenrun not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bad_dma_address not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_balance_dirty_pages_ratelimited not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bd_claim not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bd_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bd_set_size not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bdev_read_only not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bdevname not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bdget not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bdput not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_add_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_add_pc_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_alloc_bioset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_clone not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_copy_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_endio not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_free not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_get_nr_vecs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_hw_segments not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_map_kern not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_map_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_pair_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_phys_segments not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_split not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_split_pool not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_uncopy_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bio_unmap_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bioset_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bioset_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bit_waitqueue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitmap_allocate_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitmap_bitremap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitmap_find_free_region not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitmap_parse not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitmap_parselist not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitmap_release_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitmap_remap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitmap_scnlistprintf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitmap_scnprintf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_bitreverse not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_alloc_queue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_alloc_queue_node not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_cleanup_queue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_complete_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_congestion_wait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_dump_rq_flags not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_end_sync_rq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_execute_rq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_execute_rq_nowait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_get_backing_dev_info not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_get_queue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_get_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_init_queue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_init_queue_node not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_insert_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_max_low_pfn not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_max_pfn not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_plug_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_put_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_queue_activity_fn not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_queue_bounce not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_queue_bounce_limit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_queue_dma_alignment not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_queue_end_tag not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_queue_find_tag not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_queue_free_tags not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_blk_queue_hardsect_size not found in System.map. 
Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_init_tags not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_invalidate_tags not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_issue_flush_fn not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_make_request not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_max_hw_segments not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_max_phys_segments not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_max_sectors not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_max_segment_size not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_merge_bvec not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_ordered not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_prep_rq not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_resize_tags not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_segment_boundary not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_softirq_done not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_stack_limits not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_queue_start_tag not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_register_region not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_remove_plug not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_requeue_request not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_rq_bio_prep not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_rq_map_kern not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_rq_map_sg not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_rq_map_user not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_rq_map_user_iov not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_rq_unmap_user not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_run_queue not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_start_queue not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_stop_queue not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_sync_queue not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blk_unregister_region not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blkdev_get not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blkdev_ioctl not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blkdev_issue_flush not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_blkdev_put not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_block_all_signals not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_block_commit_write not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_block_invalidatepage not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_block_prepare_write not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_block_read_full_page not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_block_sync_page not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_block_truncate_page not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_block_write_full_page not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bmap not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_boot_cpu_data not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_boot_option_idle_override not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_brioctl_set not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_add_device not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_create_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_find_device not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_for_each_dev not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_for_each_drv not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_register not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_remove_device not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_remove_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_rescan_devices not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_bus_unregister not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cache_check not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cache_flush not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cache_fresh not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cache_init not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cache_purge not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cache_register not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cache_unregister not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_call_rcu not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_call_rcu_bh not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_call_usermodehelper_keys not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cancel_rearming_delayed_work not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cancel_rearming_delayed_workqueue not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_bprm_apply_creds not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_bprm_secureexec not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_bprm_set_security not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_bset not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_capable not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_capget not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_capset_check not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_capset_set not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_inode_removexattr not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_inode_setxattr not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_netlink_recv not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_netlink_send not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_ptrace not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_settime not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_syslog not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_task_post_setuid not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_task_reparent_to_init not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cap_vm_enough_memory not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_capable not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdev_add not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdev_alloc not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdev_del not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdev_init not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdrom_get_last_written not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdrom_get_media_event not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdrom_ioctl not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdrom_media_changed not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdrom_mode_select not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdrom_mode_sense not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdrom_number_of_slots not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdrom_open not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cdrom_release not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cfb_copyarea not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cfb_fillrect not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cfb_imageblit not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_change_page_attr not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_check_disk_change not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_create not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_create_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_destroy not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_add not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_create not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_create_bin_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_create_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_del not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_destroy not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_get not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_initialize not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_put not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_register not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_remove_bin_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_remove_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_device_unregister not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_get not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_interface_register not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_interface_unregister not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_put not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_register not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_remove_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_class_unregister not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_clear_inode not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_clear_page not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_clear_page_dirty_for_io not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_clear_user not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_close_bdev_excl not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_color_table not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_complete not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_complete_all not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_complete_and_exit not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_compute_creds not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_con_copy_unimap not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_con_set_default_unimap not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cond_resched not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cond_resched_lock not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cond_resched_softirq not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_console_blank_hook not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_console_blanked not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_console_conditional_schedule not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_console_print not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_console_printk not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_console_start not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_console_stop not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cont_prepare_write not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_contig_page_data not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_copy_from_user not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_copy_fs_struct not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_copy_in_user not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_copy_io_context not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_copy_page not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_copy_strings_kernel not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_copy_to_user not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_copy_user_generic not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_callout_map not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_core_map not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_data not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_idle_wait not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_khz not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_online_map not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_possible_map not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_present_map not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_sibling_map not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpu_sysdev_class not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_cpu_get not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_cpu_put not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_driver_target not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_get not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_get_policy not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_gov_performance not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_governor not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_notify_transition not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_parse_governor not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_quick_get not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_register_driver not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_register_governor not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_register_notifier not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_set_policy not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_unregister_driver not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_unregister_governor not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_unregister_notifier not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_cpufreq_update_policy not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crc32_be not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crc32_le not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_create_empty_buffers not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_create_proc_entry not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_create_proc_ide_interfaces not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crypto_alg_available not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crypto_alloc_tfm not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crypto_free_tfm not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crypto_hmac not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crypto_hmac_final not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crypto_hmac_init not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crypto_hmac_update not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crypto_register_alg not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_crypto_unregister_alg not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_csum_ipv6_magic not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_csum_partial not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_csum_partial_copy_from_user not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_csum_partial_copy_fromiovecend not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_csum_partial_copy_nocheck not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_csum_partial_copy_to_user not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_csum_partial_copy_to_xdr not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_current_fs_time not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_current_io_context not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_current_kernel_time not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_alloc not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_alloc_anon not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_alloc_name not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_alloc_root not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_delete not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_find_alias not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_genocide not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_instantiate not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_instantiate_unique not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_invalidate not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_lookup not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_move not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_path not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_prune_aliases not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_rehash not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_splice_alias not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_d_validate not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_daemonize not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_datagram_poll not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dcache_dir_close not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dcache_dir_lseek not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dcache_dir_open not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dcache_lock not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dcache_readdir not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dcookie_register not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dcookie_unregister not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_deactivate_super not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_debug_smp_processor_id not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_default_backing_dev_info not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_default_blu not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_default_grn not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_default_hwif_mmiops not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_default_llseek not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_default_red not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_default_unplug_io_fn not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_default_wake_function not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_del_gendisk not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_del_timer not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_del_timer_sync not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dentry_open not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dentry_unhash not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dequeue_signal not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_destroy_workqueue not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_add_pack not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_alloc_name not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_base not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_base_lock not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_change_flags not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_close not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_ethtool not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_get_by_flags not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_get_by_index not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_get_by_name not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_get_flags not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_getbyhwaddr not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_getfirstbyhwtype not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_load not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_mc_add not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_mc_delete not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_mc_upload not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_open not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_queue_xmit not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_remove_pack not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_set_allmulti not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_set_mac_address not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_set_mtu not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_set_promiscuity not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dev_valid_name not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_add not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_attach not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_bind_driver not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_create_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_del not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_for_each_child not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_initialize not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_pm_set_parent not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_power_down not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_power_up not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_register not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_release_driver not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_remove_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_resume not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_suspend not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_device_unregister not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_devinet_ioctl not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dget_locked not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_die_chain not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_disable_irq not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_disable_irq_nosync not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_disable_timer_nmi_watchdog not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_disallow_signal not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_disk_round_stats not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dlci_ioctl_set not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_alloc_coherent not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_free_coherent not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_get_required_mask not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_ops not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_pool_alloc not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_pool_create not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_pool_destroy not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_pool_free not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_set_mask not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_spin_lock not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dma_supported not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dmi_check_system not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dmi_find_device not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dmi_get_system_info not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dnotify_parent not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_SAK not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_add_mount not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_blank_screen not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_brk not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_exit not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_generic_mapping_read not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_gettimeofday not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_kern_mount not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_mmap_pgoff not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_munmap not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_posix_clock_nonanosleep not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_posix_clock_nosettime not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_settimeofday not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_softirq not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_softirq_thunk not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_sync_read not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_sync_write not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_do_unblank_screen not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dpm_runtime_resume not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dpm_runtime_suspend not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dput not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_drive_is_ready not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_driver_attach not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_driver_create_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_driver_find not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_driver_find_device not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_driver_for_each_device not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_driver_register not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_driver_remove_file not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_driver_unregister not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_drop_super not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dst_alloc not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dst_destroy not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_dump_stack not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_ec_read not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_ec_write not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_eighty_ninty_three not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_elevator_exit not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_elevator_init not found in System.map. Ignoring System.map entry
Warning (compare_maps): vmlinux symbol __crc_elv_add_request not found in System.map.
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_elv_completed_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_elv_dequeue_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_elv_dispatch_sort not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_elv_next_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_elv_queue_empty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_elv_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_elv_requeue_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_elv_rq_merge_ok not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_elv_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_emergency_restart not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_empty_zero_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_enable_irq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_enable_timer_nmi_watchdog not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_end_buffer_async_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_end_buffer_read_sync not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_end_buffer_write_sync not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_end_page_writeback not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_end_pfn not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_end_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_end_that_request_chunk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_end_that_request_first not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_end_that_request_last not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_eth_type_trans not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ether_setup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_get_link not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_get_perm_addr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_get_sg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_get_tso not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_get_tx_csum not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_get_ufo not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_set_sg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_set_tso not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_set_tx_csum not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_set_tx_hw_csum not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ethtool_op_set_ufo not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_exit_fs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_export_op_default not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_f_setown not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fasync_helper not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_add_entries not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_alloc_new_dir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_attach not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_build_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_date_unix2dos not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_detach not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_dir_empty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_fill_super not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_free_clusters not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_fs_panic not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_get_dotdot_entry not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_notify_change not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_remove_entries not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_scan not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_search_long not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_sync_bhs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fat_sync_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_add_videomode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_alloc_cmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_blank not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_con_duit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_copy_cmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_dealloc_cmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_default_cmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_delete_videomode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_destroy_modedb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_destroy_modelist not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_edid_to_monspecs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_find_best_display not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_find_best_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_find_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_find_mode_cvt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_find_nearest_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_firmware_edid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_get_buffer_offset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_get_color_depth not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_get_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_get_options not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_invert_cmaps not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_match_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_mode_is_equal not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_new_modelist not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_pad_aligned_buffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_pad_unaligned_buffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_pan_display not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_parse_edid not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_prepare_logo not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_register_client not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_set_cmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_set_suspend not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_set_var not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_show_logo not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_unregister_client not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_validate_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_var_to_videomode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_videomode_to_modelist not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fb_videomode_to_var not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fbcon_set_bitops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fd_install not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fg_console not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fget not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_file_fsync not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_file_lock_list not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_file_permission not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_file_update_time not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_filemap_fdatawait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_filemap_fdatawrite not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_filemap_flush not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_filemap_nopage not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_filemap_populate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_filemap_write_and_wait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_filp_close not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_filp_open not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_exported_dentry not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_first_bit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_first_zero_bit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_font not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_get_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_inode_number not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_lock_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_next_bit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_next_zero_bit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_next_zero_string not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_or_create_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_task_by_pid_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_trylock_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_find_vma not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_finish_wait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_firmware_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_firmware_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_flock_lock_file_wait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_flush_old_exec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_flush_scheduled_work not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_flush_signals not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_flush_tlb_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_flush_workqueue not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_follow_down not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_follow_up not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_force_sig not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fput not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_framebuffer_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_framebuffer_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_free_buffer_head not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_free_dma not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_free_irq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_free_netdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_free_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_free_percpu not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_free_task not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_freeze_bdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fs_overflowgid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fs_overflowuid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fs_subsys not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_fsync_bdev not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_g_make_token_header not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_g_token_size not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_g_verify_token_header not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gen_kill_estimator not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gen_new_estimator not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gen_replace_estimator not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generate_random_uuid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic__raw_read_trylock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_block_bmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_commit_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_cont_expand not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_cont_expand_simple not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_delete_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_drop_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_aio_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_aio_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_aio_write_nolock not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_buffered_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_direct_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_llseek not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_mmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_open not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_readonly_mmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_readv not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_sendfile not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_write_nolock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_file_writev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_fillattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_getxattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_ide_ioctl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_listxattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_make_request not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_osync_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_permission not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_read_dir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_readlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_removexattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_ro_fops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_setxattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_shutdown_super not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_unplug_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_generic_write_checks not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_genl_register_family not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_genl_register_ops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_genl_sock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_genl_unregister_family not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_genl_unregister_ops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_agp_version not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_bus not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_cpu_sysdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_dcookie not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_default_font not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_disk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_empty_filp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_fs_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_io_context not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_max_files not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_option not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_options not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_random_bytes not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_sb_bdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_sb_nodev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_sb_pseudo not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_sb_single not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_super not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_task_mm not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_unmapped_area not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_unused_fd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_user_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_wchan not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_write_access not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_get_zeroed_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_getname not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_getnstimeofday not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_give_up_console not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_global_cache_flush not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_global_flush_tlb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gnet_stats_copy_app not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gnet_stats_copy_basic not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gnet_stats_copy_queue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gnet_stats_copy_rate_est not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gnet_stats_finish_copy not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gnet_stats_start_copy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gnet_stats_start_copy_compat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_grab_cache_page_nowait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_groups_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_groups_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_decrypt_xdr_buf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_encrypt_xdr_buf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_mech_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_mech_get_by_name not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_mech_get_by_pseudoflavor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_mech_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_mech_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_mech_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_pseudoflavor_to_service not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_gss_service_to_auth_domain_name not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_half_md4_transform not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_handle_sysrq not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_have_submounts not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_high_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_hwmon_device_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_hwmon_device_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ia32_setup_arg_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ia32_sys_call_table not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_alloc_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_alloc_fmr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_alloc_mw not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_alloc_pd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_attach_mcast not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_cancel_mad not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_cm_establish not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_cm_init_qp_attr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_cm_listen not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_copy_path_rec_from_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_copy_path_rec_to_user not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_copy_qp_attr_to_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_create_ah not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_create_ah_from_wc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_create_cm_id not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_create_cq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_create_fmr_pool not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_create_path_cursor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_create_qp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_create_send_mad not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_create_srq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_dealloc_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_dealloc_fmr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_dealloc_mw not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_dealloc_pd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_dereg_mr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_destroy_ah not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_destroy_cm_id not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_destroy_cq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_destroy_fmr_pool not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_destroy_qp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_destroy_srq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_detach_mcast not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_dispatch_event not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_find_cached_gid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_find_cached_pkey not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_flush_fmr_pool not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_fmr_pool_map_phys not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_fmr_pool_unmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_free_multicast not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_free_recv_mad not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_free_sa_cursor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_free_send_mad not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_get_cached_gid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_get_cached_lmc not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_get_cached_pkey not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_get_client_data not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_get_dma_mr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_get_mad_data_offset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_get_next_sa_attr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_get_path_rec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_get_rmpp_segment not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_is_mad_class_rmpp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_join_multicast not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_modify_ah not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_modify_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_modify_mad not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_modify_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_modify_qp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_modify_qp_is_ok not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_modify_srq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_pack not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_post_send_mad not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_process_mad_wc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_query_ah not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_query_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_query_gid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_query_mr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_query_pkey not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_query_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_query_qp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_query_srq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_rate_to_mult not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_redirect_mad_qp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_reg_phys_mr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_register_client not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_register_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_register_event_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_register_mad_agent not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_register_mad_snoop not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_rereg_phys_mr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_resize_cq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_sa_cancel_query not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_sa_mcmember_rec_query not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_sa_pack_attr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_sa_path_rec_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_sa_service_rec_query not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_sa_unpack_attr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_apr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_drep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_dreq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_lap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_mra not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_rej not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_rep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_req not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_rtu not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_sidr_rep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_send_cm_sidr_req not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_set_client_data not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_ud_header_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_ud_header_pack not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_ud_header_unpack not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_unmap_fmr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_unpack not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_unregister_client not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_unregister_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_unregister_event_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ib_unregister_mad_agent not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_icmp_err_convert not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_icmp_send not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_icmp_statistics not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_add_setting not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_build_dmatable not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_build_sglist not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_bus_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_config_drive_speed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_destroy_dmatable not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_dma_enable not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_dma_intr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_dma_setup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_dma_speed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_dma_start not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_dma_verbose not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_do_drive_cmd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_do_reset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_dump_status not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_end_drive_cmd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_end_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_error not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_execute_command not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_fix_driveid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_fixstring not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_get_best_pio_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_get_error_location not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_hwifs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_in_drive_list not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_init_disk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_init_drive_cmd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_init_sg_cmd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_map_sg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_pci_create_host_proc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_pci_setup_ports not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_pci_unregister_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_pio_timings not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_rate_filter not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_raw_taskfile not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_register_hw not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_register_hw_with_fixup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_register_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_register_subdriver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_set_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_set_xfer_rate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_setup_dma not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_setup_pci_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_setup_pci_devices not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_setup_pci_noise not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_spin_wait_hwgroup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_stall_queue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_undecoded_slave not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_unregister_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_unregister_subdriver not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_use_dma not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_wait_not_busy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_wait_stat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ide_xfer_verbose not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ideprobe_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_idle_notifier_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_idle_notifier_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_idr_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_idr_find not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_idr_get_new not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_idr_get_new_above not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_idr_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_idr_pre_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_idr_remove not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iget5_locked not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iget_locked not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_igrab not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ilookup not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ilookup5 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ilookup5_nowait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_in_aton not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_in_dev_finish_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_in_egroup_p not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_in_group_p not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_in_lock_functions not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_index_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_index_find not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_index_find_replace not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_index_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_index_insert not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_index_remove not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_index_remove_all not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_accept not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_add_protocol not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_addr_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_bind not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_bind_bucket_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_bind_hash not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_accept not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_addr2sockaddr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_bind_conflict not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_clear_xmit_timers not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_clone not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_delete_keepalive_timer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_destroy_sock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_get_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_init_xmit_timers not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_listen_start not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_listen_stop not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_reqsk_queue_hash_add not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_reqsk_queue_prune not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_reset_keepalive_timer not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_route_req not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_search_req not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_csk_timer_bug_msg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_del_protocol not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_dgram_connect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_dgram_ops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_diag_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_diag_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_getname not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_hash_connect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_ioctl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_listen not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_listen_wlock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_put_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_register_protosw not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_select_addr not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_sendmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_shutdown not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_sk_rebuild_header not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_sock_destruct not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_stream_connect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_stream_ops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_twdr_hangman not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_twdr_twcal_tick not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_twdr_twkill_work not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_twsk_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_twsk_deschedule not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_twsk_schedule not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inet_unregister_protosw not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inetdev_by_index not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_init_buffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_init_cdrom_command not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_init_level4_pgt not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_init_mm not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_init_rwsem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_init_special_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_init_task not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_init_timer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inode_add_bytes not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inode_change_ok not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inode_get_bytes not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inode_init_once not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inode_needs_sync not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inode_set_bytes not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inode_setattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inode_sub_bytes not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inotify_dentry_parent_queue_event not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inotify_get_cookie not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inotify_inode_is_dead not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inotify_inode_queue_event not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_inotify_unmount_inodes not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_accept_process not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_allocate_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_class not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_close_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_event not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_flush_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_grab_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_open_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_register_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_register_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_release_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_unregister_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_input_unregister_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_insert_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_install_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_int_sqrt not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_interruptible_sleep_on not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_interruptible_sleep_on_timeout not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_invalidate_bdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_invalidate_inode_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_invalidate_inode_pages2 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_invalidate_inode_pages2_range not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_invalidate_inodes not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_invalidate_partition not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_io_schedule not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioctl_by_bdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iomem_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iommu_bio_merge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iommu_merge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iommu_sac_force not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioport_map not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioport_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioport_unmap not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioread16 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioread16_rep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioread16be not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioread32 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioread32_rep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioread32be not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioread8 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioread8_rep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ioremap_nocache not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iounmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iov_shorten not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iowrite16 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iowrite16_rep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iowrite16be not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iowrite32 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iowrite32_rep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iowrite32be not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iowrite8 not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iowrite8_rep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip4_datagram_connect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_build_and_send_pkt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_cmsg_recv not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_compute_csum not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_ct_attach not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_defrag not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_fragment not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_generic_getfrag not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_getsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_mc_dec_group not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_mc_inc_group not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_mc_join_group not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_nat_decode_session not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_queue_xmit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_route_input not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_route_me_harder not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_route_output_flow not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_route_output_key not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_rt_ioctl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_send_check not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_setsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ip_statistics not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iput not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ipv4_config not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ipv4_specific not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_is_bad_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_is_console_locked not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_isa_dma_bridge_buggy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_iunique not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_jiffies not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_jiffies_64 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_abort not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_ack_err not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_blocks_per_page not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_check_available_features not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_check_used_features not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_clear_err not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_dirty_data not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_dirty_metadata not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_errno not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_extend not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_flush not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_force_commit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_force_commit_nested not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_forget not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_get_create_access not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_get_undo_access not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_get_write_access not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_init_dev not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_init_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_invalidatepage not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_load not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_lock_updates not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_release_buffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_restart not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_revoke not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_set_features not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_start not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_start_commit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_stop not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_try_to_free_buffers not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_unlock_updates not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_update_format not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_update_superblock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_journal_wipe not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kblockd_flush not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kblockd_schedule_work not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kern_mount not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kernel_halt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kernel_kexec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kernel_power_off not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kernel_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kernel_recvmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kernel_restart not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kernel_sendmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kernel_subsys not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kernel_thread not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kfifo_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kfifo_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kfifo_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kfree not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kick_iocb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kill_anon_super not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kill_block_super not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kill_fasync not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kill_litter_super not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kill_pg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kill_proc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kill_proc_info_as_uid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_add_head not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_add_tail not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_del not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_iter_exit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_iter_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_iter_init_node not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_next not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_node_attached not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_klist_remove not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kmem_cache_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kmem_cache_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kmem_cache_destroy not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kmem_cache_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kmem_cache_name not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kmem_cache_shrink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kmem_cache_size not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kmem_find_general_cachep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kobject_add not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kobject_del not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kobject_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kobject_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kobject_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kobject_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kobject_set_name not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kobject_uevent not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kobject_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_krb5_decrypt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_krb5_encrypt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kref_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kref_init not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kref_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kset_find_obj not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kset_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kset_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kstrdup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kthread_bind not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kthread_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kthread_should_stop not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kthread_stop not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kthread_stop_sem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ktime_get_real not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ktime_get_ts not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_kzalloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_laptop_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lease_get_mtime not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lease_modify not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_linkwatch_fire_event not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ll_rw_block not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_load_gs_index not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_load_nls not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_load_nls_default not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_local_bh_enable not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lock_kernel not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lock_may_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lock_may_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lock_rename not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lock_sock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lockd_down not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lockd_up not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_locks_copy_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_locks_init_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_locks_mandatory_area not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_locks_remove_posix not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_log_wait_commit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lookup_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lookup_hash not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lookup_instantiate_filp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_lookup_one_len not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_loop_register_transfer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_loop_unregister_transfer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_loopback_dev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_loops_per_jiffy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_make_bad_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_make_checksum not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_malloc_sizes not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_map_page_into_agp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mapping_tagged not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mark_buffer_async_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mark_buffer_dirty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mark_buffer_dirty_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mark_mounts_for_expiry not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mark_page_accessed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_match_hex not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_match_int not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_match_octal not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_match_strcpy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_match_strdup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_match_token not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_max_cstate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_max_mapnr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_max_pfn not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_may_umount not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_may_umount_tree not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_entry_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_entry_find_first not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_entry_find_next not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_entry_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_entry_get not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_entry_insert not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_entry_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mb_cache_shrink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mem_map not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memchr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memcmp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memcpy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memcpy_fromiovec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memcpy_fromiovecend not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memcpy_toiovec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memmove not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memparse not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mempool_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mempool_alloc_slab not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mempool_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mempool_create_node not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mempool_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mempool_free not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mempool_free_slab not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mempool_resize not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memscan not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_memset_io not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_misc_deregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_misc_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mktime not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mmput not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mnt_pin not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mnt_unpin not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mntput_no_expire not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mod_page_state_offset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mod_timer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_module_add_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_module_refcount not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_module_remove_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_monotonic_clock not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_move_addr_to_kernel not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_move_addr_to_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mpage_readpage not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mpage_readpages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mpage_writepage not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mpage_writepages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_msleep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_msleep_interruptible not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mtrr_add not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mtrr_del not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mult_to_ib_rate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mutex_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mutex_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mutex_lock_interruptible not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mutex_trylock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_mutex_unlock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_n_tty_ioctl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_names_cachep not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_add not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_changeaddr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_compat_output not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_connected_output not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_delete not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_dump_info not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_event_ns not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_for_each not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_ifdown not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_lookup_nodev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_parms_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_parms_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_rand_reach_time not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_resolve_output not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_seq_next not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_seq_start not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_seq_stop not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_sysctl_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_sysctl_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_table_clear not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_table_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_update not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neigh_update_hhs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neightbl_dump_info not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_neightbl_set not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_net_disable_timestamp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_net_enable_timestamp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_net_random not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_net_ratelimit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_net_srandom not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_net_statistics not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netdev_boot_setup_check not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netdev_features_change not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netdev_rx_csum_fault not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netdev_set_master not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netdev_state_change not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netif_carrier_off not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netif_carrier_on not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netif_receive_skb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netif_rx not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netif_rx_ni not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_ack not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_broadcast not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_dump_start not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_kernel_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_queue_skip not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_register_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_run_queue not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_set_err not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_set_nonroot not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_unicast not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_netlink_unregister_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_new_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_next_thread not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_ct_attach not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_getsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_hook_slow not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_hooks not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_log_packet not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_log_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_log_unregister_logger not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_log_unregister_pf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_register_hook not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_register_queue_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_register_queue_rerouter not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_register_sockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_reinject not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_setsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_unregister_hook not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_unregister_queue_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_unregister_queue_handlers not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_unregister_queue_rerouter not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nf_unregister_sockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nfs4_acl_add_ace not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nfs4_acl_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nfs4_acl_get_whotype not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nfs4_acl_new not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nfs4_acl_nfsv4_to_posix not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nfs4_acl_posix_to_nfsv4 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nfs4_acl_write_who not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nfs_debug not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nfsd_debug not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nla_find not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nla_memcmp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nla_memcpy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nla_parse not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nla_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nla_reserve not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nla_strcmp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nla_strlcpy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nla_validate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nlm_debug not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nlmclnt_proc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nlmsvc_ops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nmi_active not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nmi_watchdog not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_no_llseek not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_noautodma not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nobh_commit_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nobh_prepare_write not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nobh_truncate_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nobh_writepage not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_node_online_map not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_node_possible_map not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nonseekable_open not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_noop_qdisc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_noop_qdisc_ops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_notifier_call_chain not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_notifier_chain_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_notifier_chain_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_notify_change not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nr_free_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_nr_pagecache not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_num_physpages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_num_registered_fb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_oops_in_progress not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_open_bdev_excl not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_open_by_devnum not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_open_exec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_open_softirq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_out_of_line_bug not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_out_of_line_wait_on_bit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_out_of_line_wait_on_bit_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_overflowgid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_overflowuid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_page_follow_link_light not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_page_put_link not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_page_readlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_page_symlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_page_symlink_inode_operations not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pagevec_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pagevec_lookup_tag not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_panic not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_panic_blink not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_panic_notifier_list not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_panic_timeout not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_array_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_array_set not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_bool not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_byte not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_charp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_int not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_invbool not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_long not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_short not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_string not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_uint not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_ulong not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_get_ushort not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_bool not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_byte not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_charp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_copystring not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_int not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_invbool not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_long not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_short not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_uint not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_ulong not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_param_set_ushort not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_path_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_path_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_path_walk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_add_new_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_assign_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_block_user_cfg_access not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_add_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_add_devices not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_alloc_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_assign_resources not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_find_capability not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_read_config_byte not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_read_config_dword not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_read_config_word not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_size_bridges not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_write_config_byte not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_write_config_dword not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_bus_write_config_word not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_choose_state not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_claim_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_clear_mwi not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_create_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_dev_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_dev_get not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_dev_present not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_dev_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_disable_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_do_scan_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_enable_bridges not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_enable_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_enable_device_bars not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_enable_wake not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_find_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_find_capability not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_find_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_find_device_reverse not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_find_next_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_find_next_capability not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_find_parent_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_find_slot not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_fixup_device not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_get_class not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_get_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_get_slot not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_get_subsys not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_intx not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_iomap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_iounmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_map_rom not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_map_rom_copy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_match_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_match_id not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_mem_start not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_osc_control_set not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_osc_support_set not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_pci_problems not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_proc_attach_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_proc_detach_bus not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_release_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_release_regions not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_remove_behind_bridge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_remove_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_remove_bus_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_remove_rom not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_request_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_request_regions not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_restore_bars not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_restore_state not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_root_buses not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_save_state not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_scan_bridge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_scan_bus_parented not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_scan_child_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_scan_single_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_scan_slot not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_set_consistent_dma_mask not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_set_dma_mask not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_set_master not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_set_mwi not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_set_power_state not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_setup_cardbus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_unblock_user_cfg_access not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_unmap_rom not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_unregister_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pci_walk_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pcie_mch_quirk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pcie_port_service_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pcie_port_service_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pciserial_init_ports not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pciserial_remove_ports not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pciserial_resume_ports not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pciserial_suspend_ports not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_per_cpu__kstat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_per_cpu__softnet_data not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_percpu_counter_mod not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_percpu_counter_sum not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_permission not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_add_devices not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_bus_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_device_add not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_device_add_data not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_device_add_resources not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_device_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_device_del not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_device_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_device_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_device_register_simple not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_device_unregister not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_driver_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_driver_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_get_irq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_get_irq_byname not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_get_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_platform_get_resource_byname not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pm_active not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pm_idle not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pm_power_off not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pm_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pm_send_all not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pm_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pm_unregister_all not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pneigh_enqueue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pneigh_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_poll_freewait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_poll_initwait not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_chmod_masq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_clone not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_create_masq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_equiv_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_from_mode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_from_xattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_permission not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_to_xattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_acl_valid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_block_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_lock_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_lock_file_wait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_locks_deadlock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_test_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_timer_event not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_posix_unblock_lock not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pre_task_out_intr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_preempt_schedule not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_prepare_binprm not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_prepare_to_wait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_prepare_to_wait_exclusive not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_print_hexl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_printk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_printk_ratelimit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_probe_hwif_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_probe_irq_mask not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_probe_irq_off not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_probe_irq_on not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_dointvec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_dointvec_jiffies not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_dointvec_minmax not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_dointvec_ms_jiffies not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_dointvec_userhz_jiffies not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_dostring not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_doulongvec_minmax not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_doulongvec_ms_jiffies_minmax not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_ide_read_capacity not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_ide_read_geometry not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_mkdir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_net not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_net_netfilter not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_net_stat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_root not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_root_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_root_fs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proc_symlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_profile_event_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_profile_event_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_profile_pc not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proto_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_proto_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ps2_cmd_aborted not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ps2_command not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ps2_drain not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ps2_handle_ack not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ps2_handle_response not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ps2_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ps2_schedule_command not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ps2_sendbyte not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pskb_copy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_pskb_expand_head not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_ptrace_notify not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_bus not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_cmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_disk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_driver not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_files_struct not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_io_context not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_rpccred not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_tty_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_put_unused_fd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qdisc_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qdisc_create_dflt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qdisc_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qdisc_lock_tree not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qdisc_reset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qdisc_restart not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qdisc_unlock_tree not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_queue_delayed_work not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_queue_work not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qword_add not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qword_addhex not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_qword_get not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_radix_tree_delete not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_radix_tree_gang_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_radix_tree_gang_lookup_tag not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_radix_tree_insert not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_radix_tree_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_radix_tree_lookup_slot not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_radix_tree_tag_clear not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_radix_tree_tag_set not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_radix_tree_tagged not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_raise_softirq_irqoff not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rb_erase not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rb_first not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rb_insert_color not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rb_last not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rb_next not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rb_prev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rb_replace_node not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rcu_barrier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rcu_batches_completed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_accept not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_addr_cancel not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_bind_addr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_connect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_create_id not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_create_qp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_destroy_id not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_destroy_qp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_disconnect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_init_qp_attr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_listen not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_reject not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_resolve_addr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_resolve_ip not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_resolve_route not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_set_ib_paths not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rdma_translate_ip not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_read_bytes_from_xdr_buf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_read_cache_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_read_cache_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_read_dev_sector not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_recalc_sigpending not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_redirty_page_for_writepage not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_redraw_screen not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_refrigerator not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_acpi_bus_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_binfmt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_blkdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_cdrom not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_chrdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_chrdev_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_console not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_cpu_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_die_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_exec_domain not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_filesystem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_framebuffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_gifconf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_inetaddr_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_module_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_netdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_netdevice not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_netdevice_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_nls not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_posix_clock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_reboot_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_sysctl_table not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_sysrq_key not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_register_timer_hook not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_registered_fb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_release_console_sem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_release_lapic_nmi not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_release_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_release_sock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_remap_pfn_range not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_remote_llseek not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_remove_arg_zero not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_remove_inode_hash not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_remove_proc_entry not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_remove_shrinker not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_remove_suid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_remove_wait_queue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_reqsk_queue_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_reqsk_queue_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_request_dma not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_request_irq not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_request_module not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_request_resource not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_reserve_lapic_nmi not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_alloc_iostats not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_bind_new_program not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_calc_rto not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_call_async not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_call_setup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_call_sync not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_clnt_sigmask not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_clnt_sigunmask not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_clone_client not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_debug not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_delay not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_destroy_client not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_execute not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_exit_task not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_force_rebind not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_free_iostats not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_get_protocol not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_init_rtt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_init_task not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_init_wait_queue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_killall_tasks not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_malloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_max_payload not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_mkpipe not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_new_task not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_peeraddr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_peeraddr2str not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_print_iostats not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_proc_register not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_proc_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_queue_upcall not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_release_task not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_restart_call not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_run_task not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_setbufsize not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_shutdown_client not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_sleep_on not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_unlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_update_rtt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_wake_up not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_wake_up_next not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_wake_up_status not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpc_wake_up_task not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpcauth_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpcauth_free_credcache not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpcauth_init_credcache not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpcauth_lookup_credcache not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpcauth_lookupcred not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpcauth_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpcauth_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpcb_getport4 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpcb_getport_external not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpciod_down not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rpciod_up not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtattr_parse not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtattr_strlcpy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtc_control not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtc_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtc_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtc_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtnetlink_links not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtnetlink_put_metrics not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtnl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtnl_lock not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtnl_lock_interruptible not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtnl_sem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_rtnl_unlock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sb_min_blocksize not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sb_set_blocksize not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sched_setscheduler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_schedule not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_schedule_delayed_work not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_schedule_delayed_work_on not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_schedule_timeout not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_schedule_timeout_interruptible not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_schedule_timeout_uninterruptible not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_schedule_work not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_scm_detach_fds not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_scm_fp_dup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_scnprintf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_screen_info not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_scsi_cmd_ioctl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_scsi_command_size not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_search_binary_handler not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_secure_tcp_sequence_number not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_securebits not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_send_sig not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_send_sig_info not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_escape not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_lseek not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_open not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_path not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_printf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_putc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_puts not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_seq_release_private not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serial8250_register_port not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serial8250_resume_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serial8250_suspend_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serial8250_unregister_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serio_close not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serio_interrupt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serio_open not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serio_reconnect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serio_rescan not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serio_unregister_child_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serio_unregister_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_serio_unregister_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_anon_super not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_bh_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_binfmt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_blocksize not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_cpus_allowed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_current_groups not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_device_ro not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_disk_ro not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_nmi_callback not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_page_dirty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_page_dirty_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_shrinker not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_set_user_nice not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_setlease not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_setup_arg_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sget not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sha_transform not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_shrink_dcache_parent not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_shrink_dcache_sb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_si_meminfo not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sigprocmask not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_attr_close not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_attr_open not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_attr_read not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_attr_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_commit_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_dir_inode_operations not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_dir_operations not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_empty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_fill_super not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_getattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_link not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_pin_fs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_prepare_write not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_read_from_buffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_readpage not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_release_fs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_rename not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_rmdir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_statfs not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_strtol not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_strtoul not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_strtoull not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_sync_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_transaction_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_transaction_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_transaction_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_simple_unlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_single_open not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_single_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_chk_filter not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_clone not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_common_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_reset_timer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_run_filter not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_send_sigurg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_stop_timer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_stream_error not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_stream_kill_queues not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_stream_mem_schedule not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_stream_rfree not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_stream_wait_close not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_stream_wait_connect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_stream_wait_memory not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_stream_write_space not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sk_wait_data not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_abort_seq_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_append not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_append_datato_frags not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_checksum not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_checksum_help not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_clone not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_clone_fraglist not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_copy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_copy_and_csum_bits not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_copy_and_csum_datagram_iovec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_copy_and_csum_dev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_copy_bits not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_copy_datagram_iovec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_copy_expand not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_dequeue not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_dequeue_tail not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_find_text not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_free_datagram not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_insert not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_kill_datagram not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_make_writable not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_over_panic not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_pad not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_prepare_seq_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_queue_head not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_queue_purge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_queue_tail not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_realloc_headroom not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_recv_datagram not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_seq_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_split not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_store_bits not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_under_panic not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_skb_unlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sleep_on not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sleep_on_timeout not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_smp_call_function not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_smp_num_siblings not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_snprintf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_alloc_send_skb not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_common_getsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_common_recvmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_common_setsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_create_kern not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_create_lite not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_enable_timestamp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_get_timestamp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_i_ino not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_i_uid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_init_data not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_kfree_s not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_kmalloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_map_fd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_accept not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_bind not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_connect not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_getname not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_getsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_ioctl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_listen not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_mmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_poll not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_recvmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_sendmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_sendpage not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_setsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_shutdown not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_no_socketpair not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_recvmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_release not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_rfree not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_sendmsg not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_setsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_wake_async not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_wfree not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sock_wmalloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sockfd_lookup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_soft_cursor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sort not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sprintf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sscanf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_start_tty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_steal_locks not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_stop_tty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strcat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strchr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strcmp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strcpy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strcspn not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strlcat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strlcpy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strlen not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strncat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strnchr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strncmp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strncpy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strncpy_from_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strnicmp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strnlen not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strnlen_user not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strpbrk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strrchr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strsep not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strspn not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_strstr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_struct_module not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sub_preempt_count not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_submit_bh not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_submit_bio not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_subsys_create_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_subsys_remove_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_subsystem_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_subsystem_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_subsystem_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_suid_dumpable not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_auth_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_auth_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_authenticate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_create_thread not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_destroy not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_drop not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_exit_thread not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_makesock not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_print_addr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_proc_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_proc_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_process not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_recv not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_reserve not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_seq_show not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_set_client not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svc_wake_up not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svcauth_gss_register_pseudoflavor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_svcauth_unix_purge not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swap_io_context not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_alloc_coherent not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_dma_mapping_error not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_dma_supported not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_free_coherent not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_map_sg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_map_single not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_sync_sg_for_cpu not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_sync_sg_for_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_sync_single_for_cpu not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_sync_single_for_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_sync_single_range_for_cpu not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_sync_single_range_for_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_unmap_sg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_swiotlb_unmap_single not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_switch_APIC_timer_to_ipi not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_switch_ipi_to_APIC_timer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_symbol_put_addr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sync_blockdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sync_dirty_buffer not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sync_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sync_mapping_buffers not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sync_page_range not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sync_page_range_nolock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_synchronize_irq not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_synchronize_kernel not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_synchronize_net not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_synchronize_rcu not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sys_close not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sys_ioctl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sys_open not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sys_openat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sys_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sys_tz not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_intvec not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_ip_nonlocal_bind not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_jiffies not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_local_port_range not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_ms_jiffies not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_optmem_max not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_rmem_max not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_string not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_tcp_abc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_tcp_ecn not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_tcp_low_latency not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_tcp_mem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_tcp_reordering not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_tcp_rmem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_tcp_tso_win_divisor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_tcp_tw_reuse not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_tcp_wmem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_vfs_cache_pressure not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysctl_wmem_max not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysdev_class_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysdev_class_unregister not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysdev_create_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysdev_driver_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysdev_driver_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysdev_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysdev_remove_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysdev_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_chmod_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_create_bin_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_create_dir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_create_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_create_group not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_create_link not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_remove_bin_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_remove_dir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_remove_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_remove_group not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_remove_link not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_rename_dir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_sysfs_update_file not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_system_bus_clock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_system_state not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_system_utsname not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_take_over_console not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_task_handoff_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_task_handoff_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_task_in_intr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_task_nice not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_task_no_data_intr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tasklet_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tasklet_kill not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tasklist_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_check_req not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_child_process not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_close not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_connect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_create_openreq_child not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_death_row not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_disconnect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_enter_memory_pressure not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_get_info not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_getsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_hashinfo not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_init_congestion_ops not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_init_xmit_timers not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_initialize_rcv_mss not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_ioctl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_make_synack not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_memory_allocated not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_memory_pressure not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_orphan_count not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_parse_options not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_poll not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_proc_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_proc_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_prot not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_rcv_established not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_rcv_state_process not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_read_sock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_recvmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_register_congestion_control not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_reno_cong_avoid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_reno_min_cwnd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_reno_ssthresh not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_sendmsg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_sendpage not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_setsockopt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_shutdown not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_simple_retransmit not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_slow_start not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_sockets_allocated not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_statistics not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_sync_mss not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_timewait_state_process not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_twsk_unique not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_unhash not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_unregister_congestion_control not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_v4_conn_request not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_v4_connect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_v4_destroy_sock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_v4_do_rcv not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_v4_remember_stamp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_v4_send_check not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tcp_v4_syn_recv_sock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_test_clear_page_dirty not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_test_set_page_writeback not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_thaw_bdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_timespec_trunc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_totalram_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_touch_atime not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_touch_nmi_watchdog not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_touch_softlockup_watchdog not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_transport_add_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_transport_class_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_transport_class_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_transport_configure_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_transport_destroy_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_transport_remove_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_transport_setup_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_truncate_inode_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_truncate_inode_pages_range not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_try_acquire_console_sem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_try_to_free_buffers not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_try_to_release_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_buffer_request_room not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_check_change not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_flip_buffer_push not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_get_baud_rate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_hangup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_hung_up_p not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_insert_flip_string not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_insert_flip_string_flags not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_ldisc_deref not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_ldisc_flush not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_ldisc_get not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_ldisc_put not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_ldisc_ref not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_ldisc_ref_wait not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_name not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_prepare_flip_string not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_prepare_flip_string_flags not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_register_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_register_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_register_ldisc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_set_operations not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_std_termios not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_termios_baud_rate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_unregister_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_unregister_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_unregister_ldisc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_vhangup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_wait_until_sent not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_tty_wakeup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_add_one_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_get_baud_rate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_get_divisor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_match_port not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_register_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_remove_one_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_resume_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_suspend_port not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_unregister_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_update_timeout not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uart_write_wakeup not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_disconnect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_hash not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_hash_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_ioctl not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_poll not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_port_rover not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_proc_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_proc_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_prot not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_udp_sendmsg not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uhci_check_and_reset_hc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uhci_reset_hc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unblock_all_signals not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unix_domain_find not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unload_nls not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unlock_buffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unlock_kernel not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unlock_new_inode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unlock_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unlock_rename not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unmap_mapping_range not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unmap_page_from_agp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unmap_underlying_metadata not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_acpi_bus_type not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_binfmt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_blkdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_cdrom not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_chrdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_chrdev_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_console not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_cpu_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_exec_domain not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_filesystem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_framebuffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_inetaddr_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_module_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_netdev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_netdevice not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_netdevice_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_nls not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_reboot_notifier not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_sysctl_table not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_sysrq_key not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unregister_timer_hook not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unset_nmi_callback not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_unshare_files not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_update_region not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_add_hcd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_alloc_dev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_alloc_urb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_altnum_to_altsetting not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_buffer_alloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_buffer_free not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_buffer_map_sg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_buffer_unmap_sg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_bulk_msg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_bus_list not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_bus_list_lock not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_calc_bus_time not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_check_bandwidth not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_claim_bandwidth not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_clear_halt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_control_msg not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_create_hcd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_deregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_deregister_dev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_disabled not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_disconnect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_driver_claim_interface not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_driver_release_interface not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_find_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_find_interface not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_free_urb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_get_current_frame_number not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_get_descriptor not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_get_dev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_get_intf not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_get_status not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_get_string not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_get_urb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hc_died not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hcd_giveback_urb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hcd_pci_probe not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hcd_pci_remove not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hcd_pci_resume not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hcd_pci_suspend not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hcd_poll_rh_status not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hcd_resume_root_hub not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hcd_suspend_root_hub not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_hub_tt_clear_buffer not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_ifnum_to_if not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_init_urb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_kill_urb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_lock_device_for_reset not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_match_id not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_put_dev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_put_hcd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_put_intf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_register_dev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_register_driver not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_register_notify not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_release_bandwidth not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_remove_hcd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_reset_configuration not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_reset_device not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_root_hub_lost_power not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_set_device_state not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_set_interface not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_sg_cancel not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_sg_init not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_sg_wait not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_string not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_submit_urb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_unlink_urb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_usb_unregister_notify not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_utf8_mbstowcs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_utf8_mbtowc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_utf8_wcstombs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_utf8_wctomb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_uts_sem not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vc_cons not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vc_resize not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfree not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_create not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_follow_link not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_fstat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_getattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_getxattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_link not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_llseek not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_lstat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_mkdir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_mknod not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_permission not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_read not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_readdir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_readlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_readv not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_removexattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_rename not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_rmdir not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_setxattr not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_stat not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_statfs not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_symlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_unlink not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_write not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vfs_writev not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vlan_ioctl_set not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vm_insert_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vmalloc not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vmalloc_32 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vmalloc_earlyreserve not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vmalloc_node not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vmalloc_to_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vmalloc_to_pfn not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vmap not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vmtruncate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vmtruncate_range not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vprintk not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vscnprintf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vsnprintf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vsprintf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vsscanf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_vunmap not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_wait_for_completion not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_wait_for_completion_interruptible not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_wait_for_completion_interruptible_timeout not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_wait_for_completion_timeout not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_wait_on_page_bit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_wait_on_sync_kiocb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_wake_bit_function not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_wake_up_bit not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_wake_up_process not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_write_inode_now not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_write_one_page not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_x86_acpiid_to_apicid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_x86_cpu_to_apicid not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_buf_from_iov not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_buf_read_netobj not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_buf_subsegment not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_decode_array2 not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_decode_netobj not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_decode_string_inplace not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_decode_word not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_encode_array2 not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_encode_netobj not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_encode_opaque not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_encode_opaque_fixed not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_encode_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_encode_string not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_encode_word not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_init_decode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_init_encode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_inline_decode not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_inline_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_partial_copy_from_skb not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_read_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_reserve_space not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_shift_buf not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xdr_write_pages not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_adjust_cwnd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_complete_rqst not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_disconnect not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_lookup_rqst not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_register not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_release_rqst_cong not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_release_xprt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_release_xprt_cong not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_reserve_xprt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_reserve_xprt_cong not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_set_retrans_timeout_def not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_set_retrans_timeout_rtt not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_set_timeout not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_unregister not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_update_rtt not found in System.map. 
Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_wait_for_buffer_space not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_wake_pending_tasks not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xprt_write_space not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xrlim_allow not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_xtime not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_yield not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_zero_fill_bio not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_zlib_inflate not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_zlib_inflateEnd not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_zlib_inflateIncomp not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_zlib_inflateInit2_ not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_zlib_inflateInit_ not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_zlib_inflateReset not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_zlib_inflate_workspacesize not found in System.map. Ignoring System.map entry Warning (compare_maps): vmlinux symbol __crc_zone_table not found in System.map. Ignoring System.map entry ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) CPU 0: aperture @ 0 size 32 MB CPU 1: Syncing TSC to CPU 0. CPU 1: synchronized TSC with CPU 0 (last diff -130 cycles, maxerr 886 cycles) testing NMI watchdog ... OK. 
SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled Defaults: CPU 1 Pid: 4803, comm: nfsrdmamount Not tainted 2.6.16.16 #4 RIP: 0010:[] [] Using defaults from ksymoops -t elf64-x86-64 -a i386:x86-64 RSP: 0018:ffff8100aea41af8 EFLAGS: 00010206 RAX: 6d64723d6f746f72 RBX: ffff8100ae4b6c80 RCX: 00000000d2cd7156 RDX: ffff8100aedd3a68 RSI: 0000000000000800 RDI: ffff8100ae4b6c80 RBP: ffff8100a94b8000 R08: 0000000000000000 R09: ffff8100ae4b6c80 R10: 0000000000000001 R11: 0000000000000001 R12: ffff8100aedd3800 R13: ffff8100aea41b38 R14: ffffffff80436e86 R15: ffff8100aedd3800 FS: 0000000000513ae0(0000) GS:ffff810003ce25c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007fffff829000 CR3: 00000000ae620000 CR4: 00000000000006a0 Stack: ffff8100ae4b6c80 ffff8100ae4b6d50 0000000000000000 ffffffff803e7a86 ffff8100ae4b6c80 0000000000000000 ffff8100aea41b68 ffffffff803e3f6b 0000000000000000 0000000000000600 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: ff 50 38 48 85 c0 75 5c 48 89 de 48 c7 c7 22 f5 45 80 e8 5b >>RIP; ffffffff803e41cb <===== >>RAX; 6d64723d6f746f72 <__crc_genl_unregister_ops+6d64723c6f832eb4/fffffffe801ebf42> >>RBX; ffff8100ae4b6c80 <__crc_genl_unregister_ops+ffff80ffae5a2bc2/fffffffe801ebf42> >>RCX; 00000000d2cd7156 <__crc_ib_create_ah+fd3d8/174338> >>RDX; ffff8100aedd3a68 <__crc_genl_unregister_ops+ffff80ffaeebf9aa/fffffffe801ebf42> >>RDI; ffff8100ae4b6c80 <__crc_genl_unregister_ops+ffff80ffae5a2bc2/fffffffe801ebf42> >>RBP; ffff8100a94b8000 <__crc_genl_unregister_ops+ffff80ffa95a3f42/fffffffe801ebf42> >>R09; ffff8100ae4b6c80 <__crc_genl_unregister_ops+ffff80ffae5a2bc2/fffffffe801ebf42> >>R12; ffff8100aedd3800 <__crc_genl_unregister_ops+ffff80ffaeebf742/fffffffe801ebf42> >>R13; ffff8100aea41b38 <__crc_genl_unregister_ops+ffff80ffaeb2da7a/fffffffe801ebf42> >>R14; ffffffff80436e86 <__func__.0+16b96/44330> >>R15; ffff8100aedd3800 
<__crc_genl_unregister_ops+ffff80ffaeebf742/fffffffe801ebf42> Trace; ffffffff803e7a86 <__rpc_execute+84/1bf> Trace; ffffffff803e3f6b Trace; ffffffff803e4ef8 Trace; ffffffff803e5275 Trace; ffffffff801f52a3 Trace; ffffffff8016b612 Trace; ffffffff80172783 Trace; ffffffff8017ee1b Trace; ffffffff801210c0 Trace; ffffffff802b2358 <__up_read+10/98> Trace; ffffffff8017e768 Trace; ffffffff80138d29 Trace; ffffffff801187b3 Trace; ffffffff804011fd <_spin_unlock_irqrestore+14/31> Trace; ffffffff80121937 Trace; ffffffff801230eb <__wake_up_common+42/61> Trace; ffffffff804011fd <_spin_unlock_irqrestore+14/31> Trace; ffffffff80400f5b <__down+ed/100> Trace; ffffffff80400d01 <__down_failed+35/3a> Trace; ffffffff8017f14a Trace; ffffffff8010a9f2 Code; ffffffff803e41cb 0000000000000000 <_RIP>: Code; ffffffff803e41cb <===== 0: ff 50 38 callq *0x38(%rax) <===== Code; ffffffff803e41ce 3: 48 85 c0 test %rax,%rax Code; ffffffff803e41d1 6: 75 5c jne 64 <_RIP+0x64> Code; ffffffff803e41d3 8: 48 89 de mov %rbx,%rsi Code; ffffffff803e41d6 b: 48 c7 c7 22 f5 45 80 mov $0xffffffff8045f522,%rdi Code; ffffffff803e41dd 12: e8 5b 00 00 00 callq 72 <_RIP+0x72> 2836 warnings issued. Results may not be reliable. From xma at us.ibm.com Thu May 25 12:31:13 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 25 May 2006 12:31:13 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: Message-ID: Roland, I made some mistakes during splitting these patches. Thanks for pointing out. The reason I removed the cacheline because I have tested and proved that it didn't help somehow. Even in some code, I induced some locks, the overall performance of all these 7 patches I am trying to post here could improve IPoIB from 20% - 80% unidirectional and doubled bidirectional. As you mentioned I need help to repolish these patches. I am glad that you give all these valuable inputs of my patches. Thanks lots here. 
Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mshefty at ichips.intel.com Thu May 25 14:27:27 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 25 May 2006 14:27:27 -0700 Subject: [openib-general] Re: [PATCH] CMA: fix port 2 loopback problems In-Reply-To: <20060525151831.GW21266@mellanox.co.il> References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <4460EC93.4090307@ichips.intel.com> <20060525151831.GW21266@mellanox.co.il> Message-ID: <4476213F.4070408@ichips.intel.com> Michael S. Tsirkin wrote: > Fix CMA for loopback configurations: in cma_bind_loopback, make sure sa query is > performed from an active port. Thanks! - committed in 7502. - Sean From mshefty at ichips.intel.com Thu May 25 14:36:27 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 25 May 2006 14:36:27 -0700 Subject: [openib-general] Re: [PATCH] CMA: fix port 2 loopback problems In-Reply-To: <20060525151831.GW21266@mellanox.co.il> References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <4460EC93.4090307@ichips.intel.com> <20060525151831.GW21266@mellanox.co.il> Message-ID: <4476235B.6030600@ichips.intel.com> Roland, can you add this patch to for-2.6.18? This is needed to support loopback if port 1 is down, but port 2 is up. (I'm guessing that it will apply with some offset.) > --- > > Fix CMA for loopback configurations: in cma_bind_loopback, make sure sa query is > performed from an active port. > > Signed-off-by: Ali Ayoub > Signed-off-by: Michael S. 
Tsirkin Signed-off-by: Sean Hefty > > Index: openib_gen2/drivers/infiniband/core/cma.c > =================================================================== > --- openib_gen2.orig/drivers/infiniband/core/cma.c 2006-05-25 16:40:46.000000000 +0300 > +++ openib_gen2/drivers/infiniband/core/cma.c 2006-05-25 19:27:19.000000000 +0300 > @@ -1272,28 +1272,39 @@ EXPORT_SYMBOL(rdma_resolve_route); > static int cma_bind_loopback(struct rdma_id_private *id_priv) > { > struct cma_device *cma_dev; > + struct ib_port_attr port_attr; > union ib_gid *gid; > u16 pkey; > int ret; > + u8 p; > > mutex_lock(&lock); > - if (list_empty(&dev_list)) { > + list_for_each_entry(cma_dev, &dev_list, list) > + for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p) > + if (!ib_query_port (cma_dev->device, p, &port_attr) && > + port_attr.state == IB_PORT_ACTIVE) > + goto port_found; > + > + if (!list_empty(&dev_list)) { > + p = 1; > + cma_dev = list_entry(dev_list.next, struct cma_device, list); > + } else { > ret = -ENODEV; > goto out; > } > > - cma_dev = list_entry(dev_list.next, struct cma_device, list); > +port_found: > gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr); > - ret = ib_get_cached_gid(cma_dev->device, 1, 0, gid); > + ret = ib_get_cached_gid(cma_dev->device, p, 0, gid); > if (ret) > goto out; > > - ret = ib_get_cached_pkey(cma_dev->device, 1, 0, &pkey); > + ret = ib_get_cached_pkey(cma_dev->device, p, 0, &pkey); > if (ret) > goto out; > > ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey); > - id_priv->id.port_num = 1; > + id_priv->id.port_num = p; > cma_attach_to_dev(id_priv, cma_dev); > out: > mutex_unlock(&lock); > > From sean.hefty at intel.com Thu May 25 14:51:33 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 25 May 2006 14:51:33 -0700 Subject: [openib-general] [PATCH] git for-2.6.18 cm: remove unneeded flush workqueue Message-ID: Destroy_workqueue already does flush_workqueue. Signed-off-by: Michael S. 
Tsirkin Signed-off-by: Sean Hefty --- diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 490fd03..1c7463b 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3358,7 +3358,6 @@ error: static void __exit ib_cm_cleanup(void) { - flush_workqueue(cm.wq); destroy_workqueue(cm.wq); ib_unregister_client(&cm_client); idr_destroy(&cm.local_id_table); From rdreier at cisco.com Thu May 25 15:40:29 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 15:40:29 -0700 Subject: [openib-general] [PATCH] git ucm for 2.6.18: convert semaphore to mutex In-Reply-To: (Sean Hefty's message of "Thu, 25 May 2006 10:03:23 -0700") References: Message-ID: Thanks, added to for-2.6.18 From rdreier at cisco.com Thu May 25 15:40:35 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 15:40:35 -0700 Subject: [openib-general] Re: [PATCH] git for-2.6.18 cm: remove unneeded flush workqueue In-Reply-To: (Sean Hefty's message of "Thu, 25 May 2006 14:51:33 -0700") References: Message-ID: Thanks, added to for-2.6.18 From rdreier at cisco.com Thu May 25 15:41:48 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 15:41:48 -0700 Subject: [openib-general] Re: [PATCH] CMA: fix port 2 loopback problems In-Reply-To: <4476235B.6030600@ichips.intel.com> (Sean Hefty's message of "Thu, 25 May 2006 14:36:27 -0700") References: <20060508132803.GB21036@mellanox.co.il> <445F8A94.2080506@ichips.intel.com> <20060508194750.GB25527@mellanox.co.il> <445FA39B.50107@ichips.intel.com> <20060508202904.GD25527@mellanox.co.il> <4460EC93.4090307@ichips.intel.com> <20060525151831.GW21266@mellanox.co.il> <4476235B.6030600@ichips.intel.com> Message-ID: Thanks, I updated my branches with this. 
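For readers following the "CMA: fix port 2 loopback problems" thread above, the port selection that the patch adds to cma_bind_loopback can be sketched in plain user-space C. This is only an illustration of the control flow under stated assumptions: the types fake_device and loopback_choice, the enum port_state, and the function pick_loopback_port are hypothetical stand-ins, not the kernel's struct cma_device or ib_query_port(), and the -1 return stands in for the -ENODEV returned by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real ib_verbs API. */
enum port_state { PORT_DOWN = 0, PORT_ACTIVE = 1 };

struct fake_device {
    int phys_port_cnt;              /* number of physical ports */
    const enum port_state *state;   /* state[p - 1] is the state of port p */
};

struct loopback_choice {
    int dev_index;                  /* index into the device array */
    int port;                       /* 1-based port number */
};

/*
 * Pick the first active port across all devices, scanning ports 1..N of
 * each device in order. If no port is active, fall back to port 1 of the
 * first device, which is what the code did before the patch. Returns 0 on
 * success and -1 if the device list is empty (the patch returns -ENODEV).
 */
static int pick_loopback_port(const struct fake_device *devs, size_t ndevs,
                              struct loopback_choice *out)
{
    size_t d;
    int p;

    assert(out != NULL);

    for (d = 0; d < ndevs; ++d)
        for (p = 1; p <= devs[d].phys_port_cnt; ++p)
            if (devs[d].state[p - 1] == PORT_ACTIVE) {
                out->dev_index = (int)d;
                out->port = p;
                return 0;
            }

    if (ndevs == 0)
        return -1;

    /* No active port anywhere: keep the old default. */
    out->dev_index = 0;
    out->port = 1;
    return 0;
}
```

With one device whose port 1 is down and port 2 is active, the sketch selects port 2, which is the case Sean describes ("loopback if port 1 is down, but port 2 is up"). With every port down it still returns port 1 of the first device, so the later GID and P_Key lookups behave as they did before the patch.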
From rdreier at cisco.com Thu May 25 15:58:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 15:58:02 -0700 Subject: [openib-general] [PATCH] git ucm for 2.6.18: convert semaphore to mutex In-Reply-To: (Sean Hefty's message of "Thu, 25 May 2006 10:03:23 -0700") References: Message-ID: BTW I went back and rewrote history so that this chunk: > @@ -1159,10 +1159,8 @@ static unsigned int ib_ucm_poll(struct f > > poll_wait(filp, &file->poll_wait, wait); > > - down(&file->mutex); > if (!list_empty(&file->events)) > mask = POLLIN | POLLRDNORM; > - up(&file->mutex); > > return mask; > } isn't needed any more. But that means I rebased my whole for-2.6.18 branch, because the patch that added that chunk was the first one in the series... - R. From rdreier at cisco.com Thu May 25 16:28:27 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 16:28:27 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: (Shirley Ma's message of "Thu, 25 May 2006 12:31:13 -0700") References: Message-ID: Shirley> didn't help somehow. Even in some code, I induced some Shirley> locks, the overall performance of all these 7 patches I Shirley> am trying to post here could improve IPoIB from 20% - 80% Shirley> unidirectional and doubled bidirectional. I'm guessing that all of that gain comes from letting the send and receive completion handlers run simultaneously. Is that right? For example how much of an improvement do you see if you just apply the patches you've posted -- that is, only 1/7, 2/7 and 3/7? - R. From paul.lundin at gmail.com Thu May 25 16:32:01 2006 From: paul.lundin at gmail.com (Paul) Date: Thu, 25 May 2006 19:32:01 -0400 Subject: [openib-general] RC 5 ppc64 problems Message-ID: So far I have had 2 issues with the RC5 build. First the compile fails as it does not pass the correct parameters to g++ (regardless of CXXFLAGS, LDFLAGS, CCFLAGS, CFLAGS) settings. 
I made this work correctly by creating a bash script in place of g++ that called g++ -m64 explicitly. Now that I have everything compiled I am experiencing the same problem as before ... with some further information available. As noted before I was experiencing some issues with running pallas. I had hand-built pallas. Now I am using the full OFED stack (openib, open-mpi, pallas) and the resulting pallas binary will run (localhost test) only if the mthca.so file is missing. If the mthca file is present I get the following consistent error: [root at something PMB-2.2.1]# /usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/bin/mpirun -np 2 -hostfile machine.list ./PMB-MPI1 Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR) Failing at addr:0x3000100a619d [0] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libopal.so.0 [0x80001d0038] [1] func:[0x1ffffffe5f0] [2] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 [0x800006a9dc] [3] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 [0x800006abf8] [4] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/openmpi/mca_btl_openib.so [0x800055a5f0] [5] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 [0x80000d7e48] [6] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/openmpi/mca_bml_r2.so [0x800053f99c] [7] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 [0x80000d7530] [8] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/openmpi/mca_pml_ob1.so [0x800051f00c] [9] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 [0x80000e0558] [10] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 [0x800008c900] [11] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 [0x80000b6f20] [12] func:./PMB-MPI1 [0x10003144] [13] func:/lib64/tls/libc.so.6 [0x8064e9415c] [14] func:/lib64/tls/libc.so.6 [0x8064e942e4] *** End of error message *** [root at something PMB-2.2.1]# -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rdreier at cisco.com Thu May 25 17:09:10 2006 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 May 2006 17:09:10 -0700 Subject: [openib-general] [git pull] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree has one bug fix (really the second half of a fix for a bug that appeared two different places): Michael S. Tsirkin: IB/mthca: Fix posting lists of 256 receive requests to SRQ for Tavor drivers/infiniband/hw/mthca/mthca_srq.c | 41 ++++++++++++++++--------------- 1 files changed, 21 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c index 1ea4332..b292fef 100644 --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -490,26 +490,7 @@ int mthca_tavor_post_srq_recv(struct ib_ first_ind = srq->first_free; - for (nreq = 0; wr; ++nreq, wr = wr->next) { - if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) { - nreq = 0; - - doorbell[0] = cpu_to_be32(first_ind << srq->wqe_shift); - doorbell[1] = cpu_to_be32(srq->srqn << 8); - - /* - * Make sure that descriptors are written - * before doorbell is rung. 
- */ - wmb(); - - mthca_write64(doorbell, - dev->kar + MTHCA_RECEIVE_DOORBELL, - MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); - - first_ind = srq->first_free; - } - + for (nreq = 0; wr; wr = wr->next) { ind = srq->first_free; if (ind < 0) { @@ -569,6 +550,26 @@ int mthca_tavor_post_srq_recv(struct ib_ srq->wrid[ind] = wr->wr_id; srq->first_free = next_ind; + + ++nreq; + if (unlikely(nreq == MTHCA_TAVOR_MAX_WQES_PER_RECV_DB)) { + nreq = 0; + + doorbell[0] = cpu_to_be32(first_ind << srq->wqe_shift); + doorbell[1] = cpu_to_be32(srq->srqn << 8); + + /* + * Make sure that descriptors are written + * before doorbell is rung. + */ + wmb(); + + mthca_write64(doorbell, + dev->kar + MTHCA_RECEIVE_DOORBELL, + MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock)); + + first_ind = srq->first_free; + } } if (likely(nreq)) { From katiyar.mohit at gmail.com Thu May 25 18:08:04 2006 From: katiyar.mohit at gmail.com (Mohit Katiyar) Date: Fri, 26 May 2006 10:08:04 +0900 Subject: [openib-general] InfiniBand Host adapters Message-ID: <46465bb30605251808q15ce8bc5na88b3c70a8a95a80@mail.gmail.com> Hi all, I have a query regarding the InfiniBand Host Adapters. Currently I am using SLES 9 on x86_64 machine and I have an InfiniBand Host adapter. I want to know if I had two IB host adapters(or any other iscsi adapter) on the same machine then what would be the pattern of their entries in /sys/class/scsi_host/iscsi directory? TIA Mohit katiyar From xma at us.ibm.com Thu May 25 18:37:30 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 25 May 2006 18:37:30 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: Message-ID: Roland Dreier wrote on 05/25/2006 04:28:27 PM: > Shirley> didn't help somehow. Even in some code, I induced some > Shirley> locks, the overall performance of all these 7 patches I > Shirley> am trying to post here could improve IPoIB from 20% - 80% > Shirley> unidirectional and doubled bidirectional. 
> > I'm guessing that all of that gain comes from letting the send and > receive completion handlers run simultaneously. Is that right? > > For example how much of an improvement do you see if you just apply > the patches you've posted -- that is, only 1/7, 2/7 and 3/7? > > - R. That's not true. I tested performance with 1/7 and 3/7 a couple of weeks ago and saw more than a 10% improvement. I never saw a send queue overrun with tx_ring on one CPU with one TCP stream. After removing tx_ring, the send path is much faster, and the default of 128 is not big enough to handle it. That's the reason I have another patch to handle send queue overrun -- it requeues the packet to the head of the dev xmit queue, instead of the current implementation, which silently drops packets when the device driver send queue is full and depends on TCP retransmission. The current implementation would cause TCP fast retransmit, slow start, and out-of-order packets. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From xma at us.ibm.com Thu May 25 19:30:23 2006 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 25 May 2006 19:30:23 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: Message-ID: Roland, Roland Dreier wrote on 05/25/2006 09:24:01 AM: > This also looks like a step backwards to me. You are replacing a > cache-friendly array with a cache-unfriendly linked list, which also > requires two more lock/unlock operations in the fast path. This patch removes one extra ring between the dev xmit queue and the device send queue, and removes tx_lock from the completion handler. The whole purpose of the send_list and slock is clean-up at shutdown. Otherwise we wouldn't need to maintain this list. And most likely at shutdown, after waiting for 5HZ, the list is empty.
I could implement it differently, for example with a cache-friendly RCU list. I didn't think it was worth it before, since I didn't see any gain. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom at opengridcomputing.com Fri May 26 06:46:32 2006 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 26 May 2006 08:46:32 -0500 Subject: [openib-general] NFS/RDMA for Linux: client and server update release 5 In-Reply-To: <1148579648.1573.18.camel@shuttle> References: <7.0.1.0.2.20060522161137.04202e30@netapp.com> <1148426365.1575.10.camel@shuttle> <7.0.1.0.2.20060524071711.0421e2d8@netapp.com> <1148579648.1573.18.camel@shuttle> Message-ID: <1148651192.31781.7.camel@trinity.ogc.int> Helen: Can you please do the following and send the output to me and Tom Talpey? objdump -Sl net/sunrpc/sched.o > objdump.out Thanks, Tom (Tom Tucker Not Talpey) On Thu, 2006-05-25 at 10:54 -0700, helen chen wrote: > Tom, > > Please review the attached ksymoops output. > > Helen > > On Wed, 2006-05-24 at 04:25, Talpey, Thomas wrote: > > [Cutting down the reply list to more relevant parties...] > > > > It's hard to say what is crashing, but I suspect the CM code, due > > to the process context being ib_cm. Is there some reason you're > > not getting symbols in the stack trace? If you could feed this oops > > text to ksymoops it will give us more information. > > > > In any case, it appears the connection is succeeding at the server, > > but the client RPC code isn't being signalled that it has done so. > > Perhaps this is due to a lost reply, but the NFS code hasn't actually > > started to do anything. So, I would look for IB-level issues. Is the > > client running the current OpenFabrics svn top-of-tree? > > > > Let's take this offline to diagnose, unless someone has an idea why > > the CM would be failing. The ksymoops analysis would help.
> > > > Tom. > > > > > > > > At 07:19 PM 5/23/2006, helen chen wrote: > > >Hi Tom, > > > > > >I have downloaded your release 5 of the NFS/RDMA and am having trouble > > >mounting the rdma nfs, the > > >"./nfsrdmamount -o rdma on16-ib:/mnt/rdma /mnt/rdma" command never > > >returned. and the dmesg for client and server are: > > > > > >------ demsg from client ----- > > >RPCRDMA Module Init, register RPC RDMA transport > > >Defaults: > > > MaxRequests 50 > > > MaxInlineRead 1024 > > > MaxInlineWrite 1024 > > > Padding 0 > > > Memreg 5 > > >RPC: Registered rdma transport module. > > >RPC: Registered rdma transport module. > > >RPC: xprt_setup_rdma: 140.221.134.221:2049 > > >nfs: server on16-ib not responding, timed out > > >Unable to handle kernel NULL pointer dereference at 0000000000000000 > > >RIP: > > >[<0000000000000000>] > > >PGD a9f2b067 PUD a8ca2067 PMD 0 > > >Oops: 0010 [1] PREEMPT SMP > > >CPU 1 > > >Modules linked in: xprtrdma ib_srp iscsi_tcp scsi_transport_iscsi > > >scsi_mod > > >Pid: 346, comm: ib_cm/1 Not tainted 2.6.16.16 #4 > > >RIP: 0010:[<0000000000000000>] [<0000000000000000>] > > >RSP: 0018:ffff8100af5a1c30 EFLAGS: 00010246 > > >RAX: ffff8100aeff2400 RBX: ffff8100aeff2400 RCX: ffff8100afc9e458 > > >RDX: 0000000000000000 RSI: ffff8100af5a1d48 RDI: ffff8100aeff2440 > > >RBP: ffff8100aeff2440 R08: 0000000000000000 R09: 0000000000000000 > > >R10: 0000000000000003 R11: 0000000000000000 R12: ffff8100aeff2500 > > >R13: 00000000ffffff99 R14: ffff8100af5a1d48 R15: ffffffff8036c72c > > >FS: 0000000000505ae0(0000) GS:ffff810003ce25c0(0000) > > >knlGS:0000000000000000 > > >CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > > >CR2: 0000000000000000 CR3: 00000000ad587000 CR4: 00000000000006a0 > > >Process ib_cm/1 (pid: 346, threadinfo ffff8100af5a0000, task > > >ffff8100afea8100) > > >Stack: ffffffff8802a331 ffff8100aeff2500 0000000000000001 > > >ffff8100aeff2440 > > > ffffffff804011fd 0000000000000000 ffffffff8802a343 > > >ffff8100afdd6100 > > > 
ffffffff80364ee4 0000000000000100 > > >Call Trace: [] [] > > > [] [] [] > > > [] [] [] > > > [] [] [] > > > [] [] [] > > > [] [] [] > > > [] [] [] > > > [] [] [] > > > [] [] [] > > > [] > > > > > >Code: Bad RIP value. > > >RIP [<0000000000000000>] RSP > > >CR2: 0000000000000000 > > > > > >------dmesg from server ------ > > >nfsd: request from insecure port 140.221.134.220, port=32768! > > >svc_rdma_recvfrom: transport ffff81007e8f2800 is closing > > >svc_rdma_put: Destroying transport ffff81007e8f2800, > > >cm_id=ffff81007e945200, sk_flags=154, sk_inuse=0 > > > > > >Did I forget to configure necessary components into my kernel? > > > > > >Thanks, > > >Helen > > > > > >On Mon, 2006-05-22 at 13:25, Talpey, Thomas wrote: > > >> Network Appliance is pleased to announce release 5 of the NFS/RDMA > > >> client and server for Linux 2.6.16.16. This update to the April 19 release > > >> adds improved server parallel performance and fixes various issues. This > > >> code supports both Infiniband and iWARP transports. > > >> > > >> > > >> > > >> > > > > > >> > > >> Comments and feedback welcome. We're especially interested in > > >> successful test reports! Thanks. > > >> > > >> Tom Talpey, for the various NFS/RDMA projects. 
> > >> > > >> _______________________________________________ > > >> openib-general mailing list > > >> openib-general at openib.org > > >> http://openib.org/mailman/listinfo/openib-general > > >> > > >> To unsubscribe, please visit > > >http://openib.org/mailman/listinfo/openib-general > > >> > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From Don.Albert at Bull.com Fri May 26 07:45:56 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Fri, 26 May 2006 07:45:56 -0700 Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4 In-Reply-To: <20060517144830.GE30211@mellanox.co.il> Message-ID: Michael, Sorry for the long delay in replying. I was on vacation for 10 days, and when I returned the OFED RC5 release was imminent, so I decided to wait and install it before pursuing this further. Of course, when I did, the problem mysteriously went away. The ib_mthca module now initializes correctly on both EM64T machines. I noticed some discussion between you and Roland about making the parameter "fw_cmd_doorbell=0" the default. Did this occur in RC5? > > Could you please give more detail on the exact system that had/has > this problem? Model, chipset revision, full lspci -v output, etc. > -- > MST In case the problem comes back again with RC6, below is some information on the machine that had the problem.
-Don Albert- MODEL x86_64 [type=x86_64] CPU 4 x Intel(R) Xeon(TM) CPU 3.00GHz, 64 bits 2992.628 Mhz MEM 2055516 kB real memory FIRM e820 OS Red Hat Enterprise Linux AS release 4 (Nahant Update 3) - kernel 2.6.16 [jatoba] (ib) ib> /sbin/lspci -v pcilib: Resource 2 in /sys/bus/pci/devices/0000:03:00.0/resource has a 64-bit address, ignoring 00:00.0 Host bridge: Intel Corporation E7525 Memory Controller Hub (rev 0c) Subsystem: Intel Corporation: Unknown device 3444 Flags: bus master, fast devsel, latency 0 Capabilities: [40] Vendor Specific Information 00:00.1 Class ff00: Intel Corporation E7525/E7520 Error Reporting Registers (rev 0c) Subsystem: Intel Corporation: Unknown device 3444 Flags: fast devsel 00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0c) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 Capabilities: [50] Power Management version 2 Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- Capabilities: [64] Express Root Port (Slot-) IRQ 0 Capabilities: [100] Advanced Error Reporting 00:03.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A1 (rev 0c) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 Capabilities: [50] Power Management version 2 Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- Capabilities: [64] Express Root Port (Slot-) IRQ 0 Capabilities: [100] Advanced Error Reporting 00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 Memory behind bridge: ded00000-dedfffff Prefetchable memory behind bridge: 00000000ff800000-00000000fff00000 Capabilities: [50] Power Management version 2 Capabilities: [58] Message Signalled Interrupts: 64bit- 
Queue=0/1 Enable- Capabilities: [64] Express Root Port (Slot+) IRQ 0 Capabilities: [100] Advanced Error Reporting 00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, 66Mhz, fast devsel, latency 64 Bus: primary=00, secondary=04, subordinate=04, sec-latency=48 Memory behind bridge: dee00000-deefffff Capabilities: [50] PCI-X bridge device. 00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller (rev 02) (prog-if 00 [UHCI]) Subsystem: Intel Corporation: Unknown device 3444 Flags: bus master, medium devsel, latency 0, IRQ 169 I/O ports at d880 [size=32] 00:1d.1 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller (rev 02) (prog-if 00 [UHCI]) Subsystem: Intel Corporation: Unknown device 3444 Flags: bus master, medium devsel, latency 0, IRQ 201 I/O ports at dc00 [size=32] 00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02) Subsystem: Intel Corporation: Unknown device 3444 Flags: medium devsel Memory at decff800 (32-bit, non-prefetchable) [size=16] 00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt Controller (rev 02) (prog-if 20 [IO(X)-APIC]) Subsystem: Intel Corporation: Unknown device 3444 Flags: bus master, fast devsel, latency 0 Capabilities: [50] PCI-X non-bridge device. 
00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host Controller (rev 02) (prog-if 20 [EHCI]) Subsystem: Intel Corporation: Unknown device 3444 Flags: bus master, medium devsel, latency 0, IRQ 193 Memory at decffc00 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=05, subordinate=05, sec-latency=32 I/O behind bridge: 0000e000-0000efff Memory behind bridge: def00000-dfffffff Prefetchable memory behind bridge: 88000000-880fffff 00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02) Flags: bus master, medium devsel, latency 0 00:1f.1 IDE interface: Intel Corporation 6300ESB PATA Storage Controller (rev 02) (prog-if 8a [Master SecP PriP]) Subsystem: Intel Corporation: Unknown device 3444 Flags: bus master, medium devsel, latency 0, IRQ 185 I/O ports at I/O ports at I/O ports at I/O ports at I/O ports at fc00 [size=16] Memory at 88100000 (32-bit, non-prefetchable) [size=1K] 00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO]) Subsystem: Intel Corporation: Unknown device 3444 Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 185 I/O ports at d800 [size=8] I/O ports at d480 [size=4] I/O ports at d400 [size=8] I/O ports at d080 [size=4] I/O ports at d000 [size=16] 00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02) Subsystem: Intel Corporation: Unknown device 3444 Flags: medium devsel, IRQ 11 I/O ports at 0400 [size=32] 03:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20) Subsystem: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] Flags: bus master, fast devsel, latency 0, IRQ 169 Memory at ded00000 (64-bit, non-prefetchable) [size=1M] Memory at (64-bit, prefetchable) Capabilities: [40] 
Power Management version 2 Capabilities: [48] Vital Product Data Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable- Capabilities: [84] MSI-X: Enable- Mask- TabSize=32 Capabilities: [60] Express Endpoint IRQ 0 04:02.0 Ethernet controller: Alteon Networks Inc. AceNIC Gigabit Ethernet (rev 01) Subsystem: IBM Gigabit Ethernet-SX PCI Adapter Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 177 Memory at deefc000 (32-bit, non-prefetchable) [size=16K] 05:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA]) Subsystem: Intel Corporation: Unknown device 3444 Flags: bus master, stepping, medium devsel, latency 64, IRQ 11 Memory at df000000 (32-bit, non-prefetchable) [size=16M] I/O ports at e800 [size=256] Memory at defff000 (32-bit, non-prefetchable) [size=4K] Expansion ROM at 88000000 [disabled] [size=128K] Capabilities: [5c] Power Management version 2 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Don.Albert at Bull.com Fri May 26 08:02:54 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Fri, 26 May 2006 08:02:54 -0700 Subject: [openib-general] OpenSM segmentation fault on RC5 Message-ID: I just installed RC5 on a small testbed consisting of two EM64T machines connected back to back with Mellanox MT25204 DDR HCAs. With RC4 I had problems with the ib_mthca driver initializing on one of the machines. This problem has mysteriously gone away with RC5, but now I am having a problem with the OpenSM Subnet Manager. 
When I use the script supplied by RC5 to start the SM, either after a boot or manually, I get a segmentation fault right after the SM declares itself the MASTER:

[koa] (root) root> /etc/init.d/opensmd start
/etc/init.d/opensmd: line 330: 6844 Done echo $PORT_FLAG
6845 Segmentation fault | $prog $START_FLAGS >/dev/null 2>&1
opensm start[FAILED]

To get more information, I tried starting the SM with the "-V" flag:

[koa] (root) root> /usr/local/ofed/bin/opensm -V
-------------------------------------------------
OpenSM Rev:openib-1.2.0
Based on OpenIB svn Exported revision
Command Line Arguments:
Big V selected
Log File: /var/log/osm.log
-------------------------------------------------
OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision

Using default GUID 0x2c90200216dc5
Entering MASTER state

Segmentation fault

Attached is the /var/log/osm.log file. The first attempt at time 16:08 was with the script, the second at 16:33 was with the "-V" flag. Also attached is the /etc/opensm.conf file.

-Don Albert-
Bull HN Information Systems

-------------- next part --------------
A non-text attachment was scrubbed...
Name: opensm_log
Type: application/octet-stream
Size: 81394 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: opensm.conf
Type: application/octet-stream
Size: 4597 bytes
Desc: not available
URL:

From paul.lundin at gmail.com Fri May 26 08:35:39 2006
From: paul.lundin at gmail.com (Paul)
Date: Fri, 26 May 2006 11:35:39 -0400
Subject: [openib-general] OpenSM segmentation fault on RC5
In-Reply-To:
References:
Message-ID:

I am having a similar issue on my ppc64 systems. Take a look at the email I sent to the list last night. I have not been able to figure out much regarding why it's dying. I wonder if it might be tied to some other issues I am having.

On 5/26/06, Don.Albert at bull.com wrote: > > > I just installed RC5 on a small testbed consisting of two EM64T machines > connected back to back with Mellanox MT25204 DDR HCAs. With RC4 I had > problems with the ib_mthca driver initializing on one of the machines. > This problem has mysteriously gone away with RC5, but now I am having a > problem with the OpenSM Subnet Manager. > > When I use the script supplied by RC5 to start the SM, either after a > boot or manually, I get a segmentation fault, right after the SM declares > itself the MASTER: > > [koa] (root) root> /etc/init.d/opensmd start > /etc/init.d/opensmd: line 330: 6844 Done echo > $PORT_FLAG > 6845 Segmentation fault | $prog $START_FLAGS >/dev/null 2>&1 > opensm start[FAILED] > > To get more information, I tried starting the SM with the "-V" flag: > > [koa] (root) root> /usr/local/ofed/bin/opensm -V > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > Based on OpenIB svn Exported revision > Command Line Arguments: > Big V selected > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision > > Using default GUID 0x2c90200216dc5 > Entering MASTER state > > Segmentation fault > > Attached is the /var/log/osm.log file. The first attempt at time 16:08 > was with the script, the second at 16:33 was with the "-V" flag. Also > attached is the /etc/opensm.conf file. > > -Don Albert- > Bull HN Information Systems > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Fri May 26 08:39:15 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 26 May 2006 11:39:15 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: Message-ID: <1148657954.4583.5262.camel@hal.voltaire.com> Hi Paul, On Fri, 2006-05-26 at 11:35, Paul wrote: > I am having a similar issue on my ppc64 systems. Take a look at the > email I sent to the list last night. I have not been able to figure > out much regarding why its dying, Are you referring to your mail on compile flags and then OpenMPI ? I saw no mention of an issue with OpenSM in that email. > I wonder if it might be tied to some other issues I have am having. If SM is not running, then all bets are off. -- Hal > > On 5/26/06, Don.Albert at bull.com wrote: > I just installed RC5 on a small testbed consisting of two > EM64T machines connected back to back with Mellanox MT25204 > DDR HCAs. With RC4 I had problems with the ib_mthca driver > initializing on one of the machines. This problem has > mysteriously gone away with RC5, but now I am having a problem > with the OpenSM Subnet Manager. 
> > When I use the script supplied by RC5 to start the SM, either > after a boot or manually, I get a segmentation fault, right > after the SM declares itself the MASTER: > > [koa] (root) root> /etc/init.d/opensmd start > /etc/init.d/opensmd: line 330: 6844 Done > echo $PORT_FLAG > 6845 Segmentation fault | $prog $START_FLAGS > >/dev/null 2>&1 > opensm start[FAILED] > > To get more information, I tried starting the SM with the "-V" > flag: > > [koa] (root) root> /usr/local/ofed/bin/opensm -V > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > Based on OpenIB svn Exported revision > Command Line Arguments: > Big V selected > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision > > Using default GUID 0x2c90200216dc5 > Entering MASTER state > > Segmentation fault > > Attached is the /var/log/osm.log file. The first attempt at > time 16:08 was with the script, the second at 16:33 was with > the "-V" flag. Also attached is the /etc/opensm.conf file. 
> > -Don Albert- > Bull HN Information Systems > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > > > > > ______________________________________________________________________ > > _______________________________________________ > openfabrics-ewg mailing list > openfabrics-ewg at openib.org > http://openib.org/mailman/listinfo/openfabrics-ewg From paul.lundin at gmail.com Fri May 26 09:14:49 2006 From: paul.lundin at gmail.com (Paul) Date: Fri, 26 May 2006 12:14:49 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: <1148657954.4583.5262.camel@hal.voltaire.com> References: <1148657954.4583.5262.camel@hal.voltaire.com> Message-ID: No, I figured all of that out, ppc64 was not supported/working in RC4. Either way, here is what I see with opensm: [root at something ~]# /etc/init.d/opensmd start *** glibc detected *** realloc(): invalid next size: 0x00000000100ab1e0 *** /etc/init.d/opensmd: line 330: 7854 Done echo $PORT_FLAG 7855 Aborted | $prog $START_FLAGS >/dev/null 2>&1 opensm start [FAILED] [root at something ~]# -------------- next part -------------- An HTML attachment was scrubbed... URL: From bardov at gmail.com Fri May 26 09:44:46 2006 From: bardov at gmail.com (Dan Bar Dov) Date: Fri, 26 May 2006 18:44:46 +0200 Subject: [openib-general] InfiniBand Host adapters In-Reply-To: <46465bb30605251808q15ce8bc5na88b3c70a8a95a80@mail.gmail.com> References: <46465bb30605251808q15ce8bc5na88b3c70a8a95a80@mail.gmail.com> Message-ID: Mohit, this is question to the open-iscsi list, not the openIB list. Anyway, the answer really depends on the version of open-iscsi that you use. 
To check for yourself, assuming /dev/sda is an iscsi disk, cd to /sys/block/sda and do a /bin/pwd (not regular pwd, /bin/pwd), and you'd see the path. Check on the open-iscsi mailing list, I recently sent a script called oil that I believe does what you need. Dan On 5/26/06, Mohit Katiyar wrote: > Hi all, > I have a query regarding the InfiniBand Host Adapters. > Currently I am using SLES 9 on x86_64 machine and I have an InfiniBand > Host adapter. I want to know if I had two IB host adapters(or any other iscsi > adapter) on the same machine then what would be the pattern of their > entries in /sys/class/scsi_host/iscsi directory? > > TIA > > Mohit katiyar > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From bardov at gmail.com Fri May 26 09:49:00 2006 From: bardov at gmail.com (Dan Bar Dov) Date: Fri, 26 May 2006 18:49:00 +0200 Subject: [openib-general] iSCSI host adapter entries In-Reply-To: <46465bb30605220241u28eb958ayd094d3a6bc4174a3@mail.gmail.com> References: <46465bb30605220241u28eb958ayd094d3a6bc4174a3@mail.gmail.com> Message-ID: Mohit, I didn't see this earlier. Again, please send iscsi related questions to open-iscsi mailing list. The fact it is iscsi over IB does not make it an IB question. If it was iscsi over TCP, you wouldn't send it to the TCP mailing list, right? Dan On 5/22/06, Mohit Katiyar wrote: > Hi all, > I have a query regarding the Host adapters. > Currently I am using SLES 9 on x86_64 machine and I have an InfiniBand > Host adapter. 
> > The entry in the sys directory for the for the host is as follows > optgfs:~ # ll /sys/class/scsi_host/iscsi/* > -r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/cmd_per_lun > --w------- 1 root root 4096 May 22 13:31 > /sys/class/scsi_host/iscsi/connfailtimeout > lrwxrwxrwx 1 root root 0 May 22 13:31 > /sys/class/scsi_host/iscsi/device -> ../../../devices/platform/iscsi > --w------- 1 root root 4096 May 22 13:31 > /sys/class/scsi_host/iscsi/diskcommandtimeout > -r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/host_busy > -r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/host_no > --w------- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/log > -rw-r--r-- 1 root root 4096 May 22 13:31 > /sys/class/scsi_host/iscsi/no_partition_check > -r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/proc_name > --w------- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/scan > -r--r--r-- 1 root root 4096 May 22 13:31 > /sys/class/scsi_host/iscsi/sg_tablesize > --w------- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/shutdown > -r--r--r-- 1 root root 4096 May 22 13:31 > /sys/class/scsi_host/iscsi/unchecked_isa_dma > -r--r--r-- 1 root root 4096 May 22 13:31 /sys/class/scsi_host/iscsi/unique_id > > I want to know if I had two IB host adapters(or any other iscsi > adapter) on the same machine then what would be the pattern of their > entries in /sys/class/scsi_host/iscsi directory? 
> > Thanks > Mohit katiyar > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Fri May 26 09:46:01 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 26 May 2006 12:46:01 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: <1148657954.4583.5262.camel@hal.voltaire.com> Message-ID: <1148661960.4583.6711.camel@hal.voltaire.com> Hi again Paul, On Fri, 2006-05-26 at 12:14, Paul wrote: > No, I figured all of that out, ppc64 was not supported/working in RC4. > Either way, here is what I see with opensm: > > [root at something ~]# /etc/init.d/opensmd start > *** glibc detected *** realloc(): invalid next size: > 0x00000000100ab1e0 *** > /etc/init.d/opensmd: line 330: 7854 Done echo $PORT_FLAG > 7855 Aborted | $prog $START_FLAGS >/dev/null 2>&1 > opensm start [FAILED] > [root at something ~]# OK; that's a totally different problem than Don's. I would like to get to the bottom of this. 0x100ab1e0 is a pretty big size. Is this reproducible ? I'm not sure how realloc gets called as I do not believe OpenSM calls it directly (or any of its libraries). Are you using 32 or 64 bit libraries for this ? Would you rebuild OpenSM with debug: ./configure --enable-debug && make clean && make && make install and then run opensm under gdb and provide the backtrace after the failure? Thanks. 
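
A minimal sketch of how the requested backtrace could be captured non-interactively, assuming gdb is installed and the debug build of opensm is on the PATH. The helper name backtrace_cmd is hypothetical and not part of OpenSM; it only prints the gdb command line so it can be reviewed before it is run. The -batch, -ex and --args options are standard gdb options, and -V is the verbose flag already used earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSM or OFED): print a gdb command line
# that runs the given program until it stops and then dumps a backtrace.
backtrace_cmd() {
    # -batch  : exit once the -ex commands have been processed
    # -ex run : start the program under the debugger
    # -ex bt  : print the backtrace after the program stops (e.g. on SIGSEGV or SIGABRT)
    # --args  : treat everything that follows as the program and its arguments
    printf 'gdb -batch -ex run -ex bt --args %s\n' "$*"
}

# Assumption: the rebuilt opensm binary is found on the PATH.
backtrace_cmd opensm -V
```

The printed command can be run as root with its output redirected to a file, which can then be attached to a reply on the list.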
-- Hal

From bardov at gmail.com Fri May 26 09:56:05 2006
From: bardov at gmail.com (Dan Bar Dov)
Date: Fri, 26 May 2006 18:56:05 +0200
Subject: [openib-general] Splitting openib-general to openib-devel and openib-users
Message-ID:

As traffic on openib-general is picking up, and usability questions are not always picked up in between the patches/rfcs etc., I think we should split the "general" list into "devel" and "users" lists, and maybe also an "announce" list (or use general for announcements etc.)

Dan

From robert.j.woodruff at intel.com Fri May 26 09:56:46 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 26 May 2006 09:56:46 -0700
Subject: [openib-general] OpenSM build problem
Message-ID: <1AC79F16F5C5284499BB9591B33D6F0007CE0ED7@orsmsx408>

When I build opensm on the 1.0 branch using the following commands (the defaults)

cd ../management/libibcommon
./autogen.sh && ./configure && make && make install
cd ../libibumad
./autogen.sh && ./configure && make && make install
cd ../libibmad
./autogen.sh && ./configure && make && make install
cd ../osm/complib
./autogen.sh && ./configure && make && make install
cd ../libvendor
./autogen.sh && ./configure && make && make install
cd ../opensm
./autogen.sh && ./configure && make && make install

I get the following error when trying to run it.

[root at iclust-tiger1 woody]# /usr/local/bin/opensm: /usr/local/lib/libosmcomp.so.1: version `OSMCOMP_1.0' not found (required by /usr/local/bin/opensm)
[1]+ Exit 1 /usr/local/bin/opensm

This does not happen with the trunk version 7479 when I build it the same way. Is there something that I need to specify when I build it for the 1.0 version?

woody From hycsw at ca.sandia.gov Fri May 26 10:00:19 2006 From: hycsw at ca.sandia.gov (helen chen) Date: 26 May 2006 10:00:19 -0700 Subject: [openib-general] NFS/RDMA for Linux: client and server update release 5 In-Reply-To: <1148651192.31781.7.camel@trinity.ogc.int> References: <7.0.1.0.2.20060522161137.04202e30@netapp.com> <1148426365.1575.10.camel@shuttle> <7.0.1.0.2.20060524071711.0421e2d8@netapp.com> <1148579648.1573.18.camel@shuttle> <1148651192.31781.7.camel@trinity.ogc.int> Message-ID: <1148662819.1575.34.camel@shuttle> Here it is! - Helen On Fri, 2006-05-26 at 06:46, Tom Tucker wrote: > Helen: > > Can you please do the following and send the output to me and Tom > Talpey? > > objdump -Sl net/sunrpc/sched.o > objdump.out > > Thanks, > > Tom > (Tom Tucker Not Talpey) > > > On Thu, 2006-05-25 at 10:54 -0700, helen chen wrote: > > Tom, > > > > Please review the attached ksymoops output. > > > > Helen > > > > On Wed, 2006-05-24 at 04:25, Talpey, Thomas wrote: > > > [Cutting down the reply list to more relevant parties...] > > > > > > It's hard to say what is crashing, but I suspect the CM code, due > > > to the process context being ib_cm. Is there some reason you're > > > not getting symbols in the stack trace? If you could feed this oops > > > text to ksymoops it will give us more information. > > > > > > In any case, it appears the connection is succeeding at the server, > > > but the client RPC code isn't being signalled that it has done so. > > > Perhaps this is due to a lost reply, but the NFS code hasn't actually > > > started to do anything. So, I would look for IB-level issues. Is the > > > client running the current OpenFabrics svn top-of-tree? > > > > > > Let's take this offline to diagnose, unless someone has an idea why > > > the CM would be failing. The ksymoops analysis would help. > > > > > > Tom. 
> > > > > > > > > > > > At 07:19 PM 5/23/2006, helen chen wrote: > > > >Hi Tom, > > > > > > > >I have downloaded your release 5 of the NFS/RDMA and am having trouble > > > >mounting the rdma nfs, the > > > >"./nfsrdmamount -o rdma on16-ib:/mnt/rdma /mnt/rdma" command never > > > >returned. and the dmesg for client and server are: > > > > > > > >------ demsg from client ----- > > > >RPCRDMA Module Init, register RPC RDMA transport > > > >Defaults: > > > > MaxRequests 50 > > > > MaxInlineRead 1024 > > > > MaxInlineWrite 1024 > > > > Padding 0 > > > > Memreg 5 > > > >RPC: Registered rdma transport module. > > > >RPC: Registered rdma transport module. > > > >RPC: xprt_setup_rdma: 140.221.134.221:2049 > > > >nfs: server on16-ib not responding, timed out > > > >Unable to handle kernel NULL pointer dereference at 0000000000000000 > > > >RIP: > > > >[<0000000000000000>] > > > >PGD a9f2b067 PUD a8ca2067 PMD 0 > > > >Oops: 0010 [1] PREEMPT SMP > > > >CPU 1 > > > >Modules linked in: xprtrdma ib_srp iscsi_tcp scsi_transport_iscsi > > > >scsi_mod > > > >Pid: 346, comm: ib_cm/1 Not tainted 2.6.16.16 #4 > > > >RIP: 0010:[<0000000000000000>] [<0000000000000000>] > > > >RSP: 0018:ffff8100af5a1c30 EFLAGS: 00010246 > > > >RAX: ffff8100aeff2400 RBX: ffff8100aeff2400 RCX: ffff8100afc9e458 > > > >RDX: 0000000000000000 RSI: ffff8100af5a1d48 RDI: ffff8100aeff2440 > > > >RBP: ffff8100aeff2440 R08: 0000000000000000 R09: 0000000000000000 > > > >R10: 0000000000000003 R11: 0000000000000000 R12: ffff8100aeff2500 > > > >R13: 00000000ffffff99 R14: ffff8100af5a1d48 R15: ffffffff8036c72c > > > >FS: 0000000000505ae0(0000) GS:ffff810003ce25c0(0000) > > > >knlGS:0000000000000000 > > > >CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > > > >CR2: 0000000000000000 CR3: 00000000ad587000 CR4: 00000000000006a0 > > > >Process ib_cm/1 (pid: 346, threadinfo ffff8100af5a0000, task > > > >ffff8100afea8100) > > > >Stack: ffffffff8802a331 ffff8100aeff2500 0000000000000001 > > > >ffff8100aeff2440 > > > > 
ffffffff804011fd 0000000000000000 ffffffff8802a343 > > > >ffff8100afdd6100 > > > > ffffffff80364ee4 0000000000000100 > > > >Call Trace: [] [] > > > > [] [] [] > > > > [] [] [] > > > > [] [] [] > > > > [] [] [] > > > > [] [] [] > > > > [] [] [] > > > > [] [] [] > > > > [] [] [] > > > > [] > > > > > > > >Code: Bad RIP value. > > > >RIP [<0000000000000000>] RSP > > > >CR2: 0000000000000000 > > > > > > > >------dmesg from server ------ > > > >nfsd: request from insecure port 140.221.134.220, port=32768! > > > >svc_rdma_recvfrom: transport ffff81007e8f2800 is closing > > > >svc_rdma_put: Destroying transport ffff81007e8f2800, > > > >cm_id=ffff81007e945200, sk_flags=154, sk_inuse=0 > > > > > > > >Did I forget to configure necessary components into my kernel? > > > > > > > >Thanks, > > > >Helen > > > > > > > >On Mon, 2006-05-22 at 13:25, Talpey, Thomas wrote: > > > >> Network Appliance is pleased to announce release 5 of the NFS/RDMA > > > >> client and server for Linux 2.6.16.16. This update to the April 19 release > > > >> adds improved server parallel performance and fixes various issues. This > > > >> code supports both Infiniband and iWARP transports. > > > >> > > > >> > > > >> > > > >> > > > > > > > >> > > > >> Comments and feedback welcome. We're especially interested in > > > >> successful test reports! Thanks. > > > >> > > > >> Tom Talpey, for the various NFS/RDMA projects. 
> > > >> > > > >> _______________________________________________ > > > >> openib-general mailing list > > > >> openib-general at openib.org > > > >> http://openib.org/mailman/listinfo/openib-general > > > >> > > > >> To unsubscribe, please visit > > > >http://openib.org/mailman/listinfo/openib-general > > > >> > > > > > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > -------------- next part -------------- /usr/src/linux-2.6.16.16/net/sunrpc/sched.o: file format elf64-x86-64 Disassembly of section .text: 0000000000000000 <__rpc_wait_for_completion_task>: __rpc_wait_for_completion_task(): 0: 48 85 f6 test %rsi,%rsi 3: 48 c7 c0 00 00 00 00 mov $0x0,%rax a: 48 0f 44 f0 cmove %rax,%rsi e: 51 push %rcx f: 8b 87 d0 00 00 00 mov 0xd0(%rdi),%eax 15: 31 d2 xor %edx,%edx 17: 4c 8d 87 d0 00 00 00 lea 0xd0(%rdi),%r8 1e: a8 10 test $0x10,%al 20: 74 17 je 39 <__rpc_wait_for_completion_task+0x39> 22: 48 89 f2 mov %rsi,%rdx 25: b9 01 00 00 00 mov $0x1,%ecx 2a: be 04 00 00 00 mov $0x4,%esi 2f: 4c 89 c7 mov %r8,%rdi 32: e8 00 00 00 00 callq 37 <__rpc_wait_for_completion_task+0x37> 37: 89 c2 mov %eax,%edx 39: 89 d0 mov %edx,%eax 3b: 5a pop %rdx 3c: c3 retq 000000000000003d : rpc_run_timer(): 3d: 55 push %rbp 3e: 53 push %rbx 3f: 48 89 fb mov %rdi,%rbx 42: 41 50 push %r8 44: 48 8b 6f 68 mov 0x68(%rdi),%rbp 48: 48 c7 47 68 00 00 00 movq $0x0,0x68(%rdi) 4f: 00 50: 48 85 ed test %rbp,%rbp 53: 74 2d je 82 55: 8b 87 d0 00 00 00 mov 0xd0(%rdi),%eax 5b: a8 02 test $0x2,%al 5d: 74 23 je 82 5f: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 66 66: 74 15 je 7d 68: 0f b7 b7 40 01 00 00 movzwl 0x140(%rdi),%esi 6f: 31 c0 xor %eax,%eax 71: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 78: e8 00 00 00 00 callq 7d 7d: 48 89 df mov %rbx,%rdi 80: ff d5 callq *%ebp 82: f0 0f ba b3 d0 00 00 lock btrl 
$0x3,0xd0(%rbx) 89: 00 03 8b: 5e pop %rsi 8c: 5b pop %rbx 8d: 5d pop %rbp 8e: c3 retq 000000000000008f : rpc_delete_timer(): 8f: 53 push %rbx 90: 8b 87 d0 00 00 00 mov 0xd0(%rdi),%eax 96: 48 89 fb mov %rdi,%rbx 99: a8 02 test $0x2,%al 9b: 75 3a jne d7 9d: f0 0f ba b7 d0 00 00 lock btrl $0x3,0xd0(%rdi) a4: 00 03 a6: 19 c0 sbb %eax,%eax a8: 85 c0 test %eax,%eax aa: 74 2b je d7 ac: 48 8d bf 90 00 00 00 lea 0x90(%rdi),%rdi b3: e8 00 00 00 00 callq b8 b8: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # bf bf: 74 16 je d7 c1: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi c8: 48 c7 c7 00 00 00 00 mov $0x0,%rdi cf: 31 c0 xor %eax,%eax d1: 5b pop %rbx d2: e9 00 00 00 00 jmpq d7 d7: 5b pop %rbx d8: c3 retq 00000000000000d9 <__rpc_init_priority_wait_queue>: __rpc_init_priority_wait_queue(): d9: 48 c7 07 01 00 00 00 movq $0x1,(%rdi) e0: 48 89 f9 mov %rdi,%rcx e3: 41 b8 02 00 00 00 mov $0x2,%r8d e9: 48 8d 41 08 lea 0x8(%rcx),%rax ed: 48 89 41 08 mov %rax,0x8(%rcx) f1: 48 89 41 10 mov %rax,0x10(%rcx) f5: 48 83 c1 10 add $0x10,%rcx f9: 41 ff c8 dec %r8d fc: 79 eb jns e9 <__rpc_init_priority_wait_queue+0x10> fe: 0f b6 ca movzbl %dl,%ecx 101: b8 01 00 00 00 mov $0x1,%eax 106: 88 57 40 mov %dl,0x40(%rdi) 109: 88 4f 41 mov %cl,0x41(%rdi) 10c: 01 c9 add %ecx,%ecx 10e: 48 c7 47 38 00 00 00 movq $0x0,0x38(%rdi) 115: 00 116: d3 e0 shl %cl,%eax 118: c6 47 43 10 movb $0x10,0x43(%rdi) 11c: 48 89 77 48 mov %rsi,0x48(%rdi) 120: 88 47 42 mov %al,0x42(%rdi) 123: c3 retq 0000000000000124 : rpc_init_priority_wait_queue(): 124: ba 02 00 00 00 mov $0x2,%edx 129: eb ae jmp d9 <__rpc_init_priority_wait_queue> 000000000000012b : rpc_init_wait_queue(): 12b: 31 d2 xor %edx,%edx 12d: eb aa jmp d9 <__rpc_init_priority_wait_queue> 000000000000012f : rpc_wait_bit_interruptible(): 12f: 41 52 push %r10 131: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 138: 00 00 13a: 48 8b 40 08 mov 0x8(%rax),%rax 13e: ba 00 fe ff ff mov $0xfffffe00,%edx 143: 8b 40 10 mov 0x10(%rax),%eax 146: a8 04 test $0x4,%al 148: 75 07 jne 151 14a: e8 
00 00 00 00 callq 14f 14f: 31 d2 xor %edx,%edx 151: 41 59 pop %r9 153: 89 d0 mov %edx,%eax 155: c3 retq 0000000000000156 : rpc_sleep_on(): 156: 41 55 push %r13 158: 49 89 d5 mov %rdx,%r13 15b: 41 54 push %r12 15d: 49 89 cc mov %rcx,%r12 160: 55 push %rbp 161: 48 89 fd mov %rdi,%rbp 164: 53 push %rbx 165: 53 push %rbx 166: 48 89 f3 mov %rsi,%rbx 169: e8 00 00 00 00 callq 16e 16e: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 175 175: 74 31 je 1a8 177: 48 85 ed test %rbp,%rbp 17a: 74 09 je 185 17c: 48 8b 55 48 mov 0x48(%rbp),%rdx 180: 48 85 d2 test %rdx,%rdx 183: 75 07 jne 18c 185: 48 c7 c2 00 00 00 00 mov $0x0,%rdx 18c: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi 193: 48 8b 0d 00 00 00 00 mov 0(%rip),%rcx # 19a 19a: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1a1: 31 c0 xor %eax,%eax 1a3: e8 00 00 00 00 callq 1a8 1a8: f6 83 c8 00 00 00 01 testb $0x1,0xc8(%rbx) 1af: 75 1d jne 1ce 1b1: 8b 83 d0 00 00 00 mov 0xd0(%rbx),%eax 1b7: a8 10 test $0x10,%al 1b9: 75 13 jne 1ce 1bb: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1c2: 31 c0 xor %eax,%eax 1c4: e8 00 00 00 00 callq 1c9 1c9: e9 ec 01 00 00 jmpq 3ba 1ce: f0 0f ba ab d0 00 00 lock btsl $0x4,0xd0(%rbx) 1d5: 00 04 1d7: 8b 83 d0 00 00 00 mov 0xd0(%rbx),%eax 1dd: a8 02 test $0x2,%al 1df: 74 0a je 1eb 1e1: 0f 0b ud2a 1e3: 68 00 00 00 00 pushq $0x0 1e8: c2 a6 00 retq $0xa6 1eb: 80 7d 40 00 cmpb $0x0,0x40(%rbp) 1ef: 0f 84 87 00 00 00 je 27c 1f5: 48 8d 83 f0 00 00 00 lea 0xf0(%rbx),%rax 1fc: 48 89 40 08 mov %rax,0x8(%rax) 200: 0f b6 93 ca 00 00 00 movzbl 0xca(%rbx),%edx 207: 48 89 83 f0 00 00 00 mov %rax,0xf0(%rbx) 20e: 83 e2 03 and $0x3,%edx 211: 0f b6 c2 movzbl %dl,%eax 214: 48 c1 e0 04 shl $0x4,%rax 218: 48 8d 4c 05 08 lea 0x8(%rbp,%rax,1),%rcx 21d: 0f b6 45 40 movzbl 0x40(%rbp),%eax 221: 38 c2 cmp %al,%dl 223: 76 0c jbe 231 225: 0f b6 c0 movzbl %al,%eax 228: 48 c1 e0 04 shl $0x4,%rax 22c: 48 8d 4c 05 08 lea 0x8(%rbp,%rax,1),%rcx 231: 48 8b 11 mov (%rcx),%rdx 234: 48 8d b2 20 ff ff ff lea 0xffffffffffffff20(%rdx),%rsi 23b: 48 8b 86 e0 00 00 00 
mov 0xe0(%rsi),%rax 242: 0f 18 08 prefetcht0 (%rax) 245: 48 39 ca cmp %rcx,%rdx 248: 74 13 je 25d 24a: 48 8b 43 60 mov 0x60(%rbx),%rax 24e: 48 39 46 60 cmp %rax,0x60(%rsi) 252: 74 55 je 2a9 254: 48 8b 96 e0 00 00 00 mov 0xe0(%rsi),%rdx 25b: eb d7 jmp 234 25d: 48 8b 51 08 mov 0x8(%rcx),%rdx 261: 48 8d 83 e0 00 00 00 lea 0xe0(%rbx),%rax 268: 48 89 8b e0 00 00 00 mov %rcx,0xe0(%rbx) 26f: 48 89 41 08 mov %rax,0x8(%rcx) 273: 48 89 02 mov %rax,(%rdx) 276: 48 89 50 08 mov %rdx,0x8(%rax) 27a: eb 69 jmp 2e5 27c: f6 83 c8 00 00 00 02 testb $0x2,0xc8(%rbx) 283: 48 8d 8b e0 00 00 00 lea 0xe0(%rbx),%rcx 28a: 48 8d 55 08 lea 0x8(%rbp),%rdx 28e: 74 3f je 2cf 290: 48 8b 45 08 mov 0x8(%rbp),%rax 294: 48 89 48 08 mov %rcx,0x8(%rax) 298: 48 89 83 e0 00 00 00 mov %rax,0xe0(%rbx) 29f: 48 89 51 08 mov %rdx,0x8(%rcx) 2a3: 48 89 4d 08 mov %rcx,0x8(%rbp) 2a7: eb 3c jmp 2e5 2a9: 48 8d 96 f0 00 00 00 lea 0xf0(%rsi),%rdx 2b0: 48 8d 83 e0 00 00 00 lea 0xe0(%rbx),%rax 2b7: 48 8b 4a 08 mov 0x8(%rdx),%rcx 2bb: 48 89 93 e0 00 00 00 mov %rdx,0xe0(%rbx) 2c2: 48 89 42 08 mov %rax,0x8(%rdx) 2c6: 48 89 01 mov %rax,(%rcx) 2c9: 48 89 48 08 mov %rcx,0x8(%rax) 2cd: eb 16 jmp 2e5 2cf: 48 8b 42 08 mov 0x8(%rdx),%rax 2d3: 48 89 93 e0 00 00 00 mov %rdx,0xe0(%rbx) 2da: 48 89 4a 08 mov %rcx,0x8(%rdx) 2de: 48 89 08 mov %rcx,(%rax) 2e1: 48 89 41 08 mov %rax,0x8(%rcx) 2e5: 48 89 ab 00 01 00 00 mov %rbp,0x100(%rbx) 2ec: 66 ff 45 44 incw 0x44(%rbp) 2f0: f0 0f ba ab d0 00 00 lock btsl $0x1,0xd0(%rbx) 2f7: 00 01 2f9: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 300 300: 74 2d je 32f 302: 48 85 ed test %rbp,%rbp 305: 74 09 je 310 307: 48 8b 4d 48 mov 0x48(%rbp),%rcx 30b: 48 85 c9 test %rcx,%rcx 30e: 75 07 jne 317 310: 48 c7 c1 00 00 00 00 mov $0x0,%rcx 317: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi 31e: 48 89 ea mov %rbp,%rdx 321: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 328: 31 c0 xor %eax,%eax 32a: e8 00 00 00 00 callq 32f 32f: 48 83 7b 70 00 cmpq $0x0,0x70(%rbx) 334: 74 0a je 340 336: 0f 0b ud2a 338: 68 00 00 00 00 pushq 
$0x0 33d: c2 52 01 retq $0x152 340: 48 8b 83 c0 00 00 00 mov 0xc0(%rbx),%rax 347: 4c 89 6b 70 mov %r13,0x70(%rbx) 34b: 48 85 c0 test %rax,%rax 34e: 74 6a je 3ba 350: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 357 357: 74 2c je 385 359: 48 69 c8 e8 03 00 00 imul $0x3e8,%rax,%rcx 360: be fa 00 00 00 mov $0xfa,%esi 365: 31 d2 xor %edx,%edx 367: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 36e: 48 89 c8 mov %rcx,%rax 371: 48 f7 f6 div %rsi 374: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi 37b: 48 89 c2 mov %rax,%rdx 37e: 31 c0 xor %eax,%eax 380: e8 00 00 00 00 callq 385 385: 4d 85 e4 test %r12,%r12 388: 48 c7 c0 00 00 00 00 mov $0x0,%rax 38f: 49 0f 45 c4 cmovne %r12,%rax 393: 48 89 43 68 mov %rax,0x68(%rbx) 397: f0 0f ba ab d0 00 00 lock btsl $0x3,0xd0(%rbx) 39e: 00 03 3a0: 48 8b 35 00 00 00 00 mov 0(%rip),%rsi # 3a7 3a7: 48 03 b3 c0 00 00 00 add 0xc0(%rbx),%rsi 3ae: 48 8d bb 90 00 00 00 lea 0x90(%rbx),%rdi 3b5: e8 00 00 00 00 callq 3ba 3ba: 41 5b pop %r11 3bc: 5b pop %rbx 3bd: 48 89 ef mov %rbp,%rdi 3c0: 5d pop %rbp 3c1: 41 5c pop %r12 3c3: 41 5d pop %r13 3c5: e9 00 00 00 00 jmpq 3ca <__rpc_do_wake_up_task> 00000000000003ca <__rpc_do_wake_up_task>: __rpc_do_wake_up_task(): 3ca: 55 push %rbp 3cb: 53 push %rbx 3cc: 48 89 fb mov %rdi,%rbx 3cf: 56 push %rsi 3d0: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 3d7 <__rpc_do_wake_up_task+0xd> 3d7: 74 1c je 3f5 <__rpc_do_wake_up_task+0x2b> 3d9: 0f b7 b7 40 01 00 00 movzwl 0x140(%rdi),%esi 3e0: 48 8b 15 00 00 00 00 mov 0(%rip),%rdx # 3e7 <__rpc_do_wake_up_task+0x1d> 3e7: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 3ee: 31 c0 xor %eax,%eax 3f0: e8 00 00 00 00 callq 3f5 <__rpc_do_wake_up_task+0x2b> 3f5: 8b 83 d0 00 00 00 mov 0xd0(%rbx),%eax 3fb: a8 10 test $0x10,%al 3fd: 75 14 jne 413 <__rpc_do_wake_up_task+0x49> 3ff: 59 pop %rcx 400: 48 89 de mov %rbx,%rsi 403: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 40a: 31 c0 xor %eax,%eax 40c: 5b pop %rbx 40d: 5d pop %rbp 40e: e9 00 00 00 00 jmpq 413 <__rpc_do_wake_up_task+0x49> 413: f6 05 00 00 00 00 40 testb 
$0x40,0(%rip) # 41a <__rpc_do_wake_up_task+0x50> 41a: 74 15 je 431 <__rpc_do_wake_up_task+0x67> 41c: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi 423: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 42a: 31 c0 xor %eax,%eax 42c: e8 00 00 00 00 callq 431 <__rpc_do_wake_up_task+0x67> 431: 4c 8b 8b 00 01 00 00 mov 0x100(%rbx),%r9 438: 48 c7 43 68 00 00 00 movq $0x0,0x68(%rbx) 43f: 00 440: 48 c7 83 c0 00 00 00 movq $0x0,0xc0(%rbx) 447: 00 00 00 00 44b: 41 80 79 40 00 cmpb $0x0,0x40(%r9) 450: 74 73 je 4c5 <__rpc_do_wake_up_task+0xfb> 452: 48 8b b3 f0 00 00 00 mov 0xf0(%rbx),%rsi 459: 48 8d bb f0 00 00 00 lea 0xf0(%rbx),%rdi 460: 48 39 fe cmp %rdi,%rsi 463: 74 60 je 4c5 <__rpc_do_wake_up_task+0xfb> 465: 48 8b 06 mov (%rsi),%rax 468: 48 8b 56 08 mov 0x8(%rsi),%rdx 46c: 48 8d 8b e0 00 00 00 lea 0xe0(%rbx),%rcx 473: 4c 8d 46 10 lea 0x10(%rsi),%r8 477: 48 89 02 mov %rax,(%rdx) 47a: 48 89 50 08 mov %rdx,0x8(%rax) 47e: 48 8b 83 e0 00 00 00 mov 0xe0(%rbx),%rax 485: 48 89 06 mov %rax,(%rsi) 488: 48 89 70 08 mov %rsi,0x8(%rax) 48c: 48 89 4e 08 mov %rcx,0x8(%rsi) 490: 48 8b 8b f0 00 00 00 mov 0xf0(%rbx),%rcx 497: 48 89 b3 e0 00 00 00 mov %rsi,0xe0(%rbx) 49e: 48 39 f9 cmp %rdi,%rcx 4a1: 74 22 je 4c5 <__rpc_do_wake_up_task+0xfb> 4a3: 48 8b 57 08 mov 0x8(%rdi),%rdx 4a7: 48 8b 46 10 mov 0x10(%rsi),%rax 4ab: 4c 89 41 08 mov %r8,0x8(%rcx) 4af: 48 89 4e 10 mov %rcx,0x10(%rsi) 4b3: 48 89 50 08 mov %rdx,0x8(%rax) 4b7: 48 89 02 mov %rax,(%rdx) 4ba: 48 89 7f 08 mov %rdi,0x8(%rdi) 4be: 48 89 bb f0 00 00 00 mov %rdi,0xf0(%rbx) 4c5: 48 8d 8b e0 00 00 00 lea 0xe0(%rbx),%rcx 4cc: 48 8b 83 e0 00 00 00 mov 0xe0(%rbx),%rax 4d3: 48 8b 51 08 mov 0x8(%rcx),%rdx 4d7: 48 89 50 08 mov %rdx,0x8(%rax) 4db: 48 89 02 mov %rax,(%rdx) 4de: 48 c7 41 08 00 02 20 movq $0x200200,0x8(%rcx) 4e5: 00 4e6: 48 c7 83 e0 00 00 00 movq $0x100100,0xe0(%rbx) 4ed: 00 01 10 00 4f1: 66 41 ff 49 44 decw 0x44(%r9) 4f6: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 4fd <__rpc_do_wake_up_task+0x133> 4fd: 74 2d je 52c <__rpc_do_wake_up_task+0x162> 
4ff: 4d 85 c9 test %r9,%r9 502: 74 09 je 50d <__rpc_do_wake_up_task+0x143> 504: 49 8b 49 48 mov 0x48(%r9),%rcx 508: 48 85 c9 test %rcx,%rcx 50b: 75 07 jne 514 <__rpc_do_wake_up_task+0x14a> 50d: 48 c7 c1 00 00 00 00 mov $0x0,%rcx 514: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi 51b: 4c 89 ca mov %r9,%rdx 51e: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 525: 31 c0 xor %eax,%eax 527: e8 00 00 00 00 callq 52c <__rpc_do_wake_up_task+0x162> 52c: 48 83 7b 68 00 cmpq $0x0,0x68(%rbx) 531: 74 0a je 53d <__rpc_do_wake_up_task+0x173> 533: 0f 0b ud2a 535: 68 00 00 00 00 pushq $0x0 53a: c2 29 01 retq $0x129 53d: 48 8d bb d0 00 00 00 lea 0xd0(%rbx),%rdi 544: f0 0f ba ab d0 00 00 lock btsl $0x0,0xd0(%rbx) 54b: 00 00 54d: 19 c0 sbb %eax,%eax 54f: f0 0f ba b3 d0 00 00 lock btrl $0x1,0xd0(%rbx) 556: 00 01 558: 85 c0 test %eax,%eax 55a: 75 7c jne 5d8 <__rpc_do_wake_up_task+0x20e> 55c: f6 83 c8 00 00 00 01 testb $0x1,0xc8(%rbx) 563: 74 69 je 5ce <__rpc_do_wake_up_task+0x204> 565: 48 8d 83 e8 00 00 00 lea 0xe8(%rbx),%rax 56c: 48 8d bb 10 01 00 00 lea 0x110(%rbx),%rdi 573: 48 89 40 08 mov %rax,0x8(%rax) 577: 48 89 83 e8 00 00 00 mov %rax,0xe8(%rbx) 57e: 48 c7 83 e0 00 00 00 movq $0x0,0xe0(%rbx) 585: 00 00 00 00 589: 48 c7 83 f8 00 00 00 movq $0x0,0xf8(%rbx) 590: 00 00 00 00 594: 48 89 9b 00 01 00 00 mov %rbx,0x100(%rbx) 59b: e8 00 00 00 00 callq 5a0 <__rpc_do_wake_up_task+0x1d6> 5a0: 48 8b bb d8 00 00 00 mov 0xd8(%rbx),%rdi 5a7: 48 8d b3 e0 00 00 00 lea 0xe0(%rbx),%rsi 5ae: e8 00 00 00 00 callq 5b3 <__rpc_do_wake_up_task+0x1e9> 5b3: 85 c0 test %eax,%eax 5b5: 89 c5 mov %eax,%ebp 5b7: 79 1f jns 5d8 <__rpc_do_wake_up_task+0x20e> 5b9: 89 c6 mov %eax,%esi 5bb: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 5c2: 31 c0 xor %eax,%eax 5c4: e8 00 00 00 00 callq 5c9 <__rpc_do_wake_up_task+0x1ff> 5c9: 89 6b 30 mov %ebp,0x30(%rbx) 5cc: eb 0a jmp 5d8 <__rpc_do_wake_up_task+0x20e> 5ce: be 01 00 00 00 mov $0x1,%esi 5d3: e8 00 00 00 00 callq 5d8 <__rpc_do_wake_up_task+0x20e> 5d8: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 
5df <__rpc_do_wake_up_task+0x215> 5df: 74 11 je 5f2 <__rpc_do_wake_up_task+0x228> 5e1: 5a pop %rdx 5e2: 5b pop %rbx 5e3: 5d pop %rbp 5e4: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 5eb: 31 c0 xor %eax,%eax 5ed: e9 00 00 00 00 jmpq 5f2 <__rpc_do_wake_up_task+0x228> 5f2: 5d pop %rbp 5f3: 5b pop %rbx 5f4: 5d pop %rbp 5f5: c3 retq 00000000000005f6 <__rpc_wake_up_task>: __rpc_wake_up_task(): 5f6: 53 push %rbx 5f7: 48 89 fb mov %rdi,%rbx 5fa: f0 0f ba af d0 00 00 lock btsl $0x2,0xd0(%rdi) 601: 00 02 603: 19 c0 sbb %eax,%eax 605: 85 c0 test %eax,%eax 607: 75 18 jne 621 <__rpc_wake_up_task+0x2b> 609: 8b 87 d0 00 00 00 mov 0xd0(%rdi),%eax 60f: a8 02 test $0x2,%al 611: 74 05 je 618 <__rpc_wake_up_task+0x22> 613: e8 b2 fd ff ff callq 3ca <__rpc_do_wake_up_task> 618: f0 0f ba b3 d0 00 00 lock btrl $0x2,0xd0(%rbx) 61f: 00 02 621: 5b pop %rbx 622: c3 retq 0000000000000623 : rpc_wake_up_task(): 623: 55 push %rbp 624: 48 89 fd mov %rdi,%rbp 627: 53 push %rbx 628: 41 50 push %r8 62a: f0 0f ba af d0 00 00 lock btsl $0x2,0xd0(%rdi) 631: 00 02 633: 19 c0 sbb %eax,%eax 635: 85 c0 test %eax,%eax 637: 75 32 jne 66b 639: 8b 87 d0 00 00 00 mov 0xd0(%rdi),%eax 63f: a8 02 test $0x2,%al 641: 74 1f je 662 643: 48 8b 9f 00 01 00 00 mov 0x100(%rdi),%rbx 64a: 48 89 df mov %rbx,%rdi 64d: e8 00 00 00 00 callq 652 652: 48 89 ef mov %rbp,%rdi 655: e8 70 fd ff ff callq 3ca <__rpc_do_wake_up_task> 65a: 48 89 df mov %rbx,%rdi 65d: e8 00 00 00 00 callq 662 662: f0 0f ba b5 d0 00 00 lock btrl $0x2,0xd0(%rbp) 669: 00 02 66b: 5f pop %rdi 66c: 5b pop %rbx 66d: 5d pop %rbp 66e: c3 retq 000000000000066f <__rpc_default_timer>: __rpc_default_timer(): 66f: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 676 <__rpc_default_timer+0x7> 676: 53 push %rbx 677: 48 89 fb mov %rdi,%rbx 67a: 74 15 je 691 <__rpc_default_timer+0x22> 67c: 0f b7 b7 40 01 00 00 movzwl 0x140(%rdi),%esi 683: 31 c0 xor %eax,%eax 685: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 68c: e8 00 00 00 00 callq 691 <__rpc_default_timer+0x22> 691: c7 43 30 92 ff ff ff movl 
$0xffffff92,0x30(%rbx) 698: 48 89 df mov %rbx,%rdi 69b: 5b pop %rbx 69c: e9 00 00 00 00 jmpq 6a1 00000000000006a1 : rpc_wake_up_next(): 6a1: 55 push %rbp 6a2: 31 ed xor %ebp,%ebp 6a4: 53 push %rbx 6a5: 48 89 fb mov %rdi,%rbx 6a8: 41 52 push %r10 6aa: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 6b1 6b1: 74 26 je 6d9 6b3: 48 85 ff test %rdi,%rdi 6b6: 74 09 je 6c1 6b8: 48 8b 57 48 mov 0x48(%rdi),%rdx 6bc: 48 85 d2 test %rdx,%rdx 6bf: 75 07 jne 6c8 6c1: 48 c7 c2 00 00 00 00 mov $0x0,%rdx 6c8: 48 89 de mov %rbx,%rsi 6cb: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 6d2: 31 c0 xor %eax,%eax 6d4: e8 00 00 00 00 callq 6d9 6d9: 48 89 df mov %rbx,%rdi 6dc: e8 00 00 00 00 callq 6e1 6e1: 80 7b 40 00 cmpb $0x0,0x40(%rbx) 6e5: 0f 84 01 01 00 00 je 7ec 6eb: 48 0f b6 43 41 movzbq 0x41(%rbx),%rax 6f0: 48 c1 e0 04 shl $0x4,%rax 6f4: 48 8d 04 03 lea (%rbx,%rax,1),%rax 6f8: 48 8b 70 08 mov 0x8(%rax),%rsi 6fc: 48 8d 48 08 lea 0x8(%rax),%rcx 700: 48 39 ce cmp %rcx,%rsi 703: 74 4f je 754 705: 48 8b 46 80 mov 0xffffffffffffff80(%rsi),%rax 709: 48 39 43 38 cmp %rax,0x38(%rbx) 70d: 48 8d ae 20 ff ff ff lea 0xffffffffffffff20(%rsi),%rbp 714: 75 31 jne 747 716: 0f b6 43 43 movzbl 0x43(%rbx),%eax 71a: ff c8 dec %eax 71c: 84 c0 test %al,%al 71e: 88 43 43 mov %al,0x43(%rbx) 721: 0f 85 b5 00 00 00 jne 7dc 727: 48 8b 46 08 mov 0x8(%rsi),%rax 72b: 48 8b 16 mov (%rsi),%rdx 72e: 48 89 42 08 mov %rax,0x8(%rdx) 732: 48 89 10 mov %rdx,(%rax) 735: 48 8b 41 08 mov 0x8(%rcx),%rax 739: 48 89 0e mov %rcx,(%rsi) 73c: 48 89 71 08 mov %rsi,0x8(%rcx) 740: 48 89 30 mov %rsi,(%rax) 743: 48 89 46 08 mov %rax,0x8(%rsi) 747: 0f b6 43 42 movzbl 0x42(%rbx),%eax 74b: ff c8 dec %eax 74d: 84 c0 test %al,%al 74f: 88 43 42 mov %al,0x42(%rbx) 752: 75 7c jne 7d0 754: 48 8d 43 08 lea 0x8(%rbx),%rax 758: 48 39 c1 cmp %rax,%rcx 75b: 75 10 jne 76d 75d: 48 0f b6 43 40 movzbq 0x40(%rbx),%rax 762: 48 c1 e0 04 shl $0x4,%rax 766: 48 8d 4c 03 08 lea 0x8(%rbx,%rax,1),%rcx 76b: eb 04 jmp 771 76d: 48 83 e9 10 sub $0x10,%rcx 771: 48 8b 01 mov 
(%rcx),%rax 774: 48 39 c8 cmp %rcx,%rax 777: 75 36 jne 7af 779: 48 0f b6 43 41 movzbq 0x41(%rbx),%rax 77e: 48 c1 e0 04 shl $0x4,%rax 782: 48 8d 44 03 08 lea 0x8(%rbx,%rax,1),%rax 787: 48 39 c1 cmp %rax,%rcx 78a: 75 c8 jne 754 78c: 0f b6 4b 40 movzbl 0x40(%rbx),%ecx 790: b8 01 00 00 00 mov $0x1,%eax 795: 48 c7 43 38 00 00 00 movq $0x0,0x38(%rbx) 79c: 00 79d: c6 43 43 10 movb $0x10,0x43(%rbx) 7a1: 88 4b 41 mov %cl,0x41(%rbx) 7a4: 01 c9 add %ecx,%ecx 7a6: d3 e0 shl %cl,%eax 7a8: 88 43 42 mov %al,0x42(%rbx) 7ab: 31 c0 xor %eax,%eax 7ad: eb 38 jmp 7e7 7af: 48 29 d9 sub %rbx,%rcx 7b2: 48 8d a8 20 ff ff ff lea 0xffffffffffffff20(%rax),%rbp 7b9: b8 01 00 00 00 mov $0x1,%eax 7be: 48 83 e9 08 sub $0x8,%rcx 7c2: 48 c1 e9 04 shr $0x4,%rcx 7c6: 88 4b 41 mov %cl,0x41(%rbx) 7c9: 01 c9 add %ecx,%ecx 7cb: d3 e0 shl %cl,%eax 7cd: 88 43 42 mov %al,0x42(%rbx) 7d0: 48 8b 45 60 mov 0x60(%rbp),%rax 7d4: c6 43 43 10 movb $0x10,0x43(%rbx) 7d8: 48 89 43 38 mov %rax,0x38(%rbx) 7dc: 48 89 ef mov %rbp,%rdi 7df: e8 12 fe ff ff callq 5f6 <__rpc_wake_up_task> 7e4: 48 89 e8 mov %rbp,%rax 7e7: 48 89 c5 mov %rax,%rbp 7ea: eb 1c jmp 808 7ec: 48 8b 53 08 mov 0x8(%rbx),%rdx 7f0: 48 8d 43 08 lea 0x8(%rbx),%rax 7f4: 48 39 c2 cmp %rax,%rdx 7f7: 74 0f je 808 7f9: 48 8d aa 20 ff ff ff lea 0xffffffffffffff20(%rdx),%rbp 800: 48 89 ef mov %rbp,%rdi 803: e8 ee fd ff ff callq 5f6 <__rpc_wake_up_task> 808: 48 89 df mov %rbx,%rdi 80b: e8 00 00 00 00 callq 810 810: 41 59 pop %r9 812: 5b pop %rbx 813: 48 89 e8 mov %rbp,%rax 816: 5d pop %rbp 817: c3 retq 0000000000000818 : rpc_wake_up(): 818: 41 54 push %r12 81a: 55 push %rbp 81b: 48 89 fd mov %rdi,%rbp 81e: 53 push %rbx 81f: e8 00 00 00 00 callq 824 824: 48 0f b6 45 40 movzbq 0x40(%rbp),%rax 829: 48 c1 e0 04 shl $0x4,%rax 82d: 48 8d 5c 05 08 lea 0x8(%rbp,%rax,1),%rbx 832: 48 8b 03 mov (%rbx),%rax 835: 48 8d b8 20 ff ff ff lea 0xffffffffffffff20(%rax),%rdi 83c: 4c 8b a7 e0 00 00 00 mov 0xe0(%rdi),%r12 843: 49 81 ec e0 00 00 00 sub $0xe0,%r12 84a: 48 39 d8 cmp 
%rbx,%rax 84d: 74 20 je 86f 84f: e8 a2 fd ff ff callq 5f6 <__rpc_wake_up_task> 854: 4c 89 e7 mov %r12,%rdi 857: 4d 8b a4 24 e0 00 00 mov 0xe0(%r12),%r12 85e: 00 85f: 48 8d 87 e0 00 00 00 lea 0xe0(%rdi),%rax 866: 49 81 ec e0 00 00 00 sub $0xe0,%r12 86d: eb db jmp 84a 86f: 48 8d 45 08 lea 0x8(%rbp),%rax 873: 48 39 c3 cmp %rax,%rbx 876: 74 06 je 87e 878: 48 83 eb 10 sub $0x10,%rbx 87c: eb b4 jmp 832 87e: 5b pop %rbx 87f: 48 89 ef mov %rbp,%rdi 882: 5d pop %rbp 883: 41 5c pop %r12 885: e9 00 00 00 00 jmpq 88a 000000000000088a : rpc_wake_up_status(): 88a: 41 55 push %r13 88c: 41 89 f5 mov %esi,%r13d 88f: 41 54 push %r12 891: 55 push %rbp 892: 48 89 fd mov %rdi,%rbp 895: 53 push %rbx 896: 53 push %rbx 897: e8 00 00 00 00 callq 89c 89c: 48 0f b6 45 40 movzbq 0x40(%rbp),%rax 8a1: 48 c1 e0 04 shl $0x4,%rax 8a5: 48 8d 5c 05 08 lea 0x8(%rbp,%rax,1),%rbx 8aa: 48 8b 03 mov (%rbx),%rax 8ad: 48 8d b8 20 ff ff ff lea 0xffffffffffffff20(%rax),%rdi 8b4: 4c 8b a7 e0 00 00 00 mov 0xe0(%rdi),%r12 8bb: 49 81 ec e0 00 00 00 sub $0xe0,%r12 8c2: 48 39 d8 cmp %rbx,%rax 8c5: 74 24 je 8eb 8c7: 44 89 6f 30 mov %r13d,0x30(%rdi) 8cb: e8 26 fd ff ff callq 5f6 <__rpc_wake_up_task> 8d0: 4c 89 e7 mov %r12,%rdi 8d3: 4d 8b a4 24 e0 00 00 mov 0xe0(%r12),%r12 8da: 00 8db: 48 8d 87 e0 00 00 00 lea 0xe0(%rdi),%rax 8e2: 49 81 ec e0 00 00 00 sub $0xe0,%r12 8e9: eb d7 jmp 8c2 8eb: 48 8d 45 08 lea 0x8(%rbp),%rax 8ef: 48 39 c3 cmp %rax,%rbx 8f2: 74 06 je 8fa 8f4: 48 83 eb 10 sub $0x10,%rbx 8f8: eb b0 jmp 8aa 8fa: 41 5b pop %r11 8fc: 5b pop %rbx 8fd: 48 89 ef mov %rbp,%rdi 900: 5d pop %rbp 901: 41 5c pop %r12 903: 41 5d pop %r13 905: e9 00 00 00 00 jmpq 90a 000000000000090a : rpc_delay(): 90a: 48 89 b7 c0 00 00 00 mov %rsi,0xc0(%rdi) 911: 48 c7 c1 00 00 00 00 mov $0x0,%rcx 918: 48 89 fe mov %rdi,%rsi 91b: 31 d2 xor %edx,%edx 91d: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 924: e9 00 00 00 00 jmpq 929 <__rpc_atrun> 0000000000000929 <__rpc_atrun>: __rpc_atrun(): 929: c7 47 30 00 00 00 00 movl $0x0,0x30(%rdi) 930: e9 00 
00 00 00 jmpq 935 0000000000000935 : rpc_prepare_task(): 935: 48 8b 87 80 00 00 00 mov 0x80(%rdi),%rax 93c: 48 8b b7 88 00 00 00 mov 0x88(%rdi),%rsi 943: 4c 8b 18 mov (%rax),%r11 946: 41 ff e3 jmpq *%r11d 0000000000000949 : rpc_exit_task(): 949: 53 push %rbx 94a: 48 8b 87 80 00 00 00 mov 0x80(%rdi),%rax 951: 48 89 fb mov %rdi,%rbx 954: 48 c7 47 78 00 00 00 movq $0x0,0x78(%rdi) 95b: 00 95c: 48 8b 40 08 mov 0x8(%rax),%rax 960: 48 85 c0 test %rax,%rax 963: 74 48 je 9ad 965: 48 8b b7 88 00 00 00 mov 0x88(%rdi),%rsi 96c: ff d0 callq *%eax 96e: 48 83 7b 78 00 cmpq $0x0,0x78(%rbx) 973: 74 38 je 9ad 975: f6 83 c9 00 00 00 01 testb $0x1,0xc9(%rbx) 97c: 74 26 je 9a4 97e: b9 3e 02 00 00 mov $0x23e,%ecx 983: 48 c7 c2 00 00 00 00 mov $0x0,%rdx 98a: 48 c7 c6 00 00 00 00 mov $0x0,%rsi 991: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 998: 31 c0 xor %eax,%eax 99a: e8 00 00 00 00 callq 99f 99f: e8 00 00 00 00 callq 9a4 9a4: 48 89 df mov %rbx,%rdi 9a7: 5b pop %rbx 9a8: e9 00 00 00 00 jmpq 9ad 9ad: 5b pop %rbx 9ae: c3 retq 00000000000009af : rpc_malloc(): 9af: 55 push %rbp 9b0: 53 push %rbx 9b1: 48 89 f3 mov %rsi,%rbx 9b4: 50 push %rax 9b5: 0f b7 87 c8 00 00 00 movzwl 0xc8(%rdi),%eax 9bc: 48 8b 6f 28 mov 0x28(%rdi),%rbp 9c0: 83 e0 02 and $0x2,%eax 9c3: 83 f8 01 cmp $0x1,%eax 9c6: 19 f6 sbb %esi,%esi 9c8: 83 e6 30 and $0x30,%esi 9cb: 83 c6 20 add $0x20,%esi 9ce: 48 81 fb 00 08 00 00 cmp $0x800,%rbx 9d5: 76 1d jbe 9f4 9d7: 48 89 df mov %rbx,%rdi 9da: e8 00 00 00 00 callq 9df 9df: 48 85 c0 test %rax,%rax 9e2: 48 89 85 b8 00 00 00 mov %rax,0xb8(%rbp) 9e9: 74 2c je a17 9eb: 48 89 9d c0 00 00 00 mov %rbx,0xc0(%rbp) 9f2: eb 23 jmp a17 9f4: 48 8b 3d 00 00 00 00 mov 0(%rip),%rdi # 9fb 9fb: e8 00 00 00 00 callq a00 a00: 48 85 c0 test %rax,%rax a03: 48 89 85 b8 00 00 00 mov %rax,0xb8(%rbp) a0a: 74 0b je a17 a0c: 48 c7 85 c0 00 00 00 movq $0x800,0xc0(%rbp) a13: 00 08 00 00 a17: 48 8b 85 b8 00 00 00 mov 0xb8(%rbp),%rax a1e: 5d pop %rbp a1f: 5b pop %rbx a20: 5d pop %rbp a21: c3 retq 0000000000000a22 : 
rpc_free(): a22: 53 push %rbx a23: 48 8b 5f 28 mov 0x28(%rdi),%rbx a27: 48 8b bb b8 00 00 00 mov 0xb8(%rbx),%rdi a2e: 48 85 ff test %rdi,%rdi a31: 74 36 je a69 a33: 48 81 bb c0 00 00 00 cmpq $0x800,0xc0(%rbx) a3a: 00 08 00 00 a3e: 75 0e jne a4e a40: 48 8b 35 00 00 00 00 mov 0(%rip),%rsi # a47 a47: e8 00 00 00 00 callq a4c a4c: eb 05 jmp a53 a4e: e8 00 00 00 00 callq a53 a53: 48 c7 83 b8 00 00 00 movq $0x0,0xb8(%rbx) a5a: 00 00 00 00 a5e: 48 c7 83 c0 00 00 00 movq $0x0,0xc0(%rbx) a65: 00 00 00 00 a69: 5b pop %rbx a6a: c3 retq 0000000000000a6b : rpc_init_task(): a6b: 41 56 push %r14 a6d: 4d 89 c6 mov %r8,%r14 a70: 41 55 push %r13 a72: 49 89 f5 mov %rsi,%r13 a75: 31 f6 xor %esi,%esi a77: 41 54 push %r12 a79: 41 89 d4 mov %edx,%r12d a7c: ba 48 01 00 00 mov $0x148,%edx a81: 55 push %rbp a82: 48 89 fd mov %rdi,%rbp a85: 53 push %rbx a86: 48 89 cb mov %rcx,%rbx a89: e8 00 00 00 00 callq a8e a8e: 48 8d bd 90 00 00 00 lea 0x90(%rbp),%rdi a95: e8 00 00 00 00 callq a9a a9a: 48 89 ad b0 00 00 00 mov %rbp,0xb0(%rbp) aa1: 48 c7 85 a8 00 00 00 movq $0x0,0xa8(%rbp) aa8: 00 00 00 00 aac: c7 45 00 01 00 00 00 movl $0x1,0x0(%rbp) ab3: 48 8b 05 00 00 00 00 mov 0(%rip),%rax # aba aba: 4c 89 6d 20 mov %r13,0x20(%rbp) abe: 66 44 89 a5 c8 00 00 mov %r12w,0xc8(%rbp) ac5: 00 ac6: 48 89 9d 80 00 00 00 mov %rbx,0x80(%rbp) acd: 48 89 45 08 mov %rax,0x8(%rbp) ad1: 48 83 3b 00 cmpq $0x0,(%rbx) ad5: 74 08 je adf ad7: 48 c7 45 78 00 00 00 movq $0x0,0x78(%rbp) ade: 00 adf: 0f b6 85 ca 00 00 00 movzbl 0xca(%rbp),%eax ae6: 4c 89 b5 88 00 00 00 mov %r14,0x88(%rbp) aed: c6 45 58 02 movb $0x2,0x58(%rbp) af1: c6 45 59 02 movb $0x2,0x59(%rbp) af5: 83 e0 fc and $0xfffffffffffffffc,%eax af8: 83 c8 01 or $0x1,%eax afb: 88 85 ca 00 00 00 mov %al,0xca(%rbp) b01: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax b08: 00 00 b0a: 48 89 45 60 mov %rax,0x60(%rbp) b0e: 48 8b 05 00 00 00 00 mov 0(%rip),%rax # b15 b15: 4d 85 ed test %r13,%r13 b18: 48 89 85 d8 00 00 00 mov %rax,0xd8(%rbp) b1f: 74 25 je b46 b21: f0 41 ff 45 04 lock 
incl 0x4(%r13) b26: 41 f6 45 50 01 testb $0x1,0x50(%r13) b2b: 74 09 je b36 b2d: 66 81 8d c8 00 00 00 orw $0x200,0xc8(%rbp) b34: 00 02 b36: 41 f6 45 50 02 testb $0x2,0x50(%r13) b3b: 75 09 jne b46 b3d: 66 81 8d c8 00 00 00 orw $0x400,0xc8(%rbp) b44: 00 04 b46: 8b 15 00 00 00 00 mov 0(%rip),%edx # b4c b4c: 48 c7 c7 00 00 00 00 mov $0x0,%rdi b53: 8d 42 01 lea 0x1(%rdx),%eax b56: 89 05 00 00 00 00 mov %eax,0(%rip) # b5c b5c: 66 89 95 40 01 00 00 mov %dx,0x140(%rbp) b63: e8 00 00 00 00 callq b68 b68: 48 8b 15 00 00 00 00 mov 0(%rip),%rdx # b6f b6f: 48 8d 45 10 lea 0x10(%rbp),%rax b73: 48 c7 45 10 00 00 00 movq $0x0,0x10(%rbp) b7a: 00 b7b: 48 c7 c7 00 00 00 00 mov $0x0,%rdi b82: 48 89 05 00 00 00 00 mov %rax,0(%rip) # b89 b89: 48 89 02 mov %rax,(%rdx) b8c: 48 89 50 08 mov %rdx,0x8(%rax) b90: e8 00 00 00 00 callq b95 b95: 48 83 bd 80 00 00 00 cmpq $0x0,0x80(%rbp) b9c: 00 b9d: 75 0a jne ba9 b9f: 0f 0b ud2a ba1: 68 00 00 00 00 pushq $0x0 ba6: c2 1a 03 retq $0x31a ba9: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # bb0 bb0: 74 2c je bde bb2: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax bb9: 00 00 bbb: 5b pop %rbx bbc: 0f b7 b5 40 01 00 00 movzwl 0x140(%rbp),%esi bc3: 8b 90 04 01 00 00 mov 0x104(%rax),%edx bc9: 48 c7 c7 00 00 00 00 mov $0x0,%rdi bd0: 5d pop %rbp bd1: 41 5c pop %r12 bd3: 41 5d pop %r13 bd5: 41 5e pop %r14 bd7: 31 c0 xor %eax,%eax bd9: e9 00 00 00 00 jmpq bde bde: 5b pop %rbx bdf: 5d pop %rbp be0: 41 5c pop %r12 be2: 41 5d pop %r13 be4: 41 5e pop %r14 be6: c3 retq 0000000000000be7 : rpc_new_task(): be7: 41 56 push %r14 be9: 41 89 f6 mov %esi,%r14d bec: be 50 00 00 00 mov $0x50,%esi bf1: 41 55 push %r13 bf3: 49 89 d5 mov %rdx,%r13 bf6: 41 54 push %r12 bf8: 49 89 cc mov %rcx,%r12 bfb: 55 push %rbp bfc: 53 push %rbx bfd: 48 89 fb mov %rdi,%rbx c00: 48 8b 3d 00 00 00 00 mov 0(%rip),%rdi # c07 c07: e8 00 00 00 00 callq c0c c0c: 48 85 c0 test %rax,%rax c0f: 48 89 c5 mov %rax,%rbp c12: 74 3d je c51 c14: 4d 89 e0 mov %r12,%r8 c17: 4c 89 e9 mov %r13,%rcx c1a: 44 89 f2 mov 
%r14d,%edx c1d: 48 89 de mov %rbx,%rsi c20: 48 89 c7 mov %rax,%rdi c23: e8 00 00 00 00 callq c28 c28: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # c2f c2f: 74 15 je c46 c31: 0f b7 b5 40 01 00 00 movzwl 0x140(%rbp),%esi c38: 48 c7 c7 00 00 00 00 mov $0x0,%rdi c3f: 31 c0 xor %eax,%eax c41: e8 00 00 00 00 callq c46 c46: 66 81 8d c8 00 00 00 orw $0x80,0xc8(%rbp) c4d: 80 00 c4f: eb 2c jmp c7d c51: 48 85 db test %rbx,%rbx c54: 74 27 je c7d c56: 0f b6 53 50 movzbl 0x50(%rbx),%edx c5a: 8b 73 04 mov 0x4(%rbx),%esi c5d: 48 c7 c7 00 00 00 00 mov $0x0,%rdi c64: 31 c0 xor %eax,%eax c66: c0 ea 04 shr $0x4,%dl c69: 83 e2 01 and $0x1,%edx c6c: e8 00 00 00 00 callq c71 c71: f0 ff 43 04 lock incl 0x4(%rbx) c75: 48 89 df mov %rbx,%rdi c78: e8 00 00 00 00 callq c7d c7d: 5b pop %rbx c7e: 48 89 e8 mov %rbp,%rax c81: 5d pop %rbp c82: 41 5c pop %r12 c84: 41 5d pop %r13 c86: 41 5e pop %r14 c88: c3 retq 0000000000000c89 : rpc_release_task(): c89: 41 54 push %r12 c8b: 55 push %rbp c8c: 53 push %rbx c8d: 48 89 fb mov %rdi,%rbx c90: 48 8b af 80 00 00 00 mov 0x80(%rdi),%rbp c97: 4c 8b a7 88 00 00 00 mov 0x88(%rdi),%r12 c9e: f0 ff 0f lock decl (%rdi) ca1: 0f 94 c0 sete %al ca4: 84 c0 test %al,%al ca6: 0f 84 f5 00 00 00 je da1 cac: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # cb3 cb3: 74 15 je cca cb5: 0f b7 b7 40 01 00 00 movzwl 0x140(%rdi),%esi cbc: 31 c0 xor %eax,%eax cbe: 48 c7 c7 00 00 00 00 mov $0x0,%rdi cc5: e8 00 00 00 00 callq cca cca: 48 c7 c7 00 00 00 00 mov $0x0,%rdi cd1: e8 00 00 00 00 callq cd6 cd6: 48 8d 4b 10 lea 0x10(%rbx),%rcx cda: 48 8b 43 10 mov 0x10(%rbx),%rax cde: 48 c7 c7 00 00 00 00 mov $0x0,%rdi ce5: 48 8b 51 08 mov 0x8(%rcx),%rdx ce9: 48 89 50 08 mov %rdx,0x8(%rax) ced: 48 89 02 mov %rax,(%rdx) cf0: 48 c7 41 08 00 02 20 movq $0x200200,0x8(%rcx) cf7: 00 cf8: 48 c7 43 10 00 01 10 movq $0x100100,0x10(%rbx) cff: 00 d00: e8 00 00 00 00 callq d05 d05: 8b 83 d0 00 00 00 mov 0xd0(%rbx),%eax d0b: a8 02 test $0x2,%al d0d: 74 0a je d19 d0f: 0f 0b ud2a d11: 68 00 00 00 00 pushq $0x0 
d16: c2 59 03 retq $0x359 d19: 48 89 df mov %rbx,%rdi d1c: e8 6e f3 ff ff callq 8f d21: 48 83 7b 28 00 cmpq $0x0,0x28(%rbx) d26: 74 08 je d30 d28: 48 89 df mov %rbx,%rdi d2b: e8 00 00 00 00 callq d30 d30: 48 83 7b 50 00 cmpq $0x0,0x50(%rbx) d35: 74 08 je d3f d37: 48 89 df mov %rbx,%rdi d3a: e8 00 00 00 00 callq d3f d3f: 48 8b 7b 20 mov 0x20(%rbx),%rdi d43: 48 85 ff test %rdi,%rdi d46: 74 0d je d55 d48: e8 00 00 00 00 callq d4d d4d: 48 c7 43 20 00 00 00 movq $0x0,0x20(%rbx) d54: 00 d55: 80 bb c8 00 00 00 00 cmpb $0x0,0xc8(%rbx) d5c: 79 2d jns d8b d5e: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # d65 d65: 74 15 je d7c d67: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi d6e: 48 c7 c7 00 00 00 00 mov $0x0,%rdi d75: 31 c0 xor %eax,%eax d77: e8 00 00 00 00 callq d7c d7c: 48 8b 35 00 00 00 00 mov 0(%rip),%rsi # d83 d83: 48 89 df mov %rbx,%rdi d86: e8 00 00 00 00 callq d8b d8b: 48 8b 45 10 mov 0x10(%rbp),%rax d8f: 48 85 c0 test %rax,%rax d92: 74 0d je da1 d94: 5b pop %rbx d95: 5d pop %rbp d96: 4c 89 e7 mov %r12,%rdi d99: 49 89 c3 mov %rax,%r11 d9c: 41 5c pop %r12 d9e: 41 ff e3 jmpq *%r11d da1: 5b pop %rbx da2: 5d pop %rbp da3: 41 5c pop %r12 da5: c3 retq 0000000000000da6 <__rpc_execute>: __rpc_execute(): da6: 41 54 push %r12 da8: 45 31 e4 xor %r12d,%r12d dab: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # db2 <__rpc_execute+0xc> db2: 55 push %rbp db3: 53 push %rbx db4: 48 89 fb mov %rdi,%rbx db7: 74 1c je dd5 <__rpc_execute+0x2f> db9: 0f b7 97 c8 00 00 00 movzwl 0xc8(%rdi),%edx dc0: 0f b7 b7 40 01 00 00 movzwl 0x140(%rdi),%esi dc7: 31 c0 xor %eax,%eax dc9: 48 c7 c7 00 00 00 00 mov $0x0,%rdi dd0: e8 00 00 00 00 callq dd5 <__rpc_execute+0x2f> dd5: 8b 83 d0 00 00 00 mov 0xd0(%rbx),%eax ddb: a8 02 test $0x2,%al ddd: 74 0a je de9 <__rpc_execute+0x43> ddf: 0f 0b ud2a de1: 68 00 00 00 00 pushq $0x0 de6: c2 50 02 retq $0x250 de9: 48 89 df mov %rbx,%rdi dec: e8 9e f2 ff ff callq 8f df1: 48 8b 43 70 mov 0x70(%rbx),%rax df5: 48 85 c0 test %rax,%rax df8: 74 0d je e07 <__rpc_execute+0x61> dfa: 
48 c7 43 70 00 00 00 movq $0x0,0x70(%rbx) e01: 00 e02: 48 89 df mov %rbx,%rdi e05: ff d0 callq *%eax e07: 8b 83 d0 00 00 00 mov 0xd0(%rbx),%eax e0d: 48 8d ab d0 00 00 00 lea 0xd0(%rbx),%rbp e14: a8 02 test $0x2,%al e16: 75 12 jne e2a <__rpc_execute+0x84> e18: 48 8b 43 78 mov 0x78(%rbx),%rax e1c: 48 85 c0 test %rax,%rax e1f: 0f 84 f2 00 00 00 je f17 <__rpc_execute+0x171> e25: 48 89 df mov %rbx,%rdi e28: ff d0 callq *%eax e2a: 8b 83 d0 00 00 00 mov 0xd0(%rbx),%eax e30: a8 02 test $0x2,%al e32: 74 b5 je de9 <__rpc_execute+0x43> e34: f0 0f ba b3 d0 00 00 lock btrl $0x0,0xd0(%rbx) e3b: 00 00 e3d: f6 83 c8 00 00 00 01 testb $0x1,0xc8(%rbx) e44: 74 25 je e6b <__rpc_execute+0xc5> e46: 8b 83 d0 00 00 00 mov 0xd0(%rbx),%eax e4c: 48 d1 e8 shr %rax e4f: 83 e0 01 and $0x1,%eax e52: 85 c0 test %eax,%eax e54: 75 0e jne e64 <__rpc_execute+0xbe> e56: f0 0f ab 83 d0 00 00 lock bts %eax,0xd0(%rbx) e5d: 00 e5e: 19 c0 sbb %eax,%eax e60: 85 c0 test %eax,%eax e62: 74 85 je de9 <__rpc_execute+0x43> e64: 31 c0 xor %eax,%eax e66: e9 f5 00 00 00 jmpq f60 <__rpc_execute+0x1ba> e6b: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # e72 <__rpc_execute+0xcc> e72: 74 15 je e89 <__rpc_execute+0xe3> e74: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi e7b: 48 c7 c7 00 00 00 00 mov $0x0,%rdi e82: 31 c0 xor %eax,%eax e84: e8 00 00 00 00 callq e89 <__rpc_execute+0xe3> e89: b9 01 00 00 00 mov $0x1,%ecx e8e: 48 c7 c2 00 00 00 00 mov $0x0,%rdx e95: be 01 00 00 00 mov $0x1,%esi e9a: 48 89 ef mov %rbp,%rdi e9d: e8 00 00 00 00 callq ea2 <__rpc_execute+0xfc> ea2: 3d 00 fe ff ff cmp $0xfffffe00,%eax ea7: 41 89 c4 mov %eax,%r12d eaa: 75 3b jne ee7 <__rpc_execute+0x141> eac: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # eb3 <__rpc_execute+0x10d> eb3: 74 15 je eca <__rpc_execute+0x124> eb5: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi ebc: 48 c7 c7 00 00 00 00 mov $0x0,%rdi ec3: 31 c0 xor %eax,%eax ec5: e8 00 00 00 00 callq eca <__rpc_execute+0x124> eca: 66 81 8b c8 00 00 00 orw $0x100,0xc8(%rbx) ed1: 00 01 ed3: 44 89 63 30 
mov %r12d,0x30(%rbx) ed7: 48 89 df mov %rbx,%rdi eda: 48 c7 43 78 00 00 00 movq $0x0,0x78(%rbx) ee1: 00 ee2: e8 00 00 00 00 callq ee7 <__rpc_execute+0x141> ee7: f0 0f ba ab d0 00 00 lock btsl $0x0,0xd0(%rbx) eee: 00 00 ef0: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # ef7 <__rpc_execute+0x151> ef7: 0f 84 ec fe ff ff je de9 <__rpc_execute+0x43> efd: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi f04: 48 c7 c7 00 00 00 00 mov $0x0,%rdi f0b: 31 c0 xor %eax,%eax f0d: e8 00 00 00 00 callq f12 <__rpc_execute+0x16c> f12: e9 d2 fe ff ff jmpq de9 <__rpc_execute+0x43> f17: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # f1e <__rpc_execute+0x178> f1e: 74 1b je f3b <__rpc_execute+0x195> f20: 0f b7 b3 40 01 00 00 movzwl 0x140(%rbx),%esi f27: 8b 4b 30 mov 0x30(%rbx),%ecx f2a: 44 89 e2 mov %r12d,%edx f2d: 48 c7 c7 00 00 00 00 mov $0x0,%rdi f34: 31 c0 xor %eax,%eax f36: e8 00 00 00 00 callq f3b <__rpc_execute+0x195> f3b: 48 8d bb d0 00 00 00 lea 0xd0(%rbx),%rdi f42: f0 0f ba b3 d0 00 00 lock btrl $0x4,0xd0(%rbx) f49: 00 04 f4b: be 04 00 00 00 mov $0x4,%esi f50: e8 00 00 00 00 callq f55 <__rpc_execute+0x1af> f55: 48 89 df mov %rbx,%rdi f58: e8 00 00 00 00 callq f5d <__rpc_execute+0x1b7> f5d: 44 89 e0 mov %r12d,%eax f60: 5b pop %rbx f61: 5d pop %rbp f62: 41 5c pop %r12 f64: c3 retq 0000000000000f65 : rpc_execute(): f65: b8 04 00 00 00 mov $0x4,%eax f6a: f0 0f ab 87 d0 00 00 lock bts %eax,0xd0(%rdi) f71: 00 f72: 30 c0 xor %al,%al f74: f0 0f ab 87 d0 00 00 lock bts %eax,0xd0(%rdi) f7b: 00 f7c: e9 25 fe ff ff jmpq da6 <__rpc_execute> 0000000000000f81 : rpc_async_schedule(): f81: e9 20 fe ff ff jmpq da6 <__rpc_execute> 0000000000000f86 : rpc_run_task(): f86: 53 push %rbx f87: e8 00 00 00 00 callq f8c f8c: 48 89 c3 mov %rax,%rbx f8f: 48 c7 c0 f4 ff ff ff mov $0xfffffffffffffff4,%rax f96: 48 85 db test %rbx,%rbx f99: 74 0e je fa9 f9b: f0 ff 03 lock incl (%rbx) f9e: 48 89 df mov %rbx,%rdi fa1: e8 00 00 00 00 callq fa6 fa6: 48 89 d8 mov %rbx,%rax fa9: 5b pop %rbx faa: c3 retq 0000000000000fab : 
rpc_killall_tasks(): fab: 55 push %rbp fac: 48 89 fd mov %rdi,%rbp faf: 53 push %rbx fb0: 50 push %rax fb1: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # fb8 fb8: 74 11 je fcb fba: 48 89 fe mov %rdi,%rsi fbd: 31 c0 xor %eax,%eax fbf: 48 c7 c7 00 00 00 00 mov $0x0,%rdi fc6: e8 00 00 00 00 callq fcb fcb: 48 c7 c7 00 00 00 00 mov $0x0,%rdi fd2: e8 00 00 00 00 callq fd7 fd7: 48 8b 1d 00 00 00 00 mov 0(%rip),%rbx # fde fde: 48 8b 03 mov (%rbx),%rax fe1: 0f 18 08 prefetcht0 (%rax) fe4: 48 81 fb 00 00 00 00 cmp $0x0,%rbx feb: 74 3b je 1028 fed: 8b 83 c0 00 00 00 mov 0xc0(%rbx),%eax ff3: 48 8d 7b f0 lea 0xfffffffffffffff0(%rbx),%rdi ff7: a8 10 test $0x10,%al ff9: 74 28 je 1023 ffb: 48 85 ed test %rbp,%rbp ffe: 74 06 je 1006 1000: 48 39 6f 20 cmp %rbp,0x20(%rdi) 1004: 75 1d jne 1023 1006: 66 81 8f c8 00 00 00 orw $0x100,0xc8(%rdi) 100d: 00 01 100f: c7 47 30 fb ff ff ff movl $0xfffffffb,0x30(%rdi) 1016: 48 c7 47 78 00 00 00 movq $0x0,0x78(%rdi) 101d: 00 101e: e8 00 00 00 00 callq 1023 1023: 48 8b 1b mov (%rbx),%rbx 1026: eb b6 jmp fde 1028: 58 pop %rax 1029: 5b pop %rbx 102a: 5d pop %rbp 102b: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1032: e9 00 00 00 00 jmpq 1037 0000000000001037 : rpciod_up(): 1037: 55 push %rbp 1038: 48 c7 c5 00 00 00 00 mov $0x0,%rbp 103f: 48 89 ef mov %rbp,%rdi 1042: 53 push %rbx 1043: 31 db xor %ebx,%ebx 1045: 51 push %rcx 1046: f0 ff 0d 00 00 00 00 lock decl 0(%rip) # 104d 104d: 0f 88 01 04 00 00 js 1454 <.text.lock.sched> 1053: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 105a 105a: 74 14 je 1070 105c: 8b 35 00 00 00 00 mov 0(%rip),%esi # 1062 1062: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1069: 31 c0 xor %eax,%eax 106b: e8 00 00 00 00 callq 1070 1070: 8b 35 00 00 00 00 mov 0(%rip),%esi # 1076 1076: ff c6 inc %esi 1078: 48 83 3d 00 00 00 00 cmpq $0x0,0(%rip) # 1080 107f: 00 1080: 89 35 00 00 00 00 mov %esi,0(%rip) # 1086 1086: 75 4a jne 10d2 1088: 83 fe 01 cmp $0x1,%esi 108b: 76 0e jbe 109b 108d: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1094: 31 c0 xor %eax,%eax 1096: e8 
00 00 00 00 callq 109b 109b: 31 f6 xor %esi,%esi 109d: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 10a4: bb f4 ff ff ff mov $0xfffffff4,%ebx 10a9: e8 00 00 00 00 callq 10ae 10ae: 48 85 c0 test %rax,%rax 10b1: 75 16 jne 10c9 10b3: 89 de mov %ebx,%esi 10b5: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 10bc: e8 00 00 00 00 callq 10c1 10c1: ff 0d 00 00 00 00 decl 0(%rip) # 10c7 10c7: eb 09 jmp 10d2 10c9: 48 89 05 00 00 00 00 mov %rax,0(%rip) # 10d0 10d0: 31 db xor %ebx,%ebx 10d2: 48 89 ef mov %rbp,%rdi 10d5: f0 ff 05 00 00 00 00 lock incl 0(%rip) # 10dc 10dc: 0f 8e 7c 03 00 00 jle 145e <.text.lock.sched+0xa> 10e2: 5a pop %rdx 10e3: 89 d8 mov %ebx,%eax 10e5: 5b pop %rbx 10e6: 5d pop %rbp 10e7: c3 retq 00000000000010e8 : rpciod_down(): 10e8: 55 push %rbp 10e9: 48 c7 c5 00 00 00 00 mov $0x0,%rbp 10f0: 53 push %rbx 10f1: 57 push %rdi 10f2: 48 89 ef mov %rbp,%rdi 10f5: f0 ff 0d 00 00 00 00 lock decl 0(%rip) # 10fc 10fc: 0f 88 66 03 00 00 js 1468 <.text.lock.sched+0x14> 1102: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 1109 1109: 74 14 je 111f 110b: 8b 35 00 00 00 00 mov 0(%rip),%esi # 1111 1111: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1118: 31 c0 xor %eax,%eax 111a: e8 00 00 00 00 callq 111f 111f: 8b 05 00 00 00 00 mov 0(%rip),%eax # 1125 1125: 85 c0 test %eax,%eax 1127: 74 11 je 113a 1129: ff c8 dec %eax 112b: 85 c0 test %eax,%eax 112d: 89 05 00 00 00 00 mov %eax,0(%rip) # 1133 1133: 74 13 je 1148 1135: e9 ef 00 00 00 jmpq 1229 113a: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1141: 31 c0 xor %eax,%eax 1143: e8 00 00 00 00 callq 1148 1148: 48 83 3d 00 00 00 00 cmpq $0x0,0(%rip) # 1150 114f: 00 1150: 75 20 jne 1172 1152: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 1159 1159: 0f 84 ca 00 00 00 je 1229 115f: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1166: 31 c0 xor %eax,%eax 1168: e8 00 00 00 00 callq 116d 116d: e9 b7 00 00 00 jmpq 1229 1172: 48 81 3d 00 00 00 00 cmpq $0x0,0(%rip) # 117d 1179: 00 00 00 00 117d: 74 50 je 11cf 117f: 65 48 8b 04 25 10 00 mov %gs:0x10,%rax 1186: 00 00 1188: f0 0f ba b0 38 e0 ff lock btrl 
$0x2,0xffffffffffffe038(%rax) 118f: ff 02 1191: 31 ff xor %edi,%edi 1193: e8 00 00 00 00 callq 1198 1198: 48 8b 3d 00 00 00 00 mov 0(%rip),%rdi # 119f 119f: e8 00 00 00 00 callq 11a4 11a4: 48 81 3d 00 00 00 00 cmpq $0x0,0(%rip) # 11af 11ab: 00 00 00 00 11af: 74 c1 je 1172 11b1: f6 05 00 00 00 00 40 testb $0x40,0(%rip) # 11b8 11b8: 74 0e je 11c8 11ba: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 11c1: 31 c0 xor %eax,%eax 11c3: e8 00 00 00 00 callq 11c8 11c8: e8 00 00 00 00 callq 11cd 11cd: eb a3 jmp 1172 11cf: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 11d6: 00 00 11d8: 48 8b b8 d0 05 00 00 mov 0x5d0(%rax),%rdi 11df: 48 81 c7 08 08 00 00 add $0x808,%rdi 11e6: e8 00 00 00 00 callq 11eb 11eb: 48 89 c3 mov %rax,%rbx 11ee: e8 00 00 00 00 callq 11f3 11f3: 65 48 8b 04 25 00 00 mov %gs:0x0,%rax 11fa: 00 00 11fc: 48 8b b8 d0 05 00 00 mov 0x5d0(%rax),%rdi 1203: 48 89 de mov %rbx,%rsi 1206: 48 81 c7 08 08 00 00 add $0x808,%rdi 120d: e8 00 00 00 00 callq 1212 1212: 48 8b 3d 00 00 00 00 mov 0(%rip),%rdi # 1219 1219: e8 00 00 00 00 callq 121e 121e: 48 c7 05 00 00 00 00 movq $0x0,0(%rip) # 1229 1225: 00 00 00 00 1229: 48 89 ef mov %rbp,%rdi 122c: f0 ff 05 00 00 00 00 lock incl 0(%rip) # 1233 1233: 0f 8e 39 02 00 00 jle 1472 <.text.lock.sched+0x1e> 1239: 5e pop %rsi 123a: 5b pop %rbx 123b: 5d pop %rbp 123c: c3 retq 000000000000123d : rpc_show_tasks(): 123d: 53 push %rbx 123e: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1245: e8 00 00 00 00 callq 124a 124a: 48 81 3d 00 00 00 00 cmpq $0x0,0(%rip) # 1255 1251: 00 00 00 00 1255: 0f 84 bd 00 00 00 je 1318 125b: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1262: 31 c0 xor %eax,%eax 1264: e8 00 00 00 00 callq 1269 1269: 48 8b 1d 00 00 00 00 mov 0(%rip),%rbx # 1270 1270: 48 8b 03 mov (%rbx),%rax 1273: 0f 18 08 prefetcht0 (%rax) 1276: 48 81 fb 00 00 00 00 cmp $0x0,%rbx 127d: 0f 84 95 00 00 00 je 1318 1283: 8b 83 c0 00 00 00 mov 0xc0(%rbx),%eax 1289: 48 8d 7b f0 lea 0xfffffffffffffff0(%rbx),%rdi 128d: 49 c7 c2 00 00 00 00 mov $0x0,%r10 1294: a8 02 test $0x2,%al 1296: 74 1f 
je 12b7 1298: 48 8b 87 00 01 00 00 mov 0x100(%rdi),%rax 129f: 48 85 c0 test %rax,%rax 12a2: 74 09 je 12ad 12a4: 48 8b 40 48 mov 0x48(%rax),%rax 12a8: 48 85 c0 test %rax,%rax 12ab: 75 07 jne 12b4 12ad: 48 c7 c0 00 00 00 00 mov $0x0,%rax 12b4: 49 89 c2 mov %rax,%r10 12b7: 48 8b 47 38 mov 0x38(%rdi),%rax 12bb: 83 ca ff or $0xffffffffffffffff,%edx 12be: 4c 8b 4f 20 mov 0x20(%rdi),%r9 12c2: 44 8b 47 30 mov 0x30(%rdi),%r8d 12c6: 0f b7 8f c8 00 00 00 movzwl 0xc8(%rdi),%ecx 12cd: 48 85 c0 test %rax,%rax 12d0: 74 02 je 12d4 12d2: 8b 10 mov (%rax),%edx 12d4: 0f b7 b7 40 01 00 00 movzwl 0x140(%rdi),%esi 12db: ff b7 80 00 00 00 pushq 0x80(%rdi) 12e1: ff 77 78 pushq 0x78(%rdi) 12e4: 41 52 push %r10 12e6: ff b7 c0 00 00 00 pushq 0xc0(%rdi) 12ec: ff 77 28 pushq 0x28(%rdi) 12ef: 48 8b 47 20 mov 0x20(%rdi),%rax 12f3: 31 ff xor %edi,%edi 12f5: 48 85 c0 test %rax,%rax 12f8: 74 03 je 12fd 12fa: 8b 78 18 mov 0x18(%rax),%edi 12fd: 57 push %rdi 12fe: 31 c0 xor %eax,%eax 1300: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1307: e8 00 00 00 00 callq 130c 130c: 48 8b 1b mov (%rbx),%rbx 130f: 48 83 c4 30 add $0x30,%rsp 1313: e9 58 ff ff ff jmpq 1270 1318: 5b pop %rbx 1319: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1320: e9 00 00 00 00 jmpq 1325 0000000000001325 : rpc_destroy_mempool(): 1325: 41 52 push %r10 1327: 48 8b 3d 00 00 00 00 mov 0(%rip),%rdi # 132e 132e: 48 85 ff test %rdi,%rdi 1331: 74 05 je 1338 1333: e8 00 00 00 00 callq 1338 1338: 48 8b 3d 00 00 00 00 mov 0(%rip),%rdi # 133f 133f: 48 85 ff test %rdi,%rdi 1342: 74 05 je 1349 1344: e8 00 00 00 00 callq 1349 1349: 48 8b 3d 00 00 00 00 mov 0(%rip),%rdi # 1350 1350: 48 85 ff test %rdi,%rdi 1353: 74 17 je 136c 1355: e8 00 00 00 00 callq 135a 135a: 85 c0 test %eax,%eax 135c: 74 0e je 136c 135e: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 1365: 31 c0 xor %eax,%eax 1367: e8 00 00 00 00 callq 136c 136c: 48 8b 3d 00 00 00 00 mov 0(%rip),%rdi # 1373 1373: 48 85 ff test %rdi,%rdi 1376: 74 19 je 1391 1378: e8 00 00 00 00 callq 137d 137d: 85 c0 test %eax,%eax 137f: 74 
10 je 1391 1381: 41 59 pop %r9 1383: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 138a: 31 c0 xor %eax,%eax 138c: e9 00 00 00 00 jmpq 1391 1391: 41 58 pop %r8 1393: c3 retq 0000000000001394 : rpc_init_mempool(): 1394: 50 push %rax 1395: 45 31 c9 xor %r9d,%r9d 1398: 45 31 c0 xor %r8d,%r8d 139b: 31 d2 xor %edx,%edx 139d: b9 00 20 00 00 mov $0x2000,%ecx 13a2: be 48 01 00 00 mov $0x148,%esi 13a7: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 13ae: e8 00 00 00 00 callq 13b3 13b3: 48 85 c0 test %rax,%rax 13b6: 48 89 05 00 00 00 00 mov %rax,0(%rip) # 13bd 13bd: 0f 84 82 00 00 00 je 1445 13c3: 45 31 c9 xor %r9d,%r9d 13c6: 45 31 c0 xor %r8d,%r8d 13c9: 31 d2 xor %edx,%edx 13cb: b9 00 20 00 00 mov $0x2000,%ecx 13d0: be 00 08 00 00 mov $0x800,%esi 13d5: 48 c7 c7 00 00 00 00 mov $0x0,%rdi 13dc: e8 00 00 00 00 callq 13e1 13e1: 48 85 c0 test %rax,%rax 13e4: 48 89 05 00 00 00 00 mov %rax,0(%rip) # 13eb 13eb: 74 58 je 1445 13ed: 48 8b 0d 00 00 00 00 mov 0(%rip),%rcx # 13f4 13f4: 48 c7 c2 00 00 00 00 mov $0x0,%rdx 13fb: 48 c7 c6 00 00 00 00 mov $0x0,%rsi 1402: bf 08 00 00 00 mov $0x8,%edi 1407: e8 00 00 00 00 callq 140c 140c: 48 85 c0 test %rax,%rax 140f: 48 89 05 00 00 00 00 mov %rax,0(%rip) # 1416 1416: 74 2d je 1445 1418: 48 8b 0d 00 00 00 00 mov 0(%rip),%rcx # 141f 141f: 48 c7 c2 00 00 00 00 mov $0x0,%rdx 1426: 48 c7 c6 00 00 00 00 mov $0x0,%rsi 142d: bf 08 00 00 00 mov $0x8,%edi 1432: e8 00 00 00 00 callq 1437 1437: 31 d2 xor %edx,%edx 1439: 48 85 c0 test %rax,%rax 143c: 48 89 05 00 00 00 00 mov %rax,0(%rip) # 1443 1443: 75 0a jne 144f 1445: e8 00 00 00 00 callq 144a 144a: ba f4 ff ff ff mov $0xfffffff4,%edx 144f: 41 5b pop %r11 1451: 89 d0 mov %edx,%eax 1453: c3 retq 0000000000001454 <.text.lock.sched>: .text.lock.sched(): 1454: e8 00 00 00 00 callq 1459 <.text.lock.sched+0x5> 1459: e9 f5 fb ff ff jmpq 1053 145e: e8 00 00 00 00 callq 1463 <.text.lock.sched+0xf> 1463: e9 7a fc ff ff jmpq 10e2 1468: e8 00 00 00 00 callq 146d <.text.lock.sched+0x19> 146d: e9 90 fc ff ff jmpq 1102 1472: e8 00 00 00 00 
callq 1477 <.text.lock.sched+0x23> 1477: e9 bd fd ff ff jmpq 1239

From rkuchimanchi at silverstorm.com Fri May 26 10:31:35 2006
From: rkuchimanchi at silverstorm.com (Kuchimanchi, Ramachandra)
Date: Fri, 26 May 2006 13:31:35 -0400
Subject: [openib-general] [PATCH] SRP : Use correct port identifier format according to target io_class
Message-ID:

Hi Roland,

There has been a change in the format of port identifiers between revision 10 of the SRP specification and the current revision 16A.

Revision 10 specifies the port identifier format as

lower 8 bytes : GUID
upper 8 bytes : Extension

whereas revision 16A specifies it as

lower 8 bytes : Extension
upper 8 bytes : GUID

There are older targets (e.g. SilverStorm Virtual Fibre Channel Bridge) which conform to revision 10 of the SRP specification. The IO class of revision 10 is 0xFF00 and the IO class of revision 16A is 0x0100.

To support older targets, this patch:

1) Adds a new optional target creation parameter "io_class". The default value of io_class is 0x0100 (i.e. revision 16A).
2) Uses the correct port identifier format for targets with IO class of 0xFF00 (i.e.
conforming to revision 10) Regards, Ram Signed-off-by: Ramachandra K (rkuchimanchi at silverstorm.com) Index: infiniband/ulp/srp/ib_srp.c =================================================================== --- infiniband/ulp/srp/ib_srp.c (revision 7460) +++ infiniband/ulp/srp/ib_srp.c (working copy) @@ -321,8 +321,34 @@ req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len); req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT); - memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id, 16); /* + * Older targets conforming to Rev 10 of the SRP specification + * use the port identifier format which is + * + * lower 8 bytes : GUID + * upper 8 bytes : extension + * + * Where as according to the new SRP specification (Rev 16a), the + * port identifier format is + * + * lower 8 bytes : extension + * upper 8 bytes : GUID + * + * So check the IO class of the target to decide which format to use. + */ + + /* If its Rev 10, flip the initiator port id fields */ + if (target->io_class == SRP_REV10_IO_CLASS) { + memcpy(req->priv.initiator_port_id, + target->srp_host->initiator_port_id + 8 , 8); + memcpy(req->priv.initiator_port_id + 8, + target->srp_host->initiator_port_id, 8); + } + else { + memcpy(req->priv.initiator_port_id, + target->srp_host->initiator_port_id, 16); + } + /* * Topspin/Cisco SRP targets will reject our login unless we * zero out the first 8 bytes of our initiator port ID. 
The * second 8 bytes must be our local node GUID, but we always @@ -334,8 +360,14 @@ (unsigned long long) be64_to_cpu(target->ioc_guid)); memset(req->priv.initiator_port_id, 0, 8); } - memcpy(req->priv.target_port_id, &target->id_ext, 8); - memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); + if (target->io_class == SRP_REV10_IO_CLASS) { + memcpy(req->priv.target_port_id, &target->ioc_guid, 8); + memcpy(req->priv.target_port_id + 8, &target->id_ext, 8); + } + else { + memcpy(req->priv.target_port_id, &target->id_ext, 8); + memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); + } status = ib_send_cm_req(target->cm_id, &req->param); @@ -1513,6 +1545,7 @@ SRP_OPT_SERVICE_ID = 1 << 4, SRP_OPT_MAX_SECT = 1 << 5, SRP_OPT_MAX_CMD_PER_LUN = 1 << 6, + SRP_OPT_IO_CLASS = 1 << 7, SRP_OPT_ALL = (SRP_OPT_ID_EXT | SRP_OPT_IOC_GUID | SRP_OPT_DGID | @@ -1528,6 +1561,7 @@ { SRP_OPT_SERVICE_ID, "service_id=%s" }, { SRP_OPT_MAX_SECT, "max_sect=%d" }, { SRP_OPT_MAX_CMD_PER_LUN, "max_cmd_per_lun=%d" }, + { SRP_OPT_IO_CLASS, "io_class=%x" }, { SRP_OPT_ERR, NULL } }; @@ -1611,7 +1645,19 @@ } target->scsi_host->cmd_per_lun = min(token, SRP_SQ_SIZE); break; - + case SRP_OPT_IO_CLASS: + if (match_hex(args, &token)) { + printk(KERN_WARNING PFX "bad IO class parameter '%s' \n", p); + goto out; + } + if (token == SRP_REV10_IO_CLASS || token == SRP_REV16A_IO_CLASS) + target->io_class = (unsigned short)(token); + else + printk(KERN_WARNING PFX "unknown IO class parameter value" + " %x specified. Use %x or %x. 
Defaulting to IO class %x\n", + token, SRP_REV10_IO_CLASS, SRP_REV16A_IO_CLASS, + SRP_REV16A_IO_CLASS); + break; default: printk(KERN_WARNING PFX "unknown parameter or missing value " "'%s' in target creation request\n", p); @@ -1654,6 +1700,8 @@ target = host_to_target(target_host); memset(target, 0, sizeof *target); + /*Set default IO class of target to Rev 16A*/ + target->io_class = SRP_REV16A_IO_CLASS; target->scsi_host = target_host; target->srp_host = host; Index: infiniband/ulp/srp/ib_srp.h =================================================================== --- infiniband/ulp/srp/ib_srp.h (revision 7460) +++ infiniband/ulp/srp/ib_srp.h (working copy) @@ -48,6 +48,9 @@ #include #include +#define SRP_REV10_IO_CLASS 0xFF00 +#define SRP_REV16A_IO_CLASS 0x0100 + enum { SRP_PATH_REC_TIMEOUT_MS = 1000, SRP_ABORT_TIMEOUT_MS = 5000, @@ -122,6 +125,7 @@ __be64 id_ext; __be64 ioc_guid; __be64 service_id; + __be16 io_class; struct srp_host *srp_host; struct Scsi_Host *scsi_host; char target_name[32]; From rkuchimanchi at silverstorm.com Fri May 26 10:31:44 2006 From: rkuchimanchi at silverstorm.com (Kuchimanchi, Ramachandra) Date: Fri, 26 May 2006 13:31:44 -0400 Subject: [openib-general] [PATCH] SRPTOOLS : print out the target io_class in ibsrpdm Message-ID: Hi Roland, This patch prints out the target io_class value in ibsrpdm while displaying the target information and also with the -c switch. 
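For reference, the two port identifier layouts described in the SRP patch earlier in this thread can be sketched as a small standalone C helper. This is only an illustration under stated assumptions: build_port_id() is a hypothetical name that does not exist in ib_srp.c, and the two fields are passed as raw 8-byte big-endian arrays rather than the driver's __be64 members. "Lower 8 bytes" is taken to mean offset 0 of the 16-byte identifier, as in the memcpy() calls of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IO class values quoted in the SRP patch. */
#define SRP_REV10_IO_CLASS  0xFF00
#define SRP_REV16A_IO_CLASS 0x0100

/*
 * Hypothetical helper (not part of ib_srp.c): fill a 16-byte port
 * identifier from an 8-byte extension and an 8-byte GUID, choosing the
 * field order from the target's IO class.
 */
static void build_port_id(uint8_t port_id[16], uint16_t io_class,
                          const uint8_t id_ext[8], const uint8_t guid[8])
{
	assert(port_id != NULL && id_ext != NULL && guid != NULL);

	if (io_class == SRP_REV10_IO_CLASS) {
		/* Rev 10: GUID in the lower 8 bytes, extension in the upper 8. */
		memcpy(port_id, guid, 8);
		memcpy(port_id + 8, id_ext, 8);
	} else {
		/* Rev 16A: extension in the lower 8 bytes, GUID in the upper 8. */
		memcpy(port_id, id_ext, 8);
		memcpy(port_id + 8, guid, 8);
	}
}
```

Any IO class other than 0xFF00 falls through to the revision 16A layout here, which matches the patch's choice of 16A as the default.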
Regards, Ram Signed-off-by: Ramachandra K (rkuchimanchi at silverstorm.com) Index: userspace/srptools/src/srp-dm.c =================================================================== --- userspace/srptools/src/srp-dm.c (revision 7475) +++ userspace/srptools/src/srp-dm.c (working copy) @@ -398,6 +398,7 @@ (unsigned long long) ntohll(ioc_prof.guid)); pr_human(" vendor ID: %06x\n", ntohl(ioc_prof.vendor_id) >> 8); pr_human(" device ID: %06x\n", ntohl(ioc_prof.device_id)); + pr_human(" IO class : %hx\n", ntohs(ioc_prof.io_class)); pr_human(" ID: %s\n", ioc_prof.id); pr_human(" service entries: %d\n", ioc_prof.service_entries); @@ -429,11 +430,13 @@ "ioc_guid=%016llx," "dgid=%016llx%016llx," "pkey=ffff," + "io_class=%hx," "service_id=%016llx\n", id_ext, (unsigned long long) ntohll(ioc_prof.guid), (unsigned long long) subnet_prefix, (unsigned long long) guid, + (unsigned short) ntohs(ioc_prof.io_class), (unsigned long long) ntohll(svc_entries.service[k].id)); } } From Don.Albert at Bull.com Fri May 26 10:34:11 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Fri, 26 May 2006 10:34:11 -0700 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: <1148661960.4583.6711.camel@hal.voltaire.com> Message-ID: Hal, > Hi again Paul, Since your last message was addressed to Paul, and you said my problem was completely different, I don't know if a backtrace would help in my case, but here it is anyway, just in case. (See below.) > > Would you rebuild OpenSM with debug: > ./configure --enable-debug && make clean && make && make install > > and then run opensm under gdb and provide the backtrace after the > failure? > > Thanks. > > -- Hal I can also rebuild with --enable_debug if it would be useful. -Don Albert- Backtrace of segfault in SM: [koa] (ib) ib> gdb /usr/local/ofed/bin/opensm GNU gdb Red Hat Linux (6.3.0.0-1.96rh) Copyright 2004 Free Software Foundation, Inc. 
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found) Using host libthread_db library "/lib64/tls/libthread_db.so.1". (gdb) run Starting program: /usr/local/ofed/bin/opensm [Thread debugging using libthread_db enabled] [New Thread 47576487182656 (LWP 8030)] [New Thread 1082132832 (LWP 8033)] ------------------------------------------------- OpenSM Rev:openib-1.2.0 Based on OpenIB svn Exported revision Command Line Arguments: Log File: /var/log/osm.log ------------------------------------------------- OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision [New Thread 1090525536 (LWP 8034)] [New Thread 1098918240 (LWP 8035)] [New Thread 1107310944 (LWP 8036)] [New Thread 1115703648 (LWP 8037)] [New Thread 1124096352 (LWP 8038)] [New Thread 1132489056 (LWP 8039)] Using default GUID 0x2c90200216dc5 [New Thread 1140881760 (LWP 8040)] Entering MASTER state Program received signal SIGSEGV, Segmentation fault. 
[Switching to Thread 1090525536 (LWP 8034)]
0x000000000040b5bb in osm_physp_is_valid ()
(gdb) bt
#0 0x000000000040b5bb in osm_physp_is_valid ()
#1 0x000000000040b555 in __osm_lid_mgr_set_remote_pi_state_to_init ()
#2 0x000000000040babf in __osm_lid_mgr_set_physp_pi ()
#3 0x000000000040c065 in __osm_lid_mgr_process_our_sm_node ()
#4 0x000000000040c151 in osm_lid_mgr_process_sm ()
#5 0x000000000043a2b9 in osm_state_mgr_process ()
#6 0x000000000043aefc in __osm_state_mgr_ctrl_disp_callback ()
#7 0x00002b454359db27 in __cl_disp_worker (context=0x57ca20) at cl_dispatcher.c:108
#8 0x00002b45435a6025 in __cl_thread_pool_routine (context=0x57ca98) at cl_threadpool.c:78
#9 0x00002b45435a5e6e in __cl_thread_wrapper (arg=0x57d7d0) at cl_thread.c:61
#10 0x0000003a80f0610a in start_thread () from /lib64/tls/libpthread.so.0
#11 0x0000003a806c6003 in clone () from /lib64/tls/libc.so.6
#12 0x0000000000000000 in ?? ()
(gdb)

From halr at voltaire.com Fri May 26 10:27:25 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 May 2006 13:27:25 -0400
Subject: [openib-general] Re: OpenSM build problem
In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F0007CE0ED7@orsmsx408>
References: <1AC79F16F5C5284499BB9591B33D6F0007CE0ED7@orsmsx408>
Message-ID: <1148664444.4583.7586.camel@hal.voltaire.com>

On Fri, 2006-05-26 at 12:56, Woodruff, Robert J wrote:
> When I build the opensm on the 1.0 branch using the following commands
> (the defaults)
> cd ../management/libibcommon
> ./autogen.sh && ./configure && make && make install
> cd ../libibumad
> ./autogen.sh && ./configure && make && make install
> cd ../libibmad
> ./autogen.sh && ./configure && make && make install
> cd ../osm/complib
> ./autogen.sh && ./configure && make && make install

When you do this, where is the library for this (libosmcomp) being installed ? What libosmcomp* are in that directory ?
Is your LD_LIBRARY_PATH set so that this directory is included (or other mechanisms of doing the same) ? > cd ../libvendor > ./autogen.sh && ./configure && make && make install > cd ../opensm > ./autogen.sh && ./configure && make && make install > > I get the following error when trying to run it. > > [root at iclust-tiger1 woody]# /usr/local/bin/opensm: > /usr/local/lib/libosmcomp.so.1: version `OSMCOMP_1.0' not found > (required by /usr/local/bin/opensm) > > [1]+ Exit 1 /usr/local/bin/opensm > > This does not happen with the trunk version 7479 when I build it the > same way. The trunk's complib is now at version 1.1 whereas 1.0 branch is still 1.0 > Is there something that I need to specify when I build it for the 1.0 > version ? You shouldn't have to. -- Hal > woody > From bugzilla-daemon at openib.org Fri May 26 10:51:30 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Fri, 26 May 2006 10:51:30 -0700 (PDT) Subject: [openib-general] [Bug 99] New: Enable both 32bit and 64bit libraries on dual-arch systems (ppc64 and x86_64) Message-ID: <20060526175130.1163D22834D@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=99 Summary: Enable both 32bit and 64bit libraries on dual-arch systems (ppc64 and x86_64) Product: OpenFabrics Linux Version: 1.0rc5 Platform: X86-64 OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Verbs AssignedTo: bugzilla at openib.org ReportedBy: sp at scali.com I've tested RC5, on a rhel4 u2 x86_64 platform. Everything was fine (I had all the pre-req RPMs installed), however when trying to install 32bit libraries and 64bit libraries at the same time, I get : % rpm -i i386/libibverbs-1.0.3-0.i386.rpm file /etc/ld.so.conf.d/ofed.conf from install of libibverbs-1.0.3-0 conflicts with file from package libibverbs-1.0.3-0 Might I suggest renaming them to 'ofed-lib.conf' and 'ofed-lib64.conf' or something in that order? 
The OpenFabrics IB stack solves something people in HPC have been wanting for some time: running 32bit applications natively on 64bit machines (64bit kernels). By allowing the i386 and x86_64 RPMs to co-exist it enables that (and it actually works too, at least with Scali MPI Connect which is what I'm testing).

_Or_ a possibly better solution would be to compile both 32bit and 64bit libraries in the same package on x86_64 (I guess the same applies to ppc64) ?

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From robert.j.woodruff at intel.com Fri May 26 10:39:54 2006
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Fri, 26 May 2006 10:39:54 -0700
Subject: [openib-general] RE: OpenSM build problem
In-Reply-To: <1148664444.4583.7586.camel@hal.voltaire.com>
Message-ID: <000001c680eb$666da750$58a9070a@amr.corp.intel.com>

Hal wrote,
>When you do this, where is the library for this (libosmcomp) being
>installed ? What libosmcomp* are in that directory ?

I let it default to /usr/local/lib for the libraries and /usr/local/bin for the binary.

>Is your LD_LIBRARY_PATH set so that this directory is included (or other
>mechanisms of doing the same) ?

I set /etc/ld.so.conf to include /usr/local/lib in the path.

Must have been some makefile changes or such between the trunk version and the 1.0 version since if I build the trunk the same way, it works just fine.
woody From halr at voltaire.com Fri May 26 10:43:23 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 26 May 2006 13:43:23 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: Message-ID: <1148665402.4583.7943.camel@hal.voltaire.com> Hi Don, On Fri, 2006-05-26 at 13:34, Don.Albert at Bull.com wrote: > Hal, > > > Hi again Paul, > > Since your last message was addressed to Paul, and you said my problem > was completely different, I don't know if a backtrace would help in my > case, but here it is anyway, just in case. (See below.) > > > > > Would you rebuild OpenSM with debug: > > ./configure --enable-debug && make clean && make && make install > > > > and then run opensm under gdb and provide the backtrace after the > > failure? > > > > Thanks. > > > > -- Hal > > I can also rebuild with --enable_debug if it would be useful. > > -Don Albert- > > Backtrace of segfault in SM: > > [koa] (ib) ib> gdb /usr/local/ofed/bin/opensm > GNU gdb Red Hat Linux (6.3.0.0-1.96rh) > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and > you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for > details. > This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging > symbols found) > Using host libthread_db library "/lib64/tls/libthread_db.so.1". 
> > (gdb) run > Starting program: /usr/local/ofed/bin/opensm > [Thread debugging using libthread_db enabled] > [New Thread 47576487182656 (LWP 8030)] > [New Thread 1082132832 (LWP 8033)] > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 > Based on OpenIB svn Exported revision > Command Line Arguments: > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision > > [New Thread 1090525536 (LWP 8034)] > [New Thread 1098918240 (LWP 8035)] > [New Thread 1107310944 (LWP 8036)] > [New Thread 1115703648 (LWP 8037)] > [New Thread 1124096352 (LWP 8038)] > [New Thread 1132489056 (LWP 8039)] > Using default GUID 0x2c90200216dc5 > [New Thread 1140881760 (LWP 8040)] > Entering MASTER state > > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 1090525536 (LWP 8034)] > 0x000000000040b5bb in osm_physp_is_valid () > (gdb) bt > #0 0x000000000040b5bb in osm_physp_is_valid () > #1 0x000000000040b555 in __osm_lid_mgr_set_remote_pi_state_to_init () > #2 0x000000000040babf in __osm_lid_mgr_set_physp_pi () > #3 0x000000000040c065 in __osm_lid_mgr_process_our_sm_node () > #4 0x000000000040c151 in osm_lid_mgr_process_sm () > #5 0x000000000043a2b9 in osm_state_mgr_process () > #6 0x000000000043aefc in __osm_state_mgr_ctrl_disp_callback () > #7 0x00002b454359db27 in __cl_disp_worker (context=0x57ca20) at > cl_dispatcher.c:108 > #8 0x00002b45435a6025 in __cl_thread_pool_routine (context=0x57ca98) > at cl_threadpool.c:78 > #9 0x00002b45435a5e6e in __cl_thread_wrapper (arg=0x57d7d0) at > cl_thread.c:61 > #10 0x0000003a80f0610a in start_thread () from > /lib64/tls/libpthread.so.0 > #11 0x0000003a806c6003 in clone () from /lib64/tls/libc.so.6 > #12 0x0000000000000000 in ?? () > (gdb) Yes, that is very useful. I had been working on trying to come up with what the problem was but this narrows it down to something I was thinking might be going on. 
It looks like you are running back to back HCAs, right ? It also looks to me like your remote (in terms of OpenSM) CA node is not responding to SMA requests like SubnGet NodeInfo yet the link is active. Can you describe what state that node is in (what modules are loaded, etc.) ? Can you do an ibstat/ibstatus on that node ? Can you try this patch to see if it gets you further and let me know ? Note that this is just a potential workaround right now. Thanks. -- Hal Index: opensm/osm_lid_mgr.c =================================================================== --- opensm/osm_lid_mgr.c (revision 7412) +++ opensm/osm_lid_mgr.c (working copy) @@ -932,6 +932,9 @@ __osm_lid_mgr_set_remote_pi_state_to_ini CL_ASSERT(p_rem_physp); + if ( p_rem_physp == NULL ) + return; + if (osm_physp_is_valid( p_rem_physp )) { p_pi = osm_physp_get_port_info_ptr( p_rem_physp ); From halr at voltaire.com Fri May 26 10:49:39 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 26 May 2006 13:49:39 -0400 Subject: [openib-general] RE: OpenSM build problem In-Reply-To: <000001c680eb$666da750$58a9070a@amr.corp.intel.com> References: <000001c680eb$666da750$58a9070a@amr.corp.intel.com> Message-ID: <1148665778.4583.8094.camel@hal.voltaire.com> On Fri, 2006-05-26 at 13:39, Bob Woodruff wrote: > Hal wrote, > >When you do this, where is the library for this (lisosmcomp) being > >installed ? What libosmcomp* are in that directory ? > > I let it default to /usr/local/lib for the libraries and /usr/local/bin > for the binary. > > >Is your LD_LIBRARY_PATH set so that this directory is included (or other > >mechanisms of doing the same) ? > > I set /etc/ld.so.conf to include /usr/local/lib in the path. > > Must have been some makefile changes or such between the trunk version > and the 1.0 version since if I build the trunk the same way, it works > just fine. I don't think it's a Makefile.am change. It's something else but not sure what. Can you send me the output of /usr/local/lib/libosmcomp* ? 
Can you do the following in your 1.0 complib: make clean && make && make install and rerun the 1.0 OpenSM and see if you still have the problem ? -- Hal > woody From Don.Albert at Bull.com Fri May 26 11:35:18 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Fri, 26 May 2006 11:35:18 -0700 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: <1148665402.4583.7943.camel@hal.voltaire.com> Message-ID: Hal, > Yes, that is very useful. I had been working on trying to come up with > what the problem was but this narrows it down to something I was > thinking might be going on. > > It looks like you are running back to back HCAs, right ? Yes, the HCAs are 4X DDR, connected back to back. > > It also looks to me like your remote (in terms of OpenSM) CA node is not > responding to SMA requests like SubnGet NodeInfo yet the link is active. > Can you describe what state that node is in (what modules are loaded, > etc.) ? Can you do an ibstat/ibstatus on that node ? Both systems are booted and the link appears active. 
Here is the information you asked for: >>>>>>>>>>>>>>>>>>> Local System (where OpenSM is attempting to run) [koa] (ib) ib> ibstat CA 'mthca0' CA type: MT25204 Number of ports: 1 Firmware version: 1.0.800 Hardware version: a0 Node GUID: 0x0002c90200216dc4 System image GUID: 0x0002c90200216dc7 Port 1: State: Initializing Physical state: LinkUp Rate: 20 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02510a68 Port GUID: 0x0002c90200216dc5 [koa] (ib) ib> ibstatus Infiniband device 'mthca0' port 1 status: default gid: fe80:0000:0000:0000:0002:c902:0021:6dc5 base lid: 0x0 sm lid: 0x0 state: 2: INIT phys state: 5: LinkUp rate: 20 Gb/sec (4X DDR) [koa] (ib) ib> /sbin/lsmod Module Size Used by parport_pc 28008 0 lp 12872 0 parport 37260 2 parport_pc,lp ib_ipath 58392 0 ipath_core 154596 1 ib_ipath pcmcia 34864 0 yenta_socket 25484 0 rsrc_nonstatic 12160 1 yenta_socket pcmcia_core 38068 3 pcmcia,yenta_socket,rsrc_nonstatic button 7328 0 battery 10120 0 ac 5512 0 uhci_hcd 31776 0 hw_random 6824 0 i2c_i801 10260 0 i2c_core 20992 1 i2c_i801 ib_mthca 109744 0 ib_ipoib 48792 0 ib_uverbs 34128 0 ib_umad 14000 0 ib_ucm 16520 0 ib_sa 13884 1 ib_ipoib ib_cm 30144 1 ib_ucm ib_mad 35896 4 ib_mthca,ib_umad,ib_sa,ib_cm ib_core 45952 9 ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad floppy 67400 0 >>>>>>>>>>>>>>>>>>> Remote system (no OpenSM instance) [jatoba] (ib) ib> ibstat CA 'mthca0' CA type: MT25204 Number of ports: 1 Firmware version: 1.0.800 Hardware version: a0 Node GUID: 0x0002c90200216e40 System image GUID: 0x0002c90200216e43 Port 1: State: Initializing Physical state: LinkUp Rate: 20 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02510a68 Port GUID: 0x0002c90200216e41 [jatoba] (ib) ib> ibstatus Infiniband device 'mthca0' port 1 status: default gid: fe80:0000:0000:0000:0002:c902:0021:6e41 base lid: 0x0 sm lid: 0x0 state: 2: INIT phys state: 5: LinkUp rate: 20 Gb/sec (4X DDR) [jatoba] (ib) ib> /sbin/lsmod Module Size Used by parport_pc 28008 0 lp 12872 0 
parport 37260 2 parport_pc,lp
ib_ipath 58392 0
ipath_core 154596 1 ib_ipath
pcmcia 34864 0
yenta_socket 25484 0
rsrc_nonstatic 12160 1 yenta_socket
pcmcia_core 38068 3 pcmcia,yenta_socket,rsrc_nonstatic
button 7328 0
battery 10120 0
ac 5512 0
uhci_hcd 31776 0
hw_random 6824 0
i2c_i801 10260 0
i2c_core 20992 1 i2c_i801
ib_mthca 109744 0
ib_ipoib 48792 0
ib_uverbs 34128 0
ib_umad 14000 2
ib_ucm 16520 0
ib_sa 13884 1 ib_ipoib
ib_cm 30144 1 ib_ucm
ib_mad 35896 4 ib_mthca,ib_umad,ib_sa,ib_cm
ib_core 45952 9 ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad
floppy 67400 0

>>>>>>>>>>>>>>>>>>>

>
> Can you try this patch to see if it gets you further and let me know ?
> Note that this is just a potential workaround right now.
>

I will try rebuilding with the patch and let you know the results.

Thanks,
-Don Albert-

From robert.j.woodruff at intel.com Fri May 26 11:41:51 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 26 May 2006 11:41:51 -0700
Subject: [openib-general] RE: OpenSM build problem
Message-ID: <1AC79F16F5C5284499BB9591B33D6F0007CE1129@orsmsx408>

Too weird,

When I did
/usr/local/lib/libosmcomp*

I got a segmentation fault ????

So I rebuilt everything from scratch and the problem went away.

Must have had a corrupted file.

woody

-----Original Message-----
From: Hal Rosenstock [mailto:halr at voltaire.com]
Sent: Friday, May 26, 2006 10:50 AM
To: Woodruff, Robert J
Cc: Tziporet Koren; OpenFabricsEWG; openib-general
Subject: RE: OpenSM build problem

On Fri, 2006-05-26 at 13:39, Bob Woodruff wrote:
> Hal wrote,
> >When you do this, where is the library for this (libosmcomp) being
> >installed ? What libosmcomp* are in that directory ?
>
> I let it default to /usr/local/lib for the libraries and /usr/local/bin
> for the binary.
>
> >Is your LD_LIBRARY_PATH set so that this directory is included (or other
> >mechanisms of doing the same) ?
> > I set /etc/ld.so.conf to include /usr/local/lib in the path. > > Must have been some makefile changes or such between the trunk version > and the 1.0 version since if I build the trunk the same way, it works > just fine. I don't think it's a Makefile.am change. It's something else but not sure what. Can you send me the output of /usr/local/lib/libosmcomp* ? Can you do the following in your 1.0 complib: make clean && make && make install and rerun the 1.0 OpenSM and see if you still have the problem ? -- Hal > woody From halr at voltaire.com Fri May 26 11:47:36 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 26 May 2006 14:47:36 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: Message-ID: <1148669254.4583.9320.camel@hal.voltaire.com> Don, On Fri, 2006-05-26 at 14:35, Don.Albert at Bull.com wrote: > Hal, > > > Yes, that is very useful. I had been working on trying to come up > with > > what the problem was but this narrows it down to something I was > > thinking might be going on. > > > > It looks like you are running back to back HCAs, right ? > > Yes, the HCAs are 4X DDR, connected back to back. > > > > > It also looks to me like your remote (in terms of OpenSM) CA node is > not > > responding to SMA requests like SubnGet NodeInfo yet the link is > active. > > Can you describe what state that node is in (what modules are > loaded, > > etc.) ? Can you do an ibstat/ibstatus on that node ? > > Both systems are booted and the link appears active. 
Here is the > information you asked for: > > >>>>>>>>>>>>>>>>>>> > > Local System (where OpenSM is attempting to run) > > [koa] (ib) ib> ibstat > CA 'mthca0' > CA type: MT25204 > Number of ports: 1 > Firmware version: 1.0.800 > Hardware version: a0 > Node GUID: 0x0002c90200216dc4 > System image GUID: 0x0002c90200216dc7 > Port 1: > State: Initializing > Physical state: LinkUp > Rate: 20 > Base lid: 0 > LMC: 0 > SM lid: 0 > Capability mask: 0x02510a68 > Port GUID: 0x0002c90200216dc5 > [koa] (ib) ib> ibstatus > Infiniband device 'mthca0' port 1 status: > default gid: fe80:0000:0000:0000:0002:c902:0021:6dc5 > base lid: 0x0 > sm lid: 0x0 > state: 2: INIT > phys state: 5: LinkUp > rate: 20 Gb/sec (4X DDR) > > [koa] (ib) ib> /sbin/lsmod > Module Size Used by > parport_pc 28008 0 > lp 12872 0 > parport 37260 2 parport_pc,lp > ib_ipath 58392 0 > ipath_core 154596 1 ib_ipath > pcmcia 34864 0 > yenta_socket 25484 0 > rsrc_nonstatic 12160 1 yenta_socket > pcmcia_core 38068 3 pcmcia,yenta_socket,rsrc_nonstatic > button 7328 0 > battery 10120 0 > ac 5512 0 > uhci_hcd 31776 0 > hw_random 6824 0 > i2c_i801 10260 0 > i2c_core 20992 1 i2c_i801 > ib_mthca 109744 0 > ib_ipoib 48792 0 > ib_uverbs 34128 0 > ib_umad 14000 0 > ib_ucm 16520 0 > ib_sa 13884 1 ib_ipoib > ib_cm 30144 1 ib_ucm > ib_mad 35896 4 ib_mthca,ib_umad,ib_sa,ib_cm > ib_core 45952 9 > ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad > floppy 67400 0 > > >>>>>>>>>>>>>>>>>>> > > Remote system (no OpenSM instance) > > [jatoba] (ib) ib> ibstat > CA 'mthca0' > CA type: MT25204 > Number of ports: 1 > Firmware version: 1.0.800 > Hardware version: a0 > Node GUID: 0x0002c90200216e40 > System image GUID: 0x0002c90200216e43 > Port 1: > State: Initializing > Physical state: LinkUp > Rate: 20 > Base lid: 0 > LMC: 0 > SM lid: 0 > Capability mask: 0x02510a68 > Port GUID: 0x0002c90200216e41 > [jatoba] (ib) ib> ibstatus > Infiniband device 'mthca0' port 1 status: > default gid: 
fe80:0000:0000:0000:0002:c902:0021:6e41 > base lid: 0x0 > sm lid: 0x0 > state: 2: INIT > phys state: 5: LinkUp > rate: 20 Gb/sec (4X DDR) One more thing on the remote side, try: smpquery nodeinfo -D 0 > [jatoba] (ib) ib> /sbin/lsmod > Module Size Used by > parport_pc 28008 0 > lp 12872 0 > parport 37260 2 parport_pc,lp > ib_ipath 58392 0 > ipath_core 154596 1 ib_ipath > pcmcia 34864 0 > yenta_socket 25484 0 > rsrc_nonstatic 12160 1 yenta_socket > pcmcia_core 38068 3 pcmcia,yenta_socket,rsrc_nonstatic > button 7328 0 > battery 10120 0 > ac 5512 0 > uhci_hcd 31776 0 > hw_random 6824 0 > i2c_i801 10260 0 > i2c_core 20992 1 i2c_i801 > ib_mthca 109744 0 > ib_ipoib 48792 0 > ib_uverbs 34128 0 > ib_umad 14000 2 > ib_ucm 16520 0 > ib_sa 13884 1 ib_ipoib > ib_cm 30144 1 ib_ucm > ib_mad 35896 4 ib_mthca,ib_umad,ib_sa,ib_cm > ib_core 45952 9 > ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad > floppy 67400 0 Do you also have an iPath adapter ? If not, no need to load those modules. > >>>>>>>>>>>>>>>>>>> > > > > > Can you try this patch to see if it gets you further and let me know > ? > > Note that this is just a potential workaround right now. > > > > I will try rebuilding with the patch and let you know the results. Thanks for your help in resolving this. -- Hal > Thanks, > -Don Albert- From halr at voltaire.com Fri May 26 11:52:42 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 26 May 2006 14:52:42 -0400 Subject: [openib-general] RE: OpenSM build problem In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F0007CE1129@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F0007CE1129@orsmsx408> Message-ID: <1148669347.4583.9367.camel@hal.voltaire.com> On Fri, 2006-05-26 at 14:41, Woodruff, Robert J wrote: > Too weird, > > When I did > /usr/local/lib/libosmcomp* > > I got a segmentation fault ???? Is that what you did or an ls of that ? > So I rebuilt everything from scratch and the problem went away. > > Must have had a corrupted file. 
Perhaps another build gremlin in this space :-( There have been similar reports that rebuilds fixed some first time issues with other related aspects... -- Hal > woody > > > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Friday, May 26, 2006 10:50 AM > To: Woodruff, Robert J > Cc: Tziporet Koren; OpenFabricsEWG; openib-general > Subject: RE: OpenSM build problem > > On Fri, 2006-05-26 at 13:39, Bob Woodruff wrote: > > Hal wrote, > > >When you do this, where is the library for this (lisosmcomp) being > > >installed ? What libosmcomp* are in that directory ? > > > > I let it default to /usr/local/lib for the libraries and > /usr/local/bin > > for the binary. > > > > >Is your LD_LIBRARY_PATH set so that this directory is included (or > other > > >mechanisms of doing the same) ? > > > > I set /etc/ld.so.conf to include /usr/local/lib in the path. > > > > Must have been some makefile changes or such between the trunk version > > and the 1.0 version since if I build the trunk the same way, it works > > just fine. > > I don't think it's a Makefile.am change. It's something else but not > sure what. > > Can you send me the output of /usr/local/lib/libosmcomp* ? > > Can you do the following in your 1.0 complib: > make clean && make && make install > > and rerun the 1.0 OpenSM and see if you still have the problem ? 
> > -- Hal > > > woody From Don.Albert at Bull.com Fri May 26 12:31:23 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Fri, 26 May 2006 12:31:23 -0700 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: <1148669254.4583.9320.camel@hal.voltaire.com> Message-ID: Hal, > > One more thing on the remote side, try: > > smpquery nodeinfo -D 0 > Here is the smpquery on the remote (system "jatoba") side >>>>>>>>>>>>> [jatoba] (ib) ib> smpquery nodeinfo -D 0 # Node info: DR path [0] BaseVers:........................1 ClassVers:.......................1 NodeType:........................Channel Adapter NumPorts:........................1 SystemGuid:......................0x0002c90200216e43 Guid:............................0x0002c90200216e40 PortGuid:........................0x0002c90200216e41 PartCap:.........................64 DevId:...........................0x6274 Revision:........................0x000000a0 LocalPort:.......................1 VendorId:........................0x0002c9 >>>>>>>>>>>>> For good measure, here is the local (system "koa") side [koa] (ib) ib> smpquery nodeinfo -D 0 # Node info: DR path [0] BaseVers:........................1 ClassVers:.......................1 NodeType:........................Channel Adapter NumPorts:........................1 SystemGuid:......................0x0002c90200216dc7 Guid:............................0x0002c90200216dc4 PortGuid:........................0x0002c90200216dc5 PartCap:.........................64 DevId:...........................0x6274 Revision:........................0x000000a0 LocalPort:.......................1 VendorId:........................0x0002c9 >>>>>>>>>>>>> > Do you also have an iPath adapter ? If not, no need to load those > modules. > We do not have an iPath adapter. I just did a "build all packages" in the OFED install.sh script, and it included it. I did a "modprobe -r ib_ipath" and it removed it ok. 
-Don Albert- -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Fri May 26 13:06:44 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 26 May 2006 13:06:44 -0700 Subject: [openib-general] [PATCH 06/16] ehca: interrupt handling routines In-Reply-To: <4468BD63.6070509@de.ibm.com> (Heiko J. Schick's message of "Mon, 15 May 2006 19:41:55 +0200") References: <4468BD63.6070509@de.ibm.com> Message-ID: > + for_each_online_cpu(cpu) { > + task = create_comp_task(pool, cpu); > + if (task) { > + kthread_bind(task, cpu); > + wake_up_process(task); > + } > + } How does this creation of a thread pool work with respect to CPU hotplug? What happens if a CPU goes away? How about if only one CPU is running when the driver is loaded, and then 15 more are hot-added? > + for (i = 0; i < NR_CPUS; i++) { > + if (cpu_online(i)) > + destroy_comp_task(pool, i); > + } And it seems in the destroy function, you will possibly leak threads or try to kill a non-existent thread if the set of online CPUs has changed since the driver started... - R. From xma at us.ibm.com Fri May 26 14:22:18 2006 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 26 May 2006 14:22:18 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: Roland, Yes. The lock sequences are right to me. What I found that the ah is always available the IPoIB neigh, I can modify this patch like that in ipoib_send: if (unlikely(*to_ipoib_neigh(skb->dst->neighbour))) kref_get(); in ipoib completion: if (unlikely(*to_ipoib_neigh(skb->dst->neighbour))) kref_put(); Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Don.Albert at Bull.com Fri May 26 14:32:16 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Fri, 26 May 2006 14:32:16 -0700 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: Message-ID: Hal, I rebuilt the opensm executable with the patch you provided. The patch fixes (or avoids) the segmentation fault and opensm comes up and runs. However, the link is still not becoming operational. On the local side it goes to ARMED, and on the remote side it goes to INIT. The osm.log seems to show that the MAD packets are timing out. Here is the first part of the file, it just repeats after this at one minute intervals. [koa] (ib) root> cat /var/log/osm.log May 26 14:05:43 369104 [8EFC3D00] -> OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision May 26 14:05:43 369260 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn Exported revision May 26 14:05:43 370571 [8EFC3D00] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe8000000000000 0,0x0000000000000000 May 26 14:05:43 370631 [8EFC3D00] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe8000000000000 0,0x0000000000000000 May 26 14:05:43 373005 [8EFC3D00] -> osm_vendor_bind: Binding to port 0x2c90200216dc5 May 26 14:05:43 374685 [8EFC3D00] -> osm_vendor_bind: Binding to port 0x2c90200216dc5 May 26 14:05:44 172028 [44007960] -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x1239) -- dr opping May 26 14:05:44 172070 [44007960] -> umad_receiver: ERR 5411: DR SMP May 26 14:05:44 172083 [44007960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT) May 26 14:05:44 172148 [44007960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x1 
trans_id................0x1239 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1] Return path: [0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 May 26 14:05:44 172199 [42003960] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200216dc4 port 1. Adding to light sweep sampling list May 26 14:05:44 172240 [42003960] -> Directed Path Dump of 0 hop path: Path = [0] May 26 14:05:44 172256 [0000] -> Entering MASTER state May 26 14:05:44 179081 [0000] -> SUBNET UP May 26 14:05:54 180461 [44007960] -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x11 trans_id=0x1240) -- dr opping May 26 14:05:54 180515 [44007960] -> umad_receiver: ERR 5411: DR SMP May 26 14:05:54 180528 [44007960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT) May 26 14:05:54 180569 [44007960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x1 trans_id................0x1240 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1] Return path: [0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 May 26 14:05:54 180624 [42003960] -> osm_drop_mgr_process: 
ERR 0108: Unknown remote side for node 0x0002c90200216dc4 port 1. Adding to light sweep sampling list May 26 14:05:54 180649 [42003960] -> Directed Path Dump of 0 hop path: Path = [0] The physical link appears to be up: here are the ibstat, ibstatus results for both sides: Local system [koa] (ib) root> ibstat CA 'mthca0' CA type: MT25204 Number of ports: 1 Firmware version: 1.0.800 Hardware version: a0 Node GUID: 0x0002c90200216dc4 System image GUID: 0x0002c90200216dc7 Port 1: State: Armed Physical state: LinkUp Rate: 20 Base lid: 2 LMC: 0 SM lid: 2 Capability mask: 0x02510a6a Port GUID: 0x0002c90200216dc5 [koa] (ib) root> ibstatus Infiniband device 'mthca0' port 1 status: default gid: fe80:0000:0000:0000:0002:c902:0021:6dc5 base lid: 0x2 sm lid: 0x2 state: 3: ARMED phys state: 5: LinkUp rate: 20 Gb/sec (4X DDR) Remote system [jatoba] (ib) ib> ibstat CA 'mthca0' CA type: MT25204 Number of ports: 1 Firmware version: 1.0.800 Hardware version: a0 Node GUID: 0x0002c90200216e40 System image GUID: 0x0002c90200216e43 Port 1: State: Initializing Physical state: LinkUp Rate: 20 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02510a68 Port GUID: 0x0002c90200216e41 [jatoba] (ib) ib> ibstatus Infiniband device 'mthca0' port 1 status: default gid: fe80:0000:0000:0000:0002:c902:0021:6e41 base lid: 0x0 sm lid: 0x0 state: 2: INIT phys state: 5: LinkUp rate: 20 Gb/sec (4X DDR) An "ibnetdiscover" on the local system gives the following: [koa] (ib) root> ibnetdiscover ibwarn: [20638] handle_port: NodeInfo on DR path [0][1] port 1 failed, skipping port # # Topology file: generated on Fri May 26 14:24:20 2006 # # Max of 1 hops discovered # Initiated from node 0002c90200216dc4 port 0002c90200216dc5 vendid=0x2c9 devid=0x6274 sysimgguid=0x2c90200216dc7 caguid=0x2c90200216dc4 Ca 1 "H-0002c90200216dc4" # koa HCA-1 What next, coach? -Don Albert- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From xma at us.ibm.com Fri May 26 14:36:02 2006 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 26 May 2006 14:36:02 -0700 Subject: [openib-general] [PATCH][1/7]ipoib performance patches -- remove ah_reap In-Reply-To: Message-ID: in ipoib send if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) kref_get(); in ipoib completion: if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) ipoib_put_ah(); Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Fri May 26 15:48:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 26 May 2006 15:48:46 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: (Shirley Ma's message of "Thu, 25 May 2006 19:30:23 -0700") References: Message-ID: Shirley> This patch reduces one extra ring between dev xmit queue Shirley> and device send queue and removes tx_lock in completion Shirley> handler. I don't think replacing the ring with a linked list is an improvement. At best it works out to be the same. But I didn't notice that you removed the tx_lock from the send completion handler before. I suspect that this is where all the improvement is coming from. Unfortunately it introduces a couple of race conditions. For example, if CPU1 is in ipoib_send() and CPU2 is running the send completion handler, you can have: CPU1 CPU2 err = post_send(priv, wr_id, address->ah, qpn, addr, skb->len); // err is non-zero if (!err) { } else { if (!netif_queue_stopped(dev)) { // send completion handler // runs many times and drains // completion queue // queue is never stopped so // it never gets woken again if (netif_queue_stopped(dev)) netif_wake_queue(dev); netif_stop_queue(dev); // no more sends are posted so // another completion never // occurs and the queue stays // stopped forever. 
There are analogous races that cause spurious wakeups too. However I think this is a good idea, and I think we can get around the races by working around them in a way that doesn't hurt performance (eg by having the netdev tx watchdog restart the queue in the scenario above). I cooked up a quick patch (below) but I didn't see any performance improvement from this in my quick tests -- NPtcp peaks at ~3400 Mbit/sec between my test systems with 2.6.17-rc5 both with and without this patch. Then I tried to apply your patches 1/7 and 3/7 to see if they were any different on my setup, but the patches didn't apply. 3/7 had no attachment, and it's hopeless to try and apply the patches you send inline. So can you resend up-to-date versions of the patches that give you a 10% improvement? (BTW it would be nice if you could figure out a way to fix your mail client to post patches inline without mangling them, or at least attach them with a mime type of text/plain or something) Also, if you're interested, you could try the patch below and see how it does on your tests. - R. Here's my tx_lock removal test patch (it would need better comments about all the races with tx_tail and some more careful review before we actually apply it): --- infiniband/ulp/ipoib/ipoib_main.c (revision 7507) +++ infiniband/ulp/ipoib/ipoib_main.c (working copy) @@ -634,6 +634,14 @@ static int ipoib_start_xmit(struct sk_bu return NETDEV_TX_BUSY; } + /* + * Because tx_lock is not held when updating tx_tail in the + * send completion handler, we may receive a spurious wakeup + * that starts our queue when there really isn't space yet. 
+ */ + if (unlikely(priv->tx_head - priv->tx_tail == ipoib_sendq_size)) + return NETDEV_TX_BUSY; + if (skb->dst && skb->dst->neighbour) { if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) { ipoib_path_lookup(skb, dev); @@ -703,6 +711,21 @@ static struct net_device_stats *ipoib_ge static void ipoib_timeout(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned long flags; + int lost_wakeup = 0; + + spin_lock_irqsave(&priv->tx_lock, flags); + if (netif_queue_stopped(dev) && + priv->tx_head - priv->tx_tail < ipoib_sendq_size) { + ipoib_dbg(priv, "lost wakeup, head %u, tail %u\n", + priv->tx_head, priv->tx_tail); + lost_wakeup = 1; + netif_wake_queue(dev); + } + spin_unlock_irqrestore(&priv->tx_lock, flags); + + if (lost_wakeup) + return; ipoib_warn(priv, "transmit timeout: latency %d msecs\n", jiffies_to_msecs(jiffies - dev->trans_start)); --- infiniband/ulp/ipoib/ipoib_ib.c (revision 7511) +++ infiniband/ulp/ipoib/ipoib_ib.c (working copy) @@ -244,7 +244,6 @@ static void ipoib_ib_handle_wc(struct ne } else { struct ipoib_tx_buf *tx_req; - unsigned long flags; if (wr_id >= ipoib_sendq_size) { ipoib_warn(priv, "completion event with wrid %d (> %d)\n", @@ -266,12 +265,17 @@ static void ipoib_ib_handle_wc(struct ne dev_kfree_skb_any(tx_req->skb); - spin_lock_irqsave(&priv->tx_lock, flags); ++priv->tx_tail; + + /* + * Since we don't hold tx_lock here, this may lead to + * both lost wakeups (which we deal with in our + * watchdog) and spurious wakeups (which we deal with + * by handling TX ring overflows in the xmit function). 
+ */ if (netif_queue_stopped(dev) && priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) netif_wake_queue(dev); - spin_unlock_irqrestore(&priv->tx_lock, flags); if (wc->status != IB_WC_SUCCESS && wc->status != IB_WC_WR_FLUSH_ERR) From xma at us.ibm.com Fri May 26 16:03:52 2006 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 26 May 2006 16:03:52 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: Message-ID: Roland, Thanks for the review comments. I will update these patches and the test results. >BTW it would be nice if you could figure out a way to fix your mail client to post patches inline without mangling them, or at least attach them with a mime type of text/plain or something. I will use my Unix account to send out patches. >Also, if you're interested, you could try the patch below and see how it does on your tests. Sure. I will test it after this weekend. Did you see send queue overrun with tx_ring default size 128? Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From rdreier at cisco.com Fri May 26 16:20:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 26 May 2006 16:20:02 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: (Shirley Ma's message of "Fri, 26 May 2006 16:03:52 -0700") References: Message-ID: Shirley> Sure. I will test it after this weekend. Did you see send Shirley> queue overrun with tx_ring default size 128? No, but my patch would still prevent send queue overruns. It might stop the netdevice queue because the send queue is full, but I didn't turn on debugging to see that. (Also the default TX ring size is 64, not 128, isn't it?) - R.
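The head/tail arithmetic used by the checks in the patch above can be sketched in userspace C. This is only a minimal sketch, assuming free-running 32-bit counters in the style of tx_head and tx_tail; SENDQ_SIZE and the helper names ring_used, ring_full and ring_should_wake are hypothetical and do not appear in the IPoIB source.

```c
#include <stdint.h>

#define SENDQ_SIZE 64u /* assumed ring size, for illustration only */

/* Entries currently in flight. Unsigned subtraction gives the right
 * answer even after head has wrapped past zero and tail has not. */
static uint32_t ring_used(uint32_t head, uint32_t tail)
{
    return (uint32_t)(head - tail);
}

/* Mirrors the "ring is full" test added to the xmit path. */
static int ring_full(uint32_t head, uint32_t tail)
{
    return ring_used(head, tail) == SENDQ_SIZE;
}

/* Mirrors the wake threshold in the completion handler:
 * wake once the ring is at most half full. */
static int ring_should_wake(uint32_t head, uint32_t tail)
{
    return ring_used(head, tail) <= SENDQ_SIZE >> 1;
}
```

With tail = 0xFFFFFFF0 and head = tail + 64 (which wraps to 0x30), ring_used() still reports 64, so a full-ring check of this form keeps working across counter wraparound.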
From xma at us.ibm.com Fri May 26 16:54:27 2006 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 26 May 2006 16:54:27 -0700 Subject: [openib-general] [PATCH][3/7]ipoib performance patches -- remove tx_ring In-Reply-To: Message-ID: Roland Dreier wrote on 05/26/2006 04:20:02 PM: > (Also the default TX ring size is 64, not 128, isn't it?) > > - R. Yes. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From weiny2 at llnl.gov Fri May 26 16:54:56 2006 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 26 May 2006 16:54:56 -0700 Subject: [openib-general] [PATCH] ibv_*_pingpong examples : user option for pkey Message-ID: <20060526165456.25521a93.weiny2@llnl.gov> While testing the pkey features of opensm I added this patch to be able to check out the use of different pkeys. Ira -------------- next part -------------- A non-text attachment was scrubbed... Name: pingpong-pkey-option.patch Type: application/octet-stream Size: 8496 bytes Desc: not available URL: From halr at voltaire.com Fri May 26 17:59:46 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 26 May 2006 20:59:46 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: Message-ID: <1148691586.4358.4739.camel@hal.voltaire.com> Don, On Fri, 2006-05-26 at 17:32, Don.Albert at Bull.com wrote: > Hal, > > I rebuilt the opensm executable with the patch you provided. The > patch fixes (or avoids) the segmentation fault and opensm comes up and > runs. Thanks for trying this out. > However, the link is still not becoming operational. On the local > side it goes to ARMED, and on the remote side it goes to INIT. The > osm.log seems to show that the MAD packets are timing out. Yes, as I mentioned the remote end is not responding to SMA packets (as the right modules appear to be loaded to do that).
I don't know why this is but this is NOT an OpenSM issue. > Here is the first part of the file, it just repeats after this at > one minute intervals. Right, OpenSM sees the Physical Link Up and tries to bring the port to active but can't because the remote SMA is not responding. Periodically, it downs the port and reattempts to bring it back up (but can't). > [koa] (ib) root> cat /var/log/osm.log > May 26 14:05:43 369104 [8EFC3D00] -> OpenSM Rev:openib-1.2.0 OpenIB > svn Exported revision > May 26 14:05:43 369260 [0000] -> OpenSM Rev:openib-1.2.0 OpenIB svn > Exported revision > > May 26 14:05:43 370571 [8EFC3D00] -> osm_report_notice: Reporting > Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe8000000000000 > 0,0x0000000000000000 > May 26 14:05:43 370631 [8EFC3D00] -> osm_report_notice: Reporting > Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe8000000000000 > 0,0x0000000000000000 > May 26 14:05:43 373005 [8EFC3D00] -> osm_vendor_bind: Binding to port > 0x2c90200216dc5 > May 26 14:05:43 374685 [8EFC3D00] -> osm_vendor_bind: Binding to port > 0x2c90200216dc5 > May 26 14:05:44 172028 [44007960] -> umad_receiver: ERR 5409: send > completed with error (method=0x1 attr=0x11 trans_id=0x1239) -- dr > opping > May 26 14:05:44 172070 [44007960] -> umad_receiver: ERR 5411: DR SMP > May 26 14:05:44 172083 [44007960] -> __osm_sm_mad_ctrl_send_err_cb: > ERR 3113: MAD completed in error (IB_TIMEOUT) > May 26 14:05:44 172148 [44007960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > D bit...................0x0 > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x1 > trans_id................0x1239 > attr_id.................0x11 > (NodeInfo) > resv....................0x0 > attr_mod................0x0 > > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1] > 
Return path: [0][0] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > May 26 14:05:44 172199 [42003960] -> osm_drop_mgr_process: ERR 0108: > Unknown remote side for node 0x0002c90200216dc4 port 1. Adding > to light sweep sampling list > May 26 14:05:44 172240 [42003960] -> Directed Path Dump of 0 hop path: > Path = [0] > May 26 14:05:44 172256 [0000] -> Entering MASTER state > > May 26 14:05:44 179081 [0000] -> SUBNET UP > > May 26 14:05:54 180461 [44007960] -> umad_receiver: ERR 5409: send > completed with error (method=0x1 attr=0x11 trans_id=0x1240) -- dr > opping > May 26 14:05:54 180515 [44007960] -> umad_receiver: ERR 5411: DR SMP > May 26 14:05:54 180528 [44007960] -> __osm_sm_mad_ctrl_send_err_cb: > ERR 3113: MAD completed in error (IB_TIMEOUT) > May 26 14:05:54 180569 [44007960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > D bit...................0x0 > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x1 > trans_id................0x1240 > attr_id.................0x11 > (NodeInfo) > resv....................0x0 > attr_mod................0x0 > > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1] > Return path: [0][0] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > May 26 14:05:54 180624 [42003960] -> osm_drop_mgr_process: ERR 0108: > Unknown remote side for node 0x0002c90200216dc4 port 1. 
Adding > to light sweep sampling list > May 26 14:05:54 180649 [42003960] -> Directed Path Dump of 0 hop path: > Path = [0] > > > The physical link appears to be up: > here are the ibstat, ibstatus results for both sides: > > Local system > > [koa] (ib) root> ibstat > CA 'mthca0' > CA type: MT25204 > Number of ports: 1 > Firmware version: 1.0.800 > Hardware version: a0 > Node GUID: 0x0002c90200216dc4 > System image GUID: 0x0002c90200216dc7 > Port 1: > State: Armed > Physical state: LinkUp > Rate: 20 > Base lid: 2 > LMC: 0 > SM lid: 2 > Capability mask: 0x02510a6a > Port GUID: 0x0002c90200216dc5 > [koa] (ib) root> ibstatus > Infiniband device 'mthca0' port 1 status: > default gid: fe80:0000:0000:0000:0002:c902:0021:6dc5 > base lid: 0x2 > sm lid: 0x2 > state: 3: ARMED > phys state: 5: LinkUp > rate: 20 Gb/sec (4X DDR) > > Remote system > > [jatoba] (ib) ib> ibstat > CA 'mthca0' > CA type: MT25204 > Number of ports: 1 > Firmware version: 1.0.800 > Hardware version: a0 > Node GUID: 0x0002c90200216e40 > System image GUID: 0x0002c90200216e43 > Port 1: > State: Initializing > Physical state: LinkUp > Rate: 20 > Base lid: 0 > LMC: 0 > SM lid: 0 > Capability mask: 0x02510a68 > Port GUID: 0x0002c90200216e41 > [jatoba] (ib) ib> ibstatus > Infiniband device 'mthca0' port 1 status: > default gid: fe80:0000:0000:0000:0002:c902:0021:6e41 > base lid: 0x0 > sm lid: 0x0 > state: 2: INIT > phys state: 5: LinkUp > rate: 20 Gb/sec (4X DDR) > > An "ibnetdiscover" on the local system gives the following: > > [koa] (ib) root> ibnetdiscover > ibwarn: [20638] handle_port: NodeInfo on DR path [0][1] port 1 failed, > skipping port Right; that's the same thing the SM sees. The remote SMA is not responding to requests (same request SM Get NodeInfo). 
> # > # Topology file: generated on Fri May 26 14:24:20 2006 > # > # Max of 1 hops discovered > # Initiated from node 0002c90200216dc4 port 0002c90200216dc5 > > vendid=0x2c9 > devid=0x6274 > sysimgguid=0x2c90200216dc7 > caguid=0x2c90200216dc4 > Ca 1 "H-0002c90200216dc4" # koa HCA-1 > > What next, coach? Can you turn on madeye on the remote node and see what packets are received and sent ? Let me know if you need help with that. I think you said you were running OFED, right ? -- Hal > -Don Albert- From halr at voltaire.com Fri May 26 18:57:30 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 26 May 2006 21:57:30 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: <1148691586.4358.4739.camel@hal.voltaire.com> References: <1148691586.4358.4739.camel@hal.voltaire.com> Message-ID: <1148695049.4358.5966.camel@hal.voltaire.com> Don, On Fri, 2006-05-26 at 20:59, Hal Rosenstock wrote: > > What next, coach? > > Can you turn on madeye on the remote node and see what packets are > received and sent ? Let me know if you need help with that. I think you > said you were running OFED, right ? I don't think madeye is part of OFED :-( Can it get added for RC6, Tziporet ? I think it would be a useful tool to add for problems like this. Also, was this a working setup before ? Did anything else change besides installing RC5 on both nodes ? I have two more experiments I'd like you to try, before we go down the madeye "route": 1. Do you have another IB cable to try ? 2. Can you completely shutdown and repower the remote node and see if it starts responding ? Thanks. 
-- Hal From paul.lundin at gmail.com Fri May 26 23:26:38 2006 From: paul.lundin at gmail.com (Paul) Date: Sat, 27 May 2006 02:26:38 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: <1148661960.4583.6711.camel@hal.voltaire.com> References: <1148657954.4583.5262.camel@hal.voltaire.com> <1148661960.4583.6711.camel@hal.voltaire.com> Message-ID: Hi Hal, My lab is undergoing maintenance this weekend, so I won't be able to get you any results until Tuesday; however, the results are readily reproducible. Everything is 64-bit. Regards. On 26 May 2006 12:46:01 -0400, Hal Rosenstock wrote: > > Hi again Paul, > > On Fri, 2006-05-26 at 12:14, Paul wrote: > > No, I figured all of that out, ppc64 was not supported/working in RC4. > > Either way, here is what I see with opensm: > > > > [root at something ~]# /etc/init.d/opensmd start > > *** glibc detected *** realloc(): invalid next size: > > 0x00000000100ab1e0 *** > > /etc/init.d/opensmd: line 330: 7854 Done echo $PORT_FLAG > > 7855 Aborted | $prog $START_FLAGS >/dev/null 2>&1 > > opensm start [FAILED] > > [root at something ~]# > > OK; that's a totally different problem than Don's. I would like to get > to the bottom of this. > > 0x100ab1e0 is a pretty big size. Is this reproducible ? > > I'm not sure how realloc gets called as I do not believe OpenSM calls it > directly (or any of its libraries). > > Are you using 32 or 64 bit libraries for this ? > > Would you rebuild OpenSM with debug: > ./configure --enable-debug && make clean && make && make install > > and then run opensm under gdb and provide the backtrace after the > failure? > > Thanks. > > -- Hal > >
From halr at voltaire.com Sat May 27 03:08:35 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 27 May 2006 06:08:35 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: <1148657954.4583.5262.camel@hal.voltaire.com> <1148661960.4583.6711.camel@hal.voltaire.com> Message-ID: <1148724514.4358.16571.camel@hal.voltaire.com> Hi Paul, On Sat, 2006-05-27 at 02:26, Paul wrote: > Hi Hal, > My lab is undergoing maintenance this weekend, so I won't be able > to get you any results until Tuesday; however, the results are readily > reproducible. Everything is 64-bit. Unfortunately I don't have access to a PPC64 machine on which to do this myself. I wish I did. So can you help next week ? Thanks. -- Hal > Regards. > > On 26 May 2006 12:46:01 -0400, Hal Rosenstock > wrote: > Hi again Paul, > > On Fri, 2006-05-26 at 12:14, Paul wrote: > > No, I figured all of that out, ppc64 was not > supported/working in RC4. > > Either way, here is what I see with opensm: > > > > [root at something ~]# /etc/init.d/opensmd start > > *** glibc detected *** realloc(): invalid next size: > > 0x00000000100ab1e0 *** > > /etc/init.d/opensmd: line 330: 7854 Done echo $PORT_FLAG > > 7855 Aborted | $prog $START_FLAGS >/dev/null 2>&1 > > opensm start [FAILED] > > [root at something ~]# > > OK; that's a totally different problem than Don's. I would > like to get > to the bottom of this. > > 0x100ab1e0 is a pretty big size. Is this reproducible ? > > I'm not sure how realloc gets called as I do not believe > OpenSM calls it > directly (or any of its libraries). > > Are you using 32 or 64 bit libraries for this ? > > Would you rebuild OpenSM with debug: > ./configure --enable-debug && make clean && make && make > install > > and then run opensm under gdb and provide the backtrace after > the > failure? > > Thanks.
> > -- Hal > > From halr at voltaire.com Sat May 27 04:04:47 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 27 May 2006 07:04:47 -0400 Subject: [openib-general] Failed multicast join with new multicast module Message-ID: <1148727884.4358.17728.camel@hal.voltaire.com> Hi Sean, I just (finally) updated to include the new multicast module (I'm at the latest svn) and I see one anomaly. When a multicast join fails (properly) for a group (e.g. status 0x0600), it appears to be continually retried and it never gives up. The join request is being denied for some IPv6 groups because they were not previously created. In about 100 usecs after the ERR_REQ_INSUFFICIENT_COMPONENTS status is returned, the module appears to rerequest and never give up. I forget exactly what the strategy for this was before the multicast module was introduced: whether it was exponential backoff up to some limit, or whether it was linear up to some retry count. Also, in looking at the new multicast code, I see the following: static int retry_timer = 5000; /* 5 sec */ module_param(retry_timer, int, 0444); MODULE_PARM_DESC(retry_timer, "Time in ms between retried requests."); static int retries = 3; module_param(retries, int, 0444); MODULE_PARM_DESC(retries, "Number of times to retry a request."); so it appears that the multicast module has its own retry strategy. Is that true ? If so, does this interact with IPoIB's for rerequesting or has that changed ? Thanks. -- Hal
From sean.hefty at intel.com Sat May 27 09:44:52 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Sat, 27 May 2006 09:44:52 -0700 Subject: [openib-general] Failed multicast join with new multicast module In-Reply-To: <1148727884.4358.17728.camel@hal.voltaire.com> Message-ID: >I forget exactly what the strategy for this was before the multicast >module was introduced: whether it was exponential backoff up to some >limit, or whether it was linear up to some retry count. > >Also, in looking at the new multicast code, I see the following: > >static int retry_timer = 5000; /* 5 sec */ >module_param(retry_timer, int, 0444); >MODULE_PARM_DESC(retry_timer, "Time in ms between retried requests."); > >static int retries = 3; >module_param(retries, int, 0444); >MODULE_PARM_DESC(retries, "Number of times to retry a request."); > >so it appears that the multicast module has its own retry strategy. Is >that true ? If so, does this interact with IPoIB's for rerequesting or >has that changed ? The multicast module uses its own retry strategy, basically just passing the request down to the MAD layer. It should fail the join request to the user if the retries are exceeded. I should have a userspace multicast test module by the end of this coming week which will let me stress the multicast code more. Ipoib uses its own retry strategy, and I believe re-issues the request. Ipoib uses an exponential backoff strategy, so it sounds like there's an issue with the ipoib changes. Looking at the code, I need to understand how send-only joins are retried.
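For illustration only, the two retry policies mentioned in this thread can be sketched in plain C. This is not the multicast module or IPoIB code: fixed_retry_delay_ms() simply reuses the retry_timer and retries defaults quoted above, while backoff_delay_ms() and its base_ms/max_ms parameters are hypothetical, since the thread does not state IPoIB's actual backoff values.

```c
/* Defaults quoted in the thread for the multicast module parameters. */
enum { RETRY_TIMER_MS = 5000, RETRIES = 3 };

/*
 * Fixed-interval policy: every retry waits the same time.
 * 'attempt' is 1-based; returns -1 once the retry budget is used up,
 * at which point the request should be failed back to the caller.
 */
int fixed_retry_delay_ms(int attempt)
{
    return (attempt >= 1 && attempt <= RETRIES) ? RETRY_TIMER_MS : -1;
}

/*
 * Exponential backoff (hypothetical parameters): the delay starts at
 * base_ms, doubles on each further attempt and is capped at max_ms.
 * 'attempt' is 1-based.
 */
int backoff_delay_ms(int attempt, int base_ms, int max_ms)
{
    int delay = base_ms;

    for (int i = 1; i < attempt; i++) {
        if (delay >= max_ms / 2)
            return max_ms;      /* doubling again would exceed the cap */
        delay *= 2;
    }
    return delay < max_ms ? delay : max_ms;
}
```

A join that is retried roughly 100 usecs after every failure, as reported above, matches neither policy, which is consistent with the suspicion that something in the IPoIB changes bypasses the backoff.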
- Sean From eitan at mellanox.co.il Sat May 27 10:08:51 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 27 May 2006 20:08:51 +0300 Subject: [openib-general] RE: OpenSM build problem Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368729@mtlexch01.mtl.com> I have a strange feeling the problems you see with libosmcomp are result from the change in library version that happened on the trunk. I suspect once you have compiled the trunk on the machine the new lib version is available and as it is not compatible with the OFED OpenSM you get into all sort of trouble. Just make sure you cleanup all libosmcomp* before you build. My 2 cents Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Hal Rosenstock > Sent: Friday, May 26, 2006 8:50 PM > To: Robert J Woodruff > Cc: OpenFabricsEWG; openib-general > Subject: [openib-general] RE: OpenSM build problem > > On Fri, 2006-05-26 at 13:39, Bob Woodruff wrote: > > Hal wrote, > > >When you do this, where is the library for this (lisosmcomp) being > > >installed ? What libosmcomp* are in that directory ? > > > > I let it default to /usr/local/lib for the libraries and /usr/local/bin > > for the binary. > > > > >Is your LD_LIBRARY_PATH set so that this directory is included (or other > > >mechanisms of doing the same) ? > > > > I set /etc/ld.so.conf to include /usr/local/lib in the path. > > > > Must have been some makefile changes or such between the trunk version > > and the 1.0 version since if I build the trunk the same way, it works > > just fine. > > I don't think it's a Makefile.am change. It's something else but not > sure what. > > Can you send me the output of /usr/local/lib/libosmcomp* ? 
> > Can you do the following in your 1.0 complib: > make clean && make && make install > > and rerun the 1.0 OpenSM and see if you still have the problem ? > > -- Hal > > > woody > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sashak at voltaire.com Sat May 27 11:23:26 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 27 May 2006 21:23:26 +0300 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: <1148657954.4583.5262.camel@hal.voltaire.com> Message-ID: <20060527182326.GA20901@sashak.voltaire.com> Hi Paul, On 12:14 Fri 26 May , Paul wrote: > No, I figured all of that out, ppc64 was not supported/working in RC4. > Either way, here is what I see with opensm: > > [root at something ~]# /etc/init.d/opensmd start > *** glibc detected *** realloc(): invalid next size: 0x00000000100ab1e0 *** I remember this report; there was a gdb backtrace as well. It looks like '0x00000000100ab1e0' is not a size but an address (32-bit?), and that it results from a shift in the argument list. Could you provide ssh access to this machine so that I can debug this? (I don't have ppc64.) Sasha From rpearson at systemfabricworks.com Sat May 27 14:16:59 2006 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Sat, 27 May 2006 16:16:59 -0500 Subject: [openib-general] SVN problem Message-ID: <20060527211705.NHXG144.rrcs-fep-10.hrndva.rr.com@BOBP> I'm having problems checking out the svn repository. Has anyone seen this? svn: In directory 'gen2/trunk/src/userspace/mpi/mvapich-gen2/www/www1' svn: Can't copy 'gen2/trunk/src/userspace/mpi/mvapich-gen2/www/www1/.svn/tmp/text-bas nk/src/userspace/mpi/mvapich-gen2/www/www1/mpicc.html.tmp': No such file or directory -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tziporet at mellanox.co.il Sat May 27 23:54:33 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Sun, 28 May 2006 09:54:33 +0300 Subject: [openib-general] RC 5 ppc64 problems In-Reply-To: References: Message-ID: <44794929.4050708@mellanox.co.il> Paul wrote: > So far I have had 2 issues with the RC5 build. First the compile fails > as it does not pass the correct parameters to g++ (regardless of > CXXFLAGS, LDFLAGS, CCFLAGS, CFLAGS) settings. I made this work > correctly by creating a bash script in place of g++ that called g++ > -m64 explicitly. > > Now that I have everything compiled I am experiencing the same problem > as before ... with some further information available. As noted before > I was experiencing some issues with running pallas. I had hand built > pallas. Now I am using the full OFED stack (openib, open-mpi, pallas) > and the resulting pallas binary will run (localhost test), only if the > mthca.so file is missing. If the mthca file is present I get the > following consistent error: > > [root at something PMB-2.2.1]# > /usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/bin/mpirun -np 2 -hostfile > machine.list ./PMB-MPI1 > Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR) > Failing at addr:0x3000100a619d > [0] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libopal.so.0 > [0x80001d0038] > [1] func:[0x1ffffffe5f0] > [2] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 > [0x800006a9dc] > [3] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 > [0x800006abf8] > [4] > func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/openmpi/mca_btl_openib.so > [0x800055a5f0] > [5] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 > [0x80000d7e48] > [6] > func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/openmpi/mca_bml_r2.so > [0x800053f99c] > [7] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 > [0x80000d7530] > [8] >
func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/openmpi/mca_pml_ob1.so > [0x800051f00c] > [9] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 > [0x80000e0558] > [10] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 > [0x800008c900] > [11] func:/usr/local/ofed/mpi/gcc/openmpi-1.1a7-1/lib64/libmpi.so.0 > [0x80000b6f20] > [12] func:./PMB-MPI1 [0x10003144] > [13] func:/lib64/tls/libc.so.6 [0x8064e9415c] > [14] func:/lib64/tls/libc.so.6 [0x8064e942e4] > *** End of error message *** > [root at something PMB-2.2.1]# > ------------------------------------------------------------------------ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general Can you open a bug in bugzilla for this issue Thanks, Tziporet From bugzilla-daemon at openib.org Sun May 28 00:06:52 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Sun, 28 May 2006 00:06:52 -0700 (PDT) Subject: [openib-general] [Bug 99] Enable both 32bit and 64bit libraries on dual-arch systems (ppc64 and x86_64) Message-ID: <20060528070652.BA4F32283D5@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=99 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |vlad at mellanox.co.il ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at openib.org Sun May 28 00:11:25 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Sun, 28 May 2006 00:11:25 -0700 (PDT) Subject: [openib-general] [Bug 95] Stack seems to reorder multicast entries Message-ID: <20060528071125.60F112283F1@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=95 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #1 from tziporet at mellanox.co.il 2006-05-28 00:11 ------- >From Eli Cohen: I saw this problem with unicast UD datagrams as well. This was on Fedora C4 kernel 2.6.11-1.1369_FC4smp. I verified that the packets arrived in order just before calling netif_rx_ni() by peeking into the ip and udp layers. After that I tried this on kernel 2.6.16.17 and the there were no out of order reports. So I guess this was a Linux networking stack problem that was resolved in newer kernels. ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at openib.org Sun May 28 00:13:37 2006 From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org) Date: Sun, 28 May 2006 00:13:37 -0700 (PDT) Subject: [openib-general] [Bug 91] sizeof(srp_indirect_buf) wrong on 64-bit platforms Message-ID: <20060528071337.18B37228410@openib.ca.sandia.gov> http://openib.org/bugzilla/show_bug.cgi?id=91 tziporet at mellanox.co.il changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|bugzilla at openib.org |vu at mellanox.com ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. From mst at mellanox.co.il Sun May 28 00:18:39 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 28 May 2006 10:18:39 +0300 Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4 In-Reply-To: References: <20060517144830.GE30211@mellanox.co.il> Message-ID: <20060528071838.GF21266@mellanox.co.il> Quoting r. Don.Albert at bull.com : > Subject: Re: NOP problem in ib_mthca on OFED RC4 > > > Michael, > > Sorry for the long delay in replying, I was on vacation for 10 days, then > when I returned, the OFED RC5 release was imminent, so I decided to wait to > install it before pursuing this further. Of course, when I did, the problem > mysteriously went away. The ib_mthca module now initializes correctly on > both EM64T machines. I noticed some discussion between you and Roland about > making the parameter "fw_cmd_doorbell=0" the default. Did this occur in RC5? Yes, we changed fw_cmd_doorbell to 0 by default for now because it seemed safer. I expect if you load mthca with fw_cmd_doorbell=1 you still get an error, isn't that right? > > > > Could you please give more detail on the exact system that had/has > > this problem? Model, chipset revision, full lspci -v output, etc. > > -- > > MST > > In case the problem comes back again with RC6, below is some information on the machine that had the problem.
> > -Don Albert- > > > MODEL x86_64 [type=x86_64] > CPU 4 x Intel(R) Xeon(TM) CPU 3.00GHz, 64 bits 2992.628 Mhz > MEM 2055516 kB real memory > FIRM e820 > OS Red Hat Enterprise Linux AS release 4 (Nahant Update 3) - kernel 2.6.16 > > [jatoba] (ib) ib> /sbin/lspci -v > pcilib: Resource 2 in /sys/bus/pci/devices/0000:03:00.0/resource has a 64-bit address, ignoring > 00:00.0 Host bridge: Intel Corporation E7525 Memory Controller Hub (rev 0c) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: bus master, fast devsel, latency 0 > Capabilities: [40] Vendor Specific Information > > 00:00.1 Class ff00: Intel Corporation E7525/E7520 Error Reporting Registers (rev 0c) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: fast devsel > > 00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0c) (prog-if 00 [Normal decode]) > Flags: bus master, fast devsel, latency 0 > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 > Capabilities: [50] Power Management version 2 > Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- > Capabilities: [64] Express Root Port (Slot-) IRQ 0 > Capabilities: [100] Advanced Error Reporting > > 00:03.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A1 (rev 0c) (prog-if 00 [Normal decode]) > Flags: bus master, fast devsel, latency 0 > Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 > Capabilities: [50] Power Management version 2 > Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- > Capabilities: [64] Express Root Port (Slot-) IRQ 0 > Capabilities: [100] Advanced Error Reporting > > 00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c) (prog-if 00 [Normal decode]) > Flags: bus master, fast devsel, latency 0 > Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 > Memory behind bridge: ded00000-dedfffff > Prefetchable memory behind bridge: 00000000ff800000-00000000fff00000 > Capabilities: [50] 
Power Management version 2 > Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable- > Capabilities: [64] Express Root Port (Slot+) IRQ 0 > Capabilities: [100] Advanced Error Reporting > > 00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02) (prog-if 00 [Normal decode]) > Flags: bus master, 66Mhz, fast devsel, latency 64 > Bus: primary=00, secondary=04, subordinate=04, sec-latency=48 > Memory behind bridge: dee00000-deefffff > Capabilities: [50] PCI-X bridge device. > > 00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller (rev 02) (prog-if 00 [UHCI]) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: bus master, medium devsel, latency 0, IRQ 169 > I/O ports at d880 [size=32] > > 00:1d.1 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller (rev 02) (prog-if 00 [UHCI]) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: bus master, medium devsel, latency 0, IRQ 201 > I/O ports at dc00 [size=32] > > 00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: medium devsel > Memory at decff800 (32-bit, non-prefetchable) [size=16] > > 00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt Controller (rev 02) (prog-if 20 [IO(X)-APIC]) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: bus master, fast devsel, latency 0 > Capabilities: [50] PCI-X non-bridge device. 
> > 00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host Controller (rev 02) (prog-if 20 [EHCI]) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: bus master, medium devsel, latency 0, IRQ 193 > Memory at decffc00 (32-bit, non-prefetchable) [size=1K] > Capabilities: [50] Power Management version 2 > Capabilities: [58] Debug port > > 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a) (prog-if 00 [Normal decode]) > Flags: bus master, fast devsel, latency 0 > Bus: primary=00, secondary=05, subordinate=05, sec-latency=32 > I/O behind bridge: 0000e000-0000efff > Memory behind bridge: def00000-dfffffff > Prefetchable memory behind bridge: 88000000-880fffff > > 00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02) > Flags: bus master, medium devsel, latency 0 > > 00:1f.1 IDE interface: Intel Corporation 6300ESB PATA Storage Controller (rev 02) (prog-if 8a [Master SecP PriP]) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: bus master, medium devsel, latency 0, IRQ 185 > I/O ports at > I/O ports at > I/O ports at > I/O ports at > I/O ports at fc00 [size=16] > Memory at 88100000 (32-bit, non-prefetchable) [size=1K] > > 00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO]) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 185 > I/O ports at d800 [size=8] > I/O ports at d480 [size=4] > I/O ports at d400 [size=8] > I/O ports at d080 [size=4] > I/O ports at d000 [size=16] > > 00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: medium devsel, IRQ 11 > I/O ports at 0400 [size=32] > > 03:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20) > Subsystem: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] > Flags: bus master, fast devsel, latency 0, IRQ 169 > Memory at 
ded00000 (64-bit, non-prefetchable) [size=1M] > Memory at (64-bit, prefetchable) > Capabilities: [40] Power Management version 2 > Capabilities: [48] Vital Product Data > Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable- > Capabilities: [84] MSI-X: Enable- Mask- TabSize=32 > Capabilities: [60] Express Endpoint IRQ 0 > > 04:02.0 Ethernet controller: Alteon Networks Inc. AceNIC Gigabit Ethernet (rev 01) > Subsystem: IBM Gigabit Ethernet-SX PCI Adapter > Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 177 > Memory at deefc000 (32-bit, non-prefetchable) [size=16K] > > 05:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA]) > Subsystem: Intel Corporation: Unknown device 3444 > Flags: bus master, stepping, medium devsel, latency 64, IRQ 11 > Memory at df000000 (32-bit, non-prefetchable) [size=16M] > I/O ports at e800 [size=256] > Memory at defff000 (32-bit, non-prefetchable) [size=4K] > Expansion ROM at 88000000 [disabled] [size=128K] > Capabilities: [5c] Power Management version 2 > > -- MST
From ogerlitz at voltaire.com Sun May 28 04:08:10 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 28 May 2006 14:08:10 +0300 (IDT) Subject: [openib-general] IB/iser fix for building with DEBUG_SCSI defined Message-ID: fixed compile error when DEBUG_SCSI is defined in include/scsi/libiscsi.h, use iser_ctask->rdma_data_count instead of ctask->rdma_data_count Signed-off-by: Or Gerlitz diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 4f724a3..e1d717d 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -148,10 +148,10 @@ iscsi_iser_cmd_init(struct iscsi_cmd_tas ctask->imm_count - ctask->unsol_count; - debug_scsi("cmd [itt %x total %d imm %d imm_data %d " + debug_scsi("cmd [itt %x total %d imm %d unsol_data %d " "rdma_data %d]\n", ctask->itt, ctask->total_length, ctask->imm_count, - ctask->unsol_count, ctask->rdma_data_count); + ctask->unsol_count, iser_ctask->rdma_data_count); } else /* bytes to be sent via RDMA operations */ iser_ctask->rdma_data_count = ctask->total_length;
From eli at mellanox.co.il Sun May 28 04:34:56 2006 From: eli at mellanox.co.il (Eli Cohen) Date: Sun, 28 May 2006 14:34:56 +0300 Subject: [openib-general] [PATCH] ipoib Message-ID: <1148816096.8812.2.camel@mtls03.yok.mtl.com> When ipoib_stop is called it first calls netif_stop_queue() to stop the kernel from passing more packets to the network driver. After that ipoib_ib_dev_stop() moves the QP to error causing all pending work requests to complete with error but then the completion handler may call netif_wake_queue() re-enabling packet transfer. This patch makes sure to enable calling netif_wake_queue() only if IPOIB_FLAG_ADMIN_UP is set. Signed-off-by: Eli Cohen Signed-off-by: Michael Tsirkin Index: linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-kernel.orig/infiniband/ulp/ipoib/ipoib_ib.c 2006-05-28 12:53:19.661381000 +0300 +++ linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c 2006-05-28 13:00:02.512096000 +0300 @@ -269,6 +269,7 @@ spin_lock_irqsave(&priv->tx_lock, flags); ++priv->tx_tail; if (netif_queue_stopped(dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) netif_wake_queue(dev); spin_unlock_irqrestore(&priv->tx_lock, flags); From ogerlitz at voltaire.com Sun May 28 05:02:19 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 28 May 2006 15:02:19 +0300 (IDT) Subject: [openib-general] [PATCHv2 1/2] mthca support for max_map_per_fmr device attribute In-Reply-To: References: Message-ID: OK, here's a modified version of the patch to support both mem full/free HCAs, in the memfree case the code relies on this quote from the (section 4.2 pp 45) memfree PRM: "A memory key is composed of two fields, a 24-bit index and an 8-bit key.
The key field is an arbitrarily chosen number. The index field is unique number used as an index to an MPT table entry, ..." Or. implement max_map_per_fmr device attribute for mthca Signed-off-by: Or Gerlitz Index: hw/mthca/mthca_provider.c =================================================================== --- hw/mthca/mthca_provider.c (revision 7031) +++ hw/mthca/mthca_provider.c (working copy) @@ -116,6 +116,15 @@ static int mthca_query_device(struct ib_ props->max_total_mcast_qp_attach = props->max_mcast_qp_attach * props->max_mcast_grp; + /* on memfull HCA an FMR can be remapped 2^B - 1 times where B < 32 is + * the number of bits which are not used for MPT addressing, on memfree + * HCA B=8 so an FMR can be remapped 255 times. + */ + if(!mthca_is_memfree(mdev)) + props->max_map_per_fmr = (1 << (32 - + long_log2(mdev->limits.num_mpts))) - 1; + else + props->max_map_per_fmr = (1 << 8) - 1; err = 0; out: kfree(in_mad); From ogerlitz at voltaire.com Sun May 28 05:03:24 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 28 May 2006 15:03:24 +0300 (IDT) Subject: [openib-general] [PATCHv2 2/2] port the fmr pool to use the max_map_per_fmr device attribute In-Reply-To: References: Message-ID: fixed the patch to allocate the device attributes structure dynamically Or. port the generic fmr pool to query the ib device and use the device attribute for the max number of fmr remaps. If the device does not support the attribute, the code reverts to use the IB_FMR_MAX_REMAPS (32) default.
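The arithmetic behind these two patches can be sketched in standalone C. This is an illustration under assumptions rather than the kernel code: ilog2_u32() is a stand-in for the kernel's long_log2(), and max_map_per_fmr() and pool_max_remaps() are made-up names that mirror the logic of the mthca and fmr_pool changes.

```c
#include <stdint.h>

/* Mirrors the IB_FMR_MAX_REMAPS default (32) mentioned in the patch. */
#define DEFAULT_FMR_MAX_REMAPS 32

/* floor(log2(n)) for n > 0; stand-in for the kernel's long_log2(). */
static unsigned ilog2_u32(uint32_t n)
{
    unsigned r = 0;

    while (n >>= 1)
        r++;
    return r;
}

/*
 * Remap limit as described in patch 1/2: 2^B - 1, where B is the number
 * of key bits not used for MPT addressing. On memfree HCAs B is fixed
 * at 8; on memfull HCAs B = 32 - log2(num_mpts).
 */
uint32_t max_map_per_fmr(int is_memfree, uint32_t num_mpts)
{
    if (is_memfree)
        return (1u << 8) - 1;
    /* 64-bit shift so that a very small num_mpts cannot overflow */
    return (uint32_t)((1ull << (32 - ilog2_u32(num_mpts))) - 1);
}

/*
 * Pool-side choice as described in patch 2/2: use the device attribute
 * when the driver sets it, otherwise fall back to the old constant.
 */
uint32_t pool_max_remaps(uint32_t attr_max_map_per_fmr)
{
    return attr_max_map_per_fmr ? attr_max_map_per_fmr
                                : DEFAULT_FMR_MAX_REMAPS;
}
```

For example, a memfull HCA with 2^17 MPT entries leaves 15 free bits, so the limit would be 2^15 - 1 = 32767 remaps, far above the old fixed default of 32.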
Signed-off-by: Or Gerlitz Index: core/fmr_pool.c =================================================================== --- core/fmr_pool.c (revision 7031) +++ core/fmr_pool.c (working copy) @@ -54,7 +54,7 @@ enum { /* * If an FMR is not in use, then the list member will point to either * its pool's free_list (if the FMR can be mapped again; that is, - * remap_count < IB_FMR_MAX_REMAPS) or its pool's dirty_list (if the + * remap_count < device_attr.max_map_per_fmr) or its pool's dirty_list (if the * FMR needs to be unmapped before being remapped). In either of * these cases it is a bug if the ref_count is not 0. In other words, * if ref_count is > 0, then the list member must not be linked into @@ -84,6 +84,7 @@ struct ib_fmr_pool { int pool_size; int max_pages; + int max_remaps; int dirty_watermark; int dirty_len; struct list_head free_list; @@ -214,8 +215,10 @@ struct ib_fmr_pool *ib_create_fmr_pool(s { struct ib_device *device; struct ib_fmr_pool *pool; + struct ib_device_attr *attr; int i; int ret; + int max_remaps; if (!params) return ERR_PTR(-EINVAL); @@ -228,6 +231,25 @@ struct ib_fmr_pool *ib_create_fmr_pool(s return ERR_PTR(-ENOSYS); } + attr = kmalloc(sizeof *attr, GFP_KERNEL); + if (!attr) { + printk(KERN_WARNING "couldn't allocate device attr struct"); + return ERR_PTR(-ENOMEM); + } + ret = ib_query_device(device, attr); + if (ret) { + printk(KERN_WARNING "couldn't query device"); + kfree(attr); + return ERR_PTR(ret); + } + /* use the default max remaps for drivers not setting the attribute */ + if (!attr->max_map_per_fmr) + max_remaps = IB_FMR_MAX_REMAPS; + else + max_remaps = attr->max_map_per_fmr; + + kfree(attr); + pool = kmalloc(sizeof *pool, GFP_KERNEL); if (!pool) { printk(KERN_WARNING "couldn't allocate pool struct"); @@ -258,6 +280,7 @@ struct ib_fmr_pool *ib_create_fmr_pool(s pool->pool_size = 0; pool->max_pages = params->max_pages_per_fmr; + pool->max_remaps = max_remaps; pool->dirty_watermark = params->dirty_watermark; pool->dirty_len = 0; 
spin_lock_init(&pool->pool_lock); @@ -279,7 +302,7 @@ struct ib_fmr_pool *ib_create_fmr_pool(s struct ib_pool_fmr *fmr; struct ib_fmr_attr attr = { .max_pages = params->max_pages_per_fmr, - .max_maps = IB_FMR_MAX_REMAPS, + .max_maps = pool->max_remaps, .page_shift = params->page_shift }; @@ -489,7 +512,7 @@ int ib_fmr_pool_unmap(struct ib_pool_fmr --fmr->ref_count; if (!fmr->ref_count) { - if (fmr->remap_count < IB_FMR_MAX_REMAPS) { + if (fmr->remap_count < pool->max_remaps) { list_add_tail(&fmr->list, &pool->free_list); } else { list_add_tail(&fmr->list, &pool->dirty_list); From eli at mellanox.co.il Sun May 28 05:47:28 2006 From: eli at mellanox.co.il (Eli Cohen) Date: Sun, 28 May 2006 15:47:28 +0300 Subject: [openib-general] [PATCH] ipoib Message-ID: <200605281547.29313.eli@mellanox.co.il> Sorry about the badly formatted previous email - here it is again... When ipoib_stop is called it first calls netif_stop_queue() to stop the kernel from passing more packets to the network driver. After that ipoib_ib_dev_stop() moves the QP to error causing all pending work requests to complete with error but then the completion handler may call netif_wake_queue() re-enabling packet transfer. This patch makes sure to enable calling netif_wake_queue() only if IPOIB_FLAG_ADMIN_UP is set. 
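The guarded wake-up condition can be modelled outside the kernel as a small predicate. This is a simplified sketch, not the ipoib driver: struct tx_state and should_wake_queue() are hypothetical names, and the state is taken as it stands after tx_tail has been incremented for the completed send.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the fields the completion handler looks at. */
struct tx_state {
    unsigned tx_head;      /* next send slot to be posted            */
    unsigned tx_tail;      /* next send slot expected to complete    */
    unsigned sendq_size;   /* size of the send queue                 */
    bool queue_stopped;    /* models netif_queue_stopped(dev)        */
    bool admin_up;         /* models the IPOIB_FLAG_ADMIN_UP bit     */
};

/*
 * Wake the queue only if it is stopped, the interface is still
 * administratively up, and at most half of the send queue is in use.
 * Unsigned subtraction keeps the distance correct across wraparound.
 */
bool should_wake_queue(const struct tx_state *s)
{
    return s->queue_stopped &&
           s->admin_up &&
           (s->tx_head - s->tx_tail) <= (s->sendq_size >> 1);
}
```

Without the admin_up term, the error completions generated while the QP is being flushed during ipoib_stop would satisfy the other two conditions and restart the queue, which is the race the patch description refers to.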
Signed-off-by: Eli Cohen Signed-off-by: Michael Tsirkin Index: linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-kernel.orig/infiniband/ulp/ipoib/ipoib_ib.c 2006-05-28 12:53:19.661381000 +0300 +++ linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c 2006-05-28 13:00:02.512096000 +0300 @@ -269,6 +269,7 @@ spin_lock_irqsave(&priv->tx_lock, flags); ++priv->tx_tail; if (netif_queue_stopped(dev) && + test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1) netif_wake_queue(dev); spin_unlock_irqrestore(&priv->tx_lock, flags); From mst at mellanox.co.il Sun May 28 06:15:27 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 28 May 2006 16:15:27 +0300 Subject: [openib-general] ia64: kernel unaligned access in ipoib Message-ID: <20060528131527.GX21266@mellanox.co.il> Hi, Roland! When running ipoib on an ia64 machine, I sometimes see kernel unaligned access to 0xe0000002ffb53f0c, ip=0xa00000020041b450 kernel unaligned access to 0xe0000002ffb53f14, ip=0xa00000020041b470 where 0xa00000020041b450 and 0xa00000020041b470 appear to be inside ipoib path_rec_create function. -- MST From ishai at mellanox.co.il Sun May 28 07:27:12 2006 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Sun, 28 May 2006 17:27:12 +0300 Subject: [openib-general] [PATCH] ibsrpdm - allocate agent Message-ID: <20060528142712.GB14812@mellanox.co.il> The array agent is allocated with no entries. When agent is being accessed in create_agent there is a memory corruption. 
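As a minimal illustration of why the array size matters, the sketch below fills a caller-provided array the way an agent-registration helper might. fill_agents() is a hypothetical stand-in for create_agent(), and the assumption that two entries are written is taken only from the agent[2] size used in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_AGENTS 2  /* matches the uint32_t agent[2] used in the patch */

/*
 * Hypothetical stand-in for create_agent(): stores one id per slot.
 * The caller must provide storage for at least n entries.
 */
size_t fill_agents(uint32_t *agent, size_t n)
{
    for (size_t i = 0; i < n; i++)
        agent[i] = (uint32_t)(100 + i);  /* made-up agent ids */
    return n;
}
```

Declared as uint32_t agent[NUM_AGENTS], the caller's array has room for both writes. Declared as uint32_t agent[0] (a GNU zero-length array), the object has no storage at all, so the same writes land on whatever happens to follow it on the stack.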
Signed-off-by: Ishai Rabinovitz Index: last_stable/src/userspace/srptools/src/srp-dm.c =================================================================== --- last_stable.orig/src/userspace/srptools/src/srp-dm.c 2006-05-28 14:24:25.000000000 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-05-28 14:55:41.000000000 +0300 @@ -536,7 +536,7 @@ static int get_port_list(int fd, uint32_ int main(int argc, char *argv[]) { int fd; - uint32_t agent[0]; + uint32_t agent[2]; char *cmd_name = strdup(argv[0]); while (1) { -- Ishai Rabinovitz From genevievesantos at seeq.com Sun May 28 06:50:28 2006 From: genevievesantos at seeq.com (augie cross) Date: Sun, 28 May 2006 09:50:28 -0400 Subject: [openib-general] Need cash, Mongol-galchic alphabet Message-ID: <00b401c6825c$cf1ca180$f03e33aa@qisvgrk> How much are you paying for your Home? To much? You have been pre-approved to fill out for a ref inance laon, if you need some cash to spend ANY way you like, or simply wish to LOWER your monthly payments by a third or more, etc. We skip the middle man to save hundreds with deals we have! This offer is for you, we DONT CARE about your credit. Apply online now for your instant quote. Stop over paying... http://onfot.org/d2/ sea letter pocket lighter catchfly grass Java almond slight-esteemed stamp booklet cathedral church bandy leg cotton seed hand plow saddle hand Pro-freudian ground line vegetable orange ginger ale regent bird well-wired pulley lathe vocal fremitus relative-in-law shuffle scale chocolate molder ten-grain meadow fescue fog whistle tomato hamper shore cover well-dissembled Pro-californian simple-stemmed sugar pine -------------- next part -------------- An HTML attachment was scrubbed... 
From tziporet at mellanox.co.il Mon May 29 04:07:26 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 29 May 2006 14:07:26 +0300
Subject: [openib-general] problems with git
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA70A9@mtlexch01.mtl.com>

Hi Roland,

I got errors trying to clone from git:

$> git clone git://www.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git infiniband
defaulting to local storage area
warning: templates not found /home/dag/share/git-core/templates/
fatal: unable to connect a socket (Connection timed out)

Do you have any ideas or instructions to do it differently?
Thanks

Tziporet Koren
Software Director
Mellanox Technologies
mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380

From rdreier at cisco.com Mon May 29 05:30:08 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 05:30:08 -0700
Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib
References: <20060528131527.GX21266@mellanox.co.il>
Message-ID:

> When running ipoib on an ia64 machine, I sometimes see
> kernel unaligned access to 0xe0000002ffb53f0c, ip=0xa00000020041b450
> kernel unaligned access to 0xe0000002ffb53f14, ip=0xa00000020041b470
>
> where 0xa00000020041b450 and 0xa00000020041b470 appear to be
> inside ipoib path_rec_create function.

I don't see any obvious misaligned accesses in path_rec_create(). And
unfortunately I don't have any IA64 machines handy to track this down.
So you're on your own on this one...

 - R.

From halr at voltaire.com Mon May 29 05:33:09 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 29 May 2006 08:33:09 -0400
Subject: [openib-general] Failed multicast join with new multicast module
In-Reply-To:
References:
Message-ID: <1148905985.4358.78023.camel@hal.voltaire.com>

On Sat, 2006-05-27 at 12:44, Sean Hefty wrote:
> >I forget exactly what the strategy for this was before the multicast
> >module was introduced: whether it was exponential backoff up to some
> >limit, or whether it was linear up to some retry count.
>
> >Also, in looking at the new multicast code, I see the following:
> >
> >static int retry_timer = 5000; /* 5 sec */
> >module_param(retry_timer, int, 0444);
> >MODULE_PARM_DESC(retry_timer, "Time in ms between retried requests.");
> >
> >static int retries = 3;
> >module_param(retries, int, 0444);
> >MODULE_PARM_DESC(retries, "Number of times to retry a request.");
> >
> >so it appears that the multicast module has it's own retry strategy. Is
> >that true ? If so, does this interact with IPoIB's for rerequesting or
> >has that changed ?
>
> The multicast module uses its own retry strategy, basically just passing the
> request down to the MAD layer. It should fail the join request to the user if
> the retries are exceeded. I should have a userspace multicast test module by
> the end of this coming week which will let me stress the multicast code more.
>
> Ipoib uses its own retry strategy, and I believe re-issues the request. Ipoib
> uses an exponential backoff strategy, so it sounds like there's an issue with
> the ipoib changes. Looking at the code, I need to understand how send-only
> joins are retried.

Send-only joins is another case. These are full member joins (JoinState 1)
to groups which are not yet created so they fail.

-- Hal

> - Sean

From ogerlitz at voltaire.com Mon May 29 05:51:16 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 29 May 2006 15:51:16 +0300
Subject: [openib-general] [PATCHv2 1/2] mthca support for max_map_per_fmr device attribute
In-Reply-To:
References:
Message-ID: <447AEE44.1020703@voltaire.com>

Or Gerlitz wrote:
> OK, here's a modified version of the patch to support both mem full/free
> HCAs, in the memfree case the code relies on this quote from the
> (section 4.2 pp 45) memfree PRM:
>
> "A memory key is composed of two fields, a 24-bit index and an 8-bit
> key. The key field is an arbitrarily chosen number. The index field is
> unique number used as an index to an MPT table entry, ..."
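
For readers following the PRM quote, here is a minimal sketch of how a 32-bit memory key could be split into a 24-bit index and an 8-bit key. The field order (index in the upper bits) and every name below are assumptions made for illustration; the quote does not say which bits hold which field, and this is not taken from the mthca driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only: index in the upper 24 bits,
 * arbitrary key in the low 8 bits. */
#define MKEY_KEY_BITS 8u
#define MKEY_KEY_MASK 0xffu

/* Build a 32-bit memory key from its two fields (hypothetical helper). */
static uint32_t mkey_compose(uint32_t index, uint8_t key)
{
	assert(index < (1u << 24));	/* the index must fit in 24 bits */
	return (index << MKEY_KEY_BITS) | key;
}

/* Extract the 24-bit index, i.e. the MPT table slot in the quote's terms. */
static uint32_t mkey_index(uint32_t mkey)
{
	return mkey >> MKEY_KEY_BITS;
}

/* Extract the 8-bit arbitrary key. */
static uint8_t mkey_key(uint32_t mkey)
{
	return (uint8_t)(mkey & MKEY_KEY_MASK);
}

/* Change only the 8-bit key and keep the index; the key wraps to 0
 * after 0xff, so only 256 distinct values exist per index. */
static uint32_t mkey_next(uint32_t mkey)
{
	uint8_t next = (uint8_t)(mkey_key(mkey) + 1u);

	return mkey_compose(mkey_index(mkey), next);
}
```

Under this assumed layout, only the 8-bit field can vary between mappings that reuse the same index, which is the kind of bound a per-FMR remap limit would have to respect.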
Just forgot to add that: i have tested this 2 element patch series with
iSER doing FMR mapping via the FMR pool on top of svn 7031 over both
memfull (Tavor) and memfree (Sinai).

Or.

From halr at voltaire.com Mon May 29 06:02:45 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 29 May 2006 09:02:45 -0400
Subject: [openib-general] [PATCH] OpenSM/osm_lid_mgr.c::__osm_lid_mgr_set_remote_pi_state_to_init: Handle NULL p_rem_physp
Message-ID: <1148907763.4358.78699.camel@hal.voltaire.com>

OpenSM/osm_lid_mgr.c::__osm_lid_mgr_set_remote_pi_state_to_init:
Handle NULL p_rem_physp

In osm_lid_mgr.c::__osm_lid_mgr_set_remote_pi_state_to_init,
p_rem_physp can validly be NULL when the remote SMA is not responding
(but physical link is up). This has been observed by Don Albert on
OFED 1.0 RC5.

Signed-off-by: Hal Rosenstock

Index: opensm/osm_lid_mgr.c
===================================================================
--- opensm/osm_lid_mgr.c (revision 7535)
+++ opensm/osm_lid_mgr.c (working copy)
@@ -931,7 +931,8 @@ __osm_lid_mgr_set_remote_pi_state_to_ini
   ib_port_info_t *p_pi;
   osm_physp_t *p_rem_physp = osm_physp_get_remote(p_physp);
 
-  CL_ASSERT(p_rem_physp);
+  if ( p_rem_physp == NULL )
+    return;
 
   if (osm_physp_is_valid( p_rem_physp ))
   {

From mst at mellanox.co.il Mon May 29 06:22:13 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 29 May 2006 16:22:13 +0300
Subject: [openib-general] Re: [PATCH] ipoib
In-Reply-To: <200605281547.29313.eli@mellanox.co.il>
References: <200605281547.29313.eli@mellanox.co.il>
Message-ID: <20060529132213.GI21266@mellanox.co.il>

Quoting r. Eli Cohen :
> When ipoib_stop is called it first calls netif_stop_queue() to
> stop the kernel from passing more packets to the network driver.
> After that ipoib_ib_dev_stop() moves the QP to error causing all
> pending work requests to complete with error but then the completion
> handler may call netif_wake_queue() re-enabling packet transfer.
BTW, Roland, this looks like a 2.6.17 material, doesn't it?
It's causing real failures for us.

-- 
MST
From mst at mellanox.co.il Mon May 29 08:15:47 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 29 May 2006 18:15:47 +0300
Subject: [openib-general] [PATCH] ipoib: fix ah leak at interface down
In-Reply-To: <200605281547.29313.eli@mellanox.co.il>
References: <200605281547.29313.eli@mellanox.co.il>
Message-ID: <20060529151547.GO21266@mellanox.co.il>

Ugh, it seems Eli's client has wrapped some lines again.
Here it is in a format that actually can be applied, and a slightly more
detailed description of the problem it fixes for us.
If this makes sense, please push into 2.6.17.

---

When ipoib_stop is called it first calls netif_stop_queue() to stop the
kernel from passing more packets to the network driver.
However, the completion handler may call netif_wake_queue() re-enabling packet transfer.

This might result in leaks (we see ah leaks which we think can be attributed to this bug) as new packets get posted while the interface is going down.

Signed-off-by: Eli Cohen
Signed-off-by: Michael Tsirkin

Index: linux-2.6.16/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- linux-2.6.16.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2006-05-28 10:38:01.000000000 +0300
+++ linux-2.6.16/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2006-05-29 17:28:12.000000000 +0300
@@ -269,6 +269,7 @@ static void ipoib_ib_handle_wc(struct ne
 		spin_lock_irqsave(&priv->tx_lock, flags);
 		++priv->tx_tail;
 		if (netif_queue_stopped(dev) &&
+		    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) &&
 		    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1)
 			netif_wake_queue(dev);
 		spin_unlock_irqrestore(&priv->tx_lock, flags);

-- 
MST
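For readers following the ah-leak patch above, the wake-up condition can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct tx_state` and `should_wake_queue()` are hypothetical names invented here, not driver symbols, and the booleans stand in for netif_queue_stopped() and the IPOIB_FLAG_ADMIN_UP bit. It shows that the queue is only restarted while the interface is administratively up, and that the head/tail difference stays correct across counter wraparound because the arithmetic is unsigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the fields the completion handler reads. */
struct tx_state {
	bool queue_stopped;	/* stands in for netif_queue_stopped(dev) */
	bool admin_up;		/* stands in for the IPOIB_FLAG_ADMIN_UP bit */
	uint32_t tx_head;	/* count of sends posted so far */
	uint32_t tx_tail;	/* count of sends completed so far */
};

/*
 * Decide whether a send completion may restart the transmit queue.
 * The admin_up test is the condition the patch adds: once the interface
 * is going down, completions must no longer re-enable the queue.
 */
static bool should_wake_queue(const struct tx_state *s, uint32_t sendq_size)
{
	assert(s != NULL);
	return s->queue_stopped &&
	       s->admin_up &&
	       s->tx_head - s->tx_tail <= sendq_size >> 1;
}
```

With the flag cleared before the queue is stopped, a late completion sees `admin_up == false` and leaves the queue alone, which is the behaviour the patch description asks for.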
From mst at mellanox.co.il Mon May 29 08:53:50 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 29 May 2006 18:53:50 +0300
Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib
In-Reply-To:
References: <20060528131527.GX21266@mellanox.co.il>
Message-ID: <20060529155350.GR21266@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: ia64: kernel unaligned access in ipoib
>
> > When running ipoib on an ia64 machine, I sometimes see
> > kernel unaligned access to 0xe0000002ffb53f0c, ip=0xa00000020041b450
> > kernel unaligned access to 0xe0000002ffb53f14, ip=0xa00000020041b470
> >
> > where 0xa00000020041b450 and 0xa00000020041b470 appear to be
> > inside ipoib path_rec_create function.
>
> I don't see any obvious misaligned accesses in path_rec_create(). And
> unfortunately I don't have any IA64 machines handy to track this down.
> So you're on your own on this one...

It seems Jack has found it (see below).
Since you cast the pointer to union ib_gid type and then pass that pointer to memcpy, the compiler assumes the address is naturally aligned and replaces memcpy with inline st8/ld8 instructions, which can't operate on misaligned addresses.

Here's a list of places that cast a misaligned address to union ib_gid:

~>grep -n -e '->ha + 4' *c
ipoib_main.c:507: path = __path_find(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4));
ipoib_main.c:510: (union ib_gid *) (skb->dst->neighbour->ha + 4));
ipoib_main.c:560: ipoib_mcast_send(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4), skb);
ipoib_main.c:776: IPOIB_GID_ARG(*((union ib_gid *) (n->ha + 4))));

So, I think the fix will be
1. pass gid inside a void * without cast so that the compiler does not assume it's aligned
2. fix IPOIB_GID_FMT/IPOIB_GID_ARG to read the gid byte by byte and not by 16-bit chunks

-----------
From: "Jack Morgenstein"
Subject: Possible IA64 unaligned access?

file: ipoib_main.c, proc neigh_add_path:

Parameters to path_rec_create might be only 4-byte aligned!

	if (!path) {
		path = path_rec_create(dev,
				       (union ib_gid *) (skb->dst->neighbour->ha + 4));
		if (!path)
			goto err_path;

		__path_add(dev, path);
	}

-- 
MST

From rdreier at cisco.com Mon May 29 09:11:03 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 09:11:03 -0700
Subject: [openib-general] Re: problems with git
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA70A9@mtlexch01.mtl.com> (Tziporet Koren's message of "Mon, 29 May 2006 14:07:26 +0300")
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA70A9@mtlexch01.mtl.com>
Message-ID:

> git clone git://www.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git infiniband
> warning: templates not found /home/dag/share/git-core/templates/

Looks like you have a slight issue with your git installation...

> fatal: unable to connect a socket (Connection timed out)

And you are not able to connect to the git server.
In your command you are using a URL like git://www.kernel.org/..., which I'm not sure will work -- www.kernel.org will likely point to a local mirror, which is likely not running a native git server.
So you can try two things:

1) replace www.kernel.org with git.kernel.org -- this is the best thing to do, as git native protocol will work better than http.

2) If your firewall is blocking git:// URLs, then try replacing the git:// with http:// (but leave the server as www.kernel.org).

 - R.

From rdreier at cisco.com Mon May 29 09:12:02 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 09:12:02 -0700
Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib
In-Reply-To: <20060529155350.GR21266@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 29 May 2006 18:53:50 +0300")
References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il>
Message-ID:

    Michael> It seems Jack has found it (see below). Since you cast
    Michael> the pointer to union ib_gid type and then pass that
    Michael> pointer to memcpy, the compiler assumes the address is
    Michael> naturally aligned and replaces memcpy with inline
    Michael> st8/ld8 instructions, which can't operate on misaligned
    Michael> addresses.

Hmm, that's strange behavior (I would expect memcpy to work regardless of alignment), but I guess it's technically a valid optimization.

I'll post a patch tomorrow when I'm back at work (today is Memorial Day in the US).

 - R.

From mst at mellanox.co.il Mon May 29 09:14:05 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 29 May 2006 19:14:05 +0300
Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib
In-Reply-To: <20060529155350.GR21266@mellanox.co.il>
References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il>
Message-ID: <20060529161405.GT21266@mellanox.co.il>

Quoting r. Michael S. Tsirkin :
> Here's a list of places that cast a misaligned address to union ib_gid:
>
> ~>grep -n -e '->ha + 4' *c
>
> ipoib_main.c:507: path = __path_find(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4));
> ipoib_main.c:510: (union ib_gid *) (skb->dst->neighbour->ha + 4));
> ipoib_main.c:560: ipoib_mcast_send(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4), skb);
> ipoib_main.c:776: IPOIB_GID_ARG(*((union ib_gid *) (n->ha + 4))));
>
> So, I think the fix will be
> 1. pass gid inside a void * without cast so that the compiler does not assume
> it's aligned
> 2. fix IPOIB_GID_FMT/IPOIB_GID_ARG to read the gid byte by byte and not
> by 16-bit chunks

We will be testing the following patch and will let you know later.
It's a bit big, but the change itself is trivial.

---

Fix misaligned access faults on ia64: never cast a misaligned ha + 4 pointer to union ib_gid type; pass a void * pointer instead.
Note that the cast in IPOIB_GID_ARG is safe, since we fixed it to only access each byte separately.

Signed-off-by: Jack Morgenstein
Signed-off-by: Michael S. Tsirkin

Index: openib/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- openib/drivers/infiniband/ulp/ipoib/ipoib_main.c	(revision 7541)
+++ openib/drivers/infiniband/ulp/ipoib/ipoib_main.c	(working copy)
@@ -190,8 +190,7 @@ static int ipoib_change_mtu(struct net_d
 	return 0;
 }
 
-static struct ipoib_path *__path_find(struct net_device *dev,
-				      union ib_gid *gid)
+static struct ipoib_path *__path_find(struct net_device *dev, void *gid)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct rb_node *n = priv->path_tree.rb_node;
@@ -201,7 +200,7 @@ static struct ipoib_path *__path_find(st
 	while (n) {
 		path = rb_entry(n, struct ipoib_path, rb_node);
 
-		ret = memcmp(gid->raw, path->pathrec.dgid.raw,
+		ret = memcmp(gid, path->pathrec.dgid.raw,
 			     sizeof (union ib_gid));
 
 		if (ret < 0)
@@ -430,8 +429,7 @@ static void path_rec_completion(int stat
 	}
 }
 
-static struct ipoib_path *path_rec_create(struct net_device *dev,
-					  union ib_gid *gid)
+static struct ipoib_path *path_rec_create(struct net_device *dev, void *gid)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_path *path;
@@ -446,7 +444,7 @@ static struct ipoib_path *path_rec_creat
 
 	INIT_LIST_HEAD(&path->neigh_list);
 
-	memcpy(path->pathrec.dgid.raw, gid->raw, sizeof (union ib_gid));
+	memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid));
 	path->pathrec.sgid = priv->local_gid;
 	path->pathrec.pkey = cpu_to_be16(priv->pkey);
 	path->pathrec.numb_path = 1;
@@ -504,10 +502,9 @@ static void neigh_add_path(struct sk_buf
 	 */
 	spin_lock(&priv->lock);
 
-	path = __path_find(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4));
+	path = __path_find(dev, skb->dst->neighbour->ha + 4);
 	if (!path) {
-		path = path_rec_create(dev,
-				       (union ib_gid *) (skb->dst->neighbour->ha + 4));
+		path = path_rec_create(dev, skb->dst->neighbour->ha + 4);
 		if (!path)
 			goto err_path;
 
@@ -557,7 +554,7 @@ static void ipoib_path_lookup(struct sk_
 	/* Add in the P_Key for multicasts */
 	skb->dst->neighbour->ha[8] = (priv->pkey >> 8) & 0xff;
 	skb->dst->neighbour->ha[9] = priv->pkey & 0xff;
-	ipoib_mcast_send(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4), skb);
+	ipoib_mcast_send(dev, skb->dst->neighbour->ha + 4, skb);
 }
 
 static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
@@ -572,10 +569,9 @@ static void unicast_arp_send(struct sk_b
 	 */
 	spin_lock(&priv->lock);
 
-	path = __path_find(dev, (union ib_gid *) (phdr->hwaddr + 4));
+	path = __path_find(dev, phdr->hwaddr + 4);
 	if (!path) {
-		path = path_rec_create(dev,
-				       (union ib_gid *) (phdr->hwaddr + 4));
+		path = path_rec_create(dev, phdr->hwaddr + 4);
 		if (path) {
 			/* put pseudoheader back on for next time */
 			skb_push(skb, sizeof *phdr);
@@ -666,7 +662,7 @@ static int ipoib_start_xmit(struct sk_bu
 			phdr->hwaddr[8] = (priv->pkey >> 8) & 0xff;
 			phdr->hwaddr[9] = priv->pkey & 0xff;
 
-			ipoib_mcast_send(dev, (union ib_gid *) (phdr->hwaddr + 4), skb);
+			ipoib_mcast_send(dev, phdr->hwaddr + 4, skb);
 		} else {
 			/* unicast GID -- should be ARP or RARP reply */
 
Index: openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	(revision 7541)
+++ openib/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	(working copy)
@@ -153,7 +153,7 @@ static struct ipoib_mcast *ipoib_mcast_a
 	return mcast;
 }
 
-static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, union ib_gid *mgid)
+static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct rb_node *n = priv->multicast_tree.rb_node;
@@ -164,7 +164,7 @@ static struct ipoib_mcast *__ipoib_mcast
 
 		mcast = rb_entry(n, struct ipoib_mcast, rb_node);
 
-		ret = memcmp(mgid->raw, mcast->mcmember.mgid.raw,
+		ret = memcmp(mgid, mcast->mcmember.mgid.raw,
 			     sizeof (union ib_gid));
 		if (ret < 0)
 			n = n->rb_left;
@@ -639,8 +639,7 @@ static int ipoib_mcast_leave(struct net_
 	return 0;
 }
 
-void ipoib_mcast_send(struct net_device *dev, union ib_gid *mgid,
-		      struct sk_buff *skb)
+void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_mcast *mcast;
@@ -675,7 +674,7 @@ void ipoib_mcast_send(struct net_device
 		}
 
 		set_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags);
-		mcast->mcmember.mgid = *mgid;
+		memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid));
 		__ipoib_mcast_add(dev, mcast);
 		list_add_tail(&mcast->list, &priv->multicast_list);
 	}
Index: openib/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- openib/drivers/infiniband/ulp/ipoib/ipoib.h	(revision 7541)
+++ openib/drivers/infiniband/ulp/ipoib/ipoib.h	(working copy)
@@ -278,8 +278,7 @@ int ipoib_dev_init(struct net_device *de
 void ipoib_dev_cleanup(struct net_device *dev);
 
 void ipoib_mcast_join_task(void *dev_ptr);
-void ipoib_mcast_send(struct net_device *dev, union ib_gid *mgid,
-		      struct sk_buff *skb);
+void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb);
 
 void ipoib_mcast_restart_task(void *dev_ptr);
 int ipoib_mcast_start_thread(struct net_device *dev);
@@ -375,15 +374,24 @@ extern int ipoib_debug_level;
 
 #endif /* CONFIG_INFINIBAND_IPOIB_DEBUG_DATA */
 
-#define IPOIB_GID_FMT "%x:%x:%x:%x:%x:%x:%x:%x"
+#define IPOIB_GID_FMT "%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:" \
+ "%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x"
 
-#define IPOIB_GID_ARG(gid) be16_to_cpup((__be16 *) ((gid).raw + 0)), \
- be16_to_cpup((__be16 *) ((gid).raw + 2)), \
- be16_to_cpup((__be16 *) ((gid).raw + 4)), \
- be16_to_cpup((__be16 *) ((gid).raw + 6)), \
- be16_to_cpup((__be16 *) ((gid).raw + 8)), \
- be16_to_cpup((__be16 *) ((gid).raw + 10)), \
- be16_to_cpup((__be16 *) ((gid).raw + 12)), \
- be16_to_cpup((__be16 *) ((gid).raw + 14))
+#define IPOIB_GID_ARG(gid) (gid).raw[0], \
+ (gid).raw[1], \
+ (gid).raw[2], \
+ (gid).raw[3], \
+ (gid).raw[4], \
+ (gid).raw[5], \
+ (gid).raw[6], \
+ (gid).raw[7], \
+ (gid).raw[8], \
+ (gid).raw[9], \
+ (gid).raw[10],\
+ (gid).raw[11],\
+ (gid).raw[12],\
+ (gid).raw[13],\
+ (gid).raw[14],\
+ (gid).raw[15]
 
 #endif /* _IPOIB_H */

-- 
MST

From rdreier at cisco.com Mon May 29 09:13:04 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 09:13:04 -0700
Subject: [openib-general] Re: [PATCH] ipoib: fix ah leak at interface down
In-Reply-To: <20060529151547.GO21266@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 29 May 2006 18:15:47 +0300")
References: <200605281547.29313.eli@mellanox.co.il> <20060529151547.GO21266@mellanox.co.il>
Message-ID:

    Michael> If this makes sense, please push into 2.6.17.

Yes, looks OK for 2.6.17.
Out of curiosity:

    Michael> This might result in leaks (we see ah leaks which we
    Michael> think can be attributed to this bug) as new packets get
    Michael> posted while the interface is going down.

with this patch applied, do the leaks go away?

 - R.

From mst at mellanox.co.il Mon May 29 09:15:58 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 29 May 2006 19:15:58 +0300
Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib
In-Reply-To:
References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il>
Message-ID: <20060529161558.GU21266@mellanox.co.il>

Quoting r. Roland Dreier :
> I'll post a patch tomorrow when I'm back at work (today is Memorial
> Day in the US).

We've written up a patch with Jack - do you want us to test it or prefer to re-write it yourself?

-- 
MST

From rdreier at cisco.com Mon May 29 09:15:25 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 09:15:25 -0700
Subject: [openib-general] Re: [PATCH] ibsrpdm - allocate agent
In-Reply-To: <20060528142712.GB14812@mellanox.co.il> (Ishai Rabinovitz's message of "Sun, 28 May 2006 17:27:12 +0300")
References: <20060528142712.GB14812@mellanox.co.il>
Message-ID:

Thanks, applied.
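A note on the ia64 unaligned-access patch earlier in this thread: the two ideas in it (hand the GID around as untyped bytes, and format it one byte at a time) can be tried outside the kernel. The following is a minimal userspace sketch, not the driver code. `union gid16`, `copy_gid()` and `format_gid()` are hypothetical names made up for illustration; only the format string is taken from the patch. It assumes the real union has 8-byte natural alignment because of its 64-bit members, which is why the stand-in carries a 64-bit view.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for union ib_gid: the 64-bit view gives the type
 * 8-byte natural alignment, so a typed pointer promises that alignment. */
union gid16 {
	unsigned char raw[16];
	unsigned long long words[2];
};

/*
 * Copy a GID out of a buffer that may be only 4-byte aligned.  src is an
 * untyped pointer, so the compiler has no alignment promise to exploit and
 * has to generate a copy that is safe at any address.
 */
static void copy_gid(union gid16 *dst, const void *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst->raw, src, sizeof dst->raw);
}

/* Format a GID one byte at a time, in the style of the new IPOIB_GID_FMT. */
static int format_gid(char *out, size_t len, const union gid16 *gid)
{
	const unsigned char *b = gid->raw;

	return snprintf(out, len,
			"%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:"
			"%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x",
			b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
			b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

For a 20-byte hardware address `ha` holding the bytes 0 through 19, `copy_gid(&g, ha + 4)` followed by `format_gid()` yields `0405:0607:0809:0a0b:0c0d:0e0f:1011:1213`, with no typed load from the offset address at any point.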
From mst at mellanox.co.il Mon May 29 09:18:13 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 29 May 2006 19:18:13 +0300
Subject: [openib-general] Re: [PATCH] ipoib: fix ah leak at interface down
In-Reply-To:
References: <200605281547.29313.eli@mellanox.co.il> <20060529151547.GO21266@mellanox.co.il>
Message-ID: <20060529161813.GV21266@mellanox.co.il>

Quoting r. Roland Dreier :
> Michael> This might result in leaks (we see ah leaks which we
> Michael> think can be attributed to this bug) as new packets get
> Michael> posted while the interface is going down.
>
> with this patch applied, do the leaks go away?

We don't know yet: it is only seen on 4-way machines, and we haven't yet had a time slot on these to re-test with the patch.

-- 
MST

From rdreier at cisco.com Mon May 29 09:16:55 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 09:16:55 -0700
Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib
In-Reply-To: <20060529161558.GU21266@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 29 May 2006 19:15:58 +0300")
References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il>
Message-ID:

    Michael> We've written up a patch with Jack - do you want us to
    Michael> test it or prefer to re-write it yourself?

Go ahead and test it -- I replied before I saw your patch.

 - R.

From mst at mellanox.co.il Mon May 29 09:19:24 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 29 May 2006 19:19:24 +0300
Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib
In-Reply-To:
References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il>
Message-ID: <20060529161924.GW21266@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: ia64: kernel unaligned access in ipoib
>
> Michael> We've written up a patch with Jack - do you want us to
> Michael> test it or prefer to re-write it yourself?
>
> Go ahead and test it -- I replied before I saw your patch.

OK, hope to let you know tomorrow.

-- 
MST
From mst at mellanox.co.il Mon May 29 09:34:46 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 29 May 2006 19:34:46 +0300
Subject: [openib-general] mthca: alloc->last question
Message-ID: <20060529163446.GX21266@mellanox.co.il>

Hello, Roland!
Looking at mthca_allocator.c, alloc->last field is initialized to 0 and seems to always stay at 0.
What purpose does this field serve?

-- 
MST
From rdreier at cisco.com Mon May 29 09:47:43 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 09:47:43 -0700
Subject: [openib-general] Re: IB/iser fix for building with DEBUG_SCSI defined
In-Reply-To: (Or Gerlitz's message of "Sun, 28 May 2006 14:08:10 +0300 (IDT)")
References:
Message-ID:

I merged this into the iser branch.
BTW the whitespace in the email got really messed up, so I just applied by hand.

 - R.

From rdreier at cisco.com Mon May 29 09:48:23 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 09:48:23 -0700
Subject: [openib-general] Re: [PATCHv2 1/2] mthca support for max_map_per_fmr device attribute
In-Reply-To: (Or Gerlitz's message of "Sun, 28 May 2006 15:02:19 +0300 (IDT)")
References:
Message-ID:

This looks pretty OK.
But your email client is wrapping lines everywhere.
Can you resend so I can apply the patches?

From rdreier at cisco.com Mon May 29 09:53:43 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 09:53:43 -0700
Subject: [openib-general] Re: mthca: alloc->last question
In-Reply-To: <20060529163446.GX21266@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 29 May 2006 19:34:46 +0300")
References: <20060529163446.GX21266@mellanox.co.il>
Message-ID:

    Michael> Hello, Roland! Looking at mthca_allocator.c, alloc->last
    Michael> field is initialized to 0 and seems to always stay at 0.

Look in mthca_free().
last is a hint about where to start looking for a free entry, and the last freed object is a good place to start.

 - R.
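On the alloc->last question in the mthca thread above, the "search hint" idea can be sketched in a few lines of userspace C. This is an illustration only and makes no claim about what mthca_allocator.c actually does: `struct hint_alloc`, `SLOT_COUNT` and the three functions are hypothetical names, and a plain byte array stands in for the driver's bitmap. The sketch follows the description in the reply: the most recently freed slot is remembered, and the next allocation starts searching there.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 64u	/* hypothetical table size */

/* Hypothetical allocator: one byte per slot plus a search hint. */
struct hint_alloc {
	unsigned char used[SLOT_COUNT];	/* 1 if the slot is taken */
	unsigned int last;		/* hint: where the next search starts */
};

static void hint_alloc_init(struct hint_alloc *a)
{
	assert(a != NULL);
	for (unsigned int i = 0; i < SLOT_COUNT; i++)
		a->used[i] = 0;
	a->last = 0;
}

/* Return a free slot index, or -1 if every slot is taken. */
static int hint_alloc_get(struct hint_alloc *a)
{
	assert(a != NULL);
	/* Start at the hint and wrap around, so all slots are still covered. */
	for (unsigned int n = 0; n < SLOT_COUNT; n++) {
		unsigned int i = (a->last + n) % SLOT_COUNT;

		if (!a->used[i]) {
			a->used[i] = 1;
			return (int)i;
		}
	}
	return -1;
}

/* Release a slot and remember it as the next place to look. */
static void hint_alloc_put(struct hint_alloc *a, unsigned int i)
{
	assert(a != NULL && i < SLOT_COUNT && a->used[i]);
	a->used[i] = 0;
	a->last = i;
}
```

The hint is only an optimization: if it is stale or never updated, the search still finds a free slot, it just starts from a less useful position. That matches the observation that a hint stuck at 0 is harmless but pointless.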
From rdreier at cisco.com Mon May 29 10:07:12 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 May 2006 10:07:12 -0700
Subject: [openib-general] Re: [PATCH] SRP : Use correct port identifier format according to target io_class
In-Reply-To: (Ramachandra Kuchimanchi's message of "Fri, 26 May 2006 13:31:35 -0400")
References:
Message-ID:

Overall seems OK. Some comments:

 > + }
 > + else {

should be written just as

	} else {

(this appears quite a few times)

 > { SRP_OPT_MAX_CMD_PER_LUN, "max_cmd_per_lun=%d" },
 > + { SRP_OPT_IO_CLASS, "io_class=%x" },
 > { SRP_OPT_ERR, NULL }

please keep the formatting consistent here.

 > + target->io_class = (unsigned short)(token);

why is the cast needed here?

 > + /*Set default IO class of target to Rev 16A*/
 > + target->io_class = SRP_REV16A_IO_CLASS;

just delete this comment -- anyone who can't figure out what the next line is doing probably won't be able to figure out the comment either.

 > +#define SRP_REV10_IO_CLASS 0xFF00
 > +#define SRP_REV16A_IO_CLASS 0x0100

I think these should be in an enum in , since they're generic constants from the SRP spec.

Can you regenerate the patch and resend?

Thanks,
  Roland
From or.gerlitz at gmail.com Mon May 29 13:15:06 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Mon, 29 May 2006 22:15:06 +0200
Subject: [openib-general] Re: [PATCHv2 1/2] mthca support for max_map_per_fmr device attribute
In-Reply-To:
References:
Message-ID: <15ddcffd0605291315k6408424bk6062b6b8211686cc@mail.gmail.com>

On 5/29/06, Roland Dreier wrote:
> This looks pretty OK. But your email client is wrapping lines
> everywhere. Can you resend so I can apply the patches?

OK, I will check what's going on with my mailer and resend tomorrow

Or.
URL: From tomomi at hushmail.com Mon May 29 22:21:52 2006 From: tomomi at hushmail.com (tomomi at hushmail.com) Date: Mon, 29 May 2006 22:21:52 -0700 (PDT) Subject: [openib-general] =?iso-2022-jp?b?GyRCPXdALSRiJSQlZCVpJTcbKEI=?= =?iso-2022-jp?b?GyRCJSQkcyRHJDkhKhsoQg==?= Message-ID: 20060530130036.86362mail@mail.serebu_woman-server99_soondeai-go-free1919_system08_heart-kiss.tv ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ ■━■━■━■━■━■━■━■━■━■━■━■━■━■━■━■━ ┃                                ■  絶対に逢えます!絶対にH出来ます!絶対にお金が貰えます!  ┃                                ■       だって、ココは本物の…       ┃                                ■        セレブな女性達の集まりですから         ┃        ━━━━━━━━━━━━━━━━        ■━■━■━■━■━■━■━■━■━■━■━■━■━■━■━■━ ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ ┏┓ ┗★ なぜ絶対に逢えると言い切れるか?        ■■ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━ http://perfection.cx/h/ 当サークルは男性様が主役。当社のシステムがその理由です。女性 様の登録条件が以下の通りなのです。 ¶.登録の際、登録金額をお振込み頂き、男性様への謝礼金としての保障と   してもお預かりさせて頂く ¶.直接連絡先の交換は、男性様からのメールが届き次第速やかに行う事。  これは双方の信頼性・安全性を高める上での絶対条件の為 ¶.<男性様が貴方にお会い頂ける>という認識を大切に、ご希望条件(肉   の体関係の求愛・逆援助・送迎)等には快く従う http://perfection.cx/h/ ┏┓ ┗★ 貴方が思っている以上に女性は淫乱なんです…   ■■ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━ http://perfection.cx/h/ 某有名雑誌にはこんな記事が掲載されております。 『一般的に性欲と言う物は<男性の方が高く持ち合わせている>という説  が殆どの方の認識であると思うが、某有名病院○○医師の見解によると  、どうやらそれは違う様である。』  つまり、女性は貴方のその性欲よりも<更に強く>SEXを求めている  のです。これは物理的に考えると『絶対に逢える!』と言う答えが、<  必然的>に裏付けられるのです。 http://perfection.cx/h/ ┏┓ ┗★ ご存知ですが・・・               ■■ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━ http://perfection.cx/h/ よく『本当ですか?』等のご質問を頂きますが、何も行動されないから 不安になるんです。こうしてる間にもセレブ女性は貴方を待っています。 当社は男性様に上記の項目に完全に当てはまる女性をご 紹介しております。 信じる・信じないは貴方様の自由で御座います。只、          <紹介という事実> は、決して曲げる事の出来ない事実で御座います。 男性は登録料・紹介料など一切かかりません また、当サークルは問題視されている不正請求・自動課金も一切行っておりません。 どなた様も安心してご利用いただけます。 ▼18歳未満のご利用は禁止されています▼ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ http://perfection.cx/h/ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ From emiiii at ocn.ne.jp Mon May 29 22:44:50 2006 From: emiiii at ocn.ne.jp (=?shift-jis?B?YXlh?=) Date: Mon, 29 May 2006 22:44:50 -0700 (PDT) Subject: [openib-general] 
=?iso-2022-jp?b?GyRCMnEkQyRGRDokMSRsJFAbKEI=?= =?iso-2022-jp?b?GyRCOjkkNz5lJDIkXiQ5ISMbKEI=?= Message-ID: <20060530054450.260CC2283E6@openib.ca.sandia.gov> はじめまして、澤田と申します。いきなりのメールで申し訳ございません。 私は、社長である仮名/宮崎 里美(34)の秘書をしております、 社長に依頼され、 社長と楽しい時間を過ごしていただける、男性を探しております。、 接待費として、振込みもできます、もし時間を作って会っていただけるのであれば、それなりの費用をこちらで用意します。 少しでも興味がおありでしたら、No199945 金額の条件、社長の顔写真の確認をこちらでして下さい。メール頂ければ電話番号を送らせて頂きます http://www.gyakuen-queen.net/?1921 拒否 k_49singing_in_the_rain at yahoo.co.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From ogerlitz at voltaire.com Mon May 29 23:22:59 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 30 May 2006 09:22:59 +0300 (IDT) Subject: [openib-general] [PATCHv2 1/2] resend: mthca support for max_map_per_fmr device attribute In-Reply-To: References: Message-ID: resending - with the mail client problem fixed - Or. OK, here's a modified version of the patch to support both mem full/free HCAs, in the memfree case the code relies on this quote from the (section 4.2 pp 45) memfree PRM: "A memory key is composed of two fields, a 24-bit index and an 8-bit key. The key field is an arbitrarily chosen number. The index field is unique number used as an index to an MPT table entry, ..." ======================================================================= implement max_map_per_fmr device attribute for mthca Signed-off-by: Or Gerlitz Index: hw/mthca/mthca_provider.c =================================================================== --- hw/mthca/mthca_provider.c (revision 7031) +++ hw/mthca/mthca_provider.c (working copy) @@ -116,6 +116,15 @@ static int mthca_query_device(struct ib_ props->max_total_mcast_qp_attach = props->max_mcast_qp_attach * props->max_mcast_grp; + /* on memfull HCA an FMR can be remapped 2^B - 1 times where B < 32 is + * the number of bits which are not used for MPT addressing, on memfree + * HCA B=8 so an FMR can be remapped 255 times. 
+ */ + if(!mthca_is_memfree(mdev)) + props->max_map_per_fmr = (1 << (32 - + long_log2(mdev->limits.num_mpts))) - 1; + else + props->max_map_per_fmr = (1 << 8) - 1; err = 0; out: kfree(in_mad); From ogerlitz at voltaire.com Mon May 29 23:23:41 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 30 May 2006 09:23:41 +0300 (IDT) Subject: [openib-general] [PATCHv2 2/2] resend: port the fmr pool to use the max_map_per_fmr device attribute In-Reply-To: References: Message-ID: resending - with the mail client problem fixed. Fixed the patch to allocate the device attributes structure dynamically. Or. Port the generic FMR pool to query the IB device and use the max_map_per_fmr device attribute as the maximum number of FMR remaps. If the device does not support the attribute, the code falls back to the IB_FMR_MAX_REMAPS (32) default. Signed-off-by: Or Gerlitz Index: core/fmr_pool.c =================================================================== --- core/fmr_pool.c (revision 7031) +++ core/fmr_pool.c (working copy) @@ -54,7 +54,7 @@ enum { /* * If an FMR is not in use, then the list member will point to either * its pool's free_list (if the FMR can be mapped again; that is, - * remap_count < IB_FMR_MAX_REMAPS) or its pool's dirty_list (if the + * remap_count < device_attr.max_map_per_fmr) or its pool's dirty_list (if the * FMR needs to be unmapped before being remapped). In either of * these cases it is a bug if the ref_count is not 0.
In other words, * if ref_count is > 0, then the list member must not be linked into @@ -84,6 +84,7 @@ struct ib_fmr_pool { int pool_size; int max_pages; + int max_remaps; int dirty_watermark; int dirty_len; struct list_head free_list; @@ -214,8 +215,10 @@ struct ib_fmr_pool *ib_create_fmr_pool(s { struct ib_device *device; struct ib_fmr_pool *pool; + struct ib_device_attr *attr; int i; int ret; + int max_remaps; if (!params) return ERR_PTR(-EINVAL); @@ -228,6 +231,25 @@ struct ib_fmr_pool *ib_create_fmr_pool(s return ERR_PTR(-ENOSYS); } + attr = kmalloc(sizeof *attr, GFP_KERNEL); + if (!attr) { + printk(KERN_WARNING "couldn't allocate device attr struct"); + return ERR_PTR(-ENOMEM); + } + ret = ib_query_device(device, attr); + if (ret) { + printk(KERN_WARNING "couldn't query device"); + kfree(attr); + return ERR_PTR(ret); + } + /* use the default max remaps for drivers not setting the attribute */ + if (!attr->max_map_per_fmr) + max_remaps = IB_FMR_MAX_REMAPS; + else + max_remaps = attr->max_map_per_fmr; + + kfree(attr); + pool = kmalloc(sizeof *pool, GFP_KERNEL); if (!pool) { printk(KERN_WARNING "couldn't allocate pool struct"); @@ -258,6 +280,7 @@ struct ib_fmr_pool *ib_create_fmr_pool(s pool->pool_size = 0; pool->max_pages = params->max_pages_per_fmr; + pool->max_remaps = max_remaps; pool->dirty_watermark = params->dirty_watermark; pool->dirty_len = 0; spin_lock_init(&pool->pool_lock); @@ -279,7 +302,7 @@ struct ib_fmr_pool *ib_create_fmr_pool(s struct ib_pool_fmr *fmr; struct ib_fmr_attr attr = { .max_pages = params->max_pages_per_fmr, - .max_maps = IB_FMR_MAX_REMAPS, + .max_maps = pool->max_remaps, .page_shift = params->page_shift }; @@ -489,7 +512,7 @@ int ib_fmr_pool_unmap(struct ib_pool_fmr --fmr->ref_count; if (!fmr->ref_count) { - if (fmr->remap_count < IB_FMR_MAX_REMAPS) { + if (fmr->remap_count < pool->max_remaps) { list_add_tail(&fmr->list, &pool->free_list); } else { list_add_tail(&fmr->list, &pool->dirty_list);
From tziporet at mellanox.co.il Tue May 30 01:54:34 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 30 May 2006 11:54:34 +0300 Subject: [openib-general] RE: problems with git Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA70B7@mtlexch01.mtl.com> Thanks, It was a problem with my git installation. I reinstall and everything is OK now. Tziporet -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Monday, May 29, 2006 7:11 PM To: Tziporet Koren Cc: openib Subject: Re: problems with git > git clone git://www.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git infiniband > warning: templates not found /home/dag/share/git-core/templates/ Looks like you have a slight issue with your git installation...
> fatal: unable to connect a socket (Connection timed out) And you are not able to connect to the git server. In your command you are using a URL like git://www.kernel.org/..., which I'm not sure will work -- www.kernel.org will likely point to a local mirror, which is likely not running a native git server. So you can try two things: 1) replace www.kernel.org with git.kernel.org -- this is the best thing to do, as git native protocol will work better than http. 2) If your firewall is blocking git:// URLs, then try replacing the git:// with http:// (but leave the server as www.kernel.org). - R.
From yipeeyipeeyipeeyipee at yahoo.com Tue May 30 02:35:03 2006 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Tue, 30 May 2006 09:35:03 +0000 (UTC) Subject: [openib-general] special qp's creation from userspace Message-ID: Hi, Can I use ib_mad_port_close() (mad.c) to close qp0 & qp1 amd reopen them from a userspace application? That way I can handle all mads without any kernel intervention? Thanks, x
From mst at mellanox.co.il Tue May 30 02:39:30 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 30 May 2006 12:39:30 +0300 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il> Message-ID: <20060530093930.GE21266@mellanox.co.il> Quoting r.
Roland Dreier : > Subject: Re: ia64: kernel unaligned access in ipoib > > Michael> We've written up a patch with Jack - do you want us to > Michael> test it or prefer to re-write it yourself? > > Go ahead and test it -- I replied before I saw your patch. The following fixed the issue for us, pls review. Can this go into 2.6.17? If yes, I think it's prudent to let it run for another night before pushing it out, since we had to touch a lot of lines here. We'll do that and let you know tomorrow. --- Fix misaligned access faults on ia64: never cast a misaligned ha + 4 pointer to union ib_gid type, pass a void * pointer instead. Signed-off-by: Jack Morgenstein Signed-off-by: Michael S. Tsirkin Index: src/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- src.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-04-16 11:12:16.105871000 +0300 +++ src/drivers/infiniband/ulp/ipoib/ipoib_main.c 2006-05-30 12:09:43.113172000 +0300 @@ -190,8 +190,7 @@ static int ipoib_change_mtu(struct net_d return 0; } -static struct ipoib_path *__path_find(struct net_device *dev, - union ib_gid *gid) +static struct ipoib_path *__path_find(struct net_device *dev, void *gid) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct rb_node *n = priv->path_tree.rb_node; @@ -201,7 +200,7 @@ static struct ipoib_path *__path_find(st while (n) { path = rb_entry(n, struct ipoib_path, rb_node); - ret = memcmp(gid->raw, path->pathrec.dgid.raw, + ret = memcmp(gid, path->pathrec.dgid.raw, sizeof (union ib_gid)); if (ret < 0) @@ -430,8 +429,7 @@ static void path_rec_completion(int stat } } -static struct ipoib_path *path_rec_create(struct net_device *dev, - union ib_gid *gid) +static struct ipoib_path *path_rec_create(struct net_device *dev, void *gid) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_path *path; @@ -446,7 +444,7 @@ static struct ipoib_path *path_rec_creat INIT_LIST_HEAD(&path->neigh_list); - 
memcpy(path->pathrec.dgid.raw, gid->raw, sizeof (union ib_gid)); + memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid)); path->pathrec.sgid = priv->local_gid; path->pathrec.pkey = cpu_to_be16(priv->pkey); path->pathrec.numb_path = 1; @@ -504,10 +502,9 @@ static void neigh_add_path(struct sk_buf */ spin_lock(&priv->lock); - path = __path_find(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4)); + path = __path_find(dev, skb->dst->neighbour->ha + 4); if (!path) { - path = path_rec_create(dev, - (union ib_gid *) (skb->dst->neighbour->ha + 4)); + path = path_rec_create(dev, skb->dst->neighbour->ha + 4); if (!path) goto err_path; @@ -557,7 +554,7 @@ static void ipoib_path_lookup(struct sk_ /* Add in the P_Key for multicasts */ skb->dst->neighbour->ha[8] = (priv->pkey >> 8) & 0xff; skb->dst->neighbour->ha[9] = priv->pkey & 0xff; - ipoib_mcast_send(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4), skb); + ipoib_mcast_send(dev, skb->dst->neighbour->ha + 4, skb); } static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, @@ -572,10 +569,9 @@ static void unicast_arp_send(struct sk_b */ spin_lock(&priv->lock); - path = __path_find(dev, (union ib_gid *) (phdr->hwaddr + 4)); + path = __path_find(dev, phdr->hwaddr + 4); if (!path) { - path = path_rec_create(dev, - (union ib_gid *) (phdr->hwaddr + 4)); + path = path_rec_create(dev, phdr->hwaddr + 4); if (path) { /* put pseudoheader back on for next time */ skb_push(skb, sizeof *phdr); @@ -666,7 +662,7 @@ static int ipoib_start_xmit(struct sk_bu phdr->hwaddr[8] = (priv->pkey >> 8) & 0xff; phdr->hwaddr[9] = priv->pkey & 0xff; - ipoib_mcast_send(dev, (union ib_gid *) (phdr->hwaddr + 4), skb); + ipoib_mcast_send(dev, phdr->hwaddr + 4, skb); } else { /* unicast GID -- should be ARP or RARP reply */ @@ -677,7 +673,7 @@ static int ipoib_start_xmit(struct sk_bu skb->dst ? 
"neigh" : "dst", be16_to_cpup((__be16 *) skb->data), be32_to_cpup((__be32 *) phdr->hwaddr), - IPOIB_GID_ARG(*(union ib_gid *) (phdr->hwaddr + 4))); + IPOIB_GID_RAW_ARG(phdr->hwaddr + 4)); dev_kfree_skb_any(skb); ++priv->stats.tx_dropped; goto out; @@ -773,7 +769,7 @@ static void ipoib_neigh_destructor(struc ipoib_dbg(priv, "neigh_destructor for %06x " IPOIB_GID_FMT "\n", be32_to_cpup((__be32 *) n->ha), - IPOIB_GID_ARG(*((union ib_gid *) (n->ha + 4)))); + IPOIB_GID_RAW_ARG(n->ha + 4)); spin_lock_irqsave(&priv->lock, flags); Index: src/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- src.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2006-05-25 11:35:23.334409000 +0300 +++ src/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2006-05-30 12:09:28.568595000 +0300 @@ -153,7 +153,7 @@ static struct ipoib_mcast *ipoib_mcast_a return mcast; } -static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, union ib_gid *mgid) +static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct rb_node *n = priv->multicast_tree.rb_node; @@ -164,7 +164,7 @@ static struct ipoib_mcast *__ipoib_mcast mcast = rb_entry(n, struct ipoib_mcast, rb_node); - ret = memcmp(mgid->raw, mcast->mcmember.mgid.raw, + ret = memcmp(mgid, mcast->mcmember.mgid.raw, sizeof (union ib_gid)); if (ret < 0) n = n->rb_left; @@ -639,8 +639,7 @@ static int ipoib_mcast_leave(struct net_ return 0; } -void ipoib_mcast_send(struct net_device *dev, union ib_gid *mgid, - struct sk_buff *skb) +void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_mcast *mcast; @@ -663,7 +662,7 @@ void ipoib_mcast_send(struct net_device if (!mcast) { /* Let's create a new send only group now */ ipoib_dbg_mcast(priv, "setting up send only multicast group for " - IPOIB_GID_FMT "\n", 
IPOIB_GID_ARG(*mgid)); + IPOIB_GID_FMT "\n", IPOIB_GID_RAW_ARG(mgid)); mcast = ipoib_mcast_alloc(dev, 0); if (!mcast) { @@ -675,7 +674,7 @@ void ipoib_mcast_send(struct net_device } set_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags); - mcast->mcmember.mgid = *mgid; + memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid)); __ipoib_mcast_add(dev, mcast); list_add_tail(&mcast->list, &priv->multicast_list); } Index: src/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- src.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2006-04-06 10:03:29.420250000 +0300 +++ src/drivers/infiniband/ulp/ipoib/ipoib.h 2006-05-30 12:20:39.572837000 +0300 @@ -278,8 +278,7 @@ int ipoib_dev_init(struct net_device *de void ipoib_dev_cleanup(struct net_device *dev); void ipoib_mcast_join_task(void *dev_ptr); -void ipoib_mcast_send(struct net_device *dev, union ib_gid *mgid, - struct sk_buff *skb); +void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb); void ipoib_mcast_restart_task(void *dev_ptr); int ipoib_mcast_start_thread(struct net_device *dev); @@ -375,15 +374,26 @@ extern int ipoib_debug_level; #endif /* CONFIG_INFINIBAND_IPOIB_DEBUG_DATA */ -#define IPOIB_GID_FMT "%x:%x:%x:%x:%x:%x:%x:%x" +#define IPOIB_GID_FMT "%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:" \ + "%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x" -#define IPOIB_GID_ARG(gid) be16_to_cpup((__be16 *) ((gid).raw + 0)), \ - be16_to_cpup((__be16 *) ((gid).raw + 2)), \ - be16_to_cpup((__be16 *) ((gid).raw + 4)), \ - be16_to_cpup((__be16 *) ((gid).raw + 6)), \ - be16_to_cpup((__be16 *) ((gid).raw + 8)), \ - be16_to_cpup((__be16 *) ((gid).raw + 10)), \ - be16_to_cpup((__be16 *) ((gid).raw + 12)), \ - be16_to_cpup((__be16 *) ((gid).raw + 14)) +#define IPOIB_GID_RAW_ARG(gid) ((u8 *)(gid))[0], \ + ((u8 *)(gid))[1], \ + ((u8 *)(gid))[2], \ + ((u8 *)(gid))[3], \ + ((u8 *)(gid))[4], \ + ((u8 *)(gid))[5], \ + ((u8 *)(gid))[6], \ + ((u8 *)(gid))[7], \ + ((u8 
*)(gid))[8], \ + ((u8 *)(gid))[9], \ + ((u8 *)(gid))[10],\ + ((u8 *)(gid))[11],\ + ((u8 *)(gid))[12],\ + ((u8 *)(gid))[13],\ + ((u8 *)(gid))[14],\ + ((u8 *)(gid))[15] + +#define IPOIB_GID_ARG(gid) IPOIB_GID_RAW_ARG((gid).raw) #endif /* _IPOIB_H */ -- MST From leonida at voltaire.com Tue May 30 04:03:19 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Tue, 30 May 2006 14:03:19 +0300 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <20060509060958.GA482@voltaire.com> References: <20060509060958.GA482@voltaire.com> Message-ID: <10e223bf0605300403lcc24b8bwf1e1d7059edab416@mail.gmail.com> Roland, Aren't you going to apply these patches? This is a standard feature, and we need it rather urgently. Regards, Leonid On 5/9/06, Leonid Arsh wrote: > Roland, > I'm reposting the Client Reregister event support patch for the kernel space. > > The patch defines the event and implements it in MTHCA. > The event is implemented in software, as Michael proposed. > It also moves the port_info structure definition from ipath_mad.c to ib_smi.h. > > Regards, > Leonid > > From leonida at voltaire.com Tue May 30 04:17:16 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Tue, 30 May 2006 14:17:16 +0300 Subject: [openib-general][RFC][PATCH] mthca: HCA initialization parameters Message-ID: <20060530111716.GA32463@voltaire.com> Roland, Further to our discussions I'm sending the fixed patch. This patch implements the module parameters allowing the user to change the HCA initialization values. I left only needed parameters and added the parameter validation. Now the set of the parameters is closest to the profile parameters used in the old Mellanox driver. The parameters may be read from the sysfs, but cannot be changed. 
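For illustration only: the parameter validation mentioned above comes down to a power-of-two test on each profile value. The following is a minimal userspace sketch, not code from the patch. The helper names profile_value_ok and is_strict_pow2 are hypothetical, and unsigned values are assumed here where the posted diff uses int module parameters. The first helper mirrors the x & (x - 1) form that the posted diff uses; the second is a stricter variant that is an assumption of this sketch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the form of the test in the posted diff:
 * a value passes when it has at most one bit set.
 * Note that 0 also passes, because 0 & (0 - 1) == 0.
 */
static bool profile_value_ok(unsigned int v)
{
	return (v & (v - 1)) == 0;
}

/*
 * Stricter variant (an assumption of this sketch, not part of the patch):
 * additionally reject 0, so only true powers of two pass.
 */
static bool is_strict_pow2(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}
```

With a default such as num_qp = 1 << 16 both helpers return true; a value such as 48000 fails both, and 0 passes only the first.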
Regards, Leonid Signed-off-by: Leonid Arsh --- openib-1.0/src/linux-kernel/infiniband/hw/mthca/mthca_main.c 2006-05-04 15:48:33.000000000 +0300 +++ openib-1.0/src/linux-kernel/infiniband/hw/mthca/mthca_main.c.NEW 2006-05-29 18:12:31.000000000 +0300 @@ -81,9 +81,6 @@ module_param(tune_pci, int, 0444); MODULE_PARM_DESC(tune_pci, "increase PCI burst from the default set by BIOS if nonzero"); -static const char mthca_version[] __devinitdata = - DRV_NAME ": Mellanox InfiniBand HCA driver v" - DRV_VERSION " (" DRV_RELDATE ")\n"; static struct mthca_profile default_profile = { .num_qp = 1 << 16, @@ -97,6 +94,116 @@ .uarc_size = 1 << 18, /* Arbel only */ }; +module_param_named(num_qp, default_profile.num_qp, int, 0444); +MODULE_PARM_DESC(num_qp, "Maximum number of QPs available per HCA"); + +module_param_named(rdb_per_qp, default_profile.rdb_per_qp, int, 0444); +MODULE_PARM_DESC(rdb_per_qp, "Number of RDB buffers per QP"); + +module_param_named(num_srq, default_profile.num_srq, int, 0444); +MODULE_PARM_DESC(num_srq, "Maximum number of Shared Receive Queues per HCA "); + +module_param_named(num_cq, default_profile.num_cq, int, 0444); +MODULE_PARM_DESC(num_cq, "Maximum number of CQs per HCA"); + +module_param_named(num_mcg, default_profile.num_mcg, int, 0444); +MODULE_PARM_DESC(num_mcg, "Maximum number of Multicast groups per HCA"); + +module_param_named(num_mpt, default_profile.num_mpt, int, 0444); +MODULE_PARM_DESC(num_mpt, + "Maximum number of Memory Protection Table entries per HCA"); + +module_param_named(num_mtt, default_profile.num_mtt, int, 0444); +MODULE_PARM_DESC(num_mtt, + "Maximum number of Memory Translation table segments per HCA"); +/* Tavor only */ +module_param_named(num_udav, default_profile.num_udav, int, 0444); +MODULE_PARM_DESC(num_udav, "Maximum number of UD Address Vectors per HCA"); + +/* Tavor only */ +module_param_named(fmr_reserved_mtts, default_profile.fmr_reserved_mtts, int, 0444); +MODULE_PARM_DESC(fmr_reserved_mtts, + "Number of Memory 
Translation table segments reserved for FMR"); + +static const char mthca_version[] __devinitdata = + DRV_NAME ": Mellanox InfiniBand HCA driver v" + DRV_VERSION " (" DRV_RELDATE ")\n"; + +static int __devinit mthca_validate_profile(struct mthca_dev *mdev, + struct mthca_profile *profile) +{ + + if(default_profile.num_qp & (default_profile.num_qp-1)) { + mthca_err(mdev, "Invalid num_qp parameter value (%d).\n", + default_profile.num_qp); + goto err_inval; + } + + if(default_profile.rdb_per_qp & (default_profile.rdb_per_qp-1)) { + mthca_err(mdev, "Invalid rdb_per_qp parameter value (%d)\n", + default_profile.rdb_per_qp); + goto err_inval; + } + + if(default_profile.num_srq & (default_profile.num_srq-1)) { + mthca_err(mdev, "Invalid num_srq parameter value (%d)\n", + default_profile.num_srq); + goto err_inval; + } + + if(default_profile.num_cq & (default_profile.num_cq-1)) { + mthca_err(mdev, "Invalid num_cq parameter value (%d)\n", + default_profile.num_cq); + goto err_inval; + } + + if(default_profile.num_mcg & (default_profile.num_mcg-1)) { + mthca_err(mdev, "Invalid num_mcg parameter value (%d)\n", + default_profile.num_mcg); + goto err_inval; + } + if(default_profile.num_mpt & (default_profile.num_mpt-1)) { + mthca_err(mdev, "Invalid num_mpt parameter value (%d)\n", + default_profile.num_mpt); + goto err_inval; + } + + if(default_profile.num_mtt & (default_profile.num_mtt-1)) { + mthca_err(mdev, "Invalid num_mtt parameter value (%d)\n", + default_profile.num_mtt); + goto err_inval; + } + + if (mthca_is_memfree(mdev)) { + + if(default_profile.num_udav & (default_profile.num_udav-1)) { + mthca_err(mdev, "Invalid num_udav parameter value (%d)\n", + default_profile.num_udav); + goto err_inval; + } + + if(default_profile.fmr_reserved_mtts & (default_profile.fmr_reserved_mtts-1)) { + mthca_err(mdev, "Invalid fmr_reserved_mtts parameter value (%d)\n", + default_profile.fmr_reserved_mtts); + goto err_inval; + } else if (default_profile.fmr_reserved_mtts >= 
default_profile.num_mtt ) { + mthca_err(mdev, + "Invalid fmr_reserved_mtts parameter value (%d). " + "Must be lower then num_mtt (%d)\n", + default_profile.fmr_reserved_mtts, + default_profile.num_mtt ); + return -EINVAL; + } + } + + return 0; + +err_inval: + mthca_err(mdev, "This parameter must be power of two.\n"); + return -EINVAL; + +} + static int __devinit mthca_tune_pci(struct mthca_dev *mdev) { int cap; @@ -994,6 +1101,7 @@ printk(KERN_INFO PFX "Initializing %s\n", pci_name(pdev)); + if (id->driver_data >= ARRAY_SIZE(mthca_hca_table)) { printk(KERN_ERR PFX "%s has invalid driver data %lx\n", pci_name(pdev), id->driver_data); @@ -1095,6 +1203,10 @@ if (err) goto err_cmd; + err = mthca_validate_profile(mdev, &default_profile); + if (err) + goto err_profile; + err = mthca_init_hca(mdev); if (err) goto err_cmd; @@ -1147,6 +1259,7 @@ mthca_close_hca(mdev); err_cmd: +err_profile: mthca_cmd_cleanup(mdev); err_free_dev:
From ogerlitz at voltaire.com Tue May 30 04:56:32 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 30 May 2006 14:56:32 +0300 (IDT) Subject: [openib-general] [PATCH] IB/iser: do I/O path allocations with GFP_NOIO Message-ID: Thanks for Mike Christie for pointing this out - Or. a block driver is not allowed to use GFP_KERNEL allocations on its I/O code path since the allocation might require I/O (eg to pageout other memory), resulting in either deadlock or tightloop.
move I/O path (queuecommand) allocations to be done with GFP_NOIO Signed-off-by: Or Gerlitz diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 2703bb0..073e7b5 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -225,7 +225,7 @@ static int iser_post_receive_control(str struct iser_device *device = iser_conn->ib_conn->device; int rx_data_size, err = 0; - rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL); + rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_NOIO); if (rx_desc == NULL) { iser_err("Failed to alloc desc for post recv\n"); return -ENOMEM; @@ -238,7 +238,7 @@ static int iser_post_receive_control(str else /* FIXME till user space sets conn->max_recv_dlength correctly */ rx_data_size = 128; - rx_desc->data = kmalloc(rx_data_size, GFP_KERNEL); + rx_desc->data = kmalloc(rx_data_size, GFP_NOIO); if (rx_desc->data == NULL) { iser_err("Failed to alloc data buf for post recv\n"); err = -ENOMEM; @@ -467,7 +467,7 @@ int iser_send_data_out(struct iscsi_conn iser_dbg("%s itt %d dseg_len %d offset %d\n", __func__,(int)itt,(int)data_seg_len,(int)buf_offset); - tx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL); + tx_desc = kmem_cache_alloc(ig.desc_cache, GFP_NOIO); if (tx_desc == NULL) { iser_err("Failed to alloc desc for post dataout\n"); return -ENOMEM; diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 0881f55..31950a5 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -111,10 +111,10 @@ int iser_start_rdma_unaligned_sg(struct unsigned long cmd_data_len = data->data_len; if (cmd_data_len > ISER_KMALLOC_THRESHOLD) - mem = (void *)__get_free_pages(GFP_KERNEL, + mem = (void *)__get_free_pages(GFP_NOIO, long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT); else - mem = kmalloc(cmd_data_len, GFP_KERNEL); + mem = kmalloc(cmd_data_len, 
GFP_NOIO); if (mem == NULL) { iser_err("Failed to allocate mem size %d %d for copying sglist\n", From ogerlitz at voltaire.com Tue May 30 04:59:54 2006 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 30 May 2006 14:59:54 +0300 (IDT) Subject: [openib-general] [PATCH] IB/iser: removed redundant check from iser's conn_bind transport function Message-ID: This fix isn't critical but it aligns the code to be as in the buddy iscsi_tcp driver - Or. removed redundant check of the conn stop_stage from iser's conn_bind transport func Signed-off-by: Or Gerlitz diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index c3eeeaa..4d14eb8 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -348,23 +348,21 @@ iscsi_iser_conn_bind(struct iscsi_cls_se if (error) return error; - if (conn->stop_stage != STOP_CONN_SUSPEND) { - /* the transport ep handle comes from user space so it must be - * verified against the global ib connections list */ - ib_conn = iscsi_iser_ib_conn_lookup(transport_eph); - if (!ib_conn) { - iser_err("can't bind eph %llx\n", - (unsigned long long)transport_eph); - return -EINVAL; - } - /* binds the iSER connection retrieved from the previously - * connected ep_handle to the iSCSI layer connection. exchanges - * connection pointers */ - iser_err("binding iscsi conn %p to iser_conn %p\n",conn,ib_conn); - iser_conn = conn->dd_data; - ib_conn->iser_conn = iser_conn; - iser_conn->ib_conn = ib_conn; + /* the transport ep handle comes from user space so it must be + * verified against the global ib connections list */ + ib_conn = iscsi_iser_ib_conn_lookup(transport_eph); + if (!ib_conn) { + iser_err("can't bind eph %llx\n", + (unsigned long long)transport_eph); + return -EINVAL; } + /* binds the iSER connection retrieved from the previously + * connected ep_handle to the iSCSI layer connection. 
exchanges + * connection pointers */ + iser_err("binding iscsi conn %p to iser_conn %p\n",conn,ib_conn); + iser_conn = conn->dd_data; + ib_conn->iser_conn = iser_conn; + iser_conn->ib_conn = ib_conn; return 0; } From halr at voltaire.com Tue May 30 05:11:31 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 08:11:31 -0400 Subject: [openib-general] [PATCH] OpenSM: Remove unicast and multicast dump files relative to dump_files_dir Message-ID: <1148991091.4358.109540.camel@hal.voltaire.com> OpenSM: Remove unicast and multicast dump files relative to dump_files_dir Signed-off-by: Hal Rosenstock Index: opensm/osm_mcast_mgr.c =================================================================== --- opensm/osm_mcast_mgr.c (revision 7535) +++ opensm/osm_mcast_mgr.c (working copy) @@ -1461,6 +1461,25 @@ osm_mcast_mgr_dump_mcast_routes( OSM_LOG_EXIT( p_mgr->p_log ); } +static void +__unlink_mcast_fdb(IN osm_mcast_mgr_t* const p_mgr) +{ + char *file_name = NULL; + + file_name = + (char*)cl_malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); + + CL_ASSERT(file_name); + + strcpy(file_name, p_mgr->p_subn->opt.dump_files_dir); + strcat(file_name, "/osm.mcfdbs"); + + unlink(file_name); + + if (file_name) + cl_free(file_name); +} + /********************************************************************** Process the entire group. @@ -1496,7 +1515,9 @@ osm_mcast_mgr_process_mgrp( /* initialize the mc fdb dump file: */ if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) - unlink("/tmp/osm.mcfdbs"); + { + __unlink_mcast_fdb( p_mgr ); + } /* Walk the switches and download the tables for each. @@ -1570,7 +1591,9 @@ osm_mcast_mgr_process( /* initialize the mc fdb dump file: */ if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) - unlink("/tmp/osm.mcfdbs"); + { + __unlink_mcast_fdb( p_mgr ); + } /* Walk the switches and download the tables for each. 
Index: opensm/osm_ucast_mgr.c =================================================================== --- opensm/osm_ucast_mgr.c (revision 7535) +++ opensm/osm_ucast_mgr.c (working copy) @@ -1047,6 +1047,7 @@ osm_ucast_mgr_process( uint32_t iteration_max; osm_signal_t signal; cl_qmap_t *p_sw_guid_tbl; + char *file_name = NULL; OSM_LOG_ENTER( p_mgr->p_log, osm_ucast_mgr_process ); @@ -1149,7 +1150,20 @@ osm_ucast_mgr_process( /* initialize the fdb dump file: */ if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) - unlink("/tmp/osm.fdbs"); + { + file_name = + (char*)cl_malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); + + CL_ASSERT(file_name); + + strcpy(file_name, p_mgr->p_subn->opt.dump_files_dir); + strcat(file_name, "/osm.fdbs"); + + unlink(file_name); + + if (file_name) + cl_free(file_name); + } cl_qmap_apply_func( p_sw_guid_tbl, __osm_ucast_mgr_process_tbl, p_mgr ); From eitan at mellanox.co.il Tue May 30 05:36:26 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 30 May 2006 15:36:26 +0300 Subject: [openib-general] RE: [PATCH] OpenSM: Remove unicast and multicast dump files relativeto dump_files_dir Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876A@mtlexch01.mtl.com> Looks good. Thanks for fixing it. 
Eitan > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Tuesday, May 30, 2006 3:12 PM > To: openib-general at openib.org > Cc: Eitan Zahavi > Subject: [PATCH] OpenSM: Remove unicast and multicast dump files relativeto > dump_files_dir > > OpenSM: Remove unicast and multicast dump files relative to > dump_files_dir > > Signed-off-by: Hal Rosenstock > > Index: opensm/osm_mcast_mgr.c > =================================================================== > --- opensm/osm_mcast_mgr.c (revision 7535) > +++ opensm/osm_mcast_mgr.c (working copy) > @@ -1461,6 +1461,25 @@ osm_mcast_mgr_dump_mcast_routes( > OSM_LOG_EXIT( p_mgr->p_log ); > } > > +static void > +__unlink_mcast_fdb(IN osm_mcast_mgr_t* const p_mgr) > +{ > + char *file_name = NULL; > + > + file_name = > + (char*)cl_malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); > + > + CL_ASSERT(file_name); > + > + strcpy(file_name, p_mgr->p_subn->opt.dump_files_dir); > + strcat(file_name, "/osm.mcfdbs"); > + > + unlink(file_name); > + > + if (file_name) > + cl_free(file_name); > +} > + > /********************************************************************** > Process the entire group. > > @@ -1496,7 +1515,9 @@ osm_mcast_mgr_process_mgrp( > > /* initialize the mc fdb dump file: */ > if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) > - unlink("/tmp/osm.mcfdbs"); > + { > + __unlink_mcast_fdb( p_mgr ); > + } > > /* > Walk the switches and download the tables for each. > @@ -1570,7 +1591,9 @@ osm_mcast_mgr_process( > > /* initialize the mc fdb dump file: */ > if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) > - unlink("/tmp/osm.mcfdbs"); > + { > + __unlink_mcast_fdb( p_mgr ); > + } > > /* > Walk the switches and download the tables for each.
> Index: opensm/osm_ucast_mgr.c > =================================================================== > --- opensm/osm_ucast_mgr.c (revision 7535) > +++ opensm/osm_ucast_mgr.c (working copy) > @@ -1047,6 +1047,7 @@ osm_ucast_mgr_process( > uint32_t iteration_max; > osm_signal_t signal; > cl_qmap_t *p_sw_guid_tbl; > + char *file_name = NULL; > > OSM_LOG_ENTER( p_mgr->p_log, osm_ucast_mgr_process ); > > @@ -1149,7 +1150,20 @@ osm_ucast_mgr_process( > > /* initialize the fdb dump file: */ > if( osm_log_is_active( p_mgr->p_log, OSM_LOG_ROUTING ) ) > - unlink("/tmp/osm.fdbs"); > + { > + file_name = > + (char*)cl_malloc(strlen(p_mgr->p_subn->opt.dump_files_dir) + 12); > + > + CL_ASSERT(file_name); > + > + strcpy(file_name, p_mgr->p_subn->opt.dump_files_dir); > + strcat(file_name, "/osm.fdbs"); > + > + unlink(file_name); > + > + if (file_name) > + cl_free(file_name); > + } > > cl_qmap_apply_func( p_sw_guid_tbl, > __osm_ucast_mgr_process_tbl, p_mgr ); >
From halr at voltaire.com Tue May 30 06:20:27 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 09:20:27 -0400 Subject: [openib-general] special qp's creation from userspace In-Reply-To: References: Message-ID: <1148995226.4358.111151.camel@hal.voltaire.com> On Tue, 2006-05-30 at 05:35, yipee wrote: > Hi, > > Can I use ib_mad_port_close() (mad.c) to close qp0 & qp1 Yes, that would close QP0/1 on each port. Just unloading the ib_mad module will have that effect. All agents and managers (using ib_mad) should be stopped. I'm presuming you don't want these to run. > amd reopen them from a userspace application? I'm not sure whether user verbs supports the special QPs (QP0/1) or not but I suspect not (since it doesn't appear that port_num can be supplied to ibv_create_qp). > That way I can handle all mads without any kernel intervention? There would still be kernel intervention as these MADs will need to be moved in and out of userspace; just less intervention. -- Hal > Thanks, > x From rdreier at cisco.com Tue May 30 07:14:55 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 07:14:55 -0700 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <10e223bf0605300403lcc24b8bwf1e1d7059edab416@mail.gmail.com> (Leonid Arsh's message of "Tue, 30 May 2006 14:03:19 +0300") References: <20060509060958.GA482@voltaire.com> <10e223bf0605300403lcc24b8bwf1e1d7059edab416@mail.gmail.com> Message-ID: Leonid> Roland, Aren't you going to apply these patches?
This is Leonid> a standard feature, and we need it rather urgently. Sorry, I had let these patches fall off of my queue. So why is this an urgent feature? A good description of your use case would make it easier for me to write the changelog entries (since your patches didn't include usable descriptions). Thanks, Roland From rdreier at cisco.com Tue May 30 07:16:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 07:16:00 -0700 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: <20060530093930.GE21266@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 30 May 2006 12:39:30 +0300") References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il> <20060530093930.GE21266@mellanox.co.il> Message-ID: Michael> The following fixed the issue for us, pls review. Can Michael> this go into 2.6.17? If yes, I think it's prudent to let Michael> it run for another night before pushing it out, since we Michael> had to touch a lot of lines here. We'll do that and let Michael> you know tomorrow. I don't really see this as 2.6.17 material -- it's slightly annoying but at this point in the cycle it's probably not worth the risk. - R. From rdreier at cisco.com Tue May 30 07:19:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 07:19:24 -0700 Subject: [openib-general] Re: [PATCH] IB/iser: do I/O path allocations with GFP_NOIO In-Reply-To: (Or Gerlitz's message of "Tue, 30 May 2006 14:56:32 +0300 (IDT)") References: Message-ID: OK, I added both patches on top of what I have queued. - R. From mst at mellanox.co.il Tue May 30 07:24:44 2006 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Tue, 30 May 2006 17:24:44 +0300 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il> <20060530093930.GE21266@mellanox.co.il> Message-ID: <20060530142444.GA8405@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: ia64: kernel unaligned access in ipoib > > I don't really see this as 2.6.17 material -- it's slightly annoying > but at this point in the cycle it's probably not worth the risk. 2.6.18 then? -- MST From rdreier at cisco.com Tue May 30 07:23:07 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 07:23:07 -0700 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <20060509060958.GA482@voltaire.com> (Leonid Arsh's message of "Tue, 9 May 2006 09:09:58 +0300") References: <20060509060958.GA482@voltaire.com> Message-ID: > +struct port_info { > + __be64 mkey; > + __be64 gid_prefix; > + __be16 lid; > + __be16 sm_lid; > + __be32 cap_mask; > + __be16 diag_code; > + __be16 mkey_lease_period; > + u8 local_port_num; > + u8 link_width_enabled; > + u8 link_width_supported; > + u8 link_width_active; > + u8 linkspeed_portstate; /* 4 bits, 4 bits */ > + u8 portphysstate_linkdown; /* 4 bits, 4 bits */ > + u8 mkeyprot_resv_lmc; /* 2 bits, 3 bits, 3 bits */ > + u8 linkspeedactive_enabled; /* 4 bits, 4 bits */ > + u8 neighbormtu_mastersmsl; /* 4 bits, 4 bits */ > + u8 vlcap_inittype; /* 4 bits, 4 bits */ > + u8 vl_high_limit; > + u8 vl_arb_high_cap; > + u8 vl_arb_low_cap; > + u8 inittypereply_mtucap; /* 4 bits, 4 bits */ > + u8 vlstallcnt_hoqlife; /* 3 bits, 5 bits */ > + u8 operationalvl_pei_peo_fpi_fpo; /* 4 bits, 1, 1, 1, 1 */ > + __be16 mkey_violations; > + __be16 pkey_violations; > + __be16 qkey_violations; > + u8 guid_cap; > + u8 clientrereg_resv_subnetto; /* 1 bit, 2 bits, 5 bits */ > + u8 resv_resptimevalue; /* 3 bits, 5 
bits */ > + u8 localphyerrors_overrunerrors; /* 4 bits, 4 bits */ > + __be16 max_credit_hint; > + u8 resv; > + u8 link_roundtrip_latency[3]; > +} __attribute__ ((packed)); Any reason why this needs to be packed? It looks like everything is naturally aligned to its size anyway. From rdreier at cisco.com Tue May 30 07:24:16 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 07:24:16 -0700 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: <20060530142444.GA8405@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 30 May 2006 17:24:44 +0300") References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il> <20060530093930.GE21266@mellanox.co.il> <20060530142444.GA8405@mellanox.co.il> Message-ID: Michael> 2.6.18 then? Yes, definitely. - R. From rdreier at cisco.com Tue May 30 07:31:24 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 07:31:24 -0700 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: <20060530142444.GA8405@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 30 May 2006 17:24:44 +0300") References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il> <20060530093930.GE21266@mellanox.co.il> <20060530142444.GA8405@mellanox.co.il> Message-ID: By the way, did you get a chance to test the AH leak fix to see if it really fixes your leak? That would make me feel better about asking Linus to pull it into 2.6.17. - R. From Don.Albert at Bull.com Tue May 30 07:35:18 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Tue, 30 May 2006 07:35:18 -0700 Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4 In-Reply-To: <20060528071838.GF21266@mellanox.co.il> Message-ID: Michael, > > The ib_mthca module now initializes correctly on > > both EM64T machines. 
I noticed some discussion between you and Roland about > > making the parameter "fw_cmd_doorbell=0" the default. Did this > > occur in RC5? > > Yes, we changed fw_cmd_doorbell to 0 by default for now because it seemed > safer. I expect if you load mthca with fw_cmd_doorbell=1 you still get an > error, isn't that right? > Although the change in RC5 for fw_cmd_doorbell *seemed* to allow the ib_mthca module to initialize, I don't think I am out of the woods yet on this particular machine. The link never comes up, and the other machine, which is connected back to back with this one, and on which I am trying to run OpenSM, does not get a response to its MAD packets. When I try to shut down the openib stack with the "/etc/init.d/openibd stop" script, the processes hang trying to set device "ib0" down. Here is an excerpt from a terminal session: [jatoba] (ib) ib> ibstat CA 'mthca0' CA type: MT25204 Number of ports: 1 Firmware version: 1.0.800 Hardware version: a0 Node GUID: 0x0002c90200216e40 System image GUID: 0x0002c90200216e43 Port 1: State: Initializing Physical state: LinkUp Rate: 20 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02510a68 Port GUID: 0x0002c90200216e41 [jatoba] (ib) ib> ibstatus Infiniband device 'mthca0' port 1 status: default gid: fe80:0000:0000:0000:0002:c902:0021:6e41 base lid: 0x0 sm lid: 0x0 state: 2: INIT phys state: 5: LinkUp rate: 20 Gb/sec (4X DDR) [jatoba] (ib) ib> /etc/init.d/opensmd status opensm is stopped [jatoba] (ib) ib> /etc/init.d/openibd status HCA driver loaded Configured devices: ib0 Currently active devices: ib0 The following modules are also loaded: ib_cm [jatoba] (ib) ib> /etc/init.d/openibd stop At this point the command hangs. 
Doing a "ps -ef" from another terminal reveals:

root 6882 6755 0 15:31 pts/0 00:00:00 /bin/bash /etc/init.d/openibd stop
root 7012 6882 0 15:31 pts/0 00:00:00 /bin/bash /sbin/ifdown ib0
root 7031 7012 0 15:31 pts/0 00:00:00 ip link set dev ib0 down

I tried using gdb to "attach" to process 7031 to see its stack, but that hung too, as did an attempt to check the status of the interface with "/sbin/ifconfig".

It is rather difficult for me to debug this sort of hang, since I telecommute from Tucson and the machines are located in Phoenix. Does anyone have any suggestions?

-Don Albert-

From eitan at mellanox.co.il Tue May 30 07:41:05 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Tue, 30 May 2006 17:41:05 +0300
Subject: [openib-general] QoS RFC
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876C@mtlexch01.mtl.com>

Hi All

Please find the attached RFC describing how QoS policy support could be implemented in the OpenFabrics stack. Your comments are welcome.

Eitan

RFC: OpenFabrics Enhancements for QoS Support
===============================================

Authors: Eitan Zahavi
Date: May 2006
Revision: 0.1

Table of contents:
1. Overview
2. Architecture
3. Supported Policy
4. IPoIB functionality
5. CMA functionality
6. SDP functionality
7. SRP functionality
8. iSER functionality
9. OpenSM functionality

1. Overview
------------
Quality of Service requirements stem from the realization of I/O consolidation over an IB network: as multiple applications and ULPs share the same fabric, means to control their use of the network resources are becoming a must. The basic need is to differentiate the service levels provided to different traffic flows, so that a policy can be enforced to control each flow's utilization of the fabric resources.
The IBTA specification defines several hardware features and management interfaces to support QoS:
* Up to 15 Virtual Lanes (VLs) can carry traffic in a non-blocking manner
* Arbitration between traffic of different VLs is performed by a weighted round robin arbiter with 2 priority levels. The arbiter is programmable with a sequence of (VL, weight) pairs and a maximal number of high priority credits to be processed before low priority is served
* Packets carry a class of service marking in the range 0 to 15 in their header SL field
* Each switch can map an incoming packet by its SL to a particular output VL, based on a programmable table: VL=SL-to-VL-MAP(in-port, out-port, SL)
* The Subnet Administrator controls the parameters of each communication flow by providing them in the response to a PathRecord query

The IB QoS features provide the means to implement a DiffServ-like architecture. The DiffServ architecture (IETF RFC 2474 and RFC 2475) is widely used today in highly dynamic fabrics.

This proposal provides the detailed functional definition of the various software elements that are required to enable a DiffServ-like architecture over the OpenFabrics software stack.

2. Architecture
----------------
This proposal splits the QoS functionality between the SM/SA, the CMA and the various ULPs. We take the "chronology approach" to describe how the overall system works:

2.1. The network manager (human) provides a set of rules (policy) that defines how the network is configured and how its resources are split among different QoS-Levels. The policy also defines how to decide which QoS-Level each application, ULP or service uses.

2.2. The SM analyzes the provided policy to see if it is realizable and performs the necessary fabric setup. The SM may continuously monitor the policy and adapt to changes in it. Part of this policy defines the default QoS-Level of each partition.
The SA is being enhanced to match the requested Source, Destination, TClass and Service-ID (and optionally SL and priority) against the policy, so that clients (ULPs, programs) can obtain a policy-enforced QoS. The SM is also enhanced to support setting up partitions with an appropriate IPoIB broadcast group. This broadcast group carries its QoS attributes: TClass, SL, MTU and RATE.

2.3. IPoIB is set up. IPoIB uses the SL, MTU and RATE available on the multicast group which forms the broadcast group of this partition.

2.4. MPI, which provides non-IB-based connection management, should be configured to run using hard-coded SLs. It uses these SLs in every QP being opened.

2.5. ULPs that use the CM interface (like SRP) should have their own pre-assigned Service-ID and use it while obtaining a PathRecord for establishing their connections. The SA receiving the PathRecord request should match it against the policy and return the appropriate PathRecord, including SL, MTU, RATE and TClass.

2.6. ULPs and programs using the CMA to establish an RC connection should provide the CMA with the target IP and Service-ID. Some of the ULPs might also provide a TClass (e.g. SDP sockets that are given the TOS socket option). The CMA should then pass the provided Service-ID and optional TClass in the PathRecord request. The resulting PathRecord should be used for configuring the connection QP.

PathRecord and MultiPathRecord enhancement for QoS:
As mentioned above, the PathRecord and MultiPathRecord attributes should be enhanced to carry the Service-ID, which is a 64-bit value. Given the existing definition of these attributes, we propose to use the following fields for the Service-ID:
* For PathRecord: use the first 2 reserved fields, which are 32 bits each (component masks 0x1 and 0x2). Component mask 0x1 should be used to refer to the merged Service-ID field
* For MultiPathRecord: use 2 reserved fields:
  1. the field after the packet life (8 bits), which is component mask bit 0x10000 (17)
  2. the field before SDGID1 (56 bits), which is component mask bit 0x200000 (22)
  Once merged they should be selected using component mask bit 0x10000 (17)

A new capability bit should describe the SM QoS support in the SA class port info. This approach provides an easy migration path for the existing access layer and ULPs by not introducing a new attribute.

3. Supported Policy
--------------------
The QoS policy supported by this proposal is divided into 4 subsections:
* Node Group: a set of HCAs, Routers or Switches that share the same settings. A node group might be a partition defined by the partition manager policy in terms of GUIDs. Future implementations might provide support for NodeDescription-based definition of node groups.
* Fabric Setup: defines how the SL2VL and VLArb tables should be set up. This policy definition assumes the computation of the target behavior is performed outside of OpenSM.
* QoS-Levels Definition: this section defines the possible sets of QoS parameters that a client might be mapped to. Each set holds an SL and optionally: Max MTU, Max Rate, Path Bits (in case LMC > 0 is used for QoS) and TClass.
* Matching Rules: a list of rules that match an incoming PathRecord request to a QoS-Level. The rules are processed in order, such that the first match is applied. Each rule is built out of a set of match expressions, which should all match for the rule to apply. The matching expressions are defined for the following fields:
  ** SRC and DST to lists of node groups
  ** Service-ID to a list of Service-IDs or Service-ID ranges
  ** TClass to a list of TClass values or ranges

XML style syntax is provided for the policy file. However, a strict BNF format (provided in section 8) should be used for parsing it.
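
The first-match processing of the Matching Rules described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the type and function names (qos_match_rule, path_request, qos_match) and the example data are hypothetical and not taken from OpenSM, node groups are reduced to integer ids with each endpoint belonging to a single group, and an empty list in a rule is assumed to match any value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none come from OpenSM or the RFC. */

#define QOS_ANY_GROUP (-1)              /* wildcard for a node-group field */

struct u64_range { uint64_t lo, hi; };  /* inclusive Service-ID range */
struct u8_range  { uint8_t  lo, hi; };  /* inclusive TClass range */

struct qos_match_rule {
    int src_group;                       /* node-group id or QOS_ANY_GROUP */
    int dst_group;                       /* node-group id or QOS_ANY_GROUP */
    const struct u64_range *service_ids; /* empty list: match any */
    size_t n_service_ids;
    const struct u8_range *tclasses;     /* empty list: match any */
    size_t n_tclasses;
    int qos_level;                       /* QoS-Level selected on match */
};

struct path_request {
    int src_group;
    int dst_group;
    uint64_t service_id;
    uint8_t tclass;
};

static int in_u64_ranges(const struct u64_range *r, size_t n, uint64_t v)
{
    size_t i;

    if (n == 0)
        return 1;                        /* no restriction on this field */
    for (i = 0; i < n; i++)
        if (v >= r[i].lo && v <= r[i].hi)
            return 1;
    return 0;
}

static int in_u8_ranges(const struct u8_range *r, size_t n, uint8_t v)
{
    size_t i;

    if (n == 0)
        return 1;                        /* no restriction on this field */
    for (i = 0; i < n; i++)
        if (v >= r[i].lo && v <= r[i].hi)
            return 1;
    return 0;
}

static int group_matches(int rule_group, int req_group)
{
    return rule_group == QOS_ANY_GROUP || rule_group == req_group;
}

/* Return the QoS-Level of the first rule whose expressions all match,
 * or default_level when no rule applies. */
int qos_match(const struct qos_match_rule *rules, size_t n_rules,
              const struct path_request *req, int default_level)
{
    size_t i;

    assert(req != NULL);
    for (i = 0; i < n_rules; i++) {
        const struct qos_match_rule *r = &rules[i];

        if (group_matches(r->src_group, req->src_group) &&
            group_matches(r->dst_group, req->dst_group) &&
            in_u64_ranges(r->service_ids, r->n_service_ids, req->service_id) &&
            in_u8_ranges(r->tclasses, r->n_tclasses, req->tclass))
            return r->qos_level;
    }
    return default_level;
}

/* Made-up example data: rule 0 matches by TClass only, rule 1 matches
 * requests to node group 1 with one of two Service-IDs. */
static const struct u8_range example_tclasses[] = { {7, 9}, {11, 11} };
static const struct u64_range example_sids[] = { {22, 22}, {4719, 4719} };
static const struct qos_match_rule example_rules[] = {
    { QOS_ANY_GROUP, QOS_ANY_GROUP, NULL, 0, example_tclasses, 2, 2 },
    { QOS_ANY_GROUP, 1, example_sids, 2, NULL, 0, 3 },
};
```

With this example data, a request that satisfies both rules is assigned the level of the earlier rule, and a request that satisfies neither falls back to the default level passed by the caller (for instance the default QoS-Level of its partition).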
Storage our SRP storage targets 0x1000000000000001 0x1000000000000002 Virtual Servers node desc and IB port # vs1/HCA-1/P1 vs3/HCA-1/P1 vs3/HCA-2/P1 Partition 1 default settings Part1 Routers all routers ROUTER Part1 * * 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 Storage * 1 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1 Storage * 0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1 8:255,9:127,10:63,11:31,12:15,13:7,14:3 10 1 for the lowest priority comm 16 2 low latency best bandwidth 0 7 3 just an example 0 32 1 1 1 low latency by tclass 7-9 or 11> 7-9,11 1 2 Storage targets connection> Storage 22,4719 3

4. IPoIB
---------
IPoIB already queries the SA for its broadcast group information. The additional functionality required is for IPoIB to provide the broadcast group SL, MTU, RATE and TClass in every subsequent PathRecord query performed when a new UDAV is needed by IPoIB. We could assign a special Service-ID for IPoIB use, but since all communication on the same IPoIB interface shares the same QoS-Level, without the ability to differentiate it by target service, we can ignore it for simplicity.

5. CMA features
----------------
The CMA interface supports Service-ID through the notion of a port space used as a prefix to the port_num, which is part of the sockaddr provided to rdma_resolve_addr(). What is missing is an explicit request for a TClass, which would allow the ULP (like SDP) to propagate a specific request for a class of service. A mechanism for providing the TClass is available in the IPv6 address, so we could use that address field. Another option is to implement a special connection options API for the CMA.

Also missing from the CMA is the use of the provided TClass and Service-ID in the PathRecord request it sends. When a response is obtained, it is an existing requirement for the CMA to use the PathRecord from the response in setting up the QP address vector.

6. SDP
-------
SDP uses CMA for building its connections.
The Service-ID for SDP is 0x000000000001PPPP, where PPPP are 4 hex digits
holding the remote TCP/IP Port Number to connect to. SDP might be provided with
the SO_PRIORITY socket option. In that case the value provided should be sent
to the CMA as the TClass option of that connection.

7. SRP
-------

The current SRP implementation uses its own CM callbacks (not CMA). So SRP
should fill in the Service-ID in the PathRecord by itself and use that
information in setting up the QP. The T10 SRP standard defines the SRP
Service-ID to be defined by the SRP target I/O Controller (but they should also
comply with IBTA Service-ID rules). Anyway, the Service-ID is reported by the
I/O Controller in the ServiceEntries DMA attribute and should be used in the
PathRecord if the SA reports its ability to handle QoS PathRecords.

8. iSER
--------

iSER uses CMA and thus should be very close to SDP. The Service-ID for iSER
is TBD.

9. OpenSM features
-------------------

The QoS related functionality to be provided by OpenSM can be split into two
main parts:

3.1. Fabric Setup

During fabric initialization the SM should parse the policy and apply its
settings to the discovered fabric elements. The following actions should be
performed:

* Parsing of policy
* Node Group identification. A warning should be provided for each node not
  specified but found.
* SL2VL settings validation should be checked:
  + A warning will be provided if there are no matching targets for the SL2VL
    setting statement.
  + An error message will be printed to the log file if an invalid setting is
    found. A setting is invalid if it refers to:
    - Non-existing port numbers of the target devices
    - Unsupported VLs for the target device. In the latter case the map to
      non-existing VLs should be replaced by VL15, i.e. packets will be
      dropped.
* SL2VL setting is to be performed
* VL Arbitration table settings should be validated according to the following
  rules:
  + A warning will be provided if there are no matching targets for the setting
    statement
  + An error will be provided if the port number exceeds the target ports
  + An error will be generated if the table length exceeds device capabilities
  + A warning will be generated if the table quotes a VL that is not supported
    by the target device
* VL Arbitration tables will be set on the appropriate targets

3.2. PathRecord query handling:

OpenSM should be able to enforce the provided policy on client request. The
overall flow for such requests is: first the request is matched against the
defined match rules such that the target QoS-Level definition is found. Given
the QoS-Level a path(s) search is performed with the given restrictions imposed
by that level. The following two sections describe these steps.

One issue not standardized by the IBTA is how Service-ID is carried in the
PathRecord and MultiPathRecord attributes. There are basically two options:

a. Replace the SM-Key field by the Service-ID. In that case no component mask
   bit will be assigned to it, so if the field is zero we should treat it as
   if the component mask bit is clear.
b. Encode it into spare fields. For PathRecord the first two fields are
   reserved and are 64 bit when combined. The first component mask bit maps to
   the first reserved field and should be used for Service-ID masking. For the
   MultiPathRecord attribute there are no adjacent reserved fields that make a
   64 bit field. So the reserved field following the packet-lifetime (8 bits)
   combined with the reserved field DGIDCount (56 bits) can make the
   Service-ID. In this case also the first reserved field component mask bit
   should be used as the Service-ID component mask bit.

3.2.1. Matching rule search:

A rule is "matching" a PathRecord request using the following criteria:

* Matching rules provide values in a list of either single values or ranges of
  values. A PathRecord field is "matching" the rule field if it is explicitly
  noted in the list of values or is one of the values covered by a range
  included in the field values list.
* Only PathRecord fields that have their component mask bit set should be
  compared.
* For a rule to be "matching" a PathRecord request all the rule fields should
  be "matching" their PathRecord fields. So a PathRecord request that does not
  have a component mask bit set for one of the rule defined fields cannot
  match that rule.
* A PathRecord request that has a component mask bit set for one of the fields
  that is not defined by the rule can match the rule.

The algorithm to be used for searching for a rule match might be as simple as a
sequential search through all rules or enhanced for better performance.

The semantics of every rule field and its matching PathRecord field are
described below:

* Source: the SGID or SLID should be part of this group
* Destination: the DGID or DLID should be part of this group
* Service-ID: check if the requested Service-ID (available in the PathRecord
  old SM-Key field) is matching any of this rule's Service-IDs
* TClass: check if the PathRecord TClass field is matching

3.2.2 PathRecord response generation:

The QoS-Level pointed to by the first rule that matches the PathRecord request
should be used for obtaining the response SL, MTU-Limit, RATE-Limit, Path-Bits
and TClass. A default QoS-Level should be used if no rule is matching the
query.

The efficient algorithm for finding paths that meet the QoS-Level criteria is
beyond the scope of this RFC and left for the implementer to provide. However
the criteria by which the paths match the QoS-Level are described below:

* SL: The paths found should all use the given SL.
  For that sake the PathRecord algorithm should traverse the path from source
  to destination only through ports that carry a valid VL (not VL15) by the
  SL2VL map (should consider input and output ports and SL).
* MTU-Limit: The resulting paths MTU should not exceed the given MTU-Limit
* Rate-Limit: The resulting paths RATE should not exceed the given RATE-Limit
  (rate limit is given in units of link BW = Width*Speed according to IBTA
  Specification Vol-1 table-205 p-901 l-24).
* Path-Bits: define the target LID lowest bits (number of bits defined by the
  target port PortInfo.LMC field). The path should traverse the LFT using the
  target port LID with the path-bits set.
* TClass: should be returned in the result PathRecord. When routing is going to
  be supported by OpenSM we might use this field in selecting the target
  router too in a TBD way.

From rdreier at cisco.com Tue May 30 07:43:41 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 May 2006 07:43:41 -0700
Subject: [openib-general] Re: QoS RFC
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876C@mtlexch01.mtl.com> (Eitan Zahavi's message of "Tue, 30 May 2006 17:41:05 +0300")
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876C@mtlexch01.mtl.com>
Message-ID:

 > Service-ID:
 > * For PathRecord: use the first 2 reserved fields which are 32bits each
 >   (component masks 0x1 and 0x2). Component mask 1 should be used to
 >   refer to the merged Service-ID field
 > A new capability bit should describe the SM QoS support in the SA class
 > port info. This approach provides an easy migration path for existing
 > access layer and ULPs by not introducing a new attribute.

This is OK but it's sort of a pain to have to query SA ClassPortInfo all the
time. Do you have a plan for how to make this transparent to ULPs?

(BTW something in your email client is really messing up the formatting of your
message)

 - R.
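For readers following the Service-ID discussion, here is a minimal sketch of
the packing quoted above: two 32-bit reserved PathRecord fields merged into one
64-bit Service-ID, plus the SDP Service-ID layout 0x000000000001PPPP from the
RFC. All names are hypothetical, and which reserved field carries the upper 32
bits is an assumption, since the RFC does not say. Values are in host byte
order; wire encoding is not shown.

```c
#include <stdint.h>

/* Component mask bit proposed in the RFC for the merged Service-ID field. */
#define PR_COMP_SERVICE_ID 0x1ull

/* Assumption: the first reserved field holds the upper 32 bits. */
static uint64_t service_id_merge(uint32_t resv0, uint32_t resv1)
{
    return ((uint64_t)resv0 << 32) | resv1;
}

/* Inverse of service_id_merge(): split a Service-ID into the two fields. */
static void service_id_split(uint64_t sid, uint32_t *resv0, uint32_t *resv1)
{
    *resv0 = (uint32_t)(sid >> 32);
    *resv1 = (uint32_t)(sid & 0xffffffffu);
}

/* SDP Service-ID per the RFC: 0x000000000001PPPP, PPPP = remote TCP port. */
static uint64_t sdp_service_id(uint16_t tcp_port)
{
    return 0x0000000000010000ull | tcp_port;
}
```

For example, sdp_service_id(0x1234) gives 0x0000000000011234, and a requester
would set PR_COMP_SERVICE_ID in the component mask alongside the merged value.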
From rdreier at cisco.com Tue May 30 07:44:33 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 May 2006 07:44:33 -0700
Subject: [openib-general] Re: QoS RFC
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876C@mtlexch01.mtl.com> (Eitan Zahavi's message of "Tue, 30 May 2006 17:41:05 +0300")
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876C@mtlexch01.mtl.com>
Message-ID:

BTW I think these changes to PathRecord and MultiPathRecord need to be
standardized through IBTA before we implement it in Linux, to avoid a
non-standard implementation proliferating everywhere.

From rdreier at cisco.com Tue May 30 07:46:08 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 May 2006 07:46:08 -0700
Subject: [openib-general] Re: NOP problem in ib_mthca on OFED RC4
In-Reply-To: (Don Albert's message of "Tue, 30 May 2006 07:35:18 -0700")
References:
Message-ID:

    Don> It is rather difficult for me to debug this sort of hang,
    Don> since I telecommute from Tucson and the machines are located
    Don> in Phoenix. Anyone have any suggestions?

cat /proc//wchan for the process in question.

"echo t > /proc/sysrq-trigger" will produce copious output that might help as
well.

 - R.

From eitan at mellanox.co.il Tue May 30 07:51:50 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Tue, 30 May 2006 17:51:50 +0300
Subject: [openib-general] RE: QoS RFC
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876F@mtlexch01.mtl.com>

Hi Roland,

>
> This is OK but it's sort of a pain to have to query SA ClassPortInfo
> all the time. Do you have a plan for how to make this transparent to ULPs?
[EZ] Well, a ULP that uses CMA will have it handled by CMA...
But an old SM implementation that does not support this kind of
PathRecord extension will probably choke on the new fields once their
component mask bits are set.
You could however query once for each Client-Reregister event.
>
> (BTW something in your email client is really messing up the
> formatting of your message)
[EZ] Thanks, I will resend.
>
> - R.

From eitan at mtlpx01.yok.mtl.com Tue May 30 07:53:29 2006
From: eitan at mtlpx01.yok.mtl.com (Eitan Zahavi)
Date: Tue, 30 May 2006 17:53:29 +0300
Subject: [openib-general] QoS RFC - Resend using a friendly mailer
Message-ID:

To: OPENIB
Subject: QoS RFC - Resend using a friendly mailer
--text follows this line--
Hi All

Please find the attached RFC describing how QoS policy support could be
implemented in the OpenFabrics stack. Your comments are welcome.

Eitan

RFC: OpenFabrics Enhancements for QoS Support
===============================================

Authors:  Eitan Zahavi
Date:     May 2006
Revision: 0.1

Table of contents:
1. Overview
2. Architecture
3. Supported Policy
4. IPoIB functionality
5. CMA functionality
6. SDP functionality
7. SRP functionality
8. iSER functionality
9. OpenSM functionality

1. Overview
------------

Quality of Service requirements stem from the realization of I/O consolidation
over an IB network: as multiple applications and ULPs share the same fabric,
means to control their use of the network resources are becoming a must. The
basic need is to differentiate the service levels provided to different
traffic flows, so that a policy can be enforced to control each flow's
utilization of the fabric resources.

The IBTA specification defines several hardware features and management
interfaces to support QoS:

* Up to 15 Virtual Lanes (VL) could carry traffic in a non-blocking manner
* Arbitration between traffic of different VLs is performed by a 2 priority
  levels weighted round robin arbiter.
  The arbiter is programmable with a sequence of (VL, weight) pairs and a
  maximal number of high priority credits to be processed before low priority
  is served
* Packets carry class of service marking in the range 0 to 15 in their header
  SL field
* Each switch can map the incoming packet by its SL to a particular output VL
  based on a programmable table VL=SL-to-VL-MAP(in-port, out-port, SL)
* The Subnet Administrator controls each communication flow's parameters by
  providing them as a response to a Path Record query

The IB QoS features provide the means to implement a DiffServ like
architecture. The DiffServ architecture (IETF RFC 2474 and RFC 2475) is widely
used today in highly dynamic fabrics.

This proposal provides the detailed functional definition for the various
software elements that are required to enable a DiffServ like architecture
over the OpenFabrics software stack.

2. Architecture
----------------

This proposal splits the QoS functionality between the SM/SA, CMA and the
various ULPs. We take the "chronology approach" to describe how the overall
system works:

2.1. The network manager (human) provides a set of rules (policy) that defines
how the network is being configured and how its resources are split to
different QoS-Levels. The policy also defines how to decide which QoS-Level
each application or ULP or service uses.

2.2. The SM analyzes the provided policy to see if it is realizable and
performs the necessary fabric setup. The SM may continuously monitor the policy
and adapt to changes in it. Part of this policy defines the default QoS-Level
of each partition. The SA is being enhanced to match the requested Source,
Destination, TClass, Service-ID (and optionally SL and priority) against the
policy, so clients (ULPs, programs) can obtain a policy enforced QoS. The SM is
also enhanced to support setting up partitions with an appropriate IPoIB
broadcast group. This broadcast group carries its QoS attributes: TClass, SL,
MTU and RATE.

2.3. IPoIB is being set up.
IPoIB uses the SL, MTU and RATE available on the multicast group which forms
the broadcast group of this partition.

2.4. MPI which provides non IB based connection management should be
configured to run using hard coded SLs. It uses these SLs in every QP being
opened.

2.5. ULPs that use the CM interface (like SRP) should have their own
pre-assigned Service-ID and use it while obtaining a PathRecord for
establishing their connections. The SA receiving the PathRecord should match it
against the policy and return the appropriate PathRecord including SL, MTU,
RATE and TClass.

2.6. ULPs and programs using CMA to establish an RC connection should provide
the CMA the target IP and Service-ID. Some of the ULPs might also provide
TClass (e.g. for SDP sockets that are provided the TOS socket option). The CMA
should then use the provided Service-ID and optional TClass and pass them in
the PathRecord request. The resulting PathRecord should be used for configuring
the connection QP.

PathRecord and MultiPathRecord enhancement for QoS:

As mentioned above the PathRecord and MultiPathRecord attributes should be
enhanced to carry the Service-ID which is a 64bit value. Given the existing
definition for these attributes we propose to use the following fields for
Service-ID:

* For PathRecord: use the first 2 reserved fields which are 32bits each
  (component masks 0x1 and 0x2). Component mask 1 should be used to refer to
  the merged Service-ID field
* For MultiPathRecord: use 2 reserved fields:
  1. after the packet life (8 bits) which is component mask bit 0x10000 (17)
  2. the field before SDGID1 (56 bits) which is component mask bit
     0x200000 (22)
  Once merged they should be selected using component mask bit 0x10000 (17)

A new capability bit should describe the SM QoS support in the SA class port
info. This approach provides an easy migration path for existing access layer
and ULPs by not introducing a new attribute.

3. Supported Policy
--------------------

The QoS policy supported by this proposal is divided into 4 subsections:

* Node Group: a set of HCAs, Routers or Switches that share the same settings.
  A node group might be a partition defined by the partition manager policy in
  terms of GUIDs. Future implementations might provide support for
  NodeDescription based definition of node groups.

* Fabric Setup: Defines how the SL2VL and VLArb tables should be set up. This
  policy definition assumes the computation of target behavior should be
  performed outside of OpenSM.

* QoS-Levels Definition: This section defines the possible sets of parameters
  for QoS that a client might be mapped to. Each set holds: SL and optionally:
  Max MTU, Max Rate, Path Bits (in case LMC > 0 is used for QoS) and TClass.

* Matching Rules: A list of rules that match an incoming PathRecord request to
  a QoS-Level. The rules are processed in order, so that the first match is
  applied. Each rule is built out of a set of match expressions which should
  all match for the rule to apply. The matching expressions are defined for
  the following fields:
  ** SRC and DST to lists of node groups
  ** Service-ID to a list of Service-ID or Service-ID ranges
  ** TClass to a list of TClass values or ranges

XML style syntax is provided for the policy file. However, a strict BNF format
(provided in section 8) should be used for parsing it.

Storage our SRP storage targets 0x1000000000000001 0x1000000000000002 Virtual Servers node desc and IB port # vs1/HCA-1/P1 vs3/HCA-1/P1 vs3/HCA-2/P1 Partition 1 default settings Part1 Routers all routers ROUTER Part1 * * 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 Storage * 1 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1 Storage * 0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1 8:255,9:127,10:63,11:31,12:15,13:7,14:3 10 1 for the lowest priority comm 16 2 low latency best bandwidth 0 7 3 just an example 0 32 1 1 1 low latency by tclass 7-9 or 11> 7-9,11 1 2 Storage targets connection> Storage 22,4719 3

4. IPoIB
---------

IPoIB already queries the SA for its broadcast group information. The
additional functionality required is for IPoIB to provide the broadcast group
SL, MTU, RATE and TClass in every following PathRecord query performed when a
new UDAV is needed by IPoIB. We could assign a special Service-ID for IPoIB
use, but since all communication on the same IPoIB interface shares the same
QoS-Level without the ability to differentiate it by target service, we can
ignore it for simplicity.

5. CMA features
----------------

The CMA interface supports Service-ID through the notion of port space as a
prefix to the port_num which is part of the sockaddr provided to
rdma_resolve_addr(). What is missing is the explicit request for a TClass that
should allow the ULP (like SDP) to propagate a specific request for a class of
service. A mechanism for providing the TClass is available in the IPv6 address,
so we could use that address field. Another option is to implement a special
connection options API for CMA.

What CMA also lacks is the use of the provided TClass and Service-ID in the
PathRecord it sends. When a response is obtained it is an existing requirement
for the CMA to use the PathRecord from the response in setting up the QP
address vector.

6. SDP
-------

SDP uses CMA for building its connections. The Service-ID for SDP is
0x000000000001PPPP, where PPPP are 4 hex digits holding the remote TCP/IP Port
Number to connect to. SDP might be provided with the SO_PRIORITY socket option.
In that case the value provided should be sent to the CMA as the TClass option
of that connection.

7. SRP
-------

The current SRP implementation uses its own CM callbacks (not CMA). So SRP
should fill in the Service-ID in the PathRecord by itself and use that
information in setting up the QP. The T10 SRP standard defines the SRP
Service-ID to be defined by the SRP target I/O Controller (but they should also
comply with IBTA Service-ID rules).
Anyway, the Service-ID is reported by the I/O Controller in the ServiceEntries
DMA attribute and should be used in the PathRecord if the SA reports its
ability to handle QoS PathRecords.

8. iSER
--------

iSER uses CMA and thus should be very close to SDP. The Service-ID for iSER
is TBD.

9. OpenSM features
-------------------

The QoS related functionality to be provided by OpenSM can be split into two
main parts:

3.1. Fabric Setup

During fabric initialization the SM should parse the policy and apply its
settings to the discovered fabric elements. The following actions should be
performed:

* Parsing of policy
* Node Group identification. A warning should be provided for each node not
  specified but found.
* SL2VL settings validation should be checked:
  + A warning will be provided if there are no matching targets for the SL2VL
    setting statement.
  + An error message will be printed to the log file if an invalid setting is
    found. A setting is invalid if it refers to:
    - Non-existing port numbers of the target devices
    - Unsupported VLs for the target device. In the latter case the map to
      non-existing VLs should be replaced by VL15, i.e. packets will be
      dropped.
* SL2VL setting is to be performed
* VL Arbitration table settings should be validated according to the following
  rules:
  + A warning will be provided if there are no matching targets for the setting
    statement
  + An error will be provided if the port number exceeds the target ports
  + An error will be generated if the table length exceeds device capabilities
  + A warning will be generated if the table quotes a VL that is not supported
    by the target device
* VL Arbitration tables will be set on the appropriate targets

3.2. PathRecord query handling:

OpenSM should be able to enforce the provided policy on client request. The
overall flow for such requests is: first the request is matched against the
defined match rules such that the target QoS-Level definition is found.
Given the QoS-Level a path(s) search is performed with the given restrictions
imposed by that level. The following two sections describe these steps.

One issue not standardized by the IBTA is how Service-ID is carried in the
PathRecord and MultiPathRecord attributes. There are basically two options:

a. Replace the SM-Key field by the Service-ID. In that case no component mask
   bit will be assigned to it, so if the field is zero we should treat it as
   if the component mask bit is clear.
b. Encode it into spare fields. For PathRecord the first two fields are
   reserved and are 64 bit when combined. The first component mask bit maps to
   the first reserved field and should be used for Service-ID masking. For the
   MultiPathRecord attribute there are no adjacent reserved fields that make a
   64 bit field. So the reserved field following the packet-lifetime (8 bits)
   combined with the reserved field DGIDCount (56 bits) can make the
   Service-ID. In this case also the first reserved field component mask bit
   should be used as the Service-ID component mask bit.

3.2.1. Matching rule search:

A rule is "matching" a PathRecord request using the following criteria:

* Matching rules provide values in a list of either single values or ranges of
  values. A PathRecord field is "matching" the rule field if it is explicitly
  noted in the list of values or is one of the values covered by a range
  included in the field values list.
* Only PathRecord fields that have their component mask bit set should be
  compared.
* For a rule to be "matching" a PathRecord request all the rule fields should
  be "matching" their PathRecord fields. So a PathRecord request that does not
  have a component mask bit set for one of the rule defined fields cannot
  match that rule.
* A PathRecord request that has a component mask bit set for one of the fields
  that is not defined by the rule can match the rule.
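The matching criteria above can be sketched as a simple sequential,
first-match search. This is only an illustration under stated assumptions: the
types and names below are hypothetical (they are not OpenSM structures), only
the Service-ID and TClass fields are modeled, and the source/destination node
group checks are left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not OpenSM structures. */
struct value_range { uint64_t lo, hi; };   /* a single value has lo == hi */

enum { FLD_SERVICE_ID, FLD_TCLASS, FLD_COUNT };

struct rule_field {
    const struct value_range *ranges;      /* list of values and ranges */
    size_t n_ranges;                       /* 0: field not defined by the rule */
};

struct match_rule {
    struct rule_field fields[FLD_COUNT];
    int qos_level;                         /* QoS-Level this rule points to */
};

struct pr_request {
    uint32_t comp_mask;                    /* bit i set: values[i] is valid */
    uint64_t values[FLD_COUNT];
};

/* A value matches if it is listed or covered by one of the ranges. */
static bool field_matches(const struct rule_field *f, uint64_t v)
{
    for (size_t i = 0; i < f->n_ranges; i++)
        if (v >= f->ranges[i].lo && v <= f->ranges[i].hi)
            return true;
    return false;
}

static bool rule_matches(const struct match_rule *r,
                         const struct pr_request *req)
{
    for (int i = 0; i < FLD_COUNT; i++) {
        const struct rule_field *f = &r->fields[i];

        if (f->n_ranges == 0)
            continue;          /* rule does not define it: request may set it */
        if (!(req->comp_mask & (1u << i)))
            return false;      /* rule defines it, request has no mask bit */
        if (!field_matches(f, req->values[i]))
            return false;
    }
    return true;
}

/* Sequential search: the first matching rule wins, else the default level. */
static int find_qos_level(const struct match_rule *rules, size_t n_rules,
                          const struct pr_request *req, int default_level)
{
    for (size_t i = 0; i < n_rules; i++)
        if (rule_matches(&rules[i], req))
            return rules[i].qos_level;
    return default_level;
}
```

With a rule defining TClass as 7-9,11, a request carrying TClass 8 with its
mask bit set matches, while the same request with the mask bit clear falls
through to later rules or the default QoS-Level.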
The algorithm to be used for searching for a rule match might be as simple as a
sequential search through all rules or enhanced for better performance.

The semantics of every rule field and its matching PathRecord field are
described below:

* Source: the SGID or SLID should be part of this group
* Destination: the DGID or DLID should be part of this group
* Service-ID: check if the requested Service-ID (available in the PathRecord
  old SM-Key field) is matching any of this rule's Service-IDs
* TClass: check if the PathRecord TClass field is matching

3.2.2 PathRecord response generation:

The QoS-Level pointed to by the first rule that matches the PathRecord request
should be used for obtaining the response SL, MTU-Limit, RATE-Limit, Path-Bits
and TClass. A default QoS-Level should be used if no rule is matching the
query.

The efficient algorithm for finding paths that meet the QoS-Level criteria is
beyond the scope of this RFC and left for the implementer to provide. However
the criteria by which the paths match the QoS-Level are described below:

* SL: The paths found should all use the given SL. For that sake the
  PathRecord algorithm should traverse the path from source to destination
  only through ports that carry a valid VL (not VL15) by the SL2VL map (should
  consider input and output ports and SL).
* MTU-Limit: The resulting paths MTU should not exceed the given MTU-Limit
* Rate-Limit: The resulting paths RATE should not exceed the given RATE-Limit
  (rate limit is given in units of link BW = Width*Speed according to IBTA
  Specification Vol-1 table-205 p-901 l-24).
* Path-Bits: define the target LID lowest bits (number of bits defined by the
  target port PortInfo.LMC field). The path should traverse the LFT using the
  target port LID with the path-bits set.
* TClass: should be returned in the result PathRecord. When routing is going to
  be supported by OpenSM we might use this field in selecting the target
  router too in a TBD way.
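As a rough illustration of the response criteria above, the sketch below
checks one candidate path against a QoS-Level. It is not the path search
itself, which the RFC leaves to the implementer. Everything here is
hypothetical: the structures are not OpenSM's, MTU is expressed in bytes and
rate in abstract Width*Speed units rather than the IBTA encodings, and
"should not exceed" is read as clamping the reported value to the limit, which
is one possible interpretation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VL15 15u

/* Hypothetical per-hop summary; not an OpenSM structure. */
struct hop {
    uint8_t  sl2vl[16];    /* SL -> VL for this hop's (in-port, out-port) */
    uint32_t mtu_bytes;    /* link MTU in bytes (assumption, not IBTA enum) */
    uint32_t rate_units;   /* link BW in abstract Width*Speed units */
};

struct qos_level {
    uint8_t  sl;
    uint32_t mtu_limit;
    uint32_t rate_limit;
    uint8_t  path_bits;
};

struct path_result {
    uint32_t mtu_bytes;
    uint32_t rate_units;
    uint16_t dlid;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * Returns false if any hop maps the SL to VL15 (packets would be dropped).
 * Otherwise fills *out with the path MTU and rate clamped to the QoS-Level
 * limits, and the target LID with the path-bits applied under the LMC mask.
 */
static bool path_apply_qos(const struct hop *hops, size_t n_hops,
                           const struct qos_level *q,
                           uint16_t base_lid, uint8_t lmc,
                           struct path_result *out)
{
    uint32_t mtu = q->mtu_limit;
    uint32_t rate = q->rate_limit;

    if (q->sl > 15 || lmc > 7)
        return false;

    for (size_t i = 0; i < n_hops; i++) {
        if (hops[i].sl2vl[q->sl] >= VL15)
            return false;
        mtu = min_u32(mtu, hops[i].mtu_bytes);
        rate = min_u32(rate, hops[i].rate_units);
    }

    out->mtu_bytes = mtu;
    out->rate_units = rate;
    out->dlid = (uint16_t)(base_lid | (q->path_bits & ((1u << lmc) - 1u)));
    return true;
}
```

A real implementation would run a check like this per candidate path during
the search and would take the SL2VL entries from the actual input and output
ports of each traversed switch.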
From Don.Albert at Bull.com Tue May 30 07:55:34 2006
From: Don.Albert at Bull.com (Don.Albert at Bull.com)
Date: Tue, 30 May 2006 07:55:34 -0700
Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5
In-Reply-To: <1148695049.4358.5966.camel@hal.voltaire.com>
Message-ID:

Hal,

With your patch to OpenSM, I think everything is ok on the local node. The
remote node is definitely having some problems, resulting in not responding to
the MAD packets. I have entered a separate message on the problems with the
"ib0" interface on that machine.
>
> On Fri, 2006-05-26 at 20:59, Hal Rosenstock wrote:
> > > What next, coach?
> >
> > Can you turn on madeye on the remote node and see what packets are
> > received and sent ? Let me know if you need help with that. I think you
> > said you were running OFED, right ?
>

Yes, I am running kernel 2.6.16 with the OFED RC5 release. I will investigate
how to run madeye, but the hangs on the remote machine are probably the root
cause of the link failure.

> I don't think madeye is part of OFED :-( Can it get added for RC6,
> Tziporet ? I think it would be a useful tool to add for problems like
> this.
>
> Also, was this a working setup before ? Did anything else change besides
> installing RC5 on both nodes ?
>

This back to back setup was working originally with a backported 2.6.11-34
kernel and I believe it was revision 6500 from the OpenIB svn trunk at that
time. The problems started when I tried to move to RC4 and now RC5 of the OFED
release, with the 2.6.16 kernel.

> I have two more experiments I'd like you to try, before we go down the
> madeye "route":
>
> 1. Do you have another IB cable to try ?
>
> 2. Can you completely shutdown and repower the remote node and see if it
> starts responding ?
>

It is difficult for me to debug this sort of thing, since I telecommute from
Tucson and the machines are located in Phoenix. But I can get someone there to
power the machine down and reboot.

-Don Albert-

From eitan at mellanox.co.il Tue May 30 08:02:45 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Tue, 30 May 2006 18:02:45 +0300
Subject: [openib-general] RE: QoS RFC
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368770@mtlexch01.mtl.com>

Hi Roland,

> BTW I think these changes to PathRecord and MultiPathRecord need to be
> standardized through IBTA before we implement it in Linux, to avoid a
> non-standard implementation proliferating everywhere.
These extensions are already being discussed in the IBTA LWG. It will take some
time before they are resolved. I think it is worthwhile to show there is an
implementation plan, to make that process progress even faster.

Eitan

From rdreier at cisco.com Tue May 30 07:58:53 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 May 2006 07:58:53 -0700
Subject: [openib-general] Re: QoS RFC
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876F@mtlexch01.mtl.com> (Eitan Zahavi's message of "Tue, 30 May 2006 17:51:50 +0300")
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876F@mtlexch01.mtl.com>
Message-ID:

 > [EZ] Well, a ULP that uses CMA will have it handled by CMA...
 > But an old SM implementation that does not support this kind of
 > PathRecord extension will probably choke on the new fields once their
 > component mask bits are set.
 > You could however query once for each Client-Reregister event.

Right, but for example SRP cannot use the CMA because the SRP protocol does not
use IP addressing.

It seems that the SA query module is really the right place to handle the query
of ClassPortInfo, caching results, invalidating cache on client-reregister,
etc.

 - R.

From halr at voltaire.com Tue May 30 07:50:09 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 30 May 2006 10:50:09 -0400
Subject: [openib-general] RE: QoS RFC
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876F@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876F@mtlexch01.mtl.com>
Message-ID: <1149000608.4358.113315.camel@hal.voltaire.com>

On Tue, 2006-05-30 at 10:51, Eitan Zahavi wrote:
> Hi Roland,
>
> >
> > This is OK but it's sort of a pain to have to query SA ClassPortInfo
> > all the time. Do you have a plan for how to make this transparent to
> > ULPs?
> [EZ] Well, a ULP that uses CMA will have it handled by CMA...
> But an old SM implementation that does not support this kind of
> PathRecord extension will probably choke on the new fields once their
> component mask bits are set.

What do you mean by "choke" ? Wouldn't the new components just be ignored ?

> You could however query once for each Client-Reregister event.
> >
> > (BTW something in your email client is really messing up the
> > formatting of your message)
> [EZ] Thanks I will resend .
> >
> > - R.

From eitan at mellanox.co.il Tue May 30 08:06:54 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Tue, 30 May 2006 18:06:54 +0300
Subject: [openib-general] RE: QoS RFC
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368772@mtlexch01.mtl.com>

> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, May 30, 2006 5:50 PM
> To: Eitan Zahavi
> Cc: Roland Dreier; openib-general at openib.org; Nimrod Gindi; Aviram Gutman;
> Sasha Khapyorsky; sean.hefty at intel.com; Vu Pham; Roland Dreier
> Subject: RE: QoS RFC
>
> On Tue, 2006-05-30 at 10:51, Eitan Zahavi wrote:
> > Hi Roland,
> > >
> > > This is OK but it's sort of a pain to have to query SA ClassPortInfo
> > > all the time. Do you have a plan for how to make this transparent to
> > > ULPs?
> > [EZ] Well, a ULP that uses CMA will have it handled by CMA...
> > But an old SM implementation that does not support this kind of
> > PathRecord extension will probably choke on the new fields once their
> > component mask bits are set.
>
> What do you mean by "choke" ? Wouldn't the new components just be
> ignored ?
[EZ] By choke I mean that the SA might decide to reject the request as having
an invalid parameter.
>
> > You could however query once for each Client-Reregister event.
> > >
> > > (BTW something in your email client is really messing up the
> > > formatting of your message)
> > [EZ] Thanks I will resend .
> > >
> > > - R.
> From yipeeyipeeyipeeyipee at yahoo.com Tue May 30 08:02:43 2006 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Tue, 30 May 2006 15:02:43 +0000 (UTC) Subject: [openib-general] Re: special qp's creation from userspace References: <1148995226.4358.111151.camel@hal.voltaire.com> Message-ID: Hal Rosenstock <halr at voltaire.com> writes: > On Tue, 2006-05-30 at 05:35, yipee wrote: > > Can I use ib_mad_port_close() (mad.c) to close qp0 & qp1 > > Yes, that would close QP0/1 on each port. Just unloading the ib_mad > module will have that effect. Would this unloading cause the port status to downgrade from ACTIVE to some other state? Does ib_mad tell the active SM that it is going down? What would happen to active RC connections? Would they be affected by this? Thanks, x From eitan at mellanox.co.il Tue May 30 08:08:51 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 30 May 2006 18:08:51 +0300 Subject: [openib-general] RE: QoS RFC Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368773@mtlexch01.mtl.com> > > > [EZ] Well, a ULP that uses CMA will have it handled by CMA... > > But an old SM implementation that does not support this kind of > > PathRecord extension will probably choke on the new fields once their > > component mask bits are set. > > You could however query once for each Client-Reregister event. > > Right, but for example SRP cannot use the CMA because the SRP protocol > does not use IP addressing. > > It seems that the SA query module is really the right place to handle > the query of ClassPortInfo, caching results, invalidating cache on > client-reregister, etc. [EZ] Yes, I agree the SA client might be the right place for dealing with the different optional SA capabilities. > > - R.
From paul.lundin at gmail.com Tue May 30 08:06:01 2006 From: paul.lundin at gmail.com (Paul) Date: Tue, 30 May 2006 11:06:01 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: <1148695049.4358.5966.camel@hal.voltaire.com> Message-ID: Hi All, I will be working on this as time permits this week. Unfortunately my employer is not crazy about giving out remote access, so I will have to be your hands on this. If you want me to do something just tell me what it is. I know its a pain I have been there myself. Regards. On 5/30/06, Don.Albert at bull.com wrote: > > > Hal, > > With your patch to OpenSM, I think everything is ok on the local node. > The remote node is definitely having some problems, resulting in not > responding to the MAD packets. I have entered a separate message on the > problems with the "ib0" interface on that machine. > > > > > On Fri, 2006-05-26 at 20:59, Hal Rosenstock wrote: > > > > What next, coach? > > > > > > Can you turn on madeye on the remote node and see what packets are > > > received and sent ? Let me know if you need help with that. I think > you > > > said you were running OFED, right ? > > > > Yes, I am running kernel 2.6.16 with the OFED RC5 release. I will > investigate how to run madeye, but the hangs on the remote machine are > probably the root cause of the link failure. > > > > I don't think madeye is part of OFED :-( Can it get added for RC6, > > Tziporet ? I think it would be a useful tool to add for problems like > > this. > > > > Also, was this a working setup before ? Did anything else change besides > > installing RC5 on both nodes ? > > > > This back to back setup was working originally with a backported 2.6.11-34kernel and I believe it was revision 6500 from the OpenIB svn trunk at that > time. The problems started when I tried to move to RC4 and now RC5 of the > OFED release, with the 2.6.16 kernel. 
> > > > I have two more experiments I'd like you to try, before we go down the > > madeye "route": > > > > 1. Do you have another IB cable to try ? > > > > 2. Can you completely shutdown and repower the remote node and see if it > > starts responding ? > > > > It is difficult for me to debug this sort of thing, since I telecommute > from Tucson and the machines are located in Phoenix. But I can get someone > there to power the machine down and reboot. > > -Don Albert- > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > From tziporet at mellanox.co.il Tue May 30 08:16:36 2006 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 30 May 2006 18:16:36 +0300 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il> <20060530093930.GE21266@mellanox.co.il> <20060530142444.GA8405@mellanox.co.il> Message-ID: <447C61D4.5020803@mellanox.co.il> Roland Dreier wrote: > By the way, did you get a chance to test the AH leak fix to see if it > really fixes your leak? That would make me feel better about asking > Linus to pull it into 2.6.17. > > - R. > We got the bug report from a customer. We did not succeed in reproducing the failure here, since we don't have machines as strong (4 dual core CPUs) as the customer has. Only when the customer tests the fix will we know for sure.
Tziporet From halr at voltaire.com Tue May 30 08:12:39 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 11:12:39 -0400 Subject: [openib-general] RE: QoS RFC In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E302368772@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E302368772@mtlexch01.mtl.com> Message-ID: <1149001957.4358.113884.camel@hal.voltaire.com> On Tue, 2006-05-30 at 11:06, Eitan Zahavi wrote: > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Tuesday, May 30, 2006 5:50 PM > > To: Eitan Zahavi > > Cc: Roland Dreier; openib-general at openib.org; Nimrod Gindi; Aviram > Gutman; > > Sasha Khapyorsky; sean.hefty at intel.com; Vu Pham; Roland Dreier > > Subject: RE: QoS RFC > > > > On Tue, 2006-05-30 at 10:51, Eitan Zahavi wrote: > > > Hi Roland, > > > > > > > > This is OK but it's sort of a pain to have to query SA > ClassPortInfo > > > > all the time. Do you have a plan for how to make this transparent > to > > > ULPs? > > > [EZ] Well, a ULP that uses CMA will have it handled by CMA... > > > But an old SM implementation that does not support this kind of > > > PathRecord extension will probably choke on the new fields once > their > > > component mask bits are set. > > > > What do you mean by "choke" ? Wouldn't the new components just be > > ignored ? > [EZ] By choke I mean - the SA might decide to error the request on > invalid parameter. Sure and that should be handled by the end node. Depending on what component has control over the request, a non QoS request could be remade if appropriate. -- Hal > > > > > You could however query once for each Client-Reregister event. > > > > > > > > (BTW something in your email client is really messing up the > > > > formatting of your message) > > > [EZ] Thanks I will resend . > > > > > > > > - R. 
> > > From rdreier at cisco.com Tue May 30 08:22:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 08:22:08 -0700 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: <447C61D4.5020803@mellanox.co.il> (Tziporet Koren's message of "Tue, 30 May 2006 18:16:36 +0300") References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il> <20060530093930.GE21266@mellanox.co.il> <20060530142444.GA8405@mellanox.co.il> <447C61D4.5020803@mellanox.co.il> Message-ID: Tziporet> We got the bug report from a customer. We did not Tziporet> succeeded to reproduce the failure here, since we don't Tziporet> have such strong machines (4 dual core CPU) as the Tziporet> customer has. I do have 4-socket dual core systems (although my lab has no power right now due to a transformer fire). What is the recipe to reproduce this? And what is the symptom (how will I know I've reproduced the bug)? - R. From halr at voltaire.com Tue May 30 08:17:10 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 11:17:10 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: Message-ID: <1149002229.4358.114002.camel@hal.voltaire.com> Don, On Tue, 2006-05-30 at 10:55, Don.Albert at Bull.com wrote: > Hal, > > With your patch to OpenSM, I think everything is ok on the local node. That patch with one minor change (elimination of the CL_ASSERT) will be part of the upcoming RC6. > The remote node is definitely having some problems, resulting in not > responding to the MAD packets. I have entered a separate message on > the problems with the "ib0" interface on that machine. > > > > On Fri, 2006-05-26 at 20:59, Hal Rosenstock wrote: > > > > What next, coach? > > > > > > Can you turn on madeye on the remote node and see what packets are > > > received and sent ? Let me know if you need help with that. 
I > think you > > > said you were running OFED, right ? > > > > Yes, I am running kernel 2.6.16 with the OFED RC5 release. I will > investigate how to run madeye, but the hangs on the remote machine are > probably the root cause of the link failure. Ah; got it. It's tied into the other problem. Yes, when the hangs are resolved, the SMA on the remote node will respond and I would expect the port to get to active and you should be on your way then. > > I don't think madeye is part of OFED :-( Can it get added for RC6, > > Tziporet ? I think it would be a useful tool to add for problems > like > > this. > > > > Also, was this a working setup before ? Did anything else change > besides > > installing RC5 on both nodes ? > > > > This back to back setup was working originally with a backported > 2.6.11-34 kernel and I believe it was revision 6500 from the OpenIB > svn trunk at that time. The problems started when I tried to move to > RC4 and now RC5 of the OFED release, with the 2.6.16 kernel. > > > I have two more experiments I'd like you to try, before we go down > the > > madeye "route": > > > > 1. Do you have another IB cable to try ? > > > > 2. Can you completely shutdown and repower the remote node and see > if it > > starts responding ? > > > > It is difficult for me to debug this sort of thing, since I > telecommute from Tucson and the machines are located in Phoenix. But > I can get someone there to power the machine down and reboot. It's OK; you explained the state of the remote node so neither of those experiments is necessary. 
-- Hal > -Don Albert- > From Don.Albert at Bull.com Tue May 30 08:25:56 2006 From: Don.Albert at Bull.com (Don.Albert at Bull.com) Date: Tue, 30 May 2006 08:25:56 -0700 Subject: [openfabrics-ewg] Re: [openib-general] Re: NOP problem in ib_mthca on OFED RC4 In-Reply-To: Message-ID: Roland, openfabrics-ewg-bounces at openib.org wrote on 05/30/2006 07:46:08 AM: > Don> It is rather difficult for me to debug this sort of hang, > Don> since I telecommute from Tucson and the machines are located > Don> in Phoenix. Anyone have any suggestions? > > cat /proc/<pid>/wchan for the process in question. For the process executing the "ip link set dev ib0 down" command, this yields: [jatoba] (root) root> cat /proc/7031/wchan flush_cpu_workqueue > "echo t > > /proc/sysrq-trigger" > will produce copious output that might help as well. > > - R. I also tried this. I didn't see any output on my terminal. Where does all this "copious output" go? -Don Albert- From trimmer at silverstorm.com Tue May 30 08:27:47 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Tue, 30 May 2006 11:27:47 -0400 Subject: [openib-general] QoS RFC - Resend using a friendly mailer Message-ID: > Eitan wrote: > 2.2. The SM analyzes the provided policy to see if it is > realizable and performs > the necessary fabric setup. The SM may continuously monitor > the policy and adapt > to changes in it. Part of this policy defines the default > QoS-Level of each > partition. The SA is being enhanced to match the requested > Source, Destination, > TClass, Service-ID (and optionally SL and priority) against > the policy. So > clients (ULPs, programs) can obtain a policy enforced QoS. > The SM is also > enhanced to support setting up partitions with appropriate > IPoIB broadcast > group. This broadcast group carries its QoS attributes: > TClass, SL, MTU and > RATE.
While using the Service ID is an interesting idea, the problem is that the Service ID values are not well defined by IBTA. Rather, each endpoint is permitted to define its own, potentially transient set of Service ID values. The Service ID values are discovered via Service Records in the SA or Device Management queries which get their data from the IOU. Hence while a few service ID values are well defined (such as those for SDP), many are not (such as those for MPI, uDAPL, SRP, etc.) and may vary between both hardware and software suppliers. Many are likely to be duplicated between different vendors' target devices (for example a uDAPL target application may duplicate values used by an SRP target) and this would not be a problem provided both applications were never run on the same IB Node target device. Some might even change on each reboot (IBTA spec implies this could be a 64 bit pointer or context in the target), although I'm not aware of any which do. I believe it is for the above reasons that IBTA chose not to make ServiceID part of the PathRecord and MultiPathRecord queries. As Roland suggests, before implementing a non-standard approach, IBTA should be engaged to define an appropriate extension to the standard. Such extensions would need to be carefully defined to avoid breaking existing applications and fabrics. Todd Rimmer From rdreier at cisco.com Tue May 30 08:34:08 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 08:34:08 -0700 Subject: [openfabrics-ewg] Re: [openib-general] Re: NOP problem in ib_mthca on OFED RC4 In-Reply-To: (Don Albert's message of "Tue, 30 May 2006 08:25:56 -0700") References: Message-ID: Don> I also tried this. I didn't see any output on my terminal. Don> Where does all this "copious output" go? Into the kernel log. - R.
From rdreier at cisco.com Tue May 30 08:47:40 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 08:47:40 -0700 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: <20060529161405.GT21266@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 29 May 2006 19:14:05 +0300") References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161405.GT21266@mellanox.co.il> Message-ID: I queued this for 2.6.18. From halr at voltaire.com Tue May 30 09:59:17 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 12:59:17 -0400 Subject: [openib-general] Re: QoS RFC In-Reply-To: References: <6AB138A2AB8C8E4A98B9C0C3D52670E30236876F@mtlexch01.mtl.com> Message-ID: <1149008356.4358.116331.camel@hal.voltaire.com> On Tue, 2006-05-30 at 10:58, Roland Dreier wrote: > > [EZ] Well, a ULP that uses CMA will have it handled by CMA... > > But an old SM implementation that does not support this kind of > > PathRecord extension will probably choke on the new fields once their > > component mask bits are set. > > You could however query once for each Client-Reregister event. > > Right, but for example SRP cannot use the CMA because the SRP protocol > does not use IP addressing. > > It seems that the SA query module is really the right place to handle > the query of ClassPortInfo, caching results, invalidating cache on > client-reregister, etc. Yes, but there is no requirement for client reregistration being supported on the SM side so I think there is more than this that is needed (at least in theory). -- Hal > - R. 
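The caching idea discussed above (query SA ClassPortInfo once, reuse the result, drop it on client-reregister) could be sketched roughly as below. This is only an illustration: the struct and function names are hypothetical and are not part of the ib_sa module or any existing API. Because, as Hal notes, the SM is not required to support client reregistration, the sketch also expires entries after a maximum age; that timeout is an assumption added here, not something proposed in the thread.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical cache of the SA ClassPortInfo capability mask.
 * None of these names exist in the OpenIB stack. */
struct sa_cpi_cache {
    int      valid;       /* non-zero once a response has been stored */
    uint16_t cap_mask;    /* capability mask taken from the response */
    time_t   fetched_at;  /* when the response was stored */
    time_t   max_age;     /* assumed fallback expiry, in seconds */
};

/* Store the result of a fresh ClassPortInfo query. */
void cpi_cache_store(struct sa_cpi_cache *c, uint16_t cap_mask, time_t now)
{
    c->cap_mask = cap_mask;
    c->fetched_at = now;
    c->valid = 1;
}

/* Call this when a client-reregister event is seen. */
void cpi_cache_invalidate(struct sa_cpi_cache *c)
{
    c->valid = 0;
}

/* Returns 1 and fills *cap_mask on a hit; returns 0 if the caller
 * has to query the SA again (never fetched, invalidated, or too old). */
int cpi_cache_lookup(const struct sa_cpi_cache *c, time_t now,
                     uint16_t *cap_mask)
{
    if (!c->valid || now - c->fetched_at >= c->max_age)
        return 0;
    *cap_mask = c->cap_mask;
    return 1;
}
```

A ULP would call cpi_cache_lookup() before building a PathRecord query and only set the new component mask bits when the cached mask says the SA supports them.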
From halr at voltaire.com Tue May 30 10:14:41 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 13:14:41 -0400 Subject: [openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5 In-Reply-To: References: <1148695049.4358.5966.camel@hal.voltaire.com> Message-ID: <1149008845.4358.116552.camel@hal.voltaire.com> Hi Paul, On Tue, 2006-05-30 at 11:06, Paul wrote: > Hi All, > I will be working on this as time permits this week. > Unfortunately my employer is not crazy about giving out remote access, > so I will have to be your hands on this. If you want me to do > something just tell me what it is. I know its a pain I have been there > myself. I should have access to a G5 in a day or so so let me see if I can recreate this. -- Hal > Regards. > > On 5/30/06, Don.Albert at bull.com wrote: > Hal, > > With your patch to OpenSM, I think everything is ok on the > local node. The remote node is definitely having some > problems, resulting in not responding to the MAD packets. I > have entered a separate message on the problems with the "ib0" > interface on that machine. > > > > > On Fri, 2006-05-26 at 20:59, Hal Rosenstock wrote: > > > > What next, coach? > > > > > > Can you turn on madeye on the remote node and see what > packets are > > > received and sent ? Let me know if you need help with > that. I think you > > > said you were running OFED, right ? > > > > > Yes, I am running kernel 2.6.16 with the OFED RC5 release. I > will investigate how to run madeye, but the hangs on the > remote machine are probably the root cause of the link > failure. > > > I don't think madeye is part of OFED :-( Can it get added > for RC6, > > Tziporet ? I think it would be a useful tool to add for > problems like > > this. > > > > Also, was this a working setup before ? Did anything else > change besides > > installing RC5 on both nodes ? 
> > > > > This back to back setup was working originally with a > backported 2.6.11-34 kernel and I believe it was revision 6500 > from the OpenIB svn trunk at that time. The problems started > when I tried to move to RC4 and now RC5 of the OFED release, > with the 2.6.16 kernel. > > > I have two more experiments I'd like you to try, before we > go down the > > madeye "route": > > > > 1. Do you have another IB cable to try ? > > > > 2. Can you completely shutdown and repower the remote node > and see if it > > starts responding ? > > > > > It is difficult for me to debug this sort of thing, since I > telecommute from Tucson and the machines are located in > Phoenix. But I can get someone there to power the machine > down and reboot. > > -Don Albert- > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > From mshefty at ichips.intel.com Tue May 30 10:44:58 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 30 May 2006 10:44:58 -0700 Subject: [openib-general] Failed multicast join with new multicast module In-Reply-To: <1148905985.4358.78023.camel@hal.voltaire.com> References: <1148905985.4358.78023.camel@hal.voltaire.com> Message-ID: <447C849A.4060200@ichips.intel.com> Hal Rosenstock wrote: > Send-only joins is another case. These are full member joins (JoinState > 1) to groups which are not yet created so they fail. I see the problem, and checked in a fix. I forgot to record the last join operation that was initiated, so that it could be failed on an error. This resulted in the join being retried repeatedly. The join request should now fail, and ipoib will retry using an exponential backoff strategy. 
- Sean From halr at voltaire.com Tue May 30 10:38:40 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 13:38:40 -0400 Subject: [openib-general] QoS RFC - Resend using a friendly mailer In-Reply-To: References: Message-ID: <1149010718.4358.117351.camel@hal.voltaire.com> On Tue, 2006-05-30 at 10:53, Eitan Zahavi wrote: > To: OPENIB > Subject: QoS RFC - Resend using a friendly mailer > --text follows this line-- > Hi All > > Please find the attached RFC describing how QoS policy support could be implemented in the OpenFabrics stack. > Your comments are welcome. Some initial comments. > > Eitan > > RFC: OpenFabrics Enhancements for QoS Support > =============================================== > > Authors: . Eitan Zahavi > Date: .... May 2006. > Revision: 0.1 > > Table of contents: > 1. Overview > 2. Architecture > 3. Supported Policy > 4. CMA functionality > 5. IPoIB functionality > 6. SDP functionality > 7. SRP functionality > 8. iSER functionality > 9. OpenSM functionality > > 1. Overview > ------------ > Quality of Service requirements stem from the realization of I/O consolidation > over IB network: As multiple applications and ULPs share the same fabric, means > to control their use of the network resources are becoming a must. The basic > need is to differentiate the service levels provided to different traffic flows. > Such that a policy could be enforced and control each flow utilization of the > fabric resources. > > IBTA specification defined several hardware features and management interfaces > to support QoS: > * Up to 15 Virtual Lanes (VL) could carry traffic in a non-blocking manner > * Arbitration between traffic of different VL is performed by a 2 priority > levels weighted round robin arbiter. 
The arbiter is programmable with > a sequence of (VL, weight) pairs and the maximal number of high priority credits > to be processed before low priority is served > * Packets carry class of service marking in the range 0 to 15 in their > header SL field > * Each switch can map the incoming packet by its SL to a particular output > VL based on programmable table VL=SL-to-VL-MAP(in-port, out-port, SL) > * The Subnet Administrator controls each communication flow's parameters > by providing them as a response to Path Record query > > The IB QoS features provide the means to implement a DiffServ like architecture. > DiffServ architecture (IETF RFC2474 2475) is widely used today in highly dynamic > fabrics. Only certain DSCP code point equivalents are provided by IBA. > This proposal provides the detailed functional definition for the various > software elements that are required to enable a DiffServ like architecture over > the OpenFabrics software stack. > > > > > > 2. Architecture > ---------------- > This proposal splits the QoS functionality between the SM/SA, CMA and the various > ULPs. We take the "chronology approach" to describe how the overall system > works: > > 2.1. The network manager (human) provides a set of rules (policy) that defines > how the network is being configured and how its resources are split to different > QoS-Levels. The policy also defines how to decide which QoS-Level each > application or ULP or service uses. > 2.2. The SM analyzes the provided policy to see if it is realizable and performs > the necessary fabric setup. The SM may continuously monitor the policy and adapt > to changes in it. Do you mean monitor the policy or the fabric here ? > Part of this policy defines the default QoS-Level of each > partition. The SA is being enhanced to match the requested Source, Destination, > TClass, Service-ID Service ID does not apply to many ULPs.
Also, how is it known what ULP/application a particular service ID refers to (other than perhaps some well known ones) ? > (and optionally SL and priority) against the policy. So > clients (ULPs, programs) can obtain a policy enforced QoS. The SM is also > enhanced to support setting up partitions with appropriate IPoIB broadcast > group. This broadcast group carries its QoS attributes: TClass, SL, MTU and > RATE. > > 2.3. IPoIB is being set up. IPoIB uses the SL, MTU and RATE available on the > multicast group which forms the broadcast group of this partition. > > 2.4. MPI which provides non IB based connection management should be configured > to run using hard coded SLs. It uses these SLs in every QP being opened. > > 2.5. ULPs that use CM interface (like SRP) should have their own pre-assigned > Service-ID and use it while obtaining PathRecord for establishing their > connections. The SA receiving the PathRecord should match it against the policy > and return the appropriate PathRecord including SL, MTU, RATE and TClass. > > 2.6. ULPs and programs using CMA to establish RC connection should provide the > CMA the target IP and Service-ID. Some of the ULPs might also provide TClass > (e.g. for SDP sockets that are provided the TOS socket option). The CMA should > then use the provided Service-ID and optional TClass and pass them in the > PathRecord request. The resulting PathRecord should be used for configuring the > connection QP. > > PathRecord and MultiPathRecord enhancement for QoS: > As mentioned above the PathRecord and MultiPathRecord attributes should be > enhanced to carry the Service-ID which is a 64-bit value. Given the existing > definition for these attributes we propose to use the following fields for > Service-ID: > * For PathRecord: use the first 2 reserved fields which are 32 bits each > (component masks 0x1 and 0x2). Component mask 1 should be used to refer to the > merged Service-ID field > * For MultiPathRecord: use 2 reserved fields: > 1.
after the packet life (8 bits) which is component mask bit 0x10000 (17) > 2. the field before SDGID1 (56 bits) which is component mask bit 0x200000 (22) This is not possible with the existing approved 1.2 erratum changes. > Once merged they should be selected using component mask bit 0x10000 (17) > A new capability bit should describe the SM QoS support in the SA class port > info. This approach provides an easy migration path for existing access layer > and ULPs by not introducing a new attribute. > > > 3. Supported Policy > -------------------- > > The QoS policy supported by this proposal is divided into 4 sub sections: > > * Node Group: a set of HCAs, Routers or Switches that share the same settings. > A node groups might be a partition defined by the partition manager policy in > terms of GUIDs. Future implementations might provide support for NodeDescription > based definition of node groups. > > * Fabric Setup: > Defines how the SL2VL and VLArb tables should be setup. This policy definition > assumes the computation of target behavior should be performed outside of > OpenSM. > > * QoS-Levels Definition: > This section defines the possible sets of parameters for QoS that a client might > be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Path Bits > (in case LMC > 0 is used for QoS) and TClass. > > * Matching Rules: > A list of rules that match an incoming PathRecord request to a QoS-Level. The > rules are processed in order such as the first match is applied. Each rule is > built out of set of match expressions which should all match for the rule to > apply. The matching expressions are defined for the following fields > ** SRC and DST to lists of node groups > ** Service-ID to a list of Service-ID or Service-ID ranges > ** TClass to a list of TClass values or ranges > > XML style syntax is provided for the policy file. However, a strict BNF format > (provided in section 8) What section ? > should be used for parsing it. 
> > > > > > > Storage our SRP storage targets > 0x1000000000000001 > 0x1000000000000002 > > > Virtual Servers node desc and IB port # > vs1/HCA-1/P1 > vs3/HCA-1/P1 > vs3/HCA-2/P1 > > > Partition 1 default settings > Part1 > > > Routers all routers > ROUTER > > > > > > > Part1 * * > 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 > > > Storage * 1 > 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1 > > > > > > > Storage * > > 0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1 > 8:255,9:127,10:63,11:31,12:15,13:7,14:3 > 10 > > > > > > > 1 for the lowest priority comm > 16 > > > 2 low latency best bandwidth > 0 7 > > > 3 just an example > 0 32 1 > 1 > > > > > > 1 low latency by tclass 7-9 or 11> > 7-9,11 1 > > > 2 Storage targets connection> > Storage 22,4719 > 3 > > > > > > > 4. IPoIB > --------- > > IPoIB already query the SA for its broadcast group information. The additional > functionality required is for IPoIB to provide the broadcast group SL, MTU, RATE > and TClass in every following PathRecord query performed when a new UDAV is > needed by IPoIB. > We could assign a special Service-ID for IPoIB use but since all communication > on the same IPoIB interface shares the same QoS-Level without the ability to > differentiate it by target service we can ignore it for simplicity. > > 5. CMA features > ---------------- > > The CMA interface supports Service-ID through the notion of port space as a > prefixes to the port_num which is part of the sockaddr provided to > rdma_resolve_add(). What is missing is the explicit request for a TClass that > should allow the ULP (like SDP) to propagate a specific request for a class of > service. A mechanism for providing the TClass is available in the IPv6 address, > so we could use that address field. Another option is to implement a special > connection options API for CMA. > > Missing functionality by CMA is the usage of the provided TClass and Service-ID > in the sent PathRecord. 
When a response is obtained it is an existing > requirement for the CMA to use the PathRecord from the response in setting up > the QP address vector. > > > 6. SDP > ------- > > SDP uses CMA for building its connections. > The Service-ID for SDP is 0x000000000001PPPP, where PPPP are 4 hex digits > holding the remote TCP/IP Port Number to connect to. > SDP might be provided with SO_PRIORITY socket option. In that case the value > provided should be sent to the CMA as the TClass option of that connection. > > 7. SRP > ------- > > Current SRP implementation uses its own CM callbacks (not CMA). So SRP should > fill in the Service-ID in the PathRecord by itself and use that information in > setting up the QP. The T10 SRP standard defines the SRP Service-ID to be defined > by the SRP target I/O Controller (but they should also comply with IBTA Service- > ID rules). Anyway, the Service-ID is reported by the I/O Controller in the > ServiceEntries DMA attribute and should be used in the PathRecord if the SA > reports its ability to handle QoS PathRecords. > > 8. iSER > -------- > iSER uses CMA and thus should be very close to SDP. The Service-ID for iSER > should be TBD. > > > 9. OpenSM features > ------------------- > The QoS related functionality to be provided by OpenSM can be split into two > main parts: > > 3.1. Fabric Setup > During fabric initialization the SM should parse the policy and apply its > settings to the discovered fabric elements. The following actions should be > performed: > * Parsing of policy > * Node Group identification. Warning should be provided for each node not > specified but found. What about the other way 'round too (nodes specified but not found) ? > * SL2VL settings validation should be checked: > + A warning will be provided if there are no matching targets for the SL2VL > setting statement. > + An error message will be printed to the log file if an invalid setting is > found. 
A setting is invalid if it refers to: > - Non existing port numbers of the target devices > - Unsupported VLs for the target device. In the latter case the map to non > existing VLs should be replaced by VL15 i.e. packets will be dropped. > * SL2VL setting is to be performed > * VL Arbitration table settings should be validated according to the following > rules: > + A warning will be provided if there are no matching targets for the setting > statement > + An error will be provided if the port number exceeds the target ports > + An error will be generated if the table length exceeds device capabilities > + A warning will be generated if the table quotes a VL that is not supported > by the target device > * VL Arbitration tables will be set on the appropriate targets One needs to be careful about these rules as there are a number of different "shapes" to these tables. > 3.2. PathRecord query handling: > OpenSM should be able to enforce the provided policy on client request. > The overall flow for such requests is: first the request is matched against the > defined match rules such that the target QoS-Level definition is found. Given > the QoS-Level a path(s) search is performed with the given restrictions imposed > by that level. The following two sections describe these steps. > > One issue not standardized by the IBTA is how Service-ID is carried in the > PathRecord and MultiPathRecord attributes. There are basically two options: > a. Replace the SM-Key field by the Service-ID. In that case no component mask > bit will be assigned to it. Such that if the field is zero we should treat it > as if the component mask bit is clear. > b. Encode it into spare fields. For PathRecord the first two fields are reserved > and are 64 bit when combined. The first component mask bit maps to the first > reserved field and should be used for Service-ID masking. For MultiPathRecord > attribute there are no adjacent reserved fields that make a 64 bit field.
So > the reserve field following the packet-lifetime (8 bits) combined with the > reserved field DGIDCount (56 bits) can make the Service-ID. In this case also > the first reserve field component mask bit should be used as the Service-ID > component mask bit. > > > > 3.2.1. Matching rule search: > A rule is "matching" a PathRecord request using the following criteria: > * Matching rules provide values in a list of either single value, or range of > values. A PathRecord field is "matching" the rule field if it is explicitly > noted in the list of values or is one of the values covered by a range > included in the field values list. > * Only PathRecord fields that have their component mask bit set should be > compared. > * For a rule to be "matching" a PathRecord request all the rule fields should be > "matching" their PathRecord fields. Such that a PathRecord request that does > not have a component mask field set for one of the rule defined fields can > not match that rule. > * A PathRecord request that have a component mask bit set for one of the fields > that is not defined by the rule can match the rule. > > The algorithm to be used for searching for a rule match might be as simple as a > sequential search through all rules or enhanced for better performance. The > semantics of every rule field and its matching PathRecord field are described > below: > * Source: the SGID or SLID should be part of this group > * Destination: the DGID or DLID should be part of this group > * Service-ID: check if the requested Service-ID (available in the PathRecord old > SM-Key field) is matching any of this rule Service-IDs > * TClass: check if the PathRecord TClass field is matching > > 3.2.2 PathRecord response generation: > The QoS-Level pointed by the first rule that matches the PathRecord request > should be used for obtaining the response SL, MTU-Limit, RATE-Limit, Path-Bits > and TClass. A default QoS-Level should be used if no rule is matching the query. 
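The matching criteria quoted above can be sketched in C as follows. This is only an illustration under stated assumptions, not OpenSM code: the structure names, the QOS_COMP_* bit values and qos_find_level() are all hypothetical, and Source/Destination group matching is left out to keep it short. It shows the two mask rules (every rule-defined field must have its component mask bit set; extra bits set in the request do not prevent a match) and the first-match-wins search with a default QoS-Level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component-mask bits for this sketch only; the real bit
 * positions come from the SA PathRecord attribute definition. */
#define QOS_COMP_SERVICE_ID (1u << 0)
#define QOS_COMP_TCLASS     (1u << 1)

/* A list entry is a range; a single value has lo == hi. */
struct qos_range {
    uint64_t lo;
    uint64_t hi;
};

struct qos_rule {
    uint32_t fields;                      /* fields this rule defines */
    const struct qos_range *service_ids;
    size_t n_service_ids;
    const struct qos_range *tclasses;
    size_t n_tclasses;
    int qos_level;                        /* level selected on a match */
};

struct qos_request {
    uint32_t comp_mask;                   /* fields the client filled in */
    uint64_t service_id;
    uint8_t tclass;
};

static int in_ranges(const struct qos_range *r, size_t n, uint64_t v)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (v >= r[i].lo && v <= r[i].hi)
            return 1;
    return 0;
}

static int rule_matches(const struct qos_rule *rule,
                        const struct qos_request *req)
{
    assert(rule != NULL && req != NULL);

    /* A request lacking the mask bit of any rule-defined field cannot match. */
    if ((rule->fields & req->comp_mask) != rule->fields)
        return 0;
    if ((rule->fields & QOS_COMP_SERVICE_ID) &&
        !in_ranges(rule->service_ids, rule->n_service_ids, req->service_id))
        return 0;
    if ((rule->fields & QOS_COMP_TCLASS) &&
        !in_ranges(rule->tclasses, rule->n_tclasses, req->tclass))
        return 0;
    /* Mask bits set for fields the rule does not define are ignored. */
    return 1;
}

/* Sequential search: the first matching rule wins, else the default level. */
int qos_find_level(const struct qos_rule *rules, size_t n_rules,
                   const struct qos_request *req, int default_level)
{
    size_t i;

    for (i = 0; i < n_rules; i++)
        if (rule_matches(&rules[i], req))
            return rules[i].qos_level;
    return default_level;
}
```

A linear scan is the "as simple as a sequential search" option from the RFC; anything faster would have to preserve the rule order, since the first match decides the QoS-Level.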
> > The efficient algorithm for finding paths that meet the QoS-Level criteria is > beyond the scope of this RFC and left for the implementer to provide. However > the criteria by which the paths match the QoS-Level are described below: > > * SL: The paths found should all use the given SL. For that sake PathRecord > algorithm should traverse the path from source to destination only through > ports that carry a valid VL (not VL15) by the SL2VL map (should consider input > and output ports and SL). > * MTU-Limit: The resulting paths MTU should not exceed the given MTU-Limit > * Rate-Limit: The resulting paths RATE should not exceed the given RATE-Limit > (rate limit is given in units of link BW = Width*Speed according to IBTA > Specification Vol-1 table-205 p-901 l-24). > * Path-Bits: define the target LID lowest bits (number of bits defined by the > target port PortInfo.LMC field). The path should traverse the LFT using the > target port LID with the path-bits set. > * TClass: should be returned in the result PathRecord. When routing is going to > be supported by OpenSM we might use this field in selecting the target > router too in a TBD way. > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mst at mellanox.co.il Tue May 30 10:54:20 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 30 May 2006 20:54:20 +0300 Subject: [openib-general] RFC: CMA backlog Message-ID: <20060530175420.GE10234@mellanox.co.il> Hello, Sean! I am looking at implementing the listen backlog parameter correctly. Here's what this does in TCP: TCP counts the number of connect requests at the specific local socket that were not yet accepted by accept(). Once this number exceeds the backlog specified in listen, new SYN packets will be dropped, and the remote side will retry. 
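The TCP behaviour described above boils down to a per-listener counter. A minimal sketch in C, assuming made-up names (struct listener, backlog_try_add, backlog_remove) that are not part of any existing CMA or socket API; the exact threshold at which real TCP stacks start dropping differs between implementations, so the ">=" here is an assumption too:

```c
#include <assert.h>

/* Hypothetical per-listener state for this sketch only. */
struct listener {
    int backlog;   /* limit passed to listen() */
    int pending;   /* requests received but not yet accepted */
};

/*
 * Called when a new connect request arrives.
 * Returns 1 if the request was queued, 0 if it must be dropped silently
 * so that the remote side retries later.
 */
int backlog_try_add(struct listener *l)
{
    if (l->pending >= l->backlog)
        return 0;
    l->pending++;
    return 1;
}

/* Called once a queued request has been accepted (or otherwise consumed). */
void backlog_remove(struct listener *l)
{
    assert(l->pending > 0);
    l->pending--;
}
```

The point of dropping rather than refusing is that the sender's own retry timer spreads the load over time, without the server spending anything on the overflow request.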
Implementing this correctly is important for scalability when a lot of clients connect to a single server - this way if there is pressure on the server to handle connections, it is distributed between clients who have to retry. Currently, this is not something that can be implemented by a ULP on top of CMA, because returning error from REQ will result in reject rather than REQ drop. CMA already has a backlog parameter in listen but it is ignored as far as I can see. I propose extending cma API with the following options: rdma_backlog_added - connection was added to backlog queue rdma_backlog_removed - connection was removed from backlog queue Internally, CMA will count the # of connections in backlog. If a REQ arrives and this number exceeds the backlog given in listen, CMA will drop the REQ, without creating the new CMA ID. This mimics the TCP behaviour closely and will hopefully scale well. Opinions? -- MST From krause at cup.hp.com Tue May 30 11:01:36 2006 From: krause at cup.hp.com (Michael Krause) Date: Tue, 30 May 2006 11:01:36 -0700 Subject: [openib-general] QoS RFC - Resend using a friendly mailer In-Reply-To: References: Message-ID: <6.2.0.14.2.20060530104831.02af18b8@esmail.cup.hp.com> High-level feedback: - An IB fabric could be used for a single ULP and still require QoS. The issue is how to differentiate flows on a given shared element within the fabric. - QoS controls must be dynamic. The document references initialization as the time when decisions are made but obviously that is just a first pass on use of the fabric and not what it will become in potentially a short period of time. - QoS also involves multi-path support (not really touched upon in terms of specifics in this document). Distributing or segregating work even if for the same ULP should be done across multiple or distinct paths.
In one sense this may complicate the work but in another it is simpler in that arbitration controls for shared links become easier to manage if the number of flows is reduced. - IP over IB defines a multicast group which is ultimately a spanning tree. That should not constrain what paths are used to communicate between endnode pairs. That only defines the multicast paths which are not strongly ordered relative to the unicast traffic. Further, IP over IB may operate using the RC mode between endnodes. It is very simple to replicate RC and then segregate these into QoS domains (one could just align priority with the 802.1p for simplicity and practical execution) which can in turn flow over shared or distinct paths. - IB is a centrally managed fabric. Adding in SID into records and such really isn't going to help solve the problem unless there is also a centralized management entity well above IB that can prioritize communication service rates for different ULP and endnode pairs. Given most of these centralized management entities are rather ignorant of IB at the moment, this presents a chicken-egg dilemma which is further complicated by developing SOA technology. It might be more valuable in one sense to examine SOA technology and how it is translating itself to, say, Ethernet and then see how this can be leveraged to IB. - QoS needs to examine the sums of the consumers of a given path and their service rate requirements. It isn't just about setting a priority level but also about the packet injection rate to the fabric on that priority. This needs to be taken into account as well. Overall, it is not clear to me what the end value of this document is. The challenge for any network admin is to translate SOA driven requirements into fabric control knob settings.
Without such translation algorithms / understanding, it is not clear that there is anything truly missing in the IBTA spec suite or that this RFC will really advance the integration of IB into the data center in a truly meaningful manner. Mike At 07:53 AM 5/30/2006, Eitan Zahavi wrote: >To: OPENIB >Subject: QoS RFC - Resend using a friendly mailer >--text follows this line-- >Hi All > >Please find the attached RFC describing how QoS policy support could be >implemented in the OpenFabrics stack. >Your comments are welcome. > >Eitan > > RFC: OpenFabrics Enhancements for QoS Support > =============================================== > >Authors: . Eitan Zahavi >Date: .... May 2006. >Revision: 0.1 > >Table of contents: >1. Overview >2. Architecture >3. Supported Policy >4. CMA functionality >5. IPoIB functionality >6. SDP functionality >7. SRP functionality >8. iSER functionality >9. OpenSM functionality > >1. Overview >------------ >Quality of Service requirements stem from the realization of I/O >consolidation >over IB network: As multiple applications and ULPs share the same fabric, >means >to control their use of the network resources are becoming a must. The basic >need is to differentiate the service levels provided to different traffic >flows. >Such that a policy could be enforced and control each flow utilization of the >fabric resources. > >IBTA specification defined several hardware features and management >interfaces >to support QoS: >* Up to 15 Virtual Lanes (VL) could carry traffic in a non-blocking manner >* Arbitration between traffic of different VL is performed by a 2 priority > levels weighted round robin arbiter. 
The arbiter is programmable with > a sequence of (VL, weight) pairs and maximal number of high priority > credits > to be processed before low priority is served >* Packets carry class of service marking in the range 0 to 15 in their > header SL field >* Each switch can map the incoming packet by its SL to a particular output > VL based on programmable table VL=SL-to-VL-MAP(in-port, out-port, SL) >* The Subnet Administrator controls each communication flow parameters > by providing them as a response to Path Record query > >The IB QoS features provide the means to implement a DiffServ like >architecture. >DiffServ architecture (IETF RFC2474 2475) is widely used today in highly >dynamic >fabrics. > >This proposal provides the detailed functional definition for the various >software elements that are required to enable a DiffServ like architecture >over >the OpenFabrics software stack. > > > > > >2. Architecture >---------------- >This proposal split the QoS functionality between the SM/SA, CMA and the >various >ULPS. We take the "chronology approach" to describe how the overall system >works: > >2.1. The network manager (human) provides a set of rules (policy) that >defines >how the network is being configured and how its resources are split to >different >QoS-Levels. The policy also define how to decide which QoS-Level each >application or ULP or service use. > >2.2. The SM analyzes the provided policy to see if it is realizable and >performs >the necessary fabric setup. The SM may continuously monitor the policy and >adapt >to changes in it. Part of this policy defines the default QoS-Level of each >partition. The SA is being enhanced to match the requested Source, >Destination, >TClass, Service-ID (and optionally SL and priority) against the policy. So >clients (ULPs, programs) can obtain a policy enforced QoS. The SM is also >enhanced to support setting up partitions with appropriate IPoIB broadcast >group. 
This broadcast group carries its QoS attributes: TClass, SL, MTU and >RATE. > >2.3. IPoIB is being setup. IPoIB uses the SL, MTU and RATE available on the >multicast group which forms the broadcast group of this partition. > >2.4. MPI which provides non IB based connection management should be >configured >to run using hard coded SLs. It uses these SLs in every QP being opened. > >2.5. ULPs that use CM interface (like SRP) should have their own pre-assigned >Service-ID and use it while obtaining PathRecord for establishing their >connections. The SA receiving the PathRecord should match it against the >policy >and return the appropriate PathRecord including SL, MTU, RATE and TClass. > >2.6. ULPs and programs using CMA to establish RC connection should provide >the >CMA the target IP and Service-ID. Some of the ULPs might also provide TClass >(E.g. for SDP sockets that are provided the TOS socket option). The CMA >should >then use the provided Service-ID and optional TClass and pass them in the >PathRecord request. The resulting PathRecord should be used for >configuring the >connection QP. > >PathRecord and MultiPathRecord enhancement for QoS: >As mentioned above the PathRecord and MultiPathRecord attributes should be >enhanced to carry the Service-ID which is a 64bit value. Given the existing >definition for these attributes we propose to use the following fields for >Service-ID: >* For PathRecord: use the first 2 reserved fields whicg are 32bits each > (component masks 0x1 and 0x2). Component mask 1 should be used to refer > to the > merged Service-ID field >* For MultiPathRecord: use 2 reserved fields: > 1. after the packet life (8 bits) which is component mask bit 0x10000 (17) > 2. the field before SDGID1 (56 bits) which is component mask bit > 0x200000 (22) > Once merged they should be selected using component mask bit 0x10000 (17) >A new capability bit should describe the SM QoS support in the SA class port >info. 
This approach provides an easy migration path for existing access layer >and ULPs by not introducing a new attribute. > > >3. Supported Policy >-------------------- > >The QoS policy supported by this proposal is divided into 4 sub sections: > >* Node Group: a set of HCAs, Routers or Switches that share the same >settings. >A node groups might be a partition defined by the partition manager policy in >terms of GUIDs. Future implementations might provide support for >NodeDescription >based definition of node groups. > >* Fabric Setup: >Defines how the SL2VL and VLArb tables should be setup. This policy >definition >assumes the computation of target behavior should be performed outside of >OpenSM. > >* QoS-Levels Definition: >This section defines the possible sets of parameters for QoS that a client >might >be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Path Bits >(in case LMC > 0 is used for QoS) and TClass. > >* Matching Rules: >A list of rules that match an incoming PathRecord request to a QoS-Level. The >rules are processed in order such as the first match is applied. Each rule is >built out of set of match expressions which should all match for the rule to >apply. The matching expressions are defined for the following fields >** SRC and DST to lists of node groups >** Service-ID to a list of Service-ID or Service-ID ranges >** TClass to a list of TClass values or ranges > >XML style syntax is provided for the policy file. However, a strict BNF >format >(provided in section 8) should be used for parsing it. 
> > > > > > > Storage our SRP storage targets > 0x1000000000000001 > 0x1000000000000002 > > > Virtual Servers node desc and IB port > # > vs1/HCA-1/P1 > vs3/HCA-1/P1 > vs3/HCA-2/P1 > > > Partition 1 default settings > Part1 > > > Routers all routers > ROUTER > > > > > > > Part1 * * > 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 > > > Storage * 1 > 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1 > > > > > > > Storage * > > 0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1 > 8:255,9:127,10:63,11:31,12:15,13:7,14:3 > 10 > > > > > > > 1 for the lowest priority comm > 16 > > > 2 low latency best bandwidth > 0 7 > > > 3 just an example > 0 32 1 > 1 > > > > > > 1 low latency by tclass 7-9 or 11> > 7-9,11 1 > > > 2 Storage targets connection> > Storage 22,4719 > 3 > > > > > > >4. IPoIB >--------- > >IPoIB already query the SA for its broadcast group information. The >additional >functionality required is for IPoIB to provide the broadcast group SL, >MTU, RATE >and TClass in every following PathRecord query performed when a new UDAV is >needed by IPoIB. >We could assign a special Service-ID for IPoIB use but since all >communication >on the same IPoIB interface shares the same QoS-Level without the ability to >differentiate it by target service we can ignore it for simplicity. > >5. CMA features >---------------- > >The CMA interface supports Service-ID through the notion of port space as a >prefixes to the port_num which is part of the sockaddr provided to >rdma_resolve_add(). What is missing is the explicit request for a TClass that >should allow the ULP (like SDP) to propagate a specific request for a >class of >service. A mechanism for providing the TClass is available in the IPv6 >address, >so we could use that address field. Another option is to implement a special >connection options API for CMA. > >Missing functionality by CMA is the usage of the provided TClass and >Service-ID >in the sent PathRecord. 
When a response is obtained it is an existing >requirement for the CMA to use the PathRecord from the response in setting up >the QP address vector. > > >6. SDP >------- > >SDP uses CMA for building its connections. >The Service-ID for SDP is 0x000000000001PPPP, where PPPP are 4 hex digits >holding the remote TCP/IP Port Number to connect to. >SDP might be provided with SO_PRIORITY socket option. In that case the value >provided should be sent to the CMA as the TClass option of that connection. > >7. SRP >------- > >Current SRP implementation uses its own CM callbacks (not CMA). So SRP should >fill in the Service-ID in the PathRecord by itself and use that >information in >setting up the QP. The T10 SRP standard defines the SRP Service-ID to be >defined >by the SRP target I/O Controller (but they should also comply with IBTA >Service- >ID rules). Anyway, the Service-ID is reported by the I/O Controller in the >ServiceEntries DMA attribute and should be used in the PathRecord if the SA >reports its ability to handle QoS PathRecords. > >8. iSER >-------- >iSER uses CMA and thus should be very close to SDP. The Service-ID for iSER >should be TBD. > > >9. OpenSM features >------------------- >The QoS related functionality to be provided by OpenSM can be split into two >main parts: > >3.1. Fabric Setup >During fabric initialization the SM should parse the policy and apply its >settings to the discovered fabric elements. The following actions should be >performed: >* Parsing of policy >* Node Group identification. Warning should be provided for each node not > specified but found. >* SL2VL settings validation should be checked: > + A warning will be provided if there are no matching targets for the > SL2VL > setting statement. > + An error message will be printed to the log file if an invalid > setting is > found. A setting is invalid if it refers to: > - Non existing port numbers of the target devices > - Unsupported VLs for the target device. 
In the later case the map to non > existing VLs should be replaced to VL15 i.e. packets will be dropped. >* SL2VL setting is to be performed >* VL Arbitration table settings should be validated according to the >following > rules: > + A warning will be provided if there are no matching targets for the > setting > statement > + An error will be provided if the port number exceeds the target ports > + An error will be generated if the table length exceeds device > capabilities > + An warning will be generated if the table quote a VL that is not > supported > by the target device >* VL Arbitration tables will be set on the appropriate targets > >3.2. PathRecord query handling: >OpenSM should be able to enforce the provided policy on client request. >The overall flow for such requests is: first the request is matched >against the >defined match rules such that the target QoS-Level definition is found. Given >the QoS-Level a path(s) search is performed with the given restrictions >imposed >by that level. The following two sections describe these steps. > >One issue not standardized by the IBTA is how Service-ID is carried in the >PathRecord and MultiPathRecord attributes. There are basically two options: >a. Replace the SM-Key field by the Service-ID. In that case no >component mask > bit will be assigned to it. Such that if the field is zero we should > treat it > as if the component mask bit is clear. >b. Encode it into spare fields. For PathRecord the first two fields are >reserved > and are 64 bit when combined. The first component mask bit maps to the > first > reserved field and should be used for Service-ID masking. For > MultiPathRecord > attribute there are no adjacent reserve fields that makes a 64 bit > field. So > the reserve field following the packet-lifetime (8 bits) combined with > the > reserved field DGIDCount (56 bits) can make the Service-ID. 
In this > case also > the first reserve field component mask bit should be used as the > Service-ID > component mask bit. > > > >3.2.1. Matching rule search: >A rule is "matching" a PathRecord request using the following criteria: >* Matching rules provide values in a list of either single value, or range >of > values. A PathRecord field is "matching" the rule field if it is > explicitly > noted in the list of values or is one of the values covered by a range > included in the field values list. >* Only PathRecord fields that have their component mask bit set should be > compared. >* For a rule to be "matching" a PathRecord request all the rule fields >should be > "matching" their PathRecord fields. Such that a PathRecord request that > does > not have a component mask field set for one of the rule defined > fields can > not match that rule. >* A PathRecord request that have a component mask bit set for one of the >fields > that is not defined by the rule can match the rule. > >The algorithm to be used for searching for a rule match might be as simple >as a >sequential search through all rules or enhanced for better performance. The >semantics of every rule field and its matching PathRecord field are described >below: >* Source: the SGID or SLID should be part of this group >* Destination: the DGID or DLID should be part of this group >* Service-ID: check if the requested Service-ID (available in the >PathRecord old > SM-Key field) is matching any of this rule Service-IDs >* TClass: check if the PathRecord TClass field is matching > >3.2.2 PathRecord response generation: >The QoS-Level pointed by the first rule that matches the PathRecord request >should be used for obtaining the response SL, MTU-Limit, RATE-Limit, >Path-Bits >and TClass. A default QoS-Level should be used if no rule is matching the >query. > >The efficient algorithm for finding paths that meet the QoS-Level criteria is >beyond the scope of this RFC and left for the implementer to provide. 
However >the criteria by which the paths match the QoS-Level are described below: > >* SL: The paths found should all use the given SL. For that sake PathRecord > algorithm should traverse the path from source to destination only through > ports that carry a valid VL (not VL15) by the SL2VL map (should > consider input > and output ports and SL). >* MTU-Limit: The resulting paths MTU should not exceed the given MTU-Limit >* Rate-Limit: The resulting paths RATE should not exceed the given RATE-Limit > (rate limit is given in units of link BW = Width*Speed according to IBTA > Specification Vol-1 table-205 p-901 l-24). >* Path-Bits: define the target LID lowest bits (number of bits defined by the > target port PortInfo.LMC field). The path should traverse the LFT using > the > target port LID with the path-bits set. >* TClass: should be returned in the result PathRecord. When routing is >going to > be supported by OpenSM we might use this field in selecting the target > router too in a TBD way. > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit >http://openib.org/mailman/listinfo/openib-general
From sean.hefty at intel.com Tue May 30 11:14:55 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 30 May 2006 11:14:55 -0700 Subject: [openib-general] RE: CMA backlog In-Reply-To: <20060530175420.GE10234@mellanox.co.il> Message-ID: I think that there are some issues that would need to be worked out, but in general I'm in favor of trying to do something here. >Currently, this is not something that can be implemented by ULP on top of >CMA, because returning error from REQ will result in reject rather than REQ >drop. A generic ULP could handle this by making use of the private data, and retrying requests after a REJ with insufficient resources. >CMA already has backlog parameter in listen but it is ignored as far as I can >see. I propose extending cma API with the following options: The backlog applies more for iWarp and userspace. I couldn't find a usable way to make use of backlog in the kernel, since it uses a callback model. >rdma_backlog_added - connection was added to backlog queue >rdma_backlog_removed - connection was removed from backlog queue *ponders* >Internally, CMA will count the # of connections in backlog. If >If REQ arrives and this number exceeds the backlog given in listen, >CMA will drop the REQ, without creating the new CMA ID. Incrementing the number of pending connections on a listen is easy. Decrementing it is more difficult, since a listen request can be destroyed after a connection request is received, but before it is responded to. This is difficult to handle, especially for userspace clients. Additionally, the CMA can't just drop the REQ. The REQ has been received by the IB CM, which is expecting a response. You would need to push backlog into the IB CM, which requires defining what it means at that level. From the perspective of the IB CM, sending a REJ with "No resources available" (reject code 3) seems to make more sense than simply discarding the MAD.
One possible fix is to remove sending a reject on destruction of a cm_id. I'm not sure what effect this would have on other code or the overall protocol though. - Sean From caitlinb at broadcom.com Tue May 30 11:28:52 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Tue, 30 May 2006 11:28:52 -0700 Subject: [openib-general] RFC: CMA backlog Message-ID: <54AD0F12E08D1541B826BE97C98F99F150D0E2@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > Hello, Sean! > I am looking at implementing the listen backlog parameter correctly. > Here's what this does in TCP: TCP counts the number of > connect requests at the specific local socket that were not > yet accepted by accept(). > Once this number exceeds the backlog specified in listen, new > SYN packets will be dropped, and the remote side will retry. > > Implementing this correctly is important for scalability when > a lot of clients connect to a single server - this way if > there is a pressure on server to handle connections, it is > distributed between clients who have to retry. > > Currently, this is not something that can be implemented by > ULP on top of CMA, because returning error from REQ will > result in reject rather than REQ drop. > > CMA already has backlog parameter in listen but it is ignored > as far as I can see. I propose extending cma API with the > following options: > > rdma_backlog_added - connection was added to backlog queue > rdma_backlog_removed - connection was removed from backlog queue > > Internally, CMA will count the # of connections in backlog. > If If REQ arrives and this number exceeds the backlog given > in listen, CMA will drop the REQ, without creating the new CMA ID. > > This mimics the TCP behaviour closely an will hopefully scale well. > > Opinions? Dropping is one option, but sending an explicit non-peer Reject certainly makes sense for iWARP. In any event, a Reject from the user is a peer reject. 
No matter what the protocol, the CMA has to be able to ignore or reject requests that exceed the number it can submit to the user for approval. The other clarification is that the connection request counts against the total until it is accepted *or* rejected. I'm not following what the options actually do. Clearly there is a need to have a count of pending connection requests per listen. And once that credit is exhausted there are no more connection requests generated (either by reflex rejection or by dropping). Are you suggesting something beyond that? From mst at mellanox.co.il Tue May 30 11:34:54 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 30 May 2006 21:34:54 +0300 Subject: [openib-general] Re: CMA backlog In-Reply-To: References: <20060530175420.GE10234@mellanox.co.il> Message-ID: <20060530183454.GH10234@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: CMA backlog > > I think that there are some issues that would need to be worked out, but in > general I'm in favor of trying to do something here. > > >Currently, this is not something that can be implemented by ULP on top of > >CMA, because returning error from REQ will result in reject rather than REQ > >drop. > > A generic ULP could handle this by making use of the private data, and retrying > requests after a REJ with insufficient resources. > > >CMA already has backlog parameter in listen but it is ignored as far as I can > >see. I propose extending cma API with the following options: > > The backlog applies more for iWarp and userspace. I couldn't find a usable way > to make use of backlog in the kernel, since it uses a callback model. > > >rdma_backlog_added - connection was added to backlog queue > >rdma_backlog_removed - connection was removed from backlog queue > > *ponders* > > >Internally, CMA will count the # of connections in backlog. If > >If REQ arrives and this number exceeds the backlog given in listen, > >CMA will drop the REQ, without creating the new CMA ID.
> > Incrementing the number of pending connections on a listen is easy. > Decrementing it is more difficult, since a listen request can be destroyed after > a connection request is received, but before it is responded to. This is > difficult to handle, especially for userspace clients. That is why, in my opinion, this should be up to the ULP to handle, calling rdma_backlog_added/rdma_backlog_removed as appropriate. Existing ULPs that don't call rdma_backlog_added will simply get all requests. > Additionally, the CMA can't just drop the REQ. The REQ has been received by the > IB CM, which is expecting a response. You would need to push backlog into the > IB CM, which requires defining what it means at that level. From the > perspective of the IB CM, sending a REJ with "No resources available" (reject > code 3) seems to make more sense than simply discarding the MAD. This approach would affect all ULPs, however. For example, no SDP implementation that I know of retries after a REJ - so this approach won't be interoperable. And AFAIK the SDP spec already interprets reject as connection refused. There's no provision I can see in the SDP spec for retries on a specific reject code. Dropping REQ simply seems a nice approach since the client retries REQ MADs anyway. > One possible fix is to remove sending a reject on destruction of a cm_id. I'm > not sure what effect this would have on other code or the overall protocol > though. Yes, that was my thinking. To avoid touching all users, maybe the simplest way is to make ib_cm discard the new cm_id without reject if the client callback returned -ENOMEM? If you consider that in an out-of-memory situation sending reject will also likely fail, this might be a good idea, regardless. Sounds good?
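To make the -ENOMEM idea concrete, here is a rough sketch in C. It is not the ib_cm code: req_handler_fn, dispatch_req(), enum req_disposition and example_listen_handler() are hypothetical names used only to show the proposed decision, namely that a callback returning -ENOMEM makes the CM discard the new id silently, while any other error still produces a reject.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* What the CM layer would do with a freshly received REQ (sketch only). */
enum req_disposition {
    REQ_KEEP,           /* callback took the connection request */
    REQ_REJECT,         /* destroy the new id and send a REJ */
    REQ_DROP_SILENTLY   /* destroy the new id, send nothing; peer retries */
};

/* Hypothetical client callback type: returns 0 or a negative errno. */
typedef int (*req_handler_fn)(void *context);

enum req_disposition dispatch_req(req_handler_fn handler, void *context)
{
    int ret;

    assert(handler != NULL);
    ret = handler(context);
    if (ret == 0)
        return REQ_KEEP;
    if (ret == -ENOMEM)
        return REQ_DROP_SILENTLY;   /* proposed new behaviour */
    return REQ_REJECT;              /* behaviour for all other errors */
}

/*
 * Example ULP callback: context points at the number of free backlog
 * slots. When none are left it asks for a silent drop via -ENOMEM.
 */
int example_listen_handler(void *context)
{
    int *free_slots = context;

    if (*free_slots == 0)
        return -ENOMEM;
    (*free_slots)--;
    return 0;
}
```

With this shape a ULP that wants TCP-like backlog behaviour only has to return -ENOMEM from its REQ callback, and ULPs that never return it see no change.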
-- MST From sean.hefty at intel.com Tue May 30 11:42:24 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 30 May 2006 11:42:24 -0700 Subject: [openib-general] RE: CMA backlog In-Reply-To: <20060530183454.GH10234@mellanox.co.il> Message-ID: >This approach would affect all ULPs, however. For example, no SDP imlementation >that I know of retries after a REJ - so this approach won't be interoperable. >And AFAIK SDP spec already interprets reject as connection refused. >There's no provision I cansee in SDP spec for retries on specific >reject code. How did SDP expect to handle backlog then? Or was that a consideration? >Yes, that was my thinking. To avoid touching all users, maybe the simplest way >is to make ib_cm discard the new cm_id without reject if the client callback >returned -ENOMEM? > >If you consider that in out of memory situation sending reject will also likely >fail, this might be a good idea, regardless. > >Sounds good? I'd like to get some other feedback, but this approach sounds reasonable. - Sean From mst at mellanox.co.il Tue May 30 11:46:58 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 30 May 2006 21:46:58 +0300 Subject: [openib-general] Re: CMA backlog In-Reply-To: References: <20060530175420.GE10234@mellanox.co.il> Message-ID: <20060530184658.GI10234@mellanox.co.il> Quoting r. Sean Hefty : > >Currently, this is not something that can be implemented by ULP on top of > >CMA, because returning error from REQ will result in reject rather than REQ > >drop. > > A generic ULP could handle this by making use of the private data, and retrying > requests after a REJ with insufficient resources. You are right in this, of course. What I meant is that dropping REQ can not be implemented by ULPs without extending our CMA and CM. As you point out, retrying on reject might make sense for some ULPs, but happily these can already implement this without extending CMA and CM. So we are set there. 
I suggest adding an option of dropping REQ, emulating TCP behaviour. This is IMO required for SDP, as I read the spec and from what I know about other SDP implementations. ULPs will be able to select the best approach appropriate for them. -- MST From mst at mellanox.co.il Tue May 30 11:48:43 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 30 May 2006 21:48:43 +0300 Subject: [openib-general] Re: CMA backlog In-Reply-To: References: <20060530183454.GH10234@mellanox.co.il> Message-ID: <20060530184843.GJ10234@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: CMA backlog > > >This approach would affect all ULPs, however. For example, no SDP implementation > >that I know of retries after a REJ - so this approach won't be interoperable. > >And AFAIK SDP spec already interprets reject as connection refused. > >There's no provision I can see in SDP spec for retries on specific > >reject code. > > How did SDP expect to handle backlog then? Or was that a consideration? Right. That's the only way I can see to correctly handle backlog in SDP. -- MST From mst at mellanox.co.il Tue May 30 12:07:16 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 30 May 2006 22:07:16 +0300 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <44749D61.1060506@ichips.intel.com> References: <20060524133728.GN21266@mellanox.co.il> <20060524162242.GC21266@mellanox.co.il> <44748D8B.1000508@ichips.intel.com> <20060524173604.GB25186@mellanox.co.il> <44749D61.1060506@ichips.intel.com> Message-ID: <20060530190716.GA11169@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 > > Michael S. Tsirkin wrote: > >>It's not completely trivial to reproduce. I tried loading and > >>unloading ib_ipoib a few times, then I tried to load ib_ipoib and > >>unload ib_mthca, I tried pinging in between loading and unloading, and > >>I didn't see any crashes.
> > > > > >Maybe SM was down. > > I've tested loading, ifconfig, unloading a substantial number of times with > the SM up and down, and I can't reproduce this. I'll continue to look into > this, but if you can provide any more information about the test setup, it > would be helpful. I'm still looking at isolating this failure. I'd like to understand the new code better, however. What prevents ipoib_mcast_leave and later ipoib_mcast_free from being called on an mcast that has an outstanding query? We used to have a completion to signal that but it seems to be gone. -- MST From sashak at voltaire.com Tue May 30 12:09:36 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 30 May 2006 22:09:36 +0300 Subject: [openib-general] QoS RFC - Resend using a friendly mailer In-Reply-To: References: Message-ID: <20060530190936.GD21212@sashak.voltaire.com> Hi Eitan, First comments... On 17:53 Tue 30 May , Eitan Zahavi wrote: > > 3. Supported Policy > -------------------- > > The QoS policy supported by this proposal is divided into 4 sub sections: > > * Node Group: a set of HCAs, Routers or Switches that share the same settings. > A node groups might be a partition defined by the partition manager policy in > terms of GUIDs. Future implementations might provide support for NodeDescription > based definition of node groups. Port/Node groups could be defined as separate configuration, then those definitions will be shared by different policies like Partitions, QoS (and maybe others in future). > * Fabric Setup: > Defines how the SL2VL and VLArb tables should be setup. This policy definition > assumes the computation of target behavior should be performed outside of > OpenSM. > > * QoS-Levels Definition: > This section defines the possible sets of parameters for QoS that a client might > be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Path Bits > (in case LMC > 0 is used for QoS) and TClass. 
> > * Matching Rules: > A list of rules that match an incoming PathRecord request to a QoS-Level. The > rules are processed in order such as the first match is applied. Each rule is > built out of set of match expressions which should all match for the rule to > apply. The matching expressions are defined for the following fields > ** SRC and DST to lists of node groups > ** Service-ID to a list of Service-ID or Service-ID ranges > ** TClass to a list of TClass values or ranges > > XML style syntax is provided for the policy file. Why XML? It is not too much readable and writable (by human) format. Sasha From ftillier at silverstorm.com Tue May 30 12:14:42 2006 From: ftillier at silverstorm.com (Fabian Tillier) Date: Tue, 30 May 2006 12:14:42 -0700 Subject: [openib-general] Re: CMA backlog In-Reply-To: <20060530184658.GI10234@mellanox.co.il> References: <20060530175420.GE10234@mellanox.co.il> <20060530184658.GI10234@mellanox.co.il> Message-ID: <79ae2f320605301214k3a399462m56941dc50f44d777@mail.gmail.com> On 5/30/06, Michael S. Tsirkin wrote: > Quoting r. Sean Hefty : > > >Currently, this is not something that can be implemented by ULP on top of > > >CMA, because returning error from REQ will result in reject rather than REQ > > >drop. > > > > A generic ULP could handle this by making use of the private data, and retrying > > requests after a REJ with insufficient resources. > > You are right in this, of course. What I meant is that dropping REQ > can not be implemented by ULPs without extending our CMA and CM. > > As you point out, retrying on reject might make sense for some > ULPs, but happily these can already implement this without extending CMA > and CM. So we are set there. > > I suggest adding an option of dropping REQ, emulating TCP behaviour. > This is IMO required for SDP, as I read the spec and from I know about other > SDP implementations. You mean a half-reject, where it only rejects locally but doesn't send the REJ? 
This shouldn't be that hard to do if the local CM supported a special reject code that would suppress the REJ MAD being sent. Alternatively, at least in Windows, the CM will continue to retry REQ requests if it receives a REJ with INVALID_SID as the reason (i.e. the app is not up yet). If the Linux CM could do the same, we could use this reject reason to "drop" the REQ. - Fab From eitan at mellanox.co.il Tue May 30 12:27:40 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 30 May 2006 22:27:40 +0300 Subject: [openib-general] QoS RFC - Resend using a friendly mailer Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368776@mtlexch01.mtl.com> Hi Todd, > While using the Service ID is an interesting idea, the problem is the Service ID values > are not well defined by IBTA. Rather each endpoint is permitted to define its own, > potentially transient set of Service ID values. The Service ID values are discovered via > Service Records in the SA or Device Management queries which get their data from > the IOU. [EZ] Actually there are quite a few rules for how service IDs are made. Different service vendors are supposed to use different Service-IDs. Also this RFC does enforce using Service-IDs in cases where they are not defined. But it does provide the means to do that when such service are defined. So in no way you can say it breaks existing implementations. Just provide a way for applications that do make a constant use of Service-IDs benefit from that property. > > Hence while a few service ID values are well defined (such as those for SDP), many > are not (such as those for MPI, uDAPL, SRP, etc) and may vary between both > hardware and software suppliers.
Many are likely to be duplicated between different > vendors target devices (for example a uDAPL target application may duplicate values > used by an SRP target) and this would not be a problem provided both applications > were never run on the same IB Node target device. Some might even change on each > reboot (IBTA spec implies this could be a 64 bit pointer or context in the target), > although I'm not aware of any which do. > > I believe it is for the above reasons that IBTA chose not to make ServiceID part of the > PathRecord and MultiPathRecord queries. > > As Roland suggest, before implementing a non-standard approach, IBTA should be > engaged to define an appropriate extension to the standard. Such extensions would > need to be carefully defined to avoid breaking existing applications and fabrics. [EZ] You are welcome to join IBTA and work on this too. > > Todd Rimmer > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Tue May 30 12:16:22 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 15:16:22 -0400 Subject: [openib-general] Re: special qp's creation from userspace In-Reply-To: References: <1148995226.4358.111151.camel@hal.voltaire.com> Message-ID: <1149016578.4358.119850.camel@hal.voltaire.com> On Tue, 2006-05-30 at 11:02, yipee wrote: > Hal Rosenstock voltaire.com> writes: > > > On Tue, 2006-05-30 at 05:35, yipee wrote: > > > Can I use ib_mad_port_close() (mad.c) to close qp0 & qp1 > > > > Yes, that would close QP0/1 on each port. Just unloading the ib_mad > > module will have that effect. > > Would this unloading cause the port status to downgrade from ACTIVE to some > other state? > Does ib_mad tell the active SM that it is going down? 
With mthca, you can't unload ib_mad without unloading mthca as well so this scheme won't work. How would you call ib_mad_port_close ? It's not exposed to userspace. > What would happen to active RC connections? Would they be affected by this? I think we've discussed this on the list before. -- Hal > Thanks, > x From eitan at mellanox.co.il Tue May 30 12:34:55 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 30 May 2006 22:34:55 +0300 Subject: [openib-general] QoS RFC - Resend using a friendly mailer Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368777@mtlexch01.mtl.com> Hi Hal, Please see my responses inside Eitan > > > > RFC: OpenFabrics Enhancements for QoS Support > > =============================================== > > > > Authors: . Eitan Zahavi > > Date: .... May 2006. > > Revision: 0.1 > > > > Table of contents: > > 1. Overview > > 2. Architecture > > 3. Supported Policy > > 4. CMA functionality > > 5. IPoIB functionality > > 6. SDP functionality > > 7. SRP functionality > > 8. iSER functionality > > 9. OpenSM functionality > > > > 1. Overview > > ------------ > > Quality of Service requirements stem from the realization of I/O consolidation > > over IB network: As multiple applications and ULPs share the same fabric, means > > to control their use of the network resources are becoming a must. The basic > > need is to differentiate the service levels provided to different traffic flows. > > Such that a policy could be enforced and control each flow utilization of the > > fabric resources.
> > > > IBTA specification defined several hardware features and management interfaces > > to support QoS: > > * Up to 15 Virtual Lanes (VL) could carry traffic in a non-blocking manner > > * Arbitration between traffic of different VL is performed by a 2 priority > > levels weighted round robin arbiter. The arbiter is programmable with > > a sequence of (VL, weight) pairs and maximal number of high priority credits > > to be processed before low priority is served > > * Packets carry class of service marking in the range 0 to 15 in their > > header SL field > > * Each switch can map the incoming packet by its SL to a particular output > > VL based on programmable table VL=SL-to-VL-MAP(in-port, out-port, SL) > > * The Subnet Administrator controls each communication flow parameters > > by providing them as a response to Path Record query > > > > The IB QoS features provide the means to implement a DiffServ like architecture. > > DiffServ architecture (IETF RFC2474 2475) is widely used today in highly dynamic > > fabrics. > > Only certain DSCP code point equivalents are provided by IBA. [EZ] True. > > > This proposal provides the detailed functional definition for the various > > software elements that are required to enable a DiffServ like architecture over > > the OpenFabrics software stack. > > > > > > > > > > > > 2. Architecture > > ---------------- > > This proposal split the QoS functionality between the SM/SA, CMA and the various > > ULPS. We take the "chronology approach" to describe how the overall system > > works: > > > > 2.1. The network manager (human) provides a set of rules (policy) that defines > > how the network is being configured and how its resources are split to different > > QoS-Levels. The policy also define how to decide which QoS-Level each > > application or ULP or service use. > > > 2.2. The SM analyzes the provided policy to see if it is realizable and performs > > the necessary fabric setup. 
The SM may continuously monitor the policy and adapt > > to changes in it. > > Do you mean monitor the policy or the fabric here ? [EZ] I mean monitor the policy such that changes in it are enforced. > > > Part of this policy defines the default QoS-Level of each > > partition. The SA is being enhanced to match the requested Source, Destination, > > TClass, Service-ID > > Service ID does not apply to many ULPs. Also, how is it known what > ULP/application a particular service ID refers to (other than perhaps > some well known ones) ? [EZ] True - only well known Service-IDs can have a predefined policy attached to. But I disagree on the fact services are unknown - if they are unknown how are they being found by the clients? > > > (and optionally SL and priority) against the policy. So > > clients (ULPs, programs) can obtain a policy enforced QoS. The SM is also > > enhanced to support setting up partitions with appropriate IPoIB broadcast > > group. This broadcast group carries its QoS attributes: TClass, SL, MTU and > > RATE. > > > > 2.3. IPoIB is being setup. IPoIB uses the SL, MTU and RATE available on the > > multicast group which forms the broadcast group of this partition. > > > > 2.4. MPI which provides non IB based connection management should be > configured > > to run using hard coded SLs. It uses these SLs in every QP being opened. > > > > 2.5. ULPs that use CM interface (like SRP) should have their own pre-assigned > > Service-ID and use it while obtaining PathRecord for establishing their > > connections. The SA receiving the PathRecord should match it against the policy > > and return the appropriate PathRecord including SL, MTU, RATE and TClass. > > > > 2.6. ULPs and programs using CMA to establish RC connection should provide the > > CMA the target IP and Service-ID. Some of the ULPs might also provide TClass > > (E.g. for SDP sockets that are provided the TOS socket option). 
The CMA should > > then use the provided Service-ID and optional TClass and pass them in the > > PathRecord request. The resulting PathRecord should be used for configuring the > > connection QP. > > > > PathRecord and MultiPathRecord enhancement for QoS: > > As mentioned above the PathRecord and MultiPathRecord attributes should be > > enhanced to carry the Service-ID which is a 64bit value. Given the existing > > definition for these attributes we propose to use the following fields for > > Service-ID: > > * For PathRecord: use the first 2 reserved fields which are 32bits each > > (component masks 0x1 and 0x2). Component mask 1 should be used to refer to the > > merged Service-ID field > > * For MultiPathRecord: use 2 reserved fields: > > 1. after the packet life (8 bits) which is component mask bit 0x10000 (17) > > 2. the field before SDGID1 (56 bits) which is component mask bit 0x200000 (22) > > This is not possible with the existing approved 1.2 erratum changes. [EZ] Ooops I was using 1.2 spec. Can you elaborate on the field I missed? Can we find a replacement? > > > Once merged they should be selected using component mask bit 0x10000 (17) > > A new capability bit should describe the SM QoS support in the SA class port > > info. This approach provides an easy migration path for existing access layer > > and ULPs by not introducing a new attribute. > > > > > > 3. Supported Policy > > -------------------- > > > > The QoS policy supported by this proposal is divided into 4 sub sections: > > > > * Node Group: a set of HCAs, Routers or Switches that share the same settings. > > A node group might be a partition defined by the partition manager policy in > > terms of GUIDs. Future implementations might provide support for NodeDescription > > based definition of node groups. > > > > * Fabric Setup: > > Defines how the SL2VL and VLArb tables should be setup.
This policy definition > > assumes the computation of target behavior should be performed outside of > > OpenSM. > > > > * QoS-Levels Definition: > > This section defines the possible sets of parameters for QoS that a client might > > be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Path Bits > > (in case LMC > 0 is used for QoS) and TClass. > > > > * Matching Rules: > > A list of rules that match an incoming PathRecord request to a QoS-Level. The > > rules are processed in order such as the first match is applied. Each rule is > > built out of set of match expressions which should all match for the rule to > > apply. The matching expressions are defined for the following fields > > ** SRC and DST to lists of node groups > > ** Service-ID to a list of Service-ID or Service-ID ranges > > ** TClass to a list of TClass values or ranges > > > > XML style syntax is provided for the policy file. However, a strict BNF format > > (provided in section 8) > > What section ? [EZ] Sorry I planned to add it and did not make it for this mail. Please ignore this. I will provide the BNF once we make some progress. > > > should be used for parsing it. 
> > > > > > > > > > > > > > Storage our SRP storage targets > > 0x1000000000000001 > > 0x1000000000000002 > > > > > > Virtual Servers node desc and IB port # > > vs1/HCA-1/P1 > > vs3/HCA-1/P1 > > vs3/HCA-2/P1 > > > > > > Partition 1 default settings > > Part1 > > > > > > Routers all routers > > ROUTER > > > > > > > > > > > > > > Part1 * * > > 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 > > > > > > Storage * 1 > > 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1 > > > > > > > > > > > > > > Storage * > > > > 0:255,1:127,2:63,3:31,4:15,5:7,6:3,7:1 > > 8:255,9:127,10:63,11:31,12:15,13:7,14:3 > > 10 > > > > > > > > > > > > > > 1 for the lowest priority comm > > 16 > > > > > > 2 low latency best bandwidth > > 0 7 > > > > > > 3 just an example > > 0 32 1 > > 1 > > > > > > > > > > > > 1 low latency by tclass 7-9 or 11> > > 7-9,11 1 > > > > > > 2 Storage targets connection> > > Storage 22,4719 > > 3 > > > > > > > > > > > > > > 4. IPoIB > > --------- > > > > IPoIB already query the SA for its broadcast group information. The additional > > functionality required is for IPoIB to provide the broadcast group SL, MTU, RATE > > and TClass in every following PathRecord query performed when a new UDAV is > > needed by IPoIB. > > We could assign a special Service-ID for IPoIB use but since all communication > > on the same IPoIB interface shares the same QoS-Level without the ability to > > differentiate it by target service we can ignore it for simplicity. > > > > 5. CMA features > > ---------------- > > > > The CMA interface supports Service-ID through the notion of port space as a > > prefixes to the port_num which is part of the sockaddr provided to > > rdma_resolve_add(). What is missing is the explicit request for a TClass that > > should allow the ULP (like SDP) to propagate a specific request for a class of > > service. A mechanism for providing the TClass is available in the IPv6 address, > > so we could use that address field. 
Another option is to implement a special > > connection options API for CMA. > > > > Missing functionality by CMA is the usage of the provided TClass and Service-ID > > in the sent PathRecord. When a response is obtained it is an existing > > requirement for the CMA to use the PathRecord from the response in setting up > > the QP address vector. > > > > > > 6. SDP > > ------- > > > > SDP uses CMA for building its connections. > > The Service-ID for SDP is 0x000000000001PPPP, where PPPP are 4 hex digits > > holding the remote TCP/IP Port Number to connect to. > > SDP might be provided with SO_PRIORITY socket option. In that case the value > > provided should be sent to the CMA as the TClass option of that connection. > > > > 7. SRP > > ------- > > > > Current SRP implementation uses its own CM callbacks (not CMA). So SRP should > > fill in the Service-ID in the PathRecord by itself and use that information in > > setting up the QP. The T10 SRP standard defines the SRP Service-ID to be defined > > by the SRP target I/O Controller (but they should also comply with IBTA Service- > > ID rules). Anyway, the Service-ID is reported by the I/O Controller in the > > ServiceEntries DMA attribute and should be used in the PathRecord if the SA > > reports its ability to handle QoS PathRecords. > > > > 8. iSER > > -------- > > iSER uses CMA and thus should be very close to SDP. The Service-ID for iSER > > should be TBD. > > > > > > 9. OpenSM features > > ------------------- > > The QoS related functionality to be provided by OpenSM can be split into two > > main parts: > > > > 3.1. Fabric Setup > > During fabric initialization the SM should parse the policy and apply its > > settings to the discovered fabric elements. The following actions should be > > performed: > > * Parsing of policy > > * Node Group identification. Warning should be provided for each node not > > specified but found. > > What about the other way 'round too (nodes specified but not found) ? [EZ] Yep. 
Will require some warning too. > > > * SL2VL settings validation should be checked: > > + A warning will be provided if there are no matching targets for the SL2VL > > setting statement. > > + An error message will be printed to the log file if an invalid setting is > > found. A setting is invalid if it refers to: > > - Non existing port numbers of the target devices > > - Unsupported VLs for the target device. In the latter case the map to non > > existing VLs should be replaced by VL15 i.e. packets will be dropped. > > * SL2VL setting is to be performed > > * VL Arbitration table settings should be validated according to the following > > rules: > > + A warning will be provided if there are no matching targets for the setting > > statement > > + An error will be provided if the port number exceeds the target ports > > + An error will be generated if the table length exceeds device capabilities > > + A warning will be generated if the table quotes a VL that is not supported > > by the target device > > * VL Arbitration tables will be set on the appropriate targets > > One needs to be careful about these rules as there are a number of > different "shapes" to these tables. [EZ] Not sure what you mean by shape. IBTA defined all VLArb with same format? > > > 3.2. PathRecord query handling: > > OpenSM should be able to enforce the provided policy on client request. > > The overall flow for such requests is: first the request is matched against the > > defined match rules such that the target QoS-Level definition is found. Given > > the QoS-Level a path(s) search is performed with the given restrictions imposed > > by that level. The following two sections describe these steps. > > > > One issue not standardized by the IBTA is how Service-ID is carried in the > > PathRecord and MultiPathRecord attributes. There are basically two options: > > a. Replace the SM-Key field by the Service-ID. In that case no component mask > > bit will be assigned to it.
Such that if the field is zero we should treat it > > as if the component mask bit is clear. > > b. Encode it into spare fields. For PathRecord the first two fields are reserved > > and are 64 bit when combined. The first component mask bit maps to the first > > reserved field and should be used for Service-ID masking. For MultiPathRecord > > attribute there are no adjacent reserve fields that makes a 64 bit field. So > > the reserve field following the packet-lifetime (8 bits) combined with the > > reserved field DGIDCount (56 bits) can make the Service-ID. In this case also > > the first reserve field component mask bit should be used as the Service-ID > > component mask bit. > > > > > > > > 3.2.1. Matching rule search: > > A rule is "matching" a PathRecord request using the following criteria: > > * Matching rules provide values in a list of either single value, or range of > > values. A PathRecord field is "matching" the rule field if it is explicitly > > noted in the list of values or is one of the values covered by a range > > included in the field values list. > > * Only PathRecord fields that have their component mask bit set should be > > compared. > > * For a rule to be "matching" a PathRecord request all the rule fields should be > > "matching" their PathRecord fields. Such that a PathRecord request that does > > not have a component mask field set for one of the rule defined fields can > > not match that rule. > > * A PathRecord request that have a component mask bit set for one of the fields > > that is not defined by the rule can match the rule. > > > > The algorithm to be used for searching for a rule match might be as simple as a > > sequential search through all rules or enhanced for better performance. 
The > > semantics of every rule field and its matching PathRecord field are described > > below: > > * Source: the SGID or SLID should be part of this group > > * Destination: the DGID or DLID should be part of this group > > * Service-ID: check if the requested Service-ID (available in the PathRecord old > > SM-Key field) is matching any of this rule Service-IDs > > * TClass: check if the PathRecord TClass field is matching > > > > 3.2.2 PathRecord response generation: > > The QoS-Level pointed by the first rule that matches the PathRecord request > > should be used for obtaining the response SL, MTU-Limit, RATE-Limit, Path-Bits > > and TClass. A default QoS-Level should be used if no rule is matching the query. > > > > The efficient algorithm for finding paths that meet the QoS-Level criteria is > > beyond the scope of this RFC and left for the implementer to provide. However > > the criteria by which the paths match the QoS-Level are described below: > > > > * SL: The paths found should all use the given SL. For that sake PathRecord > > algorithm should traverse the path from source to destination only through > > ports that carry a valid VL (not VL15) by the SL2VL map (should consider input > > and output ports and SL). > > * MTU-Limit: The resulting paths MTU should not exceed the given MTU-Limit > > * Rate-Limit: The resulting paths RATE should not exceed the given RATE-Limit > > (rate limit is given in units of link BW = Width*Speed according to IBTA > > Specification Vol-1 table-205 p-901 l-24). > > * Path-Bits: define the target LID lowest bits (number of bits defined by the > > target port PortInfo.LMC field). The path should traverse the LFT using the > > target port LID with the path-bits set. > > * TClass: should be returned in the result PathRecord. When routing is going to > > be supported by OpenSM we might use this field in selecting the target > > router too in a TBD way. 
> > From eitan at mellanox.co.il Tue May 30 12:43:28 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 30 May 2006 22:43:28 +0300 Subject: [openib-general] QoS RFC - Resend using a friendly mailer Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368778@mtlexch01.mtl.com> Hi Sasha, Thanks for your comments. Please see my comments inside > > 3. Supported Policy > > -------------------- > > > > The QoS policy supported by this proposal is divided into 4 sub sections: > > > > * Node Group: a set of HCAs, Routers or Switches that share the same settings. > > A node groups might be a partition defined by the partition manager policy in > > terms of GUIDs. Future implementations might provide support for > NodeDescription > > based definition of node groups. > > Port/Node groups could be defined as separate configuration, then those > definitions will be shared by different policies like Partitions, QoS (and > maybe others in future). [EZ] Great idea. I would suggest using NodeDescription as a way to get node names. But this is yet another issue for discussion on the IBTA and this list. > > > * Fabric Setup: > > Defines how the SL2VL and VLArb tables should be setup. This policy definition > > assumes the computation of target behavior should be performed outside of > > OpenSM. > > > > * QoS-Levels Definition: > > This section defines the possible sets of parameters for QoS that a client might > > be mapped to. Each set holds: SL and optionally: Max MTU, Max Rate, Path Bits > > (in case LMC > 0 is used for QoS) and TClass. > > > > * Matching Rules: > > A list of rules that match an incoming PathRecord request to a QoS-Level. The > > rules are processed in order such as the first match is applied. Each rule is > > built out of set of match expressions which should all match for the rule to > > apply. 
The matching expressions are defined for the following fields > > ** SRC and DST to lists of node groups > > ** Service-ID to a list of Service-ID or Service-ID ranges > > ** TClass to a list of TClass values or ranges > > > > XML style syntax is provided for the policy file. > > Why XML? It is not too much readable and writable (by human) format. [EZ] Well, I agree with you but already got so many requests for XML that I could not resist. Maybe we could do both. If we have a nice BNF it would be just a matter of some yacc exercise. IMO it is the least of our problems. > > Sasha From michaelc at cs.wisc.edu Tue May 30 12:50:44 2006 From: michaelc at cs.wisc.edu (Mike Christie) Date: Tue, 30 May 2006 14:50:44 -0500 Subject: [openib-general] Re: [PATCH] IB/iser: do I/O path allocations with GFP_NOIO In-Reply-To: References: Message-ID: <447CA214.7010202@cs.wisc.edu> Should iser patches have linux-scsi cc'd on them in the future? And should they go through the scsi maintainer normally (I understand they cannot now since James does not have all the InfiniBand bits)? I am really just trying to avoid any coordination issues that come about by having core iscsi and tcp iscsi patches sent to the scsi maintainer then having to have iser going through Roland. For example I left a bit in the core iscsi code so I would not break iser. Now iser is updating their code, so we do not need that bit, but Or's patch missed the cleanup. If we sent everything through one maintainer then we could have cleaned everything up in one pass. Does srp go from openib-general and Roland then to lkml? For iscsi we do not go through net-dev and we live in drivers/scsi so maybe we are the odd driver?:) What is the proper or normal procedure?
Or Gerlitz wrote: > Thanks for Mike Christie for pointing this out - Or. > > a block driver is not allowed to use GFP_KERNEL allocations on its I/O code > path since the allocation might require I/O (eg to pageout other memory), > resulting in either deadlock or tightloop. > > move I/O path (queuecommand) allocations to be done with GFP_NOIO > > Signed-off-by: Or Gerlitz > > diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c > index 2703bb0..073e7b5 100644 > --- a/drivers/infiniband/ulp/iser/iser_initiator.c > +++ b/drivers/infiniband/ulp/iser/iser_initiator.c > @@ -225,7 +225,7 @@ static int iser_post_receive_control(str > struct iser_device *device = iser_conn->ib_conn->device; > int rx_data_size, err = 0; > > - rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL); > + rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_NOIO); > if (rx_desc == NULL) { > iser_err("Failed to alloc desc for post recv\n"); > return -ENOMEM; > @@ -238,7 +238,7 @@ static int iser_post_receive_control(str > else /* FIXME till user space sets conn->max_recv_dlength correctly */ > rx_data_size = 128; > > - rx_desc->data = kmalloc(rx_data_size, GFP_KERNEL); > + rx_desc->data = kmalloc(rx_data_size, GFP_NOIO); > if (rx_desc->data == NULL) { > iser_err("Failed to alloc data buf for post recv\n"); > err = -ENOMEM; > @@ -467,7 +467,7 @@ int iser_send_data_out(struct iscsi_conn > iser_dbg("%s itt %d dseg_len %d offset %d\n", > __func__,(int)itt,(int)data_seg_len,(int)buf_offset); > > - tx_desc = kmem_cache_alloc(ig.desc_cache, GFP_KERNEL); > + tx_desc = kmem_cache_alloc(ig.desc_cache, GFP_NOIO); > if (tx_desc == NULL) { > iser_err("Failed to alloc desc for post dataout\n"); > return -ENOMEM; > diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c > index 0881f55..31950a5 100644 > --- a/drivers/infiniband/ulp/iser/iser_memory.c > +++ b/drivers/infiniband/ulp/iser/iser_memory.c > @@ -111,10 +111,10 @@ 
int iser_start_rdma_unaligned_sg(struct > unsigned long cmd_data_len = data->data_len; > > if (cmd_data_len > ISER_KMALLOC_THRESHOLD) > - mem = (void *)__get_free_pages(GFP_KERNEL, > + mem = (void *)__get_free_pages(GFP_NOIO, > long_log2(roundup_pow_of_two(cmd_data_len)) - PAGE_SHIFT); > else > - mem = kmalloc(cmd_data_len, GFP_KERNEL); > + mem = kmalloc(cmd_data_len, GFP_NOIO); > > if (mem == NULL) { > iser_err("Failed to allocate mem size %d %d for copying sglist\n", From mst at mellanox.co.il Tue May 30 12:55:15 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 30 May 2006 22:55:15 +0300 Subject: [openib-general] Re: CMA backlog In-Reply-To: <79ae2f320605301214k3a399462m56941dc50f44d777@mail.gmail.com> References: <20060530175420.GE10234@mellanox.co.il> <20060530184658.GI10234@mellanox.co.il> <79ae2f320605301214k3a399462m56941dc50f44d777@mail.gmail.com> Message-ID: <20060530195515.GB11169@mellanox.co.il> Quoting r. Fabian Tillier : > Subject: Re: [openib-general] Re: CMA backlog > > On 5/30/06, Michael S. Tsirkin wrote: > >Quoting r. Sean Hefty : > >> >Currently, this is not something that can be implemented by ULP on top > >of > >> >CMA, because returning error from REQ will result in reject rather than > >REQ > >> >drop. > >> > >> A generic ULP could handle this by making use of the private data, and > >retrying > >> requests after a REJ with insufficient resources. > > > >You are right in this, of course. What I meant is that dropping REQ > >can not be implemented by ULPs without extending our CMA and CM. > > > >As you point out, retrying on reject might make sense for some > >ULPs, but happily these can already implement this without extending CMA > >and CM. So we are set there. > > > >I suggest adding an option of dropping REQ, emulating TCP behaviour. > >This is IMO required for SDP, as I read the spec and from I know about > >other > >SDP implementations. > > You mean a half-reject, where it only rejects locally but doesn't send > the REJ? 
This shouldn't be that hard to do if the local CM supported > a special reject code that would suppress the REJ MAD being sent. Yes. > Alternatively, at least in Windows, the CM will continue to retry REQ > requests if it receives a REJ with INVALID_SID as the reason (i.e. the > app is not up yet). This seems like a weird thing to do. Connecting to a port where no app listens with TCP generates an immediate error: >telnet localhost 3456 Trying 127.0.0.1... telnet: Unable to connect to remote host: Connection refused Exit 1 so why would SDP want to behave differently? > If the Linux CM could do the same, we could use > this reject reason to "drop" the REQ. We could I guess, but I don't think this is spec compliant. Its probably better to really drop it. -- MST
From rdreier at cisco.com Tue May 30 13:35:37 2006 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 May 2006 13:35:37 -0700 Subject: [openib-general] Re: [PATCH] IB/iser: do I/O path allocations with GFP_NOIO In-Reply-To: <447CA214.7010202@cs.wisc.edu> (Mike Christie's message of "Tue, 30 May 2006 14:50:44 -0500") References: <447CA214.7010202@cs.wisc.edu> Message-ID: Mike> Should iser patches have linux-scsi cc'd on them in the Mike> future? And should they go through the scsi maintainer Mike> normally (I understand they cannot now since James does not Mike> have all the InfiniBand bits)? I am really just trying to Mike> avoid any coordination issues that come about by having Mike> core iscsi and tcp iscsi patches sent to the scsi maintainer Mike> then having to have iser going through Roland. Mike> Does srp go from openib-general and Roland then to lkml? For Mike> iscsi we do not go through net-dev and we live in Mike> drivers/scsi so maybe we are the odd driver?:) What is the Mike> proper or normal procedure? It's a problem because SRP and iSER are straddling both the SCSI and IB worlds. Probably the best policy is to cc all relevant mailing lists (at least linux-scsi and openib-general) whenever there's a doubt about who should see something. As far as merging patches goes, I've been merging SRP changes directly to Linus, except for generic fixes to , which I've been sending through James. Or felt that iSCSI should be merged through my tree, but I have no problem if in the future patches bypass my tree. (But I would like to be cc'ed on changes to IB stuff, especially core things outside of specific drivers) (Which all reminds me I have a question about SCSI EH and SRP to send to the linux-scsi list...) - R.
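[Editorial sketch, not part of the original thread.] The patch discussed in this thread swaps GFP_KERNEL for GFP_NOIO on the iSER queuecommand path because, as the patch description puts it, an allocation that may itself require I/O can deadlock or loop when it is made from the I/O path. The user-space toy model below only illustrates that reasoning; it is not kernel code. Every name in it (toy_gfp, toy_reclaim, toy_alloc, toy_queuecommand) is hypothetical, and the "reclaim" step is a deliberately simplified stand-in for what the real allocator does under memory pressure.

```c
#include <stdbool.h>

/* Hypothetical flags, loosely modelled on the kernel's GFP_KERNEL and GFP_NOIO. */
enum toy_gfp { TOY_GFP_KERNEL, TOY_GFP_NOIO };

static bool toy_in_io_path;  /* set while the toy driver is queuing a command */
static int  toy_reentries;   /* times reclaim needed I/O from inside the I/O path */

/* Toy reclaim step, run by an allocation that is under memory pressure. */
static void toy_reclaim(enum toy_gfp flags)
{
    if (flags == TOY_GFP_NOIO)
        return;              /* caller forbade I/O: skip anything needing writeback */
    if (toy_in_io_path)
        toy_reentries++;     /* pageout would need the very driver that is waiting on us */
}

/* Toy allocation: assumed to be always under pressure, so it always tries reclaim. */
static void toy_alloc(enum toy_gfp flags)
{
    toy_reclaim(flags);
}

/*
 * Toy queuecommand: allocates a descriptor while on the I/O path.
 * Returns how many times reclaim wanted to re-enter the I/O path.
 */
int toy_queuecommand(enum toy_gfp flags)
{
    toy_reentries = 0;
    toy_in_io_path = true;
    toy_alloc(flags);
    toy_in_io_path = false;
    return toy_reentries;
}
```

In this model, calling toy_queuecommand(TOY_GFP_KERNEL) reports one attempted re-entry into the I/O path, while toy_queuecommand(TOY_GFP_NOIO) reports none. In a real kernel the first case would not be a counter increment but a potential deadlock or tight loop, which is the situation the patch is meant to avoid.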
From sashak at voltaire.com Tue May 30 14:17:05 2006 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 31 May 2006 00:17:05 +0300 Subject: [openib-general] QoS RFC - Resend using a friendly mailer In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E302368778@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E302368778@mtlexch01.mtl.com> Message-ID: <20060530211705.GI21212@sashak.voltaire.com> On 22:43 Tue 30 May , Eitan Zahavi wrote: > > > > > > XML style syntax is provided for the policy file. > > > > Why XML? It is not too much readable and writable (by human) format. > [EZ] Well, I agree with you but already got so many requests for XML > that I could not resist. Maybe we could do both. If we have a nice BNF > it would be just a matter of some yacc exercise. I care less about parser complexity and more about people who will want to edit the policy definitions with just 'vi' (or any other text editor). And I agree that an OpenSM config -> XML converter may not be so hard to do. Sasha From mshefty at ichips.intel.com Tue May 30 14:21:43 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 30 May 2006 14:21:43 -0700 Subject: [openib-general] Re: ipoib use of multicast module on trunk causes kernel oops on 2.6.16 In-Reply-To: <20060530190716.GA11169@mellanox.co.il> References: <20060524133728.GN21266@mellanox.co.il> <20060524162242.GC21266@mellanox.co.il> <44748D8B.1000508@ichips.intel.com> <20060524173604.GB25186@mellanox.co.il> <44749D61.1060506@ichips.intel.com> <20060530190716.GA11169@mellanox.co.il> Message-ID: <447CB767.9090401@ichips.intel.com> Michael S. Tsirkin wrote: > I'm still looking at isolating this failure. I'd like to understand the new > code better, however. What prevents ipoib_mcast_leave and later > ipoib_mcast_free from being called on an mcast that has an outstanding query? > > We used to have a completion to signal that but it seems to be gone.
The multicast module requires a call to ib_free_multicast() after ib_join_multicast() has been called. It doesn't matter when ib_free_multicast() is called, but it is a blocking call. After ib_free_multicast() returns, the user's callback will not be invoked. Any synchronization issues, such as leaving a group while it has an outstanding query, are pushed into the multicast module. This is necessary to serialize join and leave requests from multiple users on the same group. - Sean From halr at voltaire.com Tue May 30 14:21:06 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 17:21:06 -0400 Subject: [openib-general] Failed multicast join with new multicast module In-Reply-To: <447C849A.4060200@ichips.intel.com> References: <1148905985.4358.78023.camel@hal.voltaire.com> <447C849A.4060200@ichips.intel.com> Message-ID: <1149024065.4510.757.camel@hal.voltaire.com> On Tue, 2006-05-30 at 13:44, Sean Hefty wrote: > Hal Rosenstock wrote: > > Send-only joins is another case. These are full member joins (JoinState > > 1) to groups which are not yet created so they fail. > > I see the problem, and checked in a fix. I forgot to record the last join > operation that was initiated, so that it could be failed on an error. This > resulted in the join being retried repeatedly. The join request should now > fail, and ipoib will retry using an exponential backoff strategy. Yes, that took care of the continual retrying. That's better. Thanks. Is client reregister handled properly by the multicast module ? 
-- Hal > - Sean From mshefty at ichips.intel.com Tue May 30 14:33:10 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 30 May 2006 14:33:10 -0700 Subject: [openib-general] Failed multicast join with new multicast module In-Reply-To: <1149024065.4510.757.camel@hal.voltaire.com> References: <1148905985.4358.78023.camel@hal.voltaire.com> <447C849A.4060200@ichips.intel.com> <1149024065.4510.757.camel@hal.voltaire.com> Message-ID: <447CBA16.4060106@ichips.intel.com> Hal Rosenstock wrote: > Is client reregister handled properly by the multicast module ? Can you clarify what you mean by this? Are you asking about re-sending join requests based on some event? - Sean From halr at voltaire.com Tue May 30 14:33:24 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 17:33:24 -0400 Subject: [openib-general] Failed multicast join with new multicast module In-Reply-To: <447CBA16.4060106@ichips.intel.com> References: <1148905985.4358.78023.camel@hal.voltaire.com> <447C849A.4060200@ichips.intel.com> <1149024065.4510.757.camel@hal.voltaire.com> <447CBA16.4060106@ichips.intel.com> Message-ID: <1149024804.4510.1056.camel@hal.voltaire.com> On Tue, 2006-05-30 at 17:33, Sean Hefty wrote: > Hal Rosenstock wrote: > > Is client reregister handled properly by the multicast module ? > > Can you clarify what you mean by this? Are you asking about re-sending join > requests based on some event? Yes; when the SM sends a Set PortInfo with ClientReregister set. 
-- Hal > > - Sean
From yipeeyipeeyipeeyipee at yahoo.com Tue May 30 14:55:20 2006 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Tue, 30 May 2006 21:55:20 +0000 (UTC) Subject: [openib-general] Re: special qp's creation from userspace References: <1148995226.4358.111151.camel@hal.voltaire.com> <1149016578.4358.119850.camel@hal.voltaire.com> Message-ID: Hal Rosenstock voltaire.com> writes: [snip] > How would you call ib_mad_port_close ? It's not exposed to userspace. I'm gonna hack the module and export this symbol. I don't need to run this code on a vanilla kernel, it is being used in a closed environment. Thanks Hal, z From trimmer at silverstorm.com Tue May 30 15:05:05 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Tue, 30 May 2006 18:05:05 -0400 Subject: [openib-general] QoS RFC - Resend using a friendly mailer Message-ID: > Eitan Wrote: > > As Roland suggest, before implementing a non-standard approach, IBTA > should be > > engaged to define an appropriate extension to the standard. Such > extensions would > > need to be carefully defined to avoid breaking existing applications > and fabrics. > [EZ] You are welcome to join IBTA and work on this too. I am a member of IBTA however I have not noticed this discussion on the IBTA working groups. Which working group have you engaged with this proposal? Todd Rimmer
From sean.hefty at intel.com Tue May 30 15:11:03 2006 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 30 May 2006 15:11:03 -0700 Subject: [openib-general] Failed multicast join withnew multicast module In-Reply-To: <1149024804.4510.1056.camel@hal.voltaire.com> Message-ID: >On Tue, 2006-05-30 at 17:33, Sean Hefty wrote: >> Hal Rosenstock wrote: >> > Is client reregister handled properly by the multicast module ? >> >> Can you clarify what you mean by this? Are you asking about re-sending join >> requests based on some event? > >Yes; when the SM sends a Set PortInfo with ClientReregister set. No - the current implementation does not. But the information is there to resend the join requests. - Sean From iod00d at hp.com Tue May 30 15:49:17 2006 From: iod00d at hp.com (Grant Grundler) Date: Tue, 30 May 2006 15:49:17 -0700 Subject: [openib-general] QoS RFC - Resend using a friendly mailer In-Reply-To: <20060530190936.GD21212@sashak.voltaire.com> References: <20060530190936.GD21212@sashak.voltaire.com> Message-ID: <20060530224917.GM29770@esmail.cup.hp.com> On Tue, May 30, 2006 at 10:09:36PM +0300, Sasha Khapyorsky wrote: > > XML style syntax is provided for the policy file. > > Why XML? It is not too much readable and writable (by human) format. It is human readable and very portable. An example is here: http://svn.gnumonks.org/trunk/mmio_test/mmio_test.xml And GPL libraries can parse XML.
So the new code is fairly short: http://svn.gnumonks.org/trunk/mmio_test/xmlin.c hth, grant From halr at voltaire.com Tue May 30 16:00:59 2006 From: halr at voltaire.com (Hal Rosenstock) Date: 30 May 2006 19:00:59 -0400 Subject: [openib-general] QoS RFC - Resend using a friendly mailer In-Reply-To: References: Message-ID: <1149030058.4510.3164.camel@hal.voltaire.com> On Tue, 2006-05-30 at 18:05, Rimmer, Todd wrote: > > Eitan Wrote: > > > As Roland suggest, before implementing a non-standard approach, IBTA > > should be > > > engaged to define an appropriate extension to the standard. Such > > extensions would > > > need to be carefully defined to avoid breaking existing applications > > and fabrics. > > [EZ] You are welcome to join IBTA and work on this too. > > I am a member of IBTA however I have not noticed this discussion > on the IBTA working groups. Which working group have you engaged > with this proposal? It's at the LWG. -- Hal > > Todd Rimmer
From eitan at mellanox.co.il Tue May 30 22:34:43 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 31 May 2006 08:34:43 +0300 Subject: [openib-general] QoS RFC - Resend using a friendly mailer Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302368780@mtlexch01.mtl.com> Hi Todd, It is LWG. MgtWG will also be involved. > > I am a member of IBTA however I have not noticed this discussion on the IBTA > working groups. Which working group have you engaged with this proposal? > > Todd Rimmer From mst at mellanox.co.il Wed May 31 01:16:44 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 31 May 2006 11:16:44 +0300 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: <20060530093930.GE21266@mellanox.co.il> References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il> <20060530093930.GE21266@mellanox.co.il> Message-ID: <20060531081644.GN21266@mellanox.co.il> Roland, it seems you've taken the wrong patch (the one I posted for review pre-testing) into your tree. Please revert and take the one below. It's been running here without issues. ----- Quoting r. Michael S. Tsirkin : Subject: Re: ia64: kernel unaligned access in ipoib Quoting r. Roland Dreier : > Subject: Re: ia64: kernel unaligned access in ipoib > > Michael> We've written up a patch with Jack - do you want us to > Michael> test it or prefer to re-write it yourself? > > Go ahead and test it -- I replied before I saw your patch. The following fixed the issue for us, pls review. Can this go into 2.6.17? If yes, I think it's prudent to let it run for another night before pushing it out, since we had to touch a lot of lines here.
We'll do that and let you know tomorrow.

---

Fix misaligned access faults on ia64: never cast a misaligned ha + 4
pointer to union ib_gid type, pass a void * pointer instead.

Signed-off-by: Jack Morgenstein
Signed-off-by: Michael S. Tsirkin

Index: src/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- src.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-04-16 11:12:16.105871000 +0300
+++ src/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-05-30 12:09:43.113172000 +0300
@@ -190,8 +190,7 @@ static int ipoib_change_mtu(struct net_d
 	return 0;
 }
 
-static struct ipoib_path *__path_find(struct net_device *dev,
-				      union ib_gid *gid)
+static struct ipoib_path *__path_find(struct net_device *dev, void *gid)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct rb_node *n = priv->path_tree.rb_node;
@@ -201,7 +200,7 @@ static struct ipoib_path *__path_find(st
 	while (n) {
 		path = rb_entry(n, struct ipoib_path, rb_node);
 
-		ret = memcmp(gid->raw, path->pathrec.dgid.raw,
+		ret = memcmp(gid, path->pathrec.dgid.raw,
 			     sizeof (union ib_gid));
 
 		if (ret < 0)
@@ -430,8 +429,7 @@ static void path_rec_completion(int stat
 	}
 }
 
-static struct ipoib_path *path_rec_create(struct net_device *dev,
-					  union ib_gid *gid)
+static struct ipoib_path *path_rec_create(struct net_device *dev, void *gid)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_path *path;
@@ -446,7 +444,7 @@ static struct ipoib_path *path_rec_creat
 
 	INIT_LIST_HEAD(&path->neigh_list);
 
-	memcpy(path->pathrec.dgid.raw, gid->raw, sizeof (union ib_gid));
+	memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid));
 	path->pathrec.sgid = priv->local_gid;
 	path->pathrec.pkey = cpu_to_be16(priv->pkey);
 	path->pathrec.numb_path = 1;
@@ -504,10 +502,9 @@ static void neigh_add_path(struct sk_buf
 	 */
 	spin_lock(&priv->lock);
 
-	path = __path_find(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4));
+	path = __path_find(dev, skb->dst->neighbour->ha + 4);
 	if (!path) {
-		path = path_rec_create(dev,
-				       (union ib_gid *) (skb->dst->neighbour->ha + 4));
+		path = path_rec_create(dev, skb->dst->neighbour->ha + 4);
 		if (!path)
 			goto err_path;
 
@@ -557,7 +554,7 @@ static void ipoib_path_lookup(struct sk_
 	/* Add in the P_Key for multicasts */
 	skb->dst->neighbour->ha[8] = (priv->pkey >> 8) & 0xff;
 	skb->dst->neighbour->ha[9] = priv->pkey & 0xff;
-	ipoib_mcast_send(dev, (union ib_gid *) (skb->dst->neighbour->ha + 4), skb);
+	ipoib_mcast_send(dev, skb->dst->neighbour->ha + 4, skb);
 }
 
 static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
@@ -572,10 +569,9 @@ static void unicast_arp_send(struct sk_b
 	 */
 	spin_lock(&priv->lock);
 
-	path = __path_find(dev, (union ib_gid *) (phdr->hwaddr + 4));
+	path = __path_find(dev, phdr->hwaddr + 4);
 	if (!path) {
-		path = path_rec_create(dev,
-				       (union ib_gid *) (phdr->hwaddr + 4));
+		path = path_rec_create(dev, phdr->hwaddr + 4);
 		if (path) {
 			/* put pseudoheader back on for next time */
 			skb_push(skb, sizeof *phdr);
@@ -666,7 +662,7 @@ static int ipoib_start_xmit(struct sk_bu
 			phdr->hwaddr[8] = (priv->pkey >> 8) & 0xff;
 			phdr->hwaddr[9] = priv->pkey & 0xff;
 
-			ipoib_mcast_send(dev, (union ib_gid *) (phdr->hwaddr + 4), skb);
+			ipoib_mcast_send(dev, phdr->hwaddr + 4, skb);
 		} else {
 			/* unicast GID -- should be ARP or RARP reply */
 
@@ -677,7 +673,7 @@ static int ipoib_start_xmit(struct sk_bu
 					   skb->dst ? "neigh" : "dst",
 					   be16_to_cpup((__be16 *) skb->data),
 					   be32_to_cpup((__be32 *) phdr->hwaddr),
-					   IPOIB_GID_ARG(*(union ib_gid *) (phdr->hwaddr + 4)));
+					   IPOIB_GID_RAW_ARG(phdr->hwaddr + 4));
 				dev_kfree_skb_any(skb);
 				++priv->stats.tx_dropped;
 				goto out;
@@ -773,7 +769,7 @@ static void ipoib_neigh_destructor(struc
 	ipoib_dbg(priv,
 		  "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
 		  be32_to_cpup((__be32 *) n->ha),
-		  IPOIB_GID_ARG(*((union ib_gid *) (n->ha + 4))));
+		  IPOIB_GID_RAW_ARG(n->ha + 4));
 
 	spin_lock_irqsave(&priv->lock, flags);
 
Index: src/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- src.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-05-25 11:35:23.334409000 +0300
+++ src/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-05-30 12:09:28.568595000 +0300
@@ -153,7 +153,7 @@ static struct ipoib_mcast *ipoib_mcast_a
 	return mcast;
 }
 
-static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, union ib_gid *mgid)
+static struct ipoib_mcast *__ipoib_mcast_find(struct net_device *dev, void *mgid)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct rb_node *n = priv->multicast_tree.rb_node;
@@ -164,7 +164,7 @@ static struct ipoib_mcast *__ipoib_mcast
 
 		mcast = rb_entry(n, struct ipoib_mcast, rb_node);
 
-		ret = memcmp(mgid->raw, mcast->mcmember.mgid.raw,
+		ret = memcmp(mgid, mcast->mcmember.mgid.raw,
 			     sizeof (union ib_gid));
 		if (ret < 0)
 			n = n->rb_left;
@@ -639,8 +639,7 @@ static int ipoib_mcast_leave(struct net_
 	return 0;
 }
 
-void ipoib_mcast_send(struct net_device *dev, union ib_gid *mgid,
-		      struct sk_buff *skb)
+void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_mcast *mcast;
@@ -663,7 +662,7 @@ void ipoib_mcast_send(struct net_device
 		if (!mcast) {
 			/* Let's create a new send only group now */
 			ipoib_dbg_mcast(priv, "setting up send only multicast group for "
-					IPOIB_GID_FMT "\n", IPOIB_GID_ARG(*mgid));
+					IPOIB_GID_FMT "\n", IPOIB_GID_RAW_ARG(mgid));
 
 			mcast = ipoib_mcast_alloc(dev, 0);
 			if (!mcast) {
@@ -675,7 +674,7 @@ void ipoib_mcast_send(struct net_device
 			}
 
 			set_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags);
-			mcast->mcmember.mgid = *mgid;
+			memcpy(mcast->mcmember.mgid.raw, mgid, sizeof (union ib_gid));
 			__ipoib_mcast_add(dev, mcast);
 			list_add_tail(&mcast->list, &priv->multicast_list);
 		}
Index: src/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- src.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2006-04-06 10:03:29.420250000 +0300
+++ src/drivers/infiniband/ulp/ipoib/ipoib.h	2006-05-30 12:20:39.572837000 +0300
@@ -278,8 +278,7 @@ int ipoib_dev_init(struct net_device *de
 void ipoib_dev_cleanup(struct net_device *dev);
 
 void ipoib_mcast_join_task(void *dev_ptr);
-void ipoib_mcast_send(struct net_device *dev, union ib_gid *mgid,
-		      struct sk_buff *skb);
+void ipoib_mcast_send(struct net_device *dev, void *mgid, struct sk_buff *skb);
 
 void ipoib_mcast_restart_task(void *dev_ptr);
 int ipoib_mcast_start_thread(struct net_device *dev);
@@ -375,15 +374,26 @@ extern int ipoib_debug_level;
 #endif /* CONFIG_INFINIBAND_IPOIB_DEBUG_DATA */
 
 
-#define IPOIB_GID_FMT "%x:%x:%x:%x:%x:%x:%x:%x"
+#define IPOIB_GID_FMT "%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:" \
+	"%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x:%2.2x%2.2x"
 
-#define IPOIB_GID_ARG(gid) be16_to_cpup((__be16 *) ((gid).raw + 0)), \
-	be16_to_cpup((__be16 *) ((gid).raw + 2)), \
-	be16_to_cpup((__be16 *) ((gid).raw + 4)), \
-	be16_to_cpup((__be16 *) ((gid).raw + 6)), \
-	be16_to_cpup((__be16 *) ((gid).raw + 8)), \
-	be16_to_cpup((__be16 *) ((gid).raw + 10)), \
-	be16_to_cpup((__be16 *) ((gid).raw + 12)), \
-	be16_to_cpup((__be16 *) ((gid).raw + 14))
+#define IPOIB_GID_RAW_ARG(gid) ((u8 *)(gid))[0], \
+	((u8 *)(gid))[1], \
+	((u8 *)(gid))[2], \
+	((u8 *)(gid))[3], \
+	((u8 *)(gid))[4], \
+	((u8 *)(gid))[5], \
+	((u8 *)(gid))[6], \
+	((u8 *)(gid))[7], \
+	((u8 *)(gid))[8], \
+	((u8 *)(gid))[9], \
+	((u8 *)(gid))[10],\
+	((u8 *)(gid))[11],\
+	((u8 *)(gid))[12],\
+	((u8 *)(gid))[13],\
+	((u8 *)(gid))[14],\
+	((u8 *)(gid))[15]
+
+#define IPOIB_GID_ARG(gid) IPOIB_GID_RAW_ARG((gid).raw)
 
 #endif /* _IPOIB_H */

--
MST

From mst at mellanox.co.il Wed May 31 01:17:42 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 31 May 2006 11:17:42 +0300
Subject: [openib-general] [PATCH] libmthca to include <stddef.h>
Message-ID: <20060531081742.GO21266@mellanox.co.il>

Fix compilation on SLES10 RC2: mthca.h uses offsetof so it must include
stddef.h

Signed-off-by: Jack Morgenstein
Signed-off-by: Michael S. Tsirkin

Index: l/src/userspace/libmthca/src/mthca.h
===================================================================
--- l/src/userspace/libmthca/src/mthca.h	(revision 7569)
+++ l/src/userspace/libmthca/src/mthca.h	(working copy)
@@ -36,6 +36,7 @@
 
 #ifndef MTHCA_H
 #define MTHCA_H
 
+#include <stddef.h>
 #include
 #include

--
MST

From mst at mellanox.co.il Wed May 31 01:50:29 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 31 May 2006 11:50:29 +0300
Subject: [openib-general] [PATCH] RFC: use stdint.h types
Message-ID: <20060531085029.GP21266@mellanox.co.il>

Roland, I think we should get rid of __u32 and friends in userspace:
including linux/types.h there does all kinds of kernel-related tricks
such as overriding offsetof from compiler builtin.

This patch removes this dependency from libibverbs, libmthca and
libipathverbs and supersedes the libmthca patch that I posted previously.

What this patch does:
replace all __uXX with uintXX_t and __sXX with intXX_t.
fix include files appropriately: #include <linux/types.h> -> #include <stdint.h>
add stddef.h to libmthca and ipathverbs for offsetof macro (linux/stddef.h redefines it and it is pulled in by linux/types.h on some distros).
add sysfs/libsysfs.h to ipathverbs (not sure how it currently compiles on trunk without it)

---

libibverbs and libmthca pull in types from under linux/ directory
which are broken on some distros/compilers. There's no good reason for
us to do so: integer types of specific widths are specified in stdint.h
in a portable way.

Signed-off-by: Michael S. Tsirkin

Index: libibverbs/include/infiniband/sa-kern-abi.h
===================================================================
--- libibverbs/include/infiniband/sa-kern-abi.h	(revision 7580)
+++ libibverbs/include/infiniband/sa-kern-abi.h	(working copy)
@@ -33,7 +33,7 @@
 #ifndef INFINIBAND_SA_KERN_ABI_H
 #define INFINIBAND_SA_KERN_ABI_H
 
-#include <linux/types.h>
+#include <stdint.h>
 
 /*
  * Obsolete, deprecated names. Will be removed in libibverbs 1.1.
@@ -41,25 +41,25 @@
 #define ib_kern_path_rec ibv_kern_path_rec
 
 struct ibv_kern_path_rec {
-	__u8 dgid[16];
-	__u8 sgid[16];
-	__u16 dlid;
-	__u16 slid;
-	__u32 raw_traffic;
-	__u32 flow_label;
-	__u32 reversible;
-	__u32 mtu;
-	__u16 pkey;
-	__u8 hop_limit;
-	__u8 traffic_class;
-	__u8 numb_path;
-	__u8 sl;
-	__u8 mtu_selector;
-	__u8 rate_selector;
-	__u8 rate;
-	__u8 packet_life_time_selector;
-	__u8 packet_life_time;
-	__u8 preference;
+	uint8_t dgid[16];
+	uint8_t sgid[16];
+	uint16_t dlid;
+	uint16_t slid;
+	uint32_t raw_traffic;
+	uint32_t flow_label;
+	uint32_t reversible;
+	uint32_t mtu;
+	uint16_t pkey;
+	uint8_t hop_limit;
+	uint8_t traffic_class;
+	uint8_t numb_path;
+	uint8_t sl;
+	uint8_t mtu_selector;
+	uint8_t rate_selector;
+	uint8_t rate;
+	uint8_t packet_life_time_selector;
+	uint8_t packet_life_time;
+	uint8_t preference;
 };
 
 #endif /* INFINIBAND_SA_KERN_ABI_H */
Index: libibverbs/include/infiniband/kern-abi.h
===================================================================
--- libibverbs/include/infiniband/kern-abi.h	(revision 7580)
+++ libibverbs/include/infiniband/kern-abi.h	(working copy)
@@ -37,7 +37,7 @@
 #ifndef KERN_ABI_H
 #define KERN_ABI_H
 
-#include <linux/types.h>
+#include <stdint.h>
 
 /*
  * This file
must be kept in sync with the kernel's version of @@ -95,663 +95,663 @@ * that they pack the same way on 32-bit and 64-bit architectures (to * avoid incompatibility between 32-bit userspace and 64-bit kernels). * Specifically: - * - Do not use pointer types -- pass pointers in __u64 instead. + * - Do not use pointer types -- pass pointers in uint64_t instead. * - Make sure that any structure larger than 4 bytes is padded to a * multiple of 8 bytes. Otherwise the structure size will be * different between 32-bit and 64-bit architectures. */ struct ibv_kern_async_event { - __u64 element; - __u32 event_type; - __u32 reserved; + uint64_t element; + uint32_t event_type; + uint32_t reserved; }; struct ibv_comp_event { - __u64 cq_handle; + uint64_t cq_handle; }; /* - * All commands from userspace should start with a __u32 command field - * followed by __u16 in_words and out_words fields (which give the + * All commands from userspace should start with a uint32_t command field + * followed by uint16_t in_words and out_words fields (which give the * length of the command block and response buffer if any in 32-bit * words). The kernel driver will read these fields first and read * the rest of the command struct based on these value. 
*/ struct ibv_query_params { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; }; struct ibv_query_params_resp { - __u32 num_cq_events; + uint32_t num_cq_events; }; struct ibv_get_context { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t driver_data[0]; }; struct ibv_get_context_resp { - __u32 async_fd; - __u32 num_comp_vectors; + uint32_t async_fd; + uint32_t num_comp_vectors; }; struct ibv_query_device { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t driver_data[0]; }; struct ibv_query_device_resp { - __u64 fw_ver; - __u64 node_guid; - __u64 sys_image_guid; - __u64 max_mr_size; - __u64 page_size_cap; - __u32 vendor_id; - __u32 vendor_part_id; - __u32 hw_ver; - __u32 max_qp; - __u32 max_qp_wr; - __u32 device_cap_flags; - __u32 max_sge; - __u32 max_sge_rd; - __u32 max_cq; - __u32 max_cqe; - __u32 max_mr; - __u32 max_pd; - __u32 max_qp_rd_atom; - __u32 max_ee_rd_atom; - __u32 max_res_rd_atom; - __u32 max_qp_init_rd_atom; - __u32 max_ee_init_rd_atom; - __u32 atomic_cap; - __u32 max_ee; - __u32 max_rdd; - __u32 max_mw; - __u32 max_raw_ipv6_qp; - __u32 max_raw_ethy_qp; - __u32 max_mcast_grp; - __u32 max_mcast_qp_attach; - __u32 max_total_mcast_qp_attach; - __u32 max_ah; - __u32 max_fmr; - __u32 max_map_per_fmr; - __u32 max_srq; - __u32 max_srq_wr; - __u32 max_srq_sge; - __u16 max_pkeys; - __u8 local_ca_ack_delay; - __u8 phys_port_cnt; - __u8 reserved[4]; + uint64_t fw_ver; + uint64_t node_guid; + uint64_t sys_image_guid; + uint64_t max_mr_size; + uint64_t page_size_cap; + uint32_t vendor_id; + uint32_t vendor_part_id; + uint32_t hw_ver; + uint32_t max_qp; + uint32_t max_qp_wr; + 
uint32_t device_cap_flags; + uint32_t max_sge; + uint32_t max_sge_rd; + uint32_t max_cq; + uint32_t max_cqe; + uint32_t max_mr; + uint32_t max_pd; + uint32_t max_qp_rd_atom; + uint32_t max_ee_rd_atom; + uint32_t max_res_rd_atom; + uint32_t max_qp_init_rd_atom; + uint32_t max_ee_init_rd_atom; + uint32_t atomic_cap; + uint32_t max_ee; + uint32_t max_rdd; + uint32_t max_mw; + uint32_t max_raw_ipv6_qp; + uint32_t max_raw_ethy_qp; + uint32_t max_mcast_grp; + uint32_t max_mcast_qp_attach; + uint32_t max_total_mcast_qp_attach; + uint32_t max_ah; + uint32_t max_fmr; + uint32_t max_map_per_fmr; + uint32_t max_srq; + uint32_t max_srq_wr; + uint32_t max_srq_sge; + uint16_t max_pkeys; + uint8_t local_ca_ack_delay; + uint8_t phys_port_cnt; + uint8_t reserved[4]; }; struct ibv_query_port { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u8 port_num; - __u8 reserved[7]; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint8_t port_num; + uint8_t reserved[7]; + uint64_t driver_data[0]; }; struct ibv_query_port_resp { - __u32 port_cap_flags; - __u32 max_msg_sz; - __u32 bad_pkey_cntr; - __u32 qkey_viol_cntr; - __u32 gid_tbl_len; - __u16 pkey_tbl_len; - __u16 lid; - __u16 sm_lid; - __u8 state; - __u8 max_mtu; - __u8 active_mtu; - __u8 lmc; - __u8 max_vl_num; - __u8 sm_sl; - __u8 subnet_timeout; - __u8 init_type_reply; - __u8 active_width; - __u8 active_speed; - __u8 phys_state; - __u8 reserved[3]; + uint32_t port_cap_flags; + uint32_t max_msg_sz; + uint32_t bad_pkey_cntr; + uint32_t qkey_viol_cntr; + uint32_t gid_tbl_len; + uint16_t pkey_tbl_len; + uint16_t lid; + uint16_t sm_lid; + uint8_t state; + uint8_t max_mtu; + uint8_t active_mtu; + uint8_t lmc; + uint8_t max_vl_num; + uint8_t sm_sl; + uint8_t subnet_timeout; + uint8_t init_type_reply; + uint8_t active_width; + uint8_t active_speed; + uint8_t phys_state; + uint8_t reserved[3]; }; struct ibv_alloc_pd { - __u32 command; - __u16 in_words; - 
__u16 out_words; - __u64 response; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t driver_data[0]; }; struct ibv_alloc_pd_resp { - __u32 pd_handle; + uint32_t pd_handle; }; struct ibv_dealloc_pd { - __u32 command; - __u16 in_words; - __u16 out_words; - __u32 pd_handle; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint32_t pd_handle; }; struct ibv_reg_mr { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u64 start; - __u64 length; - __u64 hca_va; - __u32 pd_handle; - __u32 access_flags; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t start; + uint64_t length; + uint64_t hca_va; + uint32_t pd_handle; + uint32_t access_flags; + uint64_t driver_data[0]; }; struct ibv_reg_mr_resp { - __u32 mr_handle; - __u32 lkey; - __u32 rkey; + uint32_t mr_handle; + uint32_t lkey; + uint32_t rkey; }; struct ibv_dereg_mr { - __u32 command; - __u16 in_words; - __u16 out_words; - __u32 mr_handle; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint32_t mr_handle; }; struct ibv_create_comp_channel { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; }; struct ibv_create_comp_channel_resp { - __u32 fd; + uint32_t fd; }; struct ibv_create_cq { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u64 user_handle; - __u32 cqe; - __u32 comp_vector; - __s32 comp_channel; - __u32 reserved; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t user_handle; + uint32_t cqe; + uint32_t comp_vector; + int32_t comp_channel; + uint32_t reserved; + uint64_t driver_data[0]; }; struct ibv_create_cq_resp { - __u32 cq_handle; - __u32 cqe; + uint32_t cq_handle; + uint32_t cqe; }; struct ibv_kern_wc { - __u64 wr_id; - 
__u32 status; - __u32 opcode; - __u32 vendor_err; - __u32 byte_len; - __u32 imm_data; - __u32 qp_num; - __u32 src_qp; - __u32 wc_flags; - __u16 pkey_index; - __u16 slid; - __u8 sl; - __u8 dlid_path_bits; - __u8 port_num; - __u8 reserved; + uint64_t wr_id; + uint32_t status; + uint32_t opcode; + uint32_t vendor_err; + uint32_t byte_len; + uint32_t imm_data; + uint32_t qp_num; + uint32_t src_qp; + uint32_t wc_flags; + uint16_t pkey_index; + uint16_t slid; + uint8_t sl; + uint8_t dlid_path_bits; + uint8_t port_num; + uint8_t reserved; }; struct ibv_poll_cq { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u32 cq_handle; - __u32 ne; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t cq_handle; + uint32_t ne; }; struct ibv_poll_cq_resp { - __u32 count; - __u32 reserved; + uint32_t count; + uint32_t reserved; struct ibv_kern_wc wc[0]; }; struct ibv_req_notify_cq { - __u32 command; - __u16 in_words; - __u16 out_words; - __u32 cq_handle; - __u32 solicited; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint32_t cq_handle; + uint32_t solicited; }; struct ibv_resize_cq { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u32 cq_handle; - __u32 cqe; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t cq_handle; + uint32_t cqe; + uint64_t driver_data[0]; }; struct ibv_resize_cq_resp { - __u32 cqe; + uint32_t cqe; }; struct ibv_destroy_cq { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u32 cq_handle; - __u32 reserved; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t cq_handle; + uint32_t reserved; }; struct ibv_destroy_cq_resp { - __u32 comp_events_reported; - __u32 async_events_reported; + uint32_t comp_events_reported; + uint32_t async_events_reported; }; struct ibv_kern_global_route { - __u8 dgid[16]; - __u32 flow_label; 
- __u8 sgid_index; - __u8 hop_limit; - __u8 traffic_class; - __u8 reserved; + uint8_t dgid[16]; + uint32_t flow_label; + uint8_t sgid_index; + uint8_t hop_limit; + uint8_t traffic_class; + uint8_t reserved; }; struct ibv_kern_ah_attr { struct ibv_kern_global_route grh; - __u16 dlid; - __u8 sl; - __u8 src_path_bits; - __u8 static_rate; - __u8 is_global; - __u8 port_num; - __u8 reserved; + uint16_t dlid; + uint8_t sl; + uint8_t src_path_bits; + uint8_t static_rate; + uint8_t is_global; + uint8_t port_num; + uint8_t reserved; }; struct ibv_kern_qp_attr { - __u32 qp_attr_mask; - __u32 qp_state; - __u32 cur_qp_state; - __u32 path_mtu; - __u32 path_mig_state; - __u32 qkey; - __u32 rq_psn; - __u32 sq_psn; - __u32 dest_qp_num; - __u32 qp_access_flags; + uint32_t qp_attr_mask; + uint32_t qp_state; + uint32_t cur_qp_state; + uint32_t path_mtu; + uint32_t path_mig_state; + uint32_t qkey; + uint32_t rq_psn; + uint32_t sq_psn; + uint32_t dest_qp_num; + uint32_t qp_access_flags; struct ibv_kern_ah_attr ah_attr; struct ibv_kern_ah_attr alt_ah_attr; /* ib_qp_cap */ - __u32 max_send_wr; - __u32 max_recv_wr; - __u32 max_send_sge; - __u32 max_recv_sge; - __u32 max_inline_data; + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; - __u16 pkey_index; - __u16 alt_pkey_index; - __u8 en_sqd_async_notify; - __u8 sq_draining; - __u8 max_rd_atomic; - __u8 max_dest_rd_atomic; - __u8 min_rnr_timer; - __u8 port_num; - __u8 timeout; - __u8 retry_cnt; - __u8 rnr_retry; - __u8 alt_port_num; - __u8 alt_timeout; - __u8 reserved[5]; + uint16_t pkey_index; + uint16_t alt_pkey_index; + uint8_t en_sqd_async_notify; + uint8_t sq_draining; + uint8_t max_rd_atomic; + uint8_t max_dest_rd_atomic; + uint8_t min_rnr_timer; + uint8_t port_num; + uint8_t timeout; + uint8_t retry_cnt; + uint8_t rnr_retry; + uint8_t alt_port_num; + uint8_t alt_timeout; + uint8_t reserved[5]; }; struct ibv_create_qp { - __u32 command; - __u16 in_words; - __u16 
out_words; - __u64 response; - __u64 user_handle; - __u32 pd_handle; - __u32 send_cq_handle; - __u32 recv_cq_handle; - __u32 srq_handle; - __u32 max_send_wr; - __u32 max_recv_wr; - __u32 max_send_sge; - __u32 max_recv_sge; - __u32 max_inline_data; - __u8 sq_sig_all; - __u8 qp_type; - __u8 is_srq; - __u8 reserved; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t user_handle; + uint32_t pd_handle; + uint32_t send_cq_handle; + uint32_t recv_cq_handle; + uint32_t srq_handle; + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; + uint8_t sq_sig_all; + uint8_t qp_type; + uint8_t is_srq; + uint8_t reserved; + uint64_t driver_data[0]; }; struct ibv_create_qp_resp { - __u32 qp_handle; - __u32 qpn; - __u32 max_send_wr; - __u32 max_recv_wr; - __u32 max_send_sge; - __u32 max_recv_sge; - __u32 max_inline_data; - __u32 reserved; + uint32_t qp_handle; + uint32_t qpn; + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; + uint32_t reserved; }; struct ibv_qp_dest { - __u8 dgid[16]; - __u32 flow_label; - __u16 dlid; - __u16 reserved; - __u8 sgid_index; - __u8 hop_limit; - __u8 traffic_class; - __u8 sl; - __u8 src_path_bits; - __u8 static_rate; - __u8 is_global; - __u8 port_num; + uint8_t dgid[16]; + uint32_t flow_label; + uint16_t dlid; + uint16_t reserved; + uint8_t sgid_index; + uint8_t hop_limit; + uint8_t traffic_class; + uint8_t sl; + uint8_t src_path_bits; + uint8_t static_rate; + uint8_t is_global; + uint8_t port_num; }; struct ibv_query_qp { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u32 qp_handle; - __u32 attr_mask; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t qp_handle; + uint32_t attr_mask; + uint64_t driver_data[0]; }; struct ibv_query_qp_resp { struct 
ibv_qp_dest dest; struct ibv_qp_dest alt_dest; - __u32 max_send_wr; - __u32 max_recv_wr; - __u32 max_send_sge; - __u32 max_recv_sge; - __u32 max_inline_data; - __u32 qkey; - __u32 rq_psn; - __u32 sq_psn; - __u32 dest_qp_num; - __u32 qp_access_flags; - __u16 pkey_index; - __u16 alt_pkey_index; - __u8 qp_state; - __u8 cur_qp_state; - __u8 path_mtu; - __u8 path_mig_state; - __u8 en_sqd_async_notify; - __u8 max_rd_atomic; - __u8 max_dest_rd_atomic; - __u8 min_rnr_timer; - __u8 port_num; - __u8 timeout; - __u8 retry_cnt; - __u8 rnr_retry; - __u8 alt_port_num; - __u8 alt_timeout; - __u8 sq_sig_all; - __u8 reserved[5]; - __u64 driver_data[0]; + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; + uint32_t qkey; + uint32_t rq_psn; + uint32_t sq_psn; + uint32_t dest_qp_num; + uint32_t qp_access_flags; + uint16_t pkey_index; + uint16_t alt_pkey_index; + uint8_t qp_state; + uint8_t cur_qp_state; + uint8_t path_mtu; + uint8_t path_mig_state; + uint8_t en_sqd_async_notify; + uint8_t max_rd_atomic; + uint8_t max_dest_rd_atomic; + uint8_t min_rnr_timer; + uint8_t port_num; + uint8_t timeout; + uint8_t retry_cnt; + uint8_t rnr_retry; + uint8_t alt_port_num; + uint8_t alt_timeout; + uint8_t sq_sig_all; + uint8_t reserved[5]; + uint64_t driver_data[0]; }; struct ibv_modify_qp { - __u32 command; - __u16 in_words; - __u16 out_words; + uint32_t command; + uint16_t in_words; + uint16_t out_words; struct ibv_qp_dest dest; struct ibv_qp_dest alt_dest; - __u32 qp_handle; - __u32 attr_mask; - __u32 qkey; - __u32 rq_psn; - __u32 sq_psn; - __u32 dest_qp_num; - __u32 qp_access_flags; - __u16 pkey_index; - __u16 alt_pkey_index; - __u8 qp_state; - __u8 cur_qp_state; - __u8 path_mtu; - __u8 path_mig_state; - __u8 en_sqd_async_notify; - __u8 max_rd_atomic; - __u8 max_dest_rd_atomic; - __u8 min_rnr_timer; - __u8 port_num; - __u8 timeout; - __u8 retry_cnt; - __u8 rnr_retry; - __u8 alt_port_num; - __u8 alt_timeout; - __u8 
reserved[2]; - __u64 driver_data[0]; + uint32_t qp_handle; + uint32_t attr_mask; + uint32_t qkey; + uint32_t rq_psn; + uint32_t sq_psn; + uint32_t dest_qp_num; + uint32_t qp_access_flags; + uint16_t pkey_index; + uint16_t alt_pkey_index; + uint8_t qp_state; + uint8_t cur_qp_state; + uint8_t path_mtu; + uint8_t path_mig_state; + uint8_t en_sqd_async_notify; + uint8_t max_rd_atomic; + uint8_t max_dest_rd_atomic; + uint8_t min_rnr_timer; + uint8_t port_num; + uint8_t timeout; + uint8_t retry_cnt; + uint8_t rnr_retry; + uint8_t alt_port_num; + uint8_t alt_timeout; + uint8_t reserved[2]; + uint64_t driver_data[0]; }; struct ibv_destroy_qp { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u32 qp_handle; - __u32 reserved; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t qp_handle; + uint32_t reserved; }; struct ibv_destroy_qp_resp { - __u32 events_reported; + uint32_t events_reported; }; struct ibv_kern_send_wr { - __u64 wr_id; - __u32 num_sge; - __u32 opcode; - __u32 send_flags; - __u32 imm_data; + uint64_t wr_id; + uint32_t num_sge; + uint32_t opcode; + uint32_t send_flags; + uint32_t imm_data; union { struct { - __u64 remote_addr; - __u32 rkey; - __u32 reserved; + uint64_t remote_addr; + uint32_t rkey; + uint32_t reserved; } rdma; struct { - __u64 remote_addr; - __u64 compare_add; - __u64 swap; - __u32 rkey; - __u32 reserved; + uint64_t remote_addr; + uint64_t compare_add; + uint64_t swap; + uint32_t rkey; + uint32_t reserved; } atomic; struct { - __u32 ah; - __u32 remote_qpn; - __u32 remote_qkey; - __u32 reserved; + uint32_t ah; + uint32_t remote_qpn; + uint32_t remote_qkey; + uint32_t reserved; } ud; } wr; }; struct ibv_post_send { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u32 qp_handle; - __u32 wr_count; - __u32 sge_count; - __u32 wqe_size; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t qp_handle; + uint32_t 
wr_count; + uint32_t sge_count; + uint32_t wqe_size; struct ibv_kern_send_wr send_wr[0]; }; struct ibv_post_send_resp { - __u32 bad_wr; + uint32_t bad_wr; }; struct ibv_kern_recv_wr { - __u64 wr_id; - __u32 num_sge; - __u32 reserved; + uint64_t wr_id; + uint32_t num_sge; + uint32_t reserved; }; struct ibv_post_recv { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u32 qp_handle; - __u32 wr_count; - __u32 sge_count; - __u32 wqe_size; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t qp_handle; + uint32_t wr_count; + uint32_t sge_count; + uint32_t wqe_size; struct ibv_kern_recv_wr recv_wr[0]; }; struct ibv_post_recv_resp { - __u32 bad_wr; + uint32_t bad_wr; }; struct ibv_post_srq_recv { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u32 srq_handle; - __u32 wr_count; - __u32 sge_count; - __u32 wqe_size; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t srq_handle; + uint32_t wr_count; + uint32_t sge_count; + uint32_t wqe_size; struct ibv_kern_recv_wr recv_wr[0]; }; struct ibv_post_srq_recv_resp { - __u32 bad_wr; + uint32_t bad_wr; }; struct ibv_create_ah { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u64 user_handle; - __u32 pd_handle; - __u32 reserved; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t user_handle; + uint32_t pd_handle; + uint32_t reserved; struct ibv_kern_ah_attr attr; }; struct ibv_create_ah_resp { - __u32 handle; + uint32_t handle; }; struct ibv_destroy_ah { - __u32 command; - __u16 in_words; - __u16 out_words; - __u32 ah_handle; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint32_t ah_handle; }; struct ibv_attach_mcast { - __u32 command; - __u16 in_words; - __u16 out_words; - __u8 gid[16]; - __u32 qp_handle; - __u16 mlid; - __u16 reserved; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + 
uint16_t out_words; + uint8_t gid[16]; + uint32_t qp_handle; + uint16_t mlid; + uint16_t reserved; + uint64_t driver_data[0]; }; struct ibv_detach_mcast { - __u32 command; - __u16 in_words; - __u16 out_words; - __u8 gid[16]; - __u32 qp_handle; - __u16 mlid; - __u16 reserved; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint8_t gid[16]; + uint32_t qp_handle; + uint16_t mlid; + uint16_t reserved; + uint64_t driver_data[0]; }; struct ibv_create_srq { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u64 user_handle; - __u32 pd_handle; - __u32 max_wr; - __u32 max_sge; - __u32 srq_limit; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t user_handle; + uint32_t pd_handle; + uint32_t max_wr; + uint32_t max_sge; + uint32_t srq_limit; + uint64_t driver_data[0]; }; struct ibv_create_srq_resp { - __u32 srq_handle; - __u32 max_wr; - __u32 max_sge; - __u32 reserved; + uint32_t srq_handle; + uint32_t max_wr; + uint32_t max_sge; + uint32_t reserved; }; struct ibv_modify_srq { - __u32 command; - __u16 in_words; - __u16 out_words; - __u32 srq_handle; - __u32 attr_mask; - __u32 max_wr; - __u32 srq_limit; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint32_t srq_handle; + uint32_t attr_mask; + uint32_t max_wr; + uint32_t srq_limit; + uint64_t driver_data[0]; }; struct ibv_query_srq { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u32 srq_handle; - __u32 reserved; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t srq_handle; + uint32_t reserved; + uint64_t driver_data[0]; }; struct ibv_query_srq_resp { - __u32 max_wr; - __u32 max_sge; - __u32 srq_limit; - __u32 reserved; + uint32_t max_wr; + uint32_t max_sge; + uint32_t srq_limit; + uint32_t reserved; }; struct ibv_destroy_srq { - __u32 command; - __u16 
in_words; - __u16 out_words; - __u64 response; - __u32 srq_handle; - __u32 reserved; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint32_t srq_handle; + uint32_t reserved; }; struct ibv_destroy_srq_resp { - __u32 events_reported; + uint32_t events_reported; }; /* @@ -806,76 +806,76 @@ }; struct ibv_destroy_cq_v1 { - __u32 command; - __u16 in_words; - __u16 out_words; - __u32 cq_handle; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint32_t cq_handle; }; struct ibv_destroy_qp_v1 { - __u32 command; - __u16 in_words; - __u16 out_words; - __u32 qp_handle; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint32_t qp_handle; }; struct ibv_destroy_srq_v1 { - __u32 command; - __u16 in_words; - __u16 out_words; - __u32 srq_handle; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint32_t srq_handle; }; struct ibv_get_context_v2 { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u64 cq_fd_tab; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t cq_fd_tab; + uint64_t driver_data[0]; }; struct ibv_create_cq_v2 { - __u32 command; - __u16 in_words; - __u16 out_words; - __u64 response; - __u64 user_handle; - __u32 cqe; - __u32 event_handler; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint64_t response; + uint64_t user_handle; + uint32_t cqe; + uint32_t event_handler; + uint64_t driver_data[0]; }; struct ibv_modify_srq_v3 { - __u32 command; - __u16 in_words; - __u16 out_words; - __u32 srq_handle; - __u32 attr_mask; - __u32 max_wr; - __u32 max_sge; - __u32 srq_limit; - __u32 reserved; - __u64 driver_data[0]; + uint32_t command; + uint16_t in_words; + uint16_t out_words; + uint32_t srq_handle; + uint32_t attr_mask; + uint32_t max_wr; + uint32_t max_sge; + uint32_t srq_limit; + uint32_t reserved; + uint64_t driver_data[0]; }; struct ibv_create_qp_resp_v3 { 
- __u32 qp_handle; - __u32 qpn; + uint32_t qp_handle; + uint32_t qpn; }; struct ibv_create_qp_resp_v4 { - __u32 qp_handle; - __u32 qpn; - __u32 max_send_wr; - __u32 max_recv_wr; - __u32 max_send_sge; - __u32 max_recv_sge; - __u32 max_inline_data; + uint32_t qp_handle; + uint32_t qpn; + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; }; struct ibv_create_srq_resp_v5 { - __u32 srq_handle; + uint32_t srq_handle; }; #endif /* KERN_ABI_H */ Index: libmthca/src/mthca.h =================================================================== --- libmthca/src/mthca.h (revision 7574) +++ libmthca/src/mthca.h (working copy) @@ -36,6 +36,7 @@ #ifndef MTHCA_H #define MTHCA_H +#include #include #include Index: libmthca/src/mthca-abi.h =================================================================== --- libmthca/src/mthca-abi.h (revision 7574) +++ libmthca/src/mthca-abi.h (working copy) @@ -36,63 +36,64 @@ #ifndef MTHCA_ABI_H #define MTHCA_ABI_H +#include #include struct mthca_alloc_ucontext_resp { struct ibv_get_context_resp ibv_resp; - __u32 qp_tab_size; - __u32 uarc_size; + uint32_t qp_tab_size; + uint32_t uarc_size; }; struct mthca_alloc_pd_resp { struct ibv_alloc_pd_resp ibv_resp; - __u32 pdn; - __u32 reserved; + uint32_t pdn; + uint32_t reserved; }; struct mthca_create_cq { struct ibv_create_cq ibv_cmd; - __u32 lkey; - __u32 pdn; - __u64 arm_db_page; - __u64 set_db_page; - __u32 arm_db_index; - __u32 set_db_index; + uint32_t lkey; + uint32_t pdn; + uint64_t arm_db_page; + uint64_t set_db_page; + uint32_t arm_db_index; + uint32_t set_db_index; }; struct mthca_create_cq_resp { struct ibv_create_cq_resp ibv_resp; - __u32 cqn; - __u32 reserved; + uint32_t cqn; + uint32_t reserved; }; struct mthca_resize_cq { struct ibv_resize_cq ibv_cmd; - __u32 lkey; - __u32 reserved; + uint32_t lkey; + uint32_t reserved; }; struct mthca_create_srq { struct ibv_create_srq ibv_cmd; - __u32 lkey; - __u32 db_index; - __u64 
db_page; + uint32_t lkey; + uint32_t db_index; + uint64_t db_page; }; struct mthca_create_srq_resp { struct ibv_create_srq_resp ibv_resp; - __u32 srqn; - __u32 reserved; + uint32_t srqn; + uint32_t reserved; }; struct mthca_create_qp { struct ibv_create_qp ibv_cmd; - __u32 lkey; - __u32 reserved; - __u64 sq_db_page; - __u64 rq_db_page; - __u32 sq_db_index; - __u32 rq_db_index; + uint32_t lkey; + uint32_t reserved; + uint64_t sq_db_page; + uint64_t rq_db_page; + uint32_t sq_db_index; + uint32_t rq_db_index; }; #endif /* MTHCA_ABI_H */ Index: libipathverbs/src/ipathverbs.h =================================================================== --- libipathverbs/src/ipathverbs.h (revision 7584) +++ libipathverbs/src/ipathverbs.h (working copy) @@ -40,6 +40,8 @@ #include #include +#include +#include #include #include #include -- MST From mst at mellanox.co.il Wed May 31 02:08:17 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 31 May 2006 12:08:17 +0300 Subject: [openib-general] Re: Failed multicast join withnew multicast module In-Reply-To: References: <1149024804.4510.1056.camel@hal.voltaire.com> Message-ID: <20060531090817.GQ21266@mellanox.co.il> Quoting r. Sean Hefty : > Subject: RE: Failed multicast join withnew multicast module > > >On Tue, 2006-05-30 at 17:33, Sean Hefty wrote: > >> Hal Rosenstock wrote: > >> > Is client reregister handled properly by the multicast module ? > >> > >> Can you clarify what you mean by this? Are you asking about re-sending join > >> requests based on some event? > > > >Yes; when the SM sends a Set PortInfo with ClientReregister set. > > No - the current implementation does not. But the information is there to > resend the join requests. Hmm. I think ipoib used to handle this properly at the ULP level. 
-- MST

From leonida at voltaire.com Wed May 31 02:49:50 2006
From: leonida at voltaire.com (Leonid Arsh)
Date: Wed, 31 May 2006 12:49:50 +0300
Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space
In-Reply-To:
References: <20060509060958.GA482@voltaire.com> <10e223bf0605300403lcc24b8bwf1e1d7059edab416@mail.gmail.com>
Message-ID: <10e223bf0605310249t57e89b72ne8e38e57945a2cec@mail.gmail.com>

Hi,

The most urgent and critical case is an SM failure/restart when the SM is
not connected to the host directly. In this case neither PortError nor
PortActive events will be generated on the host. The SM will lose the
multicast group configuration for the host, and the host will need to
rejoin its multicast groups. IPoIB should handle the problem by catching
the ClientReregister event.

There are additional cases. Any client that registers itself with the SA
will need to handle this event in order to work properly after an SM
failure/restart. We'll need it very soon for a user mode application.

On 5/30/06, Roland Dreier wrote:
> Leonid> Roland, Aren't you going to apply these patches? This is
> Leonid> a standard feature, and we need it rather urgently.
>
> Sorry, I had let these patches fall off of my queue.
>
> So why is this an urgent feature? A good description of your use case
> would make it easier for me to write the changelog entries (since your
> patches didn't include usable descriptions).
>
> Thanks,
> Roland

From tziporet at mellanox.co.il Wed May 31 02:55:34 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Wed, 31 May 2006 12:55:34 +0300
Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib
In-Reply-To:
References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161405.GT21266@mellanox.co.il>
Message-ID: <447D6816.2090405@mellanox.co.il>

Roland Dreier wrote:
> I queued this for 2.6.18.

It would be very good to have it in 2.6.17 too, since then we can get it
directly from git with no need for a fix in OFED. Also, I think that ia64
customers who work directly with kernel 2.6.17 will be bothered by these
messages.

Tziporet

From leonida at voltaire.com Wed May 31 02:58:38 2006
From: leonida at voltaire.com (Leonid Arsh)
Date: Wed, 31 May 2006 12:58:38 +0300
Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space
In-Reply-To:
References: <20060509060958.GA482@voltaire.com>
Message-ID: <10e223bf0605310258k753df4d2i7f68617f110ccdaa@mail.gmail.com>

I just moved the structure from ipath_mad.c to ib_smi.h.
I see no special reason to have '__attribute__ ((packed))' here.
Isn't it good style to have a structure packed when it is sent over the wire?
On 5/30/06, Roland Dreier wrote:
> > +struct port_info {
> > +	__be64 mkey;
> > +	__be64 gid_prefix;
> > +	__be16 lid;
> > +	__be16 sm_lid;
> > +	__be32 cap_mask;
> > +	__be16 diag_code;
> > +	__be16 mkey_lease_period;
> > +	u8 local_port_num;
> > +	u8 link_width_enabled;
> > +	u8 link_width_supported;
> > +	u8 link_width_active;
> > +	u8 linkspeed_portstate;			/* 4 bits, 4 bits */
> > +	u8 portphysstate_linkdown;		/* 4 bits, 4 bits */
> > +	u8 mkeyprot_resv_lmc;			/* 2 bits, 3 bits, 3 bits */
> > +	u8 linkspeedactive_enabled;		/* 4 bits, 4 bits */
> > +	u8 neighbormtu_mastersmsl;		/* 4 bits, 4 bits */
> > +	u8 vlcap_inittype;			/* 4 bits, 4 bits */
> > +	u8 vl_high_limit;
> > +	u8 vl_arb_high_cap;
> > +	u8 vl_arb_low_cap;
> > +	u8 inittypereply_mtucap;		/* 4 bits, 4 bits */
> > +	u8 vlstallcnt_hoqlife;			/* 3 bits, 5 bits */
> > +	u8 operationalvl_pei_peo_fpi_fpo;	/* 4 bits, 1, 1, 1, 1 */
> > +	__be16 mkey_violations;
> > +	__be16 pkey_violations;
> > +	__be16 qkey_violations;
> > +	u8 guid_cap;
> > +	u8 clientrereg_resv_subnetto;		/* 1 bit, 2 bits, 5 bits */
> > +	u8 resv_resptimevalue;			/* 3 bits, 5 bits */
> > +	u8 localphyerrors_overrunerrors;	/* 4 bits, 4 bits */
> > +	__be16 max_credit_hint;
> > +	u8 resv;
> > +	u8 link_roundtrip_latency[3];
> > +} __attribute__ ((packed));
>
> Any reason why this needs to be packed? It looks like everything is
> naturally aligned to its size anyway.
>

From asgeir at chelsio.com Wed May 31 03:28:39 2006
From: asgeir at chelsio.com (Asgeir Eiriksson)
Date: Wed, 31 May 2006 03:28:39 -0700
Subject: [openib-general] Re: QoS RFC
Message-ID: <67D69596DDF0C2448DB0F0547D0F947E01E1DBF2@yogi.asicdesigners.com>

Roland

This thread also indirectly brings up the issue of OpenFabrics, RNIC, and
QoS.

The RNIC devices don't have to be, but are typically unified wire devices, i.e.
have simultaneous support for regular Ethernet NIC functions such as LSO/TSO
and checksum offloads; iSCSI HBA initiator and/or target functionality; RNIC
functionality; and TCP/IP offload. All the usual Ethernet TCP/IP management
and configuration tools work as expected for such a device, e.g. for the
purpose of this discussion, QoS configuration with DiffServ.

At the same time: if there's an OpenFabrics QoS API, then this API needs to
be transport agnostic, such that RNIC QoS can be configured through this
same OpenFabrics QoS API.

To be clear: the QoS requirements presented in the RFC look quite reasonable
for an RNIC device, but the initial proposal didn't abstract out IB detail
that is non-essential to QoS.

Regards,
Asgeir Eiriksson
CTO
Chelsio Communications Inc.

> -----Original Message-----
> From: openib-general-bounces at openib.org [mailto:openib-general-
> bounces at openib.org] On Behalf Of Roland Dreier
> Sent: Tuesday, May 30, 2006 7:44 AM
> To: Eitan Zahavi
> Cc: Nimrod Gindi; Roland Dreier; openib-general at openib.org
> Subject: [openib-general] Re: QoS RFC
>
> > Service-ID:
> > * For PathRecord: use the first 2 reserved fields which are 32bits each
> > (component masks 0x1 and 0x2). Component mask 1 should be used to
> > refer to the
> > merged Service-ID field
>
> > A new capability bit should describe the SM QoS support in the SA class
> > port
> > info. This approach provides an easy migration path for existing access
> > layer
> > and ULPs by not introducing a new attribute.
>
> This is OK but it's sort of a pain to have to query SA ClassPortInfo
> all the time. Do you have a plan for how to make this transparent to
> ULPs?
>
> (BTW something in your email client is really messing up the
> formatting of your message)
>
> - R.

From asgeir at chelsio.com Wed May 31 03:31:31 2006
From: asgeir at chelsio.com (Asgeir Eiriksson)
Date: Wed, 31 May 2006 03:31:31 -0700
Subject: [openib-general] QoS RFC
Message-ID: <67D69596DDF0C2448DB0F0547D0F947E01E1DBF3@yogi.asicdesigners.com>

Eitan

Let's assume for the moment (because you're presenting this under the
OpenFabrics banner) that the capability you describe below will be hidden
behind a wire-protocol-agnostic API that can be used by IB HCA, Ethernet
RNIC, etc.

It is not clear from the description, but I assume the arbiter you refer to
below allows for traffic classes where
a) the bandwidth of the different flows within a particular class
   aggregates to a fixed bandwidth, and also
b) each flow within a class has a fixed bandwidth, e.g. where each flow
   corresponds to some type of fixed rate media stream.

A QoS API also needs to support flows with large RTT (Round Trip Time).
Large RTT flows typically require the sender to space the packets equally
on the wire (referred to as pacing), with the spacing interval being
dynamically computed based on the RTT. It is sufficient in this case to be
able to set a pacing attribute through the API.

Regards,
Asgeir Eiriksson
CTO
Chelsio Communications

_____

From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Eitan Zahavi
Sent: Tuesday, May 30, 2006 7:41 AM
To: openib-general at openib.org
Cc: Nimrod Gindi; Roland Dreier
Subject: [openib-general] QoS RFC

Hi All

Please find the attached RFC describing how QoS policy support could be
implemented in the OpenFabrics stack. Your comments are welcome.

Eitan

RFC: OpenFabrics Enhancements for QoS Support
===============================================

Authors:  Eitan Zahavi
Date:     May 2006.
Revision: 0.1

Table of contents:
1. Overview
2. Architecture
3. Supported Policy
4. CMA functionality
5. IPoIB functionality
6. SDP functionality
7. SRP functionality
8. iSER functionality
9. OpenSM functionality

1. Overview
------------
Quality of Service requirements stem from the realization of I/O
consolidation over an IB network: as multiple applications and ULPs share
the same fabric, means to control their use of the network resources are
becoming a must. The basic need is to differentiate the service levels
provided to different traffic flows, so that a policy can be enforced to
control each flow's utilization of the fabric resources.

The IBTA specification defines several hardware features and management
interfaces to support QoS:
* Up to 15 Virtual Lanes (VL) can carry traffic in a non-blocking manner
* Arbitration between traffic of different VLs is performed by a weighted
  round robin arbiter with 2 priority levels. The arbiter is programmable
  with a sequence of (VL, weight) pairs and a maximal number of high
  priority credits to be processed before low priority is served
* Packets carry a class of service marking in the range 0 to 15 in their
  header SL field
* Each switch can map an incoming packet by its SL to a particular output
  VL based on a programmable table VL=SL-to-VL-MAP(in-port, out-port, SL)
* The Subnet Administrator controls the parameters of each communication
  flow by providing them in response to a Path Record query

The IB QoS features provide the means to implement a DiffServ-like
architecture. The DiffServ architecture (IETF RFC2474 2475) is widely used
today in highly dynamic fabrics.

This proposal provides the detailed functional definition for the various
software elements that are required to enable a DiffServ-like architecture
over the OpenFabrics software stack.

< cut to end of original email>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From mst at mellanox.co.il Wed May 31 04:12:25 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 31 May 2006 14:12:25 +0300 Subject: [openib-general] [PATCH] librdma: use stdint.h types Message-ID: <20060531111225.GR21266@mellanox.co.il> Same patch as posted previously for libibverbs/libmthca. --- librdma pulls in types from under linux/ directory which are broken on some distros/compilers. There's no good reason for us to do so: integer types of specific widths are specified in stdint.h in a portable way. Signed-off-by: Michael S. Tsirkin Index: openib/src/userspace/librdmacm/include/rdma/rdma_cma_abi.h =================================================================== --- openib/src/userspace/librdmacm/include/rdma/rdma_cma_abi.h (revision 7599) +++ openib/src/userspace/librdmacm/include/rdma/rdma_cma_abi.h (working copy) @@ -33,6 +33,7 @@ #ifndef RDMA_CMA_ABI_H #define RDMA_CMA_ABI_H +#include #include /* @@ -63,146 +64,146 @@ }; struct ucma_abi_cmd_hdr { - __u32 cmd; - __u16 in; - __u16 out; + uint32_t cmd; + uint16_t in; + uint16_t out; }; struct ucma_abi_create_id { - __u64 uid; - __u64 response; + uint64_t uid; + uint64_t response; }; struct ucma_abi_create_id_resp { - __u32 id; + uint32_t id; }; struct ucma_abi_destroy_id { - __u64 response; - __u32 id; - __u32 reserved; + uint64_t response; + uint32_t id; + uint32_t reserved; }; struct ucma_abi_destroy_id_resp { - __u32 events_reported; + uint32_t events_reported; }; struct ucma_abi_bind_addr { - __u64 response; + uint64_t response; struct sockaddr_in6 addr; - __u32
id; + uint32_t id; }; struct ucma_abi_resolve_addr { struct sockaddr_in6 src_addr; struct sockaddr_in6 dst_addr; - __u32 id; - __u32 timeout_ms; + uint32_t id; + uint32_t timeout_ms; }; struct ucma_abi_resolve_route { - __u32 id; - __u32 timeout_ms; + uint32_t id; + uint32_t timeout_ms; }; struct ucma_abi_query_route { - __u64 response; - __u32 id; - __u32 reserved; + uint64_t response; + uint32_t id; + uint32_t reserved; }; struct ucma_abi_query_route_resp { - __u64 node_guid; + uint64_t node_guid; struct ibv_kern_path_rec ib_route[2]; struct sockaddr_in6 src_addr; struct sockaddr_in6 dst_addr; - __u32 num_paths; - __u8 port_num; - __u8 reserved[3]; + uint32_t num_paths; + uint8_t port_num; + uint8_t reserved[3]; }; struct ucma_abi_conn_param { - __u32 qp_num; - __u32 qp_type; - __u8 private_data[RDMA_MAX_PRIVATE_DATA]; - __u8 private_data_len; - __u8 srq; - __u8 responder_resources; - __u8 initiator_depth; - __u8 flow_control; - __u8 retry_count; - __u8 rnr_retry_count; - __u8 valid; + uint32_t qp_num; + uint32_t qp_type; + uint8_t private_data[RDMA_MAX_PRIVATE_DATA]; + uint8_t private_data_len; + uint8_t srq; + uint8_t responder_resources; + uint8_t initiator_depth; + uint8_t flow_control; + uint8_t retry_count; + uint8_t rnr_retry_count; + uint8_t valid; }; struct ucma_abi_connect { struct ucma_abi_conn_param conn_param; - __u32 id; - __u32 reserved; + uint32_t id; + uint32_t reserved; }; struct ucma_abi_listen { - __u32 id; - __u32 backlog; + uint32_t id; + uint32_t backlog; }; struct ucma_abi_accept { - __u64 uid; + uint64_t uid; struct ucma_abi_conn_param conn_param; - __u32 id; - __u32 reserved; + uint32_t id; + uint32_t reserved; }; struct ucma_abi_reject { - __u32 id; - __u8 private_data_len; - __u8 reserved[3]; - __u8 private_data[RDMA_MAX_PRIVATE_DATA]; + uint32_t id; + uint8_t private_data_len; + uint8_t reserved[3]; + uint8_t private_data[RDMA_MAX_PRIVATE_DATA]; }; struct ucma_abi_disconnect { - __u32 id; + uint32_t id; }; struct ucma_abi_init_qp_attr 
{ - __u64 response; - __u32 id; - __u32 qp_state; + uint64_t response; + uint32_t id; + uint32_t qp_state; }; struct ucma_abi_get_event { - __u64 response; + uint64_t response; }; struct ucma_abi_event_resp { - __u64 uid; - __u32 id; - __u32 event; - __u32 status; - __u8 private_data_len; - __u8 reserved[3]; - __u8 private_data[RDMA_MAX_PRIVATE_DATA]; + uint64_t uid; + uint32_t id; + uint32_t event; + uint32_t status; + uint8_t private_data_len; + uint8_t reserved[3]; + uint8_t private_data[RDMA_MAX_PRIVATE_DATA]; }; struct ucma_abi_get_option { - __u64 response; - __u64 optval; - __u32 id; - __u32 level; - __u32 optname; - __u32 optlen; + uint64_t response; + uint64_t optval; + uint32_t id; + uint32_t level; + uint32_t optname; + uint32_t optlen; }; struct ucma_abi_get_option_resp { - __u32 optlen; + uint32_t optlen; }; struct ucma_abi_set_option { - __u64 optval; - __u32 id; - __u32 level; - __u32 optname; - __u32 optlen; + uint64_t optval; + uint32_t id; + uint32_t level; + uint32_t optname; + uint32_t optlen; }; #endif /* RDMA_CMA_ABI_H */ Index: openib/src/userspace/librdmacm/configure.in =================================================================== --- openib/src/userspace/librdmacm/configure.in (revision 7599) +++ openib/src/userspace/librdmacm/configure.in (working copy) @@ -23,19 +23,21 @@ AC_CHECK_SIZEOF(long) dnl Checks for libraries +AC_CHECK_LIB(sysfs, sysfs_open_class, [], + AC_MSG_ERROR([sysfs_open_class() not found. librdmacm requires libsysfs.])) + if test "$disable_libcheck" != "yes" then -AC_CHECK_LIB(sysfs, sysfs_open_class, [], - AC_MSG_ERROR([sysfs_open_class() not found. librdmacm requires libsysfs.])) AC_CHECK_LIB(ibverbs, ibv_get_device_list, [], AC_MSG_ERROR([ibv_get_device_list() not found. librdmacm requires libibverbs.])) fi dnl Checks for header files. +AC_CHECK_HEADER(sysfs/libsysfs.h, [], + AC_MSG_ERROR([ not found. 
librdmacm requires libsysfs.])) + if test "$disable_libcheck" != "yes" then -AC_CHECK_HEADER(sysfs/libsysfs.h, [], - AC_MSG_ERROR([ not found. librdmacm requires libsysfs.])) AC_CHECK_HEADER(infiniband/verbs.h, [], AC_MSG_ERROR([ not found. Is libibverbs installed?])) fi -- MST From mst at mellanox.co.il Wed May 31 04:18:37 2006 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 31 May 2006 14:18:37 +0300 Subject: [openib-general] [PATCH] libibcm: use stdint.h types Message-ID: <20060531111837.GS21266@mellanox.co.il> Same logic as posted previously for libibverbs. -- Don't pull in types from under linux/ directory which are broken on some distros/compilers. There's no good reason for us to do so: integer types of specific widths are specified in stdint.h in a portable way. Signed-off-by: Michael S. Tsirkin Index: openib/src/userspace/libibcm/include/infiniband/cm_abi.h =================================================================== --- openib/src/userspace/libibcm/include/infiniband/cm_abi.h (revision 7516) +++ openib/src/userspace/libibcm/include/infiniband/cm_abi.h (working copy) @@ -36,7 +36,7 @@ #ifndef CM_ABI_H #define CM_ABI_H -#include +#include #include #include @@ -74,211 +74,211 @@ * command ABI structures. 
*/ struct cm_abi_cmd_hdr { - __u32 cmd; - __u16 in; - __u16 out; + uint32_t cmd; + uint16_t in; + uint16_t out; }; struct cm_abi_create_id { - __u64 uid; - __u64 response; + uint64_t uid; + uint64_t response; }; struct cm_abi_create_id_resp { - __u32 id; + uint32_t id; }; struct cm_abi_destroy_id { - __u64 response; - __u32 id; - __u32 reserved; + uint64_t response; + uint32_t id; + uint32_t reserved; }; struct cm_abi_destroy_id_resp { - __u32 events_reported; + uint32_t events_reported; }; struct cm_abi_attr_id { - __u64 response; - __u32 id; - __u32 reserved; + uint64_t response; + uint32_t id; + uint32_t reserved; }; struct cm_abi_attr_id_resp { - __u64 service_id; - __u64 service_mask; - __u32 local_id; - __u32 remote_id; + uint64_t service_id; + uint64_t service_mask; + uint32_t local_id; + uint32_t remote_id; }; struct cm_abi_init_qp_attr { - __u64 response; - __u32 id; - __u32 qp_state; + uint64_t response; + uint32_t id; + uint32_t qp_state; }; struct cm_abi_listen { - __u64 service_id; - __u64 service_mask; - __u32 id; - __u32 reserved; + uint64_t service_id; + uint64_t service_mask; + uint32_t id; + uint32_t reserved; }; struct cm_abi_establish { - __u32 id; + uint32_t id; }; struct cm_abi_private_data { - __u64 data; - __u32 id; - __u8 len; - __u8 reserved[3]; + uint64_t data; + uint32_t id; + uint8_t len; + uint8_t reserved[3]; }; struct cm_abi_req { - __u32 id; - __u32 qpn; - __u32 qp_type; - __u32 psn; - __u64 sid; - __u64 data; - __u64 primary_path; - __u64 alternate_path; - __u8 len; - __u8 peer_to_peer; - __u8 responder_resources; - __u8 initiator_depth; - __u8 remote_cm_response_timeout; - __u8 flow_control; - __u8 local_cm_response_timeout; - __u8 retry_count; - __u8 rnr_retry_count; - __u8 max_cm_retries; - __u8 srq; - __u8 reserved[5]; + uint32_t id; + uint32_t qpn; + uint32_t qp_type; + uint32_t psn; + uint64_t sid; + uint64_t data; + uint64_t primary_path; + uint64_t alternate_path; + uint8_t len; + uint8_t peer_to_peer; + uint8_t 
responder_resources; + uint8_t initiator_depth; + uint8_t remote_cm_response_timeout; + uint8_t flow_control; + uint8_t local_cm_response_timeout; + uint8_t retry_count; + uint8_t rnr_retry_count; + uint8_t max_cm_retries; + uint8_t srq; + uint8_t reserved[5]; }; struct cm_abi_rep { - __u64 uid; - __u64 data; - __u32 id; - __u32 qpn; - __u32 psn; - __u8 len; - __u8 responder_resources; - __u8 initiator_depth; - __u8 target_ack_delay; - __u8 failover_accepted; - __u8 flow_control; - __u8 rnr_retry_count; - __u8 srq; - __u8 reserved[4]; + uint64_t uid; + uint64_t data; + uint32_t id; + uint32_t qpn; + uint32_t psn; + uint8_t len; + uint8_t responder_resources; + uint8_t initiator_depth; + uint8_t target_ack_delay; + uint8_t failover_accepted; + uint8_t flow_control; + uint8_t rnr_retry_count; + uint8_t srq; + uint8_t reserved[4]; }; struct cm_abi_info { - __u32 id; - __u32 status; - __u64 info; - __u64 data; - __u8 info_len; - __u8 data_len; - __u8 reserved[6]; + uint32_t id; + uint32_t status; + uint64_t info; + uint64_t data; + uint8_t info_len; + uint8_t data_len; + uint8_t reserved[6]; }; struct cm_abi_mra { - __u64 data; - __u32 id; - __u8 len; - __u8 timeout; - __u8 reserved[2]; + uint64_t data; + uint32_t id; + uint8_t len; + uint8_t timeout; + uint8_t reserved[2]; }; struct cm_abi_lap { - __u64 path; - __u64 data; - __u32 id; - __u8 len; - __u8 reserved[3]; + uint64_t path; + uint64_t data; + uint32_t id; + uint8_t len; + uint8_t reserved[3]; }; struct cm_abi_sidr_req { - __u32 id; - __u32 timeout; - __u64 sid; - __u64 data; - __u64 path; - __u16 pkey; - __u8 len; - __u8 max_cm_retries; - __u8 reserved[4]; + uint32_t id; + uint32_t timeout; + uint64_t sid; + uint64_t data; + uint64_t path; + uint16_t pkey; + uint8_t len; + uint8_t max_cm_retries; + uint8_t reserved[4]; }; struct cm_abi_sidr_rep { - __u32 id; - __u32 qpn; - __u32 qkey; - __u32 status; - __u64 info; - __u64 data; - __u8 info_len; - __u8 data_len; - __u8 reserved[6]; + uint32_t id; + uint32_t 
qpn; + uint32_t qkey; + uint32_t status; + uint64_t info; + uint64_t data; + uint8_t info_len; + uint8_t data_len; + uint8_t reserved[6]; }; /* * event notification ABI structures. */ struct cm_abi_event_get { - __u64 response; - __u64 data; - __u64 info; - __u8 data_len; - __u8 info_len; - __u8 reserved[6]; + uint64_t response; + uint64_t data; + uint64_t info; + uint8_t data_len; + uint8_t info_len; + uint8_t reserved[6]; }; struct cm_abi_req_event_resp { struct ibv_kern_path_rec primary_path; struct ibv_kern_path_rec alternate_path; - __u64 remote_ca_guid; - __u32 remote_qkey; - __u32 remote_qpn; - __u32 qp_type; - __u32 starting_psn; - __u8 responder_resources; - __u8 initiator_depth; - __u8 local_cm_response_timeout; - __u8 flow_control; - __u8 remote_cm_response_timeout; - __u8 retry_count; - __u8 rnr_retry_count; - __u8 srq; - __u8 port; - __u8 reserved[7]; + uint64_t remote_ca_guid; + uint32_t remote_qkey; + uint32_t remote_qpn; + uint32_t qp_type; + uint32_t starting_psn; + uint8_t responder_resources; + uint8_t initiator_depth; + uint8_t local_cm_response_timeout; + uint8_t flow_control; + uint8_t remote_cm_response_timeout; + uint8_t retry_count; + uint8_t rnr_retry_count; + uint8_t srq; + uint8_t port; + uint8_t reserved[7]; }; struct cm_abi_rep_event_resp { - __u64 remote_ca_guid; - __u32 remote_qkey; - __u32 remote_qpn; - __u32 starting_psn; - __u8 responder_resources; - __u8 initiator_depth; - __u8 target_ack_delay; - __u8 failover_accepted; - __u8 flow_control; - __u8 rnr_retry_count; - __u8 srq; - __u8 reserved[5]; + uint64_t remote_ca_guid; + uint32_t remote_qkey; + uint32_t remote_qpn; + uint32_t starting_psn; + uint8_t responder_resources; + uint8_t initiator_depth; + uint8_t target_ack_delay; + uint8_t failover_accepted; + uint8_t flow_control; + uint8_t rnr_retry_count; + uint8_t srq; + uint8_t reserved[5]; }; struct cm_abi_rej_event_resp { - __u32 reason; + uint32_t reason; /* ari in cm_abi_event_get info field. 
*/ }; struct cm_abi_mra_event_resp { - __u8 timeout; - __u8 reserved[3]; + uint8_t timeout; + uint8_t reserved[3]; }; struct cm_abi_lap_event_resp { @@ -286,20 +286,20 @@ }; struct cm_abi_apr_event_resp { - __u32 status; + uint32_t status; /* apr info in cm_abi_event_get info field. */ }; struct cm_abi_sidr_req_event_resp { - __u16 pkey; - __u8 port; - __u8 reserved; + uint16_t pkey; + uint8_t port; + uint8_t reserved; }; struct cm_abi_sidr_rep_event_resp { - __u32 status; - __u32 qkey; - __u32 qpn; + uint32_t status; + uint32_t qkey; + uint32_t qpn; /* info in cm_abi_event_get info field. */ }; @@ -309,11 +309,11 @@ #define CM_ABI_PRES_ALTERNATE 0x08 struct cm_abi_event_resp { - __u64 uid; - __u32 id; - __u32 event; - __u32 present; - __u32 reserved; + uint64_t uid; + uint32_t id; + uint32_t event; + uint32_t present; + uint32_t reserved; union { struct cm_abi_req_event_resp req_resp; struct cm_abi_rep_event_resp rep_resp; @@ -325,7 +325,7 @@ struct cm_abi_sidr_req_event_resp sidr_req_resp; struct cm_abi_sidr_rep_event_resp sidr_rep_resp; - __u32 send_status; + uint32_t send_status; } u; }; -- MST
From ogerlitz at voltaire.com Wed May 31 05:55:28 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 31 May 2006 15:55:28 +0300
Subject: [openib-general] Re: [PATCH] IB/iser: do I/O path allocations with GFP_NOIO
In-Reply-To:
References: <447CA214.7010202@cs.wisc.edu>
Message-ID: <447D9240.3010106@voltaire.com>

Roland Dreier wrote:
> It's a problem because SRP and iSER are straddling both the SCSI and
> IB worlds. Probably the best policy is to cc all relevant mailing
> lists (at least linux-scsi and openib-general) whenever there's a
> doubt about who should see something.

We are monitoring linux-scsi to track upstream iscsi updates.

> As far as merging patches goes, I've been merging SRP changes directly
> to Linus, except for generic fixes to , which I've been
> sending through James. Or felt that iSCSI should be merged through my
> tree, but I have no problem if in the future patches bypass my tree.
> (But I would like to be cc'ed on changes to IB stuff, especially core
> things outside of specific drivers)

At this point, we prefer that iser related updates/merges go through
Roland, the IB maintainer. If this poses a problem for scsi/iscsi updates,
we are open to sending our updates/merges via the scsi maintainer.

Or.
* Up to USD 888 bonus start to gamble with * Play free to see how amazing and easy it is * Instant payouts to all players and 24/7 support http://calisen.com/casino/ black-feathered suicide clause vacuum tester pseudo exemplar driggle-draggle quasi negligence scribing compass round-armed gas warfare sugar fish Non-pali drug user sparked-back Pre-newtonian u-o umlaut bull apple shovel-bladed angel-builded sharp-piled lacewing fly cant ribband wood buffalo quasi-enthusiastic trades-unionist -------------- next part -------------- An HTML attachment was scrubbed... URL: From masami_bb at ocn.ne.jp Wed May 31 07:29:55 2006 From: masami_bb at ocn.ne.jp (=?shift-jis?B?b29zZQ==?=) Date: Wed, 31 May 2006 07:29:55 -0700 (PDT) Subject: [openib-general] =?iso-2022-jp?b?GyRCPzc1LCQqO24kNyViJUsbKEI=?= =?iso-2022-jp?b?GyRCJT8hPEpnPTghKiEqGyhC?= Message-ID: <20060531142955.08F4222834D@openib.ca.sandia.gov> 新規立ち上げ業務につきお試しモニターを募集しております。 〜諸業務〜 ○派遣業務 ○お客様のニーズや好みに合わせたお相手を派遣する。 ※ただ今お試し業務期間につきまして無料にて業務をしております。(女の子の質や業務の改善点などの感想をいただければと思います。) さまざまなフェチの方やイメージプレイを望んでらっしゃるお客様にも対応できるように女の子はたくさんいますし、 実際に会う前にメールやお話でコンタクトをとることももちろん可能です。 実際にお客様の反応をみまして、業務拡大も考え中でございます。料金は無料ですので是非ご参加いただければ幸いです。 問い合わせ・紹介などはこちらです。→ http://www.deai-news24.net/?c115 迷惑でしたらこちらまでお願い致します。          ↓ s_for_sweetbaby at yahoo.co.uk From fujita.tomonori at lab.ntt.co.jp Wed May 31 07:29:30 2006 From: fujita.tomonori at lab.ntt.co.jp (FUJITA Tomonori) Date: Wed, 31 May 2006 23:29:30 +0900 Subject: [openib-general] Re: [PATCH] IB/iser: do I/O path allocations with GFP_NOIO In-Reply-To: <447CA214.7010202@cs.wisc.edu> References: <447CA214.7010202@cs.wisc.edu> Message-ID: <20060531232930L.fujita.tomonori@lab.ntt.co.jp> From: Mike Christie Subject: Re: [PATCH] IB/iser: do I/O path allocations with GFP_NOIO Date: Tue, 30 May 2006 14:50:44 -0500 > Should iser patches have linux-scsi ccd on them in the future? 
And > should they go through the scsi maintainer normally (I understand they > cannot now since James does not have all the InfiniBand bits)? I am > really just trying to avoid any coordination issues that come about by > having core iscsi and tcp iscsi patches sent to the scsi maintainer then > having to have iser going through Roland. > > For example I left a bit in the core iscsi code so I would not break > iser. Now iser is updating their code, so we do not need that bit, but > Or's patch missed the cleanup. If we sent everything through one > maintainer then we could have cleaned everything up in one pass. Roland, as proposed in the past, how about moving the iSER and SRP drivers to drivers/scsi? As you said, they straddle the SCSI and IB worlds; however, they are just LLDs like iscsi_tcp, which straddles the SCSI and TCP worlds. That would help to avoid coordination issues too. From rdreier at cisco.com Wed May 31 07:36:46 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 07:36:46 -0700 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <10e223bf0605310258k753df4d2i7f68617f110ccdaa@mail.gmail.com> (Leonid Arsh's message of "Wed, 31 May 2006 12:58:38 +0300") References: <20060509060958.GA482@voltaire.com> <10e223bf0605310258k753df4d2i7f68617f110ccdaa@mail.gmail.com> Message-ID: Leonid> I just moved the structure from ipath_mad.c to ib_smi.h. Leonid> I see no special reason to have '__attribute__ ((packed))' Leonid> here. Isn't it good style to have a structure packed Leonid> when it is sent over the wire? No, since it leads to gcc generating much worse code on architectures like ia64. - R.
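The point about '__attribute__ ((packed))' above can be illustrated with a small sketch. The structure below is hypothetical: it is not the structure moved from ipath_mad.c to ib_smi.h, and its field names and 16-byte layout are assumptions chosen for illustration. When every field already falls on its natural alignment, the unpacked structure has the same layout as the packed one, so packing buys nothing for the wire format. A packed structure is given alignment 1 by GCC, so on strict-alignment targets such as ia64 the compiler may have to use byte-wise accesses for its fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 16-byte wire header (not taken from ib_smi.h).
 * Fields are ordered so that each one sits on its natural
 * alignment, so the compiler inserts no padding.
 */
struct wire_hdr {
	uint64_t guid;      /* offset 0  */
	uint32_t qpn;       /* offset 8  */
	uint16_t lid;       /* offset 12 */
	uint8_t  port;      /* offset 14 */
	uint8_t  reserved;  /* offset 15 */
};

/* Same fields, packed (GCC/Clang extension): alignment drops to 1. */
struct wire_hdr_packed {
	uint64_t guid;
	uint32_t qpn;
	uint16_t lid;
	uint8_t  port;
	uint8_t  reserved;
} __attribute__((packed));

/* Compile-time checks that the unpacked layout matches the wire layout. */
static_assert(sizeof(struct wire_hdr) == 16, "unexpected padding in wire_hdr");
static_assert(offsetof(struct wire_hdr, qpn) == 8, "qpn is misplaced");
static_assert(offsetof(struct wire_hdr, lid) == 12, "lid is misplaced");
static_assert(offsetof(struct wire_hdr, port) == 14, "port is misplaced");
static_assert(sizeof(struct wire_hdr) == sizeof(struct wire_hdr_packed),
	      "packed and unpacked layouts differ");

/* Small helpers so the same facts can also be checked at run time. */
static inline size_t wire_hdr_size(void)
{
	return sizeof(struct wire_hdr);
}

static inline size_t wire_hdr_packed_align(void)
{
	return _Alignof(struct wire_hdr_packed);
}
```

The compile-time assertions play the role that packing would otherwise play: if a later change introduced padding, the build would fail instead of silently changing the layout. Byte order is a separate concern and is not handled in this sketch.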
From rdreier at cisco.com Wed May 31 07:38:00 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 07:38:00 -0700 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: <447D6816.2090405@mellanox.co.il> (Tziporet Koren's message of "Wed, 31 May 2006 12:55:34 +0300") References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161405.GT21266@mellanox.co.il> <447D6816.2090405@mellanox.co.il> Message-ID: Tziporet> It will be very good if we have it for 2.6.17 too since Tziporet> then we can get it directly from git and no need for a Tziporet> fix in OFED. Also I think that ia64 customers that will Tziporet> work directly with kernel 2.6.17 will be bothered by Tziporet> these messages. It's too close to the 2.6.17 release for such a big patch, which fixes such a minor issue. - R. From rdreier at cisco.com Wed May 31 07:41:33 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 07:41:33 -0700 Subject: [openib-general] Re: [PATCH] IB/iser: do I/O path allocations with GFP_NOIO In-Reply-To: <20060531232930L.fujita.tomonori@lab.ntt.co.jp> (FUJITA Tomonori's message of "Wed, 31 May 2006 23:29:30 +0900") References: <447CA214.7010202@cs.wisc.edu> <20060531232930L.fujita.tomonori@lab.ntt.co.jp> Message-ID: > Roland, as proposed in the past, how about moving the iSER > and SRP drivers to drivers/scsi? As you said, they > straddle in SCSI and IB worlds, however, they are just > LLDs like iscsi_tcp, which straddle in SCSI and TCP > worlds. I have no problem with that, although in general I'm not thrilled about moving files around, since it usually is churn without a technical advantage. If SRP and iSER were in drivers/scsi, would that mean James would have to maintain them? That would be a change in workflow. - R. 
From rdreier at cisco.com Wed May 31 07:45:27 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 07:45:27 -0700 Subject: [openib-general] Re: ia64: kernel unaligned access in ipoib In-Reply-To: <20060531081644.GN21266@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 31 May 2006 11:16:44 +0300") References: <20060528131527.GX21266@mellanox.co.il> <20060529155350.GR21266@mellanox.co.il> <20060529161558.GU21266@mellanox.co.il> <20060530093930.GE21266@mellanox.co.il> <20060531081644.GN21266@mellanox.co.il> Message-ID: Thanks, I updated it. - R. From rdreier at cisco.com Wed May 31 07:50:56 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 07:50:56 -0700 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <10e223bf0605310249t57e89b72ne8e38e57945a2cec@mail.gmail.com> (Leonid Arsh's message of "Wed, 31 May 2006 12:49:50 +0300") References: <20060509060958.GA482@voltaire.com> <10e223bf0605300403lcc24b8bwf1e1d7059edab416@mail.gmail.com> <10e223bf0605310249t57e89b72ne8e38e57945a2cec@mail.gmail.com> Message-ID: > the most urgent and critical case is > the SM failure/restart when the SM is not connected to the host directly. > In this cases neither PortError no PortActive events will be > generated on the host. > The SM will lose the multicast group configuration for the host and > the host will need to rejoin its multicast groups in this case. > IPoIB shall handle the problem by catching the ClientReregister event. It seems your patch doesn't help in this situation at all. Right now mthca will generate a LID_CHANGE event for any set of PortInfo; IPoIB will catch that event and rejoin all multicast groups. Your patch changes some of those events to CLIENT_REREGISTER events and has IPoIB treat them exactly the same way as LID_CHANGE events. So the behavior won't change at all. > There are additional cases. 
Any client which registers itself on the > SA, will need to handle this event in order to work properly after the > SM failure/restart. We'll need it very soon for a user mode > application. OK, but you could use LID_CHANGE events the same way as IPoIB does now. Since ClientReregister support is optional, and in fact you didn't fix ipath to generate these events, your app can't count on CLIENT_REREGISTER events being generated anyway. I'm not really opposed to these changes, but it is adding additional code for what looks like very minimal improvement. So I'm trying to understand how this really helps you. - R. From rdreier at cisco.com Wed May 31 07:54:02 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 07:54:02 -0700 Subject: [openib-general] Re: [PATCH] RFC: use stdint.h types In-Reply-To: <20060531085029.GP21266@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 31 May 2006 11:50:29 +0300") References: <20060531085029.GP21266@mellanox.co.il> Message-ID: My initial reaction is that I don't like this, since it makes it harder to keep the kernel ABI files in sync between libraries and the kernel. Does overriding offsetof() really cause any problems? Does including break anything? > add sysfs/libsysfs.h to ipathverbs > (not sure how does it currently compile on trunk without) I assume there's more to be fixed to make libipathverbs work, since it doesn't export the right driver entry point either. - R. From leonida at voltaire.com Wed May 31 07:54:14 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Wed, 31 May 2006 17:54:14 +0300 Subject: [openib-general][RFC][PATCH] mthca: HCA initialization parameters Message-ID: <20060531145414.GA9851@voltaire.com> Roland, I'm resending the patch again. I removed the num_srq (Maximum number of Shared Receive Queues) parameter here - it's initialized according to the device limit and we don't need to change it, most probably. 
Regards,
Leonid

On 5/30/06, Leonid Arsh wrote:
>
> Roland,
>
> Further to our discussions I'm sending the fixed patch.
> This patch implements the module parameters allowing the user to change the HCA initialization values.
> I left only needed parameters and added the parameter validation.
> Now the set of the parameters is closest to the profile parameters used in the old Mellanox driver.
> The parameters may be read from the sysfs, but cannot be changed.
>
> Regards,
> Leonid
>

Signed-off-by: Leonid Arsh

--- openib-1.0/src/linux-kernel/infiniband/hw/mthca/mthca_main.c.orig 2006-05-31 20:28:28.000000000 +0300
+++ openib-1.0/src/linux-kernel/infiniband/hw/mthca/mthca_main.c.OK2 2006-05-31 20:34:30.000000000 +0300
@@ -81,9 +81,6 @@
 module_param(tune_pci, int, 0444);
 MODULE_PARM_DESC(tune_pci, "increase PCI burst from the default set by BIOS if nonzero");
 
-static const char mthca_version[] __devinitdata =
-	DRV_NAME ": Mellanox InfiniBand HCA driver v"
-	DRV_VERSION " (" DRV_RELDATE ")\n";
 
 static struct mthca_profile default_profile = {
 	.num_qp = 1 << 16,
@@ -97,6 +94,107 @@
 	.uarc_size = 1 << 18, /* Arbel only */
 };
 
+module_param_named(num_qp, default_profile.num_qp, int, 0444);
+MODULE_PARM_DESC(num_qp, "Maximum number of QPs available per HCA");
+
+module_param_named(rdb_per_qp, default_profile.rdb_per_qp, int, 0444);
+MODULE_PARM_DESC(rdb_per_qp, "Number of RDB buffers per QP");
+
+module_param_named(num_cq, default_profile.num_cq, int, 0444);
+MODULE_PARM_DESC(num_cq, "Maximum number of CQs per HCA");
+
+module_param_named(num_mcg, default_profile.num_mcg, int, 0444);
+MODULE_PARM_DESC(num_mcg, "Maximum number of Multicast groups per HCA");
+
+module_param_named(num_mpt, default_profile.num_mpt, int, 0444);
+MODULE_PARM_DESC(num_mpt,
+		"Maximum number of Memory Protection Table entries per HCA");
+
+module_param_named(num_mtt, default_profile.num_mtt, int, 0444);
+MODULE_PARM_DESC(num_mtt,
+		"Maximum number of Memory Translation table segments per HCA");
+/* Tavor only */
+module_param_named(num_udav, default_profile.num_udav, int, 0444);
+MODULE_PARM_DESC(num_udav, "Maximum number of UD Address Vectors per HCA");
+
+/* Tavor only */
+module_param_named(fmr_reserved_mtts, default_profile.fmr_reserved_mtts, int, 0444);
+MODULE_PARM_DESC(fmr_reserved_mtts,
+		"Number of Memory Translation table segments reserved for FMR");
+
+static const char mthca_version[] __devinitdata =
+	DRV_NAME ": Mellanox InfiniBand HCA driver v"
+	DRV_VERSION " (" DRV_RELDATE ")\n";
+
+static int __devinit mthca_validate_profile(struct mthca_dev *mdev,
+					    struct mthca_profile *profile)
+{
+
+	if(default_profile.num_qp & (default_profile.num_qp-1)) {
+		mthca_err(mdev, "Invalid num_qp parameter value (%d).\n",
+			  default_profile.num_qp);
+		goto err_inval;
+	}
+
+	if(default_profile.rdb_per_qp & (default_profile.rdb_per_qp-1)) {
+		mthca_err(mdev, "Invalid rdb_per_qp parameter value (%d)\n",
+			  default_profile.rdb_per_qp);
+		goto err_inval;
+	}
+
+	if(default_profile.num_cq & (default_profile.num_cq-1)) {
+		mthca_err(mdev, "Invalid num_cq parameter value (%d)\n",
+			  default_profile.num_cq);
+		goto err_inval;
+	}
+
+	if(default_profile.num_mcg & (default_profile.num_mcg-1)) {
+		mthca_err(mdev, "Invalid num_mcg parameter value (%d)\n",
+			  default_profile.num_mcg);
+		goto err_inval;
+	}
+	if(default_profile.num_mpt & (default_profile.num_mpt-1)) {
+		mthca_err(mdev, "Invalid num_mpt parameter value (%d)\n",
+			  default_profile.num_mpt);
+		goto err_inval;
+	}
+
+	if(default_profile.num_mtt & (default_profile.num_mtt-1)) {
+		mthca_err(mdev, "Invalid num_mtt parameter value (%d)\n",
+			  default_profile.num_mtt);
+		goto err_inval;
+	}
+
+	if (mthca_is_memfree(mdev)) {
+
+		if(default_profile.num_udav & (default_profile.num_udav-1)) {
+			mthca_err(mdev, "Invalid num_udav parameter value (%d)\n",
+				  default_profile.num_udav);
+			goto err_inval;
+		}
+
+		if(default_profile.fmr_reserved_mtts & (default_profile.fmr_reserved_mtts-1)) {
+			mthca_err(mdev, "Invalid fmr_reserved_mtts parameter value (%d)\n",
+				  default_profile.fmr_reserved_mtts);
+			goto err_inval;
+		} else if (default_profile.fmr_reserved_mtts >= default_profile.num_mtt ) {
+			mthca_err(mdev,
+				  "Invalid fmr_reserved_mtts parameter value (%d). "
+				  "Must be lower then num_mtt (%d)\n",
+				  default_profile.fmr_reserved_mtts,
+				  default_profile.num_mtt );
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+
+err_inval:
+	mthca_err(mdev, "This parameter must be power of two.\n");
+	return -EINVAL;
+
+}
+
 static int __devinit mthca_tune_pci(struct mthca_dev *mdev)
 {
 	int cap;
@@ -994,6 +1092,7 @@
 	printk(KERN_INFO PFX "Initializing %s\n",
 	       pci_name(pdev));
 
+
 	if (id->driver_data >= ARRAY_SIZE(mthca_hca_table)) {
 		printk(KERN_ERR PFX "%s has invalid driver data %lx\n",
 		       pci_name(pdev), id->driver_data);
@@ -1095,6 +1194,10 @@
 	if (err)
 		goto err_cmd;
 
+	err = mthca_validate_profile(mdev, &default_profile);
+	if (err)
+		goto err_profile;
+
 	err = mthca_init_hca(mdev);
 	if (err)
 		goto err_cmd;
@@ -1147,6 +1250,7 @@
 	mthca_close_hca(mdev);
 
 err_cmd:
+err_profile:
 	mthca_cmd_cleanup(mdev);
 
 err_free_dev:

From leonida at voltaire.com Wed May 31 08:48:04 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Wed, 31 May 2006 18:48:04 +0300 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: References: <20060509060958.GA482@voltaire.com> <10e223bf0605300403lcc24b8bwf1e1d7059edab416@mail.gmail.com> <10e223bf0605310249t57e89b72ne8e38e57945a2cec@mail.gmail.com> Message-ID: <10e223bf0605310848r20ae5dd3rdf279a387524a326@mail.gmail.com>
Generating the LID_CHANGE event instead of CLIENT_REREGISTER is simply not correct. We need the event for our user mode applications. Although the patch doesn't change current functionality, I wouldn't like to write applications based on the erroneous code. The application won't just work with devices that generate the event correctly.
Although the event is optional, it's very helpful and, I think, it will be supported by most of the devices/drivers soon. The fix does not affect the ipath behaviour, anyway. Thanks, Leonid On 5/31/06, Roland Dreier wrote: > > the most urgent and critical case is > > the SM failure/restart when the SM is not connected to the host directly. > > In this cases neither PortError no PortActive events will be > > generated on the host. > > The SM will lose the multicast group configuration for the host and > > the host will need to rejoin its multicast groups in this case. > > IPoIB shall handle the problem by catching the ClientReregister event. > > It seems your patch doesn't help in this situation at all. Right now > mthca will generate a LID_CHANGE event for any set of PortInfo; IPoIB > will catch that event and rejoin all multicast groups. Your patch > changes some of those events to CLIENT_REREGISTER events and has IPoIB > treat them exactly the same way as LID_CHANGE events. So the behavior > won't change at all. > > > There are additional cases. Any client which registers itself on the > > SA, will need to handle this event in order to work properly after the > > SM failure/restart. We'll need it very soon for a user mode > > application. > > OK, but you could use LID_CHANGE events the same way as IPoIB does > now. Since ClientReregister support is optional, and in fact you > didn't fix ipath to generate these events, your app can't count on > CLIENT_REREGISTER events being generated anyway. > > I'm not really opposed to these changes, but it is adding additional > code for what looks like very minimal improvement. So I'm trying to > understand how this really helps you. > > - R. 
> _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From caitlinb at broadcom.com Wed May 31 09:08:47 2006 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Wed, 31 May 2006 09:08:47 -0700 Subject: [openib-general] Re: QoS RFC Message-ID: <54AD0F12E08D1541B826BE97C98F99F150D212@NT-SJCA-0751.brcm.ad.broadcom.com> openib-general-bounces at openib.org wrote: > Roland > > This thread also indirectly brings up the issue of > OpenFabrics, RNIC, and QoS. > > The RNIC devices don't have to be, but are typically unified > wire devices, i.e. have simultaneous support for regular > Ethernet NIC functions such as LSO/TSO and checksum offloads; > iSCSI HBA initiator and/or target functionality; RNIC > functionality; and TCP/IP offload. > > All the usual Ethernet TCP/IP management and configuration > tools work as expected for such a device, e.g. for the > purpose of this discussion QoS configuration with DiffServ, etc. > > At the same time: if there's an OpenFabrics QoS API, then > this API needs to be transport agnostic, such that RNIC QoS > can be configured through this same OpenFabrics QoS API. > > To be clear: the QoS requirements presented in the RFC look > quite reasonable for an RNIC device, but the initial proposal > didn't abstract out non-essential (to QoS) IB detail. > > Regards, > > Asgeir Eiriksson > CTO > Chelsio Communications Inc. > > For the reasons Asgeir cites, I believe that any QoS solution that is compatible with a "unified wire" 'NIC' would be something so inclusive that the proper forum for its discussion would be netdev. Creating specialized controls to deal with a portion of the network traffic will not be generally useful. 
I have no opinion as to how well defined the proposal is for dealing specifically with InfiniBand traffic, but I do not think it represents a generalized approach to RDMA QoS for the simple reason that on an IP network there is no such thing as QoS for *one* type of traffic. QoS needs to cover all traffic to be of value. From rdreier at cisco.com Wed May 31 09:30:57 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 09:30:57 -0700 Subject: [openib-general][RFC][PATCH] mthca: HCA initialization parameters In-Reply-To: <20060531145414.GA9851@voltaire.com> (Leonid Arsh's message of "Wed, 31 May 2006 17:54:14 +0300") References: <20060531145414.GA9851@voltaire.com> Message-ID: This looks OK, but: > +module_param_named(num_mcg, default_profile.num_mcg, int, 0444); > +MODULE_PARM_DESC(num_mcg, "Maximum number of Multicast groups per HCA"); > + > +module_param_named(num_mpt, default_profile.num_mpt, int, 0444); > +MODULE_PARM_DESC(num_mpt, > + "Maximum number of Memory Protection Table entries per HCA"); > + > +module_param_named(num_mtt, default_profile.num_mtt, int, 0444); > +MODULE_PARM_DESC(num_mtt, > + "Maximum number of Memory Translation table segments per HCA"); very inconsistent capitalization in the help texts... > +{ > + extra blank line here > + if(default_profile.num_qp & (default_profile.num_qp-1)) { There should be a space in "if(" -- it should be "if (". Also there should be spaces around the "-" like "num_qp - 1". > +err_inval: > + mthca_err(mdev, "This parameter must be power of two.\n"); this all seems rather unfriendly -- maybe just round things up to a power of 2 and print a warning about it? > printk(KERN_INFO PFX "Initializing %s\n", > pci_name(pdev)); > > + > if (id->driver_data >= ARRAY_SIZE(mthca_hca_table)) { > printk(KERN_ERR PFX "%s has invalid driver data %lx\n", > pci_name(pdev), id->driver_data); strange whitespace change here... 
> + err = mthca_validate_profile(mdev, &default_profile); > + if (err) > + goto err_profile; > + > err = mthca_init_hca(mdev); why check the profile for every individual HCA? wouldn't it make more sense to do it once in the module_init function and refuse to load if the profile isn't valid? - R.
From mshefty at ichips.intel.com Wed May 31 09:35:27 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 31 May 2006 09:35:27 -0700 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <10e223bf0605310249t57e89b72ne8e38e57945a2cec@mail.gmail.com> References: <20060509060958.GA482@voltaire.com> <10e223bf0605300403lcc24b8bwf1e1d7059edab416@mail.gmail.com> <10e223bf0605310249t57e89b72ne8e38e57945a2cec@mail.gmail.com> Message-ID: <447DC5CF.30605@ichips.intel.com> Leonid Arsh wrote: > the most urgent and critical case is > the SM failure/restart when the SM is not connected to the host directly. Why not patch the SM to handle this sort of case and rebuild its database without every client in the fabric needing to send it MADs? Why can't the SM save/restore the configuration itself?
- Sean From mshefty at ichips.intel.com Wed May 31 09:48:56 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 31 May 2006 09:48:56 -0700 Subject: [openib-general] Re: Failed multicast join withnew multicast module In-Reply-To: <20060531090817.GQ21266@mellanox.co.il> References: <1149024804.4510.1056.camel@hal.voltaire.com> <20060531090817.GQ21266@mellanox.co.il> Message-ID: <447DC8F8.60409@ichips.intel.com> Michael S. Tsirkin wrote: > Hmm. I think ipoib used to handle this properly at the ULP level. I believe that this still works. Ipoib should leave all multicast groups, then rejoin when an event occurs. As long as no other clients join the ipoib groups, this should work. I've added a todo to have the multicast module re-register all multicast groups. - Sean From mshefty at ichips.intel.com Wed May 31 09:52:01 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 31 May 2006 09:52:01 -0700 Subject: [openib-general] [PATCH] CM: store and return attributes needed to send to a UD QP after SIDR In-Reply-To: References: Message-ID: <447DC9B1.8000405@ichips.intel.com> Sean Hefty wrote: > Modify the CM to maintain the necessary information needed to send > to a UD QP after a user has performed SIDR. Expose the remote QPN, > remote QKey, and address handle attributes through the > ib_cm_init_qp_attr() routine, so that the information is available > from userspace without changing the ABI. I'm re-working this patch to expose a new API. 
- Sean
From eitan at mellanox.co.il Wed May 31 10:03:57 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 31 May 2006 20:03:57 +0300 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30243EF0F@mtlexch01.mtl.com> Hi Sean, Well, this is a very old topic well discussed years ago. All credits to Ashok Raj which you know better then me. The argument against the idea for the SM to be the keeper of these registrations goes as follows: Our problem has to do with data that is not persistently stored in the fabric (which is: Multicast MGID->MLID mapping, Service-Records and InformInfo records). Without the use of client re-registration the only way to keep this data consistent between the master SM and its standby slaves (or across SM restarts) is by using a transaction safe model. Transaction safe models are well known and require distributed handshake or journaling systems (in our case distributed ones). Anyway what this means is that every transaction (like creation of new Service Record or registering to a multicast group) will have to be first committed and approved by all standby SMs before it is responded.
This strict transaction safe model is not fitting very well with the requirement for scalability of the fabric - which is hard to make even without that complication. The Client-Re-Registration concept resolves this need as the clients need to track their registrations and repeat them with the new master SM. Eitan Zahavi Senior Engineering Director, Software Architect Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Sean Hefty > Sent: Wednesday, May 31, 2006 7:35 PM > To: Leonid Arsh > Cc: Roland Dreier; openib-general at openib.org > Subject: Re: [openib-general][PATCH 1 of 3] repost: Client Reregister support for > kernel space > > Leonid Arsh wrote: > > the most urgent and critical case is > > the SM failure/restart when the SM is not connected to the host directly. > > Why not patch the SM to handle this sort of case and rebuild its database > without every client in the fabric needing to send it MADs? Why can't the SM > save/restore the configuration itself? 
> > - Sean
From leonida at voltaire.com Wed May 31 10:11:17 2006 From: leonida at voltaire.com (Leonid Arsh) Date: Wed, 31 May 2006 20:11:17 +0300 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <447DC5CF.30605@ichips.intel.com> References: <20060509060958.GA482@voltaire.com> <10e223bf0605300403lcc24b8bwf1e1d7059edab416@mail.gmail.com> <10e223bf0605310249t57e89b72ne8e38e57945a2cec@mail.gmail.com> <447DC5CF.30605@ichips.intel.com> Message-ID: <10e223bf0605311011x38197170r3aa8d97d48bb3de8@mail.gmail.com> On 5/31/06, Sean Hefty wrote: > Why not patch the SM to handle this sort of case and rebuild its database > without every client in the fabric needing to send it MADs? Why can't the SM > save/restore the configuration itself? Sending CleintReregister request in SM's business. It doesn't have to send it, but keeping information persistently on the SM may be problematic to implement. What if a client sent a join request during the SM reboot? The client will never know that SM is up again and it will never send join multicast requests. Regards, Leonid
From mshefty at ichips.intel.com Wed May 31 10:17:25 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 31 May 2006 10:17:25 -0700 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30243EF0F@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30243EF0F@mtlexch01.mtl.com> Message-ID: <447DCFA5.8010103@ichips.intel.com> Eitan Zahavi wrote: > Well, this is a very old topic well discussed years ago. All credits to > Ashok Raj which you know better then me. The argument against the idea > for the SM to be the keeper of these registrations goes as follows: Yes - and I still don't understand why this isn't a personal problem of the SM. > Without the use of client re-registration the only way to keep this data > consistent between the master SM and its standby slaves (or across SM > restarts) is by using a transaction safe model. There are probable other solutions to this problem. Couldn't the data also be written to a shared file system? > Transaction safe models are well known and require distributed handshake > or journaling systems (in our case distributed ones). Anyway what this > means is that every transaction (like creation of new Service Record or > registering to a multicast group) will have to be first committed and > approved by all standby SMs before it is responded. > > This strict transaction safe model is not fitting very well with the > requirement for scalability of the fabric - which is hard to make even > without that complication.
How is this less scalable than every client in the fabric maintaining this data,
detecting failures (if this can be done), and reissuing the requests?

Having standby SMs rely on fabric clients in order to build their database just
seems like the wrong approach. If they actually had an up-to-date database,
then those SMs could also respond to queries.

- Sean

From halr at voltaire.com Wed May 31 10:09:54 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 31 May 2006 13:09:54 -0400
Subject: [openib-general] Re: Failed multicast join with new multicast module
In-Reply-To: <447DC8F8.60409@ichips.intel.com>
References: <1149024804.4510.1056.camel@hal.voltaire.com> <20060531090817.GQ21266@mellanox.co.il> <447DC8F8.60409@ichips.intel.com>
Message-ID: <1149095100.4510.29902.camel@hal.voltaire.com>

On Wed, 2006-05-31 at 12:48, Sean Hefty wrote:
> Michael S. Tsirkin wrote:
> > Hmm. I think ipoib used to handle this properly at the ULP level.
>
> I believe that this still works. Ipoib should leave all multicast groups, then
> rejoin when an event occurs. As long as no other clients join the ipoib groups,
> this should work.

Are you saying that IPoIB handles the event? Does the multicast module
cooperate (in terms of reregistering for a group it may think is already
registered)?

-- Hal

> I've added a todo to have the multicast module re-register all multicast groups.
>
> - Sean
>

From mshefty at ichips.intel.com Wed May 31 10:31:16 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 31 May 2006 10:31:16 -0700
Subject: [openib-general] Re: Failed multicast join with new multicast module
In-Reply-To: <1149095100.4510.29902.camel@hal.voltaire.com>
References: <1149024804.4510.1056.camel@hal.voltaire.com> <20060531090817.GQ21266@mellanox.co.il> <447DC8F8.60409@ichips.intel.com> <1149095100.4510.29902.camel@hal.voltaire.com>
Message-ID: <447DD2E4.3030709@ichips.intel.com>

Hal Rosenstock wrote:
>>I believe that this still works.
>>Ipoib should leave all multicast groups, then
>>rejoin when an event occurs. As long as no other clients join the ipoib groups,
>>this should work.
>
>
> Are you saying that IPoIB handles the event? Does the multicast module
> cooperate (in terms of reregistering for a group it may think is already
> registered)?

Ipoib handles the event. It first leaves all multicast groups, then re-joins the
groups.

Technically, the multicast module does _not_ cooperate. That is, if the
multicast module receives a join request for a group that it is already a member
of, it will simply increment a reference count, rather than send a new join
request. (Assuming that the join states match.) The multicast module should
work in this specific case, since the only client is ipoib, and ipoib first
leaves the group before re-joining.

- Sean

From eitan at mellanox.co.il Wed May 31 10:45:48 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Wed, 31 May 2006 20:45:48 +0300
Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30243EF16@mtlexch01.mtl.com>

Hi Sean,

Leonid just sent an example for a race that might happen if the SM is to
be the maintainer of the data.

> Eitan Zahavi wrote:
> > Well, this is a very old topic well discussed years ago. All credits to
> > Ashok Raj which you know better than me. The argument against the idea
> > for the SM to be the keeper of these registrations goes as follows:
>
> Yes - and I still don't understand why this isn't a personal problem of the SM.

[EZ] Anyway, I hoped for some modesty.

> > Without the use of client re-registration the only way to keep this data
> > consistent between the master SM and its standby slaves (or across SM
> > restarts) is by using a transaction safe model.
>
> There are probably other solutions to this problem. Couldn't the data also be
> written to a shared file system?
[EZ] Yes, a journaling file system that never drops transactions
but still does not impose extra latency is what you are looking for -
and I do not think you will find such an ideal solution.

> > Transaction safe models are well known and require distributed handshake
> > or journaling systems (in our case distributed ones). Anyway what this
> > means is that every transaction (like creation of new Service Record or
> > registering to a multicast group) will have to be first committed and
> > approved by all standby SMs before it is responded.
> >
> > This strict transaction safe model is not fitting very well with the
> > requirement for scalability of the fabric - which is hard to make even
> > without that complication.
>
> How is this less scalable than every client in the fabric maintaining this data,
> detecting failures (if this can be done), and reissuing the requests?

[EZ] The SM is a single entity that has to respond to all requests from
the entire cluster. (Even redirection requests). When you require that
SM to also provide transaction safe storage, or even worse than that,
consistency with multiple standby SMs, you worsen the problem. The
clients on their side only need to maintain their own registrations.
I proposed in the past that an SA client would do this job in a transparent
manner for all clients on a machine. (So I thought we had already gone
through this ritual...)

>
> Having standby SMs rely on fabric clients in order to build their database just
> seems like the wrong approach. If they actually had an up-to-date database,
> then those SMs could also respond to queries.

[EZ] Well, to me and several other people this seems the right approach.
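
A minimal sketch of the client-side approach argued for above, in which each
client keeps a record of its own registrations and replays them when it is
told to re-register. This only illustrates the idea and is not the code from
the patch under discussion. Every name here (sa_client, sa_reg, sa_send,
sa_client_reregister) is hypothetical, and sa_send is a stand-in that merely
counts calls where a real client would send a request to the SA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REGS 8

/* Hypothetical record of one registration (service record or multicast join). */
struct sa_reg {
	char name[32];
	int active;      /* slot is in use */
	int sent_count;  /* how many times this registration was sent */
};

/* Hypothetical per-client table; zero-initialize before use. */
struct sa_client {
	struct sa_reg regs[MAX_REGS];
};

/* Stand-in for sending the request to the SM/SA; it only counts the send. */
int sa_send(struct sa_reg *reg)
{
	reg->sent_count++;
	return 0;
}

/* Remember the registration locally, then send it once. */
int sa_client_register(struct sa_client *c, const char *name)
{
	assert(c != NULL && name != NULL);
	for (int i = 0; i < MAX_REGS; i++) {
		struct sa_reg *reg = &c->regs[i];

		if (reg->active)
			continue;
		strncpy(reg->name, name, sizeof(reg->name) - 1);
		reg->name[sizeof(reg->name) - 1] = '\0';
		reg->active = 1;
		reg->sent_count = 0;
		return sa_send(reg);
	}
	return -1; /* table full */
}

/* Forget a registration so that it is not replayed later. */
int sa_client_unregister(struct sa_client *c, const char *name)
{
	assert(c != NULL && name != NULL);
	for (int i = 0; i < MAX_REGS; i++) {
		struct sa_reg *reg = &c->regs[i];

		if (reg->active && strcmp(reg->name, name) == 0) {
			reg->active = 0;
			return 0;
		}
	}
	return -1; /* not found */
}

/*
 * Replay every remembered registration, for example after an SM
 * failover. Returns the number of registrations that were re-sent.
 */
int sa_client_reregister(struct sa_client *c)
{
	int replayed = 0;

	assert(c != NULL);
	for (int i = 0; i < MAX_REGS; i++) {
		if (c->regs[i].active && sa_send(&c->regs[i]) == 0)
			replayed++;
	}
	return replayed;
}
```

In this sketch the SM keeps no durable state: a restarted or newly promoted
SM rebuilds its view from the replayed requests, which is the trade-off being
debated in this thread.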
>
> - Sean

From mshefty at ichips.intel.com Wed May 31 11:14:50 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 31 May 2006 11:14:50 -0700
Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30243EF16@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30243EF16@mtlexch01.mtl.com>
Message-ID: <447DDD1A.9070500@ichips.intel.com>

Eitan Zahavi wrote:
> Leonid just sent an example for a race that might happen if the SM is to
> be the maintainer of the data.

The race Leonid mentioned is a client sending a request when the SM is down.
That request will fail, so there's no data for the SM to maintain for that node.
That's a retry condition that the client must deal with.

> [EZ] The SM is a single entity that has to respond to all requests from
> the entire cluster. (Even redirection requests). When you require that
> SM to also provide transaction safe storage or even worse than that
> consistency with multiple standby SMs you worsen the problem. The
> clients on their side only need to maintain their own registrations.

I don't believe that there's any requirement that the SM be a single system.
But I do believe that the SM should be able to recover from all SM problems
without interrupting any existing communication that is occurring in the fabric.
SM failover or failure/restart should be as transparent to the clients (i.e. the
non-SM nodes in the fabric) as possible. (Btw, I also believe that the SM
should run on top of a real DBMS and support SQL style queries...)

You don't want to push this problem to every application running in the fabric,
so why even push it to every node in the fabric?
- Sean

From swise at opengridcomputing.com Wed May 31 11:26:50 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 31 May 2006 13:26:50 -0500
Subject: [openib-general] [PATCH 0/2][RFC] iWARP Core Support
Message-ID: <20060531182650.3308.81538.stgit@stevo-desktop>

This patchset defines the modifications to the Linux infiniband subsystem
to support iWARP devices. We're submitting it for review now with the
goal of inclusion in the 2.6.19 kernel. This code has gone through
several reviews in the openib-general list. Now we are submitting it for
external review by the linux community.

This StGIT patchset is cloned from Roland Dreier's infiniband.git
for-2.6.18 branch. The patchset consists of 2 patches:

1 - New iWARP CM implementation.
2 - Core changes to support iWARP.

Signed-off-by: Tom Tucker
Signed-off-by: Steve Wise

From swise at opengridcomputing.com Wed May 31 11:26:52 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Wed, 31 May 2006 13:26:52 -0500
Subject: [openib-general] [PATCH 1/2] iWARP Connection Manager.
In-Reply-To: <20060531182650.3308.81538.stgit@stevo-desktop>
References: <20060531182650.3308.81538.stgit@stevo-desktop>
Message-ID: <20060531182652.3308.1244.stgit@stevo-desktop>

This patch provides the new files implementing the iWARP Connection
Manager.
---

 drivers/infiniband/core/iwcm.c | 887 ++++++++++++++++++++++++++++++++++++++++
 include/rdma/iw_cm.h | 254 +++++++++++
 include/rdma/iw_cm_private.h | 62 +++
 3 files changed, 1203 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
new file mode 100644
index 0000000..5657ee8
--- /dev/null
+++ b/drivers/infiniband/core/iwcm.c
@@ -0,0 +1,887 @@
+/*
+ * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved.
+ * Copyright (c) 2004 Topspin Corporation. All rights reserved.
+ * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +MODULE_AUTHOR("Tom Tucker"); +MODULE_DESCRIPTION("iWARP CM"); +MODULE_LICENSE("Dual BSD/GPL"); + +static struct workqueue_struct *iwcm_wq; +struct iwcm_work { + struct work_struct work; + struct iwcm_id_private *cm_id; + struct list_head list; + struct iw_cm_event event; +}; + +/* + * Release a reference on cm_id. 
If the last reference is being removed + * and iw_destroy_cm_id is waiting, wake up the waiting thread. + */ +static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) +{ + int ret = 0; + + BUG_ON(atomic_read(&cm_id_priv->refcount)==0); + if (atomic_dec_and_test(&cm_id_priv->refcount)) { + BUG_ON(!list_empty(&cm_id_priv->work_list)); + if (waitqueue_active(&cm_id_priv->destroy_wait)) { + BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING); + BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, + &cm_id_priv->flags)); + ret = 1; + wake_up(&cm_id_priv->destroy_wait); + } + } + + return ret; +} + +static void add_ref(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *cm_id_priv; + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + atomic_inc(&cm_id_priv->refcount); +} + +static void rem_ref(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *cm_id_priv; + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + iwcm_deref_id(cm_id_priv); +} + +static void cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event); + +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, + iw_cm_handler cm_handler, + void *context) +{ + struct iwcm_id_private *cm_id_priv; + + cm_id_priv = kzalloc(sizeof *cm_id_priv, GFP_KERNEL); + if (!cm_id_priv) + return ERR_PTR(-ENOMEM); + + cm_id_priv->state = IW_CM_STATE_IDLE; + cm_id_priv->id.device = device; + cm_id_priv->id.cm_handler = cm_handler; + cm_id_priv->id.context = context; + cm_id_priv->id.event_handler = cm_event_handler; + cm_id_priv->id.add_ref = add_ref; + cm_id_priv->id.rem_ref = rem_ref; + spin_lock_init(&cm_id_priv->lock); + atomic_set(&cm_id_priv->refcount, 1); + init_waitqueue_head(&cm_id_priv->connect_wait); + init_waitqueue_head(&cm_id_priv->destroy_wait); + INIT_LIST_HEAD(&cm_id_priv->work_list); + + return &cm_id_priv->id; +} +EXPORT_SYMBOL(iw_create_cm_id); + + +static int iwcm_modify_qp_err(struct ib_qp *qp) +{ + struct ib_qp_attr qp_attr; + + if (!qp) + return -EINVAL; + + qp_attr.qp_state 
= IB_QPS_ERR; + return ib_modify_qp(qp, &qp_attr, IB_QP_STATE); +} + +/* + * This is really the RDMAC CLOSING state. It is most similar to the + * IB SQD QP state. + */ +static int iwcm_modify_qp_sqd(struct ib_qp *qp) +{ + struct ib_qp_attr qp_attr; + + BUG_ON(qp == NULL); + qp_attr.qp_state = IB_QPS_SQD; + return ib_modify_qp(qp, &qp_attr, IB_QP_STATE); +} + +/* + * CM_ID <-- CLOSING + * + * Block if a passive or active connection is currently being processed. Then + * process the event as follows: + * - If we are ESTABLISHED, move to CLOSING and modify the QP state + * based on the abrupt flag + * - If the connection is already in the CLOSING or IDLE state, the peer is + * disconnecting concurrently with us and we've already seen the + * DISCONNECT event -- ignore the request and return 0 + * - Disconnect on a listening endpoint returns -EINVAL + */ +int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt) +{ + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + int ret = 0; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + /* Wait if we're currently in a connect or accept downcall */ + wait_event(cm_id_priv->connect_wait, + !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags)); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_ESTABLISHED: + cm_id_priv->state = IW_CM_STATE_CLOSING; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + if (cm_id_priv->qp) { /* QP could be for user-mode client */ + if (abrupt) + ret = iwcm_modify_qp_err(cm_id_priv->qp); + else + ret = iwcm_modify_qp_sqd(cm_id_priv->qp); + /* + * If both sides are disconnecting the QP could + * already be in ERR or SQD states + */ + ret = 0; + } + else + ret = -EINVAL; + break; + case IW_CM_STATE_LISTEN: + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = -EINVAL; + break; + case IW_CM_STATE_CLOSING: + /* remote peer closed first */ + case IW_CM_STATE_IDLE: + /* accept or connect returned !0 */ +
spin_unlock_irqrestore(&cm_id_priv->lock, flags); + break; + case IW_CM_STATE_CONN_RECV: + /* + * App called disconnect before/without calling accept after + * connect_request event delivered. + */ + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + break; + case IW_CM_STATE_CONN_SENT: + /* Can only get here if wait above fails */ + default: + BUG_ON(1); + } + + return ret; +} +EXPORT_SYMBOL(iw_cm_disconnect); + +/* + * CM_ID <-- DESTROYING + * + * Clean up all resources associated with the connection and release + * the initial reference taken by iw_create_cm_id. + */ +static void destroy_cm_id(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + int ret; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + /* Wait if we're currently in a connect or accept downcall. A + * listening endpoint should never block here. */ + wait_event(cm_id_priv->connect_wait, + !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags)); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_LISTEN: + cm_id_priv->state = IW_CM_STATE_DESTROYING; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + /* destroy the listening endpoint */ + ret = cm_id->device->iwcm->destroy_listen(cm_id); + break; + case IW_CM_STATE_ESTABLISHED: + cm_id_priv->state = IW_CM_STATE_DESTROYING; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + /* Abrupt close of the connection */ + (void)iwcm_modify_qp_err(cm_id_priv->qp); + break; + case IW_CM_STATE_IDLE: + case IW_CM_STATE_CLOSING: + cm_id_priv->state = IW_CM_STATE_DESTROYING; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + break; + case IW_CM_STATE_CONN_RECV: + /* + * App called destroy before/without calling accept after + * receiving connection request event notification. 
+ */ + cm_id_priv->state = IW_CM_STATE_DESTROYING; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + break; + case IW_CM_STATE_CONN_SENT: + case IW_CM_STATE_DESTROYING: + default: + BUG_ON(1); + break; + } + + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->qp) { + cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp); + cm_id_priv->qp = NULL; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + (void)iwcm_deref_id(cm_id_priv); +} + +/* + * This function is only called by the application thread and cannot + * be called by the event thread. The function will wait for all + * references to be released on the cm_id and then kfree the cm_id + * object. + */ +void iw_destroy_cm_id(struct iw_cm_id *cm_id) +{ + struct iwcm_id_private *cm_id_priv; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)); + + destroy_cm_id(cm_id); + + wait_event(cm_id_priv->destroy_wait, + !atomic_read(&cm_id_priv->refcount)); + + kfree(cm_id_priv); +} +EXPORT_SYMBOL(iw_destroy_cm_id); + +/* + * CM_ID <-- LISTEN + * + * Start listening for connect requests. Generates one CONNECT_REQUEST + * event for each inbound connect request. + */ +int iw_cm_listen(struct iw_cm_id *cm_id, int backlog) +{ + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + int ret = 0; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_IDLE: + cm_id_priv->state = IW_CM_STATE_LISTEN; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = cm_id->device->iwcm->create_listen(cm_id, backlog); + if (ret) + cm_id_priv->state = IW_CM_STATE_IDLE; + break; + default: + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = -EINVAL; + } + + return ret; +} +EXPORT_SYMBOL(iw_cm_listen); + +/* + * CM_ID <-- IDLE + * + * Rejects an inbound connection request. No events are generated. 
+ */ +int iw_cm_reject(struct iw_cm_id *cm_id, + const void *private_data, + u8 private_data_len) +{ + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + int ret; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + return -EINVAL; + } + cm_id_priv->state = IW_CM_STATE_IDLE; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + ret = cm_id->device->iwcm->reject(cm_id, private_data, + private_data_len); + + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + + return ret; +} +EXPORT_SYMBOL(iw_cm_reject); + +/* + * CM_ID <-- ESTABLISHED + * + * Accepts an inbound connection request and generates an ESTABLISHED + * event. Callers of iw_cm_disconnect and iw_destroy_cm_id will block + * until the ESTABLISHED event is received from the provider. 
+ */ +int iw_cm_accept(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *iw_param) +{ + struct iwcm_id_private *cm_id_priv; + struct ib_qp *qp; + unsigned long flags; + int ret; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + return -EINVAL; + } + /* Get the ib_qp given the QPN */ + qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn); + if (!qp) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return -EINVAL; + } + cm_id->device->iwcm->add_ref(qp); + cm_id_priv->qp = qp; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + ret = cm_id->device->iwcm->accept(cm_id, iw_param); + if (ret) { + /* An error on accept precludes provider events */ + BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV); + cm_id_priv->state = IW_CM_STATE_IDLE; + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->qp) { + cm_id->device->iwcm->rem_ref(qp); + cm_id_priv->qp = NULL; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + printk("Accept failed, ret=%d\n", ret); + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + } + + return ret; +} +EXPORT_SYMBOL(iw_cm_accept); + +/* + * Active Side: CM_ID <-- CONN_SENT + * + * If successful, results in the generation of a CONNECT_REPLY + * event. iw_cm_disconnect and iw_cm_destroy will block until the + * CONNECT_REPLY event is received from the provider. 
+ */ +int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param) +{ + struct iwcm_id_private *cm_id_priv; + int ret = 0; + unsigned long flags; + struct ib_qp *qp; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->state != IW_CM_STATE_IDLE) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + return -EINVAL; + } + + /* Get the ib_qp given the QPN */ + qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn); + if (!qp) { + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return -EINVAL; + } + cm_id->device->iwcm->add_ref(qp); + cm_id_priv->qp = qp; + cm_id_priv->state = IW_CM_STATE_CONN_SENT; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + ret = cm_id->device->iwcm->connect(cm_id, iw_param); + if (ret) { + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->qp) { + cm_id->device->iwcm->rem_ref(qp); + cm_id_priv->qp = NULL; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT); + cm_id_priv->state = IW_CM_STATE_IDLE; + printk("Connect failed, ret=%d\n", ret); + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + wake_up_all(&cm_id_priv->connect_wait); + } + + return ret; +} +EXPORT_SYMBOL(iw_cm_connect); + +/* + * Passive Side: new CM_ID <-- CONN_RECV + * + * Handles an inbound connect request. The function creates a new + * iw_cm_id to represent the new connection and inherits the client + * callback function and other attributes from the listening parent. + * + * The work item contains a pointer to the listen_cm_id and the event. The + * listen_cm_id contains the client cm_handler, context and + * device. These are copied when the device is cloned. The event + * contains the new four tuple. 
+ * + * An error on the child should not affect the parent, so this + * function does not return a value. + */ +static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + struct iw_cm_id *cm_id; + struct iwcm_id_private *cm_id_priv; + int ret; + + /* The provider should never generate a connection request + * event with a bad status. + */ + BUG_ON(iw_event->status); + + /* We could be destroying the listening id. If so, ignore this + * upcall. */ + spin_lock_irqsave(&listen_id_priv->lock, flags); + if (listen_id_priv->state != IW_CM_STATE_LISTEN) { + spin_unlock_irqrestore(&listen_id_priv->lock, flags); + return; + } + spin_unlock_irqrestore(&listen_id_priv->lock, flags); + + cm_id = iw_create_cm_id(listen_id_priv->id.device, + listen_id_priv->id.cm_handler, + listen_id_priv->id.context); + /* If the cm_id could not be created, ignore the request */ + if (IS_ERR(cm_id)) + return; + + cm_id->provider_data = iw_event->provider_data; + cm_id->local_addr = iw_event->local_addr; + cm_id->remote_addr = iw_event->remote_addr; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + cm_id_priv->state = IW_CM_STATE_CONN_RECV; + + /* Call the client CM handler */ + ret = cm_id->cm_handler(cm_id, iw_event); + if (ret) { + printk("destroying child id %p, ret=%d\n", + cm_id, ret); + set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); + destroy_cm_id(cm_id); + if (atomic_read(&cm_id_priv->refcount)==0) + kfree(cm_id); + } +} + +/* + * Passive Side: CM_ID <-- ESTABLISHED + * + * The provider generated an ESTABLISHED event which means that + * the MPA negotiation has completed successfully and we are now in MPA + * FPDU mode. + * + * This event can only be received in the CONN_RECV state. If the + * remote peer closed, the ESTABLISHED event would be received followed + * by the CLOSE event. If the app closes, it will block until we wake + * it up after processing this event.
+ */ +static int cm_conn_est_handler(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + int ret = 0; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + + /* We clear the CONNECT_WAIT bit here to allow the callback + * function to call iw_cm_disconnect. Calling iw_destroy_cm_id + * from a callback handler is not allowed */ + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_CONN_RECV: + cm_id_priv->state = IW_CM_STATE_ESTABLISHED; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event); + break; + default: + BUG_ON(1); + } + wake_up_all(&cm_id_priv->connect_wait); + + return ret; +} + +/* + * Active Side: CM_ID <-- ESTABLISHED + * + * The app has called connect and is waiting for the established event to + * post its requests to the server. This event will wake up anyone + * blocked in iw_cm_disconnect or iw_destroy_cm_id. + */ +static int cm_conn_rep_handler(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + int ret = 0; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + /* Clear the connect wait bit so a callback function calling + * iw_cm_disconnect will not wait and deadlock this thread */ + clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_CONN_SENT: + if (iw_event->status == IW_CM_EVENT_STATUS_ACCEPTED) { + cm_id_priv->id.local_addr = iw_event->local_addr; + cm_id_priv->id.remote_addr = iw_event->remote_addr; + cm_id_priv->state = IW_CM_STATE_ESTABLISHED; + } else { + /* REJECTED or RESET */ + cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp); + cm_id_priv->qp = NULL; + cm_id_priv->state = IW_CM_STATE_IDLE; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event); + break; + default: + BUG_ON(1); + } + /* Wake up waiters on connect complete */ +
wake_up_all(&cm_id_priv->connect_wait); + + return ret; +} + +/* + * CM_ID <-- CLOSING + * + * If in the ESTABLISHED state, move to CLOSING. + */ +static void cm_disconnect_handler(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (cm_id_priv->state == IW_CM_STATE_ESTABLISHED) + cm_id_priv->state = IW_CM_STATE_CLOSING; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); +} + +/* + * CM_ID <-- IDLE + * + * If in the ESTABLISHED or CLOSING states, the QP will have been + * moved by the provider to the ERR state. Disassociate the CM_ID from + * the QP, move to IDLE, and remove the 'connected' reference. + * + * If in some other state, the cm_id was destroyed asynchronously. + * This is the last reference that will result in waking up + * the app thread blocked in iw_destroy_cm_id. + */ +static int cm_close_handler(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + unsigned long flags; + int ret = 0; + /* TT */printk("%s:%d cm_id_priv=%p, state=%d\n", + __FUNCTION__, __LINE__, + cm_id_priv,cm_id_priv->state); + spin_lock_irqsave(&cm_id_priv->lock, flags); + + if (cm_id_priv->qp) { + cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp); + cm_id_priv->qp = NULL; + } + switch (cm_id_priv->state) { + case IW_CM_STATE_ESTABLISHED: + case IW_CM_STATE_CLOSING: + cm_id_priv->state = IW_CM_STATE_IDLE; + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event); + break; + case IW_CM_STATE_DESTROYING: + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + break; + default: + BUG_ON(1); + } + + return ret; +} + +static int process_event(struct iwcm_id_private *cm_id_priv, + struct iw_cm_event *iw_event) +{ + int ret = 0; + + switch (iw_event->event) { + case IW_CM_EVENT_CONNECT_REQUEST: + cm_conn_req_handler(cm_id_priv, iw_event); + break; + case IW_CM_EVENT_CONNECT_REPLY: + ret =
cm_conn_rep_handler(cm_id_priv, iw_event); + break; + case IW_CM_EVENT_ESTABLISHED: + ret = cm_conn_est_handler(cm_id_priv, iw_event); + break; + case IW_CM_EVENT_DISCONNECT: + cm_disconnect_handler(cm_id_priv, iw_event); + break; + case IW_CM_EVENT_CLOSE: + ret = cm_close_handler(cm_id_priv, iw_event); + break; + default: + BUG_ON(1); + } + + return ret; +} + +/* + * Process events on the work_list for the cm_id. If the callback + * function requests that the cm_id be deleted, a flag is set in the + * cm_id flags to indicate that when the last reference is + * removed, the cm_id is to be destroyed. This is necessary to + * distinguish between an object that will be destroyed by the app + * thread asleep on the destroy_wait list vs. an object destroyed + * here synchronously when the last reference is removed. + */ +static void cm_work_handler(void *arg) +{ + struct iwcm_work *work = (struct iwcm_work*)arg; + struct iwcm_id_private *cm_id_priv = work->cm_id; + unsigned long flags; + int empty; + int ret = 0; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + empty = list_empty(&cm_id_priv->work_list); + while (!empty) { + work = list_entry(cm_id_priv->work_list.next, + struct iwcm_work, list); + list_del_init(&work->list); + empty = list_empty(&cm_id_priv->work_list); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + + ret = process_event(cm_id_priv, &work->event); + kfree(work); + if (ret) { + set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); + destroy_cm_id(&cm_id_priv->id); + } + BUG_ON(atomic_read(&cm_id_priv->refcount)==0); + if (iwcm_deref_id(cm_id_priv)) + return; + + if (atomic_read(&cm_id_priv->refcount)==0 && + test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { + kfree(cm_id_priv); + return; + } + spin_lock_irqsave(&cm_id_priv->lock, flags); + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); +} + +/* + * This function is called on interrupt context. 
Schedule events on + * the iwcm_wq thread to allow callback functions to downcall into + * the CM and/or block. Events are queued to a per-CM_ID + * work_list. If this is the first event on the work_list, the work + * element is also queued on the iwcm_wq thread. + * + * Each event holds a reference on the cm_id. Until the last posted + * event has been delivered and processed, the cm_id cannot be + * deleted. + */ +static void cm_event_handler(struct iw_cm_id *cm_id, + struct iw_cm_event *iw_event) +{ + struct iwcm_work *work; + struct iwcm_id_private *cm_id_priv; + unsigned long flags; + + work = kmalloc(sizeof *work, GFP_ATOMIC); + if (!work) + return; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + atomic_inc(&cm_id_priv->refcount); + + INIT_WORK(&work->work, cm_work_handler, work); + work->cm_id = cm_id_priv; + work->event = *iw_event; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + if (list_empty(&cm_id_priv->work_list)) { + list_add_tail(&work->list, &cm_id_priv->work_list); + queue_work(iwcm_wq, &work->work); + } else + list_add_tail(&work->list, &cm_id_priv->work_list); + spin_unlock_irqrestore(&cm_id_priv->lock, flags); +} + +static int iwcm_init_qp_init_attr(struct iwcm_id_private *cm_id_priv, + struct ib_qp_attr *qp_attr, + int *qp_attr_mask) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_IDLE: + case IW_CM_STATE_CONN_SENT: + case IW_CM_STATE_CONN_RECV: + case IW_CM_STATE_ESTABLISHED: + *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS; + qp_attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE| + IB_ACCESS_REMOTE_READ; + ret = 0; + break; + default: + ret = -EINVAL; + break; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return ret; +} + +static int iwcm_init_qp_rts_attr(struct iwcm_id_private *cm_id_priv, + struct ib_qp_attr *qp_attr, + int *qp_attr_mask) +{ + unsigned long flags; + int ret; + + 
spin_lock_irqsave(&cm_id_priv->lock, flags); + switch (cm_id_priv->state) { + case IW_CM_STATE_IDLE: + case IW_CM_STATE_CONN_SENT: + case IW_CM_STATE_CONN_RECV: + case IW_CM_STATE_ESTABLISHED: + *qp_attr_mask = 0; + ret = 0; + break; + default: + ret = -EINVAL; + break; + } + spin_unlock_irqrestore(&cm_id_priv->lock, flags); + return ret; +} + +int iw_cm_init_qp_attr(struct iw_cm_id *cm_id, + struct ib_qp_attr *qp_attr, + int *qp_attr_mask) +{ + struct iwcm_id_private *cm_id_priv; + int ret; + + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); + switch (qp_attr->qp_state) { + case IB_QPS_INIT: + case IB_QPS_RTR: + ret = iwcm_init_qp_init_attr(cm_id_priv, + qp_attr, qp_attr_mask); + break; + case IB_QPS_RTS: + ret = iwcm_init_qp_rts_attr(cm_id_priv, + qp_attr, qp_attr_mask); + break; + default: + ret = -EINVAL; + break; + } + return ret; +} +EXPORT_SYMBOL(iw_cm_init_qp_attr); + +static int __init iw_cm_init(void) +{ + iwcm_wq = create_singlethread_workqueue("iw_cm_wq"); + if (!iwcm_wq) + return -ENOMEM; + + return 0; +} + +static void __exit iw_cm_cleanup(void) +{ + destroy_workqueue(iwcm_wq); +} + +module_init(iw_cm_init); +module_exit(iw_cm_cleanup); diff --git a/include/rdma/iw_cm.h b/include/rdma/iw_cm.h new file mode 100644 index 0000000..0752a94 --- /dev/null +++ b/include/rdma/iw_cm.h @@ -0,0 +1,254 @@ +/* + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#if !defined(IW_CM_H) +#define IW_CM_H + +#include +#include + +struct iw_cm_id; + +enum iw_cm_event_type { + IW_CM_EVENT_CONNECT_REQUEST = 1, /* connect request received */ + IW_CM_EVENT_CONNECT_REPLY, /* reply from active connect request */ + IW_CM_EVENT_ESTABLISHED, /* passive side accept successful */ + IW_CM_EVENT_DISCONNECT, /* orderly shutdown */ + IW_CM_EVENT_CLOSE /* close complete */ +}; +enum iw_cm_event_status { + IW_CM_EVENT_STATUS_OK = 0, /* request successful */ + IW_CM_EVENT_STATUS_ACCEPTED = 0, /* connect request accepted */ + IW_CM_EVENT_STATUS_REJECTED, /* connect request rejected */ + IW_CM_EVENT_STATUS_TIMEOUT, /* the operation timed out */ + IW_CM_EVENT_STATUS_RESET, /* reset from remote peer */ + IW_CM_EVENT_STATUS_EINVAL, /* asynchronous failure for bad parm */ +}; +struct iw_cm_event { + enum iw_cm_event_type event; + enum iw_cm_event_status status; + struct sockaddr_in local_addr; + struct sockaddr_in remote_addr; + void *private_data; + u8 private_data_len; + void* provider_data; +}; + +/** + * iw_cm_handler - Function to be called by the IW CM when delivering events + * to the client. + * + * @cm_id: The IW CM identifier associated with the event. + * @event: Pointer to the event structure. + */ +typedef int (*iw_cm_handler)(struct iw_cm_id *cm_id, + struct iw_cm_event *event); + +/** + * iw_event_handler - Function called by the provider when delivering provider + * events to the IW CM. + * + * @cm_id: The IW CM identifier associated with the event. + * @event: Pointer to the event structure. 
+ */ +typedef void (*iw_event_handler)(struct iw_cm_id *cm_id, + struct iw_cm_event *event); +struct iw_cm_id { + iw_cm_handler cm_handler; /* client callback function */ + void *context; /* client cb context */ + struct ib_device *device; + struct sockaddr_in local_addr; + struct sockaddr_in remote_addr; + void *provider_data; /* provider private data */ + iw_event_handler event_handler; /* cb for provider + events */ + /* Used by provider to add and remove refs on IW cm_id */ + void (*add_ref)(struct iw_cm_id *); + void (*rem_ref)(struct iw_cm_id *); +}; + +struct iw_cm_conn_param { + const void *private_data; + u16 private_data_len; + u32 ord; + u32 ird; + u32 qpn; +}; + +struct iw_cm_verbs { + void (*add_ref)(struct ib_qp *qp); + + void (*rem_ref)(struct ib_qp *qp); + + struct ib_qp * (*get_qp)(struct ib_device *device, + int qpn); + + int (*connect)(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *conn_param); + + int (*accept)(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *conn_param); + + int (*reject)(struct iw_cm_id *cm_id, + const void *pdata, u8 pdata_len); + + int (*create_listen)(struct iw_cm_id *cm_id, + int backlog); + + int (*destroy_listen)(struct iw_cm_id *cm_id); +}; + +/** + * iw_create_cm_id - Create an IW CM identifier. + * + * @device: The IB device on which to create the IW CM identifier. + * @cm_handler: User callback invoked to report events associated with the + * returned IW CM identifier. + * @context: User specified context associated with the id. + */ +struct iw_cm_id *iw_create_cm_id(struct ib_device *device, + iw_cm_handler cm_handler, void *context); + +/** + * iw_destroy_cm_id - Destroy an IW CM identifier. + * + * @cm_id: The previously created IW CM identifier to destroy. + * + * The client can assume that no events will be delivered for the CM ID after + * this function returns.
+ */ +void iw_destroy_cm_id(struct iw_cm_id *cm_id); + +/** + * iw_cm_unbind_qp - Unbind the specified IW CM identifier and QP + * + * @cm_id: The IW CM identifier to unbind from the QP. + * @qp: The QP + * + * This is called by the provider when destroying the QP to ensure + * that any references held by the IWCM are released. It may also + * be called by the IWCM when destroying a CM_ID to ensure that any + * references held by the provider are released. + */ +void iw_cm_unbind_qp(struct iw_cm_id *cm_id, struct ib_qp *qp); + +/** + * iw_cm_get_qp - Return the ib_qp associated with a QPN + * + * @device: The IB device + * @qpn: The queue pair number + */ +struct ib_qp *iw_cm_get_qp(struct ib_device *device, int qpn); + +/** + * iw_cm_listen - Listen for incoming connection requests on the + * specified IW CM id. + * + * @cm_id: The IW CM identifier. + * @backlog: The maximum number of outstanding un-accepted inbound listen + * requests to queue. + * + * The source address and port number are specified in the IW CM identifier + * structure. + */ +int iw_cm_listen(struct iw_cm_id *cm_id, int backlog); + +/** + * iw_cm_accept - Called to accept an incoming connect request. + * + * @cm_id: The IW CM identifier associated with the connection request. + * @iw_param: Pointer to a structure containing connection establishment + * parameters. + * + * The specified cm_id will have been provided in the event data for a + * CONNECT_REQUEST event. Subsequent events related to this connection will be + * delivered to the specified IW CM identifier and may occur prior to + * the return of this function. If this function returns a non-zero value, the + * client can assume that no events will be delivered to the specified IW CM + * identifier. + */ +int iw_cm_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param); + +/** + * iw_cm_reject - Reject an incoming connection request. + * + * @cm_id: Connection identifier associated with the request.
+ * @private_data: Pointer to data to deliver to the remote peer as part of the + * reject message. + * @private_data_len: The number of bytes in the private_data parameter. + * + * The client can assume that no events will be delivered to the specified IW + * CM identifier following the return of this function. The private_data + * buffer is available for reuse when this function returns. + */ +int iw_cm_reject(struct iw_cm_id *cm_id, const void *private_data, + u8 private_data_len); + +/** + * iw_cm_connect - Called to request a connection to a remote peer. + * + * @cm_id: The IW CM identifier for the connection. + * @iw_param: Pointer to a structure containing connection establishment + * parameters. + * + * Events may be delivered to the specified IW CM identifier prior to the + * return of this function. If this function returns a non-zero value, the + * client can assume that no events will be delivered to the specified IW CM + * identifier. + */ +int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param); + +/** + * iw_cm_disconnect - Close the specified connection. + * + * @cm_id: The IW CM identifier to close. + * @abrupt: If 0, the connection will be closed gracefully, otherwise, the + * connection will be reset. + * + * The IW CM identifier is still active until the IW_CM_EVENT_CLOSE event is + * delivered. + */ +int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt); + +/** + * iw_cm_init_qp_attr - Called to initialize the attributes of the QP + * associated with an IW CM identifier. + * + * @cm_id: The IW CM identifier associated with the QP + * @qp_attr: Pointer to the QP attributes structure. + * @qp_attr_mask: Pointer to a bit vector specifying which QP attributes are + * valid.
+ */ +int iw_cm_init_qp_attr(struct iw_cm_id *cm_id, struct ib_qp_attr *qp_attr, + int *qp_attr_mask); + +#endif /* IW_CM_H */ diff --git a/include/rdma/iw_cm_private.h b/include/rdma/iw_cm_private.h new file mode 100644 index 0000000..d07034e --- /dev/null +++ b/include/rdma/iw_cm_private.h @@ -0,0 +1,62 @@ +/* + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#if !defined(IW_CM_PRIVATE_H) +#define IW_CM_PRIVATE_H + +#include + +enum iw_cm_state { + IW_CM_STATE_IDLE, /* unbound, inactive */ + IW_CM_STATE_LISTEN, /* listen waiting for connect */ + IW_CM_STATE_CONN_RECV, /* inbound waiting for user accept */ + IW_CM_STATE_CONN_SENT, /* outbound waiting for peer accept */ + IW_CM_STATE_ESTABLISHED, /* established */ + IW_CM_STATE_CLOSING, /* disconnect */ + IW_CM_STATE_DESTROYING /* object being deleted */ +}; + +struct iwcm_id_private { + struct iw_cm_id id; + enum iw_cm_state state; + unsigned long flags; + struct ib_qp *qp; + wait_queue_head_t destroy_wait; + wait_queue_head_t connect_wait; + struct list_head work_list; + spinlock_t lock; + atomic_t refcount; +}; +#define IWCM_F_CALLBACK_DESTROY 1 +#define IWCM_F_CONNECT_WAIT 2 + +#endif /* IW_CM_PRIVATE_H */ From swise at opengridcomputing.com Wed May 31 11:26:55 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 13:26:55 -0500 Subject: [openib-general] [PATCH 2/2] iWARP Core Changes. In-Reply-To: <20060531182650.3308.81538.stgit@stevo-desktop> References: <20060531182650.3308.81538.stgit@stevo-desktop> Message-ID: <20060531182654.3308.41372.stgit@stevo-desktop> This patch contains modifications to the existing rdma header files, core files, drivers, and ulp files to support iWARP. 
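For readers skimming the diff, most of the core changes follow one pattern: code that used to switch on device->node_type now switches on the transport derived from it, and IB-only clients (cm, mad, sa_query) return early for non-IB devices. The standalone sketch below only illustrates that pattern. The enum names mirror the patch, but their numeric values, the sketch_* helpers, and the assumption that RDMA_NODE_RNIC is the only iWARP node type are hypothetical, since the real definitions live in the include/rdma/ib_verbs.h hunk of this patch.

```c
#include <errno.h>

/* Names taken from the patch; the values are assumptions for this sketch. */
enum rdma_node_type {
	RDMA_NODE_IB_CA = 1,
	RDMA_NODE_IB_SWITCH,
	RDMA_NODE_RNIC
};

enum rdma_transport_type {
	RDMA_TRANSPORT_IB,
	RDMA_TRANSPORT_IWARP
};

/* Hypothetical stand-in for rdma_node_get_transport(): assumes an RNIC is
 * the only node type that maps to the iWARP transport. */
static enum rdma_transport_type
sketch_node_get_transport(enum rdma_node_type node_type)
{
	return (node_type == RDMA_NODE_RNIC) ?
		RDMA_TRANSPORT_IWARP : RDMA_TRANSPORT_IB;
}

/* Stub back ends so the dispatch below can be exercised in user space. */
static int sketch_ib_connect(void) { return 1; }
static int sketch_iw_connect(void) { return 2; }

/* Same shape as the reworked switch statements in cma.c: pick the connection
 * manager by transport rather than by raw node type. */
static int sketch_connect(enum rdma_node_type node_type)
{
	switch (sketch_node_get_transport(node_type)) {
	case RDMA_TRANSPORT_IB:
		return sketch_ib_connect();
	case RDMA_TRANSPORT_IWARP:
		return sketch_iw_connect();
	default:
		return -ENOSYS;
	}
}

/* Same shape as the early-return guard added to the IB-only clients:
 * returns nonzero when such a client would attach to the device. */
static int sketch_ib_only_client_attaches(enum rdma_node_type node_type)
{
	return sketch_node_get_transport(node_type) == RDMA_TRANSPORT_IB;
}
```

With this shape, supporting another transport would mean adding one case per switch rather than touching every node-type comparison.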
--- drivers/infiniband/core/Makefile | 4 drivers/infiniband/core/addr.c | 8 - drivers/infiniband/core/cache.c | 8 - drivers/infiniband/core/cm.c | 3 drivers/infiniband/core/cma.c | 349 +++++++++++++++++++++++--- drivers/infiniband/core/device.c | 6 drivers/infiniband/core/mad.c | 11 + drivers/infiniband/core/sa_query.c | 5 drivers/infiniband/core/smi.c | 18 + drivers/infiniband/core/sysfs.c | 18 + drivers/infiniband/core/ucm.c | 5 drivers/infiniband/core/user_mad.c | 9 - drivers/infiniband/hw/ipath/ipath_verbs.c | 2 drivers/infiniband/hw/mthca/mthca_provider.c | 2 drivers/infiniband/ulp/ipoib/ipoib_main.c | 8 + drivers/infiniband/ulp/srp/ib_srp.c | 2 include/rdma/ib_addr.h | 15 + include/rdma/ib_verbs.h | 39 +++ 18 files changed, 427 insertions(+), 85 deletions(-) diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 68e73ec..163d991 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -1,7 +1,7 @@ infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := ib_addr.o rdma_cm.o obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_sa.o \ - ib_cm.o $(infiniband-y) + ib_cm.o iw_cm.o $(infiniband-y) obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o @@ -14,6 +14,8 @@ ib_sa-y := sa_query.o ib_cm-y := cm.o +iw_cm-y := iwcm.o + rdma_cm-y := cma.o ib_addr-y := addr.o diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index d294bbc..5a9be54 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -60,12 +60,15 @@ static LIST_HEAD(req_list); static DECLARE_WORK(work, process_req, NULL); static struct workqueue_struct *addr_wq; -static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, +int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, unsigned char *dst_dev_addr) { switch (dev->type) { case ARPHRD_INFINIBAND: - dev_addr->dev_type = IB_NODE_CA; + dev_addr->dev_type = 
RDMA_NODE_IB_CA; + break; + case ARPHRD_ETHER: + dev_addr->dev_type = RDMA_NODE_RNIC; break; default: return -EADDRNOTAVAIL; @@ -77,6 +80,7 @@ static int copy_addr(struct rdma_dev_add memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN); return 0; } +EXPORT_SYMBOL(copy_addr); int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr) { diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index e05ca2c..061858c 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -32,13 +32,12 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * - * $Id: cache.c 1349 2004-12-16 21:09:43Z roland $ + * $Id: cache.c 6885 2006-05-03 18:22:02Z sean.hefty $ */ #include #include #include -#include /* INIT_WORK, schedule_work(), flush_scheduled_work() */ #include @@ -62,12 +61,13 @@ struct ib_update_work { static inline int start_port(struct ib_device *device) { - return device->node_type == IB_NODE_SWITCH ? 0 : 1; + return (device->node_type == RDMA_NODE_IB_SWITCH) ? 0 : 1; } static inline int end_port(struct ib_device *device) { - return device->node_type == IB_NODE_SWITCH ? 0 : device->phys_port_cnt; + return (device->node_type == RDMA_NODE_IB_SWITCH) ? 
+ 0 : device->phys_port_cnt; } int ib_get_cached_gid(struct ib_device *device, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 1c7463b..cf43ccb 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3253,6 +3253,9 @@ static void cm_add_one(struct ib_device int ret; u8 i; + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + cm_dev = kmalloc(sizeof(*cm_dev) + sizeof(*port) * device->phys_port_cnt, GFP_KERNEL); if (!cm_dev) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 94555d2..2e0be1d 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -43,6 +43,7 @@ #include #include #include #include +#include MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("Generic RDMA CM Agent"); @@ -124,6 +125,7 @@ struct rdma_id_private { int query_id; union { struct ib_cm_id *ib; + struct iw_cm_id *iw; } cm_id; u32 seq_num; @@ -259,13 +261,23 @@ static void cma_detach_from_dev(struct r id_priv->cma_dev = NULL; } -static int cma_acquire_ib_dev(struct rdma_id_private *id_priv) +static int cma_acquire_dev(struct rdma_id_private *id_priv) { + enum rdma_node_type dev_type = id_priv->id.route.addr.dev_addr.dev_type; struct cma_device *cma_dev; union ib_gid *gid; int ret = -ENODEV; - gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr); + switch (rdma_node_get_transport(dev_type)) { + case RDMA_TRANSPORT_IB: + gid = ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr); + break; + case RDMA_TRANSPORT_IWARP: + gid = iw_addr_get_sgid(&id_priv->id.route.addr.dev_addr); + break; + default: + return -ENODEV; + } mutex_lock(&lock); list_for_each_entry(cma_dev, &dev_list, list) { @@ -280,16 +292,6 @@ static int cma_acquire_ib_dev(struct rdm return ret; } -static int cma_acquire_dev(struct rdma_id_private *id_priv) -{ - switch (id_priv->id.route.addr.dev_addr.dev_type) { - case IB_NODE_CA: - return cma_acquire_ib_dev(id_priv); - default: - return -ENODEV; - 
} -} - static void cma_deref_id(struct rdma_id_private *id_priv) { if (atomic_dec_and_test(&id_priv->refcount)) @@ -347,6 +349,16 @@ static int cma_init_ib_qp(struct rdma_id IB_QP_PKEY_INDEX | IB_QP_PORT); } +static int cma_init_iw_qp(struct rdma_id_private *id_priv, struct ib_qp *qp) +{ + struct ib_qp_attr qp_attr; + + qp_attr.qp_state = IB_QPS_INIT; + qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE; + + return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS); +} + int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd, struct ib_qp_init_attr *qp_init_attr) { @@ -362,10 +374,13 @@ int rdma_create_qp(struct rdma_cm_id *id if (IS_ERR(qp)) return PTR_ERR(qp); - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = cma_init_ib_qp(id_priv, qp); break; + case RDMA_TRANSPORT_IWARP: + ret = cma_init_iw_qp(id_priv, qp); + break; default: ret = -ENOSYS; break; @@ -451,13 +466,17 @@ int rdma_init_qp_attr(struct rdma_cm_id int ret; id_priv = container_of(id, struct rdma_id_private, id); - switch (id_priv->id.device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id_priv->id.device->node_type)) { + case RDMA_TRANSPORT_IB: ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, qp_attr, qp_attr_mask); if (qp_attr->qp_state == IB_QPS_RTR) qp_attr->rq_psn = id_priv->seq_num; break; + case RDMA_TRANSPORT_IWARP: + ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr, + qp_attr_mask); + break; default: ret = -ENOSYS; break; @@ -590,8 +609,8 @@ static int cma_notify_user(struct rdma_i static void cma_cancel_route(struct rdma_id_private *id_priv) { - switch (id_priv->id.device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id_priv->id.device->node_type)) { + case RDMA_TRANSPORT_IB: if (id_priv->query) ib_sa_cancel_query(id_priv->query_id, id_priv->query); break; @@ -611,11 +630,15 @@ static void cma_destroy_listen(struct rd cma_exch(id_priv, 
CMA_DESTROYING); if (id_priv->cma_dev) { - switch (id_priv->id.device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id_priv->id.device->node_type)) { + case RDMA_TRANSPORT_IB: if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) ib_destroy_cm_id(id_priv->cm_id.ib); break; + case RDMA_TRANSPORT_IWARP: + if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw)) + iw_destroy_cm_id(id_priv->cm_id.iw); + break; default: break; } @@ -690,11 +713,15 @@ void rdma_destroy_id(struct rdma_cm_id * cma_cancel_operation(id_priv, state); if (id_priv->cma_dev) { - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) ib_destroy_cm_id(id_priv->cm_id.ib); break; + case RDMA_TRANSPORT_IWARP: + if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw)) + iw_destroy_cm_id(id_priv->cm_id.iw); + break; default: break; } @@ -868,7 +895,7 @@ static struct rdma_id_private *cma_new_i ib_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid); ib_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid); ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey)); - rt->addr.dev_addr.dev_type = IB_NODE_CA; + rt->addr.dev_addr.dev_type = RDMA_NODE_IB_CA; id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = CMA_CONNECT; @@ -897,7 +924,7 @@ static int cma_req_handler(struct ib_cm_ } atomic_inc(&conn_id->dev_remove); - ret = cma_acquire_ib_dev(conn_id); + ret = cma_acquire_dev(conn_id); if (ret) { ret = -ENODEV; cma_release_remove(conn_id); @@ -981,6 +1008,124 @@ static void cma_set_compare_data(enum rd } } +static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) +{ + struct rdma_id_private *id_priv = iw_id->context; + enum rdma_cm_event_type event = 0; + struct sockaddr_in *sin; + int ret = 0; + + atomic_inc(&id_priv->dev_remove); + + switch (iw_event->event) { + case IW_CM_EVENT_CLOSE: + event = 
RDMA_CM_EVENT_DISCONNECTED; + break; + case IW_CM_EVENT_CONNECT_REPLY: + sin = (struct sockaddr_in*)&id_priv->id.route.addr.src_addr; + *sin = iw_event->local_addr; + sin = (struct sockaddr_in*)&id_priv->id.route.addr.dst_addr; + *sin = iw_event->remote_addr; + if (iw_event->status) + event = RDMA_CM_EVENT_REJECTED; + else + event = RDMA_CM_EVENT_ESTABLISHED; + break; + case IW_CM_EVENT_ESTABLISHED: + event = RDMA_CM_EVENT_ESTABLISHED; + break; + default: + BUG_ON(1); + } + + ret = cma_notify_user(id_priv, event, iw_event->status, + iw_event->private_data, + iw_event->private_data_len); + if (ret) { + /* Destroy the CM ID by returning a non-zero value. */ + id_priv->cm_id.iw = NULL; + cma_exch(id_priv, CMA_DESTROYING); + cma_release_remove(id_priv); + rdma_destroy_id(&id_priv->id); + return ret; + } + + cma_release_remove(id_priv); + return ret; +} + +struct net_device *ip_dev_find(u32 ip); +static int iw_conn_req_handler(struct iw_cm_id *cm_id, + struct iw_cm_event *iw_event) +{ + struct rdma_cm_id *new_cm_id; + struct rdma_id_private *listen_id, *conn_id; + struct sockaddr_in *sin; + struct net_device *dev; + int ret; + + listen_id = cm_id->context; + atomic_inc(&listen_id->dev_remove); + if (!cma_comp(listen_id, CMA_LISTEN)) { + ret = -ECONNABORTED; + goto out; + } + + /* Create a new RDMA id for the new IW CM ID */ + new_cm_id = rdma_create_id(listen_id->id.event_handler, + listen_id->id.context, + RDMA_PS_TCP); + if (!new_cm_id) { + ret = -ENOMEM; + goto out; + } + conn_id = container_of(new_cm_id, struct rdma_id_private, id); + atomic_inc(&conn_id->dev_remove); + conn_id->state = CMA_CONNECT; + + dev = ip_dev_find(iw_event->local_addr.sin_addr.s_addr); + if (!dev) { + ret = -EADDRNOTAVAIL; + rdma_destroy_id(new_cm_id); + goto out; + } + ret = copy_addr(&conn_id->id.route.addr.dev_addr, dev, NULL); + if (ret) { + rdma_destroy_id(new_cm_id); + goto out; + } + + ret = cma_acquire_dev(conn_id); + if (ret) { + rdma_destroy_id(new_cm_id); + goto out; + } + + 
conn_id->cm_id.iw = cm_id; + cm_id->context = conn_id; + cm_id->cm_handler = cma_iw_handler; + + sin = (struct sockaddr_in*)&new_cm_id->route.addr.src_addr; + *sin = iw_event->local_addr; + sin = (struct sockaddr_in*)&new_cm_id->route.addr.dst_addr; + *sin = iw_event->remote_addr; + + ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0, + iw_event->private_data, + iw_event->private_data_len); + if (ret) { + /* User wants to destroy the CM ID */ + conn_id->cm_id.iw = NULL; + cma_exch(conn_id, CMA_DESTROYING); + cma_release_remove(conn_id); + rdma_destroy_id(&conn_id->id); + } + +out: + cma_release_remove(listen_id); + return ret; +} + static int cma_ib_listen(struct rdma_id_private *id_priv) { struct ib_cm_compare_data compare_data; @@ -1010,6 +1155,30 @@ static int cma_ib_listen(struct rdma_id_ return ret; } +static int cma_iw_listen(struct rdma_id_private *id_priv, int backlog) +{ + int ret; + struct sockaddr_in *sin; + + id_priv->cm_id.iw = iw_create_cm_id(id_priv->id.device, + iw_conn_req_handler, + id_priv); + if (IS_ERR(id_priv->cm_id.iw)) + return PTR_ERR(id_priv->cm_id.iw); + + sin = (struct sockaddr_in*)&id_priv->id.route.addr.src_addr; + id_priv->cm_id.iw->local_addr = *sin; + + ret = iw_cm_listen(id_priv->cm_id.iw, backlog); + + if (ret) { + iw_destroy_cm_id(id_priv->cm_id.iw); + id_priv->cm_id.iw = NULL; + } + + return ret; +} + static int cma_listen_handler(struct rdma_cm_id *id, struct rdma_cm_event *event) { @@ -1085,12 +1254,17 @@ int rdma_listen(struct rdma_cm_id *id, i return -EINVAL; if (id->device) { - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = cma_ib_listen(id_priv); if (ret) goto err; break; + case RDMA_TRANSPORT_IWARP: + ret = cma_iw_listen(id_priv, backlog); + if (ret) + goto err; + break; default: ret = -ENOSYS; goto err; @@ -1229,6 +1403,23 @@ err: } EXPORT_SYMBOL(rdma_set_ib_paths); +static int cma_resolve_iw_route(struct 
rdma_id_private *id_priv, int timeout_ms) +{ + struct cma_work *work; + + work = kzalloc(sizeof *work, GFP_KERNEL); + if (!work) + return -ENOMEM; + + work->id = id_priv; + INIT_WORK(&work->work, cma_work_handler, work); + work->old_state = CMA_ROUTE_QUERY; + work->new_state = CMA_ROUTE_RESOLVED; + work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED; + queue_work(cma_wq, &work->work); + return 0; +} + int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) { struct rdma_id_private *id_priv; @@ -1239,10 +1430,13 @@ int rdma_resolve_route(struct rdma_cm_id return -EINVAL; atomic_inc(&id_priv->refcount); - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = cma_resolve_ib_route(id_priv, timeout_ms); break; + case RDMA_TRANSPORT_IWARP: + ret = cma_resolve_iw_route(id_priv, timeout_ms); + break; default: ret = -ENOSYS; break; @@ -1646,6 +1840,47 @@ out: return ret; } +static int cma_connect_iw(struct rdma_id_private *id_priv, + struct rdma_conn_param *conn_param) +{ + struct iw_cm_id *cm_id; + struct sockaddr_in* sin; + int ret; + struct iw_cm_conn_param iw_param; + + cm_id = iw_create_cm_id(id_priv->id.device, cma_iw_handler, id_priv); + if (IS_ERR(cm_id)) { + ret = PTR_ERR(cm_id); + goto out; + } + + id_priv->cm_id.iw = cm_id; + + sin = (struct sockaddr_in*)&id_priv->id.route.addr.src_addr; + cm_id->local_addr = *sin; + + sin = (struct sockaddr_in*)&id_priv->id.route.addr.dst_addr; + cm_id->remote_addr = *sin; + + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) { + iw_destroy_cm_id(cm_id); + return ret; + } + + iw_param.ord = conn_param->initiator_depth; + iw_param.ird = conn_param->responder_resources; + iw_param.private_data = conn_param->private_data; + iw_param.private_data_len = conn_param->private_data_len; + if (id_priv->id.qp) + iw_param.qpn = id_priv->qp_num; + else + iw_param.qpn = conn_param->qp_num; + ret = iw_cm_connect(cm_id, &iw_param); +out: + return ret; 
+} + int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; @@ -1661,10 +1896,13 @@ int rdma_connect(struct rdma_cm_id *id, id_priv->srq = conn_param->srq; } - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = cma_connect_ib(id_priv, conn_param); break; + case RDMA_TRANSPORT_IWARP: + ret = cma_connect_iw(id_priv, conn_param); + break; default: ret = -ENOSYS; break; @@ -1705,6 +1943,28 @@ static int cma_accept_ib(struct rdma_id_ return ib_send_cm_rep(id_priv->cm_id.ib, &rep); } +static int cma_accept_iw(struct rdma_id_private *id_priv, + struct rdma_conn_param *conn_param) +{ + struct iw_cm_conn_param iw_param; + int ret; + + ret = cma_modify_qp_rtr(&id_priv->id); + if (ret) + return ret; + + iw_param.ord = conn_param->initiator_depth; + iw_param.ird = conn_param->responder_resources; + iw_param.private_data = conn_param->private_data; + iw_param.private_data_len = conn_param->private_data_len; + if (id_priv->id.qp) { + iw_param.qpn = id_priv->qp_num; + } else + iw_param.qpn = conn_param->qp_num; + + return iw_cm_accept(id_priv->cm_id.iw, &iw_param); +} + int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; @@ -1720,13 +1980,16 @@ int rdma_accept(struct rdma_cm_id *id, s id_priv->srq = conn_param->srq; } - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: if (conn_param) ret = cma_accept_ib(id_priv, conn_param); else ret = cma_rep_recv(id_priv); break; + case RDMA_TRANSPORT_IWARP: + ret = cma_accept_iw(id_priv, conn_param); + break; default: ret = -ENOSYS; break; @@ -1753,12 +2016,16 @@ int rdma_reject(struct rdma_cm_id *id, c if (!cma_comp(id_priv, CMA_CONNECT)) return -EINVAL; - switch (id->device->node_type) { - case IB_NODE_CA: + switch 
(rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: ret = ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, private_data, private_data_len); break; + case RDMA_TRANSPORT_IWARP: + ret = iw_cm_reject(id_priv->cm_id.iw, + private_data, private_data_len); + break; default: ret = -ENOSYS; break; @@ -1777,16 +2044,18 @@ int rdma_disconnect(struct rdma_cm_id *i !cma_comp(id_priv, CMA_DISCONNECT)) return -EINVAL; - ret = cma_modify_qp_err(id); - if (ret) - goto out; - - switch (id->device->node_type) { - case IB_NODE_CA: + switch (rdma_node_get_transport(id->device->node_type)) { + case RDMA_TRANSPORT_IB: + ret = cma_modify_qp_err(id); + if (ret) + goto out; /* Initiate or respond to a disconnect. */ if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0)) ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0); break; + case RDMA_TRANSPORT_IWARP: + ret = iw_cm_disconnect(id_priv->cm_id.iw, 0); + break; default: break; } diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index b2f3cb9..7318fba 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -30,7 +30,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
* - * $Id: device.c 1349 2004-12-16 21:09:43Z roland $ + * $Id: device.c 5943 2006-03-22 00:58:04Z roland $ */ #include @@ -505,7 +505,7 @@ int ib_query_port(struct ib_device *devi u8 port_num, struct ib_port_attr *port_attr) { - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { if (port_num) return -EINVAL; } else if (port_num < 1 || port_num > device->phys_port_cnt) @@ -580,7 +580,7 @@ int ib_modify_port(struct ib_device *dev u8 port_num, int port_modify_mask, struct ib_port_modify *port_modify) { - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { if (port_num) return -EINVAL; } else if (port_num < 1 || port_num > device->phys_port_cnt) diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index b38e02a..a928ecf 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2004-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2005 Intel Corporation. All rights reserved. * Copyright (c) 2005 Mellanox Technologies Ltd. All rights reserved. * @@ -31,7 +31,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
* - * $Id: mad.c 5596 2006-03-03 01:00:07Z sean.hefty $ + * $Id: mad.c 7294 2006-05-17 18:12:30Z roland $ */ #include #include @@ -2877,7 +2877,10 @@ static void ib_mad_init_device(struct ib { int start, end, i; - if (device->node_type == IB_NODE_SWITCH) { + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + + if (device->node_type == RDMA_NODE_IB_SWITCH) { start = 0; end = 0; } else { @@ -2924,7 +2927,7 @@ static void ib_mad_remove_device(struct { int i, num_ports, cur_port; - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { num_ports = 1; cur_port = 0; } else { diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 501cc05..4230277 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -887,7 +887,10 @@ static void ib_sa_add_one(struct ib_devi struct ib_sa_device *sa_dev; int s, e, i; - if (device->node_type == IB_NODE_SWITCH) + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + + if (device->node_type == RDMA_NODE_IB_SWITCH) s = e = 0; else { s = 1; diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c index 35852e7..b81b2b9 100644 --- a/drivers/infiniband/core/smi.c +++ b/drivers/infiniband/core/smi.c @@ -34,7 +34,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
* - * $Id: smi.c 1389 2004-12-27 22:56:47Z roland $ + * $Id: smi.c 5258 2006-02-01 20:32:40Z sean.hefty $ */ #include @@ -64,7 +64,7 @@ int smi_handle_dr_smp_send(struct ib_smp /* C14-9:2 */ if (hop_ptr && hop_ptr < hop_cnt) { - if (node_type != IB_NODE_SWITCH) + if (node_type != RDMA_NODE_IB_SWITCH) return 0; /* smp->return_path set when received */ @@ -77,7 +77,7 @@ int smi_handle_dr_smp_send(struct ib_smp if (hop_ptr == hop_cnt) { /* smp->return_path set when received */ smp->hop_ptr++; - return (node_type == IB_NODE_SWITCH || + return (node_type == RDMA_NODE_IB_SWITCH || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -95,7 +95,7 @@ int smi_handle_dr_smp_send(struct ib_smp /* C14-13:2 */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (node_type != IB_NODE_SWITCH) + if (node_type != RDMA_NODE_IB_SWITCH) return 0; smp->hop_ptr--; @@ -107,7 +107,7 @@ int smi_handle_dr_smp_send(struct ib_smp if (hop_ptr == 1) { smp->hop_ptr--; /* C14-13:3 -- SMPs destined for SM shouldn't be here */ - return (node_type == IB_NODE_SWITCH || + return (node_type == RDMA_NODE_IB_SWITCH || smp->dr_slid == IB_LID_PERMISSIVE); } @@ -142,7 +142,7 @@ int smi_handle_dr_smp_recv(struct ib_smp /* C14-9:2 -- intermediate hop */ if (hop_ptr && hop_ptr < hop_cnt) { - if (node_type != IB_NODE_SWITCH) + if (node_type != RDMA_NODE_IB_SWITCH) return 0; smp->return_path[hop_ptr] = port_num; @@ -156,7 +156,7 @@ int smi_handle_dr_smp_recv(struct ib_smp smp->return_path[hop_ptr] = port_num; /* smp->hop_ptr updated when sending */ - return (node_type == IB_NODE_SWITCH || + return (node_type == RDMA_NODE_IB_SWITCH || smp->dr_dlid == IB_LID_PERMISSIVE); } @@ -175,7 +175,7 @@ int smi_handle_dr_smp_recv(struct ib_smp /* C14-13:2 */ if (2 <= hop_ptr && hop_ptr <= hop_cnt) { - if (node_type != IB_NODE_SWITCH) + if (node_type != RDMA_NODE_IB_SWITCH) return 0; /* smp->hop_ptr updated when sending */ @@ -190,7 +190,7 @@ int smi_handle_dr_smp_recv(struct ib_smp return 1; } /* smp->hop_ptr updated when sending */ - return 
(node_type == IB_NODE_SWITCH); + return (node_type == RDMA_NODE_IB_SWITCH); } /* C14-13:4 -- hop_ptr = 0 -> give to SM */ diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c index 21f9282..cfd2c06 100644 --- a/drivers/infiniband/core/sysfs.c +++ b/drivers/infiniband/core/sysfs.c @@ -31,7 +31,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * - * $Id: sysfs.c 1349 2004-12-16 21:09:43Z roland $ + * $Id: sysfs.c 6940 2006-05-04 17:04:55Z roland $ */ #include "core_priv.h" @@ -589,10 +589,16 @@ static ssize_t show_node_type(struct cla return -ENODEV; switch (dev->node_type) { - case IB_NODE_CA: return sprintf(buf, "%d: CA\n", dev->node_type); - case IB_NODE_SWITCH: return sprintf(buf, "%d: switch\n", dev->node_type); - case IB_NODE_ROUTER: return sprintf(buf, "%d: router\n", dev->node_type); - default: return sprintf(buf, "%d: \n", dev->node_type); + case RDMA_NODE_IB_CA: + return sprintf(buf, "%d: CA\n", dev->node_type); + case RDMA_NODE_RNIC: + return sprintf(buf, "%d: RNIC\n", dev->node_type); + case RDMA_NODE_IB_SWITCH: + return sprintf(buf, "%d: switch\n", dev->node_type); + case RDMA_NODE_IB_ROUTER: + return sprintf(buf, "%d: router\n", dev->node_type); + default: + return sprintf(buf, "%d: \n", dev->node_type); } } @@ -708,7 +714,7 @@ int ib_device_register_sysfs(struct ib_d if (ret) goto err_put; - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { ret = add_port(device, 0); if (ret) goto err_put; diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c index 67caf36..ad2e417 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -30,7 +30,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
* - * $Id: ucm.c 4311 2005-12-05 18:42:01Z sean.hefty $ + * $Id: ucm.c 7119 2006-05-11 16:40:38Z sean.hefty $ */ #include @@ -1248,7 +1248,8 @@ static void ib_ucm_add_one(struct ib_dev { struct ib_ucm_device *ucm_dev; - if (!device->alloc_ucontext) + if (!device->alloc_ucontext || + rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) return; ucm_dev = kzalloc(sizeof *ucm_dev, GFP_KERNEL); diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index afe70a5..0cbd692 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -1,6 +1,6 @@ /* * Copyright (c) 2004 Topspin Communications. All rights reserved. - * Copyright (c) 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2005-2006 Voltaire, Inc. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. * * This software is available to you under a choice of one of two @@ -31,7 +31,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
* - * $Id: user_mad.c 5596 2006-03-03 01:00:07Z sean.hefty $ + * $Id: user_mad.c 6041 2006-03-27 21:06:00Z halr $ */ #include @@ -967,7 +967,10 @@ static void ib_umad_add_one(struct ib_de struct ib_umad_device *umad_dev; int s, e, i; - if (device->node_type == IB_NODE_SWITCH) + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + + if (device->node_type == RDMA_NODE_IB_SWITCH) s = e = 0; else { s = 1; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 28fdbda..e4b45d7 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -984,7 +984,7 @@ static void *ipath_register_ib_device(in (1ull << IB_USER_VERBS_CMD_QUERY_SRQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ) | (1ull << IB_USER_VERBS_CMD_POST_SRQ_RECV); - dev->node_type = IB_NODE_CA; + dev->node_type = RDMA_NODE_IB_CA; dev->phys_port_cnt = 1; dev->dma_device = ipath_layer_get_device(dd); dev->class_dev.dev = dev->dma_device; diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index a2eae8a..5c31819 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -1273,7 +1273,7 @@ int mthca_register_device(struct mthca_d (1ull << IB_USER_VERBS_CMD_MODIFY_SRQ) | (1ull << IB_USER_VERBS_CMD_QUERY_SRQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ); - dev->ib_dev.node_type = IB_NODE_CA; + dev->ib_dev.node_type = RDMA_NODE_IB_CA; dev->ib_dev.phys_port_cnt = dev->limits.num_ports; dev->ib_dev.dma_device = &dev->pdev->dev; dev->ib_dev.class_dev.dev = &dev->pdev->dev; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 1c6ea1c..262427f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1084,13 +1084,16 @@ static void ipoib_add_one(struct ib_devi struct ipoib_dev_priv *priv; int s, e, p; + if 
(rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL); if (!dev_list) return; INIT_LIST_HEAD(dev_list); - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { s = 0; e = 0; } else { @@ -1114,6 +1117,9 @@ static void ipoib_remove_one(struct ib_d struct ipoib_dev_priv *priv, *tmp; struct list_head *dev_list; + if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB) + return; + dev_list = ib_get_client_data(device, &ipoib_client); list_for_each_entry_safe(priv, tmp, dev_list, list) { diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index f1401e1..bba2956 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -1845,7 +1845,7 @@ static void srp_add_one(struct ib_device if (IS_ERR(srp_dev->fmr_pool)) srp_dev->fmr_pool = NULL; - if (device->node_type == IB_NODE_SWITCH) { + if (device->node_type == RDMA_NODE_IB_SWITCH) { s = 0; e = 0; } else { diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index fcb5ba8..f2fd3cc 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -40,7 +40,7 @@ struct rdma_dev_addr { unsigned char src_dev_addr[MAX_ADDR_LEN]; unsigned char dst_dev_addr[MAX_ADDR_LEN]; unsigned char broadcast[MAX_ADDR_LEN]; - enum ib_node_type dev_type; + enum rdma_node_type dev_type; }; /** @@ -72,6 +72,9 @@ int rdma_resolve_ip(struct sockaddr *src void rdma_addr_cancel(struct rdma_dev_addr *addr); +int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, + unsigned char *dst_dev_addr); + static inline int ip_addr_size(struct sockaddr *addr) { return addr->sa_family == AF_INET6 ? 
@@ -111,4 +114,14 @@ static inline void ib_addr_set_dgid(stru memcpy(dev_addr->dst_dev_addr + 4, gid, sizeof *gid); } +static inline union ib_gid* iw_addr_get_sgid(struct rdma_dev_addr* rda) +{ + return (union ib_gid*)rda->src_dev_addr; +} + +static inline union ib_gid* iw_addr_get_dgid(struct rdma_dev_addr* rda) +{ + return (union ib_gid*)rda->dst_dev_addr; +} + #endif /* IB_ADDR_H */ diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index aeb4fcd..76e2351 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -35,7 +35,7 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * - * $Id: ib_verbs.h 1349 2004-12-16 21:09:43Z roland $ + * $Id: ib_verbs.h 6885 2006-05-03 18:22:02Z sean.hefty $ */ #if !defined(IB_VERBS_H) @@ -56,12 +56,35 @@ union ib_gid { } global; }; -enum ib_node_type { - IB_NODE_CA = 1, - IB_NODE_SWITCH, - IB_NODE_ROUTER +enum rdma_node_type { + /* IB values map to NodeInfo:NodeType. */ + RDMA_NODE_IB_CA = 1, + RDMA_NODE_IB_SWITCH, + RDMA_NODE_IB_ROUTER, + RDMA_NODE_RNIC }; +enum rdma_transport_type { + RDMA_TRANSPORT_IB, + RDMA_TRANSPORT_IWARP +}; + +static inline enum rdma_transport_type +rdma_node_get_transport(enum rdma_node_type node_type) +{ + switch (node_type) { + case RDMA_NODE_IB_CA: + case RDMA_NODE_IB_SWITCH: + case RDMA_NODE_IB_ROUTER: + return RDMA_TRANSPORT_IB; + case RDMA_NODE_RNIC: + return RDMA_TRANSPORT_IWARP; + default: + BUG(); + return 0; + } +} + enum ib_device_cap_flags { IB_DEVICE_RESIZE_MAX_WR = 1, IB_DEVICE_BAD_PKEY_CNTR = (1<<1), @@ -78,6 +101,9 @@ enum ib_device_cap_flags { IB_DEVICE_RC_RNR_NAK_GEN = (1<<12), IB_DEVICE_SRQ_RESIZE = (1<<13), IB_DEVICE_N_NOTIFY_CQ = (1<<14), + IB_DEVICE_ZERO_STAG = (1<<15), + IB_DEVICE_SEND_W_INV = (1<<16), + IB_DEVICE_MEM_WINDOW = (1<<17) }; enum ib_atomic_cap { @@ -830,6 +856,7 @@ struct ib_cache { u8 *lmc_cache; }; +struct iw_cm_verbs; struct ib_device { struct device *dma_device; @@ -846,6 +873,8 @@ struct ib_device { u32 flags; + 
struct iw_cm_verbs* iwcm; + int (*query_device)(struct ib_device *device, struct ib_device_attr *device_attr); int (*query_port)(struct ib_device *device, From swise at opengridcomputing.com Wed May 31 11:27:33 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 13:27:33 -0500 Subject: [openib-general] [PATCH 0/7][RFC] Ammasso 1100 iWARP Driver Message-ID: <20060531182733.3652.54755.stgit@stevo-desktop> This patchset implements the iWARP provider driver for the Ammasso 1100 RNIC. It is dependent on the "iWARP Core Support" patch set. We're submitting it for review with the goal for inclusion in the 2.6.19 kernel. This code has gone through several reviews in the openib-general list. Now we are submitting it for external review by the linux community. This StGIT patchset is cloned from Roland Dreier's infiniband.git for-2.6.18 branch.
The patchset consists of 7 patches: 1 - Kconfig and Makefile 2 - Low-level device interface and native stack support 3 - Work request definitions 4 - Provider interface 5 - Memory management 6 - User mode message queue implementation 7 - Verbs queue implementation Signed-off-by: Tom Tucker Signed-off-by: Steve Wise From swise at opengridcomputing.com Wed May 31 11:27:35 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 13:27:35 -0500 Subject: [openib-general] [PATCH 1/7] AMSO1100 Makefiles and Kconfig changes. In-Reply-To: <20060531182733.3652.54755.stgit@stevo-desktop> References: <20060531182733.3652.54755.stgit@stevo-desktop> Message-ID: <20060531182735.3652.44197.stgit@stevo-desktop> --- drivers/infiniband/Kconfig | 1 + drivers/infiniband/Makefile | 1 + drivers/infiniband/hw/amso1100/Kbuild | 10 ++++++++++ drivers/infiniband/hw/amso1100/Kconfig | 15 +++++++++++++++ drivers/infiniband/hw/amso1100/README | 11 +++++++++++ 5 files changed, 38 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index ba2d650..04e6d4f 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -36,6 +36,7 @@ config INFINIBAND_ADDR_TRANS source "drivers/infiniband/hw/mthca/Kconfig" source "drivers/infiniband/hw/ipath/Kconfig" +source "drivers/infiniband/hw/amso1100/Kconfig" source "drivers/infiniband/ulp/ipoib/Kconfig" diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index eea2732..e2b93f9 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -1,5 +1,6 @@ obj-$(CONFIG_INFINIBAND) += core/ obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mthca/ obj-$(CONFIG_IPATH_CORE) += hw/ipath/ +obj-$(CONFIG_INFINIBAND_AMSO1100) += hw/amso1100/ obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ diff --git a/drivers/infiniband/hw/amso1100/Kbuild b/drivers/infiniband/hw/amso1100/Kbuild new file mode 100644 index 0000000..3696c95 --- /dev/null 
+++ b/drivers/infiniband/hw/amso1100/Kbuild @@ -0,0 +1,10 @@ +EXTRA_CFLAGS += -Idrivers/infiniband/include + +ifdef CONFIG_INFINIBAND_AMSO1100_DEBUG +EXTRA_CFLAGS += -DC2_DEBUG +endif + +obj-$(CONFIG_INFINIBAND_AMSO1100) += iw_c2.o + +iw_c2-y := c2.o c2_provider.o c2_rnic.o c2_alloc.o c2_mq.o c2_ae.o c2_vq.o \ + c2_intr.o c2_cq.o c2_qp.o c2_cm.o c2_mm.o c2_pd.o diff --git a/drivers/infiniband/hw/amso1100/Kconfig b/drivers/infiniband/hw/amso1100/Kconfig new file mode 100644 index 0000000..809cb14 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/Kconfig @@ -0,0 +1,15 @@ +config INFINIBAND_AMSO1100 + tristate "Ammasso 1100 HCA support" + depends on PCI && INET && INFINIBAND + ---help--- + This is a low-level driver for the Ammasso 1100 host + channel adapter (HCA). + +config INFINIBAND_AMSO1100_DEBUG + bool "Verbose debugging output" + depends on INFINIBAND_AMSO1100 + default n + ---help--- + This option causes the amso1100 driver to produce a bunch of + debug messages. Select this if you are developing the driver + or trying to diagnose a problem. diff --git a/drivers/infiniband/hw/amso1100/README b/drivers/infiniband/hw/amso1100/README new file mode 100644 index 0000000..1331353 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/README @@ -0,0 +1,11 @@ +This is the OpenFabrics provider driver for the +AMSO1100 1Gb RNIC adapter. + +This adapter is available in limited quantities +for development purposes from Open Grid Computing. + +This driver requires the IWCM and CMA mods necessary +to support iWARP. + +Contact tom at opengridcomputing.com for more information. + From swise at opengridcomputing.com Wed May 31 11:27:37 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 13:27:37 -0500 Subject: [openib-general] [PATCH 2/7] AMSO1100 Low Level Driver. 
In-Reply-To: <20060531182733.3652.54755.stgit@stevo-desktop> References: <20060531182733.3652.54755.stgit@stevo-desktop> Message-ID: <20060531182737.3652.24752.stgit@stevo-desktop> This is the core of the driver and includes the hardware probe, low-level device interfaces and native Ethernet support. --- drivers/infiniband/hw/amso1100/c2.c | 1278 ++++++++++++++++++++++++++++++ drivers/infiniband/hw/amso1100/c2.h | 567 +++++++++++++ drivers/infiniband/hw/amso1100/c2_ae.c | 360 ++++++++ drivers/infiniband/hw/amso1100/c2_intr.c | 211 +++++ drivers/infiniband/hw/amso1100/c2_rnic.c | 720 +++++++++++++++++ 5 files changed, 3136 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2.c b/drivers/infiniband/hw/amso1100/c2.c new file mode 100644 index 0000000..0083dad --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2.c @@ -0,0 +1,1278 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include "c2.h" +#include "c2_provider.h" + +MODULE_AUTHOR("Tom Tucker "); +MODULE_DESCRIPTION("Ammasso AMSO1100 Low-level iWARP Driver"); +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_VERSION(DRV_VERSION); + +static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK + | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN; + +static int debug = -1; /* defaults above */ +module_param(debug, int, 0); +MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)"); + +static int c2_up(struct net_device *netdev); +static int c2_down(struct net_device *netdev); +static int c2_xmit_frame(struct sk_buff *skb, struct net_device *netdev); +static void c2_tx_interrupt(struct net_device *netdev); +static void c2_rx_interrupt(struct net_device *netdev); +static irqreturn_t c2_interrupt(int irq, void *dev_id, struct pt_regs *regs); +static void c2_tx_timeout(struct net_device *netdev); +static int c2_change_mtu(struct net_device *netdev, int new_mtu); +static void c2_reset(struct c2_port *c2_port); +static struct net_device_stats *c2_get_stats(struct net_device *netdev); + +static struct pci_device_id c2_pci_table[] = { + {0x18b8, 0xb001, PCI_ANY_ID, PCI_ANY_ID}, + {0} +}; + +MODULE_DEVICE_TABLE(pci, c2_pci_table); + +static void c2_print_macaddr(struct net_device *netdev) +{ + dprintk("%s: MAC 
%02X:%02X:%02X:%02X:%02X:%02X, " + "IRQ %u\n", netdev->name, + netdev->dev_addr[0], netdev->dev_addr[1], netdev->dev_addr[2], + netdev->dev_addr[3], netdev->dev_addr[4], netdev->dev_addr[5], + netdev->irq); +} + +static void c2_set_rxbufsize(struct c2_port *c2_port) +{ + struct net_device *netdev = c2_port->netdev; + + assert(netdev != NULL); + + if (netdev->mtu > RX_BUF_SIZE) + c2_port->rx_buf_size = + netdev->mtu + ETH_HLEN + sizeof(struct c2_rxp_hdr) + + NET_IP_ALIGN; + else + c2_port->rx_buf_size = sizeof(struct c2_rxp_hdr) + RX_BUF_SIZE; +} + +/* + * Allocate TX ring elements and chain them together. + * One-to-one association of adapter descriptors with ring elements. + */ +static int c2_tx_ring_alloc(struct c2_ring *tx_ring, void *vaddr, + dma_addr_t base, void __iomem * mmio_txp_ring) +{ + struct c2_tx_desc *tx_desc; + struct c2_txp_desc __iomem *txp_desc; + struct c2_element *elem; + int i; + + tx_ring->start = kmalloc(sizeof(*elem) * tx_ring->count, GFP_KERNEL); + if (!tx_ring->start) + return -ENOMEM; + + elem = tx_ring->start; + tx_desc = vaddr; + txp_desc = mmio_txp_ring; + for (i = 0; i < tx_ring->count; i++, elem++, tx_desc++, txp_desc++) { + tx_desc->len = 0; + tx_desc->status = 0; + + /* Set TXP_HTXD_UNINIT */ + __raw_writeq(cpu_to_be64(0x1122334455667788ULL), + (void __iomem *) txp_desc + C2_TXP_ADDR); + __raw_writew(0, (void __iomem *) txp_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_UNINIT), + (void __iomem *) txp_desc + C2_TXP_FLAGS); + + elem->skb = NULL; + elem->ht_desc = tx_desc; + elem->hw_desc = txp_desc; + + if (i == tx_ring->count - 1) { + elem->next = tx_ring->start; + tx_desc->next_offset = base; + } else { + elem->next = elem + 1; + tx_desc->next_offset = + base + (i + 1) * sizeof(*tx_desc); + } + } + + tx_ring->to_use = tx_ring->to_clean = tx_ring->start; + + return 0; +} + +/* + * Allocate RX ring elements and chain them together. + * One-to-one association of adapter descriptors with ring elements. 
+ */ +static int c2_rx_ring_alloc(struct c2_ring *rx_ring, void *vaddr, + dma_addr_t base, void __iomem * mmio_rxp_ring) +{ + struct c2_rx_desc *rx_desc; + struct c2_rxp_desc __iomem *rxp_desc; + struct c2_element *elem; + int i; + + rx_ring->start = kmalloc(sizeof(*elem) * rx_ring->count, GFP_KERNEL); + if (!rx_ring->start) + return -ENOMEM; + + elem = rx_ring->start; + rx_desc = vaddr; + rxp_desc = mmio_rxp_ring; + for (i = 0; i < rx_ring->count; i++, elem++, rx_desc++, rxp_desc++) { + rx_desc->len = 0; + rx_desc->status = 0; + + /* Set RXP_HRXD_UNINIT */ + __raw_writew(cpu_to_be16(RXP_HRXD_OK), + (void __iomem *) rxp_desc + C2_RXP_STATUS); + __raw_writew(0, (void __iomem *) rxp_desc + C2_RXP_COUNT); + __raw_writew(0, (void __iomem *) rxp_desc + C2_RXP_LEN); + __raw_writeq(cpu_to_be64(0x99aabbccddeeffULL), + (void __iomem *) rxp_desc + C2_RXP_ADDR); + __raw_writew(cpu_to_be16(RXP_HRXD_UNINIT), + (void __iomem *) rxp_desc + C2_RXP_FLAGS); + + elem->skb = NULL; + elem->ht_desc = rx_desc; + elem->hw_desc = rxp_desc; + + if (i == rx_ring->count - 1) { + elem->next = rx_ring->start; + rx_desc->next_offset = base; + } else { + elem->next = elem + 1; + rx_desc->next_offset = + base + (i + 1) * sizeof(*rx_desc); + } + } + + rx_ring->to_use = rx_ring->to_clean = rx_ring->start; + + return 0; +} + +/* Setup buffer for receiving */ +static inline int c2_rx_alloc(struct c2_port *c2_port, struct c2_element *elem) +{ + struct c2_dev *c2dev = c2_port->c2dev; + struct c2_rx_desc *rx_desc = elem->ht_desc; + struct sk_buff *skb; + dma_addr_t mapaddr; + u32 maplen; + struct c2_rxp_hdr *rxp_hdr; + + skb = dev_alloc_skb(c2_port->rx_buf_size); + if (unlikely(!skb)) { + dprintk("%s: out of memory for receive\n", + c2_port->netdev->name); + return -ENOMEM; + } + + /* Zero out the rxp hdr in the sk_buff */ + memset(skb->data, 0, sizeof(*rxp_hdr)); + + skb->dev = c2_port->netdev; + + maplen = c2_port->rx_buf_size; + mapaddr = + pci_map_single(c2dev->pcidev, skb->data, maplen, + 
PCI_DMA_FROMDEVICE); + + /* Set the sk_buff RXP_header to RXP_HRXD_READY */ + rxp_hdr = (struct c2_rxp_hdr *) skb->data; + rxp_hdr->flags = RXP_HRXD_READY; + + __raw_writew(0, elem->hw_desc + C2_RXP_STATUS); + __raw_writew(cpu_to_be16((u16) maplen - sizeof(*rxp_hdr)), + elem->hw_desc + C2_RXP_LEN); + __raw_writeq(cpu_to_be64(mapaddr), elem->hw_desc + C2_RXP_ADDR); + __raw_writew(cpu_to_be16(RXP_HRXD_READY), elem->hw_desc + C2_RXP_FLAGS); + + elem->skb = skb; + elem->mapaddr = mapaddr; + elem->maplen = maplen; + rx_desc->len = maplen; + + return 0; +} + +/* + * Allocate buffers for the Rx ring + * For receive: rx_ring.to_clean is next received frame + */ +static int c2_rx_fill(struct c2_port *c2_port) +{ + struct c2_ring *rx_ring = &c2_port->rx_ring; + struct c2_element *elem; + int ret = 0; + + elem = rx_ring->start; + do { + if (c2_rx_alloc(c2_port, elem)) { + ret = 1; + break; + } + } while ((elem = elem->next) != rx_ring->start); + + rx_ring->to_clean = rx_ring->start; + return ret; +} + +/* Free all buffers in RX ring, assumes receiver stopped */ +static void c2_rx_clean(struct c2_port *c2_port) +{ + struct c2_dev *c2dev = c2_port->c2dev; + struct c2_ring *rx_ring = &c2_port->rx_ring; + struct c2_element *elem; + struct c2_rx_desc *rx_desc; + + elem = rx_ring->start; + do { + rx_desc = elem->ht_desc; + rx_desc->len = 0; + + __raw_writew(0, elem->hw_desc + C2_RXP_STATUS); + __raw_writew(0, elem->hw_desc + C2_RXP_COUNT); + __raw_writew(0, elem->hw_desc + C2_RXP_LEN); + __raw_writeq(cpu_to_be64(0x99aabbccddeeffULL), + elem->hw_desc + C2_RXP_ADDR); + __raw_writew(cpu_to_be16(RXP_HRXD_UNINIT), + elem->hw_desc + C2_RXP_FLAGS); + + if (elem->skb) { + pci_unmap_single(c2dev->pcidev, elem->mapaddr, + elem->maplen, PCI_DMA_FROMDEVICE); + dev_kfree_skb(elem->skb); + elem->skb = NULL; + } + } while ((elem = elem->next) != rx_ring->start); +} + +static inline int c2_tx_free(struct c2_dev *c2dev, struct c2_element *elem) +{ + struct c2_tx_desc *tx_desc = elem->ht_desc; + + 
tx_desc->len = 0; + + pci_unmap_single(c2dev->pcidev, elem->mapaddr, elem->maplen, + PCI_DMA_TODEVICE); + + if (elem->skb) { + dev_kfree_skb_any(elem->skb); + elem->skb = NULL; + } + + return 0; +} + +/* Free all buffers in TX ring, assumes transmitter stopped */ +static void c2_tx_clean(struct c2_port *c2_port) +{ + struct c2_ring *tx_ring = &c2_port->tx_ring; + struct c2_element *elem; + struct c2_txp_desc txp_htxd; + int retry; + unsigned long flags; + + spin_lock_irqsave(&c2_port->tx_lock, flags); + + elem = tx_ring->start; + + do { + retry = 0; + do { + txp_htxd.flags = + readw(elem->hw_desc + C2_TXP_FLAGS); + + if (txp_htxd.flags == TXP_HTXD_READY) { + retry = 1; + __raw_writew(0, + elem->hw_desc + C2_TXP_LEN); + __raw_writeq(0, + elem->hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(TXP_HTXD_DONE), + elem->hw_desc + C2_TXP_FLAGS); + c2_port->netstats.tx_dropped++; + break; + } else { + __raw_writew(0, + elem->hw_desc + C2_TXP_LEN); + __raw_writeq(cpu_to_be64(0x1122334455667788ULL), + elem->hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(TXP_HTXD_UNINIT), + elem->hw_desc + C2_TXP_FLAGS); + } + + c2_tx_free(c2_port->c2dev, elem); + + } while ((elem = elem->next) != tx_ring->start); + } while (retry); + + c2_port->tx_avail = c2_port->tx_ring.count - 1; + c2_port->c2dev->cur_tx = tx_ring->to_use - tx_ring->start; + + if (c2_port->tx_avail > MAX_SKB_FRAGS + 1) + netif_wake_queue(c2_port->netdev); + + spin_unlock_irqrestore(&c2_port->tx_lock, flags); +} + +/* + * Process transmit descriptors marked 'DONE' by the firmware, + * freeing up their unneeded sk_buffs. 
+ */ +static void c2_tx_interrupt(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + struct c2_dev *c2dev = c2_port->c2dev; + struct c2_ring *tx_ring = &c2_port->tx_ring; + struct c2_element *elem; + struct c2_txp_desc txp_htxd; + + spin_lock(&c2_port->tx_lock); + + for (elem = tx_ring->to_clean; elem != tx_ring->to_use; + elem = elem->next) { + txp_htxd.flags = + be16_to_cpu(readw(elem->hw_desc + C2_TXP_FLAGS)); + + if (txp_htxd.flags != TXP_HTXD_DONE) + break; + + if (netif_msg_tx_done(c2_port)) { + /* PCI reads are expensive in fast path */ + txp_htxd.len = + be16_to_cpu(readw(elem->hw_desc + C2_TXP_LEN)); + dprintk("%s: tx done slot %3Zu status 0x%x len " + "%5u bytes\n", + netdev->name, elem - tx_ring->start, + txp_htxd.flags, txp_htxd.len); + } + + c2_tx_free(c2dev, elem); + ++(c2_port->tx_avail); + } + + tx_ring->to_clean = elem; + + if (netif_queue_stopped(netdev) + && c2_port->tx_avail > MAX_SKB_FRAGS + 1) + netif_wake_queue(netdev); + + spin_unlock(&c2_port->tx_lock); +} + +static void c2_rx_error(struct c2_port *c2_port, struct c2_element *elem) +{ + struct c2_rx_desc *rx_desc = elem->ht_desc; + struct c2_rxp_hdr *rxp_hdr = (struct c2_rxp_hdr *) elem->skb->data; + + if (rxp_hdr->status != RXP_HRXD_OK || + rxp_hdr->len > (rx_desc->len - sizeof(*rxp_hdr))) { + dprintk("BAD RXP_HRXD\n"); + dprintk(" rx_desc : %p\n", rx_desc); + dprintk(" index : %Zu\n", + elem - c2_port->rx_ring.start); + dprintk(" len : %u\n", rx_desc->len); + dprintk(" rxp_hdr : %p [PA %p]\n", rxp_hdr, + (void *) __pa((unsigned long) rxp_hdr)); + dprintk(" flags : 0x%x\n", rxp_hdr->flags); + dprintk(" status: 0x%x\n", rxp_hdr->status); + dprintk(" len : %u\n", rxp_hdr->len); + dprintk(" rsvd : 0x%x\n", rxp_hdr->rsvd); + } + + /* Setup the skb for reuse since we're dropping this pkt */ + elem->skb->tail = elem->skb->data = elem->skb->head; + + /* Zero out the rxp hdr in the sk_buff */ + memset(elem->skb->data, 0, sizeof(*rxp_hdr)); + + /* Write the descriptor 
to the adapter's rx ring */ + __raw_writew(0, elem->hw_desc + C2_RXP_STATUS); + __raw_writew(0, elem->hw_desc + C2_RXP_COUNT); + __raw_writew(cpu_to_be16((u16) elem->maplen - sizeof(*rxp_hdr)), + elem->hw_desc + C2_RXP_LEN); + __raw_writeq(cpu_to_be64(elem->mapaddr), elem->hw_desc + C2_RXP_ADDR); + __raw_writew(cpu_to_be16(RXP_HRXD_READY), elem->hw_desc + C2_RXP_FLAGS); + + dprintk("packet dropped\n"); + c2_port->netstats.rx_dropped++; +} + +static void c2_rx_interrupt(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + struct c2_dev *c2dev = c2_port->c2dev; + struct c2_ring *rx_ring = &c2_port->rx_ring; + struct c2_element *elem; + struct c2_rx_desc *rx_desc; + struct c2_rxp_hdr *rxp_hdr; + struct sk_buff *skb; + dma_addr_t mapaddr; + u32 maplen, buflen; + unsigned long flags; + + spin_lock_irqsave(&c2dev->lock, flags); + + /* Begin where we left off */ + rx_ring->to_clean = rx_ring->start + c2dev->cur_rx; + + for (elem = rx_ring->to_clean; elem->next != rx_ring->to_clean; + elem = elem->next) { + rx_desc = elem->ht_desc; + mapaddr = elem->mapaddr; + maplen = elem->maplen; + skb = elem->skb; + rxp_hdr = (struct c2_rxp_hdr *) skb->data; + + if (rxp_hdr->flags != RXP_HRXD_DONE) + break; + buflen = rxp_hdr->len; + + /* Sanity check the RXP header */ + if (rxp_hdr->status != RXP_HRXD_OK || + buflen > (rx_desc->len - sizeof(*rxp_hdr))) { + c2_rx_error(c2_port, elem); + continue; + } + + /* + * Allocate and map a new skb for replenishing the host + * RX desc + */ + if (c2_rx_alloc(c2_port, elem)) { + c2_rx_error(c2_port, elem); + continue; + } + + /* Unmap the old skb */ + pci_unmap_single(c2dev->pcidev, mapaddr, maplen, + PCI_DMA_FROMDEVICE); + + /* + * Skip past the leading 8 bytes comprising of the + * "struct c2_rxp_hdr", prepended by the adapter + * to the usual Ethernet header ("struct ethhdr"), + * to the start of the raw Ethernet packet. + * + * Fix up the various fields in the sk_buff before + * passing it up to netif_rx(). 
The transfer size + * (in bytes) specified by the adapter len field of + * the "struct c2_rxp_hdr" does NOT include the + * "sizeof(struct c2_rxp_hdr)". + */ + skb->data += sizeof(*rxp_hdr); + skb->tail = skb->data + buflen; + skb->len = buflen; + skb->dev = netdev; + skb->protocol = eth_type_trans(skb, netdev); + + /* Drop arp requests to the pseudo nic ip addr */ + if (unlikely(ntohs(skb->protocol) == ETH_P_ARP)) { + u8 *tpa; + + /* pull out the tgt ip addr */ + tpa = skb->data /* beginning of the arp packet */ + 8 /* arp addr fmts, lens, and opcode */ + 6 /* arp src hw addr */ + 4 /* arp src proto addr */ + 6; /* arp tgt hw addr */ + if (is_rnic_addr(c2dev->pseudo_netdev, *((u32 *)tpa))) { + dprintk("Dropping arp req for" + " %03d.%03d.%03d.%03d\n", + tpa[0], tpa[1], tpa[2], tpa[3]); + kfree_skb(skb); + continue; + } + } + + netif_rx(skb); + + netdev->last_rx = jiffies; + c2_port->netstats.rx_packets++; + c2_port->netstats.rx_bytes += buflen; + } + + /* Save where we left off */ + rx_ring->to_clean = elem; + c2dev->cur_rx = elem - rx_ring->start; + C2_SET_CUR_RX(c2dev, c2dev->cur_rx); + + spin_unlock_irqrestore(&c2dev->lock, flags); +} + +/* + * Handle netisr0 TX & RX interrupts. + */ +static irqreturn_t c2_interrupt(int irq, void *dev_id, struct pt_regs *regs) +{ + unsigned int netisr0, dmaisr; + int handled = 0; + struct c2_dev *c2dev = (struct c2_dev *) dev_id; + + assert(c2dev != NULL); + + /* Process CCILNET interrupts */ + netisr0 = readl(c2dev->regs + C2_NISR0); + if (netisr0) { + + /* + * There is an issue with the firmware that always + * provides the status of RX for both TX & RX + * interrupts. So process both queues here.
+ */ + c2_rx_interrupt(c2dev->netdev); + c2_tx_interrupt(c2dev->netdev); + + /* Clear the interrupt */ + writel(netisr0, c2dev->regs + C2_NISR0); + handled++; + } + + /* Process RNIC interrupts */ + dmaisr = readl(c2dev->regs + C2_DISR); + if (dmaisr) { + writel(dmaisr, c2dev->regs + C2_DISR); + c2_rnic_interrupt(c2dev); + handled++; + } + + if (handled) { + return IRQ_HANDLED; + } else { + return IRQ_NONE; + } +} + +static int c2_up(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + struct c2_dev *c2dev = c2_port->c2dev; + struct c2_element *elem; + struct c2_rxp_hdr *rxp_hdr; + size_t rx_size, tx_size; + int ret, i; + unsigned int netimr0; + + assert(c2dev != NULL); + + if (netif_msg_ifup(c2_port)) + dprintk("%s: enabling interface\n", netdev->name); + + /* Set the Rx buffer size based on MTU */ + c2_set_rxbufsize(c2_port); + + /* Allocate DMA'able memory for Tx/Rx host descriptor rings */ + rx_size = c2_port->rx_ring.count * sizeof(struct c2_rx_desc); + tx_size = c2_port->tx_ring.count * sizeof(struct c2_tx_desc); + + c2_port->mem_size = tx_size + rx_size; + c2_port->mem = pci_alloc_consistent(c2dev->pcidev, c2_port->mem_size, + &c2_port->dma); + if (c2_port->mem == NULL) { + dprintk("Unable to allocate memory for " + "host descriptor rings\n"); + return -ENOMEM; + } + + memset(c2_port->mem, 0, c2_port->mem_size); + + /* Create the Rx host descriptor ring */ + if ((ret = + c2_rx_ring_alloc(&c2_port->rx_ring, c2_port->mem, c2_port->dma, + c2dev->mmio_rxp_ring))) { + dprintk("Unable to create RX ring\n"); + goto bail0; + } + + /* Allocate Rx buffers for the host descriptor ring */ + if (c2_rx_fill(c2_port)) { + dprintk("Unable to fill RX ring\n"); + goto bail1; + } + + /* Create the Tx host descriptor ring */ + if ((ret = c2_tx_ring_alloc(&c2_port->tx_ring, c2_port->mem + rx_size, + c2_port->dma + rx_size, + c2dev->mmio_txp_ring))) { + dprintk("Unable to create TX ring\n"); + goto bail1; + } + + /* Set the TX pointer to where we left 
off */ + c2_port->tx_avail = c2_port->tx_ring.count - 1; + c2_port->tx_ring.to_use = c2_port->tx_ring.to_clean = + c2_port->tx_ring.start + c2dev->cur_tx; + + /* missing: Initialize MAC */ + + BUG_ON(c2_port->tx_ring.to_use != c2_port->tx_ring.to_clean); + + /* Reset the adapter, ensures the driver is in sync with the RXP */ + c2_reset(c2_port); + + /* Reset the READY bit in the sk_buff RXP headers & adapter HRXDQ */ + for (i = 0, elem = c2_port->rx_ring.start; i < c2_port->rx_ring.count; + i++, elem++) { + rxp_hdr = (struct c2_rxp_hdr *) elem->skb->data; + rxp_hdr->flags = 0; + __raw_writew(cpu_to_be16(RXP_HRXD_READY), + elem->hw_desc + C2_RXP_FLAGS); + } + + /* Enable network packets */ + netif_start_queue(netdev); + + /* Enable IRQ */ + writel(0, c2dev->regs + C2_IDIS); + netimr0 = readl(c2dev->regs + C2_NIMR0); + netimr0 &= ~(C2_PCI_HTX_INT | C2_PCI_HRX_INT); + writel(netimr0, c2dev->regs + C2_NIMR0); + + return 0; + + bail1: + c2_rx_clean(c2_port); + kfree(c2_port->rx_ring.start); + + bail0: + pci_free_consistent(c2dev->pcidev, c2_port->mem_size, c2_port->mem, + c2_port->dma); + + return ret; +} + +static int c2_down(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + struct c2_dev *c2dev = c2_port->c2dev; + + if (netif_msg_ifdown(c2_port)) + dprintk("%s: disabling interface\n", + netdev->name); + + /* Wait for all the queued packets to get sent */ + c2_tx_interrupt(netdev); + + /* Disable network packets */ + netif_stop_queue(netdev); + + /* Disable IRQs by clearing the interrupt mask */ + writel(1, c2dev->regs + C2_IDIS); + writel(0, c2dev->regs + C2_NIMR0); + + /* missing: Stop transmitter */ + + /* missing: Stop receiver */ + + /* Reset the adapter, ensures the driver is in sync with the RXP */ + c2_reset(c2_port); + + /* missing: Turn off LEDs here */ + + /* Free all buffers in the host descriptor rings */ + c2_tx_clean(c2_port); + c2_rx_clean(c2_port); + + /* Free the host descriptor rings */ + kfree(c2_port->rx_ring.start); 
+ kfree(c2_port->tx_ring.start); + pci_free_consistent(c2dev->pcidev, c2_port->mem_size, c2_port->mem, + c2_port->dma); + + return 0; +} + +static void c2_reset(struct c2_port *c2_port) +{ + struct c2_dev *c2dev = c2_port->c2dev; + unsigned int cur_rx = c2dev->cur_rx; + + /* Tell the hardware to quiesce */ + C2_SET_CUR_RX(c2dev, cur_rx | C2_PCI_HRX_QUI); + + /* + * The hardware will reset the C2_PCI_HRX_QUI bit once + * the RXP is quiesced. Wait 2 seconds for this. + */ + ssleep(2); + + cur_rx = C2_GET_CUR_RX(c2dev); + + if (cur_rx & C2_PCI_HRX_QUI) + dprintk("c2_reset: failed to quiesce the hardware!\n"); + + cur_rx &= ~C2_PCI_HRX_QUI; + + c2dev->cur_rx = cur_rx; + + dprintk("Current RX: %u\n", c2dev->cur_rx); +} + +static int c2_xmit_frame(struct sk_buff *skb, struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + struct c2_dev *c2dev = c2_port->c2dev; + struct c2_ring *tx_ring = &c2_port->tx_ring; + struct c2_element *elem; + dma_addr_t mapaddr; + u32 maplen; + unsigned long flags; + unsigned int i; + + spin_lock_irqsave(&c2_port->tx_lock, flags); + + if (unlikely(c2_port->tx_avail < (skb_shinfo(skb)->nr_frags + 1))) { + netif_stop_queue(netdev); + spin_unlock_irqrestore(&c2_port->tx_lock, flags); + + dprintk("%s: Tx ring full when queue awake!\n", + netdev->name); + return NETDEV_TX_BUSY; + } + + maplen = skb_headlen(skb); + mapaddr = + pci_map_single(c2dev->pcidev, skb->data, maplen, PCI_DMA_TODEVICE); + + elem = tx_ring->to_use; + elem->skb = skb; + elem->mapaddr = mapaddr; + elem->maplen = maplen; + + /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), elem->hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), elem->hw_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem->hw_desc + C2_TXP_FLAGS); + + c2_port->netstats.tx_packets++; + c2_port->netstats.tx_bytes += maplen; + + /* Loop thru additional data fragments and queue them */ + if (skb_shinfo(skb)->nr_frags) { + for (i = 0; i < 
skb_shinfo(skb)->nr_frags; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + maplen = frag->size; + mapaddr = + pci_map_page(c2dev->pcidev, frag->page, + frag->page_offset, maplen, + PCI_DMA_TODEVICE); + + elem = elem->next; + elem->skb = NULL; + elem->mapaddr = mapaddr; + elem->maplen = maplen; + + /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), + elem->hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), + elem->hw_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_READY), + elem->hw_desc + C2_TXP_FLAGS); + + c2_port->netstats.tx_packets++; + c2_port->netstats.tx_bytes += maplen; + } + } + + tx_ring->to_use = elem->next; + c2_port->tx_avail -= (skb_shinfo(skb)->nr_frags + 1); + + if (c2_port->tx_avail <= MAX_SKB_FRAGS + 1) { + netif_stop_queue(netdev); + if (netif_msg_tx_queued(c2_port)) + dprintk("%s: transmit queue full\n", + netdev->name); + } + + spin_unlock_irqrestore(&c2_port->tx_lock, flags); + + netdev->trans_start = jiffies; + + return NETDEV_TX_OK; +} + +static struct net_device_stats *c2_get_stats(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + + return &c2_port->netstats; +} + +static int c2_set_mac_address(struct net_device *netdev, void *p) +{ + return -1; +} + +static void c2_tx_timeout(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + + if (netif_msg_timer(c2_port)) + dprintk("%s: tx timeout\n", netdev->name); + + c2_tx_clean(c2_port); +} + +static int c2_change_mtu(struct net_device *netdev, int new_mtu) +{ + int ret = 0; + + if (new_mtu < ETH_ZLEN || new_mtu > ETH_JUMBO_MTU) + return -EINVAL; + + netdev->mtu = new_mtu; + + if (netif_running(netdev)) { + c2_down(netdev); + + c2_up(netdev); + } + + return ret; +} + +/* Initialize network device */ +static struct net_device *c2_devinit(struct c2_dev *c2dev, + void __iomem * mmio_addr) +{ + struct c2_port *c2_port = NULL; + struct net_device *netdev = alloc_etherdev(sizeof(*c2_port)); + + if (!netdev) { + 
dprintk("c2_port etherdev alloc failed\n"); + return NULL; + } + + SET_MODULE_OWNER(netdev); + SET_NETDEV_DEV(netdev, &c2dev->pcidev->dev); + + netdev->open = c2_up; + netdev->stop = c2_down; + netdev->hard_start_xmit = c2_xmit_frame; + netdev->get_stats = c2_get_stats; + netdev->tx_timeout = c2_tx_timeout; + netdev->set_mac_address = c2_set_mac_address; + netdev->change_mtu = c2_change_mtu; + netdev->watchdog_timeo = C2_TX_TIMEOUT; + netdev->irq = c2dev->pcidev->irq; + + c2_port = netdev_priv(netdev); + c2_port->netdev = netdev; + c2_port->c2dev = c2dev; + c2_port->msg_enable = netif_msg_init(debug, default_msg); + c2_port->tx_ring.count = C2_NUM_TX_DESC; + c2_port->rx_ring.count = C2_NUM_RX_DESC; + + spin_lock_init(&c2_port->tx_lock); + + /* Copy our 48-bit ethernet hardware address */ + memcpy_fromio(netdev->dev_addr, mmio_addr + C2_REGS_ENADDR, 6); + + /* Validate the MAC address */ + if (!is_valid_ether_addr(netdev->dev_addr)) { + dprintk("Invalid MAC Address\n"); + c2_print_macaddr(netdev); + free_netdev(netdev); + return NULL; + } + + c2dev->netdev = netdev; + + return netdev; +} + +static int __devinit c2_probe(struct pci_dev *pcidev, + const struct pci_device_id *ent) +{ + int ret = 0, i; + unsigned long reg0_start, reg0_flags, reg0_len; + unsigned long reg2_start, reg2_flags, reg2_len; + unsigned long reg4_start, reg4_flags, reg4_len; + unsigned kva_map_size; + struct net_device *netdev = NULL; + struct c2_dev *c2dev = NULL; + void __iomem *mmio_regs = NULL; + + assert(pcidev != NULL); + assert(ent != NULL); + + printk(KERN_INFO PFX "AMSO1100 Gigabit Ethernet driver v%s loaded\n", + DRV_VERSION); + + /* Enable PCI device */ + ret = pci_enable_device(pcidev); + if (ret) { + printk(KERN_ERR PFX "%s: Unable to enable PCI device\n", + pci_name(pcidev)); + goto bail0; + } + + reg0_start = pci_resource_start(pcidev, BAR_0); + reg0_len = pci_resource_len(pcidev, BAR_0); + reg0_flags = pci_resource_flags(pcidev, BAR_0); + + reg2_start = pci_resource_start(pcidev, 
BAR_2); + reg2_len = pci_resource_len(pcidev, BAR_2); + reg2_flags = pci_resource_flags(pcidev, BAR_2); + + reg4_start = pci_resource_start(pcidev, BAR_4); + reg4_len = pci_resource_len(pcidev, BAR_4); + reg4_flags = pci_resource_flags(pcidev, BAR_4); + + dprintk("BAR0 size = 0x%lX bytes\n", reg0_len); + dprintk("BAR2 size = 0x%lX bytes\n", reg2_len); + dprintk("BAR4 size = 0x%lX bytes\n", reg4_len); + + /* Make sure PCI base addr are MMIO */ + if (!(reg0_flags & IORESOURCE_MEM) || + !(reg2_flags & IORESOURCE_MEM) || !(reg4_flags & IORESOURCE_MEM)) { + printk(KERN_ERR PFX "PCI regions not an MMIO resource\n"); + ret = -ENODEV; + goto bail1; + } + + /* Check for weird/broken PCI region reporting */ + if ((reg0_len < C2_REG0_SIZE) || + (reg2_len < C2_REG2_SIZE) || (reg4_len < C2_REG4_SIZE)) { + printk(KERN_ERR PFX "Invalid PCI region sizes\n"); + ret = -ENODEV; + goto bail1; + } + + /* Reserve PCI I/O and memory resources */ + ret = pci_request_regions(pcidev, DRV_NAME); + if (ret) { + printk(KERN_ERR PFX "%s: Unable to request regions\n", + pci_name(pcidev)); + goto bail1; + } + + if ((sizeof(dma_addr_t) > 4)) { + ret = pci_set_dma_mask(pcidev, DMA_64BIT_MASK); + if (ret < 0) { + printk(KERN_ERR PFX "64b DMA configuration failed\n"); + goto bail2; + } + } else { + ret = pci_set_dma_mask(pcidev, DMA_32BIT_MASK); + if (ret < 0) { + printk(KERN_ERR PFX "32b DMA configuration failed\n"); + goto bail2; + } + } + + /* Enables bus-mastering on the device */ + pci_set_master(pcidev); + + /* Remap the adapter PCI registers in BAR4 */ + mmio_regs = ioremap_nocache(reg4_start + C2_PCI_REGS_OFFSET, + sizeof(struct c2_adapter_pci_regs)); + if (mmio_regs == 0UL) { + printk(KERN_ERR PFX + "Unable to remap adapter PCI registers in BAR4\n"); + ret = -EIO; + goto bail2; + } + + /* Validate PCI regs magic */ + for (i = 0; i < sizeof(c2_magic); i++) { + if (c2_magic[i] != readb(mmio_regs + C2_REGS_MAGIC + i)) { + printk(KERN_ERR PFX "Downlevel Firmware boot loader " + "[%d/%Zd: got 
0x%x, exp 0x%x]. Use the cc_flash " + "utility to update your boot loader\n", + i + 1, sizeof(c2_magic), + readb(mmio_regs + C2_REGS_MAGIC + i), + c2_magic[i]); + printk(KERN_ERR PFX "Adapter not claimed\n"); + iounmap(mmio_regs); + ret = -EIO; + goto bail2; + } + } + + /* Validate the adapter version */ + if (be32_to_cpu(readl(mmio_regs + C2_REGS_VERS)) != C2_VERSION) { + printk(KERN_ERR PFX "Version mismatch " + "[fw=%u, c2=%u], Adapter not claimed\n", + be32_to_cpu(readl(mmio_regs + C2_REGS_VERS)), + C2_VERSION); + ret = -EINVAL; + iounmap(mmio_regs); + goto bail2; + } + + /* Validate the adapter IVN */ + if (be32_to_cpu(readl(mmio_regs + C2_REGS_IVN)) != C2_IVN) { + printk(KERN_ERR PFX "Downlevel Firmware level. You should be using " + "the OpenIB device support kit. " + "[fw=0x%x, c2=0x%x], Adapter not claimed\n", + be32_to_cpu(readl(mmio_regs + C2_REGS_IVN)), + C2_IVN); + ret = -EINVAL; + iounmap(mmio_regs); + goto bail2; + } + + /* Allocate hardware structure */ + c2dev = (struct c2_dev *) ib_alloc_device(sizeof *c2dev); + if (!c2dev) { + printk(KERN_ERR PFX "%s: Unable to alloc hardware struct\n", + pci_name(pcidev)); + ret = -ENOMEM; + iounmap(mmio_regs); + goto bail2; + } + + memset(c2dev, 0, sizeof(*c2dev)); + spin_lock_init(&c2dev->lock); + c2dev->pcidev = pcidev; + c2dev->cur_tx = 0; + + /* Get the last RX index */ + c2dev->cur_rx = + (be32_to_cpu(readl(mmio_regs + C2_REGS_HRX_CUR)) - + 0xffffc000) / sizeof(struct c2_rxp_desc); + + /* Request an interrupt line for the driver */ + ret = request_irq(pcidev->irq, c2_interrupt, SA_SHIRQ, DRV_NAME, c2dev); + if (ret) { + printk(KERN_ERR PFX "%s: requested IRQ %u is busy\n", + pci_name(pcidev), pcidev->irq); + iounmap(mmio_regs); + goto bail3; + } + + /* Set driver specific data */ + pci_set_drvdata(pcidev, c2dev); + + /* Initialize network device */ + if ((netdev = c2_devinit(c2dev, mmio_regs)) == NULL) { + iounmap(mmio_regs); + goto bail4; + } + + /* Save off the actual size prior to unmapping mmio_regs */ 
+ kva_map_size = be32_to_cpu(readl(mmio_regs + C2_REGS_PCI_WINSIZE)); + + /* Unmap the adapter PCI registers in BAR4 */ + iounmap(mmio_regs); + + /* Register network device */ + ret = register_netdev(netdev); + if (ret) { + printk(KERN_ERR PFX "Unable to register netdev, ret = %d\n", + ret); + goto bail5; + } + + /* Disable network packets */ + netif_stop_queue(netdev); + + /* Remap the adapter HRXDQ PA space to kernel VA space */ + c2dev->mmio_rxp_ring = ioremap_nocache(reg4_start + C2_RXP_HRXDQ_OFFSET, + C2_RXP_HRXDQ_SIZE); + if (c2dev->mmio_rxp_ring == 0UL) { + printk(KERN_ERR PFX "Unable to remap MMIO HRXDQ region\n"); + ret = -EIO; + goto bail6; + } + + /* Remap the adapter HTXDQ PA space to kernel VA space */ + c2dev->mmio_txp_ring = ioremap_nocache(reg4_start + C2_TXP_HTXDQ_OFFSET, + C2_TXP_HTXDQ_SIZE); + if (c2dev->mmio_txp_ring == 0UL) { + printk(KERN_ERR PFX "Unable to remap MMIO HTXDQ region\n"); + ret = -EIO; + goto bail7; + } + + /* Save off the current RX index in the last 4 bytes of the TXP Ring */ + C2_SET_CUR_RX(c2dev, c2dev->cur_rx); + + /* Remap the PCI registers in adapter BAR0 to kernel VA space */ + c2dev->regs = ioremap_nocache(reg0_start, reg0_len); + if (c2dev->regs == 0UL) { + printk(KERN_ERR PFX "Unable to remap BAR0\n"); + ret = -EIO; + goto bail8; + } + + /* Remap the PCI registers in adapter BAR4 to kernel VA space */ + c2dev->pa = reg4_start + C2_PCI_REGS_OFFSET; + c2dev->kva = ioremap_nocache(reg4_start + C2_PCI_REGS_OFFSET, + kva_map_size); + if (c2dev->kva == 0UL) { + printk(KERN_ERR PFX "Unable to remap BAR4\n"); + ret = -EIO; + goto bail9; + } + + /* Print out the MAC address */ + c2_print_macaddr(netdev); + + ret = c2_rnic_init(c2dev); + if (ret) { + printk(KERN_ERR PFX "c2_rnic_init failed: %d\n", ret); + goto bail10; + } + + c2_register_device(c2dev); + + return 0; + + bail10: + iounmap(c2dev->kva); + + bail9: + iounmap(c2dev->regs); + + bail8: + iounmap(c2dev->mmio_txp_ring); + + bail7: + iounmap(c2dev->mmio_rxp_ring); + + 
bail6: + unregister_netdev(netdev); + + bail5: + free_netdev(netdev); + + bail4: + free_irq(pcidev->irq, c2dev); + + bail3: + ib_dealloc_device(&c2dev->ibdev); + + bail2: + pci_release_regions(pcidev); + + bail1: + pci_disable_device(pcidev); + + bail0: + return ret; +} + +static void __devexit c2_remove(struct pci_dev *pcidev) +{ + struct c2_dev *c2dev = pci_get_drvdata(pcidev); + struct net_device *netdev = c2dev->netdev; + + assert(netdev != NULL); + + /* Unregister with OpenIB */ + c2_unregister_device(c2dev); + + /* Clean up the RNIC resources */ + c2_rnic_term(c2dev); + + /* Remove network device from the kernel */ + unregister_netdev(netdev); + + /* Free network device */ + free_netdev(netdev); + + /* Free the interrupt line */ + free_irq(pcidev->irq, c2dev); + + /* missing: Turn LEDs off here */ + + /* Unmap adapter PA space */ + iounmap(c2dev->kva); + iounmap(c2dev->regs); + iounmap(c2dev->mmio_txp_ring); + iounmap(c2dev->mmio_rxp_ring); + + /* Free the hardware structure */ + ib_dealloc_device(&c2dev->ibdev); + + /* Release reserved PCI I/O and memory resources */ + pci_release_regions(pcidev); + + /* Disable PCI device */ + pci_disable_device(pcidev); + + /* Clear driver specific data */ + pci_set_drvdata(pcidev, NULL); +} + +static struct pci_driver c2_pci_driver = { + .name = DRV_NAME, + .id_table = c2_pci_table, + .probe = c2_probe, + .remove = __devexit_p(c2_remove), +}; + +static int __init c2_init_module(void) +{ + return pci_module_init(&c2_pci_driver); +} + +static void __exit c2_exit_module(void) +{ + pci_unregister_driver(&c2_pci_driver); +} + +module_init(c2_init_module); +module_exit(c2_exit_module); diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h new file mode 100644 index 0000000..8124c6b --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2.h @@ -0,0 +1,567 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef __C2_H +#define __C2_H + +#include +#include +#include +#include +#include +#include + +#include "c2_provider.h" +#include "c2_mq.h" +#include "c2_status.h" + +#define DRV_NAME "c2" +#define DRV_VERSION "1.1" +#define PFX DRV_NAME ": " + +#ifdef C2_DEBUG +#define assert(expr) \ + if(!(expr)) { \ + printk(KERN_ERR PFX "Assertion failed! %s, %s, %s, line %d\n",\ + #expr, __FILE__, __FUNCTION__, __LINE__); \ + } +#define dprintk(fmt, args...) do {printk(KERN_INFO PFX fmt, ##args);} while (0) +#else +#define assert(expr) do {} while (0) +#define dprintk(fmt, args...) 
do {} while (0) +#endif /* C2_DEBUG */ + +#define BAR_0 0 +#define BAR_2 2 +#define BAR_4 4 + +#define RX_BUF_SIZE (1536 + 8) +#define ETH_JUMBO_MTU 9000 +#define C2_MAGIC "CEPHEUS" +#define C2_VERSION 4 +#define C2_IVN (18 & 0x7fffffff) + +#define C2_REG0_SIZE (16 * 1024) +#define C2_REG2_SIZE (2 * 1024 * 1024) +#define C2_REG4_SIZE (256 * 1024 * 1024) +#define C2_NUM_TX_DESC 341 +#define C2_NUM_RX_DESC 256 +#define C2_PCI_REGS_OFFSET (0x10000) +#define C2_RXP_HRXDQ_OFFSET (((C2_REG4_SIZE)/2)) +#define C2_RXP_HRXDQ_SIZE (4096) +#define C2_TXP_HTXDQ_OFFSET (((C2_REG4_SIZE)/2) + C2_RXP_HRXDQ_SIZE) +#define C2_TXP_HTXDQ_SIZE (4096) +#define C2_TX_TIMEOUT (6*HZ) + +/* CEPHEUS */ +static const u8 c2_magic[] = { + 0x43, 0x45, 0x50, 0x48, 0x45, 0x55, 0x53 +}; + +enum adapter_pci_regs { + C2_REGS_MAGIC = 0x0000, + C2_REGS_VERS = 0x0008, + C2_REGS_IVN = 0x000C, + C2_REGS_PCI_WINSIZE = 0x0010, + C2_REGS_Q0_QSIZE = 0x0014, + C2_REGS_Q0_MSGSIZE = 0x0018, + C2_REGS_Q0_POOLSTART = 0x001C, + C2_REGS_Q0_SHARED = 0x0020, + C2_REGS_Q1_QSIZE = 0x0024, + C2_REGS_Q1_MSGSIZE = 0x0028, + C2_REGS_Q1_SHARED = 0x0030, + C2_REGS_Q2_QSIZE = 0x0034, + C2_REGS_Q2_MSGSIZE = 0x0038, + C2_REGS_Q2_SHARED = 0x0040, + C2_REGS_ENADDR = 0x004C, + C2_REGS_RDMA_ENADDR = 0x0054, + C2_REGS_HRX_CUR = 0x006C, +}; + +struct c2_adapter_pci_regs { + char reg_magic[8]; + u32 version; + u32 ivn; + u32 pci_window_size; + u32 q0_q_size; + u32 q0_msg_size; + u32 q0_pool_start; + u32 q0_shared; + u32 q1_q_size; + u32 q1_msg_size; + u32 q1_pool_start; + u32 q1_shared; + u32 q2_q_size; + u32 q2_msg_size; + u32 q2_pool_start; + u32 q2_shared; + u32 log_start; + u32 log_size; + u8 host_enaddr[8]; + u8 rdma_enaddr[8]; + u32 crash_entry; + u32 crash_ready[2]; + u32 fw_txd_cur; + u32 fw_hrxd_cur; + u32 fw_rxd_cur; +}; + +enum pci_regs { + C2_HISR = 0x0000, + C2_DISR = 0x0004, + C2_HIMR = 0x0008, + C2_DIMR = 0x000C, + C2_NISR0 = 0x0010, + C2_NISR1 = 0x0014, + C2_NIMR0 = 0x0018, + C2_NIMR1 = 0x001C, + C2_IDIS = 0x0020, +}; + 
+enum { + C2_PCI_HRX_INT = 1 << 8, + C2_PCI_HTX_INT = 1 << 17, + C2_PCI_HRX_QUI = 1 << 31, +}; + +/* + * Cepheus registers in BAR0. + */ +struct c2_pci_regs { + u32 hostisr; + u32 dmaisr; + u32 hostimr; + u32 dmaimr; + u32 netisr0; + u32 netisr1; + u32 netimr0; + u32 netimr1; + u32 int_disable; +}; + +/* TXP flags */ +enum c2_txp_flags { + TXP_HTXD_DONE = 0, + TXP_HTXD_READY = 1 << 0, + TXP_HTXD_UNINIT = 1 << 1, +}; + +/* RXP flags */ +enum c2_rxp_flags { + RXP_HRXD_UNINIT = 0, + RXP_HRXD_READY = 1 << 0, + RXP_HRXD_DONE = 1 << 1, +}; + +/* RXP status */ +enum c2_rxp_status { + RXP_HRXD_ZERO = 0, + RXP_HRXD_OK = 1 << 0, + RXP_HRXD_BUF_OV = 1 << 1, +}; + +/* TXP descriptor fields */ +enum txp_desc { + C2_TXP_FLAGS = 0x0000, + C2_TXP_LEN = 0x0002, + C2_TXP_ADDR = 0x0004, +}; + +/* RXP descriptor fields */ +enum rxp_desc { + C2_RXP_FLAGS = 0x0000, + C2_RXP_STATUS = 0x0002, + C2_RXP_COUNT = 0x0004, + C2_RXP_LEN = 0x0006, + C2_RXP_ADDR = 0x0008, +}; + +struct c2_txp_desc { + u16 flags; + u16 len; + u64 addr; +} __attribute__ ((packed)); + +struct c2_rxp_desc { + u16 flags; + u16 status; + u16 count; + u16 len; + u64 addr; +} __attribute__ ((packed)); + +struct c2_rxp_hdr { + u16 flags; + u16 status; + u16 len; + u16 rsvd; +} __attribute__ ((packed)); + +struct c2_tx_desc { + u32 len; + u32 status; + dma_addr_t next_offset; +}; + +struct c2_rx_desc { + u32 len; + u32 status; + dma_addr_t next_offset; +}; + +struct c2_alloc { + u32 last; + u32 max; + spinlock_t lock; + unsigned long *table; +}; + +struct c2_array { + struct { + void **page; + int used; + } *page_list; +}; + +/* + * The MQ shared pointer pool is organized as a linked list of + * chunks. Each chunk contains a linked list of free shared pointers + * that can be allocated to a given user mode client. 
+ * + */ +struct sp_chunk { + struct sp_chunk *next; + gfp_t gfp_mask; + u16 head; + u16 shared_ptr[0]; +}; + +struct c2_pd_table { + struct c2_alloc alloc; + struct c2_array pd; +}; + +struct c2_qp_table { + struct c2_alloc alloc; + spinlock_t lock; + struct c2_array qp; + struct c2_qp** map; +}; + +struct c2_element { + struct c2_element *next; + void *ht_desc; /* host descriptor */ + void __iomem *hw_desc; /* hardware descriptor */ + struct sk_buff *skb; + dma_addr_t mapaddr; + u32 maplen; +}; + +struct c2_ring { + struct c2_element *to_clean; + struct c2_element *to_use; + struct c2_element *start; + unsigned long count; +}; + +struct c2_dev { + struct ib_device ibdev; + void __iomem *regs; + void __iomem *mmio_txp_ring; /* remapped adapter memory for hw rings */ + void __iomem *mmio_rxp_ring; + spinlock_t lock; + struct pci_dev *pcidev; + struct net_device *netdev; + struct net_device *pseudo_netdev; + unsigned int cur_tx; + unsigned int cur_rx; + u32 adapter_handle; + int device_cap_flags; + void __iomem *kva; /* KVA device memory */ + unsigned long pa; /* PA device memory */ + void **qptr_array; + + kmem_cache_t *host_msg_cache; + + struct list_head cca_link; /* adapter list */ + struct list_head eh_wakeup_list; /* event wakeup list */ + wait_queue_head_t req_vq_wo; + + /* Cached RNIC properties */ + struct ib_device_attr props; + + struct c2_pd_table pd_table; + struct c2_qp_table qp_table; + int ports; /* num of GigE ports */ + int devnum; + spinlock_t vqlock; /* sync vbs req MQ */ + + /* Verbs Queues */ + struct c2_mq req_vq; /* Verbs Request MQ */ + struct c2_mq rep_vq; /* Verbs Reply MQ */ + struct c2_mq aeq; /* Async Events MQ */ + + /* Kernel client MQs */ + struct sp_chunk *kern_mqsp_pool; + + /* Device updates these values when posting messages to a host + * target queue */ + u16 req_vq_shared; + u16 rep_vq_shared; + u16 aeq_shared; + u16 irq_claimed; + + /* + * Shared host target pages for user-accessible MQs. 
+ */ + int hthead; /* index of first free entry */ + void *htpages; /* kernel vaddr */ + int htlen; /* length of htpages memory */ + void *htuva; /* user mapped vaddr */ + spinlock_t htlock; /* serialize allocation */ + + u64 adapter_hint_uva; /* access to the activity FIFO */ + + // spinlock_t aeq_lock; + // spinlock_t rnic_lock; + + u16 hint_count; + u16 hints_read; + + int init; /* TRUE if it's ready */ + char ae_cache_name[16]; + char vq_cache_name[16]; +}; + +struct c2_port { + u32 msg_enable; + struct c2_dev *c2dev; + struct net_device *netdev; + + spinlock_t tx_lock; + u32 tx_avail; + struct c2_ring tx_ring; + struct c2_ring rx_ring; + + void *mem; /* PCI memory for host rings */ + dma_addr_t dma; + unsigned long mem_size; + + u32 rx_buf_size; + + struct net_device_stats netstats; +}; + +/* + * Activity FIFO registers in BAR0. + */ +#define PCI_BAR0_HOST_HINT 0x100 +#define PCI_BAR0_ADAPTER_HINT 0x2000 + +/* + * Completion queue (CQ) flags. + */ +#define CQ_ARMED 0x01 +#define CQ_WAIT_FOR_DMA 0x80 + +/* + * The format of a hint is as follows: + * Lower 16 bits are the count of hints for the queue. + * Next 15 bits are the qp_index + * Uppermost bit depends on who reads it: + * If read by producer, then it means Full (1) or Not-Full (0) + * If read by consumer, then it means Empty (1) or Not-Empty (0) + */ +#define C2_HINT_MAKE(q_index, hint_count) (((q_index) << 16) | (hint_count)) +#define C2_HINT_GET_INDEX(hint) (((hint) & 0x7FFF0000) >> 16) +#define C2_HINT_GET_COUNT(hint) ((hint) & 0x0000FFFF) + + +/* + * The following defines the offset in SDRAM for the c2_adapter_pci_regs + * struct. 
+ */ +#define C2_ADAPTER_PCI_REGS_OFFSET 0x10000 + +#ifndef readq +static inline u64 readq(const void __iomem * addr) +{ + u64 ret = readl(addr + 4); + ret <<= 32; + ret |= readl(addr); + + return ret; +} +#endif + +#ifndef __raw_writeq +static inline void __raw_writeq(u64 val, void __iomem * addr) +{ + __raw_writel((u32) (val), addr); + __raw_writel((u32) (val >> 32), (addr + 4)); +} +#endif + +#define C2_SET_CUR_RX(c2dev, cur_rx) \ + __raw_writel(cpu_to_be32(cur_rx), c2dev->mmio_txp_ring + 4092) + +#define C2_GET_CUR_RX(c2dev) \ + be32_to_cpu(readl(c2dev->mmio_txp_ring + 4092)) + +static inline struct c2_dev *to_c2dev(struct ib_device *ibdev) +{ + return container_of(ibdev, struct c2_dev, ibdev); +} + +static inline int c2_errno(void *reply) +{ + switch (c2_wr_get_result(reply)) { + case C2_OK: + return 0; + case CCERR_NO_BUFS: + case CCERR_INSUFFICIENT_RESOURCES: + case CCERR_ZERO_RDMA_READ_RESOURCES: + return -ENOMEM; + case CCERR_MR_IN_USE: + case CCERR_QP_IN_USE: + return -EBUSY; + case CCERR_ADDR_IN_USE: + return -EADDRINUSE; + case CCERR_ADDR_NOT_AVAIL: + return -EADDRNOTAVAIL; + case CCERR_CONN_RESET: + return -ECONNRESET; + case CCERR_NOT_IMPLEMENTED: + case CCERR_INVALID_WQE: + return -ENOSYS; + case CCERR_QP_NOT_PRIVILEGED: + return -EPERM; + case CCERR_STACK_ERROR: + return -EPROTO; + case CCERR_ACCESS_VIOLATION: + case CCERR_BASE_AND_BOUNDS_VIOLATION: + return -EFAULT; + case CCERR_STAG_STATE_NOT_INVALID: + case CCERR_INVALID_ADDRESS: + case CCERR_INVALID_CQ: + case CCERR_INVALID_EP: + case CCERR_INVALID_MODIFIER: + case CCERR_INVALID_MTU: + case CCERR_INVALID_PD_ID: + case CCERR_INVALID_QP: + case CCERR_INVALID_RNIC: + case CCERR_INVALID_STAG: + return -EINVAL; + default: + return -EAGAIN; + } +} + +/* Device */ +extern int c2_register_device(struct c2_dev *c2dev); +extern void c2_unregister_device(struct c2_dev *c2dev); +extern int c2_rnic_init(struct c2_dev *c2dev); +extern void c2_rnic_term(struct c2_dev *c2dev); +extern void 
c2_rnic_interrupt(struct c2_dev *c2dev); +extern int c2_rnic_query(struct c2_dev *c2dev, struct ib_device_attr *props); +extern int c2_del_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask); +extern int c2_add_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask); + +/* QPs */ +extern int c2_alloc_qp(struct c2_dev *c2dev, struct c2_pd *pd, + struct ib_qp_init_attr *qp_attrs, struct c2_qp *qp); +extern void c2_free_qp(struct c2_dev *c2dev, struct c2_qp *qp); +extern struct ib_qp *c2_get_qp(struct ib_device *device, int qpn); +extern int c2_qp_modify(struct c2_dev *c2dev, struct c2_qp *qp, + struct ib_qp_attr *attr, int attr_mask); +extern int c2_qp_set_read_limits(struct c2_dev *c2dev, struct c2_qp *qp, + int ord, int ird); +extern int c2_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr, + struct ib_send_wr **bad_wr); +extern int c2_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *ib_wr, + struct ib_recv_wr **bad_wr); +extern int __devinit c2_init_qp_table(struct c2_dev *c2dev); +extern void __devexit c2_cleanup_qp_table(struct c2_dev *c2dev); +extern void c2_set_qp_state(struct c2_qp *, int); + +/* PDs */ +extern int c2_pd_alloc(struct c2_dev *c2dev, int privileged, struct c2_pd *pd); +extern void c2_pd_free(struct c2_dev *c2dev, struct c2_pd *pd); +extern int __devinit c2_init_pd_table(struct c2_dev *c2dev); +extern void __devexit c2_cleanup_pd_table(struct c2_dev *c2dev); + +/* CQs */ +extern int c2_init_cq(struct c2_dev *c2dev, int entries, + struct c2_ucontext *ctx, struct c2_cq *cq); +extern void c2_free_cq(struct c2_dev *c2dev, struct c2_cq *cq); +extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index); +extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index); +extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); +extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify); + +/* CM */ +extern int c2_llp_connect(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *iw_param); +extern int 
c2_llp_accept(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *iw_param); +extern int c2_llp_reject(struct iw_cm_id *cm_id, const void *pdata, + u8 pdata_len); +extern int c2_llp_service_create(struct iw_cm_id *cm_id, int backlog); +extern int c2_llp_service_destroy(struct iw_cm_id *cm_id); + +/* MM */ +extern int c2_nsmr_register_phys_kern(struct c2_dev *c2dev, u64 *addr_list, + int page_size, int pbl_depth, u32 length, + u32 off, u64 *va, enum c2_acf acf, + struct c2_mr *mr); +extern int c2_stag_dealloc(struct c2_dev *c2dev, u32 stag_index); + +/* AE */ +extern void c2_ae_event(struct c2_dev *c2dev, u32 mq_index); + +/* Allocators */ +extern u32 c2_alloc(struct c2_alloc *alloc); +extern void c2_free(struct c2_alloc *alloc, u32 obj); +extern int c2_alloc_init(struct c2_alloc *alloc, u32 num, u32 reserved); +extern void c2_alloc_cleanup(struct c2_alloc *alloc); +extern int c2_init_mqsp_pool(gfp_t gfp_mask, struct sp_chunk **root); +extern void c2_free_mqsp_pool(struct sp_chunk *root); +extern u16 *c2_alloc_mqsp(struct sp_chunk *head); +extern void c2_free_mqsp(u16 * mqsp); +extern void c2_array_cleanup(struct c2_array *array, int nent); +extern int c2_array_init(struct c2_array *array, int nent); +extern void c2_array_clear(struct c2_array *array, int index); +extern int c2_array_set(struct c2_array *array, int index, void *value); +extern void *c2_array_get(struct c2_array *array, int index); + +#endif diff --git a/drivers/infiniband/hw/amso1100/c2_ae.c b/drivers/infiniband/hw/amso1100/c2_ae.c new file mode 100644 index 0000000..d5e6729 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_ae.c @@ -0,0 +1,360 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#include "c2.h" +#include +#include "c2_status.h" +#include "c2_ae.h" + +static int c2_convert_cm_status(u32 c2_status) +{ + switch (c2_status) { + case C2_CONN_STATUS_SUCCESS: + return 0; + case C2_CONN_STATUS_REJECTED: + return -ENETRESET; + case C2_CONN_STATUS_REFUSED: + return -ECONNREFUSED; + case C2_CONN_STATUS_TIMEDOUT: + return -ETIMEDOUT; + case C2_CONN_STATUS_NETUNREACH: + return -ENETUNREACH; + case C2_CONN_STATUS_HOSTUNREACH: + return -EHOSTUNREACH; + case C2_CONN_STATUS_INVALID_RNIC: + return -EINVAL; + case C2_CONN_STATUS_INVALID_QP: + return -EINVAL; + case C2_CONN_STATUS_INVALID_QP_STATE: + return -EINVAL; + case C2_CONN_STATUS_ADDR_NOT_AVAIL: + return -EADDRNOTAVAIL; + default: + printk(KERN_ERR PFX + "%s - Unable to convert CM status: %d\n", + __FUNCTION__, c2_status); + return -EIO; + } +} + +#ifdef C2_DEBUG +static const char* to_event_str(int event) +{ + static const char* event_str[] = { + "CCAE_REMOTE_SHUTDOWN", + "CCAE_ACTIVE_CONNECT_RESULTS", + "CCAE_CONNECTION_REQUEST", + "CCAE_LLP_CLOSE_COMPLETE", + "CCAE_TERMINATE_MESSAGE_RECEIVED", + "CCAE_LLP_CONNECTION_RESET", + "CCAE_LLP_CONNECTION_LOST", + "CCAE_LLP_SEGMENT_SIZE_INVALID", + "CCAE_LLP_INVALID_CRC", + "CCAE_LLP_BAD_FPDU", + "CCAE_INVALID_DDP_VERSION", + "CCAE_INVALID_RDMA_VERSION", + "CCAE_UNEXPECTED_OPCODE", + "CCAE_INVALID_DDP_QUEUE_NUMBER", + "CCAE_RDMA_READ_NOT_ENABLED", + "CCAE_RDMA_WRITE_NOT_ENABLED", + "CCAE_RDMA_READ_TOO_SMALL", + "CCAE_NO_L_BIT", + "CCAE_TAGGED_INVALID_STAG", + "CCAE_TAGGED_BASE_BOUNDS_VIOLATION", + "CCAE_TAGGED_ACCESS_RIGHTS_VIOLATION", + "CCAE_TAGGED_INVALID_PD", + "CCAE_WRAP_ERROR", + "CCAE_BAD_CLOSE", + "CCAE_BAD_LLP_CLOSE", + "CCAE_INVALID_MSN_RANGE", + "CCAE_INVALID_MSN_GAP", + "CCAE_IRRQ_OVERFLOW", + "CCAE_IRRQ_MSN_GAP", + "CCAE_IRRQ_MSN_RANGE", + "CCAE_IRRQ_INVALID_STAG", + "CCAE_IRRQ_BASE_BOUNDS_VIOLATION", + "CCAE_IRRQ_ACCESS_RIGHTS_VIOLATION", + "CCAE_IRRQ_INVALID_PD", + "CCAE_IRRQ_WRAP_ERROR", + "CCAE_CQ_SQ_COMPLETION_OVERFLOW", + 
"CCAE_CQ_RQ_COMPLETION_ERROR", + "CCAE_QP_SRQ_WQE_ERROR", + "CCAE_QP_LOCAL_CATASTROPHIC_ERROR", + "CCAE_CQ_OVERFLOW", + "CCAE_CQ_OPERATION_ERROR", + "CCAE_SRQ_LIMIT_REACHED", + "CCAE_QP_RQ_LIMIT_REACHED", + "CCAE_SRQ_CATASTROPHIC_ERROR", + "CCAE_RNIC_CATASTROPHIC_ERROR" + }; + + if (event < CCAE_REMOTE_SHUTDOWN || + event > CCAE_RNIC_CATASTROPHIC_ERROR) + return ""; + + event -= CCAE_REMOTE_SHUTDOWN; + return event_str[event]; +} + +const char *to_qp_state_str(int state) +{ + switch (state) { + case C2_QP_STATE_IDLE: + return "C2_QP_STATE_IDLE"; + case C2_QP_STATE_CONNECTING: + return "C2_QP_STATE_CONNECTING"; + case C2_QP_STATE_RTS: + return "C2_QP_STATE_RTS"; + case C2_QP_STATE_CLOSING: + return "C2_QP_STATE_CLOSING"; + case C2_QP_STATE_TERMINATE: + return "C2_QP_STATE_TERMINATE"; + case C2_QP_STATE_ERROR: + return "C2_QP_STATE_ERROR"; + default: + return ""; + }; +} +#endif + +void c2_ae_event(struct c2_dev *c2dev, u32 mq_index) +{ + struct c2_mq *mq = c2dev->qptr_array[mq_index]; + union c2wr *wr; + void *resource_user_context; + struct iw_cm_event cm_event; + struct ib_event ib_event; + enum c2_resource_indicator resource_indicator; + enum c2_event_id event_id; + unsigned long flags; + u8 *pdata = NULL; + int status; + + /* + * retreive the message + */ + wr = c2_mq_consume(mq); + if (!wr) + return; + + memset(&ib_event, 0, sizeof(ib_event)); + memset(&cm_event, 0, sizeof(cm_event)); + + event_id = c2_wr_get_id(wr); + resource_indicator = be32_to_cpu(wr->ae.ae_generic.resource_type); + resource_user_context = + (void *) (unsigned long) wr->ae.ae_generic.user_context; + + status = cm_event.status = c2_convert_cm_status(c2_wr_get_result(wr)); + + dprintk("event received c2_dev=%p, event_id=%d, " + "resource_indicator=%d, user_context=%p, status = %d\n", + c2dev, event_id, resource_indicator, resource_user_context, + status); + + switch (resource_indicator) { + case C2_RES_IND_QP:{ + + struct c2_qp *qp = (struct c2_qp *)resource_user_context; + struct iw_cm_id 
*cm_id = qp->cm_id; + struct c2wr_ae_active_connect_results *res; + + if (!cm_id) { + dprintk("event received, but cm_id is , qp=%p!\n", + qp); + goto ignore_it; + } + dprintk("%s: event = %s, user_context=%llx, " + "resource_type=%x, " + "resource=%x, qp_state=%s\n", + __FUNCTION__, + to_event_str(event_id), + be64_to_cpu(wr->ae.ae_generic.user_context), + be32_to_cpu(wr->ae.ae_generic.resource_type), + be32_to_cpu(wr->ae.ae_generic.resource), + to_qp_state_str(be32_to_cpu(wr->ae.ae_generic.qp_state))); + + c2_set_qp_state(qp, be32_to_cpu(wr->ae.ae_generic.qp_state)); + + switch (event_id) { + case CCAE_ACTIVE_CONNECT_RESULTS: + res = &wr->ae.ae_active_connect_results; + cm_event.event = IW_CM_EVENT_CONNECT_REPLY; + cm_event.local_addr.sin_addr.s_addr = res->laddr; + cm_event.remote_addr.sin_addr.s_addr = res->raddr; + cm_event.local_addr.sin_port = res->lport; + cm_event.remote_addr.sin_port = res->rport; + if (status == 0) { + cm_event.private_data_len = + be32_to_cpu(res->private_data_length); + } else { + spin_lock_irqsave(&qp->lock, flags); + if (qp->cm_id) { + qp->cm_id->rem_ref(qp->cm_id); + qp->cm_id = NULL; + } + spin_unlock_irqrestore(&qp->lock, flags); + cm_event.private_data_len = 0; + cm_event.private_data = NULL; + } + if (cm_event.private_data_len) { + /* copy private data */ + pdata = + kmalloc(cm_event.private_data_len, + GFP_ATOMIC); + if (!pdata) { + /* Ignore the request, maybe the + * remote peer will retry */ + dprintk ("Ignored connect request -- " + "no memory for pdata" + "private_data_len=%d\n", + cm_event.private_data_len); + goto ignore_it; + } + + memcpy(pdata, res->private_data, + cm_event.private_data_len); + + cm_event.private_data = pdata; + } + if (cm_id->event_handler) + cm_id->event_handler(cm_id, &cm_event); + break; + case CCAE_TERMINATE_MESSAGE_RECEIVED: + case CCAE_CQ_SQ_COMPLETION_OVERFLOW: + ib_event.device = &c2dev->ibdev; + ib_event.element.qp = &qp->ibqp; + ib_event.event = IB_EVENT_QP_REQ_ERR; + + if 
(qp->ibqp.event_handler) + qp->ibqp.event_handler(&ib_event, + qp->ibqp. + qp_context); + break; + case CCAE_BAD_CLOSE: + case CCAE_LLP_CLOSE_COMPLETE: + case CCAE_LLP_CONNECTION_RESET: + case CCAE_LLP_CONNECTION_LOST: + BUG_ON(cm_id == NULL); + BUG_ON(cm_id->event_handler==(void*)0x6b6b6b6b); + + spin_lock_irqsave(&qp->lock, flags); + if (qp->cm_id) { + qp->cm_id->rem_ref(qp->cm_id); + qp->cm_id = NULL; + } + spin_unlock_irqrestore(&qp->lock, flags); + cm_event.event = IW_CM_EVENT_CLOSE; + cm_event.status = 0; + if (cm_id->event_handler) + cm_id->event_handler(cm_id, &cm_event); + break; + default: + BUG_ON(1); + dprintk("%s:%d Unexpected event_id=%d on QP=%p, " + "CM_ID=%p\n", + __FUNCTION__, __LINE__, + event_id, qp, cm_id); + break; + } + break; + } + + case C2_RES_IND_EP:{ + + struct c2wr_ae_connection_request *req = + &wr->ae.ae_connection_request; + struct iw_cm_id *cm_id = + (struct iw_cm_id *)resource_user_context; + + dprintk("C2_RES_IND_EP event_id=%d\n", event_id); + if (event_id != CCAE_CONNECTION_REQUEST) { + dprintk("%s: Invalid event_id: %d\n", + __FUNCTION__, event_id); + break; + } + cm_event.event = IW_CM_EVENT_CONNECT_REQUEST; + cm_event.provider_data = (void*)(unsigned long)req->cr_handle; + cm_event.local_addr.sin_addr.s_addr = req->laddr; + cm_event.remote_addr.sin_addr.s_addr = req->raddr; + cm_event.local_addr.sin_port = req->lport; + cm_event.remote_addr.sin_port = req->rport; + cm_event.private_data_len = + be32_to_cpu(req->private_data_length); + + if (cm_event.private_data_len) { + pdata = + kmalloc(cm_event.private_data_len, + GFP_ATOMIC); + if (!pdata) { + /* Ignore the request, maybe the remote peer + * will retry */ + dprintk ("Ignored connect request -- " + "no memory for pdata" + "private_data_len=%d\n", + cm_event.private_data_len); + goto ignore_it; + } + memcpy(pdata, + req->private_data, + cm_event.private_data_len); + + cm_event.private_data = pdata; + } + if (cm_id->event_handler) + cm_id->event_handler(cm_id, &cm_event); + 
break; + } + + case C2_RES_IND_CQ:{ + struct c2_cq *cq = + (struct c2_cq *) resource_user_context; + + dprintk("IB_EVENT_CQ_ERR\n"); + ib_event.device = &c2dev->ibdev; + ib_event.element.cq = &cq->ibcq; + ib_event.event = IB_EVENT_CQ_ERR; + + if (cq->ibcq.event_handler) + cq->ibcq.event_handler(&ib_event, + cq->ibcq.cq_context); + break; + } + default: + printk("Bad resource indicator = %d\n", + resource_indicator); + break; + } + + ignore_it: + c2_mq_free(mq); +} diff --git a/drivers/infiniband/hw/amso1100/c2_intr.c b/drivers/infiniband/hw/amso1100/c2_intr.c new file mode 100644 index 0000000..5306a15 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_intr.c @@ -0,0 +1,211 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#include "c2.h" +#include +#include "c2_vq.h" + +static void handle_mq(struct c2_dev *c2dev, u32 index); +static void handle_vq(struct c2_dev *c2dev, u32 mq_index); + +/* + * Handle RNIC interrupts + */ +void c2_rnic_interrupt(struct c2_dev *c2dev) +{ + unsigned int mq_index; + + while (c2dev->hints_read != be16_to_cpu(c2dev->hint_count)) { + mq_index = readl(c2dev->regs + PCI_BAR0_HOST_HINT); + if (mq_index & 0x80000000) { + break; + } + + c2dev->hints_read++; + handle_mq(c2dev, mq_index); + } + +} + +/* + * Top level MQ handler + */ +static void handle_mq(struct c2_dev *c2dev, u32 mq_index) +{ + if (c2dev->qptr_array[mq_index] == NULL) { + dprintk(KERN_INFO "handle_mq: stray activity for mq_index=%d\n", + mq_index); + return; + } + + switch (mq_index) { + case (0): + /* + * An index of 0 in the activity queue + * indicates the req vq now has messages + * available... + * + * Wake up any waiters waiting on req VQ + * message availability. + */ + wake_up(&c2dev->req_vq_wo); + break; + case (1): + handle_vq(c2dev, mq_index); + break; + case (2): + /* We have to purge the VQ in case there are pending + * accept reply requests that would result in the + * generation of an ESTABLISHED event. If we don't + * generate these first, a CLOSE event could end up + * being delivered before the ESTABLISHED event. + */ + handle_vq(c2dev, 1); + + c2_ae_event(c2dev, mq_index); + break; + default: + /* There is no event synchronization between CQ events + * and AE or CM events. In fact, CQE could be + * delivered for all of the I/O up to and including the + * FLUSH for a peer disconnect prior to the ESTABLISHED + * event being delivered to the app. 
The reason for this + * is that CM events are delivered on a thread, while AE + * and CQ events are delivered in interrupt context. + */ + c2_cq_event(c2dev, mq_index); + break; + } + + return; +} + +/* + * Handles verbs WR replies. + */ +static void handle_vq(struct c2_dev *c2dev, u32 mq_index) +{ + void *adapter_msg, *reply_msg; + struct c2wr_hdr *host_msg; + struct c2wr_hdr tmp; + struct c2_mq *reply_vq; + struct c2_vq_req *req; + struct iw_cm_event cm_event; + int err; + + reply_vq = (struct c2_mq *) c2dev->qptr_array[mq_index]; + + /* + * get next msg from mq_index into adapter_msg. + * don't free it yet. + */ + adapter_msg = c2_mq_consume(reply_vq); + if (adapter_msg == NULL) { + return; + } + + host_msg = vq_repbuf_alloc(c2dev); + + /* + * If we can't get a host buffer, then we'll still + * wakeup the waiter, we just won't give him the msg. + * It is assumed the waiter will deal with this... + */ + if (!host_msg) { + dprintk("handle_vq: no repbufs!\n"); + + /* + * just copy the WR header into a local variable. + * this allows us to still demux on the context + */ + host_msg = &tmp; + memcpy(host_msg, adapter_msg, sizeof(tmp)); + reply_msg = NULL; + } else { + memcpy(host_msg, adapter_msg, reply_vq->msg_size); + reply_msg = host_msg; + } + + /* + * consume the msg from the MQ + */ + c2_mq_free(reply_vq); + + /* + * wakeup the waiter. + */ + req = (struct c2_vq_req *) (unsigned long) host_msg->context; + if (req == NULL) { + /* + * We should never get here, as the adapter should + * never send us a reply that we're not expecting. 
+ */ + vq_repbuf_free(c2dev, host_msg); + dprintk("handle_vq: UNEXPECTEDLY got NULL req\n"); + return; + } + + err = c2_errno(reply_msg); + if (!err) switch (req->event) { + case IW_CM_EVENT_ESTABLISHED: + BUG_ON(!req->qp); + c2_set_qp_state(req->qp, + C2_QP_STATE_RTS); + case IW_CM_EVENT_CLOSE: + BUG_ON(!req->cm_id); + /* + * Move the QP to RTS if this is + * the established event + */ + cm_event.event = req->event; + cm_event.status = 0; + cm_event.local_addr = req->cm_id->local_addr; + cm_event.remote_addr = req->cm_id->remote_addr; + cm_event.private_data = NULL; + cm_event.private_data_len = 0; + BUG_ON(req->cm_id->event_handler == NULL); + req->cm_id->event_handler(req->cm_id, &cm_event); + break; + default: + break; + } + + req->reply_msg = (u64) (unsigned long) (reply_msg); + atomic_set(&req->reply_ready, 1); + wake_up(&req->wait_object); + + /* + * If the request was cancelled, then this put will + * free the vq_req memory...and reply_msg!!! + */ + vq_req_put(c2dev, req); +} diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c new file mode 100644 index 0000000..6f255b0 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -0,0 +1,720 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#ifdef NETEVENT_NOTIFIER +#include +#include +#include +#endif + + +#include +#include +#include +#include +#include "c2.h" +#include "c2_vq.h" + +/* Device capabilities */ +#define C2_MIN_PAGESIZE 1024 + +#define C2_MAX_MRS 32768 +#define C2_MAX_QPS 16000 +#define C2_MAX_WQE_SZ 256 +#define C2_MAX_QP_WR ((128*1024)/C2_MAX_WQE_SZ) +#define C2_MAX_SGES 4 +#define C2_MAX_SGE_RD 1 +#define C2_MAX_CQS 32768 +#define C2_MAX_CQES 4096 +#define C2_MAX_PDS 16384 + +/* + * Send the adapter INIT message to the amso1100 + */ +static int c2_adapter_init(struct c2_dev *c2dev) +{ + struct c2wr_init_req wr; + int err; + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_INIT); + wr.hdr.context = 0; + wr.hint_count = cpu_to_be64(__pa(&c2dev->hint_count)); + wr.q0_host_shared = cpu_to_be64(__pa(c2dev->req_vq.shared)); + wr.q1_host_shared = cpu_to_be64(__pa(c2dev->rep_vq.shared)); + wr.q1_host_msg_pool = cpu_to_be64(__pa(c2dev->rep_vq.msg_pool.host)); + wr.q2_host_shared = cpu_to_be64(__pa(c2dev->aeq.shared)); + wr.q2_host_msg_pool = cpu_to_be64(__pa(c2dev->aeq.msg_pool.host)); 
+ + /* Post the init message */ + err = vq_send_wr(c2dev, (union c2wr *) & wr); + + return err; +} + +/* + * Send the adapter TERM message to the amso1100 + */ +static void c2_adapter_term(struct c2_dev *c2dev) +{ + struct c2wr_init_req wr; + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_TERM); + wr.hdr.context = 0; + + /* Post the term message */ + vq_send_wr(c2dev, (union c2wr *) & wr); + c2dev->init = 0; + + return; +} + +/* + * Query the adapter + */ +int c2_rnic_query(struct c2_dev *c2dev, + struct ib_device_attr *props) +{ + struct c2_vq_req *vq_req; + struct c2wr_rnic_query_req wr; + struct c2wr_rnic_query_rep *reply; + int err; + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + c2_wr_set_id(&wr, CCWR_RNIC_QUERY); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) &wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail1; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail1; + + reply = + (struct c2wr_rnic_query_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail1; + } + if ((err = c2_errno(reply)) != 0) + goto bail2; + + props->fw_ver = + ((u64)be32_to_cpu(reply->fw_ver_major) << 32) | + ((be32_to_cpu(reply->fw_ver_minor) & 0xFFFF) << 16) | + (be32_to_cpu(reply->fw_ver_patch) & 0xFFFF); + memcpy(&props->sys_image_guid, c2dev->netdev->dev_addr, 6); + props->max_mr_size = 0xFFFFFFFF; + props->page_size_cap = ~(C2_MIN_PAGESIZE-1); + props->vendor_id = be32_to_cpu(reply->vendor_id); + props->vendor_part_id = be32_to_cpu(reply->part_number); + props->hw_ver = be32_to_cpu(reply->hw_version); + props->max_qp = be32_to_cpu(reply->max_qps); + props->max_qp_wr = be32_to_cpu(reply->max_qp_depth); + props->device_cap_flags = c2dev->device_cap_flags; + props->max_sge = C2_MAX_SGES; + props->max_sge_rd = C2_MAX_SGE_RD; + props->max_cq = be32_to_cpu(reply->max_cqs); + props->max_cqe = 
be32_to_cpu(reply->max_cq_depth); + props->max_mr = be32_to_cpu(reply->max_mrs); + props->max_pd = be32_to_cpu(reply->max_pds); + props->max_qp_rd_atom = be32_to_cpu(reply->max_qp_ird); + props->max_ee_rd_atom = 0; + props->max_res_rd_atom = be32_to_cpu(reply->max_global_ird); + props->max_qp_init_rd_atom = be32_to_cpu(reply->max_qp_ord); + props->max_ee_init_rd_atom = 0; + props->atomic_cap = IB_ATOMIC_NONE; + props->max_ee = 0; + props->max_rdd = 0; + props->max_mw = be32_to_cpu(reply->max_mws); + props->max_raw_ipv6_qp = 0; + props->max_raw_ethy_qp = 0; + props->max_mcast_grp = 0; + props->max_mcast_qp_attach = 0; + props->max_total_mcast_qp_attach = 0; + props->max_ah = 0; + props->max_fmr = 0; + props->max_map_per_fmr = 0; + props->max_srq = 0; + props->max_srq_wr = 0; + props->max_srq_sge = 0; + props->max_pkeys = 0; + props->local_ca_ack_delay = 0; + + bail2: + vq_repbuf_free(c2dev, reply); + + bail1: + vq_req_free(c2dev, vq_req); + return err; +} + +/* + * Add an IP address to the RNIC interface + */ +int c2_add_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask) +{ + struct c2_vq_req *vq_req; + struct c2wr_rnic_setconfig_req *wr; + struct c2wr_rnic_setconfig_rep *reply; + struct c2_netaddr netaddr; + int err, len; + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + len = sizeof(struct c2_netaddr); + wr = kmalloc(c2dev->req_vq.msg_size, GFP_KERNEL); + if (!wr) { + err = -ENOMEM; + goto bail0; + } + + c2_wr_set_id(wr, CCWR_RNIC_SETCONFIG); + wr->hdr.context = (unsigned long) vq_req; + wr->rnic_handle = c2dev->adapter_handle; + wr->option = cpu_to_be32(C2_CFG_ADD_ADDR); + + netaddr.ip_addr = inaddr; + netaddr.netmask = inmask; + netaddr.mtu = 0; + + memcpy(wr->data, &netaddr, len); + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail1; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail1; + + reply = + (struct c2wr_rnic_setconfig_rep *) 
(unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail1; + } + + err = c2_errno(reply); + vq_repbuf_free(c2dev, reply); + + bail1: + kfree(wr); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +/* + * Delete an IP address from the RNIC interface + */ +int c2_del_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask) +{ + struct c2_vq_req *vq_req; + struct c2wr_rnic_setconfig_req *wr; + struct c2wr_rnic_setconfig_rep *reply; + struct c2_netaddr netaddr; + int err, len; + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + len = sizeof(struct c2_netaddr); + wr = kmalloc(c2dev->req_vq.msg_size, GFP_KERNEL); + if (!wr) { + err = -ENOMEM; + goto bail0; + } + + c2_wr_set_id(wr, CCWR_RNIC_SETCONFIG); + wr->hdr.context = (unsigned long) vq_req; + wr->rnic_handle = c2dev->adapter_handle; + wr->option = cpu_to_be32(C2_CFG_DEL_ADDR); + + netaddr.ip_addr = inaddr; + netaddr.netmask = inmask; + netaddr.mtu = 0; + + memcpy(wr->data, &netaddr, len); + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail1; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail1; + + reply = + (struct c2wr_rnic_setconfig_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail1; + } + + err = c2_errno(reply); + vq_repbuf_free(c2dev, reply); + + bail1: + kfree(wr); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +/* + * Open a single RNIC instance to use with all + * low level openib calls + */ +static int c2_rnic_open(struct c2_dev *c2dev) +{ + struct c2_vq_req *vq_req; + union c2wr wr; + struct c2wr_rnic_open_rep *reply; + int err; + + vq_req = vq_req_alloc(c2dev); + if (vq_req == NULL) { + return -ENOMEM; + } + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_RNIC_OPEN); + wr.rnic_open.req.hdr.context = (unsigned long) (vq_req); + wr.rnic_open.req.flags = cpu_to_be16(RNIC_PRIV_MODE); + 
wr.rnic_open.req.port_num = cpu_to_be16(0); + wr.rnic_open.req.user_context = (unsigned long) c2dev; + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, &wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) { + goto bail0; + } + + reply = (struct c2wr_rnic_open_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail0; + } + + if ((err = c2_errno(reply)) != 0) { + goto bail1; + } + + c2dev->adapter_handle = reply->rnic_handle; + + bail1: + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +/* + * Close the RNIC instance + */ +static int c2_rnic_close(struct c2_dev *c2dev) +{ + struct c2_vq_req *vq_req; + union c2wr wr; + struct c2wr_rnic_close_rep *reply; + int err; + + vq_req = vq_req_alloc(c2dev); + if (vq_req == NULL) { + return -ENOMEM; + } + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_RNIC_CLOSE); + wr.rnic_close.req.hdr.context = (unsigned long) vq_req; + wr.rnic_close.req.rnic_handle = c2dev->adapter_handle; + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, &wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) { + goto bail0; + } + + reply = (struct c2wr_rnic_close_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail0; + } + + if ((err = c2_errno(reply)) != 0) { + goto bail1; + } + + c2dev->adapter_handle = 0; + + bail1: + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +#ifdef NETEVENT_NOTIFIER +static int netevent_notifier(struct notifier_block *self, unsigned long event, + void *data) +{ + int i; + u8 *ha; + struct neighbour *neigh = data; + struct netevent_redirect *redir = data; + struct netevent_route_change *rev = data; + + switch (event) { + case NETEVENT_ROUTE_UPDATE: + printk(KERN_ERR "NETEVENT_ROUTE_UPDATE:\n"); + printk(KERN_ERR "fib_flags : 
%d\n", + rev->fib_info->fib_flags); + printk(KERN_ERR "fib_protocol : %d\n", + rev->fib_info->fib_protocol); + printk(KERN_ERR "fib_prefsrc : %08x\n", + rev->fib_info->fib_prefsrc); + printk(KERN_ERR "fib_priority : %d\n", + rev->fib_info->fib_priority); + break; + + case NETEVENT_NEIGH_UPDATE: + printk(KERN_ERR "NETEVENT_NEIGH_UPDATE:\n"); + printk(KERN_ERR "nud_state : %d\n", neigh->nud_state); + printk(KERN_ERR "refcnt : %d\n", neigh->refcnt); + printk(KERN_ERR "used : %d\n", neigh->used); + printk(KERN_ERR "confirmed : %d\n", neigh->confirmed); + printk(KERN_ERR " ha: "); + for (i = 0; i < neigh->dev->addr_len; i += 4) { + ha = &neigh->ha[i]; + printk("%02x:%02x:%02x:%02x:", ha[0], ha[1], ha[2], + ha[3]); + } + printk("\n"); + + printk(KERN_ERR "%8s: ", neigh->dev->name); + for (i = 0; i < neigh->dev->addr_len; i += 4) { + ha = &neigh->ha[i]; + printk("%02x:%02x:%02x:%02x:", ha[0], ha[1], ha[2], + ha[3]); + } + printk("\n"); + break; + + case NETEVENT_REDIRECT: + printk(KERN_ERR "NETEVENT_REDIRECT:\n"); + printk(KERN_ERR "old: "); + for (i = 0; i < redir->old->neighbour->dev->addr_len; i += 4) { + ha = &redir->old->neighbour->ha[i]; + printk("%02x:%02x:%02x:%02x:", ha[0], ha[1], ha[2], + ha[3]); + } + printk("\n"); + + printk(KERN_ERR "new: "); + for (i = 0; i < redir->new->neighbour->dev->addr_len; i += 4) { + ha = &redir->new->neighbour->ha[i]; + printk("%02x:%02x:%02x:%02x:", ha[0], ha[1], ha[2], + ha[3]); + } + printk("\n"); + break; + + default: + printk(KERN_ERR "NETEVENT_WTFO:\n"); + } + + return NOTIFY_DONE; +} + +static struct notifier_block nb = { + .notifier_call = netevent_notifier, +}; +#endif +/* + * Called by c2_probe to initialize the RNIC. This principally + * involves initializing the various limits and resource pools that + * comprise the RNIC instance. 
+ */ +int c2_rnic_init(struct c2_dev *c2dev) +{ + int err; + u32 qsize, msgsize; + void *q1_pages; + void *q2_pages; + void __iomem *mmio_regs; + + /* Device capabilities */ + c2dev->device_cap_flags = + (IB_DEVICE_RESIZE_MAX_WR | + IB_DEVICE_CURR_QP_STATE_MOD | + IB_DEVICE_SYS_IMAGE_GUID | + IB_DEVICE_ZERO_STAG | + IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW); + + /* Allocate the qptr_array */ + c2dev->qptr_array = vmalloc(C2_MAX_CQS * sizeof(void *)); + if (!c2dev->qptr_array) { + return -ENOMEM; + } + + /* Initialize the qptr_array */ + memset(c2dev->qptr_array, 0, C2_MAX_CQS * sizeof(void *)); + c2dev->qptr_array[0] = (void *) &c2dev->req_vq; + c2dev->qptr_array[1] = (void *) &c2dev->rep_vq; + c2dev->qptr_array[2] = (void *) &c2dev->aeq; + + /* Initialize data structures */ + init_waitqueue_head(&c2dev->req_vq_wo); + spin_lock_init(&c2dev->vqlock); + spin_lock_init(&c2dev->lock); + + /* Allocate MQ shared pointer pool for kernel clients. User + * mode client pools are hung off the user context + */ + err = c2_init_mqsp_pool(GFP_KERNEL, &c2dev->kern_mqsp_pool); + if (err) { + goto bail0; + } + + /* Allocate shared pointers for Q0, Q1, and Q2 from + * the shared pointer pool. 
+ */ + c2dev->req_vq.shared = c2_alloc_mqsp(c2dev->kern_mqsp_pool); + c2dev->rep_vq.shared = c2_alloc_mqsp(c2dev->kern_mqsp_pool); + c2dev->aeq.shared = c2_alloc_mqsp(c2dev->kern_mqsp_pool); + if (!c2dev->req_vq.shared || + !c2dev->rep_vq.shared || !c2dev->aeq.shared) { + err = -ENOMEM; + goto bail1; + } + + mmio_regs = c2dev->kva; + /* Initialize the Verbs Request Queue */ + c2_mq_req_init(&c2dev->req_vq, 0, + be32_to_cpu(readl(mmio_regs + C2_REGS_Q0_QSIZE)), + be32_to_cpu(readl(mmio_regs + C2_REGS_Q0_MSGSIZE)), + mmio_regs + + be32_to_cpu(readl(mmio_regs + C2_REGS_Q0_POOLSTART)), + mmio_regs + + be32_to_cpu(readl(mmio_regs + C2_REGS_Q0_SHARED)), + C2_MQ_ADAPTER_TARGET); + + /* Initialize the Verbs Reply Queue */ + qsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_QSIZE)); + msgsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_MSGSIZE)); + q1_pages = kmalloc(qsize * msgsize, GFP_KERNEL); + if (!q1_pages) { + err = -ENOMEM; + goto bail1; + } + c2_mq_rep_init(&c2dev->rep_vq, + 1, + qsize, + msgsize, + q1_pages, + mmio_regs + + be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_SHARED)), + C2_MQ_HOST_TARGET); + + /* Initialize the Asynchronus Event Queue */ + qsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q2_QSIZE)); + msgsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q2_MSGSIZE)); + q2_pages = kmalloc(qsize * msgsize, GFP_KERNEL); + if (!q2_pages) { + err = -ENOMEM; + goto bail2; + } + c2_mq_rep_init(&c2dev->aeq, + 2, + qsize, + msgsize, + q2_pages, + mmio_regs + + be32_to_cpu(readl(mmio_regs + C2_REGS_Q2_SHARED)), + C2_MQ_HOST_TARGET); + + /* Initialize the verbs request allocator */ + err = vq_init(c2dev); + if (err) + goto bail3; + + /* Enable interrupts on the adapter */ + writel(0, c2dev->regs + C2_IDIS); + + /* create the WR init message */ + err = c2_adapter_init(c2dev); + if (err) + goto bail4; + c2dev->init++; + + /* open an adapter instance */ + err = c2_rnic_open(c2dev); + if (err) + goto bail4; + + /* Initialize cached the adapter limits */ + if (c2_rnic_query(c2dev, 
&c2dev->props)) + goto bail4; + + /* Initialize the PD pool */ + err = c2_init_pd_table(c2dev); + if (err) + goto bail5; + + /* Initialize the QP pool */ + err = c2_init_qp_table(c2dev); + if (err) + goto bail6; + +#ifdef NETEVENT_NOTIFIER + register_netevent_notifier(&nb); +#endif + return 0; + + bail6: + c2_cleanup_pd_table(c2dev); + bail5: + c2_rnic_close(c2dev); + bail4: + vq_term(c2dev); + bail3: + kfree(q2_pages); + bail2: + kfree(q1_pages); + bail1: + c2_free_mqsp_pool(c2dev->kern_mqsp_pool); + bail0: + vfree(c2dev->qptr_array); + + return err; +} + +/* + * Called by c2_remove to clean up the RNIC resources. + */ +void c2_rnic_term(struct c2_dev *c2dev) +{ +#ifdef NETEVENT_NOTIFIER + unregister_netevent_notifier(&nb); +#endif + + /* Close the open adapter instance */ + c2_rnic_close(c2dev); + + /* Send the TERM message to the adapter */ + c2_adapter_term(c2dev); + + /* Disable interrupts on the adapter */ + writel(1, c2dev->regs + C2_IDIS); + + /* Free the QP pool */ + c2_cleanup_qp_table(c2dev); + + /* Free the PD pool */ + c2_cleanup_pd_table(c2dev); + + /* Free the verbs request allocator */ + vq_term(c2dev); + + /* Free the asynchronous event queue */ + kfree(c2dev->aeq.msg_pool.host); + + /* Free the verbs reply queue */ + kfree(c2dev->rep_vq.msg_pool.host); + + /* Free the MQ shared pointer pool */ + c2_free_mqsp_pool(c2dev->kern_mqsp_pool); + + /* Free the qptr_array */ + vfree(c2dev->qptr_array); + + return; +} From swise at opengridcomputing.com Wed May 31 11:27:40 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 13:27:40 -0500 Subject: [openib-general] [PATCH 3/7] AMSO1100 WR / Event Definitions. 
In-Reply-To: <20060531182733.3652.54755.stgit@stevo-desktop> References: <20060531182733.3652.54755.stgit@stevo-desktop> Message-ID: <20060531182740.3652.64865.stgit@stevo-desktop> --- drivers/infiniband/hw/amso1100/c2_ae.h | 108 ++ drivers/infiniband/hw/amso1100/c2_status.h | 158 +++ drivers/infiniband/hw/amso1100/c2_wr.h | 1523 ++++++++++++++++++++++++++++ 3 files changed, 1789 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_ae.h b/drivers/infiniband/hw/amso1100/c2_ae.h new file mode 100644 index 0000000..3a065c3 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_ae.h @@ -0,0 +1,108 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#ifndef _C2_AE_H_ +#define _C2_AE_H_ + +/* + * WARNING: If you change this file, also bump C2_IVN_BASE + * in common/include/clustercore/c2_ivn.h. + */ + +/* + * Asynchronous Event Identifiers + * + * These start at 0x80 only so it's obvious from inspection that + * they are not work-request statuses. This isn't critical. + * + * NOTE: these event id's must fit in eight bits. + */ +enum c2_event_id { + CCAE_REMOTE_SHUTDOWN = 0x80, + CCAE_ACTIVE_CONNECT_RESULTS, + CCAE_CONNECTION_REQUEST, + CCAE_LLP_CLOSE_COMPLETE, + CCAE_TERMINATE_MESSAGE_RECEIVED, + CCAE_LLP_CONNECTION_RESET, + CCAE_LLP_CONNECTION_LOST, + CCAE_LLP_SEGMENT_SIZE_INVALID, + CCAE_LLP_INVALID_CRC, + CCAE_LLP_BAD_FPDU, + CCAE_INVALID_DDP_VERSION, + CCAE_INVALID_RDMA_VERSION, + CCAE_UNEXPECTED_OPCODE, + CCAE_INVALID_DDP_QUEUE_NUMBER, + CCAE_RDMA_READ_NOT_ENABLED, + CCAE_RDMA_WRITE_NOT_ENABLED, + CCAE_RDMA_READ_TOO_SMALL, + CCAE_NO_L_BIT, + CCAE_TAGGED_INVALID_STAG, + CCAE_TAGGED_BASE_BOUNDS_VIOLATION, + CCAE_TAGGED_ACCESS_RIGHTS_VIOLATION, + CCAE_TAGGED_INVALID_PD, + CCAE_WRAP_ERROR, + CCAE_BAD_CLOSE, + CCAE_BAD_LLP_CLOSE, + CCAE_INVALID_MSN_RANGE, + CCAE_INVALID_MSN_GAP, + CCAE_IRRQ_OVERFLOW, + CCAE_IRRQ_MSN_GAP, + CCAE_IRRQ_MSN_RANGE, + CCAE_IRRQ_INVALID_STAG, + CCAE_IRRQ_BASE_BOUNDS_VIOLATION, + CCAE_IRRQ_ACCESS_RIGHTS_VIOLATION, + CCAE_IRRQ_INVALID_PD, + CCAE_IRRQ_WRAP_ERROR, + CCAE_CQ_SQ_COMPLETION_OVERFLOW, + CCAE_CQ_RQ_COMPLETION_ERROR, + CCAE_QP_SRQ_WQE_ERROR, + CCAE_QP_LOCAL_CATASTROPHIC_ERROR, + CCAE_CQ_OVERFLOW, + CCAE_CQ_OPERATION_ERROR, + CCAE_SRQ_LIMIT_REACHED, + CCAE_QP_RQ_LIMIT_REACHED, + CCAE_SRQ_CATASTROPHIC_ERROR, + CCAE_RNIC_CATASTROPHIC_ERROR +/* WARNING If you add more id's, make sure their values fit in 
eight bits. */ +}; + +/* + * Resource Indicators and Identifiers + */ +enum c2_resource_indicator { + C2_RES_IND_QP = 1, + C2_RES_IND_EP, + C2_RES_IND_CQ, + C2_RES_IND_SRQ, +}; + +#endif /* _C2_AE_H_ */ diff --git a/drivers/infiniband/hw/amso1100/c2_status.h b/drivers/infiniband/hw/amso1100/c2_status.h new file mode 100644 index 0000000..6ee4aa9 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_status.h @@ -0,0 +1,158 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef _C2_STATUS_H_ +#define _C2_STATUS_H_ + +/* + * Verbs Status Codes + */ +enum c2_status { + C2_OK = 0, /* This must be zero */ + CCERR_INSUFFICIENT_RESOURCES = 1, + CCERR_INVALID_MODIFIER = 2, + CCERR_INVALID_MODE = 3, + CCERR_IN_USE = 4, + CCERR_INVALID_RNIC = 5, + CCERR_INTERRUPTED_OPERATION = 6, + CCERR_INVALID_EH = 7, + CCERR_INVALID_CQ = 8, + CCERR_CQ_EMPTY = 9, + CCERR_NOT_IMPLEMENTED = 10, + CCERR_CQ_DEPTH_TOO_SMALL = 11, + CCERR_PD_IN_USE = 12, + CCERR_INVALID_PD = 13, + CCERR_INVALID_SRQ = 14, + CCERR_INVALID_ADDRESS = 15, + CCERR_INVALID_NETMASK = 16, + CCERR_INVALID_QP = 17, + CCERR_INVALID_QP_STATE = 18, + CCERR_TOO_MANY_WRS_POSTED = 19, + CCERR_INVALID_WR_TYPE = 20, + CCERR_INVALID_SGL_LENGTH = 21, + CCERR_INVALID_SQ_DEPTH = 22, + CCERR_INVALID_RQ_DEPTH = 23, + CCERR_INVALID_ORD = 24, + CCERR_INVALID_IRD = 25, + CCERR_QP_ATTR_CANNOT_CHANGE = 26, + CCERR_INVALID_STAG = 27, + CCERR_QP_IN_USE = 28, + CCERR_OUTSTANDING_WRS = 29, + CCERR_STAG_IN_USE = 30, + CCERR_INVALID_STAG_INDEX = 31, + CCERR_INVALID_SGL_FORMAT = 32, + CCERR_ADAPTER_TIMEOUT = 33, + CCERR_INVALID_CQ_DEPTH = 34, + CCERR_INVALID_PRIVATE_DATA_LENGTH = 35, + CCERR_INVALID_EP = 36, + CCERR_MR_IN_USE = CCERR_STAG_IN_USE, + CCERR_FLUSHED = 38, + CCERR_INVALID_WQE = 39, + CCERR_LOCAL_QP_CATASTROPHIC_ERROR = 40, + CCERR_REMOTE_TERMINATION_ERROR = 41, + CCERR_BASE_AND_BOUNDS_VIOLATION = 42, + CCERR_ACCESS_VIOLATION = 43, + CCERR_INVALID_PD_ID = 44, + CCERR_WRAP_ERROR = 45, + CCERR_INV_STAG_ACCESS_ERROR = 46, + CCERR_ZERO_RDMA_READ_RESOURCES = 47, + CCERR_QP_NOT_PRIVILEGED = 48, + CCERR_STAG_STATE_NOT_INVALID = 49, + CCERR_INVALID_PAGE_SIZE = 50, + CCERR_INVALID_BUFFER_SIZE = 51, + CCERR_INVALID_PBE = 52, + CCERR_INVALID_FBO = 53, + CCERR_INVALID_LENGTH = 54, + CCERR_INVALID_ACCESS_RIGHTS = 55, + CCERR_PBL_TOO_BIG = 56, + CCERR_INVALID_VA = 57, + CCERR_INVALID_REGION = 58, + CCERR_INVALID_WINDOW = 59, + CCERR_TOTAL_LENGTH_TOO_BIG = 60, + CCERR_INVALID_QP_ID = 61, + CCERR_ADDR_IN_USE = 
62, + CCERR_ADDR_NOT_AVAIL = 63, + CCERR_NET_DOWN = 64, + CCERR_NET_UNREACHABLE = 65, + CCERR_CONN_ABORTED = 66, + CCERR_CONN_RESET = 67, + CCERR_NO_BUFS = 68, + CCERR_CONN_TIMEDOUT = 69, + CCERR_CONN_REFUSED = 70, + CCERR_HOST_UNREACHABLE = 71, + CCERR_INVALID_SEND_SGL_DEPTH = 72, + CCERR_INVALID_RECV_SGL_DEPTH = 73, + CCERR_INVALID_RDMA_WRITE_SGL_DEPTH = 74, + CCERR_INSUFFICIENT_PRIVILEGES = 75, + CCERR_STACK_ERROR = 76, + CCERR_INVALID_VERSION = 77, + CCERR_INVALID_MTU = 78, + CCERR_INVALID_IMAGE = 79, + CCERR_PENDING = 98, /* not an error; used internally by adapter */ + CCERR_DEFER = 99, /* not an error; used internally by adapter */ + CCERR_FAILED_WRITE = 100, + CCERR_FAILED_ERASE = 101, + CCERR_FAILED_VERIFICATION = 102, + CCERR_NOT_FOUND = 103, + +}; + +/* + * CCAE_ACTIVE_CONNECT_RESULTS status result codes. + */ +enum c2_connect_status { + C2_CONN_STATUS_SUCCESS = C2_OK, + C2_CONN_STATUS_NO_MEM = CCERR_INSUFFICIENT_RESOURCES, + C2_CONN_STATUS_TIMEDOUT = CCERR_CONN_TIMEDOUT, + C2_CONN_STATUS_REFUSED = CCERR_CONN_REFUSED, + C2_CONN_STATUS_NETUNREACH = CCERR_NET_UNREACHABLE, + C2_CONN_STATUS_HOSTUNREACH = CCERR_HOST_UNREACHABLE, + C2_CONN_STATUS_INVALID_RNIC = CCERR_INVALID_RNIC, + C2_CONN_STATUS_INVALID_QP = CCERR_INVALID_QP, + C2_CONN_STATUS_INVALID_QP_STATE = CCERR_INVALID_QP_STATE, + C2_CONN_STATUS_REJECTED = CCERR_CONN_RESET, + C2_CONN_STATUS_ADDR_NOT_AVAIL = CCERR_ADDR_NOT_AVAIL, +}; + +/* + * Flash programming status codes. + */ +enum c2_flash_status { + C2_FLASH_STATUS_SUCCESS = 0x0000, + C2_FLASH_STATUS_VERIFY_ERR = 0x0002, + C2_FLASH_STATUS_IMAGE_ERR = 0x0004, + C2_FLASH_STATUS_ECLBS = 0x0400, + C2_FLASH_STATUS_PSLBS = 0x0800, + C2_FLASH_STATUS_VPENS = 0x1000, +}; + +#endif /* _C2_STATUS_H_ */ diff --git a/drivers/infiniband/hw/amso1100/c2_wr.h b/drivers/infiniband/hw/amso1100/c2_wr.h new file mode 100644 index 0000000..9d6468d --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_wr.h @@ -0,0 +1,1523 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. 
All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#ifndef _C2_WR_H_ +#define _C2_WR_H_ + +#ifdef CCDEBUG +#define CCWR_MAGIC 0xb07700b0 +#endif + +#define C2_QP_NO_ATTR_CHANGE 0xFFFFFFFF + +/* Maximum allowed size in bytes of private_data exchange + * on connect. + */ +#define C2_MAX_PRIVATE_DATA_SIZE 200 + +/* + * These types are shared among the adapter, host, and CCIL consumer. 
+ */ +enum c2_cq_notification_type { + C2_CQ_NOTIFICATION_TYPE_NONE = 1, + C2_CQ_NOTIFICATION_TYPE_NEXT, + C2_CQ_NOTIFICATION_TYPE_NEXT_SE +}; + +enum c2_setconfig_cmd { + C2_CFG_ADD_ADDR = 1, + C2_CFG_DEL_ADDR = 2, + C2_CFG_ADD_ROUTE = 3, + C2_CFG_DEL_ROUTE = 4 +}; + +enum c2_getconfig_cmd { + C2_GETCONFIG_ROUTES = 1, + C2_GETCONFIG_ADDRS +}; + +/* + * CCIL Work Request Identifiers + */ +enum c2wr_ids { + CCWR_RNIC_OPEN = 1, + CCWR_RNIC_QUERY, + CCWR_RNIC_SETCONFIG, + CCWR_RNIC_GETCONFIG, + CCWR_RNIC_CLOSE, + CCWR_CQ_CREATE, + CCWR_CQ_QUERY, + CCWR_CQ_MODIFY, + CCWR_CQ_DESTROY, + CCWR_QP_CONNECT, + CCWR_PD_ALLOC, + CCWR_PD_DEALLOC, + CCWR_SRQ_CREATE, + CCWR_SRQ_QUERY, + CCWR_SRQ_MODIFY, + CCWR_SRQ_DESTROY, + CCWR_QP_CREATE, + CCWR_QP_QUERY, + CCWR_QP_MODIFY, + CCWR_QP_DESTROY, + CCWR_NSMR_STAG_ALLOC, + CCWR_NSMR_REGISTER, + CCWR_NSMR_PBL, + CCWR_STAG_DEALLOC, + CCWR_NSMR_REREGISTER, + CCWR_SMR_REGISTER, + CCWR_MR_QUERY, + CCWR_MW_ALLOC, + CCWR_MW_QUERY, + CCWR_EP_CREATE, + CCWR_EP_GETOPT, + CCWR_EP_SETOPT, + CCWR_EP_DESTROY, + CCWR_EP_BIND, + CCWR_EP_CONNECT, + CCWR_EP_LISTEN, + CCWR_EP_SHUTDOWN, + CCWR_EP_LISTEN_CREATE, + CCWR_EP_LISTEN_DESTROY, + CCWR_EP_QUERY, + CCWR_CR_ACCEPT, + CCWR_CR_REJECT, + CCWR_CONSOLE, + CCWR_TERM, + CCWR_FLASH_INIT, + CCWR_FLASH, + CCWR_BUF_ALLOC, + CCWR_BUF_FREE, + CCWR_FLASH_WRITE, + CCWR_INIT, /* WARNING: Don't move this ever again! */ + + + + /* Add new IDs here */ + + + + /* + * WARNING: CCWR_LAST must always be the last verbs id defined! + * All the preceding IDs are fixed, and must not change. + * You can add new IDs, but must not remove or reorder + * any IDs. If you do, YOU will ruin any hope of + * compatibility between versions. + */ + CCWR_LAST, + + /* + * Start over at 1 so that arrays indexed by user wr id's + * begin at 1. This is OK since the verbs and user wr id's + * are always used on disjoint sets of queues. 
+ */ + /* + * The order of the CCWR_SEND_XX verbs must + * match the order of the RDMA_OPs + */ + CCWR_SEND = 1, + CCWR_SEND_INV, + CCWR_SEND_SE, + CCWR_SEND_SE_INV, + CCWR_RDMA_WRITE, + CCWR_RDMA_READ, + CCWR_RDMA_READ_INV, + CCWR_MW_BIND, + CCWR_NSMR_FASTREG, + CCWR_STAG_INVALIDATE, + CCWR_RECV, + CCWR_NOP, + CCWR_UNIMPL, +/* WARNING: This must always be the last user wr id defined! */ +}; +#define RDMA_SEND_OPCODE_FROM_WR_ID(x) (x+2) + +/* + * SQ/RQ Work Request Types + */ +enum c2_wr_type { + C2_WR_TYPE_SEND = CCWR_SEND, + C2_WR_TYPE_SEND_SE = CCWR_SEND_SE, + C2_WR_TYPE_SEND_INV = CCWR_SEND_INV, + C2_WR_TYPE_SEND_SE_INV = CCWR_SEND_SE_INV, + C2_WR_TYPE_RDMA_WRITE = CCWR_RDMA_WRITE, + C2_WR_TYPE_RDMA_READ = CCWR_RDMA_READ, + C2_WR_TYPE_RDMA_READ_INV_STAG = CCWR_RDMA_READ_INV, + C2_WR_TYPE_BIND_MW = CCWR_MW_BIND, + C2_WR_TYPE_FASTREG_NSMR = CCWR_NSMR_FASTREG, + C2_WR_TYPE_INV_STAG = CCWR_STAG_INVALIDATE, + C2_WR_TYPE_RECV = CCWR_RECV, + C2_WR_TYPE_NOP = CCWR_NOP, +}; + +struct c2_netaddr { + u32 ip_addr; + u32 netmask; + u32 mtu; +}; + +struct c2_route { + u32 ip_addr; /* 0 indicates the default route */ + u32 netmask; /* netmask associated with dst */ + u32 flags; + union { + u32 ipaddr; /* address of the nexthop interface */ + u8 enaddr[6]; + } nexthop; +}; + +/* + * A Scatter Gather Entry. + */ +struct c2_data_addr { + u32 stag; + u32 length; + u64 to; +}; + +/* + * MR and MW flags used by the consumer, RI, and RNIC. + */ +enum c2_mm_flags { + MEM_REMOTE = 0x0001, /* allow mw binds with remote access. 
*/ + MEM_VA_BASED = 0x0002, /* Not Zero-based */ + MEM_PBL_COMPLETE = 0x0004, /* PBL array is complete in this msg */ + MEM_LOCAL_READ = 0x0008, /* allow local reads */ + MEM_LOCAL_WRITE = 0x0010, /* allow local writes */ + MEM_REMOTE_READ = 0x0020, /* allow remote reads */ + MEM_REMOTE_WRITE = 0x0040, /* allow remote writes */ + MEM_WINDOW_BIND = 0x0080, /* binds allowed */ + MEM_SHARED = 0x0100, /* set if MR is shared */ + MEM_STAG_VALID = 0x0200 /* set if STAG is in valid state */ +}; + +/* + * CCIL API ACF flags defined in terms of the low level mem flags. + * This minimizes translation needed in the user API + */ +enum c2_acf { + C2_ACF_LOCAL_READ = MEM_LOCAL_READ, + C2_ACF_LOCAL_WRITE = MEM_LOCAL_WRITE, + C2_ACF_REMOTE_READ = MEM_REMOTE_READ, + C2_ACF_REMOTE_WRITE = MEM_REMOTE_WRITE, + C2_ACF_WINDOW_BIND = MEM_WINDOW_BIND +}; + +/* + * Image types of objects written to flash + */ +#define C2_FLASH_IMG_BITFILE 1 +#define C2_FLASH_IMG_OPTION_ROM 2 +#define C2_FLASH_IMG_VPD 3 + +/* + * To fix bug 1815 we define the maximum allowable size of the + * terminate message (per the IETF spec; refer to the IETF + * protocol specification, section 12.1.6, page 64). + * The message is prefixed by 20 bytes of DDP info. + * + * Then the message has 6 bytes for the terminate control + * and DDP segment length info plus a DDP header (either + * 14 or 18 bytes) plus 28 bytes for the RDMA header. + * Thus the max size is: + * 20 + (6 + 18 + 28) = 72 + */ +#define C2_MAX_TERMINATE_MESSAGE_SIZE (72) + +/* + * Build String Length. It must be the same as C2_BUILD_STR_LEN in ccil_api.h + */ +#define WR_BUILD_STR_LEN 64 + +/* + * WARNING: All of these structs need to align any 64bit types on + * 64 bit boundaries! 64bit types include u64. + */ + +/* + * Clustercore Work Request Header. Be sensitive to field layout + * and alignment. + */ +struct c2wr_hdr { + /* wqe_count is part of the cqe. 
It is put here so the + * adapter can write to it while the wr is pending without + * clobbering part of the wr. This word need not be dma'd + * from the host to adapter by libccil, but we copy it anyway + * to make the memcpy to the adapter better aligned. + */ + u32 wqe_count; + + /* Put these fields next so that later 32- and 64-bit + * quantities are naturally aligned. + */ + u8 id; + u8 result; /* adapter -> host */ + u8 sge_count; /* host -> adapter */ + u8 flags; /* host -> adapter */ + + u64 context; +#ifdef CCMSGMAGIC + u32 magic; + u32 pad; +#endif +} __attribute__((packed)); + +/* + *------------------------ RNIC ------------------------ + */ + +/* + * WR_RNIC_OPEN + */ + +/* + * Flags for the RNIC WRs + */ +enum c2_rnic_flags { + RNIC_IRD_STATIC = 0x0001, + RNIC_ORD_STATIC = 0x0002, + RNIC_QP_STATIC = 0x0004, + RNIC_SRQ_SUPPORTED = 0x0008, + RNIC_PBL_BLOCK_MODE = 0x0010, + RNIC_SRQ_MODEL_ARRIVAL = 0x0020, + RNIC_CQ_OVF_DETECTED = 0x0040, + RNIC_PRIV_MODE = 0x0080 +}; + +struct c2wr_rnic_open_req { + struct c2wr_hdr hdr; + u64 user_context; + u16 flags; /* See enum c2_rnic_flags */ + u16 port_num; +} __attribute__((packed)); + +struct c2wr_rnic_open_rep { + struct c2wr_hdr hdr; + u32 rnic_handle; +} __attribute__((packed)); + +union c2wr_rnic_open { + struct c2wr_rnic_open_req req; + struct c2wr_rnic_open_rep rep; +} __attribute__((packed)); + +struct c2wr_rnic_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; +} __attribute__((packed)); + +/* + * WR_RNIC_QUERY + */ +struct c2wr_rnic_query_rep { + struct c2wr_hdr hdr; + u64 user_context; + u32 vendor_id; + u32 part_number; + u32 hw_version; + u32 fw_ver_major; + u32 fw_ver_minor; + u32 fw_ver_patch; + char fw_ver_build_str[WR_BUILD_STR_LEN]; + u32 max_qps; + u32 max_qp_depth; + u32 max_srq_depth; + u32 max_send_sgl_depth; + u32 max_rdma_sgl_depth; + u32 max_cqs; + u32 max_cq_depth; + u32 max_cq_event_handlers; + u32 max_mrs; + u32 max_pbl_depth; + u32 max_pds; + u32 max_global_ird; + u32 
max_global_ord; + u32 max_qp_ird; + u32 max_qp_ord; + u32 flags; + u32 max_mws; + u32 pbe_range_low; + u32 pbe_range_high; + u32 max_srqs; + u32 page_size; +} __attribute__((packed)); + +union c2wr_rnic_query { + struct c2wr_rnic_query_req req; + struct c2wr_rnic_query_rep rep; +} __attribute__((packed)); + +/* + * WR_RNIC_GETCONFIG + */ + +struct c2wr_rnic_getconfig_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 option; /* see c2_getconfig_cmd_t */ + u64 reply_buf; + u32 reply_buf_len; +} __attribute__((packed)) ; + +struct c2wr_rnic_getconfig_rep { + struct c2wr_hdr hdr; + u32 option; /* see c2_getconfig_cmd_t */ + u32 count_len; /* length of the number of addresses configured */ +} __attribute__((packed)) ; + +union c2wr_rnic_getconfig { + struct c2wr_rnic_getconfig_req req; + struct c2wr_rnic_getconfig_rep rep; +} __attribute__((packed)) ; + +/* + * WR_RNIC_SETCONFIG + */ +struct c2wr_rnic_setconfig_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 option; /* See c2_setconfig_cmd_t */ + /* variable data and pad. 
See c2_netaddr and c2_route */ + u8 data[0]; +} __attribute__((packed)) ; + +struct c2wr_rnic_setconfig_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_rnic_setconfig { + struct c2wr_rnic_setconfig_req req; + struct c2wr_rnic_setconfig_rep rep; +} __attribute__((packed)) ; + +/* + * WR_RNIC_CLOSE + */ +struct c2wr_rnic_close_req { + struct c2wr_hdr hdr; + u32 rnic_handle; +} __attribute__((packed)) ; + +struct c2wr_rnic_close_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_rnic_close { + struct c2wr_rnic_close_req req; + struct c2wr_rnic_close_rep rep; +} __attribute__((packed)) ; + +/* + *------------------------ CQ ------------------------ + */ +struct c2wr_cq_create_req { + struct c2wr_hdr hdr; + u64 shared_ht; + u64 user_context; + u64 msg_pool; + u32 rnic_handle; + u32 msg_size; + u32 depth; +} __attribute__((packed)) ; + +struct c2wr_cq_create_rep { + struct c2wr_hdr hdr; + u32 mq_index; + u32 adapter_shared; + u32 cq_handle; +} __attribute__((packed)) ; + +union c2wr_cq_create { + struct c2wr_cq_create_req req; + struct c2wr_cq_create_rep rep; +} __attribute__((packed)) ; + +struct c2wr_cq_modify_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 cq_handle; + u32 new_depth; + u64 new_msg_pool; +} __attribute__((packed)) ; + +struct c2wr_cq_modify_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_cq_modify { + struct c2wr_cq_modify_req req; + struct c2wr_cq_modify_rep rep; +} __attribute__((packed)) ; + +struct c2wr_cq_destroy_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 cq_handle; +} __attribute__((packed)) ; + +struct c2wr_cq_destroy_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_cq_destroy { + struct c2wr_cq_destroy_req req; + struct c2wr_cq_destroy_rep rep; +} __attribute__((packed)) ; + +/* + *------------------------ PD ------------------------ + */ +struct c2wr_pd_alloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 pd_id; +} 
__attribute__((packed)) ; + +struct c2wr_pd_alloc_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_pd_alloc { + struct c2wr_pd_alloc_req req; + struct c2wr_pd_alloc_rep rep; +} __attribute__((packed)) ; + +struct c2wr_pd_dealloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_pd_dealloc_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_pd_dealloc { + struct c2wr_pd_dealloc_req req; + struct c2wr_pd_dealloc_rep rep; +} __attribute__((packed)) ; + +/* + *------------------------ SRQ ------------------------ + */ +struct c2wr_srq_create_req { + struct c2wr_hdr hdr; + u64 shared_ht; + u64 user_context; + u32 rnic_handle; + u32 srq_depth; + u32 srq_limit; + u32 sgl_depth; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_srq_create_rep { + struct c2wr_hdr hdr; + u32 srq_depth; + u32 sgl_depth; + u32 msg_size; + u32 mq_index; + u32 mq_start; + u32 srq_handle; +} __attribute__((packed)) ; + +union c2wr_srq_create { + struct c2wr_srq_create_req req; + struct c2wr_srq_create_rep rep; +} __attribute__((packed)) ; + +struct c2wr_srq_destroy_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 srq_handle; +} __attribute__((packed)) ; + +struct c2wr_srq_destroy_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_srq_destroy { + struct c2wr_srq_destroy_req req; + struct c2wr_srq_destroy_rep rep; +} __attribute__((packed)) ; + +/* + *------------------------ QP ------------------------ + */ +enum c2wr_qp_flags { + QP_RDMA_READ = 0x00000001, /* RDMA read enabled? */ + QP_RDMA_WRITE = 0x00000002, /* RDMA write enabled? */ + QP_MW_BIND = 0x00000004, /* MWs enabled */ + QP_ZERO_STAG = 0x00000008, /* enabled? */ + QP_REMOTE_TERMINATION = 0x00000010, /* remote end terminated */ + QP_RDMA_READ_RESPONSE = 0x00000020 /* Remote RDMA read */ + /* enabled? 
*/ +}; + +struct c2wr_qp_create_req { + struct c2wr_hdr hdr; + u64 shared_sq_ht; + u64 shared_rq_ht; + u64 user_context; + u32 rnic_handle; + u32 sq_cq_handle; + u32 rq_cq_handle; + u32 sq_depth; + u32 rq_depth; + u32 srq_handle; + u32 srq_limit; + u32 flags; /* see enum c2wr_qp_flags */ + u32 send_sgl_depth; + u32 recv_sgl_depth; + u32 rdma_write_sgl_depth; + u32 ord; + u32 ird; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_qp_create_rep { + struct c2wr_hdr hdr; + u32 sq_depth; + u32 rq_depth; + u32 send_sgl_depth; + u32 recv_sgl_depth; + u32 rdma_write_sgl_depth; + u32 ord; + u32 ird; + u32 sq_msg_size; + u32 sq_mq_index; + u32 sq_mq_start; + u32 rq_msg_size; + u32 rq_mq_index; + u32 rq_mq_start; + u32 qp_handle; +} __attribute__((packed)) ; + +union c2wr_qp_create { + struct c2wr_qp_create_req req; + struct c2wr_qp_create_rep rep; +} __attribute__((packed)) ; + +struct c2wr_qp_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 qp_handle; +} __attribute__((packed)) ; + +struct c2wr_qp_query_rep { + struct c2wr_hdr hdr; + u64 user_context; + u32 rnic_handle; + u32 sq_depth; + u32 rq_depth; + u32 send_sgl_depth; + u32 rdma_write_sgl_depth; + u32 recv_sgl_depth; + u32 ord; + u32 ird; + u16 qp_state; + u16 flags; /* see c2wr_qp_flags_t */ + u32 qp_id; + u32 local_addr; + u32 remote_addr; + u16 local_port; + u16 remote_port; + u32 terminate_msg_length; /* 0 if not present */ + u8 data[0]; + /* Terminate Message in-line here. 
*/ +} __attribute__((packed)) ; + +union c2wr_qp_query { + struct c2wr_qp_query_req req; + struct c2wr_qp_query_rep rep; +} __attribute__((packed)) ; + +struct c2wr_qp_modify_req { + struct c2wr_hdr hdr; + u64 stream_msg; + u32 stream_msg_length; + u32 rnic_handle; + u32 qp_handle; + u32 next_qp_state; + u32 ord; + u32 ird; + u32 sq_depth; + u32 rq_depth; + u32 llp_ep_handle; +} __attribute__((packed)) ; + +struct c2wr_qp_modify_rep { + struct c2wr_hdr hdr; + u32 ord; + u32 ird; + u32 sq_depth; + u32 rq_depth; + u32 sq_msg_size; + u32 sq_mq_index; + u32 sq_mq_start; + u32 rq_msg_size; + u32 rq_mq_index; + u32 rq_mq_start; +} __attribute__((packed)) ; + +union c2wr_qp_modify { + struct c2wr_qp_modify_req req; + struct c2wr_qp_modify_rep rep; +} __attribute__((packed)) ; + +struct c2wr_qp_destroy_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 qp_handle; +} __attribute__((packed)) ; + +struct c2wr_qp_destroy_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_qp_destroy { + struct c2wr_qp_destroy_req req; + struct c2wr_qp_destroy_rep rep; +} __attribute__((packed)) ; + +/* + * The CCWR_QP_CONNECT msg is posted on the verbs request queue. It can + * only be posted when a QP is in IDLE state. After the connect request is + * submitted to the LLP, the adapter moves the QP to CONNECT_PENDING state. + * No synchronous reply from adapter to this WR. The results of + * connection are passed back in an async event CCAE_ACTIVE_CONNECT_RESULTS + * See c2wr_ae_active_connect_results_t + */ +struct c2wr_qp_connect_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 qp_handle; + u32 remote_addr; + u16 remote_port; + u16 pad; + u32 private_data_length; + u8 private_data[0]; /* Private data in-line. */ +} __attribute__((packed)) ; + +struct c2wr_qp_connect { + struct c2wr_qp_connect_req req; + /* no synchronous reply. 
*/ +} __attribute__((packed)) ; + + +/* + *------------------------ MM ------------------------ + */ + +struct c2wr_nsmr_stag_alloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 pbl_depth; + u32 pd_id; + u32 flags; +} __attribute__((packed)) ; + +struct c2wr_nsmr_stag_alloc_rep { + struct c2wr_hdr hdr; + u32 pbl_depth; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_nsmr_stag_alloc { + struct c2wr_nsmr_stag_alloc_req req; + struct c2wr_nsmr_stag_alloc_rep rep; +} __attribute__((packed)) ; + +struct c2wr_nsmr_register_req { + struct c2wr_hdr hdr; + u64 va; + u32 rnic_handle; + u16 flags; + u8 stag_key; + u8 pad; + u32 pd_id; + u32 pbl_depth; + u32 pbe_size; + u32 fbo; + u32 length; + u32 addrs_length; + /* array of paddrs (must be aligned on a 64bit boundary) */ + u64 paddrs[0]; +} __attribute__((packed)) ; + +struct c2wr_nsmr_register_rep { + struct c2wr_hdr hdr; + u32 pbl_depth; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_nsmr_register { + struct c2wr_nsmr_register_req req; + struct c2wr_nsmr_register_rep rep; +} __attribute__((packed)) ; + +struct c2wr_nsmr_pbl_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 flags; + u32 stag_index; + u32 addrs_length; + /* array of paddrs (must be aligned on a 64bit boundary) */ + u64 paddrs[0]; +} __attribute__((packed)) ; + +struct c2wr_nsmr_pbl_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_nsmr_pbl { + struct c2wr_nsmr_pbl_req req; + struct c2wr_nsmr_pbl_rep rep; +} __attribute__((packed)) ; + +struct c2wr_mr_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 stag_index; +} __attribute__((packed)) ; + +struct c2wr_mr_query_rep { + struct c2wr_hdr hdr; + u8 stag_key; + u8 pad[3]; + u32 pd_id; + u32 flags; + u32 pbl_depth; +} __attribute__((packed)) ; + +union c2wr_mr_query { + struct c2wr_mr_query_req req; + struct c2wr_mr_query_rep rep; +} __attribute__((packed)) ; + +struct c2wr_mw_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 
stag_index; +} __attribute__((packed)) ; + +struct c2wr_mw_query_rep { + struct c2wr_hdr hdr; + u8 stag_key; + u8 pad[3]; + u32 pd_id; + u32 flags; +} __attribute__((packed)) ; + +union c2wr_mw_query { + struct c2wr_mw_query_req req; + struct c2wr_mw_query_rep rep; +} __attribute__((packed)) ; + + +struct c2wr_stag_dealloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 stag_index; +} __attribute__((packed)) ; + +struct c2wr_stag_dealloc_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_stag_dealloc { + struct c2wr_stag_dealloc_req req; + struct c2wr_stag_dealloc_rep rep; +} __attribute__((packed)) ; + +struct c2wr_nsmr_reregister_req { + struct c2wr_hdr hdr; + u64 va; + u32 rnic_handle; + u16 flags; + u8 stag_key; + u8 pad; + u32 stag_index; + u32 pd_id; + u32 pbl_depth; + u32 pbe_size; + u32 fbo; + u32 length; + u32 addrs_length; + u32 pad1; + /* array of paddrs (must be aligned on a 64bit boundary) */ + u64 paddrs[0]; +} __attribute__((packed)) ; + +struct c2wr_nsmr_reregister_rep { + struct c2wr_hdr hdr; + u32 pbl_depth; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_nsmr_reregister { + struct c2wr_nsmr_reregister_req req; + struct c2wr_nsmr_reregister_rep rep; +} __attribute__((packed)) ; + +struct c2wr_smr_register_req { + struct c2wr_hdr hdr; + u64 va; + u32 rnic_handle; + u16 flags; + u8 stag_key; + u8 pad; + u32 stag_index; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_smr_register_rep { + struct c2wr_hdr hdr; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_smr_register { + struct c2wr_smr_register_req req; + struct c2wr_smr_register_rep rep; +} __attribute__((packed)) ; + +struct c2wr_mw_alloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_mw_alloc_rep { + struct c2wr_hdr hdr; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_mw_alloc { + struct c2wr_mw_alloc_req req; + struct c2wr_mw_alloc_rep rep; +} 
__attribute__((packed)) ; + +/* + *------------------------ WRs ----------------------- + */ + +struct c2wr_user_hdr { + struct c2wr_hdr hdr; /* Has status and WR Type */ +} __attribute__((packed)) ; + +enum c2_qp_state { + C2_QP_STATE_IDLE = 0x01, + C2_QP_STATE_CONNECTING = 0x02, + C2_QP_STATE_RTS = 0x04, + C2_QP_STATE_CLOSING = 0x08, + C2_QP_STATE_TERMINATE = 0x10, + C2_QP_STATE_ERROR = 0x20, +}; + +/* Completion queue entry. */ +struct c2wr_ce { + struct c2wr_hdr hdr; /* Has status and WR Type */ + u64 qp_user_context; /* c2_user_qp_t * */ + u32 qp_state; /* Current QP State */ + u32 handle; /* QPID or EP Handle */ + u32 bytes_rcvd; /* valid for RECV WCs */ + u32 stag; +} __attribute__((packed)) ; + + +/* + * Flags used for all post-sq WRs. These must fit in the flags + * field of the struct c2wr_hdr (eight bits). + */ +enum { + SQ_SIGNALED = 0x01, + SQ_READ_FENCE = 0x02, + SQ_FENCE = 0x04, +}; + +/* + * Common fields for all post-sq WRs. Namely the standard header and a + * secondary header with fields common to all post-sq WRs. + */ +struct c2_sq_hdr { + struct c2wr_user_hdr user_hdr; +} __attribute__((packed)); + +/* + * Same as above but for post-rq WRs. + */ +struct c2_rq_hdr { + struct c2wr_user_hdr user_hdr; +} __attribute__((packed)); + +/* + * use the same struct for all sends. 
+ */ +struct c2wr_send_req { + struct c2_sq_hdr sq_hdr; + u32 sge_len; + u32 remote_stag; + u8 data[0]; /* SGE array */ +} __attribute__((packed)); +/* XXX c2wr_send_req_t, c2wr_send_se_req_t, c2wr_send_inv_req_t, + c2wr_send_se_inv_req_t;*/ + +union c2wr_send { + struct c2wr_send_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_rdma_write_req { + struct c2_sq_hdr sq_hdr; + u64 remote_to; + u32 remote_stag; + u32 sge_len; + u8 data[0]; /* SGE array */ +} __attribute__((packed)); + +union c2wr_rdma_write { + struct c2wr_rdma_write_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_rdma_read_req { + struct c2_sq_hdr sq_hdr; + u64 local_to; + u64 remote_to; + u32 local_stag; + u32 remote_stag; + u32 length; +} __attribute__((packed)); + +union c2wr_rdma_read { + struct c2wr_rdma_read_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_mw_bind_req { + struct c2_sq_hdr sq_hdr; + u64 va; + u8 stag_key; + u8 pad[3]; + u32 mw_stag_index; + u32 mr_stag_index; + u32 length; + u32 flags; +} __attribute__((packed)); + +union c2wr_mw_bind { + struct c2wr_mw_bind_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_nsmr_fastreg_req { + struct c2_sq_hdr sq_hdr; + u64 va; + u8 stag_key; + u8 pad[3]; + u32 stag_index; + u32 pbe_size; + u32 fbo; + u32 length; + u32 addrs_length; + /* array of paddrs (must be aligned on a 64bit boundary) */ + u64 paddrs[0]; +} __attribute__((packed)); + +union c2wr_nsmr_fastreg { + struct c2wr_nsmr_fastreg_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_stag_invalidate_req { + struct c2_sq_hdr sq_hdr; + u8 stag_key; + u8 pad[3]; + u32 stag_index; +} __attribute__((packed)); + +union c2wr_stag_invalidate { + struct c2wr_stag_invalidate_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +union c2wr_sqwr { + struct c2_sq_hdr sq_hdr; + struct c2wr_send_req send; + struct c2wr_send_req send_se; + struct c2wr_send_req send_inv; + 
struct c2wr_send_req send_se_inv; + struct c2wr_rdma_write_req rdma_write; + struct c2wr_rdma_read_req rdma_read; + struct c2wr_mw_bind_req mw_bind; + struct c2wr_nsmr_fastreg_req nsmr_fastreg; + struct c2wr_stag_invalidate_req stag_inv; +} __attribute__((packed)); + + +/* + * RQ WRs + */ +struct c2wr_rqwr { + struct c2_rq_hdr rq_hdr; + u8 data[0]; /* array of SGEs */ +} __attribute__((packed)); +/* XXX c2wr_rqwr_t, c2wr_recv_req_t; */ + +union c2wr_recv { + struct c2wr_rqwr req; + struct c2wr_ce rep; +} __attribute__((packed)); + +/* + * All AEs start with this header. Most AEs only need to convey the + * information in the header. Some, like LLP connection events, need + * more info. The union typedef c2wr_ae_t has all the possible AEs. + * + * hdr.context is the user_context from the rnic_open WR. NULL if this + * is not affiliated with an rnic + * + * hdr.id is the AE identifier (e.g., CCAE_REMOTE_SHUTDOWN, + * CCAE_LLP_CLOSE_COMPLETE) + * + * resource_type is one of: C2_RES_IND_QP, C2_RES_IND_CQ, C2_RES_IND_SRQ + * + * user_context is the context passed down when the host created the resource. + */ +struct c2wr_ae_hdr { + struct c2wr_hdr hdr; + u64 user_context; /* user context for this res. */ + u32 resource_type; /* see enum c2_resource_indicator */ + u32 resource; /* handle for resource */ + u32 qp_state; /* current QP State */ +} __attribute__((packed)); + +/* + * After submitting the CCAE_ACTIVE_CONNECT_RESULTS message on the AEQ, + * the adapter moves the QP into RTS state + */ +struct c2wr_ae_active_connect_results { + struct c2wr_ae_hdr ae_hdr; + u32 laddr; + u32 raddr; + u16 lport; + u16 rport; + u32 private_data_length; + u8 private_data[0]; /* data is in-line in the msg. */ +} __attribute__((packed)); + +/* + * When connections are established by the stack (and the private data + * MPA frame is received), the adapter will generate an event to the host. 
+ * The details of the connection, any private data, and the new connection + * request handle is passed up via the CCAE_CONNECTION_REQUEST msg on the + * AE queue: + */ +struct c2wr_ae_connection_request { + struct c2wr_ae_hdr ae_hdr; + u32 cr_handle; /* connreq handle (sock ptr) */ + u32 laddr; + u32 raddr; + u16 lport; + u16 rport; + u32 private_data_length; + u8 private_data[0]; /* data is in-line in the msg. */ +} __attribute__((packed)); + +union c2wr_ae { + struct c2wr_ae_hdr ae_generic; + struct c2wr_ae_active_connect_results ae_active_connect_results; + struct c2wr_ae_connection_request ae_connection_request; +} __attribute__((packed)); + +struct c2wr_init_req { + struct c2wr_hdr hdr; + u64 hint_count; + u64 q0_host_shared; + u64 q1_host_shared; + u64 q1_host_msg_pool; + u64 q2_host_shared; + u64 q2_host_msg_pool; +} __attribute__((packed)); + +struct c2wr_init_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_init { + struct c2wr_init_req req; + struct c2wr_init_rep rep; +} __attribute__((packed)); + +/* + * For upgrading flash. 
+ */ + +struct c2wr_flash_init_req { + struct c2wr_hdr hdr; + u32 rnic_handle; +} __attribute__((packed)); + +struct c2wr_flash_init_rep { + struct c2wr_hdr hdr; + u32 adapter_flash_buf_offset; + u32 adapter_flash_len; +} __attribute__((packed)); + +union c2wr_flash_init { + struct c2wr_flash_init_req req; + struct c2wr_flash_init_rep rep; +} __attribute__((packed)); + +struct c2wr_flash_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 len; +} __attribute__((packed)); + +struct c2wr_flash_rep { + struct c2wr_hdr hdr; + u32 status; +} __attribute__((packed)); + +union c2wr_flash { + struct c2wr_flash_req req; + struct c2wr_flash_rep rep; +} __attribute__((packed)); + +struct c2wr_buf_alloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 size; +} __attribute__((packed)); + +struct c2wr_buf_alloc_rep { + struct c2wr_hdr hdr; + u32 offset; /* 0 if mem not available */ + u32 size; /* 0 if mem not available */ +} __attribute__((packed)); + +union c2wr_buf_alloc { + struct c2wr_buf_alloc_req req; + struct c2wr_buf_alloc_rep rep; +} __attribute__((packed)); + +struct c2wr_buf_free_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 offset; /* Must match value from alloc */ + u32 size; /* Must match value from alloc */ +} __attribute__((packed)); + +struct c2wr_buf_free_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_buf_free { + struct c2wr_buf_free_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_flash_write_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 offset; + u32 size; + u32 type; + u32 flags; +} __attribute__((packed)); + +struct c2wr_flash_write_rep { + struct c2wr_hdr hdr; + u32 status; +} __attribute__((packed)); + +union c2wr_flash_write { + struct c2wr_flash_write_req req; + struct c2wr_flash_write_rep rep; +} __attribute__((packed)); + +/* + * Messages for LLP connection setup. + */ + +/* + * Listen Request. This allocates a listening endpoint to allow passive + * connection setup. 
Newly established LLP connections are passed up + * via an AE. See c2wr_ae_connection_request_t + */ +struct c2wr_ep_listen_create_req { + struct c2wr_hdr hdr; + u64 user_context; /* returned in AEs. */ + u32 rnic_handle; + u32 local_addr; /* local addr, or 0 */ + u16 local_port; /* 0 means "pick one" */ + u16 pad; + u32 backlog; /* traditional tcp listen bl */ +} __attribute__((packed)); + +struct c2wr_ep_listen_create_rep { + struct c2wr_hdr hdr; + u32 ep_handle; /* handle to new listening ep */ + u16 local_port; /* resulting port... */ + u16 pad; +} __attribute__((packed)); + +union c2wr_ep_listen_create { + struct c2wr_ep_listen_create_req req; + struct c2wr_ep_listen_create_rep rep; +} __attribute__((packed)); + +struct c2wr_ep_listen_destroy_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 ep_handle; +} __attribute__((packed)); + +struct c2wr_ep_listen_destroy_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_ep_listen_destroy { + struct c2wr_ep_listen_destroy_req req; + struct c2wr_ep_listen_destroy_rep rep; +} __attribute__((packed)); + +struct c2wr_ep_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 ep_handle; +} __attribute__((packed)); + +struct c2wr_ep_query_rep { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 local_addr; + u32 remote_addr; + u16 local_port; + u16 remote_port; +} __attribute__((packed)); + +union c2wr_ep_query { + struct c2wr_ep_query_req req; + struct c2wr_ep_query_rep rep; +} __attribute__((packed)); + + +/* + * The host passes this down to indicate acceptance of a pending iWARP + * connection. The cr_handle was obtained from the CONNECTION_REQUEST + * AE passed up by the adapter. See c2wr_ae_connection_request_t. + */ +struct c2wr_cr_accept_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 qp_handle; /* QP to bind to this LLP conn */ + u32 ep_handle; /* LLP handle to accept */ + u32 private_data_length; + u8 private_data[0]; /* data in-line in msg. 
*/ +} __attribute__((packed)); + +/* + * adapter sends reply when private data is successfully submitted to + * the LLP. + */ +struct c2wr_cr_accept_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_cr_accept { + struct c2wr_cr_accept_req req; + struct c2wr_cr_accept_rep rep; +} __attribute__((packed)); + +/* + * The host sends this down if a given iWARP connection request was + * rejected by the consumer. The cr_handle was obtained from a + * previous c2wr_ae_connection_request_t AE sent by the adapter. + */ +struct c2wr_cr_reject_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 ep_handle; /* LLP handle to reject */ +} __attribute__((packed)); + +/* + * Dunno if this is needed, but we'll add it for now. The adapter will + * send the reject_reply after the LLP endpoint has been destroyed. + */ +struct c2wr_cr_reject_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_cr_reject { + struct c2wr_cr_reject_req req; + struct c2wr_cr_reject_rep rep; +} __attribute__((packed)); + +/* + * console command. Used to implement a debug console over the verbs + * request and reply queues. + */ + +/* + * Console request message. It contains: + * - message hdr with id = CCWR_CONSOLE + * - the physaddr/len of host memory to be used for the reply. + * - the command string. eg: "netstat -s" or "zoneinfo" + */ +struct c2wr_console_req { + struct c2wr_hdr hdr; /* id = CCWR_CONSOLE */ + u64 reply_buf; /* pinned host buf for reply */ + u32 reply_buf_len; /* length of reply buffer */ + u8 command[0]; /* NUL terminated ascii string */ + /* containing the command req */ +} __attribute__((packed)); + +/* + * flags used in the console reply. + */ +enum c2_console_flags { + CONS_REPLY_TRUNCATED = 0x00000001 /* reply was truncated */ +} __attribute__((packed)); + +/* + * Console reply message. + * hdr.result contains the c2_status_t error if the reply was _not_ generated, + * or C2_OK if the reply was generated. 
+ */ +struct c2wr_console_rep { + struct c2wr_hdr hdr; /* id = CCWR_CONSOLE */ + u32 flags; +} __attribute__((packed)); + +union c2wr_console { + struct c2wr_console_req req; + struct c2wr_console_rep rep; +} __attribute__((packed)); + + +/* + * Giant union with all WRs. Makes life easier... + */ +union c2wr { + struct c2wr_hdr hdr; + struct c2wr_user_hdr user_hdr; + union c2wr_rnic_open rnic_open; + union c2wr_rnic_query rnic_query; + union c2wr_rnic_getconfig rnic_getconfig; + union c2wr_rnic_setconfig rnic_setconfig; + union c2wr_rnic_close rnic_close; + union c2wr_cq_create cq_create; + union c2wr_cq_modify cq_modify; + union c2wr_cq_destroy cq_destroy; + union c2wr_pd_alloc pd_alloc; + union c2wr_pd_dealloc pd_dealloc; + union c2wr_srq_create srq_create; + union c2wr_srq_destroy srq_destroy; + union c2wr_qp_create qp_create; + union c2wr_qp_query qp_query; + union c2wr_qp_modify qp_modify; + union c2wr_qp_destroy qp_destroy; + struct c2wr_qp_connect qp_connect; + union c2wr_nsmr_stag_alloc nsmr_stag_alloc; + union c2wr_nsmr_register nsmr_register; + union c2wr_nsmr_pbl nsmr_pbl; + union c2wr_mr_query mr_query; + union c2wr_mw_query mw_query; + union c2wr_stag_dealloc stag_dealloc; + union c2wr_sqwr sqwr; + struct c2wr_rqwr rqwr; + struct c2wr_ce ce; + union c2wr_ae ae; + union c2wr_init init; + union c2wr_ep_listen_create ep_listen_create; + union c2wr_ep_listen_destroy ep_listen_destroy; + union c2wr_cr_accept cr_accept; + union c2wr_cr_reject cr_reject; + union c2wr_console console; + union c2wr_flash_init flash_init; + union c2wr_flash flash; + union c2wr_buf_alloc buf_alloc; + union c2wr_buf_free buf_free; + union c2wr_flash_write flash_write; +} __attribute__((packed)); + + +/* + * Accessors for the wr fields that are packed together tightly to + * reduce the wr message size. The wr arguments are void* so that + * either a struct c2wr*, a struct c2wr_hdr*, or a pointer to any of the types + * in the struct c2wr union can be passed in. 
+ */ +static __inline__ u8 c2_wr_get_id(void *wr) +{ + return ((struct c2wr_hdr *) wr)->id; +} +static __inline__ void c2_wr_set_id(void *wr, u8 id) +{ + ((struct c2wr_hdr *) wr)->id = id; +} +static __inline__ u8 c2_wr_get_result(void *wr) +{ + return ((struct c2wr_hdr *) wr)->result; +} +static __inline__ void c2_wr_set_result(void *wr, u8 result) +{ + ((struct c2wr_hdr *) wr)->result = result; +} +static __inline__ u8 c2_wr_get_flags(void *wr) +{ + return ((struct c2wr_hdr *) wr)->flags; +} +static __inline__ void c2_wr_set_flags(void *wr, u8 flags) +{ + ((struct c2wr_hdr *) wr)->flags = flags; +} +static __inline__ u8 c2_wr_get_sge_count(void *wr) +{ + return ((struct c2wr_hdr *) wr)->sge_count; +} +static __inline__ void c2_wr_set_sge_count(void *wr, u8 sge_count) +{ + ((struct c2wr_hdr *) wr)->sge_count = sge_count; +} +static __inline__ u32 c2_wr_get_wqe_count(void *wr) +{ + return ((struct c2wr_hdr *) wr)->wqe_count; +} +static __inline__ void c2_wr_set_wqe_count(void *wr, u32 wqe_count) +{ + ((struct c2wr_hdr *) wr)->wqe_count = wqe_count; +} + +#endif /* _C2_WR_H_ */ From swise at opengridcomputing.com Wed May 31 11:27:42 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 13:27:42 -0500 Subject: [openib-general] [PATCH 4/7] AMSO1100 OpenFabrics Provider. 
In-Reply-To: <20060531182733.3652.54755.stgit@stevo-desktop> References: <20060531182733.3652.54755.stgit@stevo-desktop> Message-ID: <20060531182742.3652.15436.stgit@stevo-desktop> --- drivers/infiniband/hw/amso1100/c2_cm.c | 452 ++++++++++++ drivers/infiniband/hw/amso1100/c2_cq.c | 424 +++++++++++ drivers/infiniband/hw/amso1100/c2_pd.c | 71 ++ drivers/infiniband/hw/amso1100/c2_provider.c | 871 +++++++++++++++++++++++ drivers/infiniband/hw/amso1100/c2_provider.h | 182 +++++ drivers/infiniband/hw/amso1100/c2_qp.c | 975 ++++++++++++++++++++++++++ drivers/infiniband/hw/amso1100/c2_user.h | 82 ++ 7 files changed, 3057 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_cm.c b/drivers/infiniband/hw/amso1100/c2_cm.c new file mode 100644 index 0000000..018d11f --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_cm.c @@ -0,0 +1,452 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ +#include "c2.h" +#include "c2_wr.h" +#include "c2_vq.h" +#include + +int c2_llp_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param) +{ + struct c2_dev *c2dev = to_c2dev(cm_id->device); + struct ib_qp *ibqp; + struct c2_qp *qp; + struct c2wr_qp_connect_req *wr; /* variable size needs a malloc. */ + struct c2_vq_req *vq_req; + int err; + + ibqp = c2_get_qp(cm_id->device, iw_param->qpn); + if (!ibqp) + return -EINVAL; + qp = to_c2qp(ibqp); + + /* Associate QP <--> CM_ID */ + cm_id->provider_data = qp; + cm_id->add_ref(cm_id); + qp->cm_id = cm_id; + + /* + * only support the max private_data length + */ + if (iw_param->private_data_len > C2_MAX_PRIVATE_DATA_SIZE) { + err = -EINVAL; + goto bail0; + } + /* + * Set the rdma read limits + */ + err = c2_qp_set_read_limits(c2dev, qp, iw_param->ord, iw_param->ird); + if (err) + goto bail0; + + /* + * Create and send a WR_QP_CONNECT... + */ + wr = kmalloc(c2dev->req_vq.msg_size, GFP_KERNEL); + if (!wr) { + err = -ENOMEM; + goto bail0; + } + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) { + err = -ENOMEM; + goto bail1; + } + + c2_wr_set_id(wr, CCWR_QP_CONNECT); + wr->hdr.context = 0; + wr->rnic_handle = c2dev->adapter_handle; + wr->qp_handle = qp->adapter_handle; + + wr->remote_addr = cm_id->remote_addr.sin_addr.s_addr; + wr->remote_port = cm_id->remote_addr.sin_port; + + /* + * Move any private data from the caller's buf into + * the WR. 
+ */ + if (iw_param->private_data) { + wr->private_data_length = + cpu_to_be32(iw_param->private_data_len); + memcpy(&wr->private_data[0], iw_param->private_data, + iw_param->private_data_len); + } else + wr->private_data_length = 0; + + /* + * Send WR to adapter. NOTE: There is no synch reply from + * the adapter. + */ + err = vq_send_wr(c2dev, (union c2wr *) wr); + vq_req_free(c2dev, vq_req); + + bail1: + kfree(wr); + bail0: + if (err) { + /* + * If we fail, release reference on QP and + * disassociate QP from CM_ID + */ + cm_id->provider_data = NULL; + qp->cm_id = NULL; + cm_id->rem_ref(cm_id); + } + return err; +} + +int c2_llp_service_create(struct iw_cm_id *cm_id, int backlog) +{ + struct c2_dev *c2dev; + struct c2wr_ep_listen_create_req wr; + struct c2wr_ep_listen_create_rep *reply; + struct c2_vq_req *vq_req; + int err; + + c2dev = to_c2dev(cm_id->device); + if (c2dev == NULL) + return -EINVAL; + + /* + * Allocate verbs request. + */ + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + /* + * Build the WR + */ + c2_wr_set_id(&wr, CCWR_EP_LISTEN_CREATE); + wr.hdr.context = (u64) (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.local_addr = cm_id->local_addr.sin_addr.s_addr; + wr.local_port = cm_id->local_addr.sin_port; + wr.backlog = cpu_to_be32(backlog); + wr.user_context = (u64) (unsigned long) cm_id; + + /* + * Reference the request struct. Dereferenced in the int handler. + */ + vq_req_get(c2dev, vq_req); + + /* + * Send WR to adapter + */ + err = vq_send_wr(c2dev, (union c2wr *) & wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + /* + * Wait for reply from adapter + */ + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail0; + + /* + * Process reply + */ + reply = + (struct c2wr_ep_listen_create_rep *) (unsigned long) vq_req->reply_msg; + if (!reply) { + err = -ENOMEM; + goto bail1; + } + + if ((err = c2_errno(reply)) != 0) + goto bail1; + + /* + * Keep the adapter handle. 
Used in subsequent destroy + */ + cm_id->provider_data = (void*)(unsigned long) reply->ep_handle; + + /* + * free vq stuff + */ + vq_repbuf_free(c2dev, reply); + vq_req_free(c2dev, vq_req); + + return 0; + + bail1: + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + + +int c2_llp_service_destroy(struct iw_cm_id *cm_id) +{ + + struct c2_dev *c2dev; + struct c2wr_ep_listen_destroy_req wr; + struct c2wr_ep_listen_destroy_rep *reply; + struct c2_vq_req *vq_req; + int err; + + c2dev = to_c2dev(cm_id->device); + if (c2dev == NULL) + return -EINVAL; + + /* + * Allocate verbs request. + */ + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + /* + * Build the WR + */ + c2_wr_set_id(&wr, CCWR_EP_LISTEN_DESTROY); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.ep_handle = (u32)(unsigned long)cm_id->provider_data; + + /* + * reference the request struct. dereferenced in the int handler. + */ + vq_req_get(c2dev, vq_req); + + /* + * Send WR to adapter + */ + err = vq_send_wr(c2dev, (union c2wr *) & wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + /* + * Wait for reply from adapter + */ + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail0; + + /* + * Process reply + */ + reply=(struct c2wr_ep_listen_destroy_rep *)(unsigned long)vq_req->reply_msg; + if (!reply) { + err = -ENOMEM; + goto bail0; + } + if ((err = c2_errno(reply)) != 0) + goto bail1; + + bail1: + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +int c2_llp_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param) +{ + struct c2_dev *c2dev = to_c2dev(cm_id->device); + struct c2_qp *qp; + struct ib_qp *ibqp; + struct c2wr_cr_accept_req *wr; /* variable length WR */ + struct c2_vq_req *vq_req; + struct c2wr_cr_accept_rep *reply; /* VQ Reply msg ptr. 
*/ + int err; + + ibqp = c2_get_qp(cm_id->device, iw_param->qpn); + if (!ibqp) + return -EINVAL; + qp = to_c2qp(ibqp); + + /* Set the RDMA read limits */ + err = c2_qp_set_read_limits(c2dev, qp, iw_param->ord, iw_param->ird); + if (err) + goto bail0; + + /* Allocate verbs request. */ + vq_req = vq_req_alloc(c2dev); + if (!vq_req) { + err = -ENOMEM; + goto bail1; + } + vq_req->qp = qp; + vq_req->cm_id = cm_id; + vq_req->event = IW_CM_EVENT_ESTABLISHED; + + wr = kmalloc(c2dev->req_vq.msg_size, GFP_KERNEL); + if (!wr) { + err = -ENOMEM; + goto bail2; + } + + /* Build the WR */ + c2_wr_set_id(wr, CCWR_CR_ACCEPT); + wr->hdr.context = (unsigned long) vq_req; + wr->rnic_handle = c2dev->adapter_handle; + wr->ep_handle = (u32) (unsigned long) cm_id->provider_data; + wr->qp_handle = qp->adapter_handle; + + /* Replace the cr_handle with the QP after accept */ + cm_id->provider_data = qp; + cm_id->add_ref(cm_id); + qp->cm_id = cm_id; + + cm_id->provider_data = qp; + + /* Validate private_data length */ + if (iw_param->private_data_len > C2_MAX_PRIVATE_DATA_SIZE) { + err = -EINVAL; + goto bail2; + } + + if (iw_param->private_data) { + wr->private_data_length = cpu_to_be32(iw_param->private_data_len); + memcpy(&wr->private_data[0], + iw_param->private_data, iw_param->private_data_len); + } else + wr->private_data_length = 0; + + /* Reference the request struct. Dereferenced in the int handler. 
*/ + vq_req_get(c2dev, vq_req); + + /* Send WR to adapter */ + err = vq_send_wr(c2dev, (union c2wr *) wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail2; + } + + /* Wait for reply from adapter */ + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail2; + + /* Check that reply is present */ + reply = (struct c2wr_cr_accept_rep *) (unsigned long) vq_req->reply_msg; + if (!reply) { + err = -ENOMEM; + goto bail2; + } + + err = c2_errno(reply); + vq_repbuf_free(c2dev, reply); + + if (!err) + c2_set_qp_state(qp, C2_QP_STATE_RTS); + bail2: + kfree(wr); + bail1: + vq_req_free(c2dev, vq_req); + bail0: + if (err) { + /* + * If we fail, release reference on QP and + * disassociate QP from CM_ID + */ + cm_id->provider_data = NULL; + qp->cm_id = NULL; + cm_id->rem_ref(cm_id); + } + return err; +} + +int c2_llp_reject(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len) +{ + struct c2_dev *c2dev; + struct c2wr_cr_reject_req wr; + struct c2_vq_req *vq_req; + struct c2wr_cr_reject_rep *reply; + int err; + + c2dev = to_c2dev(cm_id->device); + + /* + * Allocate verbs request. + */ + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + /* + * Build the WR + */ + c2_wr_set_id(&wr, CCWR_CR_REJECT); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.ep_handle = (u32) (unsigned long) cm_id->provider_data; + + /* + * reference the request struct. dereferenced in the int handler. 
+ */ + vq_req_get(c2dev, vq_req); + + /* + * Send WR to adapter + */ + err = vq_send_wr(c2dev, (union c2wr *) & wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + /* + * Wait for reply from adapter + */ + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail0; + + /* + * Process reply + */ + reply = (struct c2wr_cr_reject_rep *) (unsigned long) + vq_req->reply_msg; + if (!reply) { + err = -ENOMEM; + goto bail0; + } + err = c2_errno(reply); + /* + * free vq stuff + */ + vq_repbuf_free(c2dev, reply); + + bail0: + vq_req_free(c2dev, vq_req); + return err; +} diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c new file mode 100644 index 0000000..77351d7 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_cq.c @@ -0,0 +1,424 @@ +/* + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2005 Cisco Systems, Inc. All rights reserved. + * Copyright (c) 2005 Mellanox Technologies. All rights reserved. + * Copyright (c) 2004 Voltaire, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ +#include "c2.h" +#include "c2_vq.h" +#include "c2_status.h" + +#define C2_CQ_MSG_SIZE ((sizeof(struct c2wr_ce) + 32-1) & ~(32-1)) + +struct c2_cq *c2_cq_get(struct c2_dev *c2dev, int cqn) +{ + struct c2_cq *cq; + unsigned long flags; + + spin_lock_irqsave(&c2dev->lock, flags); + cq = c2dev->qptr_array[cqn]; + if (!cq) { + spin_unlock_irqrestore(&c2dev->lock, flags); + return NULL; + } + atomic_inc(&cq->refcount); + spin_unlock_irqrestore(&c2dev->lock, flags); + return cq; +} + +void c2_cq_put(struct c2_cq *cq) +{ + if (atomic_dec_and_test(&cq->refcount)) + wake_up(&cq->wait); +} + +void c2_cq_event(struct c2_dev *c2dev, u32 mq_index) +{ + struct c2_cq *cq; + + cq = c2_cq_get(c2dev, mq_index); + if (!cq) { + printk("discarding events on destroyed CQN=%d\n", mq_index); + return; + } + + assert(cq->ibcq.comp_handler); + (*cq->ibcq.comp_handler) (&cq->ibcq, cq->ibcq.cq_context); + c2_cq_put(cq); +} + +void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index) +{ + struct c2_cq *cq; + struct c2_mq *q; + + cq = c2_cq_get(c2dev, mq_index); + if (!cq) + return; + + spin_lock_irq(&cq->lock); + q = &cq->mq; + if (q && !c2_mq_empty(q)) { + u16 priv = q->priv; + struct c2wr_ce *msg; + + while (priv != be16_to_cpu(*q->shared)) { + msg = (struct 
c2wr_ce *) + (q->msg_pool.host + priv * q->msg_size); + if (msg->qp_user_context == (u64) (unsigned long) qp) { + msg->qp_user_context = (u64) 0; + } + priv = (priv + 1) % q->q_size; + } + } + spin_unlock_irq(&cq->lock); + c2_cq_put(cq); +} + +static inline enum ib_wc_status c2_cqe_status_to_openib(u8 status) +{ + switch (status) { + case C2_OK: + return IB_WC_SUCCESS; + case CCERR_FLUSHED: + return IB_WC_WR_FLUSH_ERR; + case CCERR_BASE_AND_BOUNDS_VIOLATION: + return IB_WC_LOC_PROT_ERR; + case CCERR_ACCESS_VIOLATION: + return IB_WC_LOC_ACCESS_ERR; + case CCERR_TOTAL_LENGTH_TOO_BIG: + return IB_WC_LOC_LEN_ERR; + case CCERR_INVALID_WINDOW: + return IB_WC_MW_BIND_ERR; + default: + return IB_WC_GENERAL_ERR; + } +} + + +static inline int c2_poll_one(struct c2_dev *c2dev, + struct c2_cq *cq, struct ib_wc *entry) +{ + struct c2wr_ce *ce; + struct c2_qp *qp; + int is_recv = 0; + + ce = (struct c2wr_ce *) c2_mq_consume(&cq->mq); + if (!ce) { + return -EAGAIN; + } + + /* + * if the qp returned is null then this qp has already + * been freed and we are unable to process the completion. 
+ * try pulling the next message + */ + while ((qp = + (struct c2_qp *) (unsigned long) ce->qp_user_context) == NULL) { + c2_mq_free(&cq->mq); + ce = (struct c2wr_ce *) c2_mq_consume(&cq->mq); + if (!ce) + return -EAGAIN; + } + + entry->status = c2_cqe_status_to_openib(c2_wr_get_result(ce)); + entry->wr_id = ce->hdr.context; + entry->qp_num = ce->handle; + entry->wc_flags = 0; + entry->slid = 0; + entry->sl = 0; + entry->src_qp = 0; + entry->dlid_path_bits = 0; + entry->pkey_index = 0; + + switch (c2_wr_get_id(ce)) { + case C2_WR_TYPE_SEND: + entry->opcode = IB_WC_SEND; + break; + case C2_WR_TYPE_RDMA_WRITE: + entry->opcode = IB_WC_RDMA_WRITE; + break; + case C2_WR_TYPE_RDMA_READ: + entry->opcode = IB_WC_RDMA_READ; + break; + case C2_WR_TYPE_BIND_MW: + entry->opcode = IB_WC_BIND_MW; + break; + case C2_WR_TYPE_RECV: + entry->byte_len = be32_to_cpu(ce->bytes_rcvd); + entry->opcode = IB_WC_RECV; + is_recv = 1; + break; + default: + break; + } + + /* consume the WQEs */ + if (is_recv) + c2_mq_lconsume(&qp->rq_mq, 1); + else + c2_mq_lconsume(&qp->sq_mq, + be32_to_cpu(c2_wr_get_wqe_count(ce)) + 1); + + /* free the message */ + c2_mq_free(&cq->mq); + + return 0; +} + +int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry) +{ + struct c2_dev *c2dev = to_c2dev(ibcq->device); + struct c2_cq *cq = to_c2cq(ibcq); + unsigned long flags; + int npolled, err; + + spin_lock_irqsave(&cq->lock, flags); + + for (npolled = 0; npolled < num_entries; ++npolled) { + + err = c2_poll_one(c2dev, cq, entry + npolled); + if (err) + break; + } + + spin_unlock_irqrestore(&cq->lock, flags); + + return npolled; +} + +int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify) +{ + struct c2_mq_shared __iomem *shared; + struct c2_cq *cq; + + cq = to_c2cq(ibcq); + shared = cq->mq.peer; + + if (notify == IB_CQ_NEXT_COMP) + writeb(C2_CQ_NOTIFICATION_TYPE_NEXT, &shared->notification_type); + else if (notify == IB_CQ_SOLICITED) + writeb(C2_CQ_NOTIFICATION_TYPE_NEXT_SE, 
&shared->notification_type); + else + return -EINVAL; + + writeb(CQ_WAIT_FOR_DMA | CQ_ARMED, &shared->armed); + + /* + * Now read back shared->armed to make the PCI + * write synchronous. This is necessary for + * correct cq notification semantics. + */ + readb(&shared->armed); + + return 0; +} + +static void c2_free_cq_buf(struct c2_mq *mq) +{ + free_pages((unsigned long) mq->msg_pool.host, + get_order(mq->q_size * mq->msg_size)); +} + +static int c2_alloc_cq_buf(struct c2_mq *mq, int q_size, int msg_size) +{ + unsigned long pool_start; + + pool_start = __get_free_pages(GFP_KERNEL, + get_order(q_size * msg_size)); + if (!pool_start) + return -ENOMEM; + + c2_mq_rep_init(mq, + 0, /* index (currently unknown) */ + q_size, + msg_size, + (u8 *) pool_start, + NULL, /* peer (currently unknown) */ + C2_MQ_HOST_TARGET); + + return 0; +} + +int c2_init_cq(struct c2_dev *c2dev, int entries, + struct c2_ucontext *ctx, struct c2_cq *cq) +{ + struct c2wr_cq_create_req wr; + struct c2wr_cq_create_rep *reply; + unsigned long peer_pa; + struct c2_vq_req *vq_req; + int err; + + might_sleep(); + + cq->ibcq.cqe = entries - 1; + cq->is_kernel = !ctx; + + /* Allocate a shared pointer */ + cq->mq.shared = c2_alloc_mqsp(c2dev->kern_mqsp_pool); + if (!cq->mq.shared) + return -ENOMEM; + + /* Allocate pages for the message pool */ + err = c2_alloc_cq_buf(&cq->mq, entries + 1, C2_CQ_MSG_SIZE); + if (err) + goto bail0; + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) { + err = -ENOMEM; + goto bail1; + } + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_CQ_CREATE); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.msg_size = cpu_to_be32(cq->mq.msg_size); + wr.depth = cpu_to_be32(cq->mq.q_size); + wr.shared_ht = cpu_to_be64(__pa(cq->mq.shared)); + wr.msg_pool = cpu_to_be64(__pa(cq->mq.msg_pool.host)); + wr.user_context = (u64) (unsigned long) (cq); + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) & wr); + if (err) { + 
vq_req_put(c2dev, vq_req); + goto bail2; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail2; + + reply = (struct c2wr_cq_create_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail2; + } + + if ((err = c2_errno(reply)) != 0) + goto bail3; + + cq->adapter_handle = reply->cq_handle; + cq->mq.index = be32_to_cpu(reply->mq_index); + + peer_pa = c2dev->pa + be32_to_cpu(reply->adapter_shared); + cq->mq.peer = ioremap_nocache(peer_pa, PAGE_SIZE); + if (!cq->mq.peer) { + err = -ENOMEM; + goto bail3; + } + + vq_repbuf_free(c2dev, reply); + vq_req_free(c2dev, vq_req); + + spin_lock_init(&cq->lock); + atomic_set(&cq->refcount, 1); + init_waitqueue_head(&cq->wait); + + /* + * Use the MQ index allocated by the adapter to + * store the CQ in the qptr_array + */ + cq->cqn = cq->mq.index; + c2dev->qptr_array[cq->cqn] = cq; + + return 0; + + bail3: + vq_repbuf_free(c2dev, reply); + bail2: + vq_req_free(c2dev, vq_req); + bail1: + c2_free_cq_buf(&cq->mq); + bail0: + c2_free_mqsp(cq->mq.shared); + + return err; +} + +void c2_free_cq(struct c2_dev *c2dev, struct c2_cq *cq) +{ + int err; + struct c2_vq_req *vq_req; + struct c2wr_cq_destroy_req wr; + struct c2wr_cq_destroy_rep *reply; + + might_sleep(); + + /* Clear CQ from the qptr array */ + spin_lock_irq(&c2dev->lock); + c2dev->qptr_array[cq->mq.index] = NULL; + atomic_dec(&cq->refcount); + spin_unlock_irq(&c2dev->lock); + + wait_event(cq->wait, !atomic_read(&cq->refcount)); + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) { + goto bail0; + } + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_CQ_DESTROY); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.cq_handle = cq->adapter_handle; + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) & wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail1; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail1; + + reply = (struct c2wr_cq_destroy_rep *) 
(unsigned long) (vq_req->reply_msg); + + vq_repbuf_free(c2dev, reply); + bail1: + vq_req_free(c2dev, vq_req); + bail0: + if (cq->is_kernel) { + c2_free_cq_buf(&cq->mq); + } + + return; +} diff --git a/drivers/infiniband/hw/amso1100/c2_pd.c b/drivers/infiniband/hw/amso1100/c2_pd.c new file mode 100644 index 0000000..27459b8 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_pd.c @@ -0,0 +1,71 @@ +/* + * Copyright (c) 2004 Topspin Communications. All rights reserved. + * Copyright (c) 2005 Cisco Systems. All rights reserved. + * Copyright (c) 2005 Mellanox Technologies. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include + +#include "c2.h" +#include "c2_provider.h" + +int c2_pd_alloc(struct c2_dev *dev, int privileged, struct c2_pd *pd) +{ + int err = 0; + + might_sleep(); + + atomic_set(&pd->sqp_count, 0); + pd->pd_id = c2_alloc(&dev->pd_table.alloc); + if (pd->pd_id == -1) + return -ENOMEM; + + return err; +} + +void c2_pd_free(struct c2_dev *dev, struct c2_pd *pd) +{ + might_sleep(); + c2_free(&dev->pd_table.alloc, pd->pd_id); +} + +int __devinit c2_init_pd_table(struct c2_dev *dev) +{ + return c2_alloc_init(&dev->pd_table.alloc, dev->props.max_pd, 0); +} + +void __devexit c2_cleanup_pd_table(struct c2_dev *dev) +{ + /* XXX check if any PDs are still allocated? */ + c2_alloc_cleanup(&dev->pd_table.alloc); +} diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c new file mode 100644 index 0000000..0d38c54 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_provider.c @@ -0,0 +1,871 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include "c2.h" +#include "c2_provider.h" +#include "c2_user.h" + +static int c2_query_device(struct ib_device *ibdev, + struct ib_device_attr *props) +{ + struct c2_dev *c2dev = to_c2dev(ibdev); + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + *props = c2dev->props; + return 0; +} + +static int c2_query_port(struct ib_device *ibdev, + u8 port, struct ib_port_attr *props) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + props->max_mtu = IB_MTU_4096; + props->lid = 0; + props->lmc = 0; + props->sm_lid = 0; + props->sm_sl = 0; + props->state = IB_PORT_ACTIVE; + props->phys_state = 0; + props->port_cap_flags = + IB_PORT_CM_SUP | + IB_PORT_REINIT_SUP | + IB_PORT_VENDOR_CLASS_SUP | IB_PORT_BOOT_MGMT_SUP; + props->gid_tbl_len = 1; + props->pkey_tbl_len = 1; + props->qkey_viol_cntr = 0; + props->active_width = 1; + props->active_speed = 1; + + return 0; +} + +static int c2_modify_port(struct ib_device *ibdev, + u8 port, int port_modify_mask, + struct ib_port_modify *props) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return 0; +} + +static int c2_query_pkey(struct ib_device *ibdev, + u8 port, u16 index, u16 * pkey) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + *pkey = 0; + return 0; +} + +static int c2_query_gid(struct ib_device 
*ibdev, u8 port, + int index, union ib_gid *gid) +{ + struct c2_dev *c2dev = to_c2dev(ibdev); + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + memset(&(gid->raw[0]), 0, sizeof(gid->raw)); + memcpy(&(gid->raw[0]), c2dev->pseudo_netdev->dev_addr, 6); + + return 0; +} + +/* Allocate the user context data structure. This keeps track + * of all objects associated with a particular user-mode client. + */ +static struct ib_ucontext *c2_alloc_ucontext(struct ib_device *ibdev, + struct ib_udata *udata) +{ + struct c2_ucontext *context; + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + context = kmalloc(sizeof *context, GFP_KERNEL); + if (!context) + return ERR_PTR(-ENOMEM); + + return &context->ibucontext; +} + +static int c2_dealloc_ucontext(struct ib_ucontext *context) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + kfree(context); + return 0; +} + +static int c2_mmap_uar(struct ib_ucontext *context, struct vm_area_struct *vma) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return -ENOSYS; +} + +static struct ib_pd *c2_alloc_pd(struct ib_device *ibdev, + struct ib_ucontext *context, + struct ib_udata *udata) +{ + struct c2_pd *pd; + int err; + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + pd = kmalloc(sizeof *pd, GFP_KERNEL); + if (!pd) + return ERR_PTR(-ENOMEM); + + err = c2_pd_alloc(to_c2dev(ibdev), !context, pd); + if (err) { + kfree(pd); + return ERR_PTR(err); + } + + if (context) { + if (ib_copy_to_udata(udata, &pd->pd_id, sizeof(__u32))) { + c2_pd_free(to_c2dev(ibdev), pd); + kfree(pd); + return ERR_PTR(-EFAULT); + } + } + + return &pd->ibpd; +} + +static int c2_dealloc_pd(struct ib_pd *pd) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + c2_pd_free(to_c2dev(pd->device), to_c2pd(pd)); + kfree(pd); + + return 0; +} + +static struct ib_ah *c2_ah_create(struct ib_pd *pd, struct ib_ah_attr *ah_attr) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return ERR_PTR(-ENOSYS); +} + +static int c2_ah_destroy(struct ib_ah *ah) +{ + dprintk("%s:%u\n", 
__FUNCTION__, __LINE__); + return -ENOSYS; +} + +static void c2_add_ref(struct ib_qp *ibqp) +{ + struct c2_qp *qp; + BUG_ON(!ibqp); + qp = to_c2qp(ibqp); + atomic_inc(&qp->refcount); +} + +static void c2_rem_ref(struct ib_qp *ibqp) +{ + struct c2_qp *qp; + BUG_ON(!ibqp); + qp = to_c2qp(ibqp); +#if 0 + dprintk("%s:%d qp=%p, qp->refcount=%d\n", __FUNCTION__, __LINE__, + qp, atomic_read(&qp->refcount)); +#endif + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); +} + +struct ib_qp *c2_get_qp(struct ib_device *device, int qpn) +{ + struct c2_dev* c2dev = to_c2dev(device); + struct c2_qp *qp; + + qp = c2dev->qp_table.map[qpn]; + dprintk("%s Returning QP=%p for QPN=%d, device=%p, refcount=%d\n", + __FUNCTION__, qp, qpn, device, + (qp?atomic_read(&qp->refcount):0)); + + return (qp?&qp->ibqp:NULL); +} + +static struct ib_qp *c2_create_qp(struct ib_pd *pd, + struct ib_qp_init_attr *init_attr, + struct ib_udata *udata) +{ + struct c2_qp *qp; + int err; + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + switch (init_attr->qp_type) { + case IB_QPT_RC: + qp = kzalloc(sizeof(*qp), GFP_KERNEL); + if (!qp) { + dprintk("%s: Unable to allocate QP\n", __FUNCTION__); + return ERR_PTR(-ENOMEM); + } + spin_lock_init(&qp->lock); + if (pd->uobject) { + /* XXX userspace specific */ + } + + err = c2_alloc_qp(to_c2dev(pd->device), + to_c2pd(pd), init_attr, qp); + + if (err && pd->uobject) { + /* XXX userspace specific */ + } + + break; + default: + dprintk("%s: Invalid QP type: %d\n", __FUNCTION__, + init_attr->qp_type); + return ERR_PTR(-EINVAL); + break; + } + + if (err) { + kfree(qp); + return ERR_PTR(err); + } + + return &qp->ibqp; +} + +static int c2_destroy_qp(struct ib_qp *ib_qp) +{ + struct c2_qp *qp = to_c2qp(ib_qp); + + dprintk("%s:%u qp=%p,qp->state=%d\n", + __FUNCTION__, __LINE__,ib_qp,qp->state); + c2_free_qp(to_c2dev(ib_qp->device), qp); + kfree(qp); + return 0; +} + +static struct ib_cq *c2_create_cq(struct ib_device *ibdev, int entries, + struct ib_ucontext 
*context, + struct ib_udata *udata) +{ + struct c2_cq *cq; + int err; + + cq = kmalloc(sizeof(*cq), GFP_KERNEL); + if (!cq) { + dprintk("%s: Unable to allocate CQ\n", __FUNCTION__); + return ERR_PTR(-ENOMEM); + } + + err = c2_init_cq(to_c2dev(ibdev), entries, NULL, cq); + if (err) { + dprintk("%s: error initializing CQ\n", __FUNCTION__); + kfree(cq); + return ERR_PTR(err); + } + + return &cq->ibcq; +} + +static int c2_destroy_cq(struct ib_cq *ib_cq) +{ + struct c2_cq *cq = to_c2cq(ib_cq); + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + c2_free_cq(to_c2dev(ib_cq->device), cq); + kfree(cq); + + return 0; +} + +static inline u32 c2_convert_access(int acc) +{ + return (acc & IB_ACCESS_REMOTE_WRITE ? C2_ACF_REMOTE_WRITE : 0) | + (acc & IB_ACCESS_REMOTE_READ ? C2_ACF_REMOTE_READ : 0) | + (acc & IB_ACCESS_LOCAL_WRITE ? C2_ACF_LOCAL_WRITE : 0) | + C2_ACF_LOCAL_READ | C2_ACF_WINDOW_BIND; +} + +static struct ib_mr *c2_reg_phys_mr(struct ib_pd *ib_pd, + struct ib_phys_buf *buffer_list, + int num_phys_buf, int acc, u64 * iova_start) +{ + struct c2_mr *mr; + u64 *page_list; + u32 total_len; + int err, i, j, k, page_shift, pbl_depth; + + pbl_depth = 0; + total_len = 0; + + page_shift = PAGE_SHIFT; + /* + * If there is only 1 buffer we assume this could + * be a map of all phy mem...use a 32k page_shift. 
+ */ + if (num_phys_buf == 1) + page_shift += 3; /* XXX */ + + for (i = 0; i < num_phys_buf; i++) { + + if (buffer_list[i].addr & ~PAGE_MASK) { + dprintk("Unaligned Memory Buffer: 0x%x\n", + (unsigned int) buffer_list[i].addr); + return ERR_PTR(-EINVAL); + } + + if (!buffer_list[i].size) { + dprintk("Invalid Buffer Size\n"); + return ERR_PTR(-EINVAL); + } + + total_len += buffer_list[i].size; + pbl_depth += ALIGN(buffer_list[i].size, + (1 << page_shift)) >> page_shift; + } + + page_list = vmalloc(sizeof(u64) * pbl_depth); + if (!page_list) { + dprintk("couldn't vmalloc page_list of size %zd\n", + (sizeof(u64) * pbl_depth)); + return ERR_PTR(-ENOMEM); + } + + for (i = 0, j = 0; i < num_phys_buf; i++) { + + int naddrs; + + naddrs = ALIGN(buffer_list[i].size, + (1 << page_shift)) >> page_shift; + for (k = 0; k < naddrs; k++) + page_list[j++] = (buffer_list[i].addr + + (k << page_shift)); + } + + mr = kmalloc(sizeof(*mr), GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + mr->pd = to_c2pd(ib_pd); + dprintk("%s - page shift %d, pbl_depth %d, total_len %u, " + "*iova_start %llx, first pa %llx, last pa %llx\n", + __FUNCTION__, page_shift, pbl_depth, total_len, + *iova_start, page_list[0], page_list[pbl_depth-1]); + err = c2_nsmr_register_phys_kern(to_c2dev(ib_pd->device), page_list, + (1 << page_shift), pbl_depth, + total_len, 0, iova_start, + c2_convert_access(acc), mr); + vfree(page_list); + if (err) { + kfree(mr); + return ERR_PTR(err); + } + + return &mr->ibmr; +} + +static struct ib_mr *c2_get_dma_mr(struct ib_pd *pd, int acc) +{ + struct ib_phys_buf bl; + u64 kva = 0; + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + /* AMSO1100 limit */ + bl.size = 0xffffffff; + bl.addr = 0; + return c2_reg_phys_mr(pd, &bl, 1, acc, &kva); +} + +static struct ib_mr *c2_reg_user_mr(struct ib_pd *pd, struct ib_umem *region, + int acc, struct ib_udata *udata) +{ + u64 *pages; + u64 kva = 0; + int shift, n, len; + int i, j, k; + int err = 0; + struct ib_umem_chunk *chunk; + struct 
c2_pd *c2pd = to_c2pd(pd); + struct c2_mr *c2mr; + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + shift = ffs(region->page_size) - 1; + + c2mr = kmalloc(sizeof(*c2mr), GFP_KERNEL); + if (!c2mr) + return ERR_PTR(-ENOMEM); + c2mr->pd = c2pd; + + n = 0; + list_for_each_entry(chunk, &region->chunk_list, list) + n += chunk->nents; + + pages = kmalloc(n * sizeof(u64), GFP_KERNEL); + if (!pages) { + err = -ENOMEM; + goto err; + } + + i = 0; + list_for_each_entry(chunk, &region->chunk_list, list) { + for (j = 0; j < chunk->nmap; ++j) { + len = sg_dma_len(&chunk->page_list[j]) >> shift; + for (k = 0; k < len; ++k) { + pages[i++] = + sg_dma_address(&chunk->page_list[j]) + + (region->page_size * k); + } + } + } + + kva = (u64)region->virt_base; + err = c2_nsmr_register_phys_kern(to_c2dev(pd->device), + pages, + region->page_size, + i, + region->length, + region->offset, + &kva, + c2_convert_access(acc), + c2mr); + kfree(pages); + if (err) { + kfree(c2mr); + return ERR_PTR(err); + } + return &c2mr->ibmr; + +err: + kfree(c2mr); + return ERR_PTR(err); +} + +static int c2_dereg_mr(struct ib_mr *ib_mr) +{ + struct c2_mr *mr = to_c2mr(ib_mr); + int err; + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + err = c2_stag_dealloc(to_c2dev(ib_mr->device), ib_mr->lkey); + if (err) + dprintk("c2_stag_dealloc failed: %d\n", err); + else + kfree(mr); + + return err; +} + +static ssize_t show_rev(struct class_device *cdev, char *buf) +{ + struct c2_dev *dev = container_of(cdev, struct c2_dev, ibdev.class_dev); + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return sprintf(buf, "%x\n", dev->props.hw_ver); +} + +static ssize_t show_fw_ver(struct class_device *cdev, char *buf) +{ + struct c2_dev *dev = container_of(cdev, struct c2_dev, ibdev.class_dev); + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return sprintf(buf, "%x.%x.%x\n", + (int) (dev->props.fw_ver >> 32), + (int) (dev->props.fw_ver >> 16) & 0xffff, + (int) (dev->props.fw_ver & 0xffff)); +} + +static ssize_t show_hca(struct class_device 
*cdev, char *buf) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return sprintf(buf, "AMSO1100\n"); +} + +static ssize_t show_board(struct class_device *cdev, char *buf) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return sprintf(buf, "%.*s\n", 32, "AMSO1100 Board ID"); +} + +static CLASS_DEVICE_ATTR(hw_rev, S_IRUGO, show_rev, NULL); +static CLASS_DEVICE_ATTR(fw_ver, S_IRUGO, show_fw_ver, NULL); +static CLASS_DEVICE_ATTR(hca_type, S_IRUGO, show_hca, NULL); +static CLASS_DEVICE_ATTR(board_id, S_IRUGO, show_board, NULL); + +static struct class_device_attribute *c2_class_attributes[] = { + &class_device_attr_hw_rev, + &class_device_attr_fw_ver, + &class_device_attr_hca_type, + &class_device_attr_board_id +}; + +static int c2_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, + int attr_mask) +{ + int err; + + err = + c2_qp_modify(to_c2dev(ibqp->device), to_c2qp(ibqp), attr, + attr_mask); + + return err; +} + +static int c2_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return -ENOSYS; +} + +static int c2_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return -ENOSYS; +} + +static int c2_process_mad(struct ib_device *ibdev, + int mad_flags, + u8 port_num, + struct ib_wc *in_wc, + struct ib_grh *in_grh, + struct ib_mad *in_mad, struct ib_mad *out_mad) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return -ENOSYS; +} + +static int c2_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + /* Request a connection */ + return c2_llp_connect(cm_id, iw_param); +} + +static int c2_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + /* Accept the new connection */ + return c2_llp_accept(cm_id, iw_param); +} + +static int c2_reject(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len) +{ + int 
err; + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + err = c2_llp_reject(cm_id, pdata, pdata_len); + return err; +} + +static int c2_service_create(struct iw_cm_id *cm_id, int backlog) +{ + int err; + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + err = c2_llp_service_create(cm_id, backlog); + dprintk("%s:%u err=%d\n", + __FUNCTION__, __LINE__, + err); + return err; +} + +static int c2_service_destroy(struct iw_cm_id *cm_id) +{ + int err; + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + + err = c2_llp_service_destroy(cm_id); + + return err; +} + +static int c2_pseudo_up(struct net_device *netdev) +{ + struct in_device *ind; + struct c2_dev *c2dev = netdev->priv; + + ind = in_dev_get(netdev); + if (!ind) + return 0; + + dprintk("adding...\n"); + for_ifa(ind) { +#ifdef C2_DEBUG + u8 *ip = (u8 *) & ifa->ifa_address; + + dprintk("%s: %d.%d.%d.%d\n", + ifa->ifa_label, ip[0], ip[1], ip[2], ip[3]); +#endif + c2_add_addr(c2dev, ifa->ifa_address, ifa->ifa_mask); + } + endfor_ifa(ind); + in_dev_put(ind); + + return 0; +} + +static int c2_pseudo_down(struct net_device *netdev) +{ + struct in_device *ind; + struct c2_dev *c2dev = netdev->priv; + + ind = in_dev_get(netdev); + if (!ind) + return 0; + + dprintk("deleting...\n"); + for_ifa(ind) { +#ifdef C2_DEBUG + u8 *ip = (u8 *) & ifa->ifa_address; + + dprintk("%s: %d.%d.%d.%d\n", + ifa->ifa_label, ip[0], ip[1], ip[2], ip[3]); +#endif + c2_del_addr(c2dev, ifa->ifa_address, ifa->ifa_mask); + } + endfor_ifa(ind); + in_dev_put(ind); + + return 0; +} + +static int c2_pseudo_xmit_frame(struct sk_buff *skb, struct net_device *netdev) +{ + kfree_skb(skb); + return NETDEV_TX_OK; +} + +static int c2_pseudo_change_mtu(struct net_device *netdev, int new_mtu) +{ + int ret = 0; + + if (new_mtu < ETH_ZLEN || new_mtu > ETH_JUMBO_MTU) + return -EINVAL; + + netdev->mtu = new_mtu; + + /* XXX tell rnic about new rmda interface mtu */ + return ret; +} + +static void setup(struct net_device *netdev) +{ + SET_MODULE_OWNER(netdev); + netdev->open 
= c2_pseudo_up; + netdev->stop = c2_pseudo_down; + netdev->hard_start_xmit = c2_pseudo_xmit_frame; + netdev->get_stats = NULL; + netdev->tx_timeout = NULL; + netdev->set_mac_address = NULL; + netdev->change_mtu = c2_pseudo_change_mtu; + netdev->watchdog_timeo = 0; + netdev->type = ARPHRD_ETHER; + netdev->mtu = 1500; + netdev->hard_header_len = ETH_HLEN; + netdev->addr_len = ETH_ALEN; + netdev->tx_queue_len = 0; + netdev->flags |= IFF_NOARP; + return; +} + +static struct net_device *c2_pseudo_netdev_init(struct c2_dev *c2dev) +{ + char name[IFNAMSIZ]; + struct net_device *netdev; + + /* change ethxxx to iwxxx */ + strcpy(name, "iw"); + strcat(name, &c2dev->netdev->name[3]); + netdev = alloc_netdev(sizeof(*netdev), name, setup); + if (!netdev) { + printk(KERN_ERR PFX "%s - etherdev alloc failed", + __FUNCTION__); + return NULL; + } + + netdev->priv = c2dev; + + SET_NETDEV_DEV(netdev, &c2dev->pcidev->dev); + + memcpy_fromio(netdev->dev_addr, c2dev->kva + C2_REGS_RDMA_ENADDR, 6); + + /* Print out the MAC address */ + dprintk("%s: MAC %02X:%02X:%02X:%02X:%02X:%02X\n", + netdev->name, + netdev->dev_addr[0], netdev->dev_addr[1], netdev->dev_addr[2], + netdev->dev_addr[3], netdev->dev_addr[4], netdev->dev_addr[5]); + + /* Disable network packets */ + netif_stop_queue(netdev); + return netdev; +} + +int c2_register_device(struct c2_dev *dev) +{ + int ret; + int i; + + /* Register pseudo network device */ + dev->pseudo_netdev = c2_pseudo_netdev_init(dev); + if (dev->pseudo_netdev) { + ret = register_netdev(dev->pseudo_netdev); + if (ret) { + printk(KERN_ERR PFX + "Unable to register netdev, ret = %d\n", ret); + free_netdev(dev->pseudo_netdev); + return ret; + } + } + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + strlcpy(dev->ibdev.name, "amso%d", IB_DEVICE_NAME_MAX); + dev->ibdev.owner = THIS_MODULE; + dev->ibdev.uverbs_cmd_mask = + (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) | + (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) | + (1ull << IB_USER_VERBS_CMD_QUERY_PORT) | + (1ull << 
IB_USER_VERBS_CMD_ALLOC_PD) | + (1ull << IB_USER_VERBS_CMD_DEALLOC_PD) | + (1ull << IB_USER_VERBS_CMD_REG_MR) | + (1ull << IB_USER_VERBS_CMD_DEREG_MR) | + (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | + (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | + (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | + (1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ) | + (1ull << IB_USER_VERBS_CMD_CREATE_QP) | + (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | + (1ull << IB_USER_VERBS_CMD_POLL_CQ) | + (1ull << IB_USER_VERBS_CMD_DESTROY_QP) | + (1ull << IB_USER_VERBS_CMD_POST_SEND) | + (1ull << IB_USER_VERBS_CMD_POST_RECV); + + dev->ibdev.node_type = RDMA_NODE_RNIC; + memset(&dev->ibdev.node_guid, 0, sizeof(dev->ibdev.node_guid)); + memcpy(&dev->ibdev.node_guid, dev->pseudo_netdev->dev_addr, 6); + dev->ibdev.phys_port_cnt = 1; + dev->ibdev.dma_device = &dev->pcidev->dev; + dev->ibdev.class_dev.dev = &dev->pcidev->dev; + dev->ibdev.query_device = c2_query_device; + dev->ibdev.query_port = c2_query_port; + dev->ibdev.modify_port = c2_modify_port; + dev->ibdev.query_pkey = c2_query_pkey; + dev->ibdev.query_gid = c2_query_gid; + dev->ibdev.alloc_ucontext = c2_alloc_ucontext; + dev->ibdev.dealloc_ucontext = c2_dealloc_ucontext; + dev->ibdev.mmap = c2_mmap_uar; + dev->ibdev.alloc_pd = c2_alloc_pd; + dev->ibdev.dealloc_pd = c2_dealloc_pd; + dev->ibdev.create_ah = c2_ah_create; + dev->ibdev.destroy_ah = c2_ah_destroy; + dev->ibdev.create_qp = c2_create_qp; + dev->ibdev.modify_qp = c2_modify_qp; + dev->ibdev.destroy_qp = c2_destroy_qp; + dev->ibdev.create_cq = c2_create_cq; + dev->ibdev.destroy_cq = c2_destroy_cq; + dev->ibdev.poll_cq = c2_poll_cq; + dev->ibdev.get_dma_mr = c2_get_dma_mr; + dev->ibdev.reg_phys_mr = c2_reg_phys_mr; + dev->ibdev.reg_user_mr = c2_reg_user_mr; + dev->ibdev.dereg_mr = c2_dereg_mr; + + dev->ibdev.alloc_fmr = NULL; + dev->ibdev.unmap_fmr = NULL; + dev->ibdev.dealloc_fmr = NULL; + dev->ibdev.map_phys_fmr = NULL; + + dev->ibdev.attach_mcast = c2_multicast_attach; + dev->ibdev.detach_mcast = 
c2_multicast_detach; + dev->ibdev.process_mad = c2_process_mad; + + dev->ibdev.req_notify_cq = c2_arm_cq; + dev->ibdev.post_send = c2_post_send; + dev->ibdev.post_recv = c2_post_receive; + + dev->ibdev.iwcm = kmalloc(sizeof(*dev->ibdev.iwcm), GFP_KERNEL); + dev->ibdev.iwcm->add_ref = c2_add_ref; + dev->ibdev.iwcm->rem_ref = c2_rem_ref; + dev->ibdev.iwcm->get_qp = c2_get_qp; + dev->ibdev.iwcm->connect = c2_connect; + dev->ibdev.iwcm->accept = c2_accept; + dev->ibdev.iwcm->reject = c2_reject; + dev->ibdev.iwcm->create_listen = c2_service_create; + dev->ibdev.iwcm->destroy_listen = c2_service_destroy; + + ret = ib_register_device(&dev->ibdev); + if (ret) + return ret; + + for (i = 0; i < ARRAY_SIZE(c2_class_attributes); ++i) { + ret = class_device_create_file(&dev->ibdev.class_dev, + c2_class_attributes[i]); + if (ret) { + unregister_netdev(dev->pseudo_netdev); + free_netdev(dev->pseudo_netdev); + ib_unregister_device(&dev->ibdev); + return ret; + } + } + + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + return 0; +} + +void c2_unregister_device(struct c2_dev *dev) +{ + dprintk("%s:%u\n", __FUNCTION__, __LINE__); + unregister_netdev(dev->pseudo_netdev); + free_netdev(dev->pseudo_netdev); + ib_unregister_device(&dev->ibdev); +} diff --git a/drivers/infiniband/hw/amso1100/c2_provider.h b/drivers/infiniband/hw/amso1100/c2_provider.h new file mode 100644 index 0000000..05c4ab6 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_provider.h @@ -0,0 +1,182 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#ifndef C2_PROVIDER_H +#define C2_PROVIDER_H +#include + +#include +#include + +#include "c2_mq.h" +#include + +#define C2_MPT_FLAG_ATOMIC (1 << 14) +#define C2_MPT_FLAG_REMOTE_WRITE (1 << 13) +#define C2_MPT_FLAG_REMOTE_READ (1 << 12) +#define C2_MPT_FLAG_LOCAL_WRITE (1 << 11) +#define C2_MPT_FLAG_LOCAL_READ (1 << 10) + +struct c2_buf_list { + void *buf; + DECLARE_PCI_UNMAP_ADDR(mapping) +}; + + +/* The user context keeps track of objects allocated for a + * particular user-mode client. */ +struct c2_ucontext { + struct ib_ucontext ibucontext; +}; + +struct c2_mtt; + +/* All objects associated with a PD are kept in the + * associated user context if present. 
+ */ +struct c2_pd { + struct ib_pd ibpd; + u32 pd_id; + atomic_t sqp_count; +}; + +struct c2_mr { + struct ib_mr ibmr; + struct c2_pd *pd; +}; + +struct c2_av; + +enum c2_ah_type { + C2_AH_ON_HCA, + C2_AH_PCI_POOL, + C2_AH_KMALLOC +}; + +struct c2_ah { + struct ib_ah ibah; +}; + +struct c2_cq { + struct ib_cq ibcq; + spinlock_t lock; + atomic_t refcount; + int cqn; + int is_kernel; + wait_queue_head_t wait; + + u32 adapter_handle; + struct c2_mq mq; +}; + +struct c2_wq { + spinlock_t lock; +}; +struct iw_cm_id; +struct c2_qp { + struct ib_qp ibqp; + struct iw_cm_id *cm_id; + spinlock_t lock; + atomic_t refcount; + wait_queue_head_t wait; + int qpn; + + u32 adapter_handle; + u32 send_sgl_depth; + u32 recv_sgl_depth; + u32 rdma_write_sgl_depth; + u8 state; + + struct c2_mq sq_mq; + struct c2_mq rq_mq; +}; + +struct c2_cr_query_attrs { + u32 local_addr; + u32 remote_addr; + u16 local_port; + u16 remote_port; +}; + +static inline struct c2_pd *to_c2pd(struct ib_pd *ibpd) +{ + return container_of(ibpd, struct c2_pd, ibpd); +} + +static inline struct c2_ucontext *to_c2ucontext(struct ib_ucontext *ibucontext) +{ + return container_of(ibucontext, struct c2_ucontext, ibucontext); +} + +static inline struct c2_mr *to_c2mr(struct ib_mr *ibmr) +{ + return container_of(ibmr, struct c2_mr, ibmr); +} + + +static inline struct c2_ah *to_c2ah(struct ib_ah *ibah) +{ + return container_of(ibah, struct c2_ah, ibah); +} + +static inline struct c2_cq *to_c2cq(struct ib_cq *ibcq) +{ + return container_of(ibcq, struct c2_cq, ibcq); +} + +static inline struct c2_qp *to_c2qp(struct ib_qp *ibqp) +{ + return container_of(ibqp, struct c2_qp, ibqp); +} + +static inline int is_rnic_addr(struct net_device *netdev, u32 addr) +{ + struct in_device *ind; + int ret = 0; + + ind = in_dev_get(netdev); + if (!ind) + return 0; + + for_ifa(ind) { + if (ifa->ifa_address == addr) { + ret = 1; + break; + } + } + endfor_ifa(ind); + in_dev_put(ind); + return ret; +} +#endif /* C2_PROVIDER_H */ diff --git 
a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c new file mode 100644 index 0000000..e0e3c83 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_qp.c @@ -0,0 +1,975 @@ +/* + * Copyright (c) 2004 Topspin Communications. All rights reserved. + * Copyright (c) 2005 Cisco Systems. All rights reserved. + * Copyright (c) 2005 Mellanox Technologies. All rights reserved. + * Copyright (c) 2004 Voltaire, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include "c2.h" +#include "c2_vq.h" +#include "c2_status.h" + +#define C2_MAX_ORD_PER_QP 128 +#define C2_MAX_IRD_PER_QP 128 + +#define C2_HINT_MAKE(q_index, hint_count) (((q_index) << 16) | hint_count) +#define C2_HINT_GET_INDEX(hint) (((hint) & 0x7FFF0000) >> 16) +#define C2_HINT_GET_COUNT(hint) ((hint) & 0x0000FFFF) + +#define NO_SUPPORT -1 +static const u8 c2_opcode[] = { + [IB_WR_SEND] = C2_WR_TYPE_SEND, + [IB_WR_SEND_WITH_IMM] = NO_SUPPORT, + [IB_WR_RDMA_WRITE] = C2_WR_TYPE_RDMA_WRITE, + [IB_WR_RDMA_WRITE_WITH_IMM] = NO_SUPPORT, + [IB_WR_RDMA_READ] = C2_WR_TYPE_RDMA_READ, + [IB_WR_ATOMIC_CMP_AND_SWP] = NO_SUPPORT, + [IB_WR_ATOMIC_FETCH_AND_ADD] = NO_SUPPORT, +}; + +static int to_c2_state(enum ib_qp_state ib_state) +{ + switch (ib_state) { + case IB_QPS_RESET: + return C2_QP_STATE_IDLE; + case IB_QPS_RTS: + return C2_QP_STATE_RTS; + case IB_QPS_SQD: + return C2_QP_STATE_CLOSING; + case IB_QPS_SQE: + return C2_QP_STATE_CLOSING; + case IB_QPS_ERR: + return C2_QP_STATE_ERROR; + default: + return -1; + } +} + +int to_ib_state(enum c2_qp_state c2_state) +{ + switch (c2_state) { + case C2_QP_STATE_IDLE: + return IB_QPS_RESET; + case C2_QP_STATE_CONNECTING: + return IB_QPS_RTR; + case C2_QP_STATE_RTS: + return IB_QPS_RTS; + case C2_QP_STATE_CLOSING: + return IB_QPS_SQD; + case C2_QP_STATE_ERROR: + return IB_QPS_ERR; + case C2_QP_STATE_TERMINATE: + return IB_QPS_SQE; + default: + return -1; + } +} + +const char *to_ib_state_str(int ib_state) +{ + static const char *state_str[] = { + "IB_QPS_RESET", + "IB_QPS_INIT", + "IB_QPS_RTR", + "IB_QPS_RTS", + "IB_QPS_SQD", + "IB_QPS_SQE", + "IB_QPS_ERR" + }; + if (ib_state < IB_QPS_RESET || + ib_state > IB_QPS_ERR) + return ""; + + ib_state -= IB_QPS_RESET; + return state_str[ib_state]; +} + +void c2_set_qp_state(struct c2_qp *qp, int c2_state) +{ + int new_state = to_ib_state(c2_state); + + dprintk("%s: qp[%p] state modify %s --> %s\n", + __FUNCTION__, + qp, + to_ib_state_str(qp->state), + 
to_ib_state_str(new_state)); + qp->state = new_state; +} + +#define C2_QP_NO_ATTR_CHANGE 0xFFFFFFFF + +int c2_qp_modify(struct c2_dev *c2dev, struct c2_qp *qp, + struct ib_qp_attr *attr, int attr_mask) +{ + struct c2wr_qp_modify_req wr; + struct c2wr_qp_modify_rep *reply; + struct c2_vq_req *vq_req; + unsigned long flags; + u8 next_state; + int err; + + dprintk("%s:%d qp=%p, %s --> %s\n", + __FUNCTION__, __LINE__, + qp, + to_ib_state_str(qp->state), + to_ib_state_str(attr->qp_state)); + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + c2_wr_set_id(&wr, CCWR_QP_MODIFY); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.qp_handle = qp->adapter_handle; + wr.ord = cpu_to_be32(C2_QP_NO_ATTR_CHANGE); + wr.ird = cpu_to_be32(C2_QP_NO_ATTR_CHANGE); + wr.sq_depth = cpu_to_be32(C2_QP_NO_ATTR_CHANGE); + wr.rq_depth = cpu_to_be32(C2_QP_NO_ATTR_CHANGE); + + if (attr_mask & IB_QP_STATE) { + /* Ensure the state is valid */ + if (attr->qp_state < 0 || attr->qp_state > IB_QPS_ERR) + return -EINVAL; + + wr.next_qp_state = cpu_to_be32(to_c2_state(attr->qp_state)); + + if (attr->qp_state == IB_QPS_ERR) { + spin_lock_irqsave(&qp->lock, flags); + if (qp->cm_id && qp->state == IB_QPS_RTS) { + dprintk("Generating CLOSE event for QP-->ERR, " + "qp=%p, cm_id=%p\n",qp,qp->cm_id); + /* Generate an CLOSE event */ + vq_req->cm_id = qp->cm_id; + vq_req->event = IW_CM_EVENT_CLOSE; + } + spin_unlock_irqrestore(&qp->lock, flags); + } + next_state = attr->qp_state; + + } else if (attr_mask & IB_QP_CUR_STATE) { + + if (attr->cur_qp_state != IB_QPS_RTR && + attr->cur_qp_state != IB_QPS_RTS && + attr->cur_qp_state != IB_QPS_SQD && + attr->cur_qp_state != IB_QPS_SQE) + return -EINVAL; + else + wr.next_qp_state = + cpu_to_be32(to_c2_state(attr->cur_qp_state)); + + next_state = attr->cur_qp_state; + + } else { + err = 0; + goto bail0; + } + + /* reference the request struct */ + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) 
& wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail0; + + reply = (struct c2wr_qp_modify_rep *) (unsigned long) vq_req->reply_msg; + if (!reply) { + err = -ENOMEM; + goto bail0; + } + + err = c2_errno(reply); + if (!err) + qp->state = next_state; +#ifdef C2_DEBUG + else + dprintk("%s: c2_errno=%d\n", __FUNCTION__, err); +#endif + /* + * If we're going to error and generating the event here, then + * we need to remove the reference because there will be no + * close event generated by the adapter + */ + spin_lock_irqsave(&qp->lock, flags); + if (vq_req->event==IW_CM_EVENT_CLOSE && qp->cm_id) { + qp->cm_id->rem_ref(qp->cm_id); + qp->cm_id = NULL; + } + spin_unlock_irqrestore(&qp->lock, flags); + + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + + dprintk("%s:%d qp=%p, cur_state=%s\n", + __FUNCTION__, __LINE__, + qp, + to_ib_state_str(qp->state)); + return err; +} + +int c2_qp_set_read_limits(struct c2_dev *c2dev, struct c2_qp *qp, + int ord, int ird) +{ + struct c2wr_qp_modify_req wr; + struct c2wr_qp_modify_rep *reply; + struct c2_vq_req *vq_req; + int err; + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + c2_wr_set_id(&wr, CCWR_QP_MODIFY); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.qp_handle = qp->adapter_handle; + wr.ord = cpu_to_be32(ord); + wr.ird = cpu_to_be32(ird); + wr.sq_depth = cpu_to_be32(C2_QP_NO_ATTR_CHANGE); + wr.rq_depth = cpu_to_be32(C2_QP_NO_ATTR_CHANGE); + wr.next_qp_state = cpu_to_be32(C2_QP_NO_ATTR_CHANGE); + + /* reference the request struct */ + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) & wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail0; + + reply = (struct c2wr_qp_modify_rep *) (unsigned long) + vq_req->reply_msg; + if (!reply) { + err = -ENOMEM; + goto bail0; + } 
+ + err = c2_errno(reply); + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +static int destroy_qp(struct c2_dev *c2dev, struct c2_qp *qp) +{ + struct c2_vq_req *vq_req; + struct c2wr_qp_destroy_req wr; + struct c2wr_qp_destroy_rep *reply; + unsigned long flags; + int err; + + /* + * Allocate a verb request message + */ + vq_req = vq_req_alloc(c2dev); + if (!vq_req) { + return -ENOMEM; + } + + /* + * Initialize the WR + */ + c2_wr_set_id(&wr, CCWR_QP_DESTROY); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.qp_handle = qp->adapter_handle; + + /* + * reference the request struct. dereferenced in the int handler. + */ + vq_req_get(c2dev, vq_req); + + spin_lock_irqsave(&qp->lock, flags); + if (qp->cm_id && qp->state == IB_QPS_RTS) { + dprintk("destroy_qp: generating CLOSE event for QP-->ERR, " + "qp=%p, cm_id=%p\n",qp,qp->cm_id); + /* Generate an CLOSE event */ + vq_req->qp = qp; + vq_req->cm_id = qp->cm_id; + vq_req->event = IW_CM_EVENT_CLOSE; + } + spin_unlock_irqrestore(&qp->lock, flags); + + /* + * Send WR to adapter + */ + err = vq_send_wr(c2dev, (union c2wr *) & wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + /* + * Wait for reply from adapter + */ + err = vq_wait_for_reply(c2dev, vq_req); + if (err) { + goto bail0; + } + + /* + * Process reply + */ + reply = (struct c2wr_qp_destroy_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail0; + } + + spin_lock_irqsave(&qp->lock, flags); + if (qp->cm_id) { + qp->cm_id->rem_ref(qp->cm_id); + qp->cm_id = NULL; + } + spin_unlock_irqrestore(&qp->lock, flags); + + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +int c2_alloc_qp(struct c2_dev *c2dev, + struct c2_pd *pd, + struct ib_qp_init_attr *qp_attrs, struct c2_qp *qp) +{ + struct c2wr_qp_create_req wr; + struct c2wr_qp_create_rep *reply; + struct c2_vq_req *vq_req; + struct c2_cq *send_cq = 
to_c2cq(qp_attrs->send_cq); + struct c2_cq *recv_cq = to_c2cq(qp_attrs->recv_cq); + unsigned long peer_pa; + u32 q_size, msg_size, mmap_size; + void __iomem *mmap; + int err; + + qp->qpn = c2_alloc(&c2dev->qp_table.alloc); + if (qp->qpn == -1) + return -ENOMEM; + + qp->ibqp.qp_num = qp->qpn; + qp->ibqp.qp_type = IB_QPT_RC; + + /* Allocate the SQ and RQ shared pointers */ + qp->sq_mq.shared = c2_alloc_mqsp(c2dev->kern_mqsp_pool); + if (!qp->sq_mq.shared) { + err = -ENOMEM; + goto bail0; + } + + qp->rq_mq.shared = c2_alloc_mqsp(c2dev->kern_mqsp_pool); + if (!qp->rq_mq.shared) { + err = -ENOMEM; + goto bail1; + } + + /* Allocate the verbs request */ + vq_req = vq_req_alloc(c2dev); + if (vq_req == NULL) { + err = -ENOMEM; + goto bail2; + } + + /* Initialize the work request */ + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_QP_CREATE); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.sq_cq_handle = send_cq->adapter_handle; + wr.rq_cq_handle = recv_cq->adapter_handle; + wr.sq_depth = cpu_to_be32(qp_attrs->cap.max_send_wr + 1); + wr.rq_depth = cpu_to_be32(qp_attrs->cap.max_recv_wr + 1); + wr.srq_handle = 0; + wr.flags = cpu_to_be32(QP_RDMA_READ | QP_RDMA_WRITE | QP_MW_BIND | + QP_ZERO_STAG | QP_RDMA_READ_RESPONSE); + wr.send_sgl_depth = cpu_to_be32(qp_attrs->cap.max_send_sge); + wr.recv_sgl_depth = cpu_to_be32(qp_attrs->cap.max_recv_sge); + wr.rdma_write_sgl_depth = cpu_to_be32(qp_attrs->cap.max_send_sge); + // XXX no write depth? 
+ wr.shared_sq_ht = cpu_to_be64(__pa(qp->sq_mq.shared)); + wr.shared_rq_ht = cpu_to_be64(__pa(qp->rq_mq.shared)); + wr.ord = cpu_to_be32(C2_MAX_ORD_PER_QP); + wr.ird = cpu_to_be32(C2_MAX_IRD_PER_QP); + wr.pd_id = pd->pd_id; + wr.user_context = (unsigned long) qp; + + vq_req_get(c2dev, vq_req); + + /* Send the WR to the adapter */ + err = vq_send_wr(c2dev, (union c2wr *) & wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail3; + } + + /* Wait for the verb reply */ + err = vq_wait_for_reply(c2dev, vq_req); + if (err) { + goto bail3; + } + + /* Process the reply */ + reply = (struct c2wr_qp_create_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail3; + } + + if ((err = c2_wr_get_result(reply)) != 0) { + goto bail4; + } + + /* Fill in the kernel QP struct */ + atomic_set(&qp->refcount, 1); + qp->adapter_handle = reply->qp_handle; + qp->state = IB_QPS_RESET; + qp->send_sgl_depth = qp_attrs->cap.max_send_sge; + qp->rdma_write_sgl_depth = qp_attrs->cap.max_send_sge; + qp->recv_sgl_depth = qp_attrs->cap.max_recv_sge; + + /* Initialize the SQ MQ */ + q_size = be32_to_cpu(reply->sq_depth); + msg_size = be32_to_cpu(reply->sq_msg_size); + peer_pa = c2dev->pa + be32_to_cpu(reply->sq_mq_start); + mmap_size = PAGE_ALIGN(sizeof(struct c2_mq_shared) + msg_size * q_size); + mmap = ioremap_nocache(peer_pa, mmap_size); + if (!mmap) { + err = -ENOMEM; + goto bail5; + } + + c2_mq_req_init(&qp->sq_mq, + be32_to_cpu(reply->sq_mq_index), + q_size, + msg_size, + mmap + sizeof(struct c2_mq_shared), /* pool start */ + mmap, /* peer */ + C2_MQ_ADAPTER_TARGET); + + /* Initialize the RQ mq */ + q_size = be32_to_cpu(reply->rq_depth); + msg_size = be32_to_cpu(reply->rq_msg_size); + peer_pa = c2dev->pa + be32_to_cpu(reply->rq_mq_start); + mmap_size = PAGE_ALIGN(sizeof(struct c2_mq_shared) + msg_size * q_size); + mmap = ioremap_nocache(peer_pa, mmap_size); + if (!mmap) { + err = -ENOMEM; + goto bail6; + } + + c2_mq_req_init(&qp->rq_mq, + 
be32_to_cpu(reply->rq_mq_index), + q_size, + msg_size, + mmap + sizeof(struct c2_mq_shared), /* pool start */ + mmap, /* peer */ + C2_MQ_ADAPTER_TARGET); + + vq_repbuf_free(c2dev, reply); + vq_req_free(c2dev, vq_req); + + spin_lock_irq(&c2dev->qp_table.lock); + c2_array_set(&c2dev->qp_table.qp, qp->qpn & (c2dev->props.max_qp - 1), qp); + c2dev->qp_table.map[qp->qpn] = qp; + spin_unlock_irq(&c2dev->qp_table.lock); + + return 0; + + bail6: + iounmap(qp->sq_mq.peer); + bail5: + destroy_qp(c2dev, qp); + bail4: + vq_repbuf_free(c2dev, reply); + bail3: + vq_req_free(c2dev, vq_req); + bail2: + c2_free_mqsp(qp->rq_mq.shared); + bail1: + c2_free_mqsp(qp->sq_mq.shared); + bail0: + c2_free(&c2dev->qp_table.alloc, qp->qpn); + return err; +} + +void c2_free_qp(struct c2_dev *c2dev, struct c2_qp *qp) +{ + struct c2_cq *send_cq; + struct c2_cq *recv_cq; + + send_cq = to_c2cq(qp->ibqp.send_cq); + recv_cq = to_c2cq(qp->ibqp.recv_cq); + + /* + * Lock CQs here, so that CQ polling code can do QP lookup + * without taking a lock. + */ + spin_lock_irq(&send_cq->lock); + if (send_cq != recv_cq) + spin_lock(&recv_cq->lock); + + spin_lock(&c2dev->qp_table.lock); + c2_array_clear(&c2dev->qp_table.qp, qp->qpn & (c2dev->props.max_qp - 1)); + c2dev->qp_table.map[qp->qpn] = NULL; + spin_unlock(&c2dev->qp_table.lock); + + if (send_cq != recv_cq) + spin_unlock(&recv_cq->lock); + spin_unlock_irq(&send_cq->lock); + + /* + * Destroy qp in the rnic... + */ + destroy_qp(c2dev, qp); + + /* + * Mark any unreaped CQEs as null and void. + */ + c2_cq_clean(c2dev, qp, send_cq->cqn); + if (send_cq != recv_cq) + c2_cq_clean(c2dev, qp, recv_cq->cqn); + /* + * Unmap the MQs and return the shared pointers + * to the message pool.
+ */ + iounmap(qp->sq_mq.peer); + iounmap(qp->rq_mq.peer); + c2_free_mqsp(qp->sq_mq.shared); + c2_free_mqsp(qp->rq_mq.shared); + + atomic_dec(&qp->refcount); + wait_event(qp->wait, !atomic_read(&qp->refcount)); + c2_free(&c2dev->qp_table.alloc, qp->qpn); +} + +/* + * Function: move_sgl + * + * Description: + * Move an SGL from the user's work request struct into a CCIL Work Request + * message, swapping to WR byte order and ensure the total length doesn't + * overflow. + * + * IN: + * dst - ptr to CCIL Work Request message SGL memory. + * src - ptr to the consumers SGL memory. + * + * OUT: none + * + * Return: + * CCIL status codes. + */ +static int +move_sgl(struct c2_data_addr * dst, struct ib_sge *src, int count, u32 * p_len, + u8 * actual_count) +{ + u32 tot = 0; /* running total */ + u8 acount = 0; /* running total non-0 len sge's */ + + while (count > 0) { + /* + * If the addition of this SGE causes the + * total SGL length to exceed 2^32-1, then + * fail-n-bail. + * + * If the current total plus the next element length + * wraps, then it will go negative and be less than the + * current total... + */ + if ((tot + src->length) < tot) { + return -EINVAL; + } + /* + * Bug: 1456 (as well as 1498 & 1643) + * Skip over any sge's supplied with len=0 + */ + if (src->length) { + tot += src->length; + dst->stag = cpu_to_be32(src->lkey); + dst->to = cpu_to_be64(src->addr); + dst->length = cpu_to_be32(src->length); + dst++; + acount++; + } + src++; + count--; + } + + if (acount == 0) { + /* + * Bug: 1476 (as well as 1498, 1456 and 1643) + * Setup the SGL in the WR to make it easier for the RNIC. + * This way, the FW doesn't have to deal with special cases. + * Setting length=0 should be sufficient. + */ + dst->stag = 0; + dst->to = 0; + dst->length = 0; + } + + *p_len = tot; + *actual_count = acount; + return 0; +} + +/* + * Function: c2_activity (private function) + * + * Description: + * Post an mq index to the host->adapter activity fifo. 
+ * + * IN: + * c2dev - ptr to c2dev structure + * mq_index - mq index to post + * shared - value most recently written to shared + * + * OUT: + * + * Return: + * none + */ +static inline void c2_activity(struct c2_dev *c2dev, u32 mq_index, u16 shared) +{ + /* + * First read the register to see if the FIFO is full, and if so, + * spin until it's not. This isn't perfect -- there is no + * synchronization among the clients of the register, but in + * practice it prevents multiple CPU from hammering the bus + * with PCI RETRY. Note that when this does happen, the card + * cannot get on the bus and the card and system hang in a + * deadlock -- thus the need for this code. [TOT] + */ + while (readl(c2dev->regs + PCI_BAR0_ADAPTER_HINT) & 0x80000000) { + set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(0); + } + + __raw_writel(C2_HINT_MAKE(mq_index, shared), + c2dev->regs + PCI_BAR0_ADAPTER_HINT); +} + +/* + * Function: qp_wr_post + * + * Description: + * This in-line function allocates a MQ msg, then moves the host-copy of + * the completed WR into msg. Then it posts the message. + * + * IN: + * q - ptr to user MQ. + * wr - ptr to host-copy of the WR. + * qp - ptr to user qp + * size - Number of bytes to post. Assumed to be divisible by 4. + * + * OUT: none + * + * Return: + * CCIL status codes. + */ +static int qp_wr_post(struct c2_mq *q, union c2wr * wr, struct c2_qp *qp, u32 size) +{ + union c2wr *msg; + + msg = c2_mq_alloc(q); + if (msg == NULL) { + return -EINVAL; + } +#ifdef CCMSGMAGIC + ((c2wr_hdr_t *) wr)->magic = cpu_to_be32(CCWR_MAGIC); +#endif + + /* + * Since all header fields in the WR are the same as the + * CQE, set the following so the adapter need not. 
+ */ + c2_wr_set_result(wr, CCERR_PENDING); + + /* + * Copy the wr down to the adapter + */ + memcpy((void *) msg, (void *) wr, size); + + c2_mq_produce(q); + return 0; +} + + +int c2_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr, + struct ib_send_wr **bad_wr) +{ + struct c2_dev *c2dev = to_c2dev(ibqp->device); + struct c2_qp *qp = to_c2qp(ibqp); + union c2wr wr; + int err = 0; + + u32 flags; + u32 tot_len; + u8 actual_sge_count; + u32 msg_size; + + if (qp->state > IB_QPS_RTS) + return -EINVAL; + + while (ib_wr) { + + flags = 0; + wr.sqwr.sq_hdr.user_hdr.hdr.context = ib_wr->wr_id; + if (ib_wr->send_flags & IB_SEND_SIGNALED) { + flags |= SQ_SIGNALED; + } + + switch (ib_wr->opcode) { + case IB_WR_SEND: + if (ib_wr->send_flags & IB_SEND_SOLICITED) { + c2_wr_set_id(&wr, C2_WR_TYPE_SEND_SE); + msg_size = sizeof(struct c2wr_send_req); + } else { + c2_wr_set_id(&wr, C2_WR_TYPE_SEND); + msg_size = sizeof(struct c2wr_send_req); + } + + wr.sqwr.send.remote_stag = 0; + msg_size += sizeof(struct c2_data_addr) * ib_wr->num_sge; + if (ib_wr->num_sge > qp->send_sgl_depth) { + err = -EINVAL; + break; + } + if (ib_wr->send_flags & IB_SEND_FENCE) { + flags |= SQ_READ_FENCE; + } + err = move_sgl((struct c2_data_addr *) & (wr.sqwr.send.data), + ib_wr->sg_list, + ib_wr->num_sge, + &tot_len, &actual_sge_count); + wr.sqwr.send.sge_len = cpu_to_be32(tot_len); + c2_wr_set_sge_count(&wr, actual_sge_count); + break; + case IB_WR_RDMA_WRITE: + c2_wr_set_id(&wr, C2_WR_TYPE_RDMA_WRITE); + msg_size = sizeof(struct c2wr_rdma_write_req) + + (sizeof(struct c2_data_addr) * ib_wr->num_sge); + if (ib_wr->num_sge > qp->rdma_write_sgl_depth) { + err = -EINVAL; + break; + } + if (ib_wr->send_flags & IB_SEND_FENCE) { + flags |= SQ_READ_FENCE; + } + wr.sqwr.rdma_write.remote_stag = + cpu_to_be32(ib_wr->wr.rdma.rkey); + wr.sqwr.rdma_write.remote_to = + cpu_to_be64(ib_wr->wr.rdma.remote_addr); + err = move_sgl((struct c2_data_addr *) + & (wr.sqwr.rdma_write.data), + ib_wr->sg_list, + 
ib_wr->num_sge, + &tot_len, &actual_sge_count); + wr.sqwr.rdma_write.sge_len = cpu_to_be32(tot_len); + c2_wr_set_sge_count(&wr, actual_sge_count); + break; + case IB_WR_RDMA_READ: + c2_wr_set_id(&wr, C2_WR_TYPE_RDMA_READ); + msg_size = sizeof(struct c2wr_rdma_read_req); + + /* IWarp only supports 1 sge for RDMA reads */ + if (ib_wr->num_sge > 1) { + err = -EINVAL; + break; + } + + /* + * Move the local and remote stag/to/len into the WR. + */ + wr.sqwr.rdma_read.local_stag = + cpu_to_be32(ib_wr->sg_list->lkey); + wr.sqwr.rdma_read.local_to = + cpu_to_be64(ib_wr->sg_list->addr); + wr.sqwr.rdma_read.remote_stag = + cpu_to_be32(ib_wr->wr.rdma.rkey); + wr.sqwr.rdma_read.remote_to = + cpu_to_be64(ib_wr->wr.rdma.remote_addr); + wr.sqwr.rdma_read.length = + cpu_to_be32(ib_wr->sg_list->length); + break; + default: + /* error */ + msg_size = 0; + err = -EINVAL; + break; + } + + /* + * If we had an error on the last wr build, then + * break out. Possible errors include bogus WR + * type, and a bogus SGL length... + */ + if (err) { + break; + } + + /* + * Store flags + */ + c2_wr_set_flags(&wr, flags); + + /* + * Post the puppy! + */ + err = qp_wr_post(&qp->sq_mq, &wr, qp, msg_size); + if (err) { + break; + } + + /* + * Enqueue mq index to activity FIFO.
+ */ + c2_activity(c2dev, qp->sq_mq.index, qp->sq_mq.hint_count); + + ib_wr = ib_wr->next; + } + + if (err) + *bad_wr = ib_wr; + return err; +} + +int c2_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *ib_wr, + struct ib_recv_wr **bad_wr) +{ + struct c2_dev *c2dev = to_c2dev(ibqp->device); + struct c2_qp *qp = to_c2qp(ibqp); + union c2wr wr; + int err = 0; + + if (qp->state > IB_QPS_RTS) + return -EINVAL; + + /* + * Try and post each work request + */ + while (ib_wr) { + u32 tot_len; + u8 actual_sge_count; + + if (ib_wr->num_sge > qp->recv_sgl_depth) { + err = -EINVAL; + break; + } + + /* + * Create local host-copy of the WR + */ + wr.rqwr.rq_hdr.user_hdr.hdr.context = ib_wr->wr_id; + c2_wr_set_id(&wr, CCWR_RECV); + c2_wr_set_flags(&wr, 0); + + /* sge_count is limited to eight bits. */ + assert(ib_wr->num_sge < 256); + err = move_sgl((struct c2_data_addr *) & (wr.rqwr.data), + ib_wr->sg_list, + ib_wr->num_sge, &tot_len, &actual_sge_count); + c2_wr_set_sge_count(&wr, actual_sge_count); + + /* + * If we had an error on the last wr build, then + * break out. Possible errors include bogus WR + * type, and a bogus SGL length... 
+ */ + if (err) { + break; + } + + err = qp_wr_post(&qp->rq_mq, &wr, qp, qp->rq_mq.msg_size); + if (err) { + break; + } + + /* + * Enqueue mq index to activity FIFO + */ + c2_activity(c2dev, qp->rq_mq.index, qp->rq_mq.hint_count); + + ib_wr = ib_wr->next; + } + + if (err) + *bad_wr = ib_wr; + return err; +} + +int __devinit c2_init_qp_table(struct c2_dev *c2dev) +{ + int err; + + spin_lock_init(&c2dev->qp_table.lock); + + err = c2_alloc_init(&c2dev->qp_table.alloc, + c2dev->props.max_qp, 1); + if (err) + return err; + + err = c2_array_init(&c2dev->qp_table.qp, c2dev->props.max_qp); + if (err) { + c2_alloc_cleanup(&c2dev->qp_table.alloc); + return err; + } + + c2dev->qp_table.map = vmalloc(sizeof(struct c2_qp *) * c2dev->props.max_qp); + if (!c2dev->qp_table.map) { + dprintk("Could not allocate QPN <-> QP map\n"); + c2_alloc_cleanup(&c2dev->qp_table.alloc); + c2_array_cleanup(&c2dev->qp_table.qp, c2dev->props.max_qp); + return -ENOMEM; + } + + return 0; +} + +void __devexit c2_cleanup_qp_table(struct c2_dev *c2dev) +{ + c2_alloc_cleanup(&c2dev->qp_table.alloc); + c2_array_cleanup(&c2dev->qp_table.qp, c2dev->props.max_qp); +} diff --git a/drivers/infiniband/hw/amso1100/c2_user.h b/drivers/infiniband/hw/amso1100/c2_user.h new file mode 100644 index 0000000..7e9e7ad --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_user.h @@ -0,0 +1,82 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2005 Cisco Systems. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#ifndef C2_USER_H +#define C2_USER_H + +#include + +/* + * Make sure that all structs defined in this file remain laid out so + * that they pack the same way on 32-bit and 64-bit architectures (to + * avoid incompatibility between 32-bit userspace and 64-bit kernels). + * In particular do not use pointer types -- pass pointers in __u64 + * instead. 
+ */ + +struct c2_alloc_ucontext_resp { + __u32 qp_tab_size; + __u32 uarc_size; +}; + +struct c2_alloc_pd_resp { + __u32 pdn; + __u32 reserved; +}; + +struct c2_create_cq { + __u32 lkey; + __u32 pdn; + __u64 arm_db_page; + __u64 set_db_page; + __u32 arm_db_index; + __u32 set_db_index; +}; + +struct c2_create_cq_resp { + __u32 cqn; + __u32 reserved; +}; + +struct c2_create_qp { + __u32 lkey; + __u32 reserved; + __u64 sq_db_page; + __u64 rq_db_page; + __u32 sq_db_index; + __u32 rq_db_index; +}; + +#endif /* C2_USER_H */ From swise at opengridcomputing.com Wed May 31 11:27:46 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 13:27:46 -0500 Subject: [openib-general] [PATCH 6/7] AMSO1100 Message Queues. In-Reply-To: <20060531182733.3652.54755.stgit@stevo-desktop> References: <20060531182733.3652.54755.stgit@stevo-desktop> Message-ID: <20060531182746.3652.84026.stgit@stevo-desktop> --- drivers/infiniband/hw/amso1100/c2_mq.c | 181 ++++++++++++++++++++++++++++++++ drivers/infiniband/hw/amso1100/c2_mq.h | 103 ++++++++++++++++++ 2 files changed, 284 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_mq.c b/drivers/infiniband/hw/amso1100/c2_mq.c new file mode 100644 index 0000000..e9e8bb0 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_mq.c @@ -0,0 +1,181 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#include "c2.h" +#include "c2_mq.h" + +void *c2_mq_alloc(struct c2_mq *q) +{ + assert(q); + assert(q->magic == C2_MQ_MAGIC); + assert(q->type == C2_MQ_ADAPTER_TARGET); + + if (c2_mq_full(q)) { + return NULL; + } else { +#ifdef C2_DEBUG + struct c2wr_hdr *m = + (struct c2wr_hdr *) (q->msg_pool.host + q->priv * q->msg_size); +#ifdef CCMSGMAGIC + assert(m->magic == be32_to_cpu(~CCWR_MAGIC)); + m->magic = cpu_to_be32(CCWR_MAGIC); +#endif + return m; +#else + return q->msg_pool.host + q->priv * q->msg_size; +#endif + } +} + +void c2_mq_produce(struct c2_mq *q) +{ + assert(q); + assert(q->magic == C2_MQ_MAGIC); + assert(q->type == C2_MQ_ADAPTER_TARGET); + + if (!c2_mq_full(q)) { + q->priv = (q->priv + 1) % q->q_size; + q->hint_count++; + /* Update peer's offset. */ + __raw_writew(cpu_to_be16(q->priv), &q->peer->shared); + } +} + +void *c2_mq_consume(struct c2_mq *q) +{ + assert(q); + assert(q->magic == C2_MQ_MAGIC); + assert(q->type == C2_MQ_HOST_TARGET); + + if (c2_mq_empty(q)) { + return NULL; + } else { +#ifdef C2_DEBUG + struct c2wr_hdr *m = (struct c2wr_hdr *) + (q->msg_pool.host + q->priv * q->msg_size); +#ifdef CCMSGMAGIC + assert(m->magic == be32_to_cpu(CCWR_MAGIC)); +#endif + return m; +#else + return q->msg_pool.host + q->priv * q->msg_size; +#endif + } +} + +void c2_mq_free(struct c2_mq *q) +{ + assert(q); + assert(q->magic == C2_MQ_MAGIC); + assert(q->type == C2_MQ_HOST_TARGET); + + if (!c2_mq_empty(q)) { + +#ifdef CCMSGMAGIC + { + struct c2wr_hdr __iomem *m = (struct c2wr_hdr __iomem *) + (q->msg_pool.adapter + q->priv * q->msg_size); + __raw_writel(cpu_to_be32(~CCWR_MAGIC), &m->magic); + } +#endif + q->priv = (q->priv + 1) % q->q_size; + /* Update peer's offset. 
*/ + __raw_writew(cpu_to_be16(q->priv), &q->peer->shared); + } +} + + +void c2_mq_lconsume(struct c2_mq *q, u32 wqe_count) +{ + assert(q); + assert(q->magic == C2_MQ_MAGIC); + assert(q->type == C2_MQ_ADAPTER_TARGET); + + while (wqe_count--) { + assert(!c2_mq_empty(q)); + *q->shared = cpu_to_be16((be16_to_cpu(*q->shared)+1) % q->q_size); + } +} + + +u32 c2_mq_count(struct c2_mq *q) +{ + s32 count; + + assert(q); + if (q->type == C2_MQ_HOST_TARGET) { + count = be16_to_cpu(*q->shared) - q->priv; + } else { + count = q->priv - be16_to_cpu(*q->shared); + } + + if (count < 0) { + count += q->q_size; + } + + return (u32) count; +} + +void c2_mq_req_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size, + u8 __iomem *pool_start, u16 __iomem *peer, u32 type) +{ + assert(q->shared); + + /* This code assumes the byte swapping has already been done! */ + q->index = index; + q->q_size = q_size; + q->msg_size = msg_size; + q->msg_pool.adapter = pool_start; + q->peer = (struct c2_mq_shared __iomem *) peer; + q->magic = C2_MQ_MAGIC; + q->type = type; + q->priv = 0; + q->hint_count = 0; + return; +} +void c2_mq_rep_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size, + u8 *pool_start, u16 __iomem *peer, u32 type) +{ + assert(q->shared); + + /* This code assumes the byte swapping has already been done! */ + q->index = index; + q->q_size = q_size; + q->msg_size = msg_size; + q->msg_pool.host = pool_start; + q->peer = (struct c2_mq_shared __iomem *) peer; + q->magic = C2_MQ_MAGIC; + q->type = type; + q->priv = 0; + q->hint_count = 0; + return; +} diff --git a/drivers/infiniband/hw/amso1100/c2_mq.h b/drivers/infiniband/hw/amso1100/c2_mq.h new file mode 100644 index 0000000..de00184 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_mq.h @@ -0,0 +1,103 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef _C2_MQ_H_ +#define _C2_MQ_H_ +#include +#include "c2_wr.h" + +enum c2_shared_regs { + + C2_SHARED_ARMED = 0x10, + C2_SHARED_NOTIFY = 0x18, + C2_SHARED_SHARED = 0x40, +}; + +struct c2_mq_shared { + u16 unused1; + u8 armed; + u8 notification_type; + u32 unused2; + u16 shared; + /* Pad to 64 bytes. */ + u8 pad[64 - sizeof(u16) - 2 * sizeof(u8) - sizeof(u32) - sizeof(u16)]; +}; + +enum c2_mq_type { + C2_MQ_HOST_TARGET = 1, + C2_MQ_ADAPTER_TARGET = 2, +}; + +/* + * c2_mq_t is for kernel-mode MQs like the VQs and the AEQ. + * c2_user_mq_t (which is the same format) is for user-mode MQs... 
+ */ +#define C2_MQ_MAGIC 0x4d512020 /* 'MQ ' */ +struct c2_mq { + u32 magic; + union { + u8 *host; + u8 __iomem *adapter; + } msg_pool; + u16 hint_count; + u16 priv; + struct c2_mq_shared __iomem *peer; + u16 *shared; + u32 q_size; + u32 msg_size; + u32 index; + enum c2_mq_type type; +}; + +static __inline__ int c2_mq_empty(struct c2_mq *q) +{ + return q->priv == be16_to_cpu(*q->shared); +} + +static __inline__ int c2_mq_full(struct c2_mq *q) +{ + return q->priv == (be16_to_cpu(*q->shared) + q->q_size - 1) % q->q_size; +} + +extern void c2_mq_lconsume(struct c2_mq *q, u32 wqe_count); +extern void *c2_mq_alloc(struct c2_mq *q); +extern void c2_mq_produce(struct c2_mq *q); +extern void *c2_mq_consume(struct c2_mq *q); +extern void c2_mq_free(struct c2_mq *q); +extern u32 c2_mq_count(struct c2_mq *q); +extern void c2_mq_req_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size, + u8 __iomem *pool_start, u16 __iomem *peer, u32 type); +extern void c2_mq_rep_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size, + u8 *pool_start, u16 __iomem *peer, u32 type); + +#endif /* _C2_MQ_H_ */ From swise at opengridcomputing.com Wed May 31 11:27:48 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 13:27:48 -0500 Subject: [openib-general] [PATCH 7/7] AMSO1100: Privileged Verbs Queues. In-Reply-To: <20060531182733.3652.54755.stgit@stevo-desktop> References: <20060531182733.3652.54755.stgit@stevo-desktop> Message-ID: <20060531182748.3652.46671.stgit@stevo-desktop> --- drivers/infiniband/hw/amso1100/c2_vq.c | 260 ++++++++++++++++++++++++++++++++ drivers/infiniband/hw/amso1100/c2_vq.h | 63 ++++++++ 2 files changed, 323 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_vq.c b/drivers/infiniband/hw/amso1100/c2_vq.c new file mode 100644 index 0000000..f98b531 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_vq.c @@ -0,0 +1,260 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. 
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#include +#include + +#include "c2_vq.h" +#include "c2_provider.h" + +/* + * Verbs Request Objects: + * + * VQ Request Objects are allocated by the kernel verbs handlers. + * They contain a wait object, a refcnt, an atomic bool indicating that the + * adapter has replied, and a copy of the verb reply work request. + * A pointer to the VQ Request Object is passed down in the context + * field of the work request message, and reflected back by the adapter + * in the verbs reply message. 
The function handle_vq() in the interrupt + * path will use this pointer to: + * 1) append a copy of the verbs reply message + * 2) mark that the reply is ready + * 3) wake up the kernel verbs handler blocked awaiting the reply. + * + * + * The kernel verbs handlers do a "get" to put a 2nd reference on the + * VQ Request object. If the kernel verbs handler exits before the adapter + * can respond, this extra reference will keep the VQ Request object around + * until the adapter's reply can be processed. The reason we need this is + * because a pointer to this object is stuffed into the context field of + * the verbs work request message, and reflected back in the reply message. + * It is used in the interrupt handler (handle_vq()) to wake up the appropriate + * kernel verb handler that is blocked awaiting the verb reply. + * So handle_vq() will do a "put" on the object when it's done accessing it. + * NOTE: If we guarantee that the kernel verb handler will never bail before + * getting the reply, then we don't need these refcnts. + * + * + * VQ Request objects are freed by the kernel verbs handlers only + * after the verb has been processed, or when the adapter fails and + * does not reply. + * + * + * Verbs Reply Buffers: + * + * VQ Reply bufs are local host memory copies of an + * outstanding Verb Request reply + * message. They are always allocated by the kernel verbs handlers, and _may_ be + * freed by either the kernel verbs handler -or- the interrupt handler. The + * kernel verbs handler _must_ free the repbuf, then free the vq request object + * in that order. 
+ */ + +int vq_init(struct c2_dev *c2dev) +{ + sprintf(c2dev->vq_cache_name, "c2-vq:dev%c", + (char) ('0' + c2dev->devnum)); + c2dev->host_msg_cache = + kmem_cache_create(c2dev->vq_cache_name, c2dev->rep_vq.msg_size, 0, + SLAB_HWCACHE_ALIGN, NULL, NULL); + if (c2dev->host_msg_cache == NULL) { + return -ENOMEM; + } + return 0; +} + +void vq_term(struct c2_dev *c2dev) +{ + kmem_cache_destroy(c2dev->host_msg_cache); +} + +/* vq_req_alloc - allocate a VQ Request Object and initialize it. + * The refcnt is set to 1. + */ +struct c2_vq_req *vq_req_alloc(struct c2_dev *c2dev) +{ + struct c2_vq_req *r; + + r = kmalloc(sizeof(struct c2_vq_req), GFP_KERNEL); + if (r) { + init_waitqueue_head(&r->wait_object); + r->reply_msg = (u64) NULL; + r->event = 0; + r->cm_id = NULL; + r->qp = NULL; + atomic_set(&r->refcnt, 1); + atomic_set(&r->reply_ready, 0); + } + return r; +} + + +/* vq_req_free - free the VQ Request Object. It is assumed the verbs handler + * has already freed the VQ Reply Buffer if it existed. + */ +void vq_req_free(struct c2_dev *c2dev, struct c2_vq_req *r) +{ + r->reply_msg = (u64) NULL; + if (atomic_dec_and_test(&r->refcnt)) { + kfree(r); + } +} + +/* vq_req_get - reference a VQ Request Object. Done + * only in the kernel verbs handlers. + */ +void vq_req_get(struct c2_dev *c2dev, struct c2_vq_req *r) +{ + atomic_inc(&r->refcnt); +} + + +/* vq_req_put - dereference and potentially free a VQ Request Object. + * + * This is only called by handle_vq() on the + * interrupt when it is done processing + * a verb reply message. If the associated + * kernel verbs handler has already bailed, + * then this put will actually free the VQ + * Request object _and_ the VQ Reply Buffer + * if it exists. 
+ */ +void vq_req_put(struct c2_dev *c2dev, struct c2_vq_req *r) +{ + if (atomic_dec_and_test(&r->refcnt)) { + if (r->reply_msg != (u64) NULL) + vq_repbuf_free(c2dev, + (void *) (unsigned long) r->reply_msg); + kfree(r); + } +} + + +/* + * vq_repbuf_alloc - allocate a VQ Reply Buffer. + */ +void *vq_repbuf_alloc(struct c2_dev *c2dev) +{ + return kmem_cache_alloc(c2dev->host_msg_cache, SLAB_ATOMIC); +} + +/* + * vq_send_wr - post a verbs request message to the Verbs Request Queue. + * If a message is not available in the MQ, then block until one is available. + * NOTE: handle_mq() on the interrupt context will wake up threads blocked here. + * When the adapter drains the Verbs Request Queue, + * it inserts MQ index 0 into the + * adapter->host activity fifo and interrupts the host. + */ +int vq_send_wr(struct c2_dev *c2dev, union c2wr *wr) +{ + void *msg; + wait_queue_t __wait; + + /* + * grab adapter vq lock + */ + spin_lock(&c2dev->vqlock); + + /* + * allocate msg + */ + msg = c2_mq_alloc(&c2dev->req_vq); + + /* + * If we cannot get a msg, then we'll wait. + * When messages are available, the int handler will wake_up() + * any waiters. + */ + while (msg == NULL) { + dprintk("%s:%d no available msg in VQ, waiting...\n", + __FUNCTION__, __LINE__); + init_waitqueue_entry(&__wait, current); + add_wait_queue(&c2dev->req_vq_wo, &__wait); + spin_unlock(&c2dev->vqlock); + for (;;) { + set_current_state(TASK_INTERRUPTIBLE); + if (!c2_mq_full(&c2dev->req_vq)) { + break; + } + if (!signal_pending(current)) { + schedule_timeout(1 * HZ); /* 1 second... 
*/ + continue; + } + set_current_state(TASK_RUNNING); + remove_wait_queue(&c2dev->req_vq_wo, &__wait); + return -EINTR; + } + set_current_state(TASK_RUNNING); + remove_wait_queue(&c2dev->req_vq_wo, &__wait); + spin_lock(&c2dev->vqlock); + msg = c2_mq_alloc(&c2dev->req_vq); + } + + /* + * copy wr into adapter msg + */ + memcpy(msg, wr, c2dev->req_vq.msg_size); + + /* + * post msg + */ + c2_mq_produce(&c2dev->req_vq); + + /* + * release adapter vq lock + */ + spin_unlock(&c2dev->vqlock); + return 0; +} + + +/* + * vq_wait_for_reply - block until the adapter posts a Verb Reply Message. + */ +int vq_wait_for_reply(struct c2_dev *c2dev, struct c2_vq_req *req) +{ + if (!wait_event_timeout(req->wait_object, + atomic_read(&req->reply_ready), + 60*HZ)) + return -ETIMEDOUT; + + return 0; +} + +/* + * vq_repbuf_free - Free a Verbs Reply Buffer. + */ +void vq_repbuf_free(struct c2_dev *c2dev, void *reply) +{ + kmem_cache_free(c2dev->host_msg_cache, reply); +} diff --git a/drivers/infiniband/hw/amso1100/c2_vq.h b/drivers/infiniband/hw/amso1100/c2_vq.h new file mode 100644 index 0000000..3380562 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_vq.h @@ -0,0 +1,63 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#ifndef _C2_VQ_H_ +#define _C2_VQ_H_ +#include +#include "c2.h" +#include "c2_wr.h" +#include "c2_provider.h" + +struct c2_vq_req { + u64 reply_msg; /* ptr to reply msg */ + wait_queue_head_t wait_object; /* wait object for vq reqs */ + atomic_t reply_ready; /* set when reply is ready */ + atomic_t refcnt; /* used to cancel WRs... */ + int event; + struct iw_cm_id *cm_id; + struct c2_qp *qp; +}; + +extern int vq_init(struct c2_dev *c2dev); +extern void vq_term(struct c2_dev *c2dev); + +extern struct c2_vq_req *vq_req_alloc(struct c2_dev *c2dev); +extern void vq_req_free(struct c2_dev *c2dev, struct c2_vq_req *req); +extern void vq_req_get(struct c2_dev *c2dev, struct c2_vq_req *req); +extern void vq_req_put(struct c2_dev *c2dev, struct c2_vq_req *req); +extern int vq_send_wr(struct c2_dev *c2dev, union c2wr * wr); + +extern void *vq_repbuf_alloc(struct c2_dev *c2dev); +extern void vq_repbuf_free(struct c2_dev *c2dev, void *reply); + +extern int vq_wait_for_reply(struct c2_dev *c2dev, struct c2_vq_req *req); +#endif /* _C2_VQ_H_ */ From swise at opengridcomputing.com Wed May 31 11:27:44 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 13:27:44 -0500 Subject: [openib-general] [PATCH 5/7] AMSO1100 Memory Management. 
In-Reply-To: <20060531182733.3652.54755.stgit@stevo-desktop> References: <20060531182733.3652.54755.stgit@stevo-desktop> Message-ID: <20060531182744.3652.54099.stgit@stevo-desktop> --- drivers/infiniband/hw/amso1100/c2_alloc.c | 256 ++++++++++++++++++++ drivers/infiniband/hw/amso1100/c2_mm.c | 378 +++++++++++++++++++++++++++++ 2 files changed, 634 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_alloc.c b/drivers/infiniband/hw/amso1100/c2_alloc.c new file mode 100644 index 0000000..3934ac8 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_alloc.c @@ -0,0 +1,256 @@ +/* + * Copyright (c) 2004 Topspin Communications. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include + +#include "c2.h" + +/* Trivial bitmap-based allocator */ +u32 c2_alloc(struct c2_alloc *alloc) +{ + u32 obj; + + spin_lock(&alloc->lock); + obj = find_next_zero_bit(alloc->table, alloc->max, alloc->last); + if (obj >= alloc->max) + obj = find_first_zero_bit(alloc->table, alloc->max); + if (obj >= 0) { + alloc->last = obj+1; + if (alloc->last > alloc->max) + alloc->last = 0; + } + spin_unlock(&alloc->lock); + + return obj; +} + +void c2_free(struct c2_alloc *alloc, u32 obj) +{ + spin_lock(&alloc->lock); + clear_bit(obj, alloc->table); + spin_unlock(&alloc->lock); +} + +int c2_alloc_init(struct c2_alloc *alloc, u32 num, u32 reserved) +{ + int i; + + alloc->last = 0; + alloc->max = num; + spin_lock_init(&alloc->lock); + alloc->table = kmalloc(BITS_TO_LONGS(num) * sizeof(long), GFP_KERNEL); + if (!alloc->table) + return -ENOMEM; + + bitmap_zero(alloc->table, num); + for (i = 0; i < reserved; ++i) + set_bit(i, alloc->table); + + return 0; +} + +void c2_alloc_cleanup(struct c2_alloc *alloc) +{ + kfree(alloc->table); +} + +/* + * Array of pointers with lazy allocation of leaf pages. Callers of + * _get, _set and _clear methods must use a lock or otherwise + * serialize access to the array. + */ + +void *c2_array_get(struct c2_array *array, int index) +{ + int p = (index * sizeof(void *)) >> PAGE_SHIFT; + + if (array->page_list[p].page) { + int i = index & (PAGE_SIZE / sizeof(void *) - 1); + return array->page_list[p].page[i]; + } else + return NULL; +} + +int c2_array_set(struct c2_array *array, int index, void *value) +{ + int p = (index * sizeof(void *)) >> PAGE_SHIFT; + + /* Allocate with GFP_ATOMIC because we'll be called with locks held. 
*/ + if (!array->page_list[p].page) + array->page_list[p].page = + (void **) get_zeroed_page(GFP_ATOMIC); + + if (!array->page_list[p].page) + return -ENOMEM; + + array->page_list[p].page[index & (PAGE_SIZE / sizeof(void *) - 1)] = + value; + ++array->page_list[p].used; + + return 0; +} + +void c2_array_clear(struct c2_array *array, int index) +{ + int p = (index * sizeof(void *)) >> PAGE_SHIFT; + + if (--array->page_list[p].used == 0) { + free_page((unsigned long) array->page_list[p].page); + array->page_list[p].page = NULL; + } + + if (array->page_list[p].used < 0) + pr_debug("Array %p index %d page %d with ref count %d < 0\n", + array, index, p, array->page_list[p].used); +} + +int c2_array_init(struct c2_array *array, int nent) +{ + int npage = (nent * sizeof(void *) + PAGE_SIZE - 1) / PAGE_SIZE; + int i; + + array->page_list = + kmalloc(npage * sizeof *array->page_list, GFP_KERNEL); + if (!array->page_list) + return -ENOMEM; + + for (i = 0; i < npage; ++i) { + array->page_list[i].page = NULL; + array->page_list[i].used = 0; + } + + return 0; +} + +void c2_array_cleanup(struct c2_array *array, int nent) +{ + int i; + + for (i = 0; i < (nent * sizeof(void *) + PAGE_SIZE - 1) / PAGE_SIZE; + ++i) + free_page((unsigned long) array->page_list[i].page); + + kfree(array->page_list); +} + +static int c2_alloc_mqsp_chunk(gfp_t gfp_mask, struct sp_chunk **head) +{ + int i; + struct sp_chunk *new_head; + + new_head = (struct sp_chunk *) __get_free_page(gfp_mask | GFP_DMA); + if (new_head == NULL) + return -ENOMEM; + + new_head->next = NULL; + new_head->head = 0; + new_head->gfp_mask = gfp_mask; + + /* build list where each index is the next free slot */ + for (i = 0; + i < (PAGE_SIZE - sizeof(struct sp_chunk) - + sizeof(u16)) / sizeof(u16) - 1; + i++) { + new_head->shared_ptr[i] = i + 1; + } + /* terminate list */ + new_head->shared_ptr[i] = 0xFFFF; + + *head = new_head; + return 0; +} + +int c2_init_mqsp_pool(gfp_t gfp_mask, struct sp_chunk **root) +{ + return 
c2_alloc_mqsp_chunk(gfp_mask, root); +} + +void c2_free_mqsp_pool(struct sp_chunk *root) +{ + struct sp_chunk *next; + + while (root) { + next = root->next; + __free_page((struct page *) root); + root = next; + } +} + +u16 *c2_alloc_mqsp(struct sp_chunk *head) +{ + u16 mqsp; + + while (head) { + mqsp = head->head; + if (mqsp != 0xFFFF) { + head->head = head->shared_ptr[mqsp]; + break; + } else if (head->next == NULL) { + if (c2_alloc_mqsp_chunk(head->gfp_mask, &head->next)==0) { + head = head->next; + mqsp = head->head; + head->head = head->shared_ptr[mqsp]; + break; + } else + return NULL; + } else + head = head->next; + } + if (head) + return &(head->shared_ptr[mqsp]); + return NULL; +} + +void c2_free_mqsp(u16 * mqsp) +{ + struct sp_chunk *head; + u16 idx; + + /* The chunk containing this ptr begins at the page boundary */ + head = (struct sp_chunk *) ((unsigned long) mqsp & PAGE_MASK); + + /* Link head to new mqsp */ + *mqsp = head->head; + + /* Compute the shared_ptr index */ + idx = ((unsigned long) mqsp & ~PAGE_MASK) >> 1; + idx -= (unsigned long) &(((struct sp_chunk *) 0)->shared_ptr[0]) >> 1; + + /* Point this index at the head */ + head->shared_ptr[idx] = head->head; + + /* Point head at this index */ + head->head = idx; +} diff --git a/drivers/infiniband/hw/amso1100/c2_mm.c b/drivers/infiniband/hw/amso1100/c2_mm.c new file mode 100644 index 0000000..13c8122 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_mm.c @@ -0,0 +1,378 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#include "c2.h" +#include "c2_vq.h" + +#define PBL_VIRT 1 +#define PBL_PHYS 2 + +/* + * Send all the PBL messages to convey the remainder of the PBL + * Wait for the adapter's reply on the last one. + * This is indicated by setting the MEM_PBL_COMPLETE in the flags. + * + * NOTE: vq_req is _not_ freed by this function. The VQ Host + * Reply buffer _is_ freed by this function. + */ +static int +send_pbl_messages(struct c2_dev *c2dev, u32 stag_index, + unsigned long va, u32 pbl_depth, + struct c2_vq_req *vq_req, int pbl_type) +{ + u32 pbe_count; /* amt that fits in a PBL msg */ + u32 count; /* amt in this PBL MSG. 
*/ + struct c2wr_nsmr_pbl_req *wr; /* PBL WR ptr */ + struct c2wr_nsmr_pbl_rep *reply; /* reply ptr */ + int err, pbl_virt, pbl_index, i; + + switch (pbl_type) { + case PBL_VIRT: + pbl_virt = 1; + break; + case PBL_PHYS: + pbl_virt = 0; + break; + default: + return -EINVAL; + break; + } + + pbe_count = (c2dev->req_vq.msg_size - + sizeof(struct c2wr_nsmr_pbl_req)) / sizeof(u64); + wr = kmalloc(c2dev->req_vq.msg_size, GFP_KERNEL); + if (!wr) { + return -ENOMEM; + } + c2_wr_set_id(wr, CCWR_NSMR_PBL); + + /* + * Only the last PBL message will generate a reply from the verbs, + * so we set the context to 0 indicating there is no kernel verbs + * handler blocked awaiting this reply. + */ + wr->hdr.context = 0; + wr->rnic_handle = c2dev->adapter_handle; + wr->stag_index = stag_index; /* already swapped */ + wr->flags = 0; + pbl_index = 0; + while (pbl_depth) { + count = min(pbe_count, pbl_depth); + wr->addrs_length = cpu_to_be32(count); + + /* + * If this is the last message, then reference the + * vq request struct because we are going to wait for a reply. + * Also mark this PBL msg as the last one. + */ + if (count == pbl_depth) { + /* + * reference the request struct. dereferenced in the + * int handler. + */ + vq_req_get(c2dev, vq_req); + wr->flags = cpu_to_be32(MEM_PBL_COMPLETE); + + /* + * This is the last PBL message. + * Set the context to our VQ Request Object so we can + * wait for the reply. + */ + wr->hdr.context = (unsigned long) vq_req; + } + + /* + * If pbl_virt is set then va is a virtual address + * that describes a virtually contiguous memory + * allocation. The wr needs the start of each virtual page + * to be converted to the corresponding physical address + * of the page. If pbl_virt is not set then va is an array + * of physical addresses and there is no conversion to do. + * Just fill in the wr with what is in the array. 
+ */ + for (i = 0; i < count; i++) { + if (pbl_virt) { + /* XXX */ + //wr->paddrs[i] = + // cpu_to_be64(user_virt_to_phys(va)); + va += PAGE_SIZE; + } else { + wr->paddrs[i] = + cpu_to_be64(((u64 *)va)[pbl_index + i]); + } + } + + /* + * Send WR to adapter + */ + err = vq_send_wr(c2dev, (union c2wr *) wr); + if (err) { + if (count <= pbe_count) { + vq_req_put(c2dev, vq_req); + } + goto bail0; + } + pbl_depth -= count; + pbl_index += count; + } + + /* + * Now wait for the reply... + */ + err = vq_wait_for_reply(c2dev, vq_req); + if (err) { + goto bail0; + } + + /* + * Process reply + */ + reply = (struct c2wr_nsmr_pbl_rep *) (unsigned long) vq_req->reply_msg; + if (!reply) { + err = -ENOMEM; + goto bail0; + } + + err = c2_errno(reply); + + vq_repbuf_free(c2dev, reply); + bail0: + kfree(wr); + return err; +} + +#define C2_PBL_MAX_DEPTH 131072 +int +c2_nsmr_register_phys_kern(struct c2_dev *c2dev, u64 *addr_list, + int page_size, int pbl_depth, u32 length, + u32 offset, u64 *va, enum c2_acf acf, + struct c2_mr *mr) +{ + struct c2_vq_req *vq_req; + struct c2wr_nsmr_register_req *wr; + struct c2wr_nsmr_register_rep *reply; + u16 flags; + int i, pbe_count, count; + int err; + + if (!va || !length || !addr_list || !pbl_depth) + return -EINTR; + + /* + * Verify PBL depth is within rnic max + */ + if (pbl_depth > C2_PBL_MAX_DEPTH) { + return -EINTR; + } + + /* + * allocate verbs request object + */ + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + wr = kmalloc(c2dev->req_vq.msg_size, GFP_KERNEL); + if (!wr) { + err = -ENOMEM; + goto bail0; + } + + /* + * build the WR + */ + c2_wr_set_id(wr, CCWR_NSMR_REGISTER); + wr->hdr.context = (unsigned long) vq_req; + wr->rnic_handle = c2dev->adapter_handle; + + flags = (acf | MEM_VA_BASED | MEM_REMOTE); + + /* + * compute how many pbes can fit in the message + */ + pbe_count = (c2dev->req_vq.msg_size - + sizeof(struct c2wr_nsmr_register_req)) / sizeof(u64); + + if (pbl_depth <= pbe_count) { + flags |= 
MEM_PBL_COMPLETE; + } + wr->flags = cpu_to_be16(flags); + wr->stag_key = 0; //stag_key; + wr->va = cpu_to_be64(*va); + wr->pd_id = mr->pd->pd_id; + wr->pbe_size = cpu_to_be32(page_size); + wr->length = cpu_to_be32(length); + wr->pbl_depth = cpu_to_be32(pbl_depth); + wr->fbo = cpu_to_be32(offset); + count = min(pbl_depth, pbe_count); + wr->addrs_length = cpu_to_be32(count); + + /* + * fill out the PBL for this message + */ + for (i = 0; i < count; i++) { + wr->paddrs[i] = cpu_to_be64(addr_list[i]); + } + + /* + * reference the request struct + */ + vq_req_get(c2dev, vq_req); + + /* + * send the WR to the adapter + */ + err = vq_send_wr(c2dev, (union c2wr *) wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail1; + } + + /* + * wait for reply from adapter + */ + err = vq_wait_for_reply(c2dev, vq_req); + if (err) { + goto bail1; + } + + /* + * process reply + */ + reply = + (struct c2wr_nsmr_register_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail1; + } + if ((err = c2_errno(reply))) { + goto bail2; + } + //*p_pb_entries = be32_to_cpu(reply->pbl_depth); + mr->ibmr.lkey = mr->ibmr.rkey = be32_to_cpu(reply->stag_index); + vq_repbuf_free(c2dev, reply); + + /* + * if there are still more PBEs we need to send them to + * the adapter and wait for a reply on the final one. + * reuse vq_req for this purpose. 
+ */ + pbl_depth -= count; + if (pbl_depth) { + + vq_req->reply_msg = (unsigned long) NULL; + atomic_set(&vq_req->reply_ready, 0); + err = send_pbl_messages(c2dev, + cpu_to_be32(mr->ibmr.lkey), + (unsigned long) &addr_list[i], + pbl_depth, vq_req, PBL_PHYS); + if (err) { + goto bail1; + } + } + + vq_req_free(c2dev, vq_req); + kfree(wr); + + return err; + + bail2: + vq_repbuf_free(c2dev, reply); + bail1: + kfree(wr); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +int c2_stag_dealloc(struct c2_dev *c2dev, u32 stag_index) +{ + struct c2_vq_req *vq_req; /* verbs request object */ + struct c2wr_stag_dealloc_req wr; /* work request */ + struct c2wr_stag_dealloc_rep *reply; /* WR reply */ + int err; + + + /* + * allocate verbs request object + */ + vq_req = vq_req_alloc(c2dev); + if (!vq_req) { + return -ENOMEM; + } + + /* + * Build the WR + */ + c2_wr_set_id(&wr, CCWR_STAG_DEALLOC); + wr.hdr.context = (u64) (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + wr.stag_index = cpu_to_be32(stag_index); + + /* + * reference the request struct. dereferenced in the int handler. + */ + vq_req_get(c2dev, vq_req); + + /* + * Send WR to adapter + */ + err = vq_send_wr(c2dev, (union c2wr *) & wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + /* + * Wait for reply from adapter + */ + err = vq_wait_for_reply(c2dev, vq_req); + if (err) { + goto bail0; + } + + /* + * Process reply + */ + reply = (struct c2wr_stag_dealloc_rep *) (unsigned long) vq_req->reply_msg; + if (!reply) { + err = -ENOMEM; + goto bail0; + } + + err = c2_errno(reply); + + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} From shemminger at osdl.org Wed May 31 11:59:06 2006 From: shemminger at osdl.org (Stephen Hemminger) Date: Wed, 31 May 2006 11:59:06 -0700 Subject: [openib-general] Re: [PATCH 2/7] AMSO1100 Low Level Driver. 
In-Reply-To: <20060531182737.3652.24752.stgit@stevo-desktop> References: <20060531182733.3652.54755.stgit@stevo-desktop> <20060531182737.3652.24752.stgit@stevo-desktop> Message-ID: <20060531115906.30f4bbda@localhost.localdomain> The following should be replaced with BUG_ON() or WARN_ON(), and pr_debug(): +#ifdef C2_DEBUG +#define assert(expr) \ + if(!(expr)) { \ + printk(KERN_ERR PFX "Assertion failed! %s, %s, %s, line %d\n",\ + #expr, __FILE__, __FUNCTION__, __LINE__); \ + } +#define dprintk(fmt, args...) do {printk(KERN_INFO PFX fmt, ##args);} while (0) +#else +#define assert(expr) do {} while (0) +#define dprintk(fmt, args...) do {} while (0) +#endif /* C2_DEBUG */ -------------------- Also, you tend to use assert() as a bogus NULL pointer check. If you get passed a NULL, it is a bug, and the deref will fail and cause a pretty stack dump... +static void c2_set_rxbufsize(struct c2_port *c2_port) +{ + struct net_device *netdev = c2_port->netdev; + + assert(netdev != NULL); Bogus: you will just fail on the deref below. + + if (netdev->mtu > RX_BUF_SIZE) + c2_port->rx_buf_size = + netdev->mtu + ETH_HLEN + sizeof(struct c2_rxp_hdr) + + NET_IP_ALIGN; + else + c2_port->rx_buf_size = sizeof(struct c2_rxp_hdr) + RX_BUF_SIZE; +} +static void c2_rx_interrupt(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + struct c2_dev *c2dev = c2_port->c2dev; + struct c2_ring *rx_ring = &c2_port->rx_ring; + struct c2_element *elem; + struct c2_rx_desc *rx_desc; + struct c2_rxp_hdr *rxp_hdr; + struct sk_buff *skb; + dma_addr_t mapaddr; + u32 maplen, buflen; + unsigned long flags; + + spin_lock_irqsave(&c2dev->lock, flags); + + /* Begin where we left off */ + rx_ring->to_clean = rx_ring->start + c2dev->cur_rx; + + for (elem = rx_ring->to_clean; elem->next != rx_ring->to_clean; + elem = elem->next) { + rx_desc = elem->ht_desc; + mapaddr = elem->mapaddr; + maplen = elem->maplen; + skb = elem->skb; + rxp_hdr = (struct c2_rxp_hdr *) skb->data; + + if 
(rxp_hdr->flags != RXP_HRXD_DONE) + break; + buflen = rxp_hdr->len; + + /* Sanity check the RXP header */ + if (rxp_hdr->status != RXP_HRXD_OK || + buflen > (rx_desc->len - sizeof(*rxp_hdr))) { + c2_rx_error(c2_port, elem); + continue; + } + + /* + * Allocate and map a new skb for replenishing the host + * RX desc + */ + if (c2_rx_alloc(c2_port, elem)) { + c2_rx_error(c2_port, elem); + continue; + } + + /* Unmap the old skb */ + pci_unmap_single(c2dev->pcidev, mapaddr, maplen, + PCI_DMA_FROMDEVICE); +

prefetch(skb->data) here will help performance.

+ /* + * Skip past the leading 8 bytes comprising of the + * "struct c2_rxp_hdr", prepended by the adapter + * to the usual Ethernet header ("struct ethhdr"), + * to the start of the raw Ethernet packet. + * + * Fix up the various fields in the sk_buff before + * passing it up to netif_rx(). The transfer size + * (in bytes) specified by the adapter len field of + * the "struct rxp_hdr_t" does NOT include the + * "sizeof(struct c2_rxp_hdr)". + */ + skb->data += sizeof(*rxp_hdr); + skb->tail = skb->data + buflen; + skb->len = buflen; + skb->dev = netdev; + skb->protocol = eth_type_trans(skb, netdev); + + /* Drop arp requests to the pseudo nic ip addr */ + if (unlikely(ntohs(skb->protocol) == ETH_P_ARP)) { + u8 *tpa; + + /* pull out the tgt ip addr */ + tpa = skb->data /* beginning of the arp packet */ + + 8 /* arp addr fmts, lens, and opcode */ + + 6 /* arp src hw addr */ + + 4 /* arp src proto addr */ + + 6; /* arp tgt hw addr */ + if (is_rnic_addr(c2dev->pseudo_netdev, *((u32 *)tpa))) { + dprintk("Dropping arp req for" + " %03d.%03d.%03d.%03d\n", + tpa[0], tpa[1], tpa[2], tpa[3]); + kfree_skb(skb); + continue; + } + }

This looks like a mess; please do it at a higher level or code it with proper structure headers.

+ + netif_rx(skb); + + netdev->last_rx = jiffies; + c2_port->netstats.rx_packets++; + c2_port->netstats.rx_bytes += buflen; + } + + /* Save where we left off */ + rx_ring->to_clean = elem; + c2dev->cur_rx =
elem - rx_ring->start; + C2_SET_CUR_RX(c2dev, c2dev->cur_rx); + + spin_unlock_irqrestore(&c2dev->lock, flags); +} + +/* + * Handle netisr0 TX & RX interrupts. + */ +static irqreturn_t c2_interrupt(int irq, void *dev_id, struct pt_regs *regs) +{ + unsigned int netisr0, dmaisr; + int handled = 0; + struct c2_dev *c2dev = (struct c2_dev *) dev_id; + + assert(c2dev != NULL); + + /* Process CCILNET interrupts */ + netisr0 = readl(c2dev->regs + C2_NISR0); + if (netisr0) { + + /* + * There is an issue with the firmware that always + * provides the status of RX for both TX & RX + * interrupts. So process both queues here. + */ + c2_rx_interrupt(c2dev->netdev); + c2_tx_interrupt(c2dev->netdev); + + /* Clear the interrupt */ + writel(netisr0, c2dev->regs + C2_NISR0); + handled++; + } + + /* Process RNIC interrupts */ + dmaisr = readl(c2dev->regs + C2_DISR); + if (dmaisr) { + writel(dmaisr, c2dev->regs + C2_DISR); + c2_rnic_interrupt(c2dev); + handled++; + } + + if (handled) { + return IRQ_HANDLED; + } else { + return IRQ_NONE; + }

return IRQ_RETVAL(handled);

+} + +static int c2_up(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + struct c2_dev *c2dev = c2_port->c2dev; + struct c2_element *elem; + struct c2_rxp_hdr *rxp_hdr; + size_t rx_size, tx_size; + int ret, i; + unsigned int netimr0; + + assert(c2dev != NULL);

More bogus asserts.

+ + if (netif_msg_ifup(c2_port)) + dprintk("%s: enabling interface\n", netdev->name); + + /* Set the Rx buffer size based on MTU */ + c2_set_rxbufsize(c2_port); + + /* Allocate DMA'able memory for Tx/Rx host descriptor rings */ + rx_size = c2_port->rx_ring.count * sizeof(struct c2_rx_desc); + tx_size = c2_port->tx_ring.count * sizeof(struct c2_tx_desc); + + c2_port->mem_size = tx_size + rx_size; + c2_port->mem = pci_alloc_consistent(c2dev->pcidev, c2_port->mem_size, + &c2_port->dma); + if (c2_port->mem == NULL) { + dprintk("Unable to allocate memory for " + "host descriptor rings\n"); + return -ENOMEM; + } +
memset(c2_port->mem, 0, c2_port->mem_size); + + /* Create the Rx host descriptor ring */ + if ((ret = + c2_rx_ring_alloc(&c2_port->rx_ring, c2_port->mem, c2_port->dma, + c2dev->mmio_rxp_ring))) { + dprintk("Unable to create RX ring\n"); + goto bail0; + } + + /* Allocate Rx buffers for the host descriptor ring */ + if (c2_rx_fill(c2_port)) { + dprintk("Unable to fill RX ring\n"); + goto bail1; + } + + /* Create the Tx host descriptor ring */ + if ((ret = c2_tx_ring_alloc(&c2_port->tx_ring, c2_port->mem + rx_size, + c2_port->dma + rx_size, + c2dev->mmio_txp_ring))) { + dprintk("Unable to create TX ring\n"); + goto bail1; + } + + /* Set the TX pointer to where we left off */ + c2_port->tx_avail = c2_port->tx_ring.count - 1; + c2_port->tx_ring.to_use = c2_port->tx_ring.to_clean = + c2_port->tx_ring.start + c2dev->cur_tx; + + /* missing: Initialize MAC */ + + BUG_ON(c2_port->tx_ring.to_use != c2_port->tx_ring.to_clean); + + /* Reset the adapter, ensures the driver is in sync with the RXP */ + c2_reset(c2_port); + + /* Reset the READY bit in the sk_buff RXP headers & adapter HRXDQ */ + for (i = 0, elem = c2_port->rx_ring.start; i < c2_port->rx_ring.count; + i++, elem++) { + rxp_hdr = (struct c2_rxp_hdr *) elem->skb->data; + rxp_hdr->flags = 0; + __raw_writew(cpu_to_be16(RXP_HRXD_READY), + elem->hw_desc + C2_RXP_FLAGS); + } + + /* Enable network packets */ + netif_start_queue(netdev); + + /* Enable IRQ */ + writel(0, c2dev->regs + C2_IDIS); + netimr0 = readl(c2dev->regs + C2_NIMR0); + netimr0 &= ~(C2_PCI_HTX_INT | C2_PCI_HRX_INT); + writel(netimr0, c2dev->regs + C2_NIMR0); + + return 0; + + bail1: + c2_rx_clean(c2_port); + kfree(c2_port->rx_ring.start); + + bail0: + pci_free_consistent(c2dev->pcidev, c2_port->mem_size, c2_port->mem, + c2_port->dma); + + return ret; +} + +static int c2_down(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + struct c2_dev *c2dev = c2_port->c2dev; + + if (netif_msg_ifdown(c2_port)) + dprintk("%s: disabling 
interface\n", + netdev->name); + + /* Wait for all the queued packets to get sent */ + c2_tx_interrupt(netdev); + + /* Disable network packets */ + netif_stop_queue(netdev); + + /* Disable IRQs by clearing the interrupt mask */ + writel(1, c2dev->regs + C2_IDIS); + writel(0, c2dev->regs + C2_NIMR0); + + /* missing: Stop transmitter */ + + /* missing: Stop receiver */ + + /* Reset the adapter, ensures the driver is in sync with the RXP */ + c2_reset(c2_port); + + /* missing: Turn off LEDs here */ + + /* Free all buffers in the host descriptor rings */ + c2_tx_clean(c2_port); + c2_rx_clean(c2_port); + + /* Free the host descriptor rings */ + kfree(c2_port->rx_ring.start); + kfree(c2_port->tx_ring.start); + pci_free_consistent(c2dev->pcidev, c2_port->mem_size, c2_port->mem, + c2_port->dma); + + return 0; +} + +static void c2_reset(struct c2_port *c2_port) +{ + struct c2_dev *c2dev = c2_port->c2dev; + unsigned int cur_rx = c2dev->cur_rx; + + /* Tell the hardware to quiesce */ + C2_SET_CUR_RX(c2dev, cur_rx | C2_PCI_HRX_QUI); + + /* + * The hardware will reset the C2_PCI_HRX_QUI bit once + * the RXP is quiesced. Wait 2 seconds for this. 
+ */ + ssleep(2); + + cur_rx = C2_GET_CUR_RX(c2dev); + + if (cur_rx & C2_PCI_HRX_QUI) + dprintk("c2_reset: failed to quiesce the hardware!\n"); + + cur_rx &= ~C2_PCI_HRX_QUI; + + c2dev->cur_rx = cur_rx; + + dprintk("Current RX: %u\n", c2dev->cur_rx); +} + +static int c2_xmit_frame(struct sk_buff *skb, struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + struct c2_dev *c2dev = c2_port->c2dev; + struct c2_ring *tx_ring = &c2_port->tx_ring; + struct c2_element *elem; + dma_addr_t mapaddr; + u32 maplen; + unsigned long flags; + unsigned int i; + + spin_lock_irqsave(&c2_port->tx_lock, flags); + + if (unlikely(c2_port->tx_avail < (skb_shinfo(skb)->nr_frags + 1))) { + netif_stop_queue(netdev); + spin_unlock_irqrestore(&c2_port->tx_lock, flags); + + dprintk("%s: Tx ring full when queue awake!\n", + netdev->name); + return NETDEV_TX_BUSY; + } + + maplen = skb_headlen(skb); + mapaddr = + pci_map_single(c2dev->pcidev, skb->data, maplen, PCI_DMA_TODEVICE); + + elem = tx_ring->to_use; + elem->skb = skb; + elem->mapaddr = mapaddr; + elem->maplen = maplen; + + /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), elem->hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), elem->hw_desc + C2_TXP_LEN); + __raw_writew(cpu_to_be16(TXP_HTXD_READY), elem->hw_desc + C2_TXP_FLAGS); + + c2_port->netstats.tx_packets++; + c2_port->netstats.tx_bytes += maplen; + + /* Loop thru additional data fragments and queue them */ + if (skb_shinfo(skb)->nr_frags) { + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + maplen = frag->size; + mapaddr = + pci_map_page(c2dev->pcidev, frag->page, + frag->page_offset, maplen, + PCI_DMA_TODEVICE); + + elem = elem->next; + elem->skb = NULL; + elem->mapaddr = mapaddr; + elem->maplen = maplen; + + /* Tell HW to xmit */ + __raw_writeq(cpu_to_be64(mapaddr), + elem->hw_desc + C2_TXP_ADDR); + __raw_writew(cpu_to_be16(maplen), + elem->hw_desc + C2_TXP_LEN); + 
__raw_writew(cpu_to_be16(TXP_HTXD_READY), + elem->hw_desc + C2_TXP_FLAGS); + + c2_port->netstats.tx_packets++; + c2_port->netstats.tx_bytes += maplen; + } + } + + tx_ring->to_use = elem->next; + c2_port->tx_avail -= (skb_shinfo(skb)->nr_frags + 1); + + if (c2_port->tx_avail <= MAX_SKB_FRAGS + 1) { + netif_stop_queue(netdev); + if (netif_msg_tx_queued(c2_port)) + dprintk("%s: transmit queue full\n", + netdev->name); + } + + spin_unlock_irqrestore(&c2_port->tx_lock, flags); + + netdev->trans_start = jiffies; + + return NETDEV_TX_OK; +} + +static struct net_device_stats *c2_get_stats(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + + return &c2_port->netstats; +} + +static int c2_set_mac_address(struct net_device *netdev, void *p) +{ + return -1; +}

If you don't handle changing mac_address, just leaving dev->set_mac_address unset will do the right thing. Also, if you need to return an error, use -ESOMEERROR, not -1.

+ +static void c2_tx_timeout(struct net_device *netdev) +{ + struct c2_port *c2_port = netdev_priv(netdev); + + if (netif_msg_timer(c2_port)) + dprintk("%s: tx timeout\n", netdev->name); + + c2_tx_clean(c2_port); +} + +static int c2_change_mtu(struct net_device *netdev, int new_mtu) +{ + int ret = 0; + + if (new_mtu < ETH_ZLEN || new_mtu > ETH_JUMBO_MTU) + return -EINVAL; + + netdev->mtu = new_mtu; + + if (netif_running(netdev)) { + c2_down(netdev); + + c2_up(netdev); + } + + return ret; +} + +/* Initialize network device */ +static struct net_device *c2_devinit(struct c2_dev *c2dev, + void __iomem * mmio_addr) +{ + struct c2_port *c2_port = NULL; + struct net_device *netdev = alloc_etherdev(sizeof(*c2_port)); + + if (!netdev) { + dprintk("c2_port etherdev alloc failed"); + return NULL; + } + + SET_MODULE_OWNER(netdev); + SET_NETDEV_DEV(netdev, &c2dev->pcidev->dev); + + netdev->open = c2_up; + netdev->stop = c2_down; + netdev->hard_start_xmit = c2_xmit_frame; + netdev->get_stats = c2_get_stats; + netdev->tx_timeout =
c2_tx_timeout; + netdev->set_mac_address = c2_set_mac_address; + netdev->change_mtu = c2_change_mtu; + netdev->watchdog_timeo = C2_TX_TIMEOUT; + netdev->irq = c2dev->pcidev->irq; + + c2_port = netdev_priv(netdev); + c2_port->netdev = netdev; + c2_port->c2dev = c2dev; + c2_port->msg_enable = netif_msg_init(debug, default_msg); + c2_port->tx_ring.count = C2_NUM_TX_DESC; + c2_port->rx_ring.count = C2_NUM_RX_DESC; + + spin_lock_init(&c2_port->tx_lock); + + /* Copy our 48-bit ethernet hardware address */ + memcpy_fromio(netdev->dev_addr, mmio_addr + C2_REGS_ENADDR, 6); + + /* Validate the MAC address */ + if (!is_valid_ether_addr(netdev->dev_addr)) { + dprintk("Invalid MAC Address\n"); + c2_print_macaddr(netdev); + free_netdev(netdev); + return NULL; + } + + c2dev->netdev = netdev; + + return netdev; +} + +static int __devinit c2_probe(struct pci_dev *pcidev, + const struct pci_device_id *ent) +{ + int ret = 0, i; + unsigned long reg0_start, reg0_flags, reg0_len; + unsigned long reg2_start, reg2_flags, reg2_len; + unsigned long reg4_start, reg4_flags, reg4_len; + unsigned kva_map_size; + struct net_device *netdev = NULL; + struct c2_dev *c2dev = NULL; + void __iomem *mmio_regs = NULL; + + assert(pcidev != NULL); + assert(ent != NULL); + + printk(KERN_INFO PFX "AMSO1100 Gigabit Ethernet driver v%s loaded\n", + DRV_VERSION); + + /* Enable PCI device */ + ret = pci_enable_device(pcidev); + if (ret) { + printk(KERN_ERR PFX "%s: Unable to enable PCI device\n", + pci_name(pcidev)); + goto bail0; + } + + reg0_start = pci_resource_start(pcidev, BAR_0); + reg0_len = pci_resource_len(pcidev, BAR_0); + reg0_flags = pci_resource_flags(pcidev, BAR_0); + + reg2_start = pci_resource_start(pcidev, BAR_2); + reg2_len = pci_resource_len(pcidev, BAR_2); + reg2_flags = pci_resource_flags(pcidev, BAR_2); + + reg4_start = pci_resource_start(pcidev, BAR_4); + reg4_len = pci_resource_len(pcidev, BAR_4); + reg4_flags = pci_resource_flags(pcidev, BAR_4); + + dprintk("BAR0 size = 0x%lX bytes\n", 
reg0_len); + dprintk("BAR2 size = 0x%lX bytes\n", reg2_len); + dprintk("BAR4 size = 0x%lX bytes\n", reg4_len); + + /* Make sure PCI base addr are MMIO */ + if (!(reg0_flags & IORESOURCE_MEM) || + !(reg2_flags & IORESOURCE_MEM) || !(reg4_flags & IORESOURCE_MEM)) { + printk(KERN_ERR PFX "PCI regions not an MMIO resource\n"); + ret = -ENODEV; + goto bail1; + } + + /* Check for weird/broken PCI region reporting */ + if ((reg0_len < C2_REG0_SIZE) || + (reg2_len < C2_REG2_SIZE) || (reg4_len < C2_REG4_SIZE)) { + printk(KERN_ERR PFX "Invalid PCI region sizes\n"); + ret = -ENODEV; + goto bail1; + } + + /* Reserve PCI I/O and memory resources */ + ret = pci_request_regions(pcidev, DRV_NAME); + if (ret) { + printk(KERN_ERR PFX "%s: Unable to request regions\n", + pci_name(pcidev)); + goto bail1; + } + + if ((sizeof(dma_addr_t) > 4)) { + ret = pci_set_dma_mask(pcidev, DMA_64BIT_MASK); + if (ret < 0) { + printk(KERN_ERR PFX "64b DMA configuration failed\n"); + goto bail2; + } + } else { + ret = pci_set_dma_mask(pcidev, DMA_32BIT_MASK); + if (ret < 0) { + printk(KERN_ERR PFX "32b DMA configuration failed\n"); + goto bail2; + } + } + + /* Enables bus-mastering on the device */ + pci_set_master(pcidev); + + /* Remap the adapter PCI registers in BAR4 */ + mmio_regs = ioremap_nocache(reg4_start + C2_PCI_REGS_OFFSET, + sizeof(struct c2_adapter_pci_regs)); + if (mmio_regs == 0UL) { + printk(KERN_ERR PFX + "Unable to remap adapter PCI registers in BAR4\n"); + ret = -EIO; + goto bail2; + } + + /* Validate PCI regs magic */ + for (i = 0; i < sizeof(c2_magic); i++) { + if (c2_magic[i] != readb(mmio_regs + C2_REGS_MAGIC + i)) { + printk(KERN_ERR PFX "Downlevel Firmware boot loader " + "[%d/%Zd: got 0x%x, exp 0x%x]. 
Use the cc_flash " + "utility to update your boot loader\n", + i + 1, sizeof(c2_magic), + readb(mmio_regs + C2_REGS_MAGIC + i), + c2_magic[i]); + printk(KERN_ERR PFX "Adapter not claimed\n"); + iounmap(mmio_regs); + ret = -EIO; + goto bail2; + } + } + + /* Validate the adapter version */ + if (be32_to_cpu(readl(mmio_regs + C2_REGS_VERS)) != C2_VERSION) { + printk(KERN_ERR PFX "Version mismatch " + "[fw=%u, c2=%u], Adapter not claimed\n", + be32_to_cpu(readl(mmio_regs + C2_REGS_VERS)), + C2_VERSION); + ret = -EINVAL; + iounmap(mmio_regs); + goto bail2; + } + + /* Validate the adapter IVN */ + if (be32_to_cpu(readl(mmio_regs + C2_REGS_IVN)) != C2_IVN) { + printk(KERN_ERR PFX "Downlevel FIrmware level. You should be using " + "the OpenIB device support kit. " + "[fw=0x%x, c2=0x%x], Adapter not claimed\n", + be32_to_cpu(readl(mmio_regs + C2_REGS_IVN)), + C2_IVN); + ret = -EINVAL; + iounmap(mmio_regs); + goto bail2; + } + + /* Allocate hardware structure */ + c2dev = (struct c2_dev *) ib_alloc_device(sizeof *c2dev); + if (!c2dev) { + printk(KERN_ERR PFX "%s: Unable to alloc hardware struct\n", + pci_name(pcidev)); + ret = -ENOMEM; + iounmap(mmio_regs); + goto bail2; + } + + memset(c2dev, 0, sizeof(*c2dev)); + spin_lock_init(&c2dev->lock); + c2dev->pcidev = pcidev; + c2dev->cur_tx = 0; + + /* Get the last RX index */ + c2dev->cur_rx = + (be32_to_cpu(readl(mmio_regs + C2_REGS_HRX_CUR)) - + 0xffffc000) / sizeof(struct c2_rxp_desc); + + /* Request an interrupt line for the driver */ + ret = request_irq(pcidev->irq, c2_interrupt, SA_SHIRQ, DRV_NAME, c2dev); + if (ret) { + printk(KERN_ERR PFX "%s: requested IRQ %u is busy\n", + pci_name(pcidev), pcidev->irq); + iounmap(mmio_regs); + goto bail3; + } + + /* Set driver specific data */ + pci_set_drvdata(pcidev, c2dev); + + /* Initialize network device */ + if ((netdev = c2_devinit(c2dev, mmio_regs)) == NULL) { + iounmap(mmio_regs); + goto bail4; + } + + /* Save off the actual size prior to unmapping mmio_regs */ + kva_map_size = 
be32_to_cpu(readl(mmio_regs + C2_REGS_PCI_WINSIZE)); + + /* Unmap the adapter PCI registers in BAR4 */ + iounmap(mmio_regs); + + /* Register network device */ + ret = register_netdev(netdev); + if (ret) { + printk(KERN_ERR PFX "Unable to register netdev, ret = %d\n", + ret); + goto bail5; + } + + /* Disable network packets */ + netif_stop_queue(netdev); + + /* Remap the adapter HRXDQ PA space to kernel VA space */ + c2dev->mmio_rxp_ring = ioremap_nocache(reg4_start + C2_RXP_HRXDQ_OFFSET, + C2_RXP_HRXDQ_SIZE); + if (c2dev->mmio_rxp_ring == 0UL) { + printk(KERN_ERR PFX "Unable to remap MMIO HRXDQ region\n"); + ret = -EIO; + goto bail6; + } + + /* Remap the adapter HTXDQ PA space to kernel VA space */ + c2dev->mmio_txp_ring = ioremap_nocache(reg4_start + C2_TXP_HTXDQ_OFFSET, + C2_TXP_HTXDQ_SIZE); + if (c2dev->mmio_txp_ring == 0UL) { + printk(KERN_ERR PFX "Unable to remap MMIO HTXDQ region\n"); + ret = -EIO; + goto bail7; + } + + /* Save off the current RX index in the last 4 bytes of the TXP Ring */ + C2_SET_CUR_RX(c2dev, c2dev->cur_rx); + + /* Remap the PCI registers in adapter BAR0 to kernel VA space */ + c2dev->regs = ioremap_nocache(reg0_start, reg0_len); + if (c2dev->regs == 0UL) { + printk(KERN_ERR PFX "Unable to remap BAR0\n"); + ret = -EIO; + goto bail8; + } + + /* Remap the PCI registers in adapter BAR4 to kernel VA space */ + c2dev->pa = reg4_start + C2_PCI_REGS_OFFSET; + c2dev->kva = ioremap_nocache(reg4_start + C2_PCI_REGS_OFFSET, + kva_map_size); + if (c2dev->kva == 0UL) { + printk(KERN_ERR PFX "Unable to remap BAR4\n"); + ret = -EIO; + goto bail9; + } + + /* Print out the MAC address */ + c2_print_macaddr(netdev); + + ret = c2_rnic_init(c2dev); + if (ret) { + printk(KERN_ERR PFX "c2_rnic_init failed: %d\n", ret); + goto bail10; + } + + c2_register_device(c2dev); + + return 0; + + bail10: + iounmap(c2dev->kva); + + bail9: + iounmap(c2dev->regs); + + bail8: + iounmap(c2dev->mmio_txp_ring); + + bail7: + iounmap(c2dev->mmio_rxp_ring); + + bail6: + 
unregister_netdev(netdev); + + bail5: + free_netdev(netdev); + + bail4: + free_irq(pcidev->irq, c2dev); + + bail3: + ib_dealloc_device(&c2dev->ibdev); + + bail2: + pci_release_regions(pcidev); + + bail1: + pci_disable_device(pcidev); + + bail0: + return ret; +} + +static void __devexit c2_remove(struct pci_dev *pcidev) +{ + struct c2_dev *c2dev = pci_get_drvdata(pcidev); + struct net_device *netdev = c2dev->netdev; + + assert(netdev != NULL); + + /* Unregister with OpenIB */ + c2_unregister_device(c2dev); + + /* Clean up the RNIC resources */ + c2_rnic_term(c2dev); + + /* Remove network device from the kernel */ + unregister_netdev(netdev); + + /* Free network device */ + free_netdev(netdev); + + /* Free the interrupt line */ + free_irq(pcidev->irq, c2dev); + + /* missing: Turn LEDs off here */ + + /* Unmap adapter PA space */ + iounmap(c2dev->kva); + iounmap(c2dev->regs); + iounmap(c2dev->mmio_txp_ring); + iounmap(c2dev->mmio_rxp_ring); + + /* Free the hardware structure */ + ib_dealloc_device(&c2dev->ibdev); + + /* Release reserved PCI I/O and memory resources */ + pci_release_regions(pcidev); + + /* Disable PCI device */ + pci_disable_device(pcidev); + + /* Clear driver specific data */ + pci_set_drvdata(pcidev, NULL); +} + +static struct pci_driver c2_pci_driver = { + .name = DRV_NAME, + .id_table = c2_pci_table, + .probe = c2_probe, + .remove = __devexit_p(c2_remove), +}; + +static int __init c2_init_module(void) +{ + return pci_module_init(&c2_pci_driver); +} + +static void __exit c2_exit_module(void) +{ + pci_unregister_driver(&c2_pci_driver); +} + +module_init(c2_init_module); +module_exit(c2_exit_module); diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h new file mode 100644 index 0000000..8124c6b --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2.h @@ -0,0 +1,567 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef __C2_H +#define __C2_H + +#include +#include +#include +#include +#include +#include + +#include "c2_provider.h" +#include "c2_mq.h" +#include "c2_status.h" + +#define DRV_NAME "c2" +#define DRV_VERSION "1.1" +#define PFX DRV_NAME ": " + +#ifdef C2_DEBUG +#define assert(expr) \ + if(!(expr)) { \ + printk(KERN_ERR PFX "Assertion failed! %s, %s, %s, line %d\n",\ + #expr, __FILE__, __FUNCTION__, __LINE__); \ + } +#define dprintk(fmt, args...) do {printk(KERN_INFO PFX fmt, ##args);} while (0) +#else +#define assert(expr) do {} while (0) +#define dprintk(fmt, args...) 
do {} while (0) +#endif /* C2_DEBUG */ + +#define BAR_0 0 +#define BAR_2 2 +#define BAR_4 4 + +#define RX_BUF_SIZE (1536 + 8) +#define ETH_JUMBO_MTU 9000 +#define C2_MAGIC "CEPHEUS" +#define C2_VERSION 4 +#define C2_IVN (18 & 0x7fffffff) + +#define C2_REG0_SIZE (16 * 1024) +#define C2_REG2_SIZE (2 * 1024 * 1024) +#define C2_REG4_SIZE (256 * 1024 * 1024) +#define C2_NUM_TX_DESC 341 +#define C2_NUM_RX_DESC 256 +#define C2_PCI_REGS_OFFSET (0x10000) +#define C2_RXP_HRXDQ_OFFSET (((C2_REG4_SIZE)/2)) +#define C2_RXP_HRXDQ_SIZE (4096) +#define C2_TXP_HTXDQ_OFFSET (((C2_REG4_SIZE)/2) + C2_RXP_HRXDQ_SIZE) +#define C2_TXP_HTXDQ_SIZE (4096) +#define C2_TX_TIMEOUT (6*HZ) + +/* CEPHEUS */ +static const u8 c2_magic[] = { + 0x43, 0x45, 0x50, 0x48, 0x45, 0x55, 0x53 +}; + +enum adapter_pci_regs { + C2_REGS_MAGIC = 0x0000, + C2_REGS_VERS = 0x0008, + C2_REGS_IVN = 0x000C, + C2_REGS_PCI_WINSIZE = 0x0010, + C2_REGS_Q0_QSIZE = 0x0014, + C2_REGS_Q0_MSGSIZE = 0x0018, + C2_REGS_Q0_POOLSTART = 0x001C, + C2_REGS_Q0_SHARED = 0x0020, + C2_REGS_Q1_QSIZE = 0x0024, + C2_REGS_Q1_MSGSIZE = 0x0028, + C2_REGS_Q1_SHARED = 0x0030, + C2_REGS_Q2_QSIZE = 0x0034, + C2_REGS_Q2_MSGSIZE = 0x0038, + C2_REGS_Q2_SHARED = 0x0040, + C2_REGS_ENADDR = 0x004C, + C2_REGS_RDMA_ENADDR = 0x0054, + C2_REGS_HRX_CUR = 0x006C, +}; + +struct c2_adapter_pci_regs { + char reg_magic[8]; + u32 version; + u32 ivn; + u32 pci_window_size; + u32 q0_q_size; + u32 q0_msg_size; + u32 q0_pool_start; + u32 q0_shared; + u32 q1_q_size; + u32 q1_msg_size; + u32 q1_pool_start; + u32 q1_shared; + u32 q2_q_size; + u32 q2_msg_size; + u32 q2_pool_start; + u32 q2_shared; + u32 log_start; + u32 log_size; + u8 host_enaddr[8]; + u8 rdma_enaddr[8]; + u32 crash_entry; + u32 crash_ready[2]; + u32 fw_txd_cur; + u32 fw_hrxd_cur; + u32 fw_rxd_cur; +}; + +enum pci_regs { + C2_HISR = 0x0000, + C2_DISR = 0x0004, + C2_HIMR = 0x0008, + C2_DIMR = 0x000C, + C2_NISR0 = 0x0010, + C2_NISR1 = 0x0014, + C2_NIMR0 = 0x0018, + C2_NIMR1 = 0x001C, + C2_IDIS = 0x0020, +}; + 
+enum { + C2_PCI_HRX_INT = 1 << 8, + C2_PCI_HTX_INT = 1 << 17, + C2_PCI_HRX_QUI = 1 << 31, +}; + +/* + * Cepheus registers in BAR0. + */ +struct c2_pci_regs { + u32 hostisr; + u32 dmaisr; + u32 hostimr; + u32 dmaimr; + u32 netisr0; + u32 netisr1; + u32 netimr0; + u32 netimr1; + u32 int_disable; +}; + +/* TXP flags */ +enum c2_txp_flags { + TXP_HTXD_DONE = 0, + TXP_HTXD_READY = 1 << 0, + TXP_HTXD_UNINIT = 1 << 1, +}; + +/* RXP flags */ +enum c2_rxp_flags { + RXP_HRXD_UNINIT = 0, + RXP_HRXD_READY = 1 << 0, + RXP_HRXD_DONE = 1 << 1, +}; + +/* RXP status */ +enum c2_rxp_status { + RXP_HRXD_ZERO = 0, + RXP_HRXD_OK = 1 << 0, + RXP_HRXD_BUF_OV = 1 << 1, +}; + +/* TXP descriptor fields */ +enum txp_desc { + C2_TXP_FLAGS = 0x0000, + C2_TXP_LEN = 0x0002, + C2_TXP_ADDR = 0x0004, +}; + +/* RXP descriptor fields */ +enum rxp_desc { + C2_RXP_FLAGS = 0x0000, + C2_RXP_STATUS = 0x0002, + C2_RXP_COUNT = 0x0004, + C2_RXP_LEN = 0x0006, + C2_RXP_ADDR = 0x0008, +}; + +struct c2_txp_desc { + u16 flags; + u16 len; + u64 addr; +} __attribute__ ((packed)); + +struct c2_rxp_desc { + u16 flags; + u16 status; + u16 count; + u16 len; + u64 addr; +} __attribute__ ((packed)); + +struct c2_rxp_hdr { + u16 flags; + u16 status; + u16 len; + u16 rsvd; +} __attribute__ ((packed)); + +struct c2_tx_desc { + u32 len; + u32 status; + dma_addr_t next_offset; +}; + +struct c2_rx_desc { + u32 len; + u32 status; + dma_addr_t next_offset; +}; + +struct c2_alloc { + u32 last; + u32 max; + spinlock_t lock; + unsigned long *table; +}; + +struct c2_array { + struct { + void **page; + int used; + } *page_list; +}; + +/* + * The MQ shared pointer pool is organized as a linked list of + * chunks. Each chunk contains a linked list of free shared pointers + * that can be allocated to a given user mode client. 
+ * + */ +struct sp_chunk { + struct sp_chunk *next; + gfp_t gfp_mask; + u16 head; + u16 shared_ptr[0]; +}; + +struct c2_pd_table { + struct c2_alloc alloc; + struct c2_array pd; +}; + +struct c2_qp_table { + struct c2_alloc alloc; + spinlock_t lock; + struct c2_array qp; + struct c2_qp** map; +}; + +struct c2_element { + struct c2_element *next; + void *ht_desc; /* host descriptor */ + void __iomem *hw_desc; /* hardware descriptor */ + struct sk_buff *skb; + dma_addr_t mapaddr; + u32 maplen; +}; + +struct c2_ring { + struct c2_element *to_clean; + struct c2_element *to_use; + struct c2_element *start; + unsigned long count; +}; + +struct c2_dev { + struct ib_device ibdev; + void __iomem *regs; + void __iomem *mmio_txp_ring; /* remapped adapter memory for hw rings */ + void __iomem *mmio_rxp_ring; + spinlock_t lock; + struct pci_dev *pcidev; + struct net_device *netdev; + struct net_device *pseudo_netdev; + unsigned int cur_tx; + unsigned int cur_rx; + u32 adapter_handle; + int device_cap_flags; + void __iomem *kva; /* KVA device memory */ + unsigned long pa; /* PA device memory */ + void **qptr_array; + + kmem_cache_t *host_msg_cache; + + struct list_head cca_link; /* adapter list */ + struct list_head eh_wakeup_list; /* event wakeup list */ + wait_queue_head_t req_vq_wo; + + /* Cached RNIC properties */ + struct ib_device_attr props; + + struct c2_pd_table pd_table; + struct c2_qp_table qp_table; + int ports; /* num of GigE ports */ + int devnum; + spinlock_t vqlock; /* sync vbs req MQ */ + + /* Verbs Queues */ + struct c2_mq req_vq; /* Verbs Request MQ */ + struct c2_mq rep_vq; /* Verbs Reply MQ */ + struct c2_mq aeq; /* Async Events MQ */ + + /* Kernel client MQs */ + struct sp_chunk *kern_mqsp_pool; + + /* Device updates these values when posting messages to a host + * target queue */ + u16 req_vq_shared; + u16 rep_vq_shared; + u16 aeq_shared; + u16 irq_claimed; + + /* + * Shared host target pages for user-accessible MQs. 
+ */ + int hthead; /* index of first free entry */ + void *htpages; /* kernel vaddr */ + int htlen; /* length of htpages memory */ + void *htuva; /* user mapped vaddr */ + spinlock_t htlock; /* serialize allocation */ + + u64 adapter_hint_uva; /* access to the activity FIFO */ + + // spinlock_t aeq_lock; + // spinlock_t rnic_lock; + + u16 hint_count; + u16 hints_read; + + int init; /* TRUE if it's ready */ + char ae_cache_name[16]; + char vq_cache_name[16]; +}; + +struct c2_port { + u32 msg_enable; + struct c2_dev *c2dev; + struct net_device *netdev; + + spinlock_t tx_lock; + u32 tx_avail; + struct c2_ring tx_ring; + struct c2_ring rx_ring; + + void *mem; /* PCI memory for host rings */ + dma_addr_t dma; + unsigned long mem_size; + + u32 rx_buf_size; + + struct net_device_stats netstats; +}; + +/* + * Activity FIFO registers in BAR0. + */ +#define PCI_BAR0_HOST_HINT 0x100 +#define PCI_BAR0_ADAPTER_HINT 0x2000 + +/* + * Ammasso PCI vendor id and Cepheus PCI device id. + */ +#define CQ_ARMED 0x01 +#define CQ_WAIT_FOR_DMA 0x80 + +/* + * The format of a hint is as follows: + * Lower 16 bits are the count of hints for the queue. + * Next 15 bits are the qp_index + * Upper most bit depends on who reads it: + * If read by producer, then it means Full (1) or Not-Full (0) + * If read by consumer, then it means Empty (1) or Not-Empty (0) + */ +#define C2_HINT_MAKE(q_index, hint_count) (((q_index) << 16) | hint_count) +#define C2_HINT_GET_INDEX(hint) (((hint) & 0x7FFF0000) >> 16) +#define C2_HINT_GET_COUNT(hint) ((hint) & 0x0000FFFF) + + +/* + * The following defines the offset in SDRAM for the c2_adapter_pci_regs_t + * struct. 
+ */ +#define C2_ADAPTER_PCI_REGS_OFFSET 0x10000 + +#ifndef readq +static inline u64 readq(const void __iomem * addr) +{ + u64 ret = readl(addr + 4); + ret <<= 32; + ret |= readl(addr); + + return ret; +} +#endif + +#ifndef __raw_writeq +static inline void __raw_writeq(u64 val, void __iomem * addr) +{ + __raw_writel((u32) (val), addr); + __raw_writel((u32) (val >> 32), (addr + 4)); +} +#endif + +#define C2_SET_CUR_RX(c2dev, cur_rx) \ + __raw_writel(cpu_to_be32(cur_rx), c2dev->mmio_txp_ring + 4092) + +#define C2_GET_CUR_RX(c2dev) \ + be32_to_cpu(readl(c2dev->mmio_txp_ring + 4092)) + +static inline struct c2_dev *to_c2dev(struct ib_device *ibdev) +{ + return container_of(ibdev, struct c2_dev, ibdev); +} + +static inline int c2_errno(void *reply) +{ + switch (c2_wr_get_result(reply)) { + case C2_OK: + return 0; + case CCERR_NO_BUFS: + case CCERR_INSUFFICIENT_RESOURCES: + case CCERR_ZERO_RDMA_READ_RESOURCES: + return -ENOMEM; + case CCERR_MR_IN_USE: + case CCERR_QP_IN_USE: + return -EBUSY; + case CCERR_ADDR_IN_USE: + return -EADDRINUSE; + case CCERR_ADDR_NOT_AVAIL: + return -EADDRNOTAVAIL; + case CCERR_CONN_RESET: + return -ECONNRESET; + case CCERR_NOT_IMPLEMENTED: + case CCERR_INVALID_WQE: + return -ENOSYS; + case CCERR_QP_NOT_PRIVILEGED: + return -EPERM; + case CCERR_STACK_ERROR: + return -EPROTO; + case CCERR_ACCESS_VIOLATION: + case CCERR_BASE_AND_BOUNDS_VIOLATION: + return -EFAULT; + case CCERR_STAG_STATE_NOT_INVALID: + case CCERR_INVALID_ADDRESS: + case CCERR_INVALID_CQ: + case CCERR_INVALID_EP: + case CCERR_INVALID_MODIFIER: + case CCERR_INVALID_MTU: + case CCERR_INVALID_PD_ID: + case CCERR_INVALID_QP: + case CCERR_INVALID_RNIC: + case CCERR_INVALID_STAG: + return -EINVAL; + default: + return -EAGAIN; + } +} + +/* Device */ +extern int c2_register_device(struct c2_dev *c2dev); +extern void c2_unregister_device(struct c2_dev *c2dev); +extern int c2_rnic_init(struct c2_dev *c2dev); +extern void c2_rnic_term(struct c2_dev *c2dev); +extern void 
c2_rnic_interrupt(struct c2_dev *c2dev); +extern int c2_rnic_query(struct c2_dev *c2dev, struct ib_device_attr *props); +extern int c2_del_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask); +extern int c2_add_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask); + +/* QPs */ +extern int c2_alloc_qp(struct c2_dev *c2dev, struct c2_pd *pd, + struct ib_qp_init_attr *qp_attrs, struct c2_qp *qp); +extern void c2_free_qp(struct c2_dev *c2dev, struct c2_qp *qp); +extern struct ib_qp *c2_get_qp(struct ib_device *device, int qpn); +extern int c2_qp_modify(struct c2_dev *c2dev, struct c2_qp *qp, + struct ib_qp_attr *attr, int attr_mask); +extern int c2_qp_set_read_limits(struct c2_dev *c2dev, struct c2_qp *qp, + int ord, int ird); +extern int c2_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr, + struct ib_send_wr **bad_wr); +extern int c2_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *ib_wr, + struct ib_recv_wr **bad_wr); +extern int __devinit c2_init_qp_table(struct c2_dev *c2dev); +extern void __devexit c2_cleanup_qp_table(struct c2_dev *c2dev); +extern void c2_set_qp_state(struct c2_qp *, int); + +/* PDs */ +extern int c2_pd_alloc(struct c2_dev *c2dev, int privileged, struct c2_pd *pd); +extern void c2_pd_free(struct c2_dev *c2dev, struct c2_pd *pd); +extern int __devinit c2_init_pd_table(struct c2_dev *c2dev); +extern void __devexit c2_cleanup_pd_table(struct c2_dev *c2dev); + +/* CQs */ +extern int c2_init_cq(struct c2_dev *c2dev, int entries, + struct c2_ucontext *ctx, struct c2_cq *cq); +extern void c2_free_cq(struct c2_dev *c2dev, struct c2_cq *cq); +extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index); +extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index); +extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry); +extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify); + +/* CM */ +extern int c2_llp_connect(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *iw_param); +extern int 
c2_llp_accept(struct iw_cm_id *cm_id, + struct iw_cm_conn_param *iw_param); +extern int c2_llp_reject(struct iw_cm_id *cm_id, const void *pdata, + u8 pdata_len); +extern int c2_llp_service_create(struct iw_cm_id *cm_id, int backlog); +extern int c2_llp_service_destroy(struct iw_cm_id *cm_id); + +/* MM */ +extern int c2_nsmr_register_phys_kern(struct c2_dev *c2dev, u64 *addr_list, + int page_size, int pbl_depth, u32 length, + u32 off, u64 *va, enum c2_acf acf, + struct c2_mr *mr); +extern int c2_stag_dealloc(struct c2_dev *c2dev, u32 stag_index); + +/* AE */ +extern void c2_ae_event(struct c2_dev *c2dev, u32 mq_index); + +/* Allocators */ +extern u32 c2_alloc(struct c2_alloc *alloc); +extern void c2_free(struct c2_alloc *alloc, u32 obj); +extern int c2_alloc_init(struct c2_alloc *alloc, u32 num, u32 reserved); +extern void c2_alloc_cleanup(struct c2_alloc *alloc); +extern int c2_init_mqsp_pool(gfp_t gfp_mask, struct sp_chunk **root); +extern void c2_free_mqsp_pool(struct sp_chunk *root); +extern u16 *c2_alloc_mqsp(struct sp_chunk *head); +extern void c2_free_mqsp(u16 * mqsp); +extern void c2_array_cleanup(struct c2_array *array, int nent); +extern int c2_array_init(struct c2_array *array, int nent); +extern void c2_array_clear(struct c2_array *array, int index); +extern int c2_array_set(struct c2_array *array, int index, void *value); +extern void *c2_array_get(struct c2_array *array, int index); + +#endif diff --git a/drivers/infiniband/hw/amso1100/c2_ae.c b/drivers/infiniband/hw/amso1100/c2_ae.c new file mode 100644 index 0000000..d5e6729 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_ae.c @@ -0,0 +1,360 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#include "c2.h" +#include +#include "c2_status.h" +#include "c2_ae.h" + +static int c2_convert_cm_status(u32 c2_status) +{ + switch (c2_status) { + case C2_CONN_STATUS_SUCCESS: + return 0; + case C2_CONN_STATUS_REJECTED: + return -ENETRESET; + case C2_CONN_STATUS_REFUSED: + return -ECONNREFUSED; + case C2_CONN_STATUS_TIMEDOUT: + return -ETIMEDOUT; + case C2_CONN_STATUS_NETUNREACH: + return -ENETUNREACH; + case C2_CONN_STATUS_HOSTUNREACH: + return -EHOSTUNREACH; + case C2_CONN_STATUS_INVALID_RNIC: + return -EINVAL; + case C2_CONN_STATUS_INVALID_QP: + return -EINVAL; + case C2_CONN_STATUS_INVALID_QP_STATE: + return -EINVAL; + case C2_CONN_STATUS_ADDR_NOT_AVAIL: + return -EADDRNOTAVAIL; + default: + printk(KERN_ERR PFX + "%s - Unable to convert CM status: %d\n", + __FUNCTION__, c2_status); + return -EIO; + } +} + +#ifdef C2_DEBUG +static const char* to_event_str(int event) +{ + static const char* event_str[] = { + "CCAE_REMOTE_SHUTDOWN", + "CCAE_ACTIVE_CONNECT_RESULTS", + "CCAE_CONNECTION_REQUEST", + "CCAE_LLP_CLOSE_COMPLETE", + "CCAE_TERMINATE_MESSAGE_RECEIVED", + "CCAE_LLP_CONNECTION_RESET", + "CCAE_LLP_CONNECTION_LOST", + "CCAE_LLP_SEGMENT_SIZE_INVALID", + "CCAE_LLP_INVALID_CRC", + "CCAE_LLP_BAD_FPDU", + "CCAE_INVALID_DDP_VERSION", + "CCAE_INVALID_RDMA_VERSION", + "CCAE_UNEXPECTED_OPCODE", + "CCAE_INVALID_DDP_QUEUE_NUMBER", + "CCAE_RDMA_READ_NOT_ENABLED", + "CCAE_RDMA_WRITE_NOT_ENABLED", + "CCAE_RDMA_READ_TOO_SMALL", + "CCAE_NO_L_BIT", + "CCAE_TAGGED_INVALID_STAG", + "CCAE_TAGGED_BASE_BOUNDS_VIOLATION", + "CCAE_TAGGED_ACCESS_RIGHTS_VIOLATION", + "CCAE_TAGGED_INVALID_PD", + "CCAE_WRAP_ERROR", + "CCAE_BAD_CLOSE", + "CCAE_BAD_LLP_CLOSE", + "CCAE_INVALID_MSN_RANGE", + "CCAE_INVALID_MSN_GAP", + "CCAE_IRRQ_OVERFLOW", + "CCAE_IRRQ_MSN_GAP", + "CCAE_IRRQ_MSN_RANGE", + "CCAE_IRRQ_INVALID_STAG", + "CCAE_IRRQ_BASE_BOUNDS_VIOLATION", + "CCAE_IRRQ_ACCESS_RIGHTS_VIOLATION", + "CCAE_IRRQ_INVALID_PD", + "CCAE_IRRQ_WRAP_ERROR", + "CCAE_CQ_SQ_COMPLETION_OVERFLOW", + 
"CCAE_CQ_RQ_COMPLETION_ERROR", + "CCAE_QP_SRQ_WQE_ERROR", + "CCAE_QP_LOCAL_CATASTROPHIC_ERROR", + "CCAE_CQ_OVERFLOW", + "CCAE_CQ_OPERATION_ERROR", + "CCAE_SRQ_LIMIT_REACHED", + "CCAE_QP_RQ_LIMIT_REACHED", + "CCAE_SRQ_CATASTROPHIC_ERROR", + "CCAE_RNIC_CATASTROPHIC_ERROR" + }; + + if (event < CCAE_REMOTE_SHUTDOWN || + event > CCAE_RNIC_CATASTROPHIC_ERROR) + return ""; + + event -= CCAE_REMOTE_SHUTDOWN; + return event_str[event]; +} + +const char *to_qp_state_str(int state) +{ + switch (state) { + case C2_QP_STATE_IDLE: + return "C2_QP_STATE_IDLE"; + case C2_QP_STATE_CONNECTING: + return "C2_QP_STATE_CONNECTING"; + case C2_QP_STATE_RTS: + return "C2_QP_STATE_RTS"; + case C2_QP_STATE_CLOSING: + return "C2_QP_STATE_CLOSING"; + case C2_QP_STATE_TERMINATE: + return "C2_QP_STATE_TERMINATE"; + case C2_QP_STATE_ERROR: + return "C2_QP_STATE_ERROR"; + default: + return ""; + }; +} +#endif + +void c2_ae_event(struct c2_dev *c2dev, u32 mq_index) +{ + struct c2_mq *mq = c2dev->qptr_array[mq_index]; + union c2wr *wr; + void *resource_user_context; + struct iw_cm_event cm_event; + struct ib_event ib_event; + enum c2_resource_indicator resource_indicator; + enum c2_event_id event_id; + unsigned long flags; + u8 *pdata = NULL; + int status; + + /* + * retreive the message + */ + wr = c2_mq_consume(mq); + if (!wr) + return; + + memset(&ib_event, 0, sizeof(ib_event)); + memset(&cm_event, 0, sizeof(cm_event)); + + event_id = c2_wr_get_id(wr); + resource_indicator = be32_to_cpu(wr->ae.ae_generic.resource_type); + resource_user_context = + (void *) (unsigned long) wr->ae.ae_generic.user_context; + + status = cm_event.status = c2_convert_cm_status(c2_wr_get_result(wr)); + + dprintk("event received c2_dev=%p, event_id=%d, " + "resource_indicator=%d, user_context=%p, status = %d\n", + c2dev, event_id, resource_indicator, resource_user_context, + status); + + switch (resource_indicator) { + case C2_RES_IND_QP:{ + + struct c2_qp *qp = (struct c2_qp *)resource_user_context; + struct iw_cm_id 
*cm_id = qp->cm_id; + struct c2wr_ae_active_connect_results *res; + + if (!cm_id) { + dprintk("event received, but cm_id is , qp=%p!\n", + qp); + goto ignore_it; + } + dprintk("%s: event = %s, user_context=%llx, " + "resource_type=%x, " + "resource=%x, qp_state=%s\n", + __FUNCTION__, + to_event_str(event_id), + be64_to_cpu(wr->ae.ae_generic.user_context), + be32_to_cpu(wr->ae.ae_generic.resource_type), + be32_to_cpu(wr->ae.ae_generic.resource), + to_qp_state_str(be32_to_cpu(wr->ae.ae_generic.qp_state))); + + c2_set_qp_state(qp, be32_to_cpu(wr->ae.ae_generic.qp_state)); + + switch (event_id) { + case CCAE_ACTIVE_CONNECT_RESULTS: + res = &wr->ae.ae_active_connect_results; + cm_event.event = IW_CM_EVENT_CONNECT_REPLY; + cm_event.local_addr.sin_addr.s_addr = res->laddr; + cm_event.remote_addr.sin_addr.s_addr = res->raddr; + cm_event.local_addr.sin_port = res->lport; + cm_event.remote_addr.sin_port = res->rport; + if (status == 0) { + cm_event.private_data_len = + be32_to_cpu(res->private_data_length); + } else { + spin_lock_irqsave(&qp->lock, flags); + if (qp->cm_id) { + qp->cm_id->rem_ref(qp->cm_id); + qp->cm_id = NULL; + } + spin_unlock_irqrestore(&qp->lock, flags); + cm_event.private_data_len = 0; + cm_event.private_data = NULL; + } + if (cm_event.private_data_len) { + /* copy private data */ + pdata = + kmalloc(cm_event.private_data_len, + GFP_ATOMIC); + if (!pdata) { + /* Ignore the request, maybe the + * remote peer will retry */ + dprintk ("Ignored connect request -- " + "no memory for pdata" + "private_data_len=%d\n", + cm_event.private_data_len); + goto ignore_it; + } + + memcpy(pdata, res->private_data, + cm_event.private_data_len); + + cm_event.private_data = pdata; + } + if (cm_id->event_handler) + cm_id->event_handler(cm_id, &cm_event); + break; + case CCAE_TERMINATE_MESSAGE_RECEIVED: + case CCAE_CQ_SQ_COMPLETION_OVERFLOW: + ib_event.device = &c2dev->ibdev; + ib_event.element.qp = &qp->ibqp; + ib_event.event = IB_EVENT_QP_REQ_ERR; + + if 
(qp->ibqp.event_handler) + qp->ibqp.event_handler(&ib_event, + qp->ibqp. + qp_context); + break; + case CCAE_BAD_CLOSE: + case CCAE_LLP_CLOSE_COMPLETE: + case CCAE_LLP_CONNECTION_RESET: + case CCAE_LLP_CONNECTION_LOST: + BUG_ON(cm_id == NULL); + BUG_ON(cm_id->event_handler==(void*)0x6b6b6b6b); + + spin_lock_irqsave(&qp->lock, flags); + if (qp->cm_id) { + qp->cm_id->rem_ref(qp->cm_id); + qp->cm_id = NULL; + } + spin_unlock_irqrestore(&qp->lock, flags); + cm_event.event = IW_CM_EVENT_CLOSE; + cm_event.status = 0; + if (cm_id->event_handler) + cm_id->event_handler(cm_id, &cm_event); + break; + default: + BUG_ON(1); + dprintk("%s:%d Unexpected event_id=%d on QP=%p, " + "CM_ID=%p\n", + __FUNCTION__, __LINE__, + event_id, qp, cm_id); + break; + } + break; + } + + case C2_RES_IND_EP:{ + + struct c2wr_ae_connection_request *req = + &wr->ae.ae_connection_request; + struct iw_cm_id *cm_id = + (struct iw_cm_id *)resource_user_context; + + dprintk("C2_RES_IND_EP event_id=%d\n", event_id); + if (event_id != CCAE_CONNECTION_REQUEST) { + dprintk("%s: Invalid event_id: %d\n", + __FUNCTION__, event_id); + break; + } + cm_event.event = IW_CM_EVENT_CONNECT_REQUEST; + cm_event.provider_data = (void*)(unsigned long)req->cr_handle; + cm_event.local_addr.sin_addr.s_addr = req->laddr; + cm_event.remote_addr.sin_addr.s_addr = req->raddr; + cm_event.local_addr.sin_port = req->lport; + cm_event.remote_addr.sin_port = req->rport; + cm_event.private_data_len = + be32_to_cpu(req->private_data_length); + + if (cm_event.private_data_len) { + pdata = + kmalloc(cm_event.private_data_len, + GFP_ATOMIC); + if (!pdata) { + /* Ignore the request, maybe the remote peer + * will retry */ + dprintk ("Ignored connect request -- " + "no memory for pdata" + "private_data_len=%d\n", + cm_event.private_data_len); + goto ignore_it; + } + memcpy(pdata, + req->private_data, + cm_event.private_data_len); + + cm_event.private_data = pdata; + } + if (cm_id->event_handler) + cm_id->event_handler(cm_id, &cm_event); + 
break; + } + + case C2_RES_IND_CQ:{ + struct c2_cq *cq = + (struct c2_cq *) resource_user_context; + + dprintk("IB_EVENT_CQ_ERR\n"); + ib_event.device = &c2dev->ibdev; + ib_event.element.cq = &cq->ibcq; + ib_event.event = IB_EVENT_CQ_ERR; + + if (cq->ibcq.event_handler) + cq->ibcq.event_handler(&ib_event, + cq->ibcq.cq_context); + break; + } + + default: + printk("Bad resource indicator = %d\n", + resource_indicator); + break; + } + + ignore_it: + c2_mq_free(mq); +} diff --git a/drivers/infiniband/hw/amso1100/c2_intr.c b/drivers/infiniband/hw/amso1100/c2_intr.c new file mode 100644 index 0000000..5306a15 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_intr.c @@ -0,0 +1,211 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#include "c2.h" +#include +#include "c2_vq.h" + +static void handle_mq(struct c2_dev *c2dev, u32 index); +static void handle_vq(struct c2_dev *c2dev, u32 mq_index); + +/* + * Handle RNIC interrupts + */ +void c2_rnic_interrupt(struct c2_dev *c2dev) +{ + unsigned int mq_index; + + while (c2dev->hints_read != be16_to_cpu(c2dev->hint_count)) { + mq_index = readl(c2dev->regs + PCI_BAR0_HOST_HINT); + if (mq_index & 0x80000000) { + break; + } + + c2dev->hints_read++; + handle_mq(c2dev, mq_index); + } + +} + +/* + * Top level MQ handler + */ +static void handle_mq(struct c2_dev *c2dev, u32 mq_index) +{ + if (c2dev->qptr_array[mq_index] == NULL) { + dprintk(KERN_INFO "handle_mq: stray activity for mq_index=%d\n", + mq_index); + return; + } + + switch (mq_index) { + case (0): + /* + * An index of 0 in the activity queue + * indicates the req vq now has messages + * available... + * + * Wake up any waiters waiting on req VQ + * message availability. + */ + wake_up(&c2dev->req_vq_wo); + break; + case (1): + handle_vq(c2dev, mq_index); + break; + case (2): + /* We have to purge the VQ in case there are pending + * accept reply requests that would result in the + * generation of an ESTABLISHED event. If we don't + * generate these first, a CLOSE event could end up + * being delivered before the ESTABLISHED event. + */ + handle_vq(c2dev, 1); + + c2_ae_event(c2dev, mq_index); + break; + default: + /* There is no event synchronization between CQ events + * and AE or CM events. In fact, CQE could be + * delivered for all of the I/O up to and including the + * FLUSH for a peer disconnect prior to the ESTABLISHED + * event being delivered to the app.
The reason for this + * is that CM events are delivered on a thread, while AE + * and CQ events are delivered in interrupt context. + */ + c2_cq_event(c2dev, mq_index); + break; + } + + return; +} + +/* + * Handles verbs WR replies. + */ +static void handle_vq(struct c2_dev *c2dev, u32 mq_index) +{ + void *adapter_msg, *reply_msg; + struct c2wr_hdr *host_msg; + struct c2wr_hdr tmp; + struct c2_mq *reply_vq; + struct c2_vq_req *req; + struct iw_cm_event cm_event; + int err; + + reply_vq = (struct c2_mq *) c2dev->qptr_array[mq_index]; + + /* + * get next msg from mq_index into adapter_msg. + * don't free it yet. + */ + adapter_msg = c2_mq_consume(reply_vq); + if (adapter_msg == NULL) { + return; + } + + host_msg = vq_repbuf_alloc(c2dev); + + /* + * If we can't get a host buffer, then we'll still + * wake up the waiter, we just won't give him the msg. + * It is assumed the waiter will deal with this... + */ + if (!host_msg) { + dprintk("handle_vq: no repbufs!\n"); + + /* + * just copy the WR header into a local variable. + * this allows us to still demux on the context + */ + host_msg = &tmp; + memcpy(host_msg, adapter_msg, sizeof(tmp)); + reply_msg = NULL; + } else { + memcpy(host_msg, adapter_msg, reply_vq->msg_size); + reply_msg = host_msg; + } + + /* + * consume the msg from the MQ + */ + c2_mq_free(reply_vq); + + /* + * wake up the waiter. + */ + req = (struct c2_vq_req *) (unsigned long) host_msg->context; + if (req == NULL) { + /* + * We should never get here, as the adapter should + * never send us a reply that we're not expecting.
+ */ + vq_repbuf_free(c2dev, host_msg); + dprintk("handle_vq: UNEXPECTEDLY got NULL req\n"); + return; + } + + err = c2_errno(reply_msg); + if (!err) switch (req->event) { + case IW_CM_EVENT_ESTABLISHED: + BUG_ON(!req->qp); + c2_set_qp_state(req->qp, + C2_QP_STATE_RTS); + case IW_CM_EVENT_CLOSE: + BUG_ON(!req->cm_id); + /* + * Move the QP to RTS if this is + * the established event + */ + cm_event.event = req->event; + cm_event.status = 0; + cm_event.local_addr = req->cm_id->local_addr; + cm_event.remote_addr = req->cm_id->remote_addr; + cm_event.private_data = NULL; + cm_event.private_data_len = 0; + BUG_ON(req->cm_id->event_handler == NULL); + req->cm_id->event_handler(req->cm_id, &cm_event); + break; + default: + break; + } + + req->reply_msg = (u64) (unsigned long) (reply_msg); + atomic_set(&req->reply_ready, 1); + wake_up(&req->wait_object); + + /* + * If the request was cancelled, then this put will + * free the vq_req memory...and reply_msg!!! + */ + vq_req_put(c2dev, req); +} diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c new file mode 100644 index 0000000..6f255b0 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -0,0 +1,720 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#ifdef NETEVENT_NOTIFIER +#include +#include +#include +#endif + + +#include +#include +#include +#include +#include "c2.h" +#include "c2_vq.h" + +/* Device capabilities */ +#define C2_MIN_PAGESIZE 1024 + +#define C2_MAX_MRS 32768 +#define C2_MAX_QPS 16000 +#define C2_MAX_WQE_SZ 256 +#define C2_MAX_QP_WR ((128*1024)/C2_MAX_WQE_SZ) +#define C2_MAX_SGES 4 +#define C2_MAX_SGE_RD 1 +#define C2_MAX_CQS 32768 +#define C2_MAX_CQES 4096 +#define C2_MAX_PDS 16384 + +/* + * Send the adapter INIT message to the amso1100 + */ +static int c2_adapter_init(struct c2_dev *c2dev) +{ + struct c2wr_init_req wr; + int err; + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_INIT); + wr.hdr.context = 0; + wr.hint_count = cpu_to_be64(__pa(&c2dev->hint_count)); + wr.q0_host_shared = cpu_to_be64(__pa(c2dev->req_vq.shared)); + wr.q1_host_shared = cpu_to_be64(__pa(c2dev->rep_vq.shared)); + wr.q1_host_msg_pool = cpu_to_be64(__pa(c2dev->rep_vq.msg_pool.host)); + wr.q2_host_shared = cpu_to_be64(__pa(c2dev->aeq.shared)); + wr.q2_host_msg_pool = cpu_to_be64(__pa(c2dev->aeq.msg_pool.host)); 
+ + /* Post the init message */ + err = vq_send_wr(c2dev, (union c2wr *) & wr); + + return err; +} + +/* + * Send the adapter TERM message to the amso1100 + */ +static void c2_adapter_term(struct c2_dev *c2dev) +{ + struct c2wr_init_req wr; + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_TERM); + wr.hdr.context = 0; + + /* Post the term message */ + vq_send_wr(c2dev, (union c2wr *) & wr); + c2dev->init = 0; + + return; +} + +/* + * Query the adapter + */ +int c2_rnic_query(struct c2_dev *c2dev, + struct ib_device_attr *props) +{ + struct c2_vq_req *vq_req; + struct c2wr_rnic_query_req wr; + struct c2wr_rnic_query_rep *reply; + int err; + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + c2_wr_set_id(&wr, CCWR_RNIC_QUERY); + wr.hdr.context = (unsigned long) vq_req; + wr.rnic_handle = c2dev->adapter_handle; + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) &wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail1; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail1; + + reply = + (struct c2wr_rnic_query_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail1; + } + + err = c2_errno(reply); + if (err) + goto bail2; + + props->fw_ver = + ((u64)be32_to_cpu(reply->fw_ver_major) << 32) | + ((be32_to_cpu(reply->fw_ver_minor) & 0xFFFF) << 16) | + (be32_to_cpu(reply->fw_ver_patch) & 0xFFFF); + memcpy(&props->sys_image_guid, c2dev->netdev->dev_addr, 6); + props->max_mr_size = 0xFFFFFFFF; + props->page_size_cap = ~(C2_MIN_PAGESIZE-1); + props->vendor_id = be32_to_cpu(reply->vendor_id); + props->vendor_part_id = be32_to_cpu(reply->part_number); + props->hw_ver = be32_to_cpu(reply->hw_version); + props->max_qp = be32_to_cpu(reply->max_qps); + props->max_qp_wr = be32_to_cpu(reply->max_qp_depth); + props->device_cap_flags = c2dev->device_cap_flags; + props->max_sge = C2_MAX_SGES; + props->max_sge_rd = C2_MAX_SGE_RD; + props->max_cq = be32_to_cpu(reply->max_cqs); + props->max_cqe =
be32_to_cpu(reply->max_cq_depth); + props->max_mr = be32_to_cpu(reply->max_mrs); + props->max_pd = be32_to_cpu(reply->max_pds); + props->max_qp_rd_atom = be32_to_cpu(reply->max_qp_ird); + props->max_ee_rd_atom = 0; + props->max_res_rd_atom = be32_to_cpu(reply->max_global_ird); + props->max_qp_init_rd_atom = be32_to_cpu(reply->max_qp_ord); + props->max_ee_init_rd_atom = 0; + props->atomic_cap = IB_ATOMIC_NONE; + props->max_ee = 0; + props->max_rdd = 0; + props->max_mw = be32_to_cpu(reply->max_mws); + props->max_raw_ipv6_qp = 0; + props->max_raw_ethy_qp = 0; + props->max_mcast_grp = 0; + props->max_mcast_qp_attach = 0; + props->max_total_mcast_qp_attach = 0; + props->max_ah = 0; + props->max_fmr = 0; + props->max_map_per_fmr = 0; + props->max_srq = 0; + props->max_srq_wr = 0; + props->max_srq_sge = 0; + props->max_pkeys = 0; + props->local_ca_ack_delay = 0; + + bail2: + vq_repbuf_free(c2dev, reply); + + bail1: + vq_req_free(c2dev, vq_req); + return err; +} + +/* + * Add an IP address to the RNIC interface + */ +int c2_add_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask) +{ + struct c2_vq_req *vq_req; + struct c2wr_rnic_setconfig_req *wr; + struct c2wr_rnic_setconfig_rep *reply; + struct c2_netaddr netaddr; + int err, len; + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + len = sizeof(struct c2_netaddr); + wr = kmalloc(c2dev->req_vq.msg_size, GFP_KERNEL); + if (!wr) { + err = -ENOMEM; + goto bail0; + } + + c2_wr_set_id(wr, CCWR_RNIC_SETCONFIG); + wr->hdr.context = (unsigned long) vq_req; + wr->rnic_handle = c2dev->adapter_handle; + wr->option = cpu_to_be32(C2_CFG_ADD_ADDR); + + netaddr.ip_addr = inaddr; + netaddr.netmask = inmask; + netaddr.mtu = 0; + + memcpy(wr->data, &netaddr, len); + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail1; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail1; + + reply = + (struct c2wr_rnic_setconfig_rep *) 
(unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail1; + } + + err = c2_errno(reply); + vq_repbuf_free(c2dev, reply); + + bail1: + kfree(wr); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +/* + * Delete an IP address from the RNIC interface + */ +int c2_del_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask) +{ + struct c2_vq_req *vq_req; + struct c2wr_rnic_setconfig_req *wr; + struct c2wr_rnic_setconfig_rep *reply; + struct c2_netaddr netaddr; + int err, len; + + vq_req = vq_req_alloc(c2dev); + if (!vq_req) + return -ENOMEM; + + len = sizeof(struct c2_netaddr); + wr = kmalloc(c2dev->req_vq.msg_size, GFP_KERNEL); + if (!wr) { + err = -ENOMEM; + goto bail0; + } + + c2_wr_set_id(wr, CCWR_RNIC_SETCONFIG); + wr->hdr.context = (unsigned long) vq_req; + wr->rnic_handle = c2dev->adapter_handle; + wr->option = cpu_to_be32(C2_CFG_DEL_ADDR); + + netaddr.ip_addr = inaddr; + netaddr.netmask = inmask; + netaddr.mtu = 0; + + memcpy(wr->data, &netaddr, len); + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, (union c2wr *) wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail1; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) + goto bail1; + + reply = + (struct c2wr_rnic_setconfig_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail1; + } + + err = c2_errno(reply); + vq_repbuf_free(c2dev, reply); + + bail1: + kfree(wr); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +/* + * Open a single RNIC instance to use with all + * low level openib calls + */ +static int c2_rnic_open(struct c2_dev *c2dev) +{ + struct c2_vq_req *vq_req; + union c2wr wr; + struct c2wr_rnic_open_rep *reply; + int err; + + vq_req = vq_req_alloc(c2dev); + if (vq_req == NULL) { + return -ENOMEM; + } + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_RNIC_OPEN); + wr.rnic_open.req.hdr.context = (unsigned long) (vq_req); + wr.rnic_open.req.flags = cpu_to_be16(RNIC_PRIV_MODE); + 
wr.rnic_open.req.port_num = cpu_to_be16(0); + wr.rnic_open.req.user_context = (unsigned long) c2dev; + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, &wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) { + goto bail0; + } + + reply = (struct c2wr_rnic_open_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail0; + } + + if ((err = c2_errno(reply)) != 0) { + goto bail1; + } + + c2dev->adapter_handle = reply->rnic_handle; + + bail1: + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + return err; +} + +/* + * Close the RNIC instance + */ +static int c2_rnic_close(struct c2_dev *c2dev) +{ + struct c2_vq_req *vq_req; + union c2wr wr; + struct c2wr_rnic_close_rep *reply; + int err; + + vq_req = vq_req_alloc(c2dev); + if (vq_req == NULL) { + return -ENOMEM; + } + + memset(&wr, 0, sizeof(wr)); + c2_wr_set_id(&wr, CCWR_RNIC_CLOSE); + wr.rnic_close.req.hdr.context = (unsigned long) vq_req; + wr.rnic_close.req.rnic_handle = c2dev->adapter_handle; + + vq_req_get(c2dev, vq_req); + + err = vq_send_wr(c2dev, &wr); + if (err) { + vq_req_put(c2dev, vq_req); + goto bail0; + } + + err = vq_wait_for_reply(c2dev, vq_req); + if (err) { + goto bail0; + } + + reply = (struct c2wr_rnic_close_rep *) (unsigned long) (vq_req->reply_msg); + if (!reply) { + err = -ENOMEM; + goto bail0; + } + + if ((err = c2_errno(reply)) != 0) { + goto bail1; + } + + c2dev->adapter_handle = 0; + + bail1: + vq_repbuf_free(c2dev, reply); + bail0: + vq_req_free(c2dev, vq_req); + return err; +}

This seems like log spam, or a developer debug thing. You need to learn to watch netlink events from user space.
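
For reference, watching neighbour and route changes from user space might look roughly like the sketch below. It is not part of the patch. It assumes a Linux rtnetlink socket subscribed to the RTMGRP_NEIGH and RTMGRP_IPV4_ROUTE multicast groups, and the helper names rtnl_event_name(), rtnl_watch_open() and rtnl_watch_once() are made up for illustration.

```c
/*
 * Sketch only: watch neighbour and IPv4 route changes from user space
 * through an rtnetlink socket. All function names here are hypothetical.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Map an rtnetlink message type to a short label. */
const char *rtnl_event_name(unsigned short type)
{
	switch (type) {
	case RTM_NEWNEIGH:
		return "neighbour update";
	case RTM_DELNEIGH:
		return "neighbour delete";
	case RTM_NEWROUTE:
		return "route update";
	case RTM_DELROUTE:
		return "route delete";
	default:
		return "other";
	}
}

/*
 * Open a NETLINK_ROUTE socket bound to the neighbour and IPv4 route
 * multicast groups. Returns the descriptor, or -1 on error.
 */
int rtnl_watch_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = RTMGRP_NEIGH | RTMGRP_IPV4_ROUTE;
	if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Block until one datagram arrives, then print a line for each netlink
 * message in it. Returns the number of messages printed, or -1 on error.
 */
int rtnl_watch_once(int fd)
{
	/* Declared as nlmsghdr array so the buffer is suitably aligned. */
	struct nlmsghdr buf[8192 / sizeof(struct nlmsghdr)];
	struct nlmsghdr *nh;
	int len, count = 0;

	len = recv(fd, buf, sizeof(buf), 0);
	if (len < 0)
		return -1;

	for (nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
		if (nh->nlmsg_type == NLMSG_DONE ||
		    nh->nlmsg_type == NLMSG_ERROR)
			break;
		printf("%s (type %u)\n", rtnl_event_name(nh->nlmsg_type),
		       (unsigned) nh->nlmsg_type);
		count++;
	}
	return count;
}
```

A caller would open the socket once with rtnl_watch_open() and then call rtnl_watch_once() in a loop; decoding the payload of each message (struct ndmsg, struct rtmsg and their attributes) is left out of this sketch.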
+ +#ifdef NETEVENT_NOTIFIER +static int netevent_notifier(struct notifier_block *self, unsigned long event, + void *data) +{ + int i; + u8 *ha; + struct neighbour *neigh = data; + struct netevent_redirect *redir = data; + struct netevent_route_change *rev = data; + + switch (event) { + case NETEVENT_ROUTE_UPDATE: + printk(KERN_ERR "NETEVENT_ROUTE_UPDATE:\n"); + printk(KERN_ERR "fib_flags : %d\n", + rev->fib_info->fib_flags); + printk(KERN_ERR "fib_protocol : %d\n", + rev->fib_info->fib_protocol); + printk(KERN_ERR "fib_prefsrc : %08x\n", + rev->fib_info->fib_prefsrc); + printk(KERN_ERR "fib_priority : %d\n", + rev->fib_info->fib_priority); + break; + + case NETEVENT_NEIGH_UPDATE: + printk(KERN_ERR "NETEVENT_NEIGH_UPDATE:\n"); + printk(KERN_ERR "nud_state : %d\n", neigh->nud_state); + printk(KERN_ERR "refcnt : %d\n", neigh->refcnt); + printk(KERN_ERR "used : %d\n", neigh->used); + printk(KERN_ERR "confirmed : %d\n", neigh->confirmed); + printk(KERN_ERR " ha: "); + for (i = 0; i < neigh->dev->addr_len; i += 4) { + ha = &neigh->ha[i]; + printk("%02x:%02x:%02x:%02x:", ha[0], ha[1], ha[2], + ha[3]); + } + printk("\n"); + + printk(KERN_ERR "%8s: ", neigh->dev->name); + for (i = 0; i < neigh->dev->addr_len; i += 4) { + ha = &neigh->ha[i]; + printk("%02x:%02x:%02x:%02x:", ha[0], ha[1], ha[2], + ha[3]); + } + printk("\n"); + break; + + case NETEVENT_REDIRECT: + printk(KERN_ERR "NETEVENT_REDIRECT:\n"); + printk(KERN_ERR "old: "); + for (i = 0; i < redir->old->neighbour->dev->addr_len; i += 4) { + ha = &redir->old->neighbour->ha[i]; + printk("%02x:%02x:%02x:%02x:", ha[0], ha[1], ha[2], + ha[3]); + } + printk("\n"); + + printk(KERN_ERR "new: "); + for (i = 0; i < redir->new->neighbour->dev->addr_len; i += 4) { + ha = &redir->new->neighbour->ha[i]; + printk("%02x:%02x:%02x:%02x:", ha[0], ha[1], ha[2], + ha[3]); + } + printk("\n"); + break; + + default: + printk(KERN_ERR "NETEVENT_WTFO:\n"); + } + + return NOTIFY_DONE; +} + +static struct notifier_block nb = { + .notifier_call = 
netevent_notifier, +}; +#endif +/* + * Called by c2_probe to initialize the RNIC. This principally + * involves initializing the various limits and resource pools that + * comprise the RNIC instance. + */ +int c2_rnic_init(struct c2_dev *c2dev) +{ + int err; + u32 qsize, msgsize; + void *q1_pages; + void *q2_pages; + void __iomem *mmio_regs; + + /* Device capabilities */ + c2dev->device_cap_flags = + (IB_DEVICE_RESIZE_MAX_WR | + IB_DEVICE_CURR_QP_STATE_MOD | + IB_DEVICE_SYS_IMAGE_GUID | + IB_DEVICE_ZERO_STAG | + IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW); + + /* Allocate the qptr_array */ + c2dev->qptr_array = vmalloc(C2_MAX_CQS * sizeof(void *)); + if (!c2dev->qptr_array) { + return -ENOMEM; + } + + /* Initialize the qptr_array */ + memset(c2dev->qptr_array, 0, C2_MAX_CQS * sizeof(void *)); + c2dev->qptr_array[0] = (void *) &c2dev->req_vq; + c2dev->qptr_array[1] = (void *) &c2dev->rep_vq; + c2dev->qptr_array[2] = (void *) &c2dev->aeq; + + /* Initialize data structures */ + init_waitqueue_head(&c2dev->req_vq_wo); + spin_lock_init(&c2dev->vqlock); + spin_lock_init(&c2dev->lock); + + /* Allocate MQ shared pointer pool for kernel clients. User + * mode client pools are hung off the user context + */ + err = c2_init_mqsp_pool(GFP_KERNEL, &c2dev->kern_mqsp_pool); + if (err) { + goto bail0; + } + + /* Allocate shared pointers for Q0, Q1, and Q2 from + * the shared pointer pool. 
+ */ + c2dev->req_vq.shared = c2_alloc_mqsp(c2dev->kern_mqsp_pool); + c2dev->rep_vq.shared = c2_alloc_mqsp(c2dev->kern_mqsp_pool); + c2dev->aeq.shared = c2_alloc_mqsp(c2dev->kern_mqsp_pool); + if (!c2dev->req_vq.shared || + !c2dev->rep_vq.shared || !c2dev->aeq.shared) { + err = -ENOMEM; + goto bail1; + } + + mmio_regs = c2dev->kva; + /* Initialize the Verbs Request Queue */ + c2_mq_req_init(&c2dev->req_vq, 0, + be32_to_cpu(readl(mmio_regs + C2_REGS_Q0_QSIZE)), + be32_to_cpu(readl(mmio_regs + C2_REGS_Q0_MSGSIZE)), + mmio_regs + + be32_to_cpu(readl(mmio_regs + C2_REGS_Q0_POOLSTART)), + mmio_regs + + be32_to_cpu(readl(mmio_regs + C2_REGS_Q0_SHARED)), + C2_MQ_ADAPTER_TARGET); + + /* Initialize the Verbs Reply Queue */ + qsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_QSIZE)); + msgsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_MSGSIZE)); + q1_pages = kmalloc(qsize * msgsize, GFP_KERNEL); + if (!q1_pages) { + err = -ENOMEM; + goto bail1; + } + c2_mq_rep_init(&c2dev->rep_vq, + 1, + qsize, + msgsize, + q1_pages, + mmio_regs + + be32_to_cpu(readl(mmio_regs + C2_REGS_Q1_SHARED)), + C2_MQ_HOST_TARGET); + + /* Initialize the Asynchronus Event Queue */ + qsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q2_QSIZE)); + msgsize = be32_to_cpu(readl(mmio_regs + C2_REGS_Q2_MSGSIZE)); + q2_pages = kmalloc(qsize * msgsize, GFP_KERNEL); + if (!q2_pages) { + err = -ENOMEM; + goto bail2; + } + c2_mq_rep_init(&c2dev->aeq, + 2, + qsize, + msgsize, + q2_pages, + mmio_regs + + be32_to_cpu(readl(mmio_regs + C2_REGS_Q2_SHARED)), + C2_MQ_HOST_TARGET); + + /* Initialize the verbs request allocator */ + err = vq_init(c2dev); + if (err) + goto bail3; + + /* Enable interrupts on the adapter */ + writel(0, c2dev->regs + C2_IDIS); + + /* create the WR init message */ + err = c2_adapter_init(c2dev); + if (err) + goto bail4; + c2dev->init++; + + /* open an adapter instance */ + err = c2_rnic_open(c2dev); + if (err) + goto bail4; + + /* Initialize cached the adapter limits */ + if (c2_rnic_query(c2dev, 
&c2dev->props)) + goto bail4; + + /* Initialize the PD pool */ + err = c2_init_pd_table(c2dev); + if (err) + goto bail5; + + /* Initialize the QP pool */ + err = c2_init_qp_table(c2dev); + if (err) + goto bail6; + +#ifdef NETEVENT_NOTIFIER + register_netevent_notifier(&nb); +#endif + return 0; + + bail6: + c2_cleanup_pd_table(c2dev); + bail5: + c2_rnic_close(c2dev); + bail4: + vq_term(c2dev); + bail3: + kfree(q2_pages); + bail2: + kfree(q1_pages); + bail1: + c2_free_mqsp_pool(c2dev->kern_mqsp_pool); + bail0: + vfree(c2dev->qptr_array); + + return err; +} + +/* + * Called by c2_remove to cleanup the RNIC resources. + */ +void c2_rnic_term(struct c2_dev *c2dev) +{ +#ifdef NETEVENT_NOTIFIER + unregister_netevent_notifier(&nb); +#endif + + /* Close the open adapter instance */ + c2_rnic_close(c2dev); + + /* Send the TERM message to the adapter */ + c2_adapter_term(c2dev); + + /* Disable interrupts on the adapter */ + writel(1, c2dev->regs + C2_IDIS); + + /* Free the QP pool */ + c2_cleanup_qp_table(c2dev); + + /* Free the PD pool */ + c2_cleanup_pd_table(c2dev); + + /* Free the verbs request allocator */ + vq_term(c2dev); + + /* Free the asynchronus event queue */ + kfree(c2dev->aeq.msg_pool.host); + + /* Free the verbs reply queue */ + kfree(c2dev->rep_vq.msg_pool.host); + + /* Free the MQ shared pointer pool */ + c2_free_mqsp_pool(c2dev->kern_mqsp_pool); + + /* Free the qptr_array */ + vfree(c2dev->qptr_array); + + return; +} - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo at vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html From rdreier at cisco.com Wed May 31 12:17:26 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 12:17:26 -0700 Subject: [openib-general] Re: [PATCH 2/2] iWARP Core Changes. 
In-Reply-To: <20060531182654.3308.41372.stgit@stevo-desktop> (Steve Wise's message of "Wed, 31 May 2006 13:26:55 -0500") References: <20060531182650.3308.81538.stgit@stevo-desktop> <20060531182654.3308.41372.stgit@stevo-desktop> Message-ID: > +EXPORT_SYMBOL(copy_addr); I think if you want to export this, it needs a less generic name (something with an rdma_ prefix probably). Otherwise it's going to collide someday... From trimmer at silverstorm.com Wed May 31 12:40:42 2006 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Wed, 31 May 2006 15:40:42 -0400 Subject: [openib-general] QoS RFC - Resend using a friendly mailer Message-ID: I am a member of MgtWG and will look for the discussion there. It doesn't seem like a topic LWG would cover. Todd Rimmer > -----Original Message----- > From: Eitan Zahavi [mailto:eitan at mellanox.co.il] > Sent: Wednesday, May 31, 2006 1:35 AM > To: Rimmer, Todd > Cc: openib-general at openib.org > Subject: RE: [openib-general] QoS RFC - Resend using a friendly mailer > > > Hi Todd, > > It is LWG. MgtWG will also be involved. > > > > I am a member of IBTA however I have not noticed this discussion on > the IBTA > > working groups. Which working group have you engaged with this > proposal? 
> > > > Todd Rimmer > From eitan at mellanox.co.il Wed May 31 13:14:52 2006 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 31 May 2006 23:14:52 +0300 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30243EF3D@mtlexch01.mtl.com> > -----Original Message----- > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Wednesday, May 31, 2006 9:15 PM > To: Eitan Zahavi > Cc: Leonid Arsh; Roland Dreier; openib-general at openib.org > Subject: Re: [openib-general][PATCH 1 of 3] repost: Client Reregister support for > kernel space > > Eitan Zahavi wrote: > > Leonid just sent an example for a race that might happen if the SM is to > > be the maintainer of the data. > > The race Leonid mentioned is a client sending a request when the SM is down. > That request will fail, so there's no data for the SM to maintain for that node. > That's a retry condition that the client must deal with. [EZ] The race is happening when the SM received the request and responded but the other SMs or the file system did not fully stored that registration and the SM crashed. > > > [EZ] The SM is a single entity that has to respond to all requests from > > the entire cluster. (Even redirection requests). When you require that > > SM to also provide transaction safe storage or even worse then that > > consistency with multiple standby SMs you worsen the problem. The > > clients on the their side only need to maintain their own registrations. > > I don't believe that there's any requirement that the SM be a single system. > But I do believe that the SM should be able to recover from all SM problems > without interrupting any existing communication that is occurring the fabric. > SM failover or failure/restart should be as transparent to the clients (i.e the > non-SM nodes in the fabric) as possible. (Btw, I also believe that the SM > should run on top of a real DBMS and support SQL style queries...) 
[EZ] Please do the math of how many transactions per second you are going to get from a reasonably priced SM if you are going to have a distributed DBMS for that sake. > > You don't want to push this problem to every application running in the fabric, > so why even push it to every node in the fabric? > > - Sean From swise at opengridcomputing.com Wed May 31 13:30:35 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 15:30:35 -0500 Subject: [openib-general] Re: [PATCH 2/2] iWARP Core Changes. In-Reply-To: References: <20060531182650.3308.81538.stgit@stevo-desktop> <20060531182654.3308.41372.stgit@stevo-desktop> Message-ID: <1149107435.7469.7.camel@stevo-desktop> On Wed, 2006-05-31 at 12:17 -0700, Roland Dreier wrote: > > +EXPORT_SYMBOL(copy_addr); > > I think if you want to export this, it needs a less generic name > (something with an rdma_ prefix probably). Otherwise it's going to > collide someday... ok. The function is needed by the iwcm module, so that's why we exported it. I could change the name to rdma_copy_addr(), or make the function a static inline in a header file since its kinda small anyway... Any preference? From rdreier at cisco.com Wed May 31 13:32:44 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 13:32:44 -0700 Subject: [openib-general] Re: [PATCH 2/2] iWARP Core Changes. In-Reply-To: <1149107435.7469.7.camel@stevo-desktop> (Steve Wise's message of "Wed, 31 May 2006 15:30:35 -0500") References: <20060531182650.3308.81538.stgit@stevo-desktop> <20060531182654.3308.41372.stgit@stevo-desktop> <1149107435.7469.7.camel@stevo-desktop> Message-ID: Steve> The function is needed by the iwcm module, so that's why we Steve> exported it. I could change the name to rdma_copy_addr(), Steve> or make the function a static inline in a header file since Steve> its kinda small anyway... It looks too big to inline to me, and I don't think it's on a fast path anyway. So I would export it. - R. 
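For readers following the naming question in the message above, here is a minimal user-space sketch of the option Roland prefers: keep the helper out of line and give the exported name a subsystem prefix. It is not the iWARP core code. The struct `demo_dev_addr`, the constant `DEMO_MAX_ADDR_LEN`, and the signature and body of `rdma_copy_addr()` below are hypothetical stand-ins chosen only for illustration; in the kernel the definition would be followed by `EXPORT_SYMBOL(rdma_copy_addr);`, which has no user-space equivalent.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit; stands in for whatever bound the real code uses. */
#define DEMO_MAX_ADDR_LEN 32

/* Hypothetical stand-in for the address structure the helper fills in. */
struct demo_dev_addr {
	unsigned char src_dev_addr[DEMO_MAX_ADDR_LEN];
	size_t addr_len;
};

/*
 * Out-of-line helper with a subsystem prefix. A bare name such as
 * copy_addr() lives in the same global namespace as every other
 * exported symbol, so the rdma_ prefix is what keeps it from colliding.
 * Returns 0 on success, -1 if an argument is NULL or the address is too long.
 */
int rdma_copy_addr(struct demo_dev_addr *dst, const unsigned char *hw_addr,
		   size_t len)
{
	if (!dst || !hw_addr || len > DEMO_MAX_ADDR_LEN)
		return -1;

	/* Zero the buffer first so unused trailing bytes are well defined. */
	memset(dst->src_dev_addr, 0, sizeof(dst->src_dev_addr));
	memcpy(dst->src_dev_addr, hw_addr, len);
	dst->addr_len = len;
	return 0;
}
```

A caller in another module would only need the prototype from a shared header. The alternative Steve mentions, a `static inline` in that header, avoids the export but places a copy of the body in every user, which is the trade-off Roland weighs in his reply.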
From rdreier at cisco.com Wed May 31 13:36:44 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 13:36:44 -0700 Subject: [openib-general] Re: [PATCH 1/7] AMSO1100 Makefiles and Kconfig changes. In-Reply-To: <20060531182735.3652.44197.stgit@stevo-desktop> (Steve Wise's message of "Wed, 31 May 2006 13:27:35 -0500") References: <20060531182733.3652.54755.stgit@stevo-desktop> <20060531182735.3652.44197.stgit@stevo-desktop> Message-ID: Can you reorder things so these changes go last? Otherwise after this patch we're left with a kernel tree that has a Makefile that refers to sources that don't exist yet. It's not really a practical issue but it is neater to do that way. (It's easy to do in stgit -- just pop all the patches and then use "stg push " to push them in a different order) - R. From swise at opengridcomputing.com Wed May 31 13:39:13 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 15:39:13 -0500 Subject: [openib-general] Re: [PATCH 1/7] AMSO1100 Makefiles and Kconfig changes. In-Reply-To: References: <20060531182733.3652.54755.stgit@stevo-desktop> <20060531182735.3652.44197.stgit@stevo-desktop> Message-ID: <1149107953.7469.9.camel@stevo-desktop> On Wed, 2006-05-31 at 13:36 -0700, Roland Dreier wrote: > Can you reorder things so these changes go last? Otherwise after this > patch we're left with a kernel tree that has a Makefile that refers to > sources that don't exist yet. It's not really a practical issue but > it is neater to do that way. > > (It's easy to do in stgit -- just pop all the patches and then use > "stg push " to push them in a different order) > > - R. will do. 
From rdreier at cisco.com Wed May 31 13:40:43 2006 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 31 May 2006 13:40:43 -0700 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <10e223bf0605310848r20ae5dd3rdf279a387524a326@mail.gmail.com> (Leonid Arsh's message of "Wed, 31 May 2006 18:48:04 +0300") References: <20060509060958.GA482@voltaire.com> <10e223bf0605300403lcc24b8bwf1e1d7059edab416@mail.gmail.com> <10e223bf0605310249t57e89b72ne8e38e57945a2cec@mail.gmail.com> <10e223bf0605310848r20ae5dd3rdf279a387524a326@mail.gmail.com> Message-ID: Leonid> Generating the LID_CHANGE event instead of Leonid> CLIENT_REREGISTER is simply not correct. Leonid> We need the event for our user mode applications. Leonid> Although the patch doesn't change current functionality, I Leonid> wouldn't like to write applications based on the erroneous Leonid> code. The application won't just work with devices that Leonid> generate the event correctly. OK, I don't have any real objection to this patch, so I'll fix up the series and apply them for 2.6.18. And it looks easy to fix up ipath too -- any reason you didn't do that? - R. From swise at opengridcomputing.com Wed May 31 13:50:38 2006 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 31 May 2006 15:50:38 -0500 Subject: [openib-general] [PATCH 3/7] AMSO1100 WR / Event Definitions. In-Reply-To: <20060531182733.3652.54755.stgit@stevo-desktop> References: <20060531182733.3652.54755.stgit@stevo-desktop> Message-ID: <1149108638.7469.12.camel@stevo-desktop> I've sent this twice in-line and it is not going through. So here it is as an attachment. Steve. -------------- next part -------------- AMSO1100 WR / Event Definitions. 
From: Steve Wise --- drivers/infiniband/hw/amso1100/c2_ae.h | 108 ++ drivers/infiniband/hw/amso1100/c2_status.h | 158 +++ drivers/infiniband/hw/amso1100/c2_wr.h | 1523 ++++++++++++++++++++++++++++ 3 files changed, 1789 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_ae.h b/drivers/infiniband/hw/amso1100/c2_ae.h new file mode 100644 index 0000000..3a065c3 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_ae.h @@ -0,0 +1,108 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef _C2_AE_H_ +#define _C2_AE_H_ + +/* + * WARNING: If you change this file, also bump C2_IVN_BASE + * in common/include/clustercore/c2_ivn.h. + */ + +/* + * Asynchronous Event Identifiers + * + * These start at 0x80 only so it's obvious from inspection that + * they are not work-request statuses. This isn't critical. + * + * NOTE: these event id's must fit in eight bits. + */ +enum c2_event_id { + CCAE_REMOTE_SHUTDOWN = 0x80, + CCAE_ACTIVE_CONNECT_RESULTS, + CCAE_CONNECTION_REQUEST, + CCAE_LLP_CLOSE_COMPLETE, + CCAE_TERMINATE_MESSAGE_RECEIVED, + CCAE_LLP_CONNECTION_RESET, + CCAE_LLP_CONNECTION_LOST, + CCAE_LLP_SEGMENT_SIZE_INVALID, + CCAE_LLP_INVALID_CRC, + CCAE_LLP_BAD_FPDU, + CCAE_INVALID_DDP_VERSION, + CCAE_INVALID_RDMA_VERSION, + CCAE_UNEXPECTED_OPCODE, + CCAE_INVALID_DDP_QUEUE_NUMBER, + CCAE_RDMA_READ_NOT_ENABLED, + CCAE_RDMA_WRITE_NOT_ENABLED, + CCAE_RDMA_READ_TOO_SMALL, + CCAE_NO_L_BIT, + CCAE_TAGGED_INVALID_STAG, + CCAE_TAGGED_BASE_BOUNDS_VIOLATION, + CCAE_TAGGED_ACCESS_RIGHTS_VIOLATION, + CCAE_TAGGED_INVALID_PD, + CCAE_WRAP_ERROR, + CCAE_BAD_CLOSE, + CCAE_BAD_LLP_CLOSE, + CCAE_INVALID_MSN_RANGE, + CCAE_INVALID_MSN_GAP, + CCAE_IRRQ_OVERFLOW, + CCAE_IRRQ_MSN_GAP, + CCAE_IRRQ_MSN_RANGE, + CCAE_IRRQ_INVALID_STAG, + CCAE_IRRQ_BASE_BOUNDS_VIOLATION, + CCAE_IRRQ_ACCESS_RIGHTS_VIOLATION, + CCAE_IRRQ_INVALID_PD, + CCAE_IRRQ_WRAP_ERROR, + CCAE_CQ_SQ_COMPLETION_OVERFLOW, + CCAE_CQ_RQ_COMPLETION_ERROR, + CCAE_QP_SRQ_WQE_ERROR, + CCAE_QP_LOCAL_CATASTROPHIC_ERROR, + CCAE_CQ_OVERFLOW, + CCAE_CQ_OPERATION_ERROR, + CCAE_SRQ_LIMIT_REACHED, + CCAE_QP_RQ_LIMIT_REACHED, + CCAE_SRQ_CATASTROPHIC_ERROR, + CCAE_RNIC_CATASTROPHIC_ERROR +/* WARNING If you add more id's, make sure their values fit in eight bits. 
*/ +}; + +/* + * Resource Indicators and Identifiers + */ +enum c2_resource_indicator { + C2_RES_IND_QP = 1, + C2_RES_IND_EP, + C2_RES_IND_CQ, + C2_RES_IND_SRQ, +}; + +#endif /* _C2_AE_H_ */ diff --git a/drivers/infiniband/hw/amso1100/c2_status.h b/drivers/infiniband/hw/amso1100/c2_status.h new file mode 100644 index 0000000..6ee4aa9 --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_status.h @@ -0,0 +1,158 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef _C2_STATUS_H_ +#define _C2_STATUS_H_ + +/* + * Verbs Status Codes + */ +enum c2_status { + C2_OK = 0, /* This must be zero */ + CCERR_INSUFFICIENT_RESOURCES = 1, + CCERR_INVALID_MODIFIER = 2, + CCERR_INVALID_MODE = 3, + CCERR_IN_USE = 4, + CCERR_INVALID_RNIC = 5, + CCERR_INTERRUPTED_OPERATION = 6, + CCERR_INVALID_EH = 7, + CCERR_INVALID_CQ = 8, + CCERR_CQ_EMPTY = 9, + CCERR_NOT_IMPLEMENTED = 10, + CCERR_CQ_DEPTH_TOO_SMALL = 11, + CCERR_PD_IN_USE = 12, + CCERR_INVALID_PD = 13, + CCERR_INVALID_SRQ = 14, + CCERR_INVALID_ADDRESS = 15, + CCERR_INVALID_NETMASK = 16, + CCERR_INVALID_QP = 17, + CCERR_INVALID_QP_STATE = 18, + CCERR_TOO_MANY_WRS_POSTED = 19, + CCERR_INVALID_WR_TYPE = 20, + CCERR_INVALID_SGL_LENGTH = 21, + CCERR_INVALID_SQ_DEPTH = 22, + CCERR_INVALID_RQ_DEPTH = 23, + CCERR_INVALID_ORD = 24, + CCERR_INVALID_IRD = 25, + CCERR_QP_ATTR_CANNOT_CHANGE = 26, + CCERR_INVALID_STAG = 27, + CCERR_QP_IN_USE = 28, + CCERR_OUTSTANDING_WRS = 29, + CCERR_STAG_IN_USE = 30, + CCERR_INVALID_STAG_INDEX = 31, + CCERR_INVALID_SGL_FORMAT = 32, + CCERR_ADAPTER_TIMEOUT = 33, + CCERR_INVALID_CQ_DEPTH = 34, + CCERR_INVALID_PRIVATE_DATA_LENGTH = 35, + CCERR_INVALID_EP = 36, + CCERR_MR_IN_USE = CCERR_STAG_IN_USE, + CCERR_FLUSHED = 38, + CCERR_INVALID_WQE = 39, + CCERR_LOCAL_QP_CATASTROPHIC_ERROR = 40, + CCERR_REMOTE_TERMINATION_ERROR = 41, + CCERR_BASE_AND_BOUNDS_VIOLATION = 42, + CCERR_ACCESS_VIOLATION = 43, + CCERR_INVALID_PD_ID = 44, + CCERR_WRAP_ERROR = 45, + CCERR_INV_STAG_ACCESS_ERROR = 46, + CCERR_ZERO_RDMA_READ_RESOURCES = 47, + CCERR_QP_NOT_PRIVILEGED = 48, + CCERR_STAG_STATE_NOT_INVALID = 49, + CCERR_INVALID_PAGE_SIZE = 50, + CCERR_INVALID_BUFFER_SIZE = 51, + CCERR_INVALID_PBE = 52, + CCERR_INVALID_FBO = 53, + CCERR_INVALID_LENGTH = 54, + CCERR_INVALID_ACCESS_RIGHTS = 55, + CCERR_PBL_TOO_BIG = 56, + CCERR_INVALID_VA = 57, + CCERR_INVALID_REGION = 58, + CCERR_INVALID_WINDOW = 59, + CCERR_TOTAL_LENGTH_TOO_BIG = 60, + CCERR_INVALID_QP_ID = 61, + CCERR_ADDR_IN_USE = 
62, + CCERR_ADDR_NOT_AVAIL = 63, + CCERR_NET_DOWN = 64, + CCERR_NET_UNREACHABLE = 65, + CCERR_CONN_ABORTED = 66, + CCERR_CONN_RESET = 67, + CCERR_NO_BUFS = 68, + CCERR_CONN_TIMEDOUT = 69, + CCERR_CONN_REFUSED = 70, + CCERR_HOST_UNREACHABLE = 71, + CCERR_INVALID_SEND_SGL_DEPTH = 72, + CCERR_INVALID_RECV_SGL_DEPTH = 73, + CCERR_INVALID_RDMA_WRITE_SGL_DEPTH = 74, + CCERR_INSUFFICIENT_PRIVILEGES = 75, + CCERR_STACK_ERROR = 76, + CCERR_INVALID_VERSION = 77, + CCERR_INVALID_MTU = 78, + CCERR_INVALID_IMAGE = 79, + CCERR_PENDING = 98, /* not an error; used internally by adapter */ + CCERR_DEFER = 99, /* not an error; used internally by adapter */ + CCERR_FAILED_WRITE = 100, + CCERR_FAILED_ERASE = 101, + CCERR_FAILED_VERIFICATION = 102, + CCERR_NOT_FOUND = 103, + +}; + +/* + * CCAE_ACTIVE_CONNECT_RESULTS status result codes. + */ +enum c2_connect_status { + C2_CONN_STATUS_SUCCESS = C2_OK, + C2_CONN_STATUS_NO_MEM = CCERR_INSUFFICIENT_RESOURCES, + C2_CONN_STATUS_TIMEDOUT = CCERR_CONN_TIMEDOUT, + C2_CONN_STATUS_REFUSED = CCERR_CONN_REFUSED, + C2_CONN_STATUS_NETUNREACH = CCERR_NET_UNREACHABLE, + C2_CONN_STATUS_HOSTUNREACH = CCERR_HOST_UNREACHABLE, + C2_CONN_STATUS_INVALID_RNIC = CCERR_INVALID_RNIC, + C2_CONN_STATUS_INVALID_QP = CCERR_INVALID_QP, + C2_CONN_STATUS_INVALID_QP_STATE = CCERR_INVALID_QP_STATE, + C2_CONN_STATUS_REJECTED = CCERR_CONN_RESET, + C2_CONN_STATUS_ADDR_NOT_AVAIL = CCERR_ADDR_NOT_AVAIL, +}; + +/* + * Flash programming status codes. + */ +enum c2_flash_status { + C2_FLASH_STATUS_SUCCESS = 0x0000, + C2_FLASH_STATUS_VERIFY_ERR = 0x0002, + C2_FLASH_STATUS_IMAGE_ERR = 0x0004, + C2_FLASH_STATUS_ECLBS = 0x0400, + C2_FLASH_STATUS_PSLBS = 0x0800, + C2_FLASH_STATUS_VPENS = 0x1000, +}; + +#endif /* _C2_STATUS_H_ */ diff --git a/drivers/infiniband/hw/amso1100/c2_wr.h b/drivers/infiniband/hw/amso1100/c2_wr.h new file mode 100644 index 0000000..9d6468d --- /dev/null +++ b/drivers/infiniband/hw/amso1100/c2_wr.h @@ -0,0 +1,1523 @@ +/* + * Copyright (c) 2005 Ammasso, Inc. 
All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#ifndef _C2_WR_H_ +#define _C2_WR_H_ + +#ifdef CCDEBUG +#define CCWR_MAGIC 0xb07700b0 +#endif + +#define C2_QP_NO_ATTR_CHANGE 0xFFFFFFFF + +/* Maximum allowed size in bytes of private_data exchange + * on connect. + */ +#define C2_MAX_PRIVATE_DATA_SIZE 200 + +/* + * These types are shared among the adapter, host, and CCIL consumer. 
+ */ +enum c2_cq_notification_type { + C2_CQ_NOTIFICATION_TYPE_NONE = 1, + C2_CQ_NOTIFICATION_TYPE_NEXT, + C2_CQ_NOTIFICATION_TYPE_NEXT_SE +}; + +enum c2_setconfig_cmd { + C2_CFG_ADD_ADDR = 1, + C2_CFG_DEL_ADDR = 2, + C2_CFG_ADD_ROUTE = 3, + C2_CFG_DEL_ROUTE = 4 +}; + +enum c2_getconfig_cmd { + C2_GETCONFIG_ROUTES = 1, + C2_GETCONFIG_ADDRS +}; + +/* + * CCIL Work Request Identifiers + */ +enum c2wr_ids { + CCWR_RNIC_OPEN = 1, + CCWR_RNIC_QUERY, + CCWR_RNIC_SETCONFIG, + CCWR_RNIC_GETCONFIG, + CCWR_RNIC_CLOSE, + CCWR_CQ_CREATE, + CCWR_CQ_QUERY, + CCWR_CQ_MODIFY, + CCWR_CQ_DESTROY, + CCWR_QP_CONNECT, + CCWR_PD_ALLOC, + CCWR_PD_DEALLOC, + CCWR_SRQ_CREATE, + CCWR_SRQ_QUERY, + CCWR_SRQ_MODIFY, + CCWR_SRQ_DESTROY, + CCWR_QP_CREATE, + CCWR_QP_QUERY, + CCWR_QP_MODIFY, + CCWR_QP_DESTROY, + CCWR_NSMR_STAG_ALLOC, + CCWR_NSMR_REGISTER, + CCWR_NSMR_PBL, + CCWR_STAG_DEALLOC, + CCWR_NSMR_REREGISTER, + CCWR_SMR_REGISTER, + CCWR_MR_QUERY, + CCWR_MW_ALLOC, + CCWR_MW_QUERY, + CCWR_EP_CREATE, + CCWR_EP_GETOPT, + CCWR_EP_SETOPT, + CCWR_EP_DESTROY, + CCWR_EP_BIND, + CCWR_EP_CONNECT, + CCWR_EP_LISTEN, + CCWR_EP_SHUTDOWN, + CCWR_EP_LISTEN_CREATE, + CCWR_EP_LISTEN_DESTROY, + CCWR_EP_QUERY, + CCWR_CR_ACCEPT, + CCWR_CR_REJECT, + CCWR_CONSOLE, + CCWR_TERM, + CCWR_FLASH_INIT, + CCWR_FLASH, + CCWR_BUF_ALLOC, + CCWR_BUF_FREE, + CCWR_FLASH_WRITE, + CCWR_INIT, /* WARNING: Don't move this ever again! */ + + + + /* Add new IDs here */ + + + + /* + * WARNING: CCWR_LAST must always be the last verbs id defined! + * All the preceding IDs are fixed, and must not change. + * You can add new IDs, but must not remove or reorder + * any IDs. If you do, YOU will ruin any hope of + * compatibility between versions. + */ + CCWR_LAST, + + /* + * Start over at 1 so that arrays indexed by user wr id's + * begin at 1. This is OK since the verbs and user wr id's + * are always used on disjoint sets of queues. 
+ */ + /* + * The order of the CCWR_SEND_XX verbs must + * match the order of the RDMA_OPs + */ + CCWR_SEND = 1, + CCWR_SEND_INV, + CCWR_SEND_SE, + CCWR_SEND_SE_INV, + CCWR_RDMA_WRITE, + CCWR_RDMA_READ, + CCWR_RDMA_READ_INV, + CCWR_MW_BIND, + CCWR_NSMR_FASTREG, + CCWR_STAG_INVALIDATE, + CCWR_RECV, + CCWR_NOP, + CCWR_UNIMPL, +/* WARNING: This must always be the last user wr id defined! */ +}; +#define RDMA_SEND_OPCODE_FROM_WR_ID(x) (x+2) + +/* + * SQ/RQ Work Request Types + */ +enum c2_wr_type { + C2_WR_TYPE_SEND = CCWR_SEND, + C2_WR_TYPE_SEND_SE = CCWR_SEND_SE, + C2_WR_TYPE_SEND_INV = CCWR_SEND_INV, + C2_WR_TYPE_SEND_SE_INV = CCWR_SEND_SE_INV, + C2_WR_TYPE_RDMA_WRITE = CCWR_RDMA_WRITE, + C2_WR_TYPE_RDMA_READ = CCWR_RDMA_READ, + C2_WR_TYPE_RDMA_READ_INV_STAG = CCWR_RDMA_READ_INV, + C2_WR_TYPE_BIND_MW = CCWR_MW_BIND, + C2_WR_TYPE_FASTREG_NSMR = CCWR_NSMR_FASTREG, + C2_WR_TYPE_INV_STAG = CCWR_STAG_INVALIDATE, + C2_WR_TYPE_RECV = CCWR_RECV, + C2_WR_TYPE_NOP = CCWR_NOP, +}; + +struct c2_netaddr { + u32 ip_addr; + u32 netmask; + u32 mtu; +}; + +struct c2_route { + u32 ip_addr; /* 0 indicates the default route */ + u32 netmask; /* netmask associated with dst */ + u32 flags; + union { + u32 ipaddr; /* address of the nexthop interface */ + u8 enaddr[6]; + } nexthop; +}; + +/* + * A Scatter Gather Entry. + */ +struct c2_data_addr { + u32 stag; + u32 length; + u64 to; +}; + +/* + * MR and MW flags used by the consumer, RI, and RNIC. + */ +enum c2_mm_flags { + MEM_REMOTE = 0x0001, /* allow mw binds with remote access. 
*/ + MEM_VA_BASED = 0x0002, /* Not Zero-based */ + MEM_PBL_COMPLETE = 0x0004, /* PBL array is complete in this msg */ + MEM_LOCAL_READ = 0x0008, /* allow local reads */ + MEM_LOCAL_WRITE = 0x0010, /* allow local writes */ + MEM_REMOTE_READ = 0x0020, /* allow remote reads */ + MEM_REMOTE_WRITE = 0x0040, /* allow remote writes */ + MEM_WINDOW_BIND = 0x0080, /* binds allowed */ + MEM_SHARED = 0x0100, /* set if MR is shared */ + MEM_STAG_VALID = 0x0200 /* set if STAG is in valid state */ +}; + +/* + * CCIL API ACF flags defined in terms of the low level mem flags. + * This minimizes translation needed in the user API + */ +enum c2_acf { + C2_ACF_LOCAL_READ = MEM_LOCAL_READ, + C2_ACF_LOCAL_WRITE = MEM_LOCAL_WRITE, + C2_ACF_REMOTE_READ = MEM_REMOTE_READ, + C2_ACF_REMOTE_WRITE = MEM_REMOTE_WRITE, + C2_ACF_WINDOW_BIND = MEM_WINDOW_BIND +}; + +/* + * Image types of objects written to flash + */ +#define C2_FLASH_IMG_BITFILE 1 +#define C2_FLASH_IMG_OPTION_ROM 2 +#define C2_FLASH_IMG_VPD 3 + +/* + * To fix bug 1815 we define the max size allowable of the + * terminate message (per the IETF spec). Refer to the IETF + * protocol specification, section 12.1.6, page 64. + * The message is prefixed by 20 bytes of DDP info. + * + * Then the message has 6 bytes for the terminate control + * and DDP segment length info plus a DDP header (either + * 14 or 18 bytes) plus 28 bytes for the RDMA header. + * Thus the max size is: + * 20 + (6 + 18 + 28) = 72 + */ +#define C2_MAX_TERMINATE_MESSAGE_SIZE (72) + +/* + * Build String Length. It must be the same as C2_BUILD_STR_LEN in ccil_api.h + */ +#define WR_BUILD_STR_LEN 64 + +/* + * WARNING: All of these structs need to align any 64bit types on + * 64 bit boundaries! 64bit types include u64 and u64. + */ + +/* + * Clustercore Work Request Header. Be sensitive to field layout + * and alignment. + */ +struct c2wr_hdr { + /* wqe_count is part of the cqe. 
It is put here so the + * adapter can write to it while the wr is pending without + * clobbering part of the wr. This word need not be dma'd + * from the host to adapter by libccil, but we copy it anyway + * to make the memcpy to the adapter better aligned. + */ + u32 wqe_count; + + /* Put these fields next so that later 32- and 64-bit + * quantities are naturally aligned. + */ + u8 id; + u8 result; /* adapter -> host */ + u8 sge_count; /* host -> adapter */ + u8 flags; /* host -> adapter */ + + u64 context; +#ifdef CCMSGMAGIC + u32 magic; + u32 pad; +#endif +} __attribute__((packed)); + +/* + *------------------------ RNIC ------------------------ + */ + +/* + * WR_RNIC_OPEN + */ + +/* + * Flags for the RNIC WRs + */ +enum c2_rnic_flags { + RNIC_IRD_STATIC = 0x0001, + RNIC_ORD_STATIC = 0x0002, + RNIC_QP_STATIC = 0x0004, + RNIC_SRQ_SUPPORTED = 0x0008, + RNIC_PBL_BLOCK_MODE = 0x0010, + RNIC_SRQ_MODEL_ARRIVAL = 0x0020, + RNIC_CQ_OVF_DETECTED = 0x0040, + RNIC_PRIV_MODE = 0x0080 +}; + +struct c2wr_rnic_open_req { + struct c2wr_hdr hdr; + u64 user_context; + u16 flags; /* See enum c2_rnic_flags */ + u16 port_num; +} __attribute__((packed)); + +struct c2wr_rnic_open_rep { + struct c2wr_hdr hdr; + u32 rnic_handle; +} __attribute__((packed)); + +union c2wr_rnic_open { + struct c2wr_rnic_open_req req; + struct c2wr_rnic_open_rep rep; +} __attribute__((packed)); + +struct c2wr_rnic_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; +} __attribute__((packed)); + +/* + * WR_RNIC_QUERY + */ +struct c2wr_rnic_query_rep { + struct c2wr_hdr hdr; + u64 user_context; + u32 vendor_id; + u32 part_number; + u32 hw_version; + u32 fw_ver_major; + u32 fw_ver_minor; + u32 fw_ver_patch; + char fw_ver_build_str[WR_BUILD_STR_LEN]; + u32 max_qps; + u32 max_qp_depth; + u32 max_srq_depth; + u32 max_send_sgl_depth; + u32 max_rdma_sgl_depth; + u32 max_cqs; + u32 max_cq_depth; + u32 max_cq_event_handlers; + u32 max_mrs; + u32 max_pbl_depth; + u32 max_pds; + u32 max_global_ird; + u32 
max_global_ord; + u32 max_qp_ird; + u32 max_qp_ord; + u32 flags; + u32 max_mws; + u32 pbe_range_low; + u32 pbe_range_high; + u32 max_srqs; + u32 page_size; +} __attribute__((packed)); + +union c2wr_rnic_query { + struct c2wr_rnic_query_req req; + struct c2wr_rnic_query_rep rep; +} __attribute__((packed)); + +/* + * WR_RNIC_GETCONFIG + */ + +struct c2wr_rnic_getconfig_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 option; /* see c2_getconfig_cmd_t */ + u64 reply_buf; + u32 reply_buf_len; +} __attribute__((packed)) ; + +struct c2wr_rnic_getconfig_rep { + struct c2wr_hdr hdr; + u32 option; /* see c2_getconfig_cmd_t */ + u32 count_len; /* length of the number of addresses configured */ +} __attribute__((packed)) ; + +union c2wr_rnic_getconfig { + struct c2wr_rnic_getconfig_req req; + struct c2wr_rnic_getconfig_rep rep; +} __attribute__((packed)) ; + +/* + * WR_RNIC_SETCONFIG + */ +struct c2wr_rnic_setconfig_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 option; /* See c2_setconfig_cmd_t */ + /* variable data and pad. 
See c2_netaddr and c2_route */ + u8 data[0]; +} __attribute__((packed)) ; + +struct c2wr_rnic_setconfig_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_rnic_setconfig { + struct c2wr_rnic_setconfig_req req; + struct c2wr_rnic_setconfig_rep rep; +} __attribute__((packed)) ; + +/* + * WR_RNIC_CLOSE + */ +struct c2wr_rnic_close_req { + struct c2wr_hdr hdr; + u32 rnic_handle; +} __attribute__((packed)) ; + +struct c2wr_rnic_close_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_rnic_close { + struct c2wr_rnic_close_req req; + struct c2wr_rnic_close_rep rep; +} __attribute__((packed)) ; + +/* + *------------------------ CQ ------------------------ + */ +struct c2wr_cq_create_req { + struct c2wr_hdr hdr; + u64 shared_ht; + u64 user_context; + u64 msg_pool; + u32 rnic_handle; + u32 msg_size; + u32 depth; +} __attribute__((packed)) ; + +struct c2wr_cq_create_rep { + struct c2wr_hdr hdr; + u32 mq_index; + u32 adapter_shared; + u32 cq_handle; +} __attribute__((packed)) ; + +union c2wr_cq_create { + struct c2wr_cq_create_req req; + struct c2wr_cq_create_rep rep; +} __attribute__((packed)) ; + +struct c2wr_cq_modify_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 cq_handle; + u32 new_depth; + u64 new_msg_pool; +} __attribute__((packed)) ; + +struct c2wr_cq_modify_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_cq_modify { + struct c2wr_cq_modify_req req; + struct c2wr_cq_modify_rep rep; +} __attribute__((packed)) ; + +struct c2wr_cq_destroy_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 cq_handle; +} __attribute__((packed)) ; + +struct c2wr_cq_destroy_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_cq_destroy { + struct c2wr_cq_destroy_req req; + struct c2wr_cq_destroy_rep rep; +} __attribute__((packed)) ; + +/* + *------------------------ PD ------------------------ + */ +struct c2wr_pd_alloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 pd_id; +} 
__attribute__((packed)) ; + +struct c2wr_pd_alloc_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_pd_alloc { + struct c2wr_pd_alloc_req req; + struct c2wr_pd_alloc_rep rep; +} __attribute__((packed)) ; + +struct c2wr_pd_dealloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_pd_dealloc_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_pd_dealloc { + struct c2wr_pd_dealloc_req req; + struct c2wr_pd_dealloc_rep rep; +} __attribute__((packed)) ; + +/* + *------------------------ SRQ ------------------------ + */ +struct c2wr_srq_create_req { + struct c2wr_hdr hdr; + u64 shared_ht; + u64 user_context; + u32 rnic_handle; + u32 srq_depth; + u32 srq_limit; + u32 sgl_depth; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_srq_create_rep { + struct c2wr_hdr hdr; + u32 srq_depth; + u32 sgl_depth; + u32 msg_size; + u32 mq_index; + u32 mq_start; + u32 srq_handle; +} __attribute__((packed)) ; + +union c2wr_srq_create { + struct c2wr_srq_create_req req; + struct c2wr_srq_create_rep rep; +} __attribute__((packed)) ; + +struct c2wr_srq_destroy_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 srq_handle; +} __attribute__((packed)) ; + +struct c2wr_srq_destroy_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_srq_destroy { + struct c2wr_srq_destroy_req req; + struct c2wr_srq_destroy_rep rep; +} __attribute__((packed)) ; + +/* + *------------------------ QP ------------------------ + */ +enum c2wr_qp_flags { + QP_RDMA_READ = 0x00000001, /* RDMA read enabled? */ + QP_RDMA_WRITE = 0x00000002, /* RDMA write enabled? */ + QP_MW_BIND = 0x00000004, /* MWs enabled */ + QP_ZERO_STAG = 0x00000008, /* enabled? */ + QP_REMOTE_TERMINATION = 0x00000010, /* remote end terminated */ + QP_RDMA_READ_RESPONSE = 0x00000020 /* Remote RDMA read */ + /* enabled? 
*/ +}; + +struct c2wr_qp_create_req { + struct c2wr_hdr hdr; + u64 shared_sq_ht; + u64 shared_rq_ht; + u64 user_context; + u32 rnic_handle; + u32 sq_cq_handle; + u32 rq_cq_handle; + u32 sq_depth; + u32 rq_depth; + u32 srq_handle; + u32 srq_limit; + u32 flags; /* see enum c2wr_qp_flags */ + u32 send_sgl_depth; + u32 recv_sgl_depth; + u32 rdma_write_sgl_depth; + u32 ord; + u32 ird; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_qp_create_rep { + struct c2wr_hdr hdr; + u32 sq_depth; + u32 rq_depth; + u32 send_sgl_depth; + u32 recv_sgl_depth; + u32 rdma_write_sgl_depth; + u32 ord; + u32 ird; + u32 sq_msg_size; + u32 sq_mq_index; + u32 sq_mq_start; + u32 rq_msg_size; + u32 rq_mq_index; + u32 rq_mq_start; + u32 qp_handle; +} __attribute__((packed)) ; + +union c2wr_qp_create { + struct c2wr_qp_create_req req; + struct c2wr_qp_create_rep rep; +} __attribute__((packed)) ; + +struct c2wr_qp_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 qp_handle; +} __attribute__((packed)) ; + +struct c2wr_qp_query_rep { + struct c2wr_hdr hdr; + u64 user_context; + u32 rnic_handle; + u32 sq_depth; + u32 rq_depth; + u32 send_sgl_depth; + u32 rdma_write_sgl_depth; + u32 recv_sgl_depth; + u32 ord; + u32 ird; + u16 qp_state; + u16 flags; /* see c2wr_qp_flags_t */ + u32 qp_id; + u32 local_addr; + u32 remote_addr; + u16 local_port; + u16 remote_port; + u32 terminate_msg_length; /* 0 if not present */ + u8 data[0]; + /* Terminate Message in-line here. 
*/ +} __attribute__((packed)) ; + +union c2wr_qp_query { + struct c2wr_qp_query_req req; + struct c2wr_qp_query_rep rep; +} __attribute__((packed)) ; + +struct c2wr_qp_modify_req { + struct c2wr_hdr hdr; + u64 stream_msg; + u32 stream_msg_length; + u32 rnic_handle; + u32 qp_handle; + u32 next_qp_state; + u32 ord; + u32 ird; + u32 sq_depth; + u32 rq_depth; + u32 llp_ep_handle; +} __attribute__((packed)) ; + +struct c2wr_qp_modify_rep { + struct c2wr_hdr hdr; + u32 ord; + u32 ird; + u32 sq_depth; + u32 rq_depth; + u32 sq_msg_size; + u32 sq_mq_index; + u32 sq_mq_start; + u32 rq_msg_size; + u32 rq_mq_index; + u32 rq_mq_start; +} __attribute__((packed)) ; + +union c2wr_qp_modify { + struct c2wr_qp_modify_req req; + struct c2wr_qp_modify_rep rep; +} __attribute__((packed)) ; + +struct c2wr_qp_destroy_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 qp_handle; +} __attribute__((packed)) ; + +struct c2wr_qp_destroy_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_qp_destroy { + struct c2wr_qp_destroy_req req; + struct c2wr_qp_destroy_rep rep; +} __attribute__((packed)) ; + +/* + * The CCWR_QP_CONNECT msg is posted on the verbs request queue. It can + * only be posted when a QP is in IDLE state. After the connect request is + * submitted to the LLP, the adapter moves the QP to CONNECT_PENDING state. + * No synchronous reply from adapter to this WR. The results of + * connection are passed back in an async event CCAE_ACTIVE_CONNECT_RESULTS + * See c2wr_ae_active_connect_results_t + */ +struct c2wr_qp_connect_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 qp_handle; + u32 remote_addr; + u16 remote_port; + u16 pad; + u32 private_data_length; + u8 private_data[0]; /* Private data in-line. */ +} __attribute__((packed)) ; + +struct c2wr_qp_connect { + struct c2wr_qp_connect_req req; + /* no synchronous reply. 
*/ +} __attribute__((packed)) ; + + +/* + *------------------------ MM ------------------------ + */ + +struct c2wr_nsmr_stag_alloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 pbl_depth; + u32 pd_id; + u32 flags; +} __attribute__((packed)) ; + +struct c2wr_nsmr_stag_alloc_rep { + struct c2wr_hdr hdr; + u32 pbl_depth; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_nsmr_stag_alloc { + struct c2wr_nsmr_stag_alloc_req req; + struct c2wr_nsmr_stag_alloc_rep rep; +} __attribute__((packed)) ; + +struct c2wr_nsmr_register_req { + struct c2wr_hdr hdr; + u64 va; + u32 rnic_handle; + u16 flags; + u8 stag_key; + u8 pad; + u32 pd_id; + u32 pbl_depth; + u32 pbe_size; + u32 fbo; + u32 length; + u32 addrs_length; + /* array of paddrs (must be aligned on a 64bit boundary) */ + u64 paddrs[0]; +} __attribute__((packed)) ; + +struct c2wr_nsmr_register_rep { + struct c2wr_hdr hdr; + u32 pbl_depth; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_nsmr_register { + struct c2wr_nsmr_register_req req; + struct c2wr_nsmr_register_rep rep; +} __attribute__((packed)) ; + +struct c2wr_nsmr_pbl_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 flags; + u32 stag_index; + u32 addrs_length; + /* array of paddrs (must be aligned on a 64bit boundary) */ + u64 paddrs[0]; +} __attribute__((packed)) ; + +struct c2wr_nsmr_pbl_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_nsmr_pbl { + struct c2wr_nsmr_pbl_req req; + struct c2wr_nsmr_pbl_rep rep; +} __attribute__((packed)) ; + +struct c2wr_mr_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 stag_index; +} __attribute__((packed)) ; + +struct c2wr_mr_query_rep { + struct c2wr_hdr hdr; + u8 stag_key; + u8 pad[3]; + u32 pd_id; + u32 flags; + u32 pbl_depth; +} __attribute__((packed)) ; + +union c2wr_mr_query { + struct c2wr_mr_query_req req; + struct c2wr_mr_query_rep rep; +} __attribute__((packed)) ; + +struct c2wr_mw_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 
stag_index; +} __attribute__((packed)) ; + +struct c2wr_mw_query_rep { + struct c2wr_hdr hdr; + u8 stag_key; + u8 pad[3]; + u32 pd_id; + u32 flags; +} __attribute__((packed)) ; + +union c2wr_mw_query { + struct c2wr_mw_query_req req; + struct c2wr_mw_query_rep rep; +} __attribute__((packed)) ; + + +struct c2wr_stag_dealloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 stag_index; +} __attribute__((packed)) ; + +struct c2wr_stag_dealloc_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)) ; + +union c2wr_stag_dealloc { + struct c2wr_stag_dealloc_req req; + struct c2wr_stag_dealloc_rep rep; +} __attribute__((packed)) ; + +struct c2wr_nsmr_reregister_req { + struct c2wr_hdr hdr; + u64 va; + u32 rnic_handle; + u16 flags; + u8 stag_key; + u8 pad; + u32 stag_index; + u32 pd_id; + u32 pbl_depth; + u32 pbe_size; + u32 fbo; + u32 length; + u32 addrs_length; + u32 pad1; + /* array of paddrs (must be aligned on a 64bit boundary) */ + u64 paddrs[0]; +} __attribute__((packed)) ; + +struct c2wr_nsmr_reregister_rep { + struct c2wr_hdr hdr; + u32 pbl_depth; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_nsmr_reregister { + struct c2wr_nsmr_reregister_req req; + struct c2wr_nsmr_reregister_rep rep; +} __attribute__((packed)) ; + +struct c2wr_smr_register_req { + struct c2wr_hdr hdr; + u64 va; + u32 rnic_handle; + u16 flags; + u8 stag_key; + u8 pad; + u32 stag_index; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_smr_register_rep { + struct c2wr_hdr hdr; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_smr_register { + struct c2wr_smr_register_req req; + struct c2wr_smr_register_rep rep; +} __attribute__((packed)) ; + +struct c2wr_mw_alloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 pd_id; +} __attribute__((packed)) ; + +struct c2wr_mw_alloc_rep { + struct c2wr_hdr hdr; + u32 stag_index; +} __attribute__((packed)) ; + +union c2wr_mw_alloc { + struct c2wr_mw_alloc_req req; + struct c2wr_mw_alloc_rep rep; +} 
__attribute__((packed)) ; + +/* + *------------------------ WRs ----------------------- + */ + +struct c2wr_user_hdr { + struct c2wr_hdr hdr; /* Has status and WR Type */ +} __attribute__((packed)) ; + +enum c2_qp_state { + C2_QP_STATE_IDLE = 0x01, + C2_QP_STATE_CONNECTING = 0x02, + C2_QP_STATE_RTS = 0x04, + C2_QP_STATE_CLOSING = 0x08, + C2_QP_STATE_TERMINATE = 0x10, + C2_QP_STATE_ERROR = 0x20, +}; + +/* Completion queue entry. */ +struct c2wr_ce { + struct c2wr_hdr hdr; /* Has status and WR Type */ + u64 qp_user_context; /* c2_user_qp_t * */ + u32 qp_state; /* Current QP State */ + u32 handle; /* QPID or EP Handle */ + u32 bytes_rcvd; /* valid for RECV WCs */ + u32 stag; +} __attribute__((packed)) ; + + +/* + * Flags used for all post-sq WRs. These must fit in the flags + * field of the struct c2wr_hdr (eight bits). + */ +enum { + SQ_SIGNALED = 0x01, + SQ_READ_FENCE = 0x02, + SQ_FENCE = 0x04, +}; + +/* + * Common fields for all post-sq WRs. Namely the standard header and a + * secondary header with fields common to all post-sq WRs. + */ +struct c2_sq_hdr { + struct c2wr_user_hdr user_hdr; +} __attribute__((packed)); + +/* + * Same as above but for post-rq WRs. + */ +struct c2_rq_hdr { + struct c2wr_user_hdr user_hdr; +} __attribute__((packed)); + +/* + * use the same struct for all sends. 
+ */ +struct c2wr_send_req { + struct c2_sq_hdr sq_hdr; + u32 sge_len; + u32 remote_stag; + u8 data[0]; /* SGE array */ +} __attribute__((packed)); +/* XXX c2wr_send_req_t, c2wr_send_se_req_t, c2wr_send_inv_req_t, + c2wr_send_se_inv_req_t;*/ + +union c2wr_send { + struct c2wr_send_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_rdma_write_req { + struct c2_sq_hdr sq_hdr; + u64 remote_to; + u32 remote_stag; + u32 sge_len; + u8 data[0]; /* SGE array */ +} __attribute__((packed)); + +union c2wr_rdma_write { + struct c2wr_rdma_write_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_rdma_read_req { + struct c2_sq_hdr sq_hdr; + u64 local_to; + u64 remote_to; + u32 local_stag; + u32 remote_stag; + u32 length; +} __attribute__((packed)); + +union c2wr_rdma_read { + struct c2wr_rdma_read_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_mw_bind_req { + struct c2_sq_hdr sq_hdr; + u64 va; + u8 stag_key; + u8 pad[3]; + u32 mw_stag_index; + u32 mr_stag_index; + u32 length; + u32 flags; +} __attribute__((packed)); + +union c2wr_mw_bind { + struct c2wr_mw_bind_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_nsmr_fastreg_req { + struct c2_sq_hdr sq_hdr; + u64 va; + u8 stag_key; + u8 pad[3]; + u32 stag_index; + u32 pbe_size; + u32 fbo; + u32 length; + u32 addrs_length; + /* array of paddrs (must be aligned on a 64bit boundary) */ + u64 paddrs[0]; +} __attribute__((packed)); + +union c2wr_nsmr_fastreg { + struct c2wr_nsmr_fastreg_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_stag_invalidate_req { + struct c2_sq_hdr sq_hdr; + u8 stag_key; + u8 pad[3]; + u32 stag_index; +} __attribute__((packed)); + +union c2wr_stag_invalidate { + struct c2wr_stag_invalidate_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +union c2wr_sqwr { + struct c2_sq_hdr sq_hdr; + struct c2wr_send_req send; + struct c2wr_send_req send_se; + struct c2wr_send_req send_inv; + 
struct c2wr_send_req send_se_inv; + struct c2wr_rdma_write_req rdma_write; + struct c2wr_rdma_read_req rdma_read; + struct c2wr_mw_bind_req mw_bind; + struct c2wr_nsmr_fastreg_req nsmr_fastreg; + struct c2wr_stag_invalidate_req stag_inv; +} __attribute__((packed)); + + +/* + * RQ WRs + */ +struct c2wr_rqwr { + struct c2_rq_hdr rq_hdr; + u8 data[0]; /* array of SGEs */ +} __attribute__((packed)); +/* XXX c2wr_rqwr_t, c2wr_recv_req_t; */ + +union c2wr_recv { + struct c2wr_rqwr req; + struct c2wr_ce rep; +} __attribute__((packed)); + +/* + * All AEs start with this header. Most AEs only need to convey the + * information in the header. Some, like LLP connection events, need + * more info. The union typedef c2wr_ae_t has all the possible AEs. + * + * hdr.context is the user_context from the rnic_open WR. NULL if this + * is not affiliated with an rnic. + * + * hdr.id is the AE identifier (e.g. CCAE_REMOTE_SHUTDOWN, + * CCAE_LLP_CLOSE_COMPLETE) + * + * resource_type is one of: C2_RES_IND_QP, C2_RES_IND_CQ, C2_RES_IND_SRQ + * + * user_context is the context passed down when the host created the resource. + */ +struct c2wr_ae_hdr { + struct c2wr_hdr hdr; + u64 user_context; /* user context for this res. */ + u32 resource_type; /* see enum c2_resource_indicator */ + u32 resource; /* handle for resource */ + u32 qp_state; /* current QP State */ +} __attribute__((packed)); + +/* + * After submitting the CCAE_ACTIVE_CONNECT_RESULTS message on the AEQ, + * the adapter moves the QP into RTS state + */ +struct c2wr_ae_active_connect_results { + struct c2wr_ae_hdr ae_hdr; + u32 laddr; + u32 raddr; + u16 lport; + u16 rport; + u32 private_data_length; + u8 private_data[0]; /* data is in-line in the msg. */ +} __attribute__((packed)); + +/* + * When connections are established by the stack (and the private data + * MPA frame is received), the adapter will generate an event to the host.
+ * The details of the connection, any private data, and the new connection + * request handle is passed up via the CCAE_CONNECTION_REQUEST msg on the + * AE queue: + */ +struct c2wr_ae_connection_request { + struct c2wr_ae_hdr ae_hdr; + u32 cr_handle; /* connreq handle (sock ptr) */ + u32 laddr; + u32 raddr; + u16 lport; + u16 rport; + u32 private_data_length; + u8 private_data[0]; /* data is in-line in the msg. */ +} __attribute__((packed)); + +union c2wr_ae { + struct c2wr_ae_hdr ae_generic; + struct c2wr_ae_active_connect_results ae_active_connect_results; + struct c2wr_ae_connection_request ae_connection_request; +} __attribute__((packed)); + +struct c2wr_init_req { + struct c2wr_hdr hdr; + u64 hint_count; + u64 q0_host_shared; + u64 q1_host_shared; + u64 q1_host_msg_pool; + u64 q2_host_shared; + u64 q2_host_msg_pool; +} __attribute__((packed)); + +struct c2wr_init_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_init { + struct c2wr_init_req req; + struct c2wr_init_rep rep; +} __attribute__((packed)); + +/* + * For upgrading flash. 
+ */ + +struct c2wr_flash_init_req { + struct c2wr_hdr hdr; + u32 rnic_handle; +} __attribute__((packed)); + +struct c2wr_flash_init_rep { + struct c2wr_hdr hdr; + u32 adapter_flash_buf_offset; + u32 adapter_flash_len; +} __attribute__((packed)); + +union c2wr_flash_init { + struct c2wr_flash_init_req req; + struct c2wr_flash_init_rep rep; +} __attribute__((packed)); + +struct c2wr_flash_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 len; +} __attribute__((packed)); + +struct c2wr_flash_rep { + struct c2wr_hdr hdr; + u32 status; +} __attribute__((packed)); + +union c2wr_flash { + struct c2wr_flash_req req; + struct c2wr_flash_rep rep; +} __attribute__((packed)); + +struct c2wr_buf_alloc_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 size; +} __attribute__((packed)); + +struct c2wr_buf_alloc_rep { + struct c2wr_hdr hdr; + u32 offset; /* 0 if mem not available */ + u32 size; /* 0 if mem not available */ +} __attribute__((packed)); + +union c2wr_buf_alloc { + struct c2wr_buf_alloc_req req; + struct c2wr_buf_alloc_rep rep; +} __attribute__((packed)); + +struct c2wr_buf_free_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 offset; /* Must match value from alloc */ + u32 size; /* Must match value from alloc */ +} __attribute__((packed)); + +struct c2wr_buf_free_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_buf_free { + struct c2wr_buf_free_req req; + struct c2wr_ce rep; +} __attribute__((packed)); + +struct c2wr_flash_write_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 offset; + u32 size; + u32 type; + u32 flags; +} __attribute__((packed)); + +struct c2wr_flash_write_rep { + struct c2wr_hdr hdr; + u32 status; +} __attribute__((packed)); + +union c2wr_flash_write { + struct c2wr_flash_write_req req; + struct c2wr_flash_write_rep rep; +} __attribute__((packed)); + +/* + * Messages for LLP connection setup. + */ + +/* + * Listen Request. This allocates a listening endpoint to allow passive + * connection setup. 
Newly established LLP connections are passed up + * via an AE. See c2wr_ae_connection_request_t + */ +struct c2wr_ep_listen_create_req { + struct c2wr_hdr hdr; + u64 user_context; /* returned in AEs. */ + u32 rnic_handle; + u32 local_addr; /* local addr, or 0 */ + u16 local_port; /* 0 means "pick one" */ + u16 pad; + u32 backlog; /* traditional tcp listen backlog */ +} __attribute__((packed)); + +struct c2wr_ep_listen_create_rep { + struct c2wr_hdr hdr; + u32 ep_handle; /* handle to new listening ep */ + u16 local_port; /* resulting port... */ + u16 pad; +} __attribute__((packed)); + +union c2wr_ep_listen_create { + struct c2wr_ep_listen_create_req req; + struct c2wr_ep_listen_create_rep rep; +} __attribute__((packed)); + +struct c2wr_ep_listen_destroy_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 ep_handle; +} __attribute__((packed)); + +struct c2wr_ep_listen_destroy_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_ep_listen_destroy { + struct c2wr_ep_listen_destroy_req req; + struct c2wr_ep_listen_destroy_rep rep; +} __attribute__((packed)); + +struct c2wr_ep_query_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 ep_handle; +} __attribute__((packed)); + +struct c2wr_ep_query_rep { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 local_addr; + u32 remote_addr; + u16 local_port; + u16 remote_port; +} __attribute__((packed)); + +union c2wr_ep_query { + struct c2wr_ep_query_req req; + struct c2wr_ep_query_rep rep; +} __attribute__((packed)); + + +/* + * The host passes this down to indicate acceptance of a pending iWARP + * connection. The cr_handle was obtained from the CONNECTION_REQUEST + * AE passed up by the adapter. See c2wr_ae_connection_request_t. + */ +struct c2wr_cr_accept_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 qp_handle; /* QP to bind to this LLP conn */ + u32 ep_handle; /* LLP handle to accept */ + u32 private_data_length; + u8 private_data[0]; /* data in-line in msg.
*/ +} __attribute__((packed)); + +/* + * adapter sends reply when private data is successfully submitted to + * the LLP. + */ +struct c2wr_cr_accept_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_cr_accept { + struct c2wr_cr_accept_req req; + struct c2wr_cr_accept_rep rep; +} __attribute__((packed)); + +/* + * The host sends this down if a given iWARP connection request was + * rejected by the consumer. The cr_handle was obtained from a + * previous c2wr_ae_connection_request_t AE sent by the adapter. + */ +struct c2wr_cr_reject_req { + struct c2wr_hdr hdr; + u32 rnic_handle; + u32 ep_handle; /* LLP handle to reject */ +} __attribute__((packed)); + +/* + * Dunno if this is needed, but we'll add it for now. The adapter will + * send the reject_reply after the LLP endpoint has been destroyed. + */ +struct c2wr_cr_reject_rep { + struct c2wr_hdr hdr; +} __attribute__((packed)); + +union c2wr_cr_reject { + struct c2wr_cr_reject_req req; + struct c2wr_cr_reject_rep rep; +} __attribute__((packed)); + +/* + * console command. Used to implement a debug console over the verbs + * request and reply queues. + */ + +/* + * Console request message. It contains: + * - message hdr with id = CCWR_CONSOLE + * - the physaddr/len of host memory to be used for the reply. + * - the command string. eg: "netstat -s" or "zoneinfo" + */ +struct c2wr_console_req { + struct c2wr_hdr hdr; /* id = CCWR_CONSOLE */ + u64 reply_buf; /* pinned host buf for reply */ + u32 reply_buf_len; /* length of reply buffer */ + u8 command[0]; /* NUL terminated ascii string */ + /* containing the command req */ +} __attribute__((packed)); + +/* + * flags used in the console reply. + */ +enum c2_console_flags { + CONS_REPLY_TRUNCATED = 0x00000001 /* reply was truncated */ +} __attribute__((packed)); + +/* + * Console reply message. + * hdr.result contains the c2_status_t error if the reply was _not_ generated, + * or C2_OK if the reply was generated. 
+ */ +struct c2wr_console_rep { + struct c2wr_hdr hdr; /* id = CCWR_CONSOLE */ + u32 flags; +} __attribute__((packed)); + +union c2wr_console { + struct c2wr_console_req req; + struct c2wr_console_rep rep; +} __attribute__((packed)); + + +/* + * Giant union with all WRs. Makes life easier... + */ +union c2wr { + struct c2wr_hdr hdr; + struct c2wr_user_hdr user_hdr; + union c2wr_rnic_open rnic_open; + union c2wr_rnic_query rnic_query; + union c2wr_rnic_getconfig rnic_getconfig; + union c2wr_rnic_setconfig rnic_setconfig; + union c2wr_rnic_close rnic_close; + union c2wr_cq_create cq_create; + union c2wr_cq_modify cq_modify; + union c2wr_cq_destroy cq_destroy; + union c2wr_pd_alloc pd_alloc; + union c2wr_pd_dealloc pd_dealloc; + union c2wr_srq_create srq_create; + union c2wr_srq_destroy srq_destroy; + union c2wr_qp_create qp_create; + union c2wr_qp_query qp_query; + union c2wr_qp_modify qp_modify; + union c2wr_qp_destroy qp_destroy; + struct c2wr_qp_connect qp_connect; + union c2wr_nsmr_stag_alloc nsmr_stag_alloc; + union c2wr_nsmr_register nsmr_register; + union c2wr_nsmr_pbl nsmr_pbl; + union c2wr_mr_query mr_query; + union c2wr_mw_query mw_query; + union c2wr_stag_dealloc stag_dealloc; + union c2wr_sqwr sqwr; + struct c2wr_rqwr rqwr; + struct c2wr_ce ce; + union c2wr_ae ae; + union c2wr_init init; + union c2wr_ep_listen_create ep_listen_create; + union c2wr_ep_listen_destroy ep_listen_destroy; + union c2wr_cr_accept cr_accept; + union c2wr_cr_reject cr_reject; + union c2wr_console console; + union c2wr_flash_init flash_init; + union c2wr_flash flash; + union c2wr_buf_alloc buf_alloc; + union c2wr_buf_free buf_free; + union c2wr_flash_write flash_write; +} __attribute__((packed)); + + +/* + * Accessors for the wr fields that are packed together tightly to + * reduce the wr message size. The wr arguments are void* so that + * either a struct c2wr*, a struct c2wr_hdr*, or a pointer to any of the types + * in the struct c2wr union can be passed in. 
+ */ +static __inline__ u8 c2_wr_get_id(void *wr) +{ + return ((struct c2wr_hdr *) wr)->id; +} +static __inline__ void c2_wr_set_id(void *wr, u8 id) +{ + ((struct c2wr_hdr *) wr)->id = id; +} +static __inline__ u8 c2_wr_get_result(void *wr) +{ + return ((struct c2wr_hdr *) wr)->result; +} +static __inline__ void c2_wr_set_result(void *wr, u8 result) +{ + ((struct c2wr_hdr *) wr)->result = result; +} +static __inline__ u8 c2_wr_get_flags(void *wr) +{ + return ((struct c2wr_hdr *) wr)->flags; +} +static __inline__ void c2_wr_set_flags(void *wr, u8 flags) +{ + ((struct c2wr_hdr *) wr)->flags = flags; +} +static __inline__ u8 c2_wr_get_sge_count(void *wr) +{ + return ((struct c2wr_hdr *) wr)->sge_count; +} +static __inline__ void c2_wr_set_sge_count(void *wr, u8 sge_count) +{ + ((struct c2wr_hdr *) wr)->sge_count = sge_count; +} +static __inline__ u32 c2_wr_get_wqe_count(void *wr) +{ + return ((struct c2wr_hdr *) wr)->wqe_count; +} +static __inline__ void c2_wr_set_wqe_count(void *wr, u32 wqe_count) +{ + ((struct c2wr_hdr *) wr)->wqe_count = wqe_count; +} + +#endif /* _C2_WR_H_ */ From mshefty at ichips.intel.com Wed May 31 14:15:24 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 31 May 2006 14:15:24 -0700 Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30243EF3D@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30243EF3D@mtlexch01.mtl.com> Message-ID: <447E076C.80707@ichips.intel.com> Eitan Zahavi wrote: > [EZ] The race is happening when the SM received the request and > responded but the other SMs or the file system did not fully stored that > registration and the SM crashed. If the client received a response that the join was successful, then I consider that an SM issue. The problem is that the SM lost *its* state information. Requiring end nodes to maintain the SM's state for it still doesn't make sense to me. 
You're converting an SM issue into a requirement that all end nodes must support for proper operation. Why can't the local system store the same data in another process? (E.g. record all join MADs that have been processed by the SM.) Why can't that data be saved to disk? Why can't some other arbitrary system in the fabric save that data? I still believe that there are a lot of potential solutions to this problem other than requiring end nodes to maintain the SM's state. - Sean From mshefty at ichips.intel.com Wed May 31 14:26:42 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 31 May 2006 14:26:42 -0700 Subject: [openib-general] Re: [PATCH 2/2] iWARP Core Changes. In-Reply-To: <20060531182654.3308.41372.stgit@stevo-desktop> References: <20060531182650.3308.81538.stgit@stevo-desktop> <20060531182654.3308.41372.stgit@stevo-desktop> Message-ID: <447E0A12.5090209@ichips.intel.com> Mainly nits... Steve Wise wrote: > -static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, > +int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, > unsigned char *dst_dev_addr) Might want to rename this to something like rdma_copy_addr if you're going to export it.
> +static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event) > +{ > + struct rdma_id_private *id_priv = iw_id->context; > + enum rdma_cm_event_type event = 0; > + struct sockaddr_in *sin; > + int ret = 0; > + > + atomic_inc(&id_priv->dev_remove); > + > + switch (iw_event->event) { > + case IW_CM_EVENT_CLOSE: > + event = RDMA_CM_EVENT_DISCONNECTED; > + break; > + case IW_CM_EVENT_CONNECT_REPLY: > + sin = (struct sockaddr_in*)&id_priv->id.route.addr.src_addr; > + *sin = iw_event->local_addr; > + sin = (struct sockaddr_in*)&id_priv->id.route.addr.dst_addr; spacing nit - (struct sockaddr_in *) &id_priv->... > +struct net_device *ip_dev_find(u32 ip); Just include header file with definition. > + sin = (struct sockaddr_in*)&new_cm_id->route.addr.src_addr; > + *sin = iw_event->local_addr; > + sin = (struct sockaddr_in*)&new_cm_id->route.addr.dst_addr; same spacing nit... appears in a couple other places as well. > +static inline union ib_gid* iw_addr_get_sgid(struct rdma_dev_addr* rda) > +{ > + return (union ib_gid*)rda->src_dev_addr; > +} > + > +static inline union ib_gid* iw_addr_get_dgid(struct rdma_dev_addr* rda) > +{ > + return (union ib_gid*)rda->dst_dev_addr; > +} spacing nits > +struct iw_cm_verbs; > struct ib_device { > struct device *dma_device; > > @@ -846,6 +873,8 @@ struct ib_device { > > u32 flags; > > + struct iw_cm_verbs* iwcm; > + '*' placement nit - Sean From robert.j.woodruff at intel.com Wed May 31 14:45:55 2006 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 31 May 2006 14:45:55 -0700 Subject: [openib-general] OFED RC6 Tag Message-ID: <1AC79F16F5C5284499BB9591B33D6F0007D8CE2B@orsmsx408> Hi, I noticed that you now have a rc6 tag for the OFED kernel code. Is there a tag for the userspace code ? or what SVN rev will be used for RC6. 
woody From mshefty at ichips.intel.com Wed May 31 15:22:24 2006 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 31 May 2006 15:22:24 -0700 Subject: [openib-general] Re: [PATCH 1/2] iWARP Connection Manager. In-Reply-To: <20060531182652.3308.1244.stgit@stevo-desktop> References: <20060531182650.3308.81538.stgit@stevo-desktop> <20060531182652.3308.1244.stgit@stevo-desktop> Message-ID: <447E1720.7000307@ichips.intel.com> Steve Wise wrote: > +/* > + * Release a reference on cm_id. If the last reference is being removed > + * and iw_destroy_cm_id is waiting, wake up the waiting thread.
> + */
> +static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
> +{
> +	int ret = 0;
> +
> +	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
> +	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
> +		BUG_ON(!list_empty(&cm_id_priv->work_list));
> +		if (waitqueue_active(&cm_id_priv->destroy_wait)) {
> +			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
> +			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
> +					&cm_id_priv->flags));
> +			ret = 1;
> +			wake_up(&cm_id_priv->destroy_wait);

We recently changed the RDMA CM, IB CM, and a couple of other modules from
using wait objects to completions. This avoids a race condition between
decrementing the reference count, which allows destruction to proceed, and
calling wake_up on a freed cm_id. My guess is that you may need to do the
same.

Can you also explain the use of the return value here? It's ignored below
in rem_ref() and destroy_cm_id().

> +static void add_ref(struct iw_cm_id *cm_id)
> +{
> +	struct iwcm_id_private *cm_id_priv;
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	atomic_inc(&cm_id_priv->refcount);
> +}
> +
> +static void rem_ref(struct iw_cm_id *cm_id)
> +{
> +	struct iwcm_id_private *cm_id_priv;
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	iwcm_deref_id(cm_id_priv);
> +}
> +
> +/*
> + * CM_ID <-- CLOSING
> + *
> + * Block if a passive or active connection is currenlty being processed. Then
> + * process the event as follows:
> + * - If we are ESTABLISHED, move to CLOSING and modify the QP state
> + *   based on the abrupt flag
> + * - If the connection is already in the CLOSING or IDLE state, the peer is
> + *   disconnecting concurrently with us and we've already seen the
> + *   DISCONNECT event -- ignore the request and return 0
> + * - Disconnect on a listening endpoint returns -EINVAL
> + */
> +int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt)
> +{
> +	struct iwcm_id_private *cm_id_priv;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	/* Wait if we're currently in a connect or accept downcall */
> +	wait_event(cm_id_priv->connect_wait,
> +		   !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags));

Am I understanding this check correctly? You're checking to see if the user
has called iw_cm_disconnect() at the same time that they called
iw_cm_connect() or iw_cm_accept(). Are connect / accept blocking, or are you
just waiting for an event?
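
For readers following the question above, the flag handling in the quoted patch can be modelled in a few lines of plain C. This is only a single-threaded sketch under stated assumptions: `struct model_cm_id` and every `model_*` name are hypothetical, the kernel's `wait_event()` is reduced to a predicate that reports whether a caller would sleep, and all locking is left out. It mirrors what the quoted code appears to do: the connect downcall sets the CONNECT_WAIT bit and returns, and the bit is cleared either on the downcall's error path or later by the provider's reply event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are not part of the patch. */
enum model_state { MODEL_IDLE, MODEL_CONN_SENT, MODEL_ESTABLISHED };

struct model_cm_id {
	enum model_state state;
	bool connect_wait;	/* stands in for the IWCM_F_CONNECT_WAIT bit */
};

/*
 * Models the connect downcall. provider_ret is the value the provider's
 * connect() hook would return (0 on success). The call never sleeps.
 */
static int model_connect(struct model_cm_id *id, int provider_ret)
{
	id->connect_wait = true;
	if (id->state != MODEL_IDLE) {
		id->connect_wait = false;
		return -1;	/* stands in for -EINVAL */
	}
	id->state = MODEL_CONN_SENT;
	if (provider_ret) {
		/* Error path: no event will follow, so clear the flag here. */
		id->state = MODEL_IDLE;
		id->connect_wait = false;
	}
	return provider_ret;
}

/* Models the CONNECT_REPLY event handler, which clears the flag. */
static void model_connect_reply(struct model_cm_id *id, bool accepted)
{
	id->connect_wait = false;
	id->state = accepted ? MODEL_ESTABLISHED : MODEL_IDLE;
}

/* True when a disconnect or destroy caller would sleep in wait_event(). */
static bool model_disconnect_would_block(const struct model_cm_id *id)
{
	return id->connect_wait;
}
```

If this model is faithful to the patch, the downcall itself does not block; a concurrent disconnect or destroy sleeps only until the reply event (or the error path) clears the flag.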
> +
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);
> +	switch (cm_id_priv->state) {
> +	case IW_CM_STATE_ESTABLISHED:
> +		cm_id_priv->state = IW_CM_STATE_CLOSING;
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		if (cm_id_priv->qp) { /* QP could be for user-mode client */
> +			if (abrupt)
> +				ret = iwcm_modify_qp_err(cm_id_priv->qp);
> +			else
> +				ret = iwcm_modify_qp_sqd(cm_id_priv->qp);
> +			/*
> +			 * If both sides are disconnecting the QP could
> +			 * already be in ERR or SQD states
> +			 */
> +			ret = 0;
> +		}
> +		else
> +			ret = -EINVAL;
> +		break;
> +	case IW_CM_STATE_LISTEN:
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		ret = -EINVAL;
> +		break;
> +	case IW_CM_STATE_CLOSING:
> +		/* remote peer closed first */
> +	case IW_CM_STATE_IDLE:
> +		/* accept or connect returned !0 */
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		break;
> +	case IW_CM_STATE_CONN_RECV:
> +		/*
> +		 * App called disconnect before/without calling accept after
> +		 * connect_request event delivered.
> +		 */
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		break;
> +	case IW_CM_STATE_CONN_SENT:
> +		/* Can only get here if wait above fails */
> +	default:
> +		BUG_ON(1);
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(iw_cm_disconnect);

> +static void destroy_cm_id(struct iw_cm_id *cm_id)
> +{
> +	struct iwcm_id_private *cm_id_priv;
> +	unsigned long flags;
> +	int ret;
> +
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	/* Wait if we're currently in a connect or accept downcall. A
> +	 * listening endpoint should never block here. */
> +	wait_event(cm_id_priv->connect_wait,
> +		   !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags));

Same question/comment as above.
> +
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);
> +	switch (cm_id_priv->state) {
> +	case IW_CM_STATE_LISTEN:
> +		cm_id_priv->state = IW_CM_STATE_DESTROYING;
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		/* destroy the listening endpoint */
> +		ret = cm_id->device->iwcm->destroy_listen(cm_id);
> +		break;
> +	case IW_CM_STATE_ESTABLISHED:
> +		cm_id_priv->state = IW_CM_STATE_DESTROYING;
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		/* Abrupt close of the connection */
> +		(void)iwcm_modify_qp_err(cm_id_priv->qp);
> +		break;
> +	case IW_CM_STATE_IDLE:
> +	case IW_CM_STATE_CLOSING:
> +		cm_id_priv->state = IW_CM_STATE_DESTROYING;
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		break;
> +	case IW_CM_STATE_CONN_RECV:
> +		/*
> +		 * App called destroy before/without calling accept after
> +		 * receiving connection request event notification.
> +		 */
> +		cm_id_priv->state = IW_CM_STATE_DESTROYING;
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		break;
> +	case IW_CM_STATE_CONN_SENT:
> +	case IW_CM_STATE_DESTROYING:
> +	default:
> +		BUG_ON(1);
> +		break;
> +	}
> +
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);

As an alternative, you could hold the lock from above, and let the LISTEN /
ESTABLISHED state checks release and reacquire.

> +	if (cm_id_priv->qp) {
> +		cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
> +		cm_id_priv->qp = NULL;
> +	}
> +	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +
> +	(void)iwcm_deref_id(cm_id_priv);
> +}
> +
> +/*
> + * This function is only called by the application thread and cannot
> + * be called by the event thread. The function will wait for all
> + * references to be released on the cm_id and then kfree the cm_id
> + * object.
> + */
> +void iw_destroy_cm_id(struct iw_cm_id *cm_id)
> +{
> +	struct iwcm_id_private *cm_id_priv;
> +
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags));
> +
> +	destroy_cm_id(cm_id);
> +
> +	wait_event(cm_id_priv->destroy_wait,
> +		   !atomic_read(&cm_id_priv->refcount));
> +
> +	kfree(cm_id_priv);
> +}
> +EXPORT_SYMBOL(iw_destroy_cm_id);
> +
> +/*
> + * CM_ID <-- LISTEN
> + *
> + * Start listening for connect requests. Generates one CONNECT_REQUEST
> + * event for each inbound connect request.
> + */
> +int iw_cm_listen(struct iw_cm_id *cm_id, int backlog)
> +{
> +	struct iwcm_id_private *cm_id_priv;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);
> +	switch (cm_id_priv->state) {
> +	case IW_CM_STATE_IDLE:
> +		cm_id_priv->state = IW_CM_STATE_LISTEN;
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		ret = cm_id->device->iwcm->create_listen(cm_id, backlog);
> +		if (ret)
> +			cm_id_priv->state = IW_CM_STATE_IDLE;
> +		break;
> +	default:
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(iw_cm_listen);
> +
> +/*
> + * CM_ID <-- IDLE
> + *
> + * Rejects an inbound connection request. No events are generated.
> + */
> +int iw_cm_reject(struct iw_cm_id *cm_id,
> +		 const void *private_data,
> +		 u8 private_data_len)
> +{
> +	struct iwcm_id_private *cm_id_priv;
> +	unsigned long flags;
> +	int ret;
> +
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);
> +	if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) {
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +		wake_up_all(&cm_id_priv->connect_wait);
> +		return -EINVAL;
> +	}
> +	cm_id_priv->state = IW_CM_STATE_IDLE;
> +	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +
> +	ret = cm_id->device->iwcm->reject(cm_id, private_data,
> +					  private_data_len);
> +
> +	clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +	wake_up_all(&cm_id_priv->connect_wait);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(iw_cm_reject);
> +
> +/*
> + * CM_ID <-- ESTABLISHED
> + *
> + * Accepts an inbound connection request and generates an ESTABLISHED
> + * event. Callers of iw_cm_disconnect and iw_destroy_cm_id will block
> + * until the ESTABLISHED event is received from the provider.
> + */

This makes it sound like we're just waiting for an event.
> +int iw_cm_accept(struct iw_cm_id *cm_id,
> +		 struct iw_cm_conn_param *iw_param)
> +{
> +	struct iwcm_id_private *cm_id_priv;
> +	struct ib_qp *qp;
> +	unsigned long flags;
> +	int ret;
> +
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);
> +	if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) {
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +		wake_up_all(&cm_id_priv->connect_wait);
> +		return -EINVAL;
> +	}
> +	/* Get the ib_qp given the QPN */
> +	qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
> +	if (!qp) {
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		return -EINVAL;
> +	}
> +	cm_id->device->iwcm->add_ref(qp);
> +	cm_id_priv->qp = qp;
> +	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +
> +	ret = cm_id->device->iwcm->accept(cm_id, iw_param);
> +	if (ret) {
> +		/* An error on accept precludes provider events */
> +		BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
> +		cm_id_priv->state = IW_CM_STATE_IDLE;
> +		spin_lock_irqsave(&cm_id_priv->lock, flags);
> +		if (cm_id_priv->qp) {
> +			cm_id->device->iwcm->rem_ref(qp);
> +			cm_id_priv->qp = NULL;
> +		}
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		printk("Accept failed, ret=%d\n", ret);
> +		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +		wake_up_all(&cm_id_priv->connect_wait);
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(iw_cm_accept);
> +
> +/*
> + * Active Side: CM_ID <-- CONN_SENT
> + *
> + * If successful, results in the generation of a CONNECT_REPLY
> + * event. iw_cm_disconnect and iw_cm_destroy will block until the
> + * CONNECT_REPLY event is received from the provider.
> + */
> +int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param)
> +{
> +	struct iwcm_id_private *cm_id_priv;
> +	int ret = 0;
> +	unsigned long flags;
> +	struct ib_qp *qp;
> +
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);
> +	if (cm_id_priv->state != IW_CM_STATE_IDLE) {
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +		wake_up_all(&cm_id_priv->connect_wait);
> +		return -EINVAL;
> +	}
> +
> +	/* Get the ib_qp given the QPN */
> +	qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
> +	if (!qp) {
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		return -EINVAL;
> +	}
> +	cm_id->device->iwcm->add_ref(qp);
> +	cm_id_priv->qp = qp;
> +	cm_id_priv->state = IW_CM_STATE_CONN_SENT;
> +	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +
> +	ret = cm_id->device->iwcm->connect(cm_id, iw_param);
> +	if (ret) {
> +		spin_lock_irqsave(&cm_id_priv->lock, flags);
> +		if (cm_id_priv->qp) {
> +			cm_id->device->iwcm->rem_ref(qp);
> +			cm_id_priv->qp = NULL;
> +		}
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT);
> +		cm_id_priv->state = IW_CM_STATE_IDLE;
> +		printk("Connect failed, ret=%d\n", ret);
> +		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +		wake_up_all(&cm_id_priv->connect_wait);
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(iw_cm_connect);
> +
> +/*
> + * Passive Side: new CM_ID <-- CONN_RECV
> + *
> + * Handles an inbound connect request. The function creates a new
> + * iw_cm_id to represent the new connection and inherits the client
> + * callback function and other attributes from the listening parent.
> + *
> + * The work item contains a pointer to the listen_cm_id and the event. The
> + * listen_cm_id contains the client cm_handler, context and
> + * device. These are copied when the device is cloned. The event
> + * contains the new four tuple.
> + *
> + * An error on the child should not affect the parent, so this
> + * function does not return a value.
> + */
> +static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
> +				struct iw_cm_event *iw_event)
> +{
> +	unsigned long flags;
> +	struct iw_cm_id *cm_id;
> +	struct iwcm_id_private *cm_id_priv;
> +	int ret;
> +
> +	/* The provider should never generate a connection request
> +	 * event with a bad status.
> +	 */
> +	BUG_ON(iw_event->status);
> +
> +	/* We could be destroying the listening id. If so, ignore this
> +	 * upcall. */
> +	spin_lock_irqsave(&listen_id_priv->lock, flags);
> +	if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
> +		spin_unlock_irqrestore(&listen_id_priv->lock, flags);
> +		return;
> +	}
> +	spin_unlock_irqrestore(&listen_id_priv->lock, flags);
> +
> +	cm_id = iw_create_cm_id(listen_id_priv->id.device,
> +				listen_id_priv->id.cm_handler,
> +				listen_id_priv->id.context);
> +	/* If the cm_id could not be created, ignore the request */
> +	if (IS_ERR(cm_id))
> +		return;
> +
> +	cm_id->provider_data = iw_event->provider_data;
> +	cm_id->local_addr = iw_event->local_addr;
> +	cm_id->remote_addr = iw_event->remote_addr;
> +
> +	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
> +	cm_id_priv->state = IW_CM_STATE_CONN_RECV;
> +
> +	/* Call the client CM handler */
> +	ret = cm_id->cm_handler(cm_id, iw_event);
> +	if (ret) {
> +		printk("destroying child id %p, ret=%d\n",
> +		       cm_id, ret);

We probably don't always want to print a message here.

> +		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
> +		destroy_cm_id(cm_id);
> +		if (atomic_read(&cm_id_priv->refcount)==0)
> +			kfree(cm_id);
> +	}
> +}
> +
> +/*
> + * Passive Side: CM_ID <-- ESTABLISHED
> + *
> + * The provider generated an ESTABLISHED event which means that
> + * the MPA negotion has completed successfully and we are now in MPA
> + * FPDU mode.
> + *
> + * This event can only be received in the CONN_RECV state. If the
> + * remote peer closed, the ESTABLISHED event would be received followed
> + * by the CLOSE event. If the app closes, it will block until we wake
> + * it up after processing this event.
> + */
> +static int cm_conn_est_handler(struct iwcm_id_private *cm_id_priv,
> +			       struct iw_cm_event *iw_event)
> +{
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);
> +
> +	/* We clear the CONNECT_WAIT bit here to allow the callback
> +	 * function to call iw_cm_disconnect. Calling iw_destroy_cm_id
> +	 * from a callback handler is not allowed */
> +	clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +	switch (cm_id_priv->state) {
> +	case IW_CM_STATE_CONN_RECV:
> +		cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
> +		break;
> +	default:
> +		BUG_ON(1);

Can just BUG_ON the state and avoid the switch. Same comment applies below.

> +	}
> +	wake_up_all(&cm_id_priv->connect_wait);
> +
> +	return ret;
> +}
> +
> +/*
> + * Active Side: CM_ID <-- ESTABLISHED
> + *
> + * The app has called connect and is waiting for the established event to
> + * post it's requests to the server. This event will wake up anyone
> + * blocked in iw_cm_disconnect or iw_destroy_id.
> + */
> +static int cm_conn_rep_handler(struct iwcm_id_private *cm_id_priv,
> +			       struct iw_cm_event *iw_event)
> +{
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);
> +	/* Clear the connect wait bit so a callback function calling
> +	 * iw_cm_disconnect will not wait and deadlock this thread */
> +	clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
> +	switch (cm_id_priv->state) {
> +	case IW_CM_STATE_CONN_SENT:
> +		if (iw_event->status == IW_CM_EVENT_STATUS_ACCEPTED) {
> +			cm_id_priv->id.local_addr = iw_event->local_addr;
> +			cm_id_priv->id.remote_addr = iw_event->remote_addr;
> +			cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
> +		} else {
> +			/* REJECTED or RESET */
> +			cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
> +			cm_id_priv->qp = NULL;
> +			cm_id_priv->state = IW_CM_STATE_IDLE;
> +		}
> +		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +		ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
> +		break;
> +	default:
> +		BUG_ON(1);
> +	}
> +	/* Wake up waiters on connect complete */
> +	wake_up_all(&cm_id_priv->connect_wait);
> +
> +	return ret;
> +}
> +
> +/*
> + * CM_ID <-- CLOSING
> + *
> + * If in the ESTABLISHED state, move to CLOSING.
> + */
> +static void cm_disconnect_handler(struct iwcm_id_private *cm_id_priv,
> +				  struct iw_cm_event *iw_event)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cm_id_priv->lock, flags);
> +	if (cm_id_priv->state == IW_CM_STATE_ESTABLISHED)
> +		cm_id_priv->state = IW_CM_STATE_CLOSING;
> +	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
> +}
> +
> +/*
> + * CM_ID <-- IDLE
> + *
> + * If in the ESTBLISHED or CLOSING states, the QP will have have been
> + * moved by the provider to the ERR state. Disassociate the CM_ID from
> + * the QP, move to IDLE, and remove the 'connected' reference.
> + *
> + * If in some other state, the cm_id was destroyed asynchronously.
> + * This is the last reference that will result in waking up
> + * the app thread blocked in iw_destroy_cm_id.
> + */
> +static int cm_close_handler(struct iwcm_id_private *cm_id_priv,
> +			    struct iw_cm_event *iw_event)
> +{
> +	unsigned long flags;
> +	int ret = 0;
> +	/* TT */printk("%s:%d cm_id_priv=%p, state=%d\n",
> +		       __FUNCTION__, __LINE__,
> +		       cm_id_priv,cm_id_priv->state);

Will want to remove this.

- Sean

From rolandd at cisco.com Wed May 31 15:32:08 2006
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 31 May 2006 15:32:08 -0700
Subject: [openib-general] [PATCH 1/5] IB: Add client reregister event type
In-Reply-To: <20060531223205.10506.51241.stgit@localhost.localdomain>
References: <20060531223205.10506.51241.stgit@localhost.localdomain>
Message-ID: <20060531223208.10506.53856.stgit@localhost.localdomain>

From: Leonid Arsh

Add IB_EVENT_CLIENT_REREGISTER to enum so low-level drivers can
generate "client reregister" events.

Signed-off-by: Leonid Arsh
Signed-off-by: Roland Dreier
---

 include/rdma/ib_verbs.h | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index aeb4fcd..10a6268 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -260,7 +260,8 @@ enum ib_event_type {
 	IB_EVENT_SM_CHANGE,
 	IB_EVENT_SRQ_ERR,
 	IB_EVENT_SRQ_LIMIT_REACHED,
-	IB_EVENT_QP_LAST_WQE_REACHED
+	IB_EVENT_QP_LAST_WQE_REACHED,
+	IB_EVENT_CLIENT_REREGISTER
 };
 
 struct ib_event {

From rolandd at cisco.com Wed May 31 15:32:10 2006
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 31 May 2006 15:32:10 -0700
Subject: [openib-general] [PATCH 2/5] IPoIB: Handle client reregister events
In-Reply-To: <20060531223205.10506.51241.stgit@localhost.localdomain>
References: <20060531223205.10506.51241.stgit@localhost.localdomain>
Message-ID: <20060531223210.10506.69085.stgit@localhost.localdomain>

From: Leonid Arsh

Handle client reregister events by treating them just like LID or SM
changes -- flush all cached
paths and rejoin multicast groups.

Signed-off-by: Leonid Arsh
Signed-off-by: Roland Dreier
---

 drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 1d49d16..7b717c6 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -255,7 +255,8 @@ void ipoib_event(struct ib_event_handler
 	    record->event == IB_EVENT_PKEY_CHANGE ||
 	    record->event == IB_EVENT_PORT_ACTIVE ||
 	    record->event == IB_EVENT_LID_CHANGE ||
-	    record->event == IB_EVENT_SM_CHANGE) {
+	    record->event == IB_EVENT_SM_CHANGE ||
+	    record->event == IB_EVENT_CLIENT_REREGISTER) {
 		ipoib_dbg(priv, "Port state change event\n");
 		queue_work(ipoib_workqueue, &priv->flush_task);
 	}

From rolandd at cisco.com Wed May 31 15:32:13 2006
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 31 May 2006 15:32:13 -0700
Subject: [openib-general] [PATCH 3/5] IB: Move struct port_info from ipath to <rdma/ib_smi.h>
In-Reply-To: <20060531223205.10506.51241.stgit@localhost.localdomain>
References: <20060531223205.10506.51241.stgit@localhost.localdomain>
Message-ID: <20060531223212.10506.84411.stgit@localhost.localdomain>

From: Leonid Arsh

Move ipath's struct port_info into <rdma/ib_smi.h>, so that it can be
used by mthca to implement client reregister support. Remove the
__attribute__((packed)) because all the members of the struct are
naturally aligned anyway.
Signed-off-by: Leonid Arsh
Signed-off-by: Roland Dreier
---

 drivers/infiniband/hw/ipath/ipath_mad.c | 40 ++-----------------------------
 include/rdma/ib_smi.h                   | 36 ++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c
index f7f8391..49acd1e 100644
--- a/drivers/infiniband/hw/ipath/ipath_mad.c
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
@@ -137,47 +137,11 @@ static int recv_subn_get_guidinfo(struct
 	return reply(smp);
 }
 
-struct port_info {
-	__be64 mkey;
-	__be64 gid_prefix;
-	__be16 lid;
-	__be16 sm_lid;
-	__be32 cap_mask;
-	__be16 diag_code;
-	__be16 mkey_lease_period;
-	u8 local_port_num;
-	u8 link_width_enabled;
-	u8 link_width_supported;
-	u8 link_width_active;
-	u8 linkspeed_portstate;		/* 4 bits, 4 bits */
-	u8 portphysstate_linkdown;	/* 4 bits, 4 bits */
-	u8 mkeyprot_resv_lmc;		/* 2 bits, 3, 3 */
-	u8 linkspeedactive_enabled;	/* 4 bits, 4 bits */
-	u8 neighbormtu_mastersmsl;	/* 4 bits, 4 bits */
-	u8 vlcap_inittype;		/* 4 bits, 4 bits */
-	u8 vl_high_limit;
-	u8 vl_arb_high_cap;
-	u8 vl_arb_low_cap;
-	u8 inittypereply_mtucap;	/* 4 bits, 4 bits */
-	u8 vlstallcnt_hoqlife;		/* 3 bits, 5 bits */
-	u8 operationalvl_pei_peo_fpi_fpo;	/* 4 bits, 1, 1, 1, 1 */
-	__be16 mkey_violations;
-	__be16 pkey_violations;
-	__be16 qkey_violations;
-	u8 guid_cap;
-	u8 clientrereg_resv_subnetto;	/* 1 bit, 2 bits, 5 */
-	u8 resv_resptimevalue;		/* 3 bits, 5 bits */
-	u8 localphyerrors_overrunerrors; /* 4 bits, 4 bits */
-	__be16 max_credit_hint;
-	u8 resv;
-	u8 link_roundtrip_latency[3];
-} __attribute__ ((packed));
-
 static int recv_subn_get_portinfo(struct ib_smp *smp,
 				  struct ib_device *ibdev, u8 port)
 {
 	struct ipath_ibdev *dev;
-	struct port_info *pip = (struct port_info *)smp->data;
+	struct ib_port_info *pip = (struct ib_port_info *)smp->data;
 	u16 lid;
 	u8 ibcstat;
 	u8 mtu;
@@ -312,7 +276,7 @@ static int recv_subn_set_guidinfo(struct
 static int recv_subn_set_portinfo(struct ib_smp *smp,
 				  struct ib_device *ibdev, u8 port)
 {
-	struct port_info *pip = (struct port_info *)smp->data;
+	struct ib_port_info *pip = (struct ib_port_info *)smp->data;
 	struct ib_event event;
 	struct ipath_ibdev *dev;
 	u32 flags;
diff --git a/include/rdma/ib_smi.h b/include/rdma/ib_smi.h
index 87f6073..f29af13 100644
--- a/include/rdma/ib_smi.h
+++ b/include/rdma/ib_smi.h
@@ -85,6 +85,42 @@ #define IB_SMP_ATTR_VENDOR_DIAG __cons
 #define IB_SMP_ATTR_LED_INFO __constant_htons(0x0031)
 #define IB_SMP_ATTR_VENDOR_MASK __constant_htons(0xFF00)
 
+struct ib_port_info {
+	__be64 mkey;
+	__be64 gid_prefix;
+	__be16 lid;
+	__be16 sm_lid;
+	__be32 cap_mask;
+	__be16 diag_code;
+	__be16 mkey_lease_period;
+	u8 local_port_num;
+	u8 link_width_enabled;
+	u8 link_width_supported;
+	u8 link_width_active;
+	u8 linkspeed_portstate;		/* 4 bits, 4 bits */
+	u8 portphysstate_linkdown;	/* 4 bits, 4 bits */
+	u8 mkeyprot_resv_lmc;		/* 2 bits, 3, 3 */
+	u8 linkspeedactive_enabled;	/* 4 bits, 4 bits */
+	u8 neighbormtu_mastersmsl;	/* 4 bits, 4 bits */
+	u8 vlcap_inittype;		/* 4 bits, 4 bits */
+	u8 vl_high_limit;
+	u8 vl_arb_high_cap;
+	u8 vl_arb_low_cap;
+	u8 inittypereply_mtucap;	/* 4 bits, 4 bits */
+	u8 vlstallcnt_hoqlife;		/* 3 bits, 5 bits */
+	u8 operationalvl_pei_peo_fpi_fpo;	/* 4 bits, 1, 1, 1, 1 */
+	__be16 mkey_violations;
+	__be16 pkey_violations;
+	__be16 qkey_violations;
+	u8 guid_cap;
+	u8 clientrereg_resv_subnetto;	/* 1 bit, 2 bits, 5 */
+	u8 resv_resptimevalue;		/* 3 bits, 5 bits */
+	u8 localphyerrors_overrunerrors; /* 4 bits, 4 bits */
+	__be16 max_credit_hint;
+	u8 resv;
+	u8 link_roundtrip_latency[3];
+};
+
 static inline u8
 ib_get_smp_direction(struct ib_smp *smp)
 {

From rolandd at cisco.com Wed May 31 15:32:16 2006
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 31 May 2006 15:32:16 -0700
Subject: [openib-general] [PATCH 4/5] IB/mthca: Add client reregister event generation
In-Reply-To:
<20060531223205.10506.51241.stgit@localhost.localdomain>
References: <20060531223205.10506.51241.stgit@localhost.localdomain>
Message-ID: <20060531223215.10506.28838.stgit@localhost.localdomain>

From: Leonid Arsh

Change the mthca snoop of MADs that set PortInfo to check if the SM
has set the client reregister bit, and if it has, generate a client
reregister event. If the bit is not set, just generate a LID change
event as usual.

Signed-off-by: Leonid Arsh
Signed-off-by: Roland Dreier
---

 drivers/infiniband/hw/mthca/mthca_mad.c | 14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index 4730863..d9bc030 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -114,14 +114,22 @@ static void smp_snoop(struct ib_device *
 	     mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) &&
 	    mad->mad_hdr.method == IB_MGMT_METHOD_SET) {
 		if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) {
+			struct ib_port_info *pinfo =
+				(struct ib_port_info *) ((struct ib_smp *) mad)->data;
+
 			mthca_update_rate(to_mdev(ibdev), port_num);
 			update_sm_ah(to_mdev(ibdev), port_num,
-				     be16_to_cpup((__be16 *) (mad->data + 58)),
-				     (*(u8 *) (mad->data + 76)) & 0xf);
+				     be16_to_cpu(pinfo->lid),
+				     pinfo->neighbormtu_mastersmsl & 0xf);
 
 			event.device = ibdev;
-			event.event = IB_EVENT_LID_CHANGE;
 			event.element.port_num = port_num;
+
+			if(pinfo->clientrereg_resv_subnetto & 0x80)
+				event.event = IB_EVENT_CLIENT_REREGISTER;
+			else
+				event.event = IB_EVENT_LID_CHANGE;
+
 			ib_dispatch_event(&event);
 		}
 

From rolandd at cisco.com Wed May 31 15:32:18 2006
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 31 May 2006 15:32:18 -0700
Subject: [openib-general] [PATCH 5/5] IB/ipath: Add client reregister event generation
In-Reply-To: <20060531223205.10506.51241.stgit@localhost.localdomain>
References: <20060531223205.10506.51241.stgit@localhost.localdomain>
Message-ID:
<20060531223218.10506.76076.stgit@localhost.localdomain>

From: Roland Dreier

Generate client reregister event instead of LID change event when
client reregister bit is set.

Signed-off-by: Roland Dreier
---

 drivers/infiniband/hw/ipath/ipath_mad.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c
index 49acd1e..1a9d0a2 100644
--- a/drivers/infiniband/hw/ipath/ipath_mad.c
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c
@@ -409,7 +409,7 @@ static int recv_subn_set_portinfo(struct
 
 	if (pip->clientrereg_resv_subnetto & 0x80) {
 		clientrereg = 1;
-		event.event = IB_EVENT_LID_CHANGE;
+		event.event = IB_EVENT_CLIENT_REREGISTER;
 		ib_dispatch_event(&event);
 	}
 

From rdreier at cisco.com Wed May 31 15:38:18 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 31 May 2006 15:38:18 -0700
Subject: [openib-general][PATCH 1 of 3] repost: Client Reregister support for kernel space
In-Reply-To: <20060509060958.GA482@voltaire.com> (Leonid Arsh's message of "Tue, 9 May 2006 09:09:58 +0300")
References: <20060509060958.GA482@voltaire.com>
Message-ID:

OK, I cleaned up your patches and applied the following to my
for-2.6.18 tree. I think all of my changes were fixes and/or
cleanups, but you may want to check that I didn't break anything --
I'm sending the 5 patches I ended up with to the list.

 - R.

From rdreier at cisco.com Wed May 31 15:49:14 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 31 May 2006 15:49:14 -0700
Subject: [openib-general] [PATCH 5/5] IB/ipath: Add client reregister event generation
In-Reply-To: <20060531223218.10506.76076.stgit@localhost.localdomain> (Roland Dreier's message of "Wed, 31 May 2006 15:32:18 -0700")
References: <20060531223205.10506.51241.stgit@localhost.localdomain> <20060531223218.10506.76076.stgit@localhost.localdomain>
Message-ID:

BTW to ipath maintainers -- please NAK this patch and feel free to back
it out of svn if you don't think it's OK.
- R. From fzkhzevldsv at hotmail.com Wed May 31 16:15:37 2006 From: fzkhzevldsv at hotmail.com (boxboydiamond) Date: Wed, 31 May 2006 16:15:37 -0700 (PDT) Subject: [openib-general] By increasing your penis with Penis Enlarge Patch you Message-ID: <20060531231537.24BF522834D@openib.ca.sandia.gov> An HTML attachment was scrubbed... URL: From geoffrey at scandinavianseed.biz Wed May 31 11:57:29 2006 From: geoffrey at scandinavianseed.biz (Gilbert) Date: Wed, 31 May 2006 19:57:29 +0100 Subject: [openib-general] V1agra BUY IT HERE! Message-ID: <000001c6851e$bcd18a80$0100007f@troy-g44korqsj7> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled-2.jpg Type: image/jpeg Size: 20429 bytes Desc: not available URL: From peter at euwest.biz Wed May 31 14:02:52 2006 From: peter at euwest.biz (Gilbert) Date: Wed, 31 May 2006 22:02:52 +0100 Subject: [openib-general] Buy drugs here and save money without a peep! Message-ID: <000001c6851f$7d576e00$0100007f@DDBQBS61> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Untitled-2.jpg Type: image/jpeg Size: 20429 bytes Desc: not available URL: From hitotsumami114 at yahoo.co.jp Wed May 31 19:35:26 2006 From: hitotsumami114 at yahoo.co.jp (hitotsumami114 at yahoo.co.jp) Date: Wed, 31 May 2006 19:35:26 -0700 (PDT) Subject: [openib-general] =?utf-8?b?wpZ7wpPDusKCw6bCgsOowpbCs8KXwr/CgsOM?= =?utf-8?b?woLCssKLw5/Cj8KKwoLCs8KCw7HCklTCgsK1?= Message-ID: 20030926185847.94737mail@mail.love-woman889889_gogo-server114_freesystem01_freefree-lovelove.tv ���S�����ł��ߏ������T���܂��񂩁H �G���g���[���烁�[������M��܂߂đS�Ė����ł��B �߂��݁@23�΁@�t���[�^�[ �薼�F���b�Z���܂��񂩁H �Ƃ�PC����̂ňꏏ��yahoo���b�Z���W���[�ł���܂��񂩁H �Ȃ񂩖����ދ����悧�B�҂��Ă܂��ˁB http://yaii.net/htm �ʍ��@27�΁@OL �薼�F�������ꂾ�����āc �͂����茾���ė~���s���ł��B�������ꂾ�����đʖڂȂ̂��ȁH �����ꂽ���������Ăق����ł��B�������������Ĉ����ꂿ�Ⴄ�̂��ȁc�B �T�����Ԃ��邩��A���~�����ł��B http://yaii.net/htm �~�T�L�@34�΁@��w �薼�F�ꉞ�����҂ł����ǁc �T�C�g�ʓ|�����A����Ă��b�o���邩�Ȃ��H �o����΍�����������ł����ǁc �ꉞ�����҂ł����Ǖv����͌�������Ă܂�����c�B �閧����o����l���肢���܂��B http://yaii.net/htm ======================================================= �l�Ȕ���http://yaii.net/htm ======================================================= From admin at aol.com Wed May 31 19:55:38 2006 From: admin at aol.com (Aol) Date: Thu, 1 Jun 2006 04:55:38 +0200 (CEST) Subject: [openib-general] Aol New Message-ID: <20060601025538.644F343B84@dd5436.kasserver.com> to have a domain name .com, .fr .net with lodging and php, mysql during 2 years free at AOL but it is necessary to have a bank card to ensure that your site did not tighten out the law http://aol.ift.fr ----------------------------------------------- avoir un nom de domaine .com , .fr .net avec hebergement et php , mysql pendant 2 ans gratuit chez aol mais il faut avoir une carte bancaire pour assurer que votre site ne serra pas hors la lois http://aol.ift.fr From erika2006 at cooltoad.com Wed May 31 21:07:57 2006 From: erika2006 at cooltoad.com (erika2006 at cooltoad.com) Date: Wed, 31 May 
2006 21:07:57 -0700 (PDT)
Subject: [openib-general] =?iso-2022-jp?b?GyRCPCtKLCRAJDEkTjBZJEsbKEI=?= =?iso-2022-jp?b?GyRCSnQ7RSQ3JEYkLyRsJGtIa0wpJE49d0AtPlIycBsoQiAg?= =?iso-2022-jp?b?ICAgICAgICAwMDMyMQ==?=
Message-ID: 20060601114608.82444mail@mail.hyper_luckylady8754158754_lookserver772_serebusystem03_woman-luckylady.tv

From eitan at mellanox.co.il Wed May 31 22:35:26 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 01 Jun 2006 08:35:26 +0300
Subject: [openib-general] [PATCH 4/5] IB/mthca: Add client reregister event generation
In-Reply-To:
<20060531223215.10506.28838.stgit@localhost.localdomain>
References: <20060531223205.10506.51241.stgit@localhost.localdomain> <20060531223215.10506.28838.stgit@localhost.localdomain>
Message-ID: <447E7C9E.1060907@mellanox.co.il>

Hi Roland,

Is there a reason why the LID_CHANGE event is happening even if the LID did not change?

Roland Dreier wrote:
> From: Leonid Arsh
>
> Change the mthca snoop of MADs that set PortInfo to check if the SM
> has set the client reregister bit, and if it has, generate a client
> reregister event. If the bit is not set, just generate a LID change
> event as usual.
>
> Signed-off-by: Leonid Arsh
> Signed-off-by: Roland Dreier
> ---
>
>  drivers/infiniband/hw/mthca/mthca_mad.c | 14 +++++++++++---
>  1 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
> index 4730863..d9bc030 100644
> --- a/drivers/infiniband/hw/mthca/mthca_mad.c
> +++ b/drivers/infiniband/hw/mthca/mthca_mad.c
> @@ -114,14 +114,22 @@ static void smp_snoop(struct ib_device *
>               mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) &&
>              mad->mad_hdr.method == IB_MGMT_METHOD_SET) {
>                  if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) {
> +                        struct ib_port_info *pinfo =
> +                                (struct ib_port_info *) ((struct ib_smp *) mad)->data;
> +
>                          mthca_update_rate(to_mdev(ibdev), port_num);
>                          update_sm_ah(to_mdev(ibdev), port_num,
> -                                     be16_to_cpup((__be16 *) (mad->data + 58)),
> -                                     (*(u8 *) (mad->data + 76)) & 0xf);
> +                                     be16_to_cpu(pinfo->lid),
> +                                     pinfo->neighbormtu_mastersmsl & 0xf);
>
>                          event.device = ibdev;
> -                        event.event = IB_EVENT_LID_CHANGE;
>                          event.element.port_num = port_num;
> +
> +                        if(pinfo->clientrereg_resv_subnetto & 0x80)
> +                                event.event = IB_EVENT_CLIENT_REREGISTER;
> +                        else
> +                                event.event = IB_EVENT_LID_CHANGE;
> +
>                          ib_dispatch_event(&event);
>                  }
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
>
http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From krwdthxkuijzl at yahoo.com Wed May 31 11:55:59 2006
From: krwdthxkuijzl at yahoo.com (Sabrina Drummond)
Date: Wed, 31 May 2006 10:55:59 -0800
Subject: [openib-general] elbow alongside battalion imprudent adrift frederic cartilaginous incondensable stupendous impediment barrack electro denote tutorial compendia accurate era microscopy belly midwestern pessimist payday magna woody effaceable pascal antipode northrop noteworthy exuberant muse poor peripheral hewitt caught bella
Message-ID: <41770.$$.56228.Etrack@hotmail.com>

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: issuance.5.gif
Type: image/gif
Size: 8467 bytes
Desc: not available
URL:

From fauhb2 at oaclub.com Wed May 31 03:17:37 2006
From: fauhb2 at oaclub.com (Nathan Bell)
Date: Wed, 31 May 2006 12:17:37 +0200
Subject: [openib-general] cheap oem soft shipping //orldwide
Message-ID: <000001c6867a$48abf980$0100007f@localhost>

Special Offer

Adobe Video Collection
Adobe Premiere 1.5 Professional
Adobe After Effects 6.5 Professional
Adobe Audition 1.5
Adobe Encore DVD 1.5
$149.95
More Info >>

Microsoft 2 in 1
MS Windows XP Pro
MS Office 2003 Pro
$99.95
More Info >>

Microsoft + Adobe 3 in 1
MS Windows XP Pro
MS Office 2003 Pro
Adobe Acrobat 7.0 Professional
$149.95
More Info >>

Bestsellers

Microsoft Office Professional Edition 2003
Rating: 6 reviews
Retail price: $550.00
You save: $480.05 (87%)
Our price: $69.95
[Add to cart]

Microsoft Windows XP Professional
Rating: 8 reviews
Retail price: $200.00
You save: $150.05 (75%)
Our price: $49.95
[Add to cart]

Adobe Photoshop CS2 V 9.0
Rating: 3 reviews
Retail price: $599.00
You save: $529.05 (88%)
Our price: $69.95
[Add to cart]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: